Madhava Krishna, Professor and Head of the Robotics Research Centre and the Kohli Center for Intelligent Systems (KCIS) at IIIT Hyderabad, provides an overview of the institute’s self-driving car and its underlying technology.
The self-driving car developed by IIIT Hyderabad is an electric vehicle designed for autonomous point-to-point driving with collision avoidance capabilities. The vehicle is equipped with 3D LIDAR, depth cameras, GPS systems, and an Attitude and Heading Reference System (AHRS), which fuses inertial sensor readings to estimate the vehicle’s spatial orientation. It can process open-set natural language commands to navigate to specified destinations.
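To give a flavour of what the AHRS computes, the Python sketch below implements a minimal complementary filter that fuses gyroscope rates (responsive but drifting) with accelerometer gravity readings (noisy but drift-free) into a roll/pitch estimate. The filter gain, update rate, and sensor values are illustrative assumptions, not the vehicle’s actual configuration.

```python
import numpy as np

# Complementary-filter sketch of an AHRS orientation estimate.
# DT and ALPHA are assumed values, not the vehicle's real settings.
DT, ALPHA = 0.01, 0.98  # 100 Hz update; trust the gyro 98% per step

def ahrs_step(roll, pitch, gyro, accel):
    # Integrate gyro rates (rad/s) for a responsive, but drifting, estimate.
    roll_g, pitch_g = roll + gyro[0] * DT, pitch + gyro[1] * DT
    # Recover absolute tilt from the gravity direction in the accelerometer.
    roll_a = np.arctan2(accel[1], accel[2])
    pitch_a = np.arctan2(-accel[0], np.hypot(accel[1], accel[2]))
    # Blend: gyro for high-frequency motion, accelerometer to cancel drift.
    return (ALPHA * roll_g + (1 - ALPHA) * roll_a,
            ALPHA * pitch_g + (1 - ALPHA) * pitch_a)

roll = pitch = 0.0
gyro = np.array([0.01, -0.02, 0.0])   # rad/s (placeholder reading)
accel = np.array([0.1, 0.0, 9.8])     # m/s^2 (placeholder reading)
roll, pitch = ahrs_step(roll, pitch, gyro, accel)
print(f"roll={roll:.4f} rad, pitch={pitch:.4f} rad")
```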
The car utilises SLAM-based point cloud mapping to build a map of the campus environment. LIDAR-guided real-time state estimation supports localisation while driving. Trajectory optimisation based on Model Predictive Control (MPC) generates optimal trajectories in real time, and these trajectories can be initialised using data-driven models to reduce inference time. Research on autonomous driving at the institute has been featured in academic publications and conferences.
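To make the trajectory-optimisation step concrete, here is a minimal MPC-style sketch in Python: it optimises a short control horizon for a unicycle model under a goal cost, a control-smoothness cost, and a soft collision penalty. The dynamics, cost weights, and the warm start `u0` (standing in for the data-driven initialisation mentioned above) are all assumptions for illustration, not the institute’s actual formulation.

```python
import numpy as np
from scipy.optimize import minimize

DT, HORIZON = 0.1, 20  # assumed step size and horizon length

def rollout(x0, controls):
    """Integrate a unicycle model (x, y, heading) under (v, omega) inputs."""
    states = [np.asarray(x0, dtype=float)]
    for v, w in controls.reshape(HORIZON, 2):
        x, y, th = states[-1]
        states.append(np.array([x + v * np.cos(th) * DT,
                                y + v * np.sin(th) * DT,
                                th + w * DT]))
    return np.array(states)

def cost(controls, x0, goal, obstacles):
    traj = rollout(x0, controls)
    goal_cost = np.sum((traj[-1, :2] - goal) ** 2)   # end near the goal
    smooth_cost = 0.01 * np.sum(controls ** 2)       # prefer gentle inputs
    coll_cost = 0.0                                  # soft collision penalty
    for ox, oy, r in obstacles:
        d = np.linalg.norm(traj[:, :2] - np.array([ox, oy]), axis=1)
        coll_cost += np.sum(np.maximum(0.0, r + 0.5 - d) ** 2)
    return goal_cost + smooth_cost + 10.0 * coll_cost

x0, goal = [0.0, 0.0, 0.0], np.array([5.0, 2.0])
obstacles = [(2.5, 1.0, 0.5)]      # (x, y, radius) of one obstacle
u0 = np.tile([1.0, 0.0], HORIZON)  # warm start; a learned model could supply this
res = minimize(cost, u0, args=(x0, goal, obstacles), method="L-BFGS-B")
print("final position:", rollout(x0, res.x)[-1, :2])
```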
Human navigation often relies on contextual cues and verbal instructions using landmarks, such as “Turn right at the white building” or “Stop near the entrance.” Similarly, autonomous systems require precise localisation, typically achieved using high-resolution GPS or high-definition (HD) maps. However, these methods can be computationally intensive.
Open-source topological maps, such as OpenStreetMap (OSM), are sometimes used as an alternative for geolocation. Although lightweight, these maps can have a positional error of about 6–8 metres and may lack dynamic features such as open parking spaces. IIIT Hyderabad is working on methods to enhance localisation using real-world landmarks, aiming to replicate human-like navigation.
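As an illustration of how lightweight such maps are in practice, the sketch below pulls a drivable road graph around a point from OpenStreetMap using the open-source osmnx library and snaps a GPS fix to the nearest graph node. The coordinates are arbitrary placeholders, and the library is a stand-in chosen for illustration rather than part of the institute’s pipeline.

```python
import osmnx as ox

# Fetch a lightweight topological road graph around a point from
# OpenStreetMap. The coordinates are placeholders, not the actual
# campus location.
lat, lon = 17.4456, 78.3497
G = ox.graph_from_point((lat, lon), dist=800, network_type="drive")

# Snap a GPS fix to the nearest graph node. Given OSM's roughly 6-8 m
# positional error, this node is only a coarse prior that landmark-based
# localisation has to refine.
node = ox.distance.nearest_nodes(G, X=lon, Y=lat)
print(G.nodes[node])  # node attributes: 'x' (lon), 'y' (lat), etc.
```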
The institute employs foundation models, which carry a general semantic understanding of the world, for navigation tasks. By adding language-based landmarks, such as “a bench” or “a football field,” to open-source topological maps, the system gains the flexibility to identify locations it was never explicitly trained on and to generalise to unfamiliar environments. The Robotics Research Centre has combined established methodologies with newer techniques to address challenges in localisation and navigation, demonstrated through a prototype developed in-house.
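One plausible way to ground such landmarks is to score camera frames against the landmark phrases with an off-the-shelf vision-language model. The sketch below uses OpenAI’s CLIP, loaded through Hugging Face transformers, as a stand-in for the foundation model; the landmark strings and image path are illustrative.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP as a stand-in foundation model for open-set landmark matching.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

landmarks = ["a bench", "a football field", "a white building", "a food stall"]
image = Image.open("camera_frame.jpg")  # placeholder path for the current frame

inputs = processor(text=landmarks, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# A high score for a phrase the system was never explicitly trained on is
# what enables localisation against arbitrary language-based landmarks.
for name, p in zip(landmarks, probs):
    print(f"{name}: {p:.2f}")
```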
Autonomous navigation systems involve mapping, localisation, and planning. Traditional systems use either modular pipelines or end-to-end architectures; increasingly, language processing is incorporated into both to improve interpretability.
These systems can follow natural language navigation instructions, such as “Take a right turn and stop near the food stall,” with an emphasis on collision-free planning. Conventional systems often optimise perception, prediction, and planning as separate objectives, which can lead to inconsistencies between modules and a strong dependence on the accuracy of the perception models.
To address this, IIIT Hyderabad employs end-to-end training that aligns predictions with planning objectives. A lightweight vision-language model has been developed to combine visual scene understanding with natural language commands. This model predicts goal locations based on the vehicle’s perspective view and encoded instructions.
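A minimal PyTorch sketch of such a goal predictor is shown below: an image embedding of the perspective view and an encoded instruction are concatenated and regressed to a 2D goal in the vehicle frame. The architecture, feature sizes, and the random tensors standing in for encoder outputs are assumptions for illustration, not the institute’s actual model.

```python
import torch
import torch.nn as nn

class GoalPredictor(nn.Module):
    """Fuse image and instruction embeddings into a 2D goal prediction."""
    def __init__(self, img_dim=512, txt_dim=512, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # (x, y) goal in the vehicle frame
        )

    def forward(self, img_feat, txt_feat):
        return self.fuse(torch.cat([img_feat, txt_feat], dim=-1))

# In practice the features would come from frozen vision and language
# encoders; random tensors here only demonstrate the shapes.
model = GoalPredictor()
img_feat = torch.randn(1, 512)    # embedding of the perspective view
txt_feat = torch.randn(1, 512)    # encoding of "stop near the food stall"
print(model(img_feat, txt_feat))  # predicted goal, e.g. tensor([[x, y]])
```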
Challenges can arise in ensuring predictions align with real-world constraints. For example, an instruction like “Park behind the red car” might point to a non-drivable area or one that overlaps another vehicle. To address this, the perception module is integrated with a custom planner designed within a neural network framework. Because the planner is differentiable, it can be trained with gradient-based methods, improving both prediction accuracy and planning outcomes and aligning all system components for better performance in real-world conditions.
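To make the differentiable-planner idea concrete, the PyTorch sketch below refines a trajectory toward a predicted goal by unrolled gradient steps on a smooth cost. Because every step is differentiable, a downstream loss on the trajectory backpropagates into the goal prediction, which is what allows the predictor and planner to be trained jointly. The cost terms, weights, and obstacle representation are illustrative assumptions.

```python
import torch

def smooth_cost(traj, goal, obstacles):
    goal_c = ((traj[-1] - goal) ** 2).sum()         # end near the goal
    smooth_c = ((traj[1:] - traj[:-1]) ** 2).sum()  # short, smooth path
    obst_c = 0.0
    for centre, radius in obstacles:                # soft non-drivable penalty
        d = torch.linalg.norm(traj - centre, dim=-1)
        obst_c = obst_c + torch.relu(radius - d).pow(2).sum()
    return goal_c + 0.1 * smooth_c + 10.0 * obst_c

def plan(goal, obstacles, horizon=20, steps=30, lr=0.05):
    t = torch.linspace(0, 1, horizon).unsqueeze(-1)
    traj = t * goal                                 # differentiable initialisation
    for _ in range(steps):
        cost = smooth_cost(traj, goal, obstacles)
        (grad,) = torch.autograd.grad(cost, traj, create_graph=True)
        traj = traj - lr * grad                     # unrolled gradient step
    return traj

goal = torch.tensor([5.0, 2.0], requires_grad=True)  # e.g. from the goal predictor
obstacles = [(torch.tensor([2.5, 1.0]), 0.8)]
traj = plan(goal, obstacles)

loss = ((traj[-1] - torch.tensor([5.0, 2.5])) ** 2).sum()  # a downstream loss
loss.backward()                                            # gradients reach `goal`
print(goal.grad)
```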