By James R. Doty, MD and Blair LaCorte
For over three decades, I’ve studied and performed surgery on the human brain. I have always been fascinated by the power, plasticity and adaptability of the brain and by how much of this amazing capacity is dedicated to processing and interpreting data we receive from our senses. With the rapid ascension of Artificial Intelligence (AI), I began to wonder how developers would integrate the complex, multi-layers of human perception to enhance AI’s capabilities. I have been especially interested in how this integration would be applied to robots and autonomous vehicles. It became clear the artificial intelligence that will be needed to drive these vehicles will require artificial perception that is modeled after the greatest perception engine on the planet — the human visual cortex. These vehicles will need to think like a robot, but perceive like a human.
To learn more and to better understand how this level of artificial perception will be created, I recently became advisor to AEye, a company developing cutting edge artificial perception and self-driving technologies, to help them use knowledge of the human brain to better inform their systems. This is known as biomimicry: the concept of learning from and then replicating natural strategies from living systems and beings (plants, animals, humans, etc.) to better adapt design and engineering. Essentially, biomimicry allows us to fit into our existing environment and evolve in the way life has successfully done for the past few billion years. But why is incorporating biomimicry and aspects of human perception integral to the development and success of autonomous vehicles?
Because nothing can take in more information and process it faster and more accurately than the human perception system. Humans classify complex objects at speeds up to 27 Hz, with the brain processing 580 megapixels of data in as little as 13 milliseconds. If we continue using conventional sensor data collection methods, we are more than 25 years away from having AI achieve the capabilities of the human brain in robots and autonomous vehicles. Therefore, to facilitate self-driving cars to safely move independently in crowded urban environments or at highway speeds, we must develop new approaches and technologies to meet or exceed the performance of the human brain. The next question is: how?
Orthogonal data matters
(Creating an advanced, multi-dimensional data type)
Orthogonal data refers to complimentary data sets which ultimately give you more quality information about an object or situation than each would alone, allowing us to make efficient judgements about what in our world is important, and what is not. Orthogonality concepts for high information quality are well understood and rooted in disciplines such as quantum physics where linear algebra is employed and orthogonal basis sets are the minimum pieces of information one needs to represent more complex states without redundancy. When it comes to perception of moving objects, two types of critical orthogonal data sets are often required — spatial and temporal. Spatial data specifies where an object exists in the world, while temporal is where an object exists in time. By integrating these data sets along with other complementary data sets such as color, temperature, sound, smell, etc. our brains generate a real-time model of the world around us, defining how we experience it.
The human brain takes in all kinds of orthogonal data naturally, decoupling and reassembling information instantaneously, without us even realizing it. For example, if you see that a baseball is flying through the air towards you, your brain is gathering all types of information about it, such as spatial (the direction of where the ball is headed) and temporal (how fast it’s moving). While this data is being processed by your visual cortex “in the background” all you’re ultimately aware of is the action you need to take, which might be to duck. The AI perception technology that is able to successfully adopt the manner by which the human brain captures and processes these types of data sets will dominate the market.
Existing robotic sensory data acquisition systems have focused only on single sensor modalities (camera, LiDAR, radar) and only with fixed scan patterns and intensity. Unlike humans, these systems have not learned nor have the ability to efficiently process and optimize 2D and 3D data in real-time while both the sensor and detected objects are in motion. Therefore, they cannot use real-time orthogonal data to learn, prioritize, and focus. To effectively replicate the multi-dimensional sensory processing power of the human visual cortex will require a new approach to thinking about how to capture and process sensory data.
AEye is pioneering one such approach. AEye calls its unique biomimetic system iDAR (Intelligent Detection and Ranging). AEye’s iDAR is an intelligent artificial perception system that physically fuses a unique, agile LiDAR with a hi-res camera to create a new data type they call Dynamic Vixels. These Dynamic Vixels are one of the ways in which AEye acquires orthogonal data. By capturing x, y, z, r, g, b data (along with SWIR intensity), these patented Dynamic Vixels are uniquely created to biomimic the data structure of the human visual cortex. Like the human visual cortex, the intelligence of the Dynamic Vixels is then integrated in the central perception engine and motion planning system which is the functional brain of the vehicle. They are dynamic because as they actively interrogate a scene and adjust to changing conditions, such as increasing the power level of the sensor to cut through rain, or revisiting suspect objects in the same frame to identify obstacles. Better data drives more actionable information.
Not all objects are created equal
(See everything, and focus on what is important)
Humans continuously analyze their environment, always scanning for new objects, then in parallel and as appropriate focus in on elements that are either interesting, engaging, or potentially pose a threat. We process at the visual cortex fast, with incredible accuracy, and with very little of the brain’s immense processing power. If a human brain functioned as autonomous vehicles do today, we would not have survived as a species.
In his book The Power of Fifty Bits, Bob Nease writes of the ten million bits of information the human brain processes each second, but how only fifty bits are devoted to conscious thought. This is due to multiple evolutionary factors, including our adaptation to ignore autonomic processes like our heart beating, or our visual cortex screening out less relevant information in our surroundings (like the sky) to survive. It is an intelligent system design.
This is the nature of our intelligent vision. So, while our eyes are always scanning and searching to identify new objects entering a scene, we focus our attention on objects that matter as they move into areas of concern, allowing us to track them over time. In short, we search a scene, consciously acquire the objects that matter, and track them as required.
As discussed, current autonomous vehicle sensor configurations utilize a combination of LiDAR, cameras, ultrasonics, and radar as their “senses” that are serial collection (one way) and are limited to fixed patterns of search. These “senses” collect as much data as possible, which is then aligned, processed, and analyzed long after the fact. This post-processing is slow and does not allow for situational changes to how sensory data is captured in real-time. Because these sensors don’t intelligently interrogate, up to 95% of the sensory data currently being collected is thrown out as it is either irrelevant or redundant at the time it is processed. This act of triage also itself comes with a latency penalty. At highway speeds, this latency results in a car moving more than 20 feet before the sensor data has been fully processed. Throwing away data you don’t need with the goal of being efficient is inefficient. A better approach exists.
The overwhelming task of sifting through this data — every tree, curb, parked vehicle, the sky, the road, leaves on trees, and other static objects — also requires immense power and data processing resources, which slows down the entire system significantly, and introduces risk. These systems’ goal is to focus on everything and then try to analyze each item in their environment, after the fact, at the expense of timely action. This is the exact opposite of how humans process spatial and temporal data in situations that we associate with driving.
AEye’s iDAR teaches autonomous vehicles to “search, acquire, and track” objects as we do. By defining new data and sensor types that more efficiently communicate actionable information while maintaining the intelligence to analyze this data as quickly and accurately as possible. AEye’s iDAR enables this through its unique foundational solid-state agile LiDAR. Unlike standard LiDAR, AEye’s agile LiDAR is situationally adaptive so that it can modify scan patterns and trade resources such as update rate, resolution, and max range detection among others. This enables iDAR to dynamically adjust as it optimally searches a scene, conserve power and apply that power to efficiently identify and acquire critical objects, and track these objects over time. iDAR’s unique ability to intelligently use power to search, acquire, and track scenes helps identify that the object is a child walking into the street or that it is a car entering the intersection and accelerating high speed. Doing this in real-time is the difference between a safe journey and an avoidable tragedy.
Humans Learn Intuitively
(Feedback loops enable intelligence)
As we have discussed, the human visual cortex can scan ay 27Hz (much faster than current sensors on autonomous vehicles, which average around 10Hz). The brain naturally gather..