Deconstructing Two Conventional LiDAR Metrics, Part 2

Executive Summary

Conventional metrics for evaluating LiDAR systems designed for autonomous driving are problematic because they often fail to adequately or explicitly address real-world scenarios. Therefore, AEye, the developer of iDAR™ technology, proposes a number of new metrics to better assess the safety and performance of advanced automotive LiDAR sensors.

In Part 1 of this series, two metrics (frame rate and fixed [angular] resolution over a fixed Field-of-View) were discussed in relation to the more meaningful metrics of object revisit rate and instantaneous (angular) resolution. Now in Part 2, we’ll explore the metrics of detection range and velocity, and propose two new corresponding metrics for consideration: object classification range and time to true velocity.

Introduction

How is the effectiveness of an autonomous vehicle’s perception system measured? Performance metrics matter because they ultimately determine how designers and engineers approach problem-solving. Defining problems accurately makes them easier to solve, saving time, money, and resources.

When it comes to measuring how well automotive LiDAR systems perceive the space around them, manufacturers commonly agree that it’s valuable to determine a system’s detection range. To optimize safety, the on-board computer should detect obstacles as far ahead as possible; in theory, the earlier it can do so, the more time control systems have to plan and perform timely, evasive maneuvers. However, AEye believes that detection range is not the most important measurement in this scenario. Ultimately, it’s the control system’s ability to classify an object (here we refer to low-level classification, e.g., blob plus dimensionality) that enables it to decide on a basic course of action.

What matters most, then, is how quickly an object can be identified and classified, and how quickly a decision can be made about it so an appropriate response can be calculated. In other words, it is not enough to quantify the distance at which a potential object can be detected at the sensor. One must also quantify the latency from the actual event to the sensor detection, plus the latency from the sensor detection to the CPU decision.

Similarly, the conventional metric of velocity has limitations. Today, some lab prototype frequency modulated continuous wave (FMCW) LiDAR systems can determine the radial velocity of nearby objects by interrogating them continuously for a period of time sufficient to observe a discernible change in position. However, this has two disadvantages: 1) the beam must remain locked on a fixed position for a certain period of time, and 2) only velocity in the radial direction can be discerned. Lateral velocity must be calculated with the standard update-in-position method. Exploring these disadvantages will illustrate why, to achieve the highest degree of safety, time to true velocity is a much more useful metric. In other words, how long does it take a system to determine the velocity, in any direction, of a newly identified or appearing object?

Both object classification range and time to true velocity are more relevant metrics for assessing what a LiDAR system can and should achieve in tomorrow’s autonomous vehicles. In this white paper, we examine how these new metrics better measure and define the problems solved by more advanced LiDAR systems, such as AEye’s iDAR (Intelligent Detection and Ranging).

Conventional Metric #1: Detection Range

A single point detection — where the LiDAR registers one detect on a new object or person entering the scene — is indistinguishable from noise. Therefore, we will use a common industry definition of detection, which requires persistence across adjacent shots within a frame and/or across frames. For example, we might require 5 detects on an object per frame (5 points at the same range) and/or from frame to frame (a single related point in 5 consecutive frames) before declaring that a detection is a valid object.
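
As a concrete illustration, here is a minimal sketch of such an N-of-M persistence rule. The thresholds mirror the example above; the function name and signature are ours, not an industry-standard API.

```python
# Hypothetical sketch of an N-of-M persistence rule for declaring a valid
# object from raw LiDAR detects; thresholds mirror the example in the text.

def is_valid_object(detect_history, n_required=5, m_window=5):
    """True if at least n_required of the last m_window looks were detects."""
    window = detect_history[-m_window:]
    return sum(window) >= n_required

print(is_valid_object([True, True, True, True, True]))    # True  -> valid object
print(is_valid_object([True, False, True, False, True]))  # False -> still treated as noise
```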

It is a widely held belief that a detection range of 200+ meters at highway speeds is the required range for vehicles to effectively react to changing road conditions and surroundings. Conventional LiDAR sensors scan and collect data about the occupancy grid in a uniform pattern without discretion. This forms part of a constant stream of gigabytes of data sent to the vehicle’s on-board controller in order to detect objects. This design puts a massive strain on resources. Anywhere from 70 to 90+ percent of data is redundant or useless, which means it’s discarded.

Under these conditions, even a system operating at a 10-30 Hz frame rate will struggle to deliver low latency alongside high frame rates and high performance. And if latency for newly appearing objects is even 0.25 seconds, the frame rate hardly matters: by the time the data reaches the central compute platform, in some circumstances it’s practically worthless. On the road, driving conditions can change dramatically in a tenth of a second. After 0.1 seconds, two cars closing at a mutual speed of 200 km/h are 18 feet (about 5.6 meters) closer. While predictive algorithms counter this latency well in structured, well-behaved environments, there are several cases where they don’t. One such example is a small object approaching fast and head-on: it first appears as a single LiDAR point, and N consecutive single-point detects are required before it can be classified as an object. In this example, it’s easy to see that detection range and object classification range are two vastly different things.
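
For reference, here is the closing-distance arithmetic behind these figures; it is a simple unit conversion rather than a simulation.

```python
# Closing-distance arithmetic for a given sensing/processing latency.

def distance_closed_m(closing_speed_kph, latency_s):
    return closing_speed_kph / 3.6 * latency_s   # km/h -> m/s, times elapsed time

for latency_s in (0.1, 0.25, 1.0):
    print(f"{latency_s:.2f} s of latency -> {distance_closed_m(200, latency_s):.1f} m closed")
# 0.10 s -> ~5.6 m (about 18 feet); 0.25 s -> ~13.9 m; 1.00 s -> ~55.6 m
```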

With a variety of factors influencing the domain controller’s processing speed, measuring a system’s efficacy by its detection range is problematic. Without knowledge of latency or other pertinent factors, unwarranted trust is placed in the controller’s ability to manage competing priorities. It is generally assumed that LiDAR manufacturers need not know or care how the domain controller classifies (or how long classification takes); we argue that this assumption ultimately leaves designers vulnerable to very dangerous situations.

AEye’s Metric: Object Classification Range

Currently, classification takes place somewhere in the domain controller. It’s at this point that objects are labeled as such and eventually, more clearly identified. At some level of identification, this data is used to predict known behavior patterns or trajectories. It is obviously extremely important and therefore, AEye argues that a better measurement for assessing an automotive LiDAR’s capability is its object classification range. This metric reduces the unknowns — such as latency associated with noise suppression (e.g., N of M detections) — early in the perception stack, pinpointing the salient information about whether a LiDAR system is capable of operating at optimal safety.

Because automotive LiDAR is a relatively new field, there is not yet an agreed definition of how much data is necessary for classification. AEye therefore proposes adopting the perception standards used in video classification as a valuable provisional definition. Under video standards, classification becomes possible starting with a 3×3 pixel grid on an object. By this definition, an automotive LiDAR system might be assessed by how quickly it can generate a high-quality, high-resolution 3×3 point cloud that enables the domain controller to comprehend objects and people in a scene.
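
As a rough, back-of-the-envelope reading of this provisional 3×3 criterion (our own illustration, not an AEye specification), the sketch below estimates the farthest range at which a uniform angular sampling pattern could still place three beams per axis across an object of a given size.

```python
import math

# Small-angle estimate: farthest range at which a fixed angular resolution
# still places three beams per axis across an object (illustrative only).

def max_classification_range_m(object_size_m, angular_res_deg, points_per_axis=3):
    spacing_rad = math.radians(angular_res_deg)
    # (points_per_axis - 1) beam spacings must fit within the object's extent.
    return object_size_m / ((points_per_axis - 1) * spacing_rad)

# Example: a 0.5 m-wide obstacle sampled at a fixed 0.2 deg spacing.
print(f"{max_classification_range_m(0.5, 0.2):.0f} m")   # ~72 m, well short of 200 m
```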

Generating a 3×3 point cloud is a struggle for conventional LiDAR systems. While many tout an ability to produce point clouds of half a million or more points per second, these images lack uniformity. Point clouds from most LiDAR systems show high point density along horizontal lines coupled with very poor vertical density between them, or low overall density in general. Either way, these fixed angular sampling patterns can be difficult for classification routines because the domain controller has to grapple with half a million points per second that are, in many cases, out of balance with the resolution required to critically sample the object in question. Such an askew “mish-mash” of points requires additional interpretation, putting extra strain on CPU resources.

A much more efficient approach is to gather about 10 percent of this data, focusing on Special Regions of Interest (e.g., moving vehicles and pedestrians) while keeping tabs on the background scene (trees, parked cars, buildings, etc.). Collecting only the salient data in the scene significantly speeds up classification. AEye’s agile iDAR is a LiDAR system integrated with AI that can intelligently accelerate shots in a Region of Interest (ROI). This comes from its ability to selectively revisit points within tens of microseconds, an improvement of three orders of magnitude over conventional 64-line systems that can only hit an object once per frame (every 100 milliseconds). Future white papers will discuss methods of using iDAR to ensure that important background information is not discounted, by correctly employing the concepts of Search, Acquisition, and Tracking, which is similar to how humans perceive.

In summary, low-level object detection can be moved to the sensor level by, for example, generating a dense 3×3 voxel grid more or less in real time every time a significant detection occurs. This happens before the data is sent to the central controller, allowing for higher instantaneous resolution than a fixed-pattern system can offer and, ultimately, better object classification ranges under the video detection-range analogy.
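
To make the idea concrete, here is a minimal, hypothetical sketch of sensor-level ROI densification. The function names and queue model are our own illustration, not AEye’s API: on a significant detect, a dense 3×3 grid of follow-up shots is scheduled ahead of the remaining background raster.

```python
# Hypothetical ROI densification sketch: schedule a dense 3x3 grid of
# follow-up shots around a fresh detect before the background raster resumes.

def roi_grid(az_deg, el_deg, spacing_deg=0.1):
    """3x3 grid of (azimuth, elevation) shot angles centered on a detect."""
    offsets = (-spacing_deg, 0.0, spacing_deg)
    return [(az_deg + da, el_deg + de) for de in offsets for da in offsets]

def on_detect(az_deg, el_deg, shot_queue):
    """Prepend ROI shots so they fire within microseconds to milliseconds."""
    shot_queue[:0] = roi_grid(az_deg, el_deg)

queue = [("raster", i) for i in range(5)]   # stand-in for the scheduled background scan
on_detect(10.0, -1.0, queue)
print(queue[:3])   # the next shots fired are now the dense ROI around the detect
```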

Real-World Applications: Imagine that an autonomous vehicle is driving on a desolate highway. Ahead, the road appears empty. Suddenly, the sensor perceives something at 200 meters. Now imagine a conventional LiDAR system passively collecting this data with a scan every 0.1 seconds. To officially count as a detection, the system needs repeated detections. Let’s say that 5 detects out of 10 looks is considered sufficient to declare a target present, while 4 out of 10 is rejected as clutter/noise. In our context, this means that one second later the system will have taken 10 looks, five of which must be detects. A sequence of ‘yes, no, yes, yes, no, yes, no, no, no, yes’ therefore leads to a detection, taking about one second at a constant, invariable rate, because the process is passive. If the vehicle was traveling at 100 km/h and the mutual closing speed with the detected object was 200 km/h, the object is now 55 meters closer. Only then is the data sent to the next level of the perception stack. Thus, it could take one second and 55 meters just to declare that something is there, while the system still has no idea what it is.
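
The short script below reproduces this timeline under the stated assumptions (0.1-second frames, one look per frame, a 5-of-10 rule, and a 200 km/h closing speed).

```python
# Back-of-the-envelope model of the passive "5 of 10" timeline above.

FRAME_S = 0.1            # conventional full-frame scan time
CLOSING_KPH = 200        # mutual closing speed
looks = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]   # yes/no detects, frame by frame

elapsed_s = None
for i in range(1, len(looks) + 1):
    if sum(looks[:i]) >= 5:              # 5-of-N persistence rule satisfied
        elapsed_s = i * FRAME_S
        break

closed_m = CLOSING_KPH / 3.6 * elapsed_s
print(f"Target declared after {elapsed_s:.1f} s; it is now ~{closed_m:.0f} m closer")
# -> declared after 1.0 s; ~56 m closer (the text rounds to 55 m)
```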

To be clear: here, detection just means that a point is reported in the point cloud. Basic Object Classification means, at a minimum, that we have determined it is indeed a valid object and have located it accurately enough to at least determine whether it lies within our anticipated motion path.

In Figure 1, we compare this scenario (“Scan 1”) to one in which an AEye sensor (“Scan 2”) detects the object ahead. At this point we have only one detect, so we do not yet have the repeated detections needed for a confident, reliable, actionable, and confirmatory target-presence decision, as discussed above.

The AEye innovation is both simple and profound. Normally, one waits to obtain multiple detects by passively scanning and building up evidence over time. Instead, we propose to swiftly find out whether the detection is valid, since both detection and basic object classification (and potentially higher-level classification) can be completed before the next “frame” finishes. Rather than passively scanning the object along with the rest of the surroundings, as in “Scan 1,” “Scan 2” proceeds as follows: the sensor flags the detect and immediately schedules a follow-up ROI scan. Within microseconds to milliseconds, depending on the use case, it can schedule 9 more interrogations to generate a dense 3×3 pixel grid around the detect. Generating these 9 looks quickly provides much better data for classification when using the “5 of 10 detect rule” as part of the ROI process. If this initial process confirms the detect, the sensor can use the cluster of shots to gather further information about the object, possibly applying perception algorithms for higher-level classification.

Bottom line: if the sensor can collect data fast enough to build a degree of certainty that it’s the same object, it can achieve detection and classification almost simultaneously. In a millisecond, two vehicles mutually closing at 200 km/h move only a few centimeters, so it is easy to ensure that the system can place 9 extra shots with minimal range difference.
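
A quick sanity check on that claim: the range change over a microsecond-to-millisecond ROI revisit at a 200 km/h closing speed (the revisit times below are illustrative).

```python
# Range change during a short ROI revisit at a 200 km/h mutual closing speed.

def range_change_cm(closing_speed_kph, revisit_time_s):
    return closing_speed_kph / 3.6 * revisit_time_s * 100.0   # m/s * s -> m -> cm

for revisit_s in (10e-6, 100e-6, 1e-3):
    print(f"{revisit_s * 1e6:6.0f} us -> {range_change_cm(200, revisit_s):.2f} cm")
# Even at 1 ms the target moves only ~5.6 cm, so the 9 follow-up shots land
# on essentially the same object at essentially the same range.
```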

Figure 1. Packing a dense 3×3 grid around a detect allows the collection of more useful data and greatly speeds up classification. On the left, a single detect has been registered on a vehicle. Rather than waiting for the next frame to resample this vehicle (the traditional mode in LiDAR), we instead quickly form a dedicated dense ROI, as indicated in “Scan 2” on the right. This is done almost immediately after the initial single detect, and before the next scan completes.

To review: “Scan 2” takes at most a few milliseconds to both detect and classify an object on the road ahead that is kinematically a safety hazard, whereas the conventional LiDAR of “Scan 1” takes a full second just to detect it. This method can thus shave roughly three orders of magnitude off the timeline to classify.

Note that it is not required to allocate dedicated and persistent shots for every target. Once we have determined that a target is valid, and is potentially encroaching on our planned trajectory, we can allocate more shots to it as required. Alternatively, and more commonly, if we determine it is not a threat, we can simply maintain track on it with a few shots per scan.

To summarize, AEye’s agile system can enable classification in far less time than conventional LiDAR systems would require to merely register a detection. This brings to light the difference between detection range and basic object classification range.

The AEye Advantage: LiDAR sensors embedded with AI for perception are very different from those that passively collect data. AEye’s software architecture is designed to be agile enough that perception-engineering teams can use its dynamic object revisit capabilities to create LiDAR systems markedly faster than conventional ones. The AEye platform accelerates revisit rate through intelligent shot scheduling, including the capability to interrogate a target position or object multiple times before the traditional frame is completed.

Assuming that the purpose of metrics like detection range is to accurately score how LiDAR systems contribute to autonomous vehicle safety, evaluators should also consider how long it takes these systems to identify hazards. Thus, classification range is a more meaningful metric.

Conventional Metric #2: Velocity

Understanding the speed and direction (velocity) of objects and people in the environment is fundamentally critical to maneuvering any autonomous vehicle safely. However, due to the way conventional LiDAR systems interrogate scenes, it’s also one of their most problematic functions. Currently, some LiDAR systems, specifically frequency modulated continuous wave (FMCW) LiDAR, claim to calculate velocity simultaneously with range.

Questions of automotive environmental qualification aside, in order to calculate velocity with FMCW, an object must move some distance while under continuous observation, nominally for about 1 millisecond and at a minimum several microseconds. This matters for two reasons: 1) a LiDAR scanning system that “stares” at an object will suffer from sluggish frame rates and reduced scan agility (in contrast, an AEye scanner stares for only a few nanoseconds, more than two orders of magnitude less time), and 2) FMCW cannot detect lateral velocity at all. The latter is a critical flaw because transverse-velocity situations are considerably more common, and potentially more fatal, than radial ones.
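
The geometry behind the second point can be seen in a short sketch: a radial-only (Doppler-style) measurement is the projection of the true velocity onto the sensor-to-target line of sight, so a purely crossing target yields zero radial speed. The positions and speeds below are illustrative.

```python
import math

# Radial speed = projection of the velocity vector onto the line of sight.

def radial_speed(pos_xy, vel_xy):
    """Velocity component along the line of sight (positive = receding)."""
    r = math.hypot(pos_xy[0], pos_xy[1])
    return (pos_xy[0] * vel_xy[0] + pos_xy[1] * vel_xy[1]) / r

# A vehicle 30 m dead ahead, crossing laterally at 20 m/s (roughly 45 mph):
print(radial_speed((30.0, 0.0), (0.0, 20.0)))    # 0.0   -> invisible to a radial-only measurement
# The same vehicle heading straight at the sensor at 20 m/s:
print(radial_speed((30.0, 0.0), (-20.0, 0.0)))   # -20.0 -> fully visible radially
```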

Furthermore, FMCW and traditional Time-of-flight (TOF) or direct-detect systems are unable to prioritize certain objects over others in terms of collection time. In other words: every object in a scene is treated equally. Thus, the time it takes to assess velocity is constant, regardless of whether some objects pose a higher safety risk than others. In dense traffic, only one percent of the occupancy grid may contain relevant detections.

Direct-detect LiDARs, in contrast to FMCW, require two consecutive measurements to compute velocity and three to compute acceleration. Traditionally, this means 2-3 frames for relevant tracking information. FMCW can do this in one frame, but only for radial velocity; for lateral/transverse velocity, it must also use position updates that require multiple frames. AEye’s direct-detect LiDARs, however, have the advantage of being able to compute both radial and lateral components within a single frame. The two or three measurements require that we resample the same scene areas within microseconds to a few milliseconds, a capability we uniquely have.
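
As a minimal finite-difference sketch of the two- and three-measurement idea described above (assuming millisecond-scale revisits of the same scene area; the numbers are illustrative, not measured data):

```python
# Two positions give a full velocity vector (radial and lateral); three give
# acceleration. Positions in meters (x, y); dt is the revisit interval.

def velocity(p1, p2, dt):
    return tuple((b - a) / dt for a, b in zip(p1, p2))

def acceleration(p1, p2, p3, dt):
    v1, v2 = velocity(p1, p2, dt), velocity(p2, p3, dt)
    return tuple((b - a) / dt for a, b in zip(v1, v2))

dt = 1e-3   # assumed 1 ms intra-frame revisit interval
p1, p2, p3 = (30.0, 0.00), (30.0, 0.02), (30.0, 0.04)   # target crossing laterally
print(velocity(p1, p2, dt))          # ~(0.0, 20.0) m/s -> lateral motion resolved
print(acceleration(p1, p2, p3, dt))  # ~(0.0, 0.0) m/s^2 -> constant velocity
```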

As a metric, velocity simply indicates that a LiDAR system has an ability to measure an object or person and track its speed. It says nothing about how long it takes the domain controller to determine velocity. Therefore, time to true velocity, for all object trajectories in a frame, is a much more useful metric because it better accounts for system safety. The faster a system can calculate true velocity (radial and lateral), the faster it can react to hazards.

AEye’s Metric: Time to True Velocity

The time between the first measurement of an object and the second is critical. When there is less time between the two measurements, processing times for advanced algorithms, which must correlate returns across multiple moving objects in a scene, can be kept low. At high speeds, vehicles move far enough between measurements that it becomes difficult for LiDAR systems to track which returns belong to which vehicles. Even the best correlation algorithms can be confused when many objects are in the scene and the time between measurements is long. At high speeds, too long a pause can be the difference between detecting a hazard in time and loss of life.
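
One way to see the association problem is to bound how far a target can travel between looks. The sketch below assumes a 40 m/s (roughly 144 km/h) maximum relative speed; both the bound and the gap values are illustrative.

```python
# Worst-case distance a target may move between consecutive measurements,
# which sets the search gate a tracker must consider when associating returns.

def gate_radius_m(max_speed_mps, gap_s):
    return max_speed_mps * gap_s

for gap_s in (1e-3, 0.1, 0.5):   # 1 ms ROI revisit vs 10 Hz frame vs a laggy pipeline
    print(f"{gap_s * 1000:6.1f} ms gap -> search gate ~{gate_radius_m(40, gap_s):.2f} m")
# ~0.04 m at 1 ms keeps returns unambiguous; ~20 m at 0.5 s can span several
# vehicles in dense traffic, confusing even good correlation algorithms.
```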

And, as mentioned earlier, determining velocity is not equally essential for every object within an occupancy grid. In a highway scenario, it’s critical to determine the speed of surrounding and approaching vehicles. On city streets, pedestrians and bicycles moving lateral to the vehicle are important, while stationary objects and surroundings need only be tracked.

Taking these considerations into account, a LiDAR system must have several elements in place to minimize time to true velocity. First, it needs to prioritize objects of interest in its surroundings, avoiding oversampling of extraneous data (like parked cars) so that maximum compute resources remain available. Second, it should collect 3×3 pixel grids for efficient classification of moving objects and people, directing resources where they are needed on a frame-by-frame basis. Third, and in order to accomplish these goals effectively, there must be intelligence at the sensor level to interpret data at the edge of the network, amplifying the system’s speed.

Allocating shots to extract velocity and acceleration only on important detections, rather than allocating repeat shots everywhere in the frame, vastly reduces the number of wasted shots per frame. This can only be achieved with a LiDAR system that minimizes time to true velocity by intelligently sensing and directing resources where they are needed. In dense traffic, where only one percent of the scene contains detections, generating velocity only requires a second detection on that one percent of the occupancy grid, effectively reducing the number of required follow-up shots from 100 percent of the frame to one percent. Measuring time to true velocity rewards this speed and shot saliency. Ultimately, this makes autonomous vehicles safer because ambiguity is reduced and downstream processing resources are freed.
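
A back-of-the-envelope shot budget under the one-percent assumption above; the 100,000 shots-per-frame figure is illustrative, not a product specification.

```python
# Follow-up shot budget for velocity in the dense-traffic case described above.

frame_shots = 100_000        # illustrative first-pass shots covering the full occupancy grid
occupied_fraction = 0.01     # ~1 percent of the grid holds relevant detections

revisit_everywhere = frame_shots                          # brute-force second pass
revisit_roi_only = int(frame_shots * occupied_fraction)   # second look on detections only

print(revisit_everywhere, revisit_roi_only)               # 100000 vs 1000 follow-up shots
print(f"{revisit_everywhere // revisit_roi_only}x fewer follow-up shots spent on velocity")
```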

Real-World Applications: Let’s say an autonomous vehicle is approaching an intersection with a four-way stop. Suddenly, the system detects an off-schedule object entering the intersection transverse to the vehicle’s path. Is the object a car moving at 45 mph or a person on a bicycle going 5 mph? Will it be out of the intersection before the autonomous vehicle arrives, or should the brakes be engaged immediately? Conventional LiDAR systems will have difficulty recognizing the threat or reacting in time.

In the same scenario (Figure 2), a vehicle equipped with iDAR will be able to dynamically change the sampling density for Special Regions of Interest, focusing the sensors’ attention on the detected object. Scanning this object at a much higher rate than the rest of the scene, the system can gather all useful data, including critical information like lateral velocity. This design effectively mimics how the human visual cortex conceptually focuses on and evaluates the environment around the vehicle, except that it can focus on every object of interest with equal intensity within an entire scene.

Figure 2. The figure above overlays images of a biker and a car taken at two different points in time: Frame 1 at Time 1 and Frame 2 at Time 2. The blue dots show the LiDAR returns from the car and biker in Frame 1, and the pink dots show the returns from these objects at the later time, Frame 2.

The AEye Advantage: AEye has patent applications covering its intelligent iDAR technology, which can estimate intra-frame motion for objects within the Field-of-View using a tight cluster of pulses. Our system is architected to revisit areas within microseconds to milliseconds. By comparison, conventional 10 Hz LiDAR systems need 200 milliseconds to get a pair of shots on the same point; advanced fixed-frame LiDARs can achieve 20-30 Hz, but even that is far too slow to avoid collisions in fast pop-up target scenarios. We lay out here an architecture and framework for achieving actionable intelligence at kHz rates, a qualitatively faster speed. Thus, iDAR delivers performance that is more than two orders of magnitude faster than typical systems, while also providing better-quality information about where an object exists in three-dimensional space and how fast it is traveling.

iDAR sensors can achieve a scan rate in excess of 100 Hz (3x human vision) with a detection range of one kilometer (5x current LiDAR sensors). This ensures priority-warning data can be sent at low latency into the motion-planning loop of an autonomous vehicle’s path planning system.

Conclusion

It’s not enough to measure how far autonomous vehicles can detect objects ahead or how quickly they can determine an object’s radial velocity. Many factors affect a domain controller’s ability to use the data collected by sensors and apply algorithms for safe and efficient path planning. Ultimately, it’s the speed with which a vehicle can make decisions that matters most when it comes to operating at the highest degree of safety. To this point, it’s more important to quantify the range at which a vehicle controller can classify objects with a particular sensor (or, alternatively, the latency after initial detection) than the range at which it can simply detect them. Similarly, velocity is a less salient metric than time to true velocity: if calculating velocity takes too long, the hazard will arrive before the vehicle can decide how to react to it.

AEye’s iDAR system enables a more advanced approach to artificial perception for autonomous vehicles. iDAR is able to target objects of interest by scheduling multiple shots in quick succession, meaning it can both classify an object and determine its velocity in the time it takes traditional LiDAR sensors merely to perceive it. This makes it two to three orders of magnitude faster than conventional systems.

As perception technology evolves, it’s imperative that metrics used to measure its capabilities are continually evaluated. With safety of paramount importance, these metrics should not only indicate what LiDAR systems are capable of achieving, but also how those capabilities bring vehicles closer to optimal safety conditions in real-world driving scenarios. Throughout this series of white papers, AEye will continue to propose new metrics with the goal of proffering enlightened methods of considering what makes a LiDAR system effective.
