Right now, a minivan with no one behind the steering wheel is driving through a suburb of Phoenix, Arizona. And while that may seem alarming, the company that built the “brain” powering the car’s autonomy wants to assure you that it’s totally safe. Waymo, the self-driving unit of Alphabet, is the only company in the world to have fully driverless vehicles on public roads today. That was made possible by a sophisticated set of neural networks powered by machine learning about which very little is known — until now.
For the first time, Waymo is lifting the curtain on what is arguably the most important (and most difficult-to-understand) piece of its technology stack. The company, which is ahead in the self-driving car race by most metrics, confidently asserts that its cars have the most advanced brains on the road today. That’s thanks to a head start in AI investment, some strategic acquisitions by sister company Google, and a close working relationship with the tech giant’s in-house team of AI researchers.
Anyone can buy a bunch of cameras and LIDAR sensors, slap them on a car, and call it autonomous. But training a self-driving car to behave like a human driver, or, more importantly, to drive better than a human, is on the bleeding edge of artificial intelligence research. Waymo’s engineers are modeling not only how cars recognize objects in the road, for example, but also how human behavior affects how cars should behave. And they’re using deep learning to interpret, predict, and respond to data accrued from the company’s 6 million miles driven on public roads and 5 billion driven in simulation.
Anca Dragan, one of Waymo’s newest employees, is at the forefront of this project. She just joined the company in January after running the InterACT Lab at the University of California Berkeley, which focuses on human-robot interactions. (A photo on the Berkeley website features Dragan smiling broadly while a robot arm pours her a steaming cup of coffee.) Her role is to ensure our interactions with Waymo’s self-driving cars — as pedestrians, as passengers, as fellow drivers — are wholly positive. Or to put it another way: she’s our backstop against the inevitable robot revolution.
Dragan has to strike a balance. While we don’t want robot overlords, neither do we want milquetoast robot drivers. For instance, if you’re barreling down a busy highway at 65 mph and you want to merge into the left lane, you may just nudge your way in until the other drivers eventually make space for you. A self-driving car that’s been trained to follow the rules of the road may struggle to do that. A video recently appeared on Twitter showing one of Waymo’s minivans trying to merge onto a busy highway and pretty much failing at it.
“How can we make it adapt to the drivers that it’s sharing the road with?” Dragan says. “How do you tailor it to be more comfortable or drive more naturally? Those are the subtle improvements that if you want those to work, you really need a system that fricking works.”
For an innovation that’s supposed to save us from traffic fatalities, it’s been an extremely discouraging few months. In March, a 49-year-old woman was struck and killed by a self-driving Uber vehicle while crossing the street in Tempe, Arizona. A few weeks later, the owner of a Tesla Model X died in a gruesome crash while using Autopilot, the automaker’s semi-autonomous driver assist system. And just last week, a self-driving Waymo minivan was T-boned by a Honda sedan that had swerved into oncoming traffic.
Meanwhile, the public is growing increasingly skeptical. Regulators are starting to rethink the free pass they were considering giving to companies to build and test fully driverless vehicles. In the midst of all this uncertainty, Waymo invited me out to its headquarters in Mountain View, California, for a series of in-depth interviews with the company’s top minds in artificial intelligence.
Waymo is housed within X, Google’s high-risk research and development laboratory, which is located a few miles from the main Googleplex campus. (In 2015, when Google restructured itself into a conglomerate called Alphabet, X dropped the Google from its name.) A year later, Google’s self-driving car project “graduated” and became an independent company called Waymo. The self-driving team is still housed in the mother ship, though, alongside the employees working on delivery drones and internet balloons.
The building, a former shopping mall, is Bay Area bland. The only thing to distinguish it is the pair of self-driving Chrysler Pacifica minivans tooling around in the parking lot that occasionally pull over so employees can take selfies in front of them. In Googleland, the celebrities are the cars.
Waymo already has a huge lead over its competitors in the field of autonomous driving. It has driven the most miles — 6 million on public roads, and 5 billion in simulation — and has collected vast stores of valuable data in the process. It has partnerships with two major automakers, Fiat Chrysler and Jaguar Land Rover, with several more in the pipeline. Its test vehicles are on the road in Texas, California, Michigan, Arizona, Washington, and Georgia. And it plans to launch a fully driverless commercial taxi service in Arizona later this year.
Now, the company wants its advantages in the still-growing field of AI to be more widely known. Waymo CEO John Krafcik gave a presentation at Google’s annual I/O developer conference this week. And the message was clear: our cars can see farther, perceive better, and make snap decisions faster than anyone else’s.
“It’s a really hard problem if you are working on a fully self-driving vehicle… because of capability requirements and accuracy requirements,” Dmitri Dolgov, Waymo’s chief technology officer and vice president of engineering, tells me. “And experience really matters.”
Deep learning, which is a type of machine learning that uses lots of layers in a neural network to analyze data at different levels of abstraction, is the perfect tool for improving the perception and behavior of self-driving cars, Dolgov says. “And we started pretty early on it … just as the revolution was happening right here, next door.”
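To make “layers” concrete, here is a minimal sketch in Python (a toy, not Waymo’s code) of how the same input flows through a shallow network versus a deep one, with each layer re-describing the output of the one before it:

```python
import numpy as np

def make_net(layer_sizes, rng):
    """Randomly initialize one weight matrix per layer."""
    return [rng.standard_normal((m, n)) * 0.1
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(net, x):
    """Pass an input through every layer, applying a ReLU nonlinearity."""
    for w in net[:-1]:
        x = np.maximum(0.0, x @ w)   # each layer re-represents the input
    return x @ net[-1]               # final layer produces raw scores

rng = np.random.default_rng(0)
shallow = make_net([64, 32, 10], rng)            # two layers of weights
deep = make_net([64] + [128] * 12 + [10], rng)   # thirteen layers of weights
x = rng.standard_normal(64)
print(forward(shallow, x).shape, forward(deep, x).shape)
```

The payoff of going deep is that each extra layer can build more abstract features on top of the simpler ones below it.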
AI specialists from the Google Brain team regularly collaborate with Dolgov and his fellow engineers at Waymo on methods to improve the accuracy of its self-driving cars. Lately, they’ve been working together on some of the buzzier elements of AI research, like “automated machine learning,” in which neural nets are used to train other neural nets. Waymo may be its own company, but when it comes to projecting an aura of invulnerability, it helps to have your older and much tougher brother at your back.
Waymo’s sudden interest in burnishing its AI credentials is tied to its high-stakes effort to deploy vehicles that don’t require someone in the driver’s seat. To date, Waymo is the only company to take on this risk. The rest of the industry is rushing to catch up, buying up tiny startups in an effort to jump-start their own autonomy efforts. Moreover, key members of Google’s self-driving team have left to hang out their own shingles, lured by big possibilities and lots of money, and leaving the tech giant to wrestle with news stories about “attrition” and “brain drain.”
Former members of Google’s self-driving team and outside experts agree that Waymo appears to have a major head start in the field, but they note that its competitors are likely to catch up eventually. After all, Waymo doesn’t have a monopoly on machines with brains.
“As strong as Google is,” says Dave Ferguson, a former lead engineer on Google’s self-driving team who has since left to start his own company, “the field is stronger.”
This wasn’t always the case. Back in the early 2000s, the field was fairly weak.
Neural networks, a type of machine learning where programmers build models that sift through vast stores of data and look for patterns, weren’t hot yet. A big shift was going from neural nets that were quite shallow (two or three layers) to deep nets (double-digit layers). While the concept dates back to the 1950s — at the birth of AI research — most computers weren’t powerful enough to process all the data needed. All that changed with ImageNet, starting in 2009.
ImageNet started out as a poster from Princeton University researchers, displayed at a 2009 conference on computer vision and pattern recognition in Florida. (Posters are a typical way of sharing information at these types of machine learning conferences.) From there, it grew into an image dataset, then a competition to see who could create an algorithm that could identify the most images with the lowest error rate. The dataset was “trimmed” from around 10,000 categories to just 1,000 image categories, or “classes,” including plants, buildings, and 90 of the 120 dog breeds. Around 2011, the error rate was about 25 percent, meaning one in four images were being identified incorrectly by the teams’ algorithms.
Help came from an unexpected place: powerful graphics processing units (GPUs) typically found in the video game world. “People started realizing that those devices could actually be used to do machine learning,” says Vincent Vanhoucke, a former voice researcher at Google who now serves as the company’s technical lead for AI. “And they were particularly well-suited to run neural networks.”
The biggest breakthrough came in 2012, when AI researcher Geoffrey Hinton and his two graduate students, Ilya Sutskever and Alex Krizhevsky, showed a new way to attack the problem: they submitted a deep convolutional neural network to the ImageNet Challenge that could recognize pictures of everyday objects. Their neural net embarrassed the competition, reducing the error rate on image recognition to 16 percent from the 25 percent that other methods produced.
“I believe that was the first time that a deep learning, neural net-based approach beat the pants off a more standard approach,” says Ferguson, the former Google engineer. “And since then, we’ve never looked back.”
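For a sense of what such a network looks like in code, here is a toy convolutional classifier in modern Keras. It is nowhere near the scale or exact design of the 2012 system (which predates these libraries), but it uses the same ingredients: stacked convolutions, pooling, and a final layer that scores each of 1,000 ImageNet classes.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),             # an RGB photo
    tf.keras.layers.Conv2D(32, 5, strides=2, activation="relu"),  # learn local filters
    tf.keras.layers.MaxPooling2D(),                          # shrink the feature map
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(128, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),                # collapse the map
    tf.keras.layers.Dense(1000, activation="softmax"),       # 1,000 class scores
])
model.summary()
```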
Krizhevsky is more circumspect about his role in the 2012 ImageNet Challenge. “I guess we were at the right place at the right time,” he tells me. He attributes their success to his hobby of programming GPUs to run code for the team’s neural net, enabling them to run experiments that would normally take months in just a matter of days. And Sutskever made the connection to apply the technique to the ImageNet competition, he says.
Hinton and his team’s success “triggered a snowball effect,” Vanhoucke says. “A lot of innovation came from that.” An immediate result was Google acquiring Hinton’s company DNNresearch, which included Sutskever and Krizhevsky, for an undisclosed sum. Hinton stayed in Toronto, and Sutskever and Krizhevsky moved to Mountain View. Krizhevsky joined Vanhoucke’s team at Google Brain. “And that’s when we started thinking about applying those things to Waymo,” Vanhoucke says.
Another Google researcher, Anelia Angelova, was the first to reach out to Krizhevsky about applying their work to Google’s car project. Neither officially worked on that team, but the opportunity was too good to ignore. They created an algorithm that could teach a computer to learn what a pedestrian looked like — by analyzing thousands of street photos — and identify the visual patterns that define a pedestrian. The method was so effective that Google began applying the technique to other parts of the project, including prediction and planning.
Problems emerged almost immediately. The new system was making too many errors, mislabeling cars, traffic signals, and pedestrians. It also wasn’t fast enough to run in real-time. So Vanhoucke and his team combed through the images, where they discovered most of the errors were mistakes made by human labelers. Google brought them in to provide a baseline, or “ground truth,” to measure the algorithm’s success rate — and they’d instead added mistakes. The problem with autonomous cars, it turned out, was still people.
After correcting for human error, Google still struggled to modify the system until it could recognize images instantly. Working closely with Google’s self-driving car team, the AI researchers decided to incorporate more traditional machine learning approaches, like decision trees and cascade classifiers, with the neural networks to achieve “the best of both worlds,” Vanhoucke recalls.
“It was a very, very exciting time for us to actually show that those techniques that have been used to find cat pictures and interesting things on the Web,” he says. “Now they were actually being used for improving safety in driverless cars.”
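The hybrid idea can be sketched as a two-stage cascade: a cheap classical model screens every candidate quickly, and only the survivors are passed to a slower, more accurate neural net. The sketch below is hypothetical; the features, labels, threshold, and stand-in “slow” model are all invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 8))          # fake features for 1,000 regions
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # fake "pedestrian" labels

# Stage 1: a shallow decision tree is nearly free to evaluate.
fast_stage = DecisionTreeClassifier(max_depth=3).fit(X, y)

def slow_neural_net(features):
    """Placeholder for the expensive model; scores one region at a time."""
    return float(features[0] + features[1] > 0.5)

# Only regions that pass the cheap screen reach the expensive model.
candidates = rng.standard_normal((100, 8))
survivors = candidates[fast_stage.predict(candidates) == 1]
detections = [f for f in survivors if slow_neural_net(f) > 0.5]
print(f"{len(survivors)} of 100 regions reached the slow stage")
```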
Krizhevsky eventually left Google several years later, saying he “lost interest” in the work. “I got depressed for a while,” he admits. His departure baffled his colleagues at Google, and he has since taken on a bit of a mythical status. (Ferguson called him an “AI whisperer.”) Today, Krizhevsky wonders whether these early successes will be enough to give Google an insurmountable lead in the field of autonomy. Other car and tech companies have already caught on to the importance of machine learning. And Waymo’s data may be too specific to extrapolate to a global scale.
“I think Tesla has the unique advantage of being able to collect data from a very wide variety of environments because there are Tesla owners with self-driving hardware all over the world,” he told me. “This is very important for machine learning algorithms to generalize. So I would guess that at least from the data side, if not the algorithmic side, Tesla might be ahead.”
AI and machine learning are essential to self-driving cars. But some of Waymo’s competitors — which include former members of Google’s self-driving team — wonder how much longer the company’s advantages will last.
Sterling Anderson is the ex-director of Autopilot at Tesla and co-founder of Aurora Innovation, which he started with Chris Urmson, the former head of Google’s self-driving car program. He says that a natural consequence of improvements in AI is that big head starts like Waymo’s are “less significant than they have been.” In other words, everyone working on self-driving cars in 2018 is already using deep learning and neural nets from the outset. The shine is off. And like an old piece of fruit, a lot of that data from the early days has grown mushy and inedible. A mile driven in 2010 is not a mile driven in 2018.
“Data gets left on the floor after a number of years,” Anderson says. “It becomes useful for learning and becomes useful for evolving the architecture and evolving the approach, but at some point, claiming that I’ve got X million miles or X billion miles, or whatever it is, becomes less significant.”
Waymo’s engineers agree. “For the sake of machine learning specifically there’s such a thing as a point of diminishing return,” says Sacha Arnoud, head of the company’s machine learning and perception division. “Driving 10X more will not necessarily give you much greater datasets because what matters is the uniqueness of the examples you find.”
In other words, each additional mile that Waymo accrues needs to be interesting for it to be relevant to the process of training the company’s neural networks. When the cars encounter edge cases or other unique scenarios, like jaywalkers or parallel-parking cars, those are filtered through Waymo’s simulator to be transformed into thousands of iterations that can be used for further training.
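In spirit, that simulator step looks something like the sketch below: take one logged encounter, treat a few of its parameters as tunable, and spin it into a thousand perturbed variants. Every name and number here is hypothetical; Waymo has not published its scenario format.

```python
import random
from dataclasses import dataclass, replace

@dataclass
class JaywalkerScenario:
    """Hypothetical logged edge case, reduced to a few tunable parameters."""
    pedestrian_speed_mps: float
    crossing_angle_deg: float
    car_speed_mps: float

def fuzz(seed, n, rng):
    """Spin one real-world encounter into n perturbed training variants."""
    return [replace(seed,
                    pedestrian_speed_mps=seed.pedestrian_speed_mps * rng.uniform(0.5, 1.5),
                    crossing_angle_deg=seed.crossing_angle_deg + rng.uniform(-30, 30),
                    car_speed_mps=seed.car_speed_mps * rng.uniform(0.8, 1.2))
            for _ in range(n)]

logged = JaywalkerScenario(1.4, 90.0, 11.0)          # one real encounter
variants = fuzz(logged, 1000, random.Random(42))     # a thousand for training
print(variants[0])
```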
Robots can also be tricked. Adversarial images, or pictures engineered to fool machine vision software, can be used to undermine or even crash self-driving cars. Stickers can be applied to a stop sign to confuse a machine vision system into thinking it’s a 45 mph sign.
A neural network trained by Google to identify everyday objects was recently tricked into thinking a 3D-printed turtle was actually a rifle. Waymo’s engineers say they are building redundancy into their system to address these possibilities. Add this to the long list of concerns surrounding self-driving cars, which includes hacking, ransomware, and privacy breaches.
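The underlying trick is easy to demonstrate on a toy model. The sketch below applies the fast gradient sign method (a standard adversarial technique from the research literature, not anything attributed to Waymo) to a linear classifier: a tiny, structured nudge to every input dimension flips the prediction even though the input barely changes.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(100)               # toy model: label = sign(w . x)
x = 0.02 * w                               # an input classified as +1

epsilon = 0.05                             # max change per dimension
x_adv = x - epsilon * np.sign(w)           # nudge each dimension to lower w . x

print(np.sign(w @ x), np.sign(w @ x_adv))  # 1.0, then -1.0: prediction flipped
```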
“Tell me the difference between a cat and a dog.”
Dolgov is sitting in one of X’s conference rooms, whiteboard marker in hand, MacBook Pro splayed before him, asking me to describe to him the difference between Garfield and Odie.
Before I can stammer out a reply, Dolgov keeps going: “If I give you a picture and I ask you is it a cat or a dog, you will know very quickly, right? But if I ask you to describe to me how you came to that conclusion, it would not be trivial. You think it has something to do with the size of the thing, the number of legs is the same, the number of tails is the same, usually same number of ears. But it’s not obvious.”
This type of question is well-suited to deep learning algorithms, Dolgov says. It’s one thing to come up with a bunch of basic rules and parameters, like red means stop and green means go, and teach a computer to distinguish between different types of traffic signals. But a pedestrian is harder to reduce to rules: teaching a computer to pick one out of an ocean of sensor data turns out to be easier than describing the difference by hand, let alone encoding it.
Waymo uses an automated process and human labelers to train its neural nets. Once trained, these giant neural nets also need to be pruned and shrunk so they can be deployed in the real world in Waymo’s vehicles. This process, akin to compressing a digital image, is key to building the infrastructure to scale to a global system.
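Two standard compression moves, sketched here on a random weight matrix, give a feel for the image-compression analogy: prune weights close to zero, then store the survivors at 8-bit precision. This is a generic illustration, not Waymo’s pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

# Prune: weights near zero contribute little, so zero them out.
pruned = np.where(np.abs(weights) < 0.1, 0.0, weights)

# Quantize: map the remaining floats onto 8-bit integers.
scale = np.abs(pruned).max() / 127.0
quantized = np.round(pruned / scale).astype(np.int8)

kept = np.count_nonzero(pruned) / pruned.size
print(f"kept {kept:.0%} of weights, {weights.nbytes // quantized.nbytes}x smaller")
```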
If you look at images captured by the cars’ cameras and place those alongside the same scene built from the vehicle’s laser sensor data, you start to see the enormity of the problem that Waymo is trying to address. If you’ve never seen a LIDAR rendering, the best way to describe it is Google Street View redrawn as a psychedelic blacklight poster.
These images provide a bird’s-eye view of the self-driving car and what it “sees” around it. Pedestrians are depicted as yellow rectangles, other vehicles are purple boxes, and so forth. Waymo has categories for “dog cat” and “bird squirrel,” among others, that it uses for animals. (Turns out, the differences between a dog and a cat aren’t entirely relevant to autonomous vehicles.) Beyond that, Waymo is training its algorithms to perceive atypical actors in the environment: a construction worker waist-deep in a manhole, someone in a horse costume, a man standing on the corner spinning an arrow-shaped sign.
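One plausible shape for those boxes, purely as illustration (Waymo has not published its schema), is a class label plus a position, footprint, and heading relative to the car:

```python
from dataclasses import dataclass
from enum import Enum

class ActorClass(Enum):
    VEHICLE = "vehicle"
    PEDESTRIAN = "pedestrian"
    DOG_CAT = "dog_cat"              # coarse animal buckets, as described above
    BIRD_SQUIRREL = "bird_squirrel"

@dataclass
class TrackedActor:
    """Hypothetical top-down box of the kind drawn in the LIDAR view."""
    actor_class: ActorClass
    x_m: float                       # position relative to the car, in meters
    y_m: float
    length_m: float
    width_m: float
    heading_deg: float

nearby = [
    TrackedActor(ActorClass.PEDESTRIAN, 12.0, -2.5, 0.6, 0.6, 270.0),
    TrackedActor(ActorClass.VEHICLE, 30.0, 3.5, 4.8, 1.9, 0.0),
]
print([a.actor_class.value for a in nearby])
```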
To take a human driver out of the equation, the car needs to be adaptive to the weirder elements of a typical drive. “Rare events really, really matter,” Dolgov tells me, “especially if you are talking about removing a driver.”
Programming the car to respond to someone crossing the street in broad daylight is one thing, but getting it to perceive and react to a jaywalker is entirely different. What if that jaywalker stops at a median? Waymo’s self-driving cars will react cautiously, since pedestrians often walk up to a median and wait. What if there is no median? The car recognizes this as unusual behavior and slows down enough to allow the pedestrian to cross. Waymo has built models using machine learning to recognize and react to both normal and unusual behavior.
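Waymo learns this behavior from data rather than hand-coding it, but a hand-written toy makes the two cases concrete. The speeds and structure below are invented for illustration:

```python
def speed_cap_mps(pedestrian_near_road, median_between, current_cap=13.0):
    """Toy illustration (not Waymo's logic) of the two cases described:
    a pedestrian waiting at a median usually stays put, so the car only
    eases off; with no median, the car slows enough to let them cross."""
    if not pedestrian_near_road:
        return current_cap
    if median_between:
        return min(current_cap, 10.0)   # proceed with mild caution
    return min(current_cap, 4.0)        # unusual: yield to the jaywalker

print(speed_cap_mps(True, True), speed_cap_mps(True, False))
```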
Neural nets need vast amounts of data to train on, and Waymo has amassed “hundreds of millions” of vehicle labels alone. To put that in context, Arnoud estimates that a person labeling one car per second would take 20 years to reach 100 million. Even operating every hour of every day of every week, and hitting 10 labels a second, it takes Waymo’s machines four months to scroll through that entire dataset during training, Arnoud says.
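The four-month figure checks out as back-of-the-envelope arithmetic:

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

labels = 100_000_000            # the "hundred million" vehicle labels
machine_rate = 10               # labels scanned per second, around the clock

months = labels / machine_rate / SECONDS_PER_MONTH
print(f"{months:.1f} months")   # about 3.9, i.e. roughly four months
```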
It takes more than a good algorithm to break free of the geofenced test sites of the Phoenix suburbs. If Waymo wants its driverless cars to be smart enough to operate in any environment and under any conditions — defined as Level 5 autonomy — it needs a powerful enough infrastructure to scale its self-driving system. Arnoud calls this the “industrialization” or “productionization” of AI.
As part of Alphabet, Waymo uses Google’s data centers to train its neural nets. Specifically, it trains them on high-powered cloud hardware called “tensor processing units” (TPUs), chips that underpin some of the company’s most ambitious and far-reaching technologies. Typically, this work is done on commercially available GPUs, often from Nvidia, but over the last few years Google has opted to build some of this hardware itself and optimize it for its own software. TPUs are “orders of magnitude” faster than CPUs, Arnoud says.
The future of AI at Waymo isn’t sentient vehicles (sorry, Knight Rider fans). It’s in cutting-edge research like automated machine learning, in which the process of building machine learning models is itself automated. “Essentially the idea that you have AI machine learning that’s creating other AI models that actually solve the problem you’re trying to solve,” Arnoud says.
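At its simplest, the idea reduces to a search loop. The sketch below uses plain random search with a faked scoring function; real systems train and evaluate actual models, and often use a neural net to steer the search itself.

```python
import random

rng = random.Random(0)

def evaluate(depth, width):
    """Placeholder for a real training run: pretend accuracy peaks
    at 8 layers of 256 units."""
    return 1.0 - abs(depth - 8) * 0.02 - abs(width - 256) / 5000

# "The controller": propose candidate architectures, keep the best scorer.
best = max(
    ((rng.randint(2, 20), rng.choice([64, 128, 256, 512])) for _ in range(50)),
    key=lambda arch: evaluate(*arch),
)
print(f"best architecture found: {best[0]} layers x {best[1]} units")
```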
This becomes extremely useful for driving in areas with unclear lane markings. These days, the most challenging driving environments require self-driving cars to make guidance decisions without white lines, Botts’ Dots, or clear demarcations at the edge of the road. If Waymo can build machine learning models to train its neural nets to drive on streets with unclear markings, then Waymo’s self-driving cars can put the Phoenix suburbs in their rear view and eventually hit the open road.