Without a lifetime of experience to build on like humans have (and totally take for granted), robots that want to learn a new skill often have to start from scratch. Reinforcement learning lets robots learn new skills through trial and error, but it takes a lot of time, especially for end-to-end vision-based control policies, because the real world is a weirdly lit, friction-filled, obstacle-y mess that robots can’t understand without a frequently impractical amount of effort.
Roboticists at UC Berkeley have vastly sped up this process by doing the same kind of cheating that humans do: instead of starting from scratch, you start with some previous experience that helps get you going. By leveraging a “foundation model” that was pre-trained on robots driving themselves around, the researchers were able to get a small-scale robotic rally car to teach itself to race around indoor and outdoor tracks, matching human performance after just 20 minutes of practice.
That first pre-training stage happens at your leisure, by manually driving a robot (that isn’t necessarily the robot that will be doing the task that you care about) around different environments. The goal of doing this isn’t to teach the robot to drive fast around a course, but instead to teach it the basics of not running into stuff.
With that pre-trained “foundation model” in place, when you then move over to the little robotic rally car, it no longer has to start from scratch. Instead, you can plop it onto the course you want it to learn, drive it around once slowly to show it where you want it to go, and then let it go fully autonomous, training itself to drive faster and faster. With a low-resolution, front-facing camera and some basic state estimation, the robot attempts to reach the next checkpoint on the course as quickly as possible, leading to some interesting emergent behaviors:
The system learns the concept of a “racing line,” finding a smooth path through the lap and maximizing its speed through tight corners and chicanes. The robot learns to carry its speed into the apex, then brakes sharply to turn and accelerates out of the corner, to minimize the driving duration. With a low-friction surface, the policy learns to over-steer slightly when turning, drifting into the corner to achieve fast rotation without braking during the turn. In outdoor environments, the learned policy is also able to distinguish ground characteristics, preferring smooth, high-traction areas on and around concrete paths over areas with tall grass that impedes the robot’s motion.
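To make the "reach the next checkpoint as quickly as possible" objective a bit more concrete, here is a minimal Python sketch of one way such a reward could be written: the robot is rewarded for the component of its velocity that points at the next checkpoint. This is an illustrative assumption, not a description of the researchers' actual reward; the function name `checkpoint_reward`, the `reach_radius_m` threshold, and the looping checkpoint list are all hypothetical.

```python
import math


def checkpoint_reward(pos, vel, checkpoints, idx, reach_radius_m=1.0):
    """Return (reward, checkpoint_index) for one control step.

    Hypothetical sketch: reward is the speed (m/s) toward the current
    checkpoint. pos and vel are (x, y) tuples; checkpoints is a list of
    (x, y) points that is treated as a closed loop.
    """
    goal_x, goal_y = checkpoints[idx]

    # Assumed rule: once the robot is close enough, target the next checkpoint.
    if math.hypot(goal_x - pos[0], goal_y - pos[1]) < reach_radius_m:
        idx = (idx + 1) % len(checkpoints)
        goal_x, goal_y = checkpoints[idx]

    dx, dy = goal_x - pos[0], goal_y - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0, idx  # exactly on the checkpoint: no direction to reward

    # Project the velocity onto the unit vector pointing at the checkpoint.
    reward = (vel[0] * dx + vel[1] * dy) / dist
    return reward, idx
```

Under a reward shaped like this, driving faster toward the checkpoint pays more and driving away from it is penalized, which is at least consistent with the racing-line behavior described above.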
The other clever bit here is the reset feature, which is necessary for real-world training. When training in simulation, it’s super easy to reset a robot that fails, but outside of simulation, a failure can (by definition) end the training if the robot gets itself stuck somehow. That’s not a big deal if you want to spend all your time minding the robot while it learns, but if you have something better to do, the robot needs to be able to train autonomously from start to finish. In this case, if the robot hasn’t moved at least 0.5 meters in the previous three seconds, it knows that it’s stuck, and it executes a simple recovery behavior: turning randomly, backing up, and then trying to drive forward again, which gets it unstuck eventually.
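The stuck check is simple enough to sketch in a few lines of Python. The 0.5-meter and three-second thresholds come from the description above; everything else is an assumption made for illustration, including the class and function names, the choice to measure straight-line displacement rather than path length, and the specific steering, throttle, and duration values in the recovery maneuver.

```python
import math
import random
from collections import deque


class StuckDetector:
    """Flags the robot as stuck if it moved < min_distance_m over window_s."""

    def __init__(self, min_distance_m=0.5, window_s=3.0):
        self.min_distance_m = min_distance_m
        self.window_s = window_s
        self._history = deque()  # (timestamp_s, x_m, y_m) samples

    def update(self, t, x, y):
        """Record a position estimate and return True if the robot is stuck."""
        self._history.append((t, x, y))
        # Keep only the newest sample that is at least window_s old as the
        # reference point, and drop anything older than that.
        while len(self._history) >= 2 and t - self._history[1][0] >= self.window_s:
            self._history.popleft()
        t0, x0, y0 = self._history[0]
        if t - t0 < self.window_s:
            return False  # not enough history yet to decide
        # Assumption: "moved" means straight-line displacement.
        return math.hypot(x - x0, y - y0) < self.min_distance_m


def recovery_commands(rng):
    """Hypothetical recovery: reverse with random steering, then go forward.

    Returns a list of (steering, throttle, duration_s) tuples; the values
    are placeholders, not the researchers' settings.
    """
    steering = rng.uniform(-1.0, 1.0)
    return [(steering, -0.5, 1.0), (0.0, 0.5, 1.0)]
```

In a training loop, you would call `update()` with each new state estimate and, whenever it returns `True`, play back `recovery_commands(random.Random())` before handing control back to the learned policy.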
During indoor and outdoor experiments, the robot was able to learn aggressive driving comparable to a human expert after just 20 minutes of autonomous practice, which the researchers say “provides strong validation that deep reinforcement learning can indeed be a viable tool for learning real-world policies even from raw images, when combined with appropriate pre-training and implemented in the context of an autonomous training framework.” It’s going to take a lot more work to implement this sort of thing safely on a larger platform, but this little car is taking the first few laps in the right direction just as quickly as it possibly can.
FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing, by Kyle Stachowicz, Arjun Bhorkar, Dhruv Shah, Ilya Kostrikov, and Sergey Levine from UC Berkeley, is available on arXiv.