Waymo carried paying passengers more than 23,000 miles in California between September and November of last year, up from about 13,000 miles over a similar timeframe just six months earlier, according to disclosures to state regulators.
In some cases, data caps reflect the priorities of autonomous vehicle companies. With some negotiation allowed, Chatham’s team allots quarterly storage allowances to groups of engineers working on different tasks, such as developing AI to identify what’s around a vehicle (perception) or testing planned software updates against past rides (evaluation). Those teams decide what’s worth keeping, such as data on the actions of emergency vehicles, and an automated system filters out everything else. “That becomes a business decision,” Chatham says. “Is snow or rain data more important to the business?”
Snow has won out for now, because Waymo so far has only limited data from driving in it. “We’re keeping every piece,” Chatham says. Rain has gotten less interesting. “We’ve gotten better at rain, so we don’t need to go to infinity.” Being data-thrifty can sometimes prompt creativity or valuable discoveries, he says. Waymo learned at one point that its rain data needlessly included all the sensor readings its cars had collected while parked.
Across self-driving projects, data from busier, crazier times has the best chance of surviving. “Rare objects and unusual scenarios, such as obstacles in the roadway or cyclists with surfboards,” says Balajee Kannan, vice president of autonomy at driverless tech maker Motional, a joint venture between Hyundai and automotive supplier Aptiv.
The quickly growing Cruise has said that less than 1 percent of the data it generates from driving in San Francisco contains what its teams view as useful information, so it too doesn’t store all of it now. Its autonomous Chevy Bolt cars drove paying passengers over 13,000 miles in the city last fall, compared with 3,400 miles when it kicked off service during the summer. With its deployment growing, Cruise is working on improvements to its data storage systems that would make it easier and more affordable to expand service, though spokesperson Rachel Holm declines to share details.
Deletion isn’t the only solution. Moving data to “cold” storage, which at AWS costs as little as one-tenth of a cent per gigabyte per month, can also shed costs, but data stored that way can only be accessed slowly, limiting its usefulness.
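To put that price in perspective, here is a minimal back-of-the-envelope sketch in Python. The only figure taken from the reporting is the rate of one-tenth of a cent per gigabyte per month; the one-petabyte archive size and the function name are illustrative assumptions, and real bills would also include retrieval and request fees not modeled here.

```python
# Price quoted above: one-tenth of a cent per gigabyte per month.
COLD_PRICE_PER_GB_MONTH = 0.001  # US dollars

# Assumption for illustration: a decimal petabyte (1,000,000 GB).
GB_PER_PB = 1_000_000


def monthly_cold_cost(gigabytes: float) -> float:
    """Monthly storage charge at the quoted cold-tier rate (storage only)."""
    return gigabytes * COLD_PRICE_PER_GB_MONTH


# Hypothetical archive size, not a figure from any company in this story.
cost_per_pb = monthly_cold_cost(GB_PER_PB)
print(f"1 PB in cold storage: about ${cost_per_pb:,.0f} per month")
```

At that rate, each petabyte parked in the coldest tier runs to roughly $1,000 a month before any access charges.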
Aurora, which is testing driverless trucks on freeways in Texas, uses an automated system to sort the terabytes of data generated by driving about 50 loads per week for pilot customers across the state. Engineers flag crucial data, such as recent incidents involving dangerous road debris or aggressive drivers, to ensure it is saved in regular storage. Anything unprotected or unused is automatically put on a death watch, moving to successively colder storage every month until, after three months, a substantial amount starts getting deleted. Measurements calculated from the raw data are the only bits kept.
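A policy like the one described could be sketched roughly as follows. This is not Aurora’s code: the tier names, the 30-day step, and the `DriveLog` fields are hypothetical stand-ins chosen to mirror the monthly demotion and three-month cutoff described above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier names, ordered from most to least accessible.
TIERS = ["regular", "cool", "cold"]
DAYS_PER_STEP = 30       # assumed: one demotion per month of disuse
DELETE_AFTER_DAYS = 90   # assumed: raw data becomes deletable after three months


@dataclass
class DriveLog:
    name: str
    idle_days: int           # days since the log was last used
    protected: bool = False  # flagged by an engineer as crucial


def next_placement(log: DriveLog) -> Optional[str]:
    """Return the tier a log belongs in, or None if its raw data may be deleted."""
    if log.protected:
        return TIERS[0]  # flagged data stays in regular storage
    if log.idle_days >= DELETE_AFTER_DAYS:
        return None      # only derived measurements would be retained
    return TIERS[log.idle_days // DAYS_PER_STEP]


for log in [DriveLog("debris-incident", 200, protected=True),
            DriveLog("routine-haul", 45),
            DriveLog("old-haul", 120)]:
    print(log.name, "->", next_placement(log) or "eligible for deletion")
```

In this sketch a protected log never leaves the top tier, while everything else slides down one tier per month until it becomes a candidate for deletion.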
“It’s like trimming your fingernails,” says Tim Kelton, who runs Aurora’s infrastructure. “You have to do it every week. It’s not something you can ignore.” The company also ditches data from sessions when its technology is driving really well or running on outdated sensors, because there’s less to learn from. Overall, only about 15 percent of Aurora’s data are in its most accessible tier of storage.
Not everyone is at their limits just yet. TuSimple, another driverless trucking company, has collected, compressed, cataloged, and stored all the data from each of the tens of thousands of drives since its founding in 2015. But the company, which conducted its first driverless route in December 2021, is keeping an eye on its 50 petabytes of capacity, and moves most data to cold storage after four years, says Robert Rossi, its vice president of operations.
AI software that can extract valuable data from compressed files could eventually help companies keep more logs without breaking the data bank, says Weisong Shi, a computer scientist at the University of Delaware who has worked with automakers to cut data storage and transmission.
But he points out that if Waymo and its competitors finally manage to reach wide deployment, with large fleets of vehicles, they’ll have to junk a lot more data. “Once you go into mass production, cost will be a big deal,” Shi says. “We haven’t reached the point where we desperately need more storage, but this day will be coming soon.”