Machine learning systems have become increasingly popular in the world of scientific research, and it’s easy to understand why.
The algorithms are designed to examine complex, wide-ranging datasets for patterns that could predict future outcomes. They can save a great many person-hours, and many hope they'll even be able to find patterns that humans, using more traditional methods of data analysis, can't.
Impressive, yes. But machine learning models are so complex that it's notoriously difficult even for their creators to explain their outputs. They've even been known to cheat in order to arrive at a tidy solution.
Add that reality to the fact that many scientists now leveraging the tech aren’t experts in machine learning, and you have a recipe for scientific disaster. As Princeton professor Arvind Narayanan and his PhD student Sayash Kapoor explained to Wired, a surprising number of scientists using these systems may be making grave methodological errors — and if that trend continues, the ripple effects in academia could be pretty severe.
According to Wired, the duo became concerned when they came across a political science study that, using machine learning-produced data, claimed it could predict the next civil war with a staggering 90 percent accuracy. But when Narayanan and Kapoor took a closer look, they discovered that the paper was riddled with false outcomes — a result, the researchers say, of something called “data leakage.”
In short, data leakage occurs when a machine learning model gets access to information it shouldn't have. Most often, data that should be held back for testing bleeds into the data the model is trained on, so the model ends up being evaluated on examples it has effectively already seen, and its accuracy looks far better than it really is.
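To make that concrete, here's a minimal sketch of one classic form of leakage (it's an illustration built on scikit-learn and synthetic data, not an example from the civil war paper): selecting the "most predictive" features from the entire dataset, labels and all, before splitting it into training and test folds. Because the feature selector has already peeked at the held-out data, the model scores well above chance on what is actually pure noise.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Pure noise: 200 samples, 10,000 random features, random binary labels.
# No honest model should do much better than ~50 percent accuracy here.
X = rng.normal(size=(200, 10_000))
y = rng.integers(0, 2, size=200)

# Leaky version: pick the 20 "most predictive" features using ALL the data,
# labels included, before cross-validation. The selector has already seen
# the examples that will later be used for testing.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_acc = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Correct version: put feature selection inside the pipeline, so it is
# re-fit on each training fold only and never sees the held-out fold.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
clean_acc = cross_val_score(pipeline, X, y, cv=5).mean()

print(f"Accuracy with leakage:    {leaky_acc:.2f}")  # typically well above chance
print(f"Accuracy without leakage: {clean_acc:.2f}")  # typically around 0.5
```

In real studies the leak is rarely this blatant; it can be as subtle as duplicate records shared between the training and test sets, or a feature that quietly encodes the very outcome being predicted, which is part of why the errors Kapoor and Narayanan describe are so easy to miss.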
After discovering the data leakage in the civil war paper, the Princeton researchers started searching for similar machine learning mistakes in other published studies — and the results, which they published in their own yet-to-be-peer-reviewed paper, were striking. They found data leakage in a grand total of 329 papers across a number of fields, including medicine and social sciences.
“They were claiming near-perfect accuracy, but we found that in each of these cases, there was an error in the machine learning pipeline,” Kapoor told the magazine.
As they explain in the paper, the proliferation of machine learning is fueling what they're calling a "reproducibility crisis," which basically means that the results of a study can't be reproduced by follow-up research.
The claim raises the specter that a sequel could be looming to another serious replication crisis that’s shaken the scientific establishment over the past decade, in which researchers misused statistics to arrive at sweeping conclusions that amounted to nothing more than statistical noise in large datasets.
If it holds up to further scrutiny, it'd be an extremely concerning revelation. Dead spider robots aside, most research isn't done for no reason. The goal of most science is for it to eventually be applied to something, whether that means guiding some kind of immediate action or informing future study. A mistake anywhere in an information pipeline will frequently lead to follow-up errors down the road, and as it probably goes without saying, that could have some pretty devastating consequences.
According to Wired, Narayanan and Kapoor believe that the prevalence of machine learning mistakes in scientific research can be attributed to two things: the hype around the systems, and the lack of training provided to those who use them. The AI industry has been marketing machine-learning software that promises ever-increasing levels of ease and efficiency — and as Narayanan and Kapoor point out, that’s not necessarily a good thing.
“The idea that you can take a four-hour-long online course and then use machine learning in your scientific research has become so overblown,” Kapoor says. “People have not stopped to think about where things can potentially go wrong.”
Of course, scientists can make mistakes without AI. But it doesn't help that machine learning can feel difficult to question, especially when both ease and efficiency are part of the sales pitch; after all, it's just crunching numbers, right? As it stands, though, it sounds like researchers are likely making some serious errors not just with machine learning, but because of it.
That’s not to say that AI can’t be useful for scientific study. We’re sure that in many cases it has been, and it will probably continue to be. Clearly, though, researchers who use it need to be careful, and really ask themselves if they actually know what they’re doing. Because in the end, these aren’t machine errors — they’re human ones.
To echo every math teacher ever: maybe show your work next time.
READ MORE: Sloppy Use of Machine Learning Is Causing a ‘Reproducibility Crisis’ in Science [Wired]
More on machine learning: Ambitious Researchers Want to Use AI to Talk to All Animals