
Your pesky remote freelancers demanding more money as inflation soars? You could try replacing them with AI agents instead — but it probably won’t work out well.
New research highlighted by Wired shows how these AI models designed to automate tasks — if not entire jobs — turn out to be incredibly unproductive compared to the humans they’re replacing.
Conducted by researchers at the nonprofit Center for AI Safety (CAIS) and the massive data annotation firm Scale AI, whose army of freelancers performs much of the grunt work underpinning the AI industry, the tests involved giving six leading AI agents various simulated freelance tasks.
The outcome of those tests, detailed in a new paper, was damning. Not a single AI agent was able to perform more than 3 percent of the work, making just $1,810 out of a possible $143,991.
“I should hope this gives much more accurate impressions as to what’s going on with AI capabilities,” CAIS director Dan Hendrycks told Wired.
For the tests, the researchers developed their own benchmark called the Remote Labor Index, which uses a wide range of real-world remote projects to evaluate the bots’ ability to perform economically valuable work in industries ranging from game development to data analysis.
The top performer, they found, was an AI agent from the Chinese startup Manus with an automation rate of just 2.5 percent, meaning it was only able to complete 2.5 percent of the projects it was assigned at a level that would be acceptable as commissioned work in a real-world freelancing job, the researchers said.
Second place was a tie, at 2.1 percent, between Elon Musk’s Grok 4 and Anthropic’s Claude Sonnet 4.5, which the company claims is the “best coding model in the world” and the “strongest model for building complex agents.”
OpenAI’s newest GPT-5 model and its purported “PhD level” intelligence came next at 1.7 percent. CEO Sam Altman has claimed that GPT-5 is a “significant step along the path to AGI,” or artificial general intelligence, a hypothetical AI system that most define as exceeding human cognitive abilities in virtually all aspects. (OpenAI considers AGI to be “highly autonomous systems that outperform humans at most economically valuable work,” something that the RLI benchmark shows GPT-5 is nowhere close to doing.)
Ironically, OpenAI’s actual AI agent, with the exciting brand name of ChatGPT Agent, was the second-worst performer of the whole bunch, barely cracking 1.3 percent. But the absolute bottom-of-the-barrel choice turned out to be Google’s Gemini 2.5 Pro, with a dismal 0.8 percent showing.
Selling AI agents to employers has been the obsession of the AI industry as leading players like OpenAI struggle to capitalize on the popularity of their AI chatbots, many of which are free to use. But despite many CEOs eagerly culling their workforces and embracing AI, it remains to be seen if automation is able to actually increase productivity, let alone make up for the shortfall of human talent it’s replacing.
“We have debated AI and jobs for years, but most of it has been hypothetical or theoretical,” Bing Liu, director of research at Scale AI, told Wired.
Anecdotally, many bosses who replaced their employees with AI have been forced to rehire them after discovering the AI tools weren’t up to snuff, and a slew of research is painting a similarly damning picture. One MIT study found that 95 percent of companies that piloted AI initiatives saw no meaningful growth in revenue. Another demonstrated that introducing AI tools into employee workflows resulted in a deluge of low-quality “workslop,” which not only bogged everything down because it needed to be heavily revised for errors, but also created tension between coworkers who resented being forced to correct such lazy work.
Hendrycks pointed out some of the flaws still plaguing AI agents, despite the field’s rapid advances. “They don’t have long-term memory storage and can’t do continual learning from experiences. They can’t pick up skills on the job like humans,” he told Wired.
So far, though, these glaring flaws haven’t seemed to slow down the freight train of AI-related firings. If anything, it’s still picking up steam.