
ZDNET’s key takeaways
- The botched rollout of GPT-5 doesn’t suggest superintelligence.
- GPT-5 represents incremental technical progress.
- Scholars are debunking AI hype with detailed analyses.
Nearly a year ago, OpenAI CEO Sam Altman declared artificial “superintelligence” was “just around the corner.”
Also: Sam Altman says the Singularity is imminent – here’s why
Then, last June, he trumpeted the arrival of superintelligence, writing in a blog post: “We have recently built systems that are smarter than people in many ways.” But this rhetoric clashes with what is rapidly shaping up to be a rather botched debut of the much-anticipated GPT-5 model from Altman’s AI company, OpenAI.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
A very underwhelming rollout
In the days since its release, the new AI model has drawn a fair amount of negative feedback and negative press. The sour reception is surprising given that, just the week before, the company's first open-source models in six years had been widely acclaimed.
“OpenAI’s GPT-5 model was meant to be a world-changing upgrade to its wildly popular and precocious chatbot,” writes Wired’s Will Knight. “But for some users, last Thursday’s release felt more like a wrenching downgrade, with the new ChatGPT presenting a diluted personality and making surprisingly dumb mistakes.”
Also: OpenAI’s GPT-5 is now free for all: How to access and everything else we know
There were simple technical snafus, such as a broken mechanism for switching between GPT-5 and GPT-4o, and users complained of "sluggish responses, hallucinations, and surprising errors."
As Knight points out, hype has been building for GPT-5 since the impressive debut of its predecessor, GPT-4, in March 2023. That year, Altman emphasized the massive technical challenge ahead, casting GPT-5 as a kind of moon shot.
“The number of things we’ve gotta figure out before we make a model that we’ll call GPT-5 is still a lot,” said Altman in a press conference that year following the company’s first-ever developer conference, which took place in San Francisco.
Progress, but no moon shot
What has been delivered appears to be an improvement, but nothing like a moon shot.
Also: OpenAI CEO sees uphill struggle to GPT-5, potential for new kind of consumer hardware
On one of the most respected benchmark tests of artificial intelligence, the "Abstraction and Reasoning Corpus for Artificial General Intelligence," or ARC-AGI-2, GPT-5 scored better than some of its predecessors but below the recently introduced Grok 4 from Elon Musk's xAI, according to the benchmark's creator, François Chollet, posting on X.
Grok 4 is still state-of-the-art on ARC-AGI-2 among frontier models.
15.9% for Grok 4 vs. 9.9% for GPT-5. pic.twitter.com/wSezrsZsjw
— François Chollet (@fchollet) August 7, 2025
On the older version of the test, ARC-AGI-1, GPT-5 scored 65.7% correct, Chollet wrote, which is below the roughly 76% that an older OpenAI model, o3, scored in December.
GPT-5 on ARC-AGI Semi Private Eval

GPT-5
* ARC-AGI-1: 65.7%, $0.51/task
* ARC-AGI-2: 9.9%, $0.73/task

GPT-5 Mini
* ARC-AGI-1: 54.3%, $0.12/task
* ARC-AGI-2: 4.4%, $0.20/task

GPT-5 Nano
* ARC-AGI-1: 16.5%, $0.03/task
* ARC-AGI-2: 2.5%, $0.03/task

— ARC Prize (@arcprize) August 7, 2025
In coding, where each new AI model generally shows some progress, ZDNET's David Gewirtz found in his testing that GPT-5 is actually a step backward. Gewirtz concedes GPT-5 did "provide a jump" in the analysis of code repositories but adds that it wasn't "a game-changer."
What’s happening here? The hype of Altman and others about superintelligence has yielded to mere progress.
“Overdue, overhyped and underwhelming,” wrote the relentless Gen AI critic Gary Marcus on his Substack. “But this time, the reaction was different. Because expectations were through the roof, a huge number of people viewed GPT-5 as a major letdown.”
AI scholars are pushing back on the hype
For all the negative press, it’s unlikely Altman and others will abandon the rhetoric about superintelligence. However, the lack of a true “cognitive” breakthrough in GPT-5, after so much expectation, may fuel closer scrutiny of terms often tossed around, such as “thinking” and “reasoning.”
The press release for GPT-5 from OpenAI emphasizes how the model excels at what has come to be called reasoning, where AI models generate verbose output about the process of arriving at an answer to a prompt.
“When using reasoning, GPT-5 is comparable to or better than experts in roughly half the cases,” the company states.
Also: OpenAI returns to its open-source roots with new open-weight AI models, and it’s a big deal
The industry’s research teams have recently pushed back on claims of reasoning.
In a widely cited research paper from Apple last month, the company’s researchers concluded that so-called large reasoning models, LRMs, do not consistently “reason” in any sense that one would expect of the colloquial term. Instead, the programs tend to become erratic in how they approach increasingly complex problems.
“LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across scales and problems,” wrote lead author Parshin Shojaee and team.
As a consequence, “Frontier LRMs face a complete accuracy collapse beyond certain complexities.”
Similarly, Arizona State University researchers Chengshuai Zhao and team write in a report published last week that "chain-of-thought," the string of verbose output produced by the LRMs, "often leads to the perception that they engage in deliberate inferential processes." The reality, they conclude, is in fact "more superficial than it appears."
Also: This free GPT-5 feature is flying under the radar – but it’s a game changer for me
Such apparent reasoning is “a brittle mirage that vanishes when it is pushed beyond training distributions,” Zhao and team conclude after studying the models’ results and their training data.
Such technical assessments challenge the hyperbole from Altman and others, which trades on notions of intelligence through casual, unsubstantiated assertions.
The average reader would do well to apply the same skepticism, paying close attention to the cavalier way terms such as superintelligence are tossed around. That skepticism may make for more reasonable expectations whenever GPT-6 arrives.