ZDNET's key takeaways
- Even the best AI coding models succeed less than 23% of the time.
- AI isn't falling short of its potential; it's being oversold.
- AI advocates need to show the positive and negative sides.
There has been much debate about the success rates of artificial intelligence, with growing consternation that rising investments in AI tools and infrastructure are falling short of the results that vendors and consultants so often promise.
For technology teams working in the trenches to integrate and incorporate AI into their technology stacks, the challenge has been daunting, a new survey shows. The BlueOptima AI Refactoring Evaluation (BARE) reports that even the best AI coding models succeeded less than 23% of the time when working on real production code. What's more, benchmark scores don't reflect real-world performance. Most models scored above 85% on popular benchmarks, but averaged just 17% success on production maintainability tasks.
Also: OpenAI upgrades Codex to automate your workflows - and compete better with Claude Code
The study benchmarked 57 LLMs on maintainability-oriented refactoring tasks drawn from 4,276 real source-code files spanning nine programming languages (C, C++, C#, Go, Java, JavaScript, PHP, Python, TypeScript), yielding 243,732 model-file evaluation pairs.
AI coding performance also varied dramatically by language and task. Success rates ranged from 32% in JavaScript to just 4% in C, and dropped as low as 1.5% on complex architectural tasks, the BARE study shows.
So, is AI falling short of its potential, or is it simply being oversold? The study serves as a reality check: dropping AI into an operation will not deliver results without work behind the scenes, including on maintainability.
Also: This AI expert says the job apocalypse isn't coming, even if you're a coder - here's why
"To count as successful, AI-generated code needed to meet strict criteria," the report's authors explained. The code "needs to compile and run correctly; preserve behavior with no regressions; and improve maintainability that is measured, not assumed."
Much of the glowing praise for AI from vendors, consultants, and others conveniently glosses over the hard work that goes on at the backend of AI. In short, the classic maxim often applies to AI marketing: 'If it sounds too good to be true, it probably is.'
As a result, AI is being vastly oversold, said David Linthicum, a leading voice of common sense in technology for many years. In a recent video, he urged managers to beware of those "eager to capitalize on the technology's glamour. Only with a clear-eyed, evidence-driven perspective can we move past the hype and ensure that technology serves business, not the other way around."
Also: 6 reasons why autonomous enterprises are still more a vision than reality
The biggest risk with AI tools and platforms is that they may "cost 10 to 20 times that of traditional systems," said Linthicum. Too many of today's AI promotions are "backed by robust PR campaigns that outpace the depth of actual understanding," he continued. The risk grows as AI becomes a boardroom priority.
"Decisions about organizational strategy, investment, and innovation may hinge on the advice of those whose technical grasp doesn't extend beneath the surface." Ill-informed guidance may lead to "costly overspending and strategic blunders," he warned.
Evidence suggests you can also add misuse of AI buzzwords to this mix. "While most audiences lack the technical background to interrogate bold claims, self-styled experts deploy sophisticated language to obscure their limitations," Linthicum cautioned.
Also: Claude Code made an astonishing $1B in 6 months - and my own AI-coded iPhone app shows why
"Social media and the broader digital conversation compound the problem, rewarding those with show-stopping stories and unfounded optimism instead of those who admit trade-offs and advocate for nuanced progress. Companies often value captivating storytellers over the implementers who truly understand the terrain."
The stakes are high, Linthicum continued: "Today's AI systems are complex and expensive, far beyond most traditional solutions. Blind adoption, fueled by unchecked optimism, risks both resources and organizational futures."
Also: 7 AI coding techniques I use to ship real, reliable products - fast
Professionals should develop a sharp eye for "true expertise," he urged: "Separating the qualified from the crowd -- those who appreciate AI's limits as well as its potential -- is vital for any business navigating this high-stakes landscape. Leaders must seek out those who embrace both sides of the AI equation: the promise and the pitfalls, the opportunities and the inherent risks."
A key element of the successful formula is to "make sure the people driving your AI strategy are going to make good decisions," said Linthicum. "We need to understand what they know and what they don't know. And we need to understand how they should be making decisions, including engaging people who know the upsides and downsides of using this technology, knowing how to build this stuff to make sure they don't make mistakes."
He suggested a balanced perspective is important, as people need to hear about the downsides of using technology: "The reality is that unless you consider the upsides and the downsides together, you're not going to have a viable solution for the business and eventually you're going to push the business off a cliff."