Generating code using AI increases the number of issues that need to be reviewed and the severity of those issues.
CodeRabbit, an AI-based code review platform, made that determination by looking at 470 open source pull requests for its State of AI vs Human Code Generation report.
The report finds that AI-generated code contains significantly more defects of logic, maintainability, security, and performance than code created by people.
On average, AI-generated pull requests (PRs) include about 10.83 issues each, compared with 6.45 issues in human-generated PRs. That's about 1.7x more when AI is involved, meaning longer code reviews and increased risk of defects.
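The headline multiplier falls straight out of the two per-PR averages the report gives. A quick check (plain arithmetic, not from the report itself):

```python
# Per-PR issue averages reported by CodeRabbit
ai_issues = 10.83
human_issues = 6.45

ratio = ai_issues / human_issues
print(round(ratio, 2))  # ~1.68, i.e. "about 1.7x"
```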
Problems caused by AI-generated PRs also tend to be more severe than human-made messes. AI-authored PRs contain 1.4x more critical issues and 1.7x more major issues on average than human-written PRs, the report says.
Machine-generated code, then, leaves reviewers to wade through both a larger volume of issues and more severe ones than human-generated code presents.
These findings echo a report issued last month by Cortex, maker of an AI developer portal. The company's Engineering in the Age of AI: 2026 Benchmark Report [PDF] found that PRs per author increased 20 percent year-over-year even as incidents per pull request increased by 23.5 percent, and change failure rates rose around 30 percent.
The CodeRabbit report found that AI-generated code falls short of meatbag-made code across the major issue categories. The bots created more logic and correctness errors (1.75x), more code quality and maintainability errors (1.64x), more security findings (1.57x), and more performance issues (1.42x).
In terms of specific security concerns, AI-generated code was 1.88x more likely to introduce improper password handling, 1.91x more likely to make insecure object references, 2.74x more likely to add XSS vulnerabilities, and 1.82x more likely to implement insecure deserialization than human devs.
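To make the XSS category concrete, here's a minimal illustration of the flaw class (our own example, not code from the report): user input interpolated straight into HTML versus the escaped fix.

```python
import html

def render_greeting_unsafe(name: str) -> str:
    # Vulnerable pattern: user-controlled input dropped directly into markup.
    # This is the cross-site scripting (XSS) class the report found 2.74x
    # more often in AI-generated PRs.
    return f"<p>Hello, {name}!</p>"

def render_greeting_safe(name: str) -> str:
    # Fix: entity-encode user-controlled data before embedding it in HTML.
    return f"<p>Hello, {html.escape(name)}!</p>"

payload = "<script>alert(1)</script>"
print(render_greeting_unsafe(payload))  # script tag survives intact
print(render_greeting_safe(payload))    # angle brackets are entity-encoded
```

In the unsafe version the injected `<script>` tag reaches the browser verbatim; in the safe version it arrives as inert `&lt;script&gt;` text.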
One area where AI outshone people was spelling – spelling errors were 1.76x more common in human PRs than machine-generated ones. Also, human-authored code had 1.32x more testability issues than AI-generated code.
"These findings reinforce what many engineering teams have sensed throughout 2025," said David Loker, director of AI at CodeRabbit, in a statement. "AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate."
CodeRabbit cautions that its methodology has limitations, such as its inability to be certain that PRs labeled as human-authored actually were exclusively authored by humans.
Other studies based on different data have come to different conclusions.
For example, an August 2025 paper by University of Naples researchers, "Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity," found that AI-generated Python and Java code "is generally simpler and more repetitive, yet more prone to unused constructs and hardcoded debugging, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues."
Back in January 2025, researchers from Monash University (Australia) and University of Otago (New Zealand) published a paper titled "Comparing Human and LLM Generated Code: The Jury is Still Out!"
"Our results show that although GPT-4 is capable of producing coding solutions, it frequently produces more complex code that may need more reworking to ensure maintainability," the southern hemisphere boffins wrote. "On the contrary, however, our outcomes show that a higher number of test cases passed for code generated by GPT-4 across a range of tasks than code that was generated by humans."
As to the impact of AI tools on developer productivity, researchers from Model Evaluation & Threat Research (METR) reported in July that "AI tooling slowed developers down."
Your mileage may vary.
We note that Microsoft patched 1,139 CVEs in 2025, according to Trend Micro researcher Dustin Childs, who says that makes it the second-biggest year for CVE volume, behind only 2020.
Microsoft says 30 percent of the code in certain of its repos was written by AI, and its Copilot Actions feature ships with a caution about "the security implications of enabling an agent on your computer."
"As Microsoft's portfolio continues to increase and as AI bugs become more prevalent, this number is likely to go higher in 2026," Childs wrote in his post.
But at least we can expect fewer typos in code comments. ®
Source: The Register