OpenAI Identifies Why AI Chatbots Hallucinate — and the Fix Could Change the Game

OpenAI says chatbot hallucinations stem from models learning to “bluff” during training, and proposes a fix: change evaluation methods so that admitting uncertainty is no longer penalised.

If you’ve ever asked an AI a simple question and received a confidently wrong response — like recommending glue on pizza — you’ve witnessed what researchers call a “hallucination.” From OpenAI’s GPT-5 to Anthropic’s Claude, nearly every large language model (LLM) has been guilty of such mistakes. Now, OpenAI says it has finally figured out why.

In a newly published paper, the company explains that these errors are not caused by forgetfulness or randomness. Instead, chatbots hallucinate because they’ve been trained to bluff.

Why Chatbots Pretend to Know Answers

According to the paper, models are not programmed to lie, but they are indirectly rewarded for guessing. As OpenAI explains, “Hallucinations persist due to the way most evaluations are graded: language models are optimised to be good test-takers, and guessing when uncertain improves test performance.”

Think of it like an exam. Students who don’t know the answer often guess, hoping for a lucky mark. Chatbots, it turns out, are doing the same. They’re locked in “permanent exam mode,” where silence is treated as failure and confident guessing appears clever.

As the researchers put it, “Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams that penalise uncertainty.”
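To make that incentive concrete, here is a toy expected-score calculation. The grading rule and numbers are illustrative assumptions, not taken from OpenAI's paper: under accuracy-only scoring, where a correct answer earns one point and both a wrong answer and “I don’t know” earn zero, even a low-confidence guess beats abstaining.

```python
# Toy expected-score calculation under an accuracy-only benchmark, where a
# correct answer scores 1 and a wrong answer or "I don't know" scores 0.
# The grading rule and numbers are illustrative, not taken from OpenAI's paper.

def expected_score_accuracy_only(p_correct: float, abstain: bool) -> float:
    """Expected score for one question under accuracy-only grading."""
    if abstain:
        return 0.0        # admitting uncertainty earns nothing
    return p_correct      # answering earns the probability of being right

# Even a near-random guess beats abstaining, so a model tuned against
# this scoreboard learns to always produce a confident answer.
print(expected_score_accuracy_only(0.10, abstain=False))  # 0.1
print(expected_score_accuracy_only(0.10, abstain=True))   # 0.0
```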

Confidence vs. Caution in AI

The outcome? AI systems that sound absolutely certain — even when they’re completely wrong.

Some companies have tried to counter this. In a blog post last month, OpenAI admitted that Anthropic’s Claude models behave differently: they are “more aware of their uncertainty and often avoid making statements that are inaccurate.”

While this cautious approach seems promising, it comes with a trade-off. OpenAI pointed out that Claude often refuses to answer altogether, which “risks limiting its utility.” In short, it may be polite, but not always practical.

The Fix: Rethinking AI Evaluations

So how do we prevent AI from bluffing like overconfident quiz contestants? OpenAI believes the solution lies in changing evaluation methods, not the models themselves.

The researchers argue, “The root problem is the abundance of evaluations that are not aligned. The numerous primary evaluations must be adjusted to stop penalising abstentions when uncertain.”

In its blog post, OpenAI elaborated further: “The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing. If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess.”
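As a rough sketch of the kind of scoring change OpenAI is describing, consider a scoreboard that gives zero for an explicit abstention and subtracts points for a wrong answer. The one-point penalty below is a hypothetical value chosen for illustration, not something the paper prescribes.

```python
# Sketch of a scoring rule that discourages guessing: 1 point for a correct
# answer, 0 for an explicit "I don't know", and a penalty for a wrong answer.
# The penalty value is assumed for illustration only.

WRONG_PENALTY = 1.0  # illustrative choice, not from the paper

def expected_score_with_penalty(p_correct: float, abstain: bool) -> float:
    """Expected score when wrong answers cost points but abstentions do not."""
    if abstain:
        return 0.0
    return p_correct - (1.0 - p_correct) * WRONG_PENALTY

# Under this rule, guessing only pays off above 50% confidence;
# below that threshold, abstaining is the higher-scoring move.
for p in (0.3, 0.5, 0.8):
    guess = expected_score_with_penalty(p, abstain=False)
    print(f"confidence {p:.0%}: guess={guess:+.2f}, abstain=+0.00")
```

The exact crossover point depends on the penalty chosen; the point is simply that once wrong answers cost something, “I don’t know” stops being the losing move.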

Building AI with Humility

This proposed shift may seem subtle, but it represents a major change in AI development. For years, companies have raced to make chatbots faster, sharper, and more articulate. But those qualities don’t necessarily guarantee trustworthiness.

Instead, the bigger challenge is creating systems that can balance knowledge with humility — a skill humans often acquire after making mistakes in the real world.

By adjusting evaluation methods, OpenAI hopes to produce models that prioritize reliability over bravado. After all, whether it’s medical guidance or financial advice, no one wants a chatbot delivering a confident hallucination as if it were gospel truth.

It might not be as flashy as launching a brand-new model, but OpenAI’s push to stop AI bluffing could be one of the most meaningful reforms the industry has seen.
