Just an Illusion of Thinking? Why We Still Don’t Have Real AI Reasoning.

June 8

A new paper from Apple researchers takes a deep dive into so-called Large Reasoning Models (like Claude, DeepSeek, or OpenAI’s o-models) — AI systems that promise to “think” more deeply by generating step-by-step reasoning traces.

What they found is sobering: despite the appearance of structured reasoning, these models' accuracy collapses once tasks pass a certain level of complexity, even when they're given plenty of resources and guidance.

The researchers used puzzle-based test environments (like Tower of Hanoi or River Crossing) that systematically vary complexity. These allow not just evaluation of the final answer, but also a detailed look at the actual thought process.
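To make this concrete, here is a minimal sketch of how such an environment can check every intermediate step rather than only the final state. It is an illustration, not the paper's evaluation code: the function name `validate_hanoi` and the move format (pairs of 0-indexed peg numbers) are assumptions made for this example.

```python
def validate_hanoi(n_disks, moves):
    """Replay a proposed Tower of Hanoi solution move by move.

    Returns (solved, first_bad_index); first_bad_index is None when every
    move is legal. Pegs are numbered 0..2; all disks start on peg 0 and
    must end on peg 2. (Hypothetical format, chosen for illustration.)
    """
    # Each peg is a stack; the largest disk (n_disks) sits at the bottom.
    pegs = [list(range(n_disks, 0, -1)), [], []]
    for i, (src, dst) in enumerate(moves):
        # Reject unknown pegs and moves from an empty peg.
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or not pegs[src]:
            return False, i
        disk = pegs[src][-1]
        # A disk may never be placed on top of a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, i
        pegs[dst].append(pegs[src].pop())
    solved = pegs[2] == list(range(n_disks, 0, -1))
    return solved, None


print(validate_hanoi(2, [(0, 1), (0, 2), (1, 2)]))  # (True, None)
print(validate_hanoi(2, [(0, 2), (0, 2)]))          # (False, 1)
```

Because the checker reports the index of the first illegal move, it can show where in a long solution a model goes wrong, not just that it did.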

What were the findings?

At low complexity, standard LLMs (no added “thinking”) actually perform better.
At medium complexity, reasoning models can outperform standard LLMs, but only slightly, and at great cost (many more tokens).
At high complexity, all models fail: they stop reasoning effectively, even though they still have token budget available.

Figure 4: Accuracy of thinking models (Claude 3.7 Sonnet with thinking, DeepSeek-R1) versus their non-thinking counterparts (Claude 3.7 Sonnet, DeepSeek-V3) across all puzzle environments and varying levels of problem complexity.

Even worse: when given the correct algorithm, the models still couldn't execute it reliably. This is more than a reasoning failure in the usual sense. It points to a more basic weakness in following explicit logical steps, and it may be a fundamental limitation of current AI reasoning models.
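For a sense of how compact such an algorithm is, the textbook recursive solution to Tower of Hanoi fits in a few lines. The sketch below is not the prompt or code used in the paper, and the name `hanoi_moves` is illustrative. It also shows why the puzzle works as a complexity dial: n disks require 2^n - 1 moves, so each extra disk roughly doubles the length of the sequence a model has to get right.

```python
def hanoi_moves(n, src=0, aux=1, dst=2):
    """Return the optimal move list for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    # Park n-1 disks on the spare peg, move the largest, then restack them.
    return (hanoi_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi_moves(n - 1, aux, src, dst))


for n in (3, 10):
    print(n, len(hanoi_moves(n)))  # prints "3 7", then "10 1023"
```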

What does this mean?

We are much further from real AGI than the current hype suggests. What looks like “thinking” is often just verbose pattern matching that breaks down under complexity.
We should shift focus. Rather than speculating about what future AI might achieve, we should better understand — and fully leverage — what today’s systems can already do. Despite their limitations, current models are still massively underutilized in practice.

If you’re building with AI, this is essential reading: real breakthroughs likely won’t come from simply scaling up resources — because the system’s capabilities don’t scale linearly with them. To truly scale AI ROI, stop just optimizing your existing workflows. Instead, fundamentally rethink how you work to get the most out of today’s AI systems.


Björn Ognibeni https://www.ognibeni.de
