
Apple Debunks AI Reasoning Hype: Models Memorise, Don't Think, Study Reveals

Apple claimed that reasoning models like Claude, DeepSeek-R1, and o3-mini do not reason at all; instead, they just memorise patterns.

Once patterns become too complex, the reasoning models fall apart, Apple said.
Quick Read
  • Apple's study questions the intelligence of current AI reasoning models.
  • The research indicates these models excel at pattern recognition but struggle with complexity.
  • Reasoning performance declines significantly as puzzle complexity increases, per the findings.

Apple has claimed that new-age artificial intelligence (AI) reasoning models might not be as smart as they have been made out to be. In a study titled 'The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity', the tech giant claimed that reasoning models like Claude, DeepSeek-R1, and o3-mini do not actually reason at all.

Apple claimed that these models simply memorise patterns very well, but when the questions are altered or the complexity is increased, they collapse altogether. In simple terms, the models work well as long as they can match patterns, but once the patterns become too complex, they fall apart.

"Through extensive experimentation across diverse puzzles, we show that frontier LRMs face a complete accuracy collapse beyond certain complexities," the study highlighted.

"Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget," it added.

For the study, the researchers flipped the script on the type of questions that reasoning models usually answer. Instead of the same old math tests, the models were presented with cleverly constructed puzzle games such as Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World.

Each puzzle had simple, well-defined rules, and as the complexity was increased (like more disks, more blocks, more actors), the models needed to plan deeper and reason longer. The findings revealed three regimes.

  • Low complexity: standard (non-reasoning) models actually outperform the reasoning models.
  • Medium complexity: reasoning ("thinking") models show some advantage.
  • High complexity: both types of model break down completely.
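To give a sense of how quickly these puzzles scale, consider the Tower of Hanoi: the shortest solution for n disks takes 2^n - 1 moves, so every extra disk roughly doubles the length of the plan a model must produce. The minimal Python sketch below is only an illustration of that growth, not an excerpt from Apple's paper.

    # The optimal Tower of Hanoi solution grows exponentially with the number of disks.
    def hanoi_moves(n: int) -> int:
        """Minimum number of moves needed to solve Tower of Hanoi with n disks."""
        return 2 ** n - 1

    for disks in (3, 7, 10, 15):
        print(f"{disks} disks -> {hanoi_moves(disks)} moves")
    # 3 disks -> 7 moves, 7 -> 127, 10 -> 1023, 15 -> 32767

This exponential growth is why, as the disk count rises, the models need to plan far deeper and produce far longer reasoning traces.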


AGI not as near as predicted?

Apple reasoned that if the models were truly 'reasoning', they would improve when given more computing power and clear instructions. Instead, they hit a wall and gave up, even when the solution was handed to them.

"When we provided the solution algorithm for the Tower of Hanoi to the models, their performance on this puzzle did not improve," the study stated, adding: "Moreover, investigating the first failure move of the models revealed surprising behaviours. For instance, they could perform up to 100 correct moves in the Tower of Hanoi but fail to provide more than 5 correct moves in the River Crossing puzzle."

With talk of human-level AI, popularly referred to as Artificial General Intelligence (AGI), arriving as early as 2030, Apple's study suggests that this may not be the case, and that we might still be some distance away from such technology.

