Do LLMs have a core problem?

How Apple’s latest research polarised opinion and triggered a radical shift in thinking.

The champagne corks that popped across Silicon Valley when ChatGPT launched have long been consigned to the trash can. In boardrooms from Cupertino to Copenhagen, a more sobering conversation is now taking place – one that questions whether the $100 billion already invested in large language models (LLMs) is delivering true value… or whether expectations have dangerously outstripped the technology’s actual capabilities.

Apple’s bombshell research paper, ‘The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models’, has done what few industry observers thought possible: it has made the AI sector pause and question its own gospel. The findings are stark enough to make any executive reconsider their AI strategy. When presented with classic puzzles, such as the Tower of Hanoi, the most sophisticated reasoning models didn’t just struggle – they collapsed, delivering zero correct answers once complexity passed a threshold.
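
To see why complexity bites so quickly, consider the puzzle itself. The following minimal Python sketch (our illustration, not Apple’s evaluation code) shows that the optimal Tower of Hanoi solution takes 2^n - 1 moves for n discs, so each extra disc doubles the length of the flawless move sequence a model must produce:

    def hanoi(n, src="A", dst="C", via="B", moves=None):
        """Recursively solve Tower of Hanoi, recording every move."""
        if moves is None:
            moves = []
        if n == 0:
            return moves
        hanoi(n - 1, src, via, dst, moves)   # park n-1 discs on the spare peg
        moves.append((src, dst))             # move the largest disc
        hanoi(n - 1, via, dst, src, moves)   # restack n-1 discs on top of it
        return moves

    for n in (3, 7, 10, 15):
        print(n, "discs:", len(hanoi(n)), "moves")  # always 2**n - 1

A flawless 15-disc game demands 32,767 consecutive correct moves. One slip anywhere breaks the chain, which illustrates the regime in which Apple reported accuracy collapsing to zero.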

Make no mistake, this is not a failure by one company or one model. It’s a fundamental limitation that reveals itself when LLMs are pushed beyond their designed capabilities. The danger lies not in the technology itself, but in the unrealistic expectations that have been built around these systems.

The $100 billion question

The implications stretch far beyond technical specifications. As Yann LeCun, a Turing Award winner and one of AI’s founding fathers, bluntly declared: the core technology behind ChatGPT is a “dead end” for human-level AI. His advice to students is equally terse: don’t work on LLMs if your goal is artificial general intelligence (AGI). Current LLMs excel at pattern recognition and language generation but struggle with logical reasoning, long-term memory, and strategic thinking. They simply lack the internal machinery such tasks demand.

This doesn’t diminish their considerable value in text-heavy applications – from content creation to code assistance to customer support – where they demonstrably excel. The problem arises when these tools are positioned as universal solutions rather than powerful instruments for specific use cases.

So here’s the trillion-dollar tension…

Approximately $100 billion of investment is tied up in the LLM approach, creating enormous pressure to continue down a path now mired in controversy. Leading researchers increasingly question its long-term value and viability. Yet Geoffrey Hinton – and other AI luminaries – still believe that scaling up LLMs could achieve AGI.

For senior executives, this schism is more than mere academic debate. It’s a strategic crossroads that will determine which companies harness AI’s genuine strengths effectively… and which become costly cautionary tales of excessive expectations colliding with narrow technological capabilities.

The Agentic AI mirage

Agentic AI – multiple autonomous systems collaborating in chains – has promised to solve LLMs’ limitations. And the feisty marketing messages look pretty compelling. Agentic AI systems, it is claimed, will produce extensive reports… conduct complex analyses… and seemingly think through problems using step-by-step logic.

But when you scratch beneath the surface, the reality is far from picture perfect…

Gartner’s prediction that over 40% of Agentic AI projects will be cancelled by the end of 2027 isn’t pessimism. It’s realism. Much of what’s marketed as ‘Agentic AI’ amounts to chatbots with new branding, lacking genuine adaptive capabilities. The fundamental issues of hallucination and error cascading remain – often amplified as mistakes compound through multiple model interactions.
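
The compounding is simple arithmetic. As an illustrative sketch (the 95% per-step figure is purely an assumption, not a measured rate), chaining steps multiplies failure:

    # Illustrative only: assume each step in an agent chain is 95% reliable.
    step_reliability = 0.95
    for steps in (1, 5, 10, 20):
        end_to_end = step_reliability ** steps
        print(f"{steps:>2} chained steps -> {end_to_end:.0%} end-to-end reliability")

At twenty chained steps, a per-step reliability that sounds excellent still leaves the pipeline failing almost two times in three (36% end-to-end).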

This reality check is uncomfortable but necessary: impressive demonstrations don’t equal reliable business solutions for every application. Many so-called AI agents are sophisticated demos that cannot adapt or make nuanced decisions when faced with real-world complexity beyond their training scope.

The neurosymbolic alternative

While the industry debates the future of LLMs, a quieter revolution is taking place. Companies like Rainbird AI and OpenHorizon are pursuing neurosymbolic approaches, combining the pattern recognition of neural networks with symbolic logic and reasoning. This isn’t just theoretical – it’s delivering results where pure LLM approaches fail (a minimal sketch follows the list below):

Precision and consistency: Deterministic outputs for identical inputs eliminate the unpredictability that makes LLMs unsuitable for critical applications.

Complete explainability: Every decision comes with a full chain of reasoning, essential for regulated industries and high-stakes decision-making.

Zero tolerance for error: Unlike LLMs, which generate plausible-sounding but potentially false information, these systems can be designed for error-free operation in specific domains.

Domain specificity: Highly effective in narrow, well-defined areas where businesses actually operate, rather than attempting to be everything to everyone.
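
What does that pattern look like in code? The following is a hypothetical, minimal Python sketch of the general approach, not Rainbird’s or OpenHorizon’s actual systems (neural_extract, RULES and decide are all invented names): a statistical model proposes facts with confidence scores, and a symbolic rule layer makes the final decision deterministically, with a full audit trail.

    # Hypothetical neurosymbolic pattern: neural extraction + symbolic rules.

    def neural_extract(document):
        """Stand-in for any statistical model: facts with confidence scores."""
        return {"sanctioned_entity": 0.97, "high_risk_jurisdiction": 0.88}

    # Each rule: (required fact, confidence threshold, verdict, rationale).
    RULES = [
        ("sanctioned_entity", 0.95, "BLOCK", "Entity matches a sanctions list"),
        ("high_risk_jurisdiction", 0.80, "REVIEW", "High-risk jurisdiction involved"),
    ]

    def decide(document):
        facts = neural_extract(document)
        for fact, threshold, verdict, rationale in RULES:
            if facts.get(fact, 0.0) >= threshold:
                # First matching rule wins: identical input, identical output.
                return verdict, f"{rationale} (confidence {facts[fact]:.2f})"
        return "ALLOW", "No rule fired"

    verdict, reasoning = decide("sample document text")
    print(verdict, "-", reasoning)

Run it twice on the same document and it returns the same verdict with the same stated reasoning, which is precisely the determinism and explainability promised in the list above.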

The intelligence approach in practice

OpenHorizon – founded three weeks after the Russian invasion of Ukraine by former senior officers from Norway’s intelligence services – demonstrates how this approach delivers value. Their ‘digital twin of the global threat landscape’ draws on extensive knowledge graphs. These contain hundreds of thousands of nodes representing threat actors, capabilities, and intentions. Rather than gambling on an LLM hallucinating accurate threat assessments, they have built a system that:

  • Segregates AI models from the internet to prevent contamination
  • Uses only vetted, human-curated knowledge graphs
  • Implements frameworks such as the Admiralty Code for scoring source credibility
  • Employs subject matter experts to validate outputs
  • Uses Bayesian inference to bring transparency to uncertainty (see the sketch below)
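
The Bayesian step is the easiest to picture. Below is a minimal sketch with illustrative numbers, not OpenHorizon’s actual model: a prior belief that a threat is real is updated transparently when a graded source files a report.

    # Illustrative Bayesian update; all probabilities here are assumptions.
    def bayes_update(prior, p_report_if_true, p_report_if_false):
        """Return P(threat is real | report received) via Bayes' theorem."""
        evidence = p_report_if_true * prior + p_report_if_false * (1 - prior)
        return p_report_if_true * prior / evidence

    prior = 0.10   # analyst starts at 10% belief the threat is real
    # Suppose a source graded B2 in Admiralty-Code terms ("usually reliable,
    # probably true") is modelled as reporting such a threat 80% of the time
    # when it is real and 20% of the time when it is not:
    posterior = bayes_update(prior, p_report_if_true=0.80, p_report_if_false=0.20)
    print(f"Belief after the report: {posterior:.0%}")   # -> 31%

Because every number in the update is explicit, an analyst can see exactly why confidence moved, the kind of transparency a black-box confidence score cannot offer.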

The results are impressive: twelve-month threat assessments that comply with strict regulatory frameworks, while making analysts ten times more productive. This isn’t about replacing human expertise – it’s about amplifying it intelligently.

The human element remains critical

Perhaps the most important insight from this AI reality check is the irreplaceable role of human expertise. The snag, of course, is that humans are also unreliable. Even as we critique AI’s limitations, research shows that highly skilled professionals exhibit 40-60% variance in decision quality. Humans are far from perfect decision-makers – particularly when dealing with complex, high-volume tasks.

The solution isn’t to renounce AI but to position it correctly: as an augmentation tool rather than a replacement for human judgement. The most successful implementations maintain precise human oversight while leveraging AI’s strengths in pattern recognition and rapid information processing.

Strategic implications for leaders

The future lies not in AI pessimism, nor in unbridled optimism, but in practical realism. Five principles will be key to achieving that grounded stance:

Abandon AGI fantasies: Focus on practical, domain-specific AI solutions rather than pursuing general-purpose AI that may never materialise.

Embrace hybrid approaches: Combine the strengths of different AI technologies rather than betting everything on LLMs.

Implement proper governance: Establish frameworks for measuring and validating AI outputs before they influence business decisions.

Maintain human oversight: Ensure subject matter experts remain integral to AI-driven processes, particularly in high-stakes environments.

Accept limitations strategically: Use AI where ‘probably correct’ is acceptable – such as content generation, initial research, or creative assistance – but implement deterministic systems where precision is crucial.

Navigating the ‘trough of disillusionment’

AI appears to be entering what Gartner calls the ‘trough of disillusionment’ – the sobering morning after the initial infatuation. This isn’t a failure of technology; it’s a natural part of innovation adoption.

The reassuring message from Apple’s research – and from the broader AI community – is that LLMs do deliver value. They have revolutionised how we interact with text, generate content, and process language. But LLMs also have serious, clearly bounded limitations. They do not possess the cognitive machinery to reason or rationalise. Viewing LLMs – or any single platform – as a panacea therefore creates dangerous blind spots.

In an industry hooked on hype and hyperbole, the most radical position happens to be the most conservative ‘horses for courses’ strategy. Never believe the tipsters. And never back a Brighton beach donkey to win the Grand National – you will lose more than your shirt.

Ian Spencer is a founding partner of Clustre, The Solution Brokers. Our special thanks to James Duez of Rainbird AI and to Rein-Owe Flister of OpenHorizon for their inspirational contribution to this article. If you would like to discuss any of the thoughts and messages in this article, James and Rein-Owe would be happy to help.

Contact: Robert.baldock@clustre.net
