
The boardroom presentation looked flawless. Brimming with confident predictions and projections, it made one crystal-clear promise: a transformative Gen AI pilot would revolutionise everything from customer service to supply chain management.
Six months – and several million pounds – later, the same executives meet around the same table to confront an uncomfortable truth: nothing has changed. No revenue acceleration. No cost reduction. No measurable return whatsoever.
Welcome to the reality of enterprise AI.
In July 2025, MIT published a report containing one statistic that should make every C-suite executive pause for thought: ‘95% of enterprise Gen AI pilots are delivering no measurable ROI’. Only 5% are achieving any revenue acceleration.
Let that sink in for a moment. We are not talking about ‘underwhelming returns’ or ‘slower than expected progress’. This is a damning, virtually total, failure.
This revelation is so serious that we asked James Duez – founder of Rainbird AI – to make it the keynote topic for our latest C-suite briefing. In a 30-minute masterclass, he forensically unpacked the reasons why this catastrophic failure isn’t just shocking; it’s utterly predictable. Indeed, the way most organisations approach AI technology makes it almost inevitable. His analysis cut through the hype to reveal three fundamental attributes that high-stakes applications absolutely require… three vital elements that generative AI cannot deliver…
The first is precision. Whether you are making mortgage decisions… processing insurance claims… or identifying sanctions in banking transactions, a business must have systems that reflect its rules, regulations and professional integrity. Gen AI excels at generating plausible-sounding text. But don’t confuse that with nuanced precision. Gen AI is spectacularly bad at understanding and upholding the core values and ethical imperatives of your business.
The second attribute is determinism. If you feed an AI system the same case twice, it should produce the same answer. This seems embarrassingly obvious, yet both human decision-makers and machine learning systems struggle with consistency. Gen AI systems are probabilistic by design. They are built to vary their outputs… and that’s precisely what you don’t want when making consequential business decisions.
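To make the contrast tangible, here is a minimal Python sketch – the ‘generative’ function is a made-up stand-in for a sampling-based model, not any real API:

```python
import random

def rule_based_decision(case: dict) -> str:
    """Deterministic: the same case always produces the same answer."""
    if case["ltv"] > 0.9 or case["credit_score"] < 600:
        return "decline"
    return "approve"

def generative_decision(case: dict) -> str:
    """Stand-in for a probabilistic model: outputs are sampled, so they can vary."""
    return random.choices(["approve", "decline", "refer"], weights=[6, 3, 1])[0]

case = {"ltv": 0.85, "credit_score": 640}
print([rule_based_decision(case) for _ in range(3)])  # identical on every run
print([generative_decision(case) for _ in range(3)])  # can differ run to run
```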
The third is auditability. Not the superficial kind where a system points to citations or sources, but genuine transparency about the logical steps taken to reach a robust conclusion. When regulators come knocking, the glib, reflex response – “Oh, the algorithm said so” – simply won’t cut it. You need to demonstrate why a specific decision was made… based on which rules… applied in what sequence.
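What that looks like in practice can be sketched in a few lines. The rules and thresholds below are illustrative inventions, but the shape is the point: the system returns a decision plus the ordered trace of rules that produced it:

```python
def decide_mortgage(case: dict) -> tuple[str, list[str]]:
    """Return a decision plus the ordered list of rules applied to reach it."""
    trace = ["R1: affordability check"]
    if case["repayments"] > 0.45 * case["income"]:
        trace.append("R2: repayments exceed 45% of income -> decline")
        return "decline", trace
    trace.append("R3: loan-to-value check")
    if case["ltv"] > 0.9:
        trace.append("R4: LTV above 90% -> refer to underwriter")
        return "refer", trace
    trace.append("R5: all checks passed -> approve")
    return "approve", trace

decision, audit_trail = decide_mortgage({"income": 50_000, "repayments": 12_000, "ltv": 0.8})
print(decision)                 # approve
print("\n".join(audit_trail))   # the 'why', rule by rule, in sequence
```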
These three attributes – precision, determinism, and auditability – are why experiments don’t make it to production. They also explain why external vendors and platforms have proven twice as likely to succeed compared to homegrown projects. Organisations building from scratch are discovering that creating production-ready AI is vastly more complex than the proof-of-concept first suggested.
Retrieval Augmented Generation (RAG) emerged as the great hope for extracting value from Gen AI. The logic seemed sound: overcome the limitations of large language models by giving them access to your institutional knowledge through vector databases. Point the AI at your documents, and suddenly it would have all the context it needed.
Reality has been rather less cooperative.
Research now demonstrates that RAG is fundamentally vulnerable to misleading or conflicting evidence. In some benchmark tests, misleading retrieval actually produces worse performance than simply using a zero-shot LLM without any retrieval at all. You’d be better off with nothing than with RAG feeding your AI with contradictory information.
The architecture also faces multiple failure modes. Conflicts between the LLM’s internal training knowledge and externally retrieved content can cause the system to confuse or incorrectly blend information. Retrieval misses and incomplete context are common. Performance also degrades over time, particularly as vector databases scale. One Rainbird customer attempted to build a RAG database for 20 million documents, only to discover that the bigger the vector database grew, the worse the degradation became.
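A toy sketch shows why conflicting evidence is so corrosive. The corpus, the keyword-overlap ‘retriever’ and the prompt template below are all hypothetical simplifications, but the structural problem is real: retrieved passages are simply concatenated into the prompt, with nothing to arbitrate between them:

```python
corpus = [
    "Policy v1 (2019): claims over £5,000 require two approvals.",
    "Policy v3 (2024): claims over £10,000 require two approvals.",
    "FAQ draft: approval thresholds are currently under review.",
]

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Crude keyword-overlap scoring, standing in for a vector search."""
    q_words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:k]

query = "Which claims require two approvals?"
context = "\n".join(retrieve(query, corpus))

# The stale policy and the inconclusive draft land in the prompt alongside the
# current policy, as undifferentiated context. The model is left to guess.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```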
And, of course, you have to consider the security implications. Attackers can manipulate external sources to poison RAG systems by injecting misinformation. Vector databases then become threat vectors requiring careful access control. Even with curated datasets, RAG can produce retrievals based on bias in the training data. Alarmingly, these can corrupt accuracy in ways that are not immediately visible.
Here’s the part that most people miss entirely: when you tell an LLM to follow your rule book and not to hallucinate, you’re not actually giving it an instruction in the traditional programming sense. You are simply providing input tokens that can influence output tokens. Your institutional knowledge isn’t a first-class citizen in the tech stack. It’s merely context that can be forgotten, misinterpreted, or overwhelmed by other inputs.
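A hypothetical sketch makes the distinction concrete. The first ‘rule’ below is just a string buried in a prompt – input tokens and nothing more – while the second is code the system cannot bypass:

```python
# 1. The rule expressed as prompt text: merely tokens that influence output tokens.
prompt = (
    "You must never approve a payment over £10,000 without dual sign-off.\n"
    "Should this £12,000 payment with one sign-off be approved? Answer yes or no."
)
# Whatever a model replies, nothing in the stack actually enforces the rule above.

# 2. The rule expressed as code: a first-class part of the system that cannot be skipped.
def approve_payment(amount: float, sign_offs: int) -> bool:
    if amount > 10_000 and sign_offs < 2:
        return False  # enforced every time, regardless of how the request is phrased
    return True

print(approve_payment(12_000, sign_offs=1))  # False
```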
This is why Duez rejects the term “prompt engineering” – preferring instead to call it “prompt wrangling”. Why? Because there’s no engineering discipline at work. Within regulated industries, around 25% of AI use cases deliver business-critical outcomes – consequences where error is intolerable. Prompting simply isn’t acceptable in situations where precision is paramount.
Think what this means for your organisation. Every compliance rule, every business process, every regulatory requirement that you have carefully documented over decades becomes nothing more than suggestive context that your AI might – or might not – consider relevant. You’re not encoding knowledge; you’re hoping the probabilistic magic of large language models will interpret your intentions correctly.
This is a hugely high-risk strategy.
Market sentiment tells its own story. Recent conversations about AI with around 500 CEOs and CFOs revealed a sobering breakdown: 70% are scared, 25% are clueless, 4% like the idea but aren’t sure what it is, and only 1% are genuinely ready to move forward.
Public confidence is eroding even faster. A poll of nearly 4,000 tech-savvy adults found that 38% view AI primarily as an economic risk while only 20% see it as an opportunity. Lack of trust in AI outputs is cited as the single biggest barrier to adoption.
This isn’t the usual technology adoption curve playing out. This is a crisis of credibility born from overselling capabilities that don’t exist.
Understandably, boards are becoming increasingly averse to signing off on AI expenditure that delivers no value. They want tighter cost controls. Focus is shifting from a somewhat gullible belief in proofs of concept to a hardline mindset that demands governance and explainability from the start. Investors now accept that simply throwing more compute at the problem won’t achieve artificial general intelligence.
The answer begins with intellectual honesty about what different AI technologies can – and even more importantly – cannot deliver.
First, establish your objectives. Start with clear business needs rather than asking what a particular technology might do for your business. Involve domain experts early. Ensure projects are business-led rather than IT-driven. Focus on high-impact use cases, particularly in back-office functions where initial ROI tends to be strongest. Any project worth doing should deliver a 10x return on investment – quite frankly, anything less isn’t worth the organisational disruption.
Second, build in rigour. Treat your institutional knowledge as a first-class citizen in the tech stack if you want precision. Build in determinism and auditability from the start, not as afterthoughts. Don’t experiment unless you have a clear path to production. Continuously measure ROI rather than assume that a successful POC guarantees production success.
Third, know your limitations. Critically, don’t rely on humans as guardrails for Gen AI outputs. Recently, Deloitte came under fire for submitting a document to the Australian government that was littered with AI-generated errors. But this was much more than a costly, brand-damaging lapse of judgement. It’s a wake-up call for all organisations that use AI tools without proper safeguards. As Bryan Lapidus – FP&A Practice Director for the Association for Financial Professionals – observes: “This situation underscores a critical lesson for all finance professionals… AI isn’t a truth-teller; it’s a tool meant to provide answers that fit your questions”.
Fourth, watch your bottom line. Don’t over-invest in Gen AI outputs while under-investing in deterministic systems. And do your arithmetic – the total cost of AI reaches far beyond the licensing fees. Importantly, symbolic systems – such as Rainbird – don’t consume vast amounts of compute. One use case would cost over a million dollars per run with an LLM, while Rainbird can handle it essentially for free because it isn’t burning massive amounts of energy to reason over logical knowledge models.
The emerging solution isn’t a binary choice between Generative AI and Symbolic AI. It’s all about understanding where each belongs in your architecture.
Gen AI excels at finding insights in data, building on a capability that has been delivering multi-trillion-dollar value for 25 years. It’s outstanding at natural language processing, rich interaction, and identifying patterns that humans often miss. But it’s fundamentally the wrong tool for decisioning, particularly in high-stakes regulated environments.
Organisations need hybrid systems that combine the benefits of neural networks and LLMs with the ability to properly represent organisational knowledge – with zero hallucinations, 100% determinism, and full auditability. There is also growing interest in incorporating world models, knowledge graphs, and knowledge representation to create truly ‘clever’ systems (symbolic AI) – ones that can reason and deliver solutions that really matter to your business.
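One way such a hybrid can be wired together is sketched below. The `extract_facts` step is a hypothetical stand-in for whatever LLM or NLP component handles the language; the decision layer is a simplified deterministic rule model in the spirit of symbolic systems, not a depiction of Rainbird’s actual product:

```python
def extract_facts(claim_text: str) -> dict:
    """Stand-in for the generative/NLP layer: turn free text into structured facts.
    Hard-coded here so the sketch runs without any model or API."""
    return {"policy_active": True, "claim_value": 12_500, "days_since_incident": 3}

def decide_claim(facts: dict) -> tuple[str, list[str]]:
    """Deterministic, auditable decision layer over the extracted facts."""
    trace = []
    if not facts["policy_active"]:
        trace.append("Policy inactive -> decline")
        return "decline", trace
    if facts["claim_value"] > 10_000:
        trace.append("Value above £10,000 -> escalate for dual sign-off")
        return "escalate", trace
    trace.append("All checks passed -> approve")
    return "approve", trace

facts = extract_facts("Rear-end collision on 3 March; repair estimate £12,500.")
print(decide_claim(facts))  # ('escalate', ['Value above £10,000 -> escalate for dual sign-off'])
```

The division of labour is the design choice: the language model handles the messy natural-language front end, while the rule model makes the consequential call – deterministically, and with a trace a regulator can read.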
The challenge is that AI is often treated as a single monolithic entity rather than a broad category of separate technologies – each with different utilities, strengths, and weaknesses. Even people with PhDs in data science can have narrow expertise in one lane without understanding the broader landscape.
Success requires executive investment in properly understanding the landscape. CEOs, CFOs, COOs, and CIOs should never delegate this to IT departments – they need to understand the risks, opportunities, and different technology categories for themselves.
This isn’t optional. The organisations that treat AI strategy as a technical implementation detail rather than a fundamental business decision will join the 95% delivering zero ROI. Those that develop genuine strategic literacy about different AI approaches, their appropriate applications, and their limitations will be positioned to leapfrog competitors still trapped in the proof-of-concept graveyard.
We are witnessing a long-overdue correction in enterprise AI. The hype cycle that promised to transform everything overnight is colliding with the messy reality of production systems, regulatory compliance, and organisational complexity.
The good news is that AI – when properly understood and appropriately applied – can deliver extraordinary value. The 5% of projects that are working prove the potential is very real. The 95% failure rate proves that getting there requires intellectual rigour, strategic clarity, and honest assessment of what different technologies can actually deliver.
The question is, does your organisation have the candour, confidence and courage to buck the trend?

Ian Spencer is a founding partner of Clustre, The Solution Brokers.
Rainbird offers hybrid AI solutions that combine neural networks and large language models with symbolic reasoning. It delivers precision, determinism, and auditability for high-stakes decision-making.
Rainbird’s approach treats institutional knowledge as a first-class citizen in the tech stack. This enables organisations to achieve the benefits of AI without the catastrophic failure modes of most pure Gen AI implementations.
Proof-of-concept is simply not enough – organisations must build AI production systems that deliver genuine, measurable ROI. And this is Rainbird’s forte.
To learn more about Rainbird and for a private introduction to these AI thought-leaders, simply contact Robert Baldock: robert.baldock@clustre.net