Knocking down the 10 biggest data myths | Clustre, The Innovation Brokers

Big data is one of the most touted and talked-about topics in technology. And the statistics are mind-blowing…

Over 90% of all data in the world has been created in the last 2 years.

2,000,000,000,000,000,000 bytes of data are now generated every day across all industries.

Every one of us generates 1.7MB of data per second… that’s 147GB per day!
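As a quick sanity check on that last figure, the conversion from megabytes per second to gigabytes per day can be worked through in a few lines. This is only a sketch: it assumes decimal units (1 GB = 1,000 MB), and the variable names are purely illustrative.

```python
# Sanity check: 1.7 MB per second expressed as GB per day.
# Assumption: decimal units, i.e. 1 GB = 1,000 MB.
MB_PER_SECOND = 1.7
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

mb_per_day = MB_PER_SECOND * SECONDS_PER_DAY
gb_per_day = mb_per_day / 1000

print(f"{gb_per_day:.0f} GB per day")
```

Under that assumption the result rounds to the 147GB quoted above.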

We’re not talking data lakes here; these are data oceans. And some organisations are putting this deep data to very good use. Netflix, for example, uses data insights to influence 80% of the content viewed by its subscribers. That prescience and precision has just helped Netflix to double its year-on-year profits.

But here’s the sobering reality. Netflix is the exception, not the rule. Recent research reveals that most organisations struggle to extract value from their data…

74% of companies still report deep challenges in adopting Big Data initiatives.
Most companies analyse only 12% of the data they possess. 88% is totally wasted.
Only 26% of companies have created a data-driven culture – 73% are failing to build an information-rich business.

Source: Sigma Computing – Top 20 Big Data Statistics

Scratch the surface and the golden prospects offered by Big Data soon turn to the base metal of empty promises. The alchemy simply isn’t working. But why are so many companies missing out? And what can we do to mend this broken data dream?

The answer, I firmly believe, is to be found in a series of myths that have grown up around big data. Ten big myths that I want to explore – and explode – in this article. So, let’s get that ball rolling…

Myth No.1: Our data is not clean enough

Well, it’s undeniably true that not all data is instantly useful. Indeed, one leading data expert claims that only 37% of data is ‘really useful’. But don’t be fooled by this modest statistic… the remaining 63% is certainly not worthless. It genuinely does have value. And this holds especially true when looking for patterns in human behaviour rather than specific numerical insights.

For example, researchers of the property market are clearly interested in price points across all geographies. But other statistics – such as the frequency with which people move home – are also of serious interest. When AI is used to crunch these stats, some very revealing insights begin to emerge about the buoyancy of housing, mortgage, home improvement and a host of interdependent markets.

The message is clear. Data doesn’t have to be clean and pristine to be both valid and valuable.

Myth No.2: We don’t have enough data

In a world that’s drowning in data, this myth is deeply puzzling. What precisely do people mean when they say: “we just don’t have enough data?”

Well, for some people, it’s a genuine expression of frustration. They simply cannot locate the data they think they need inside their data lakes. For others, it’s an honest belief that the data they do hold is simply too small and insignificant to be of any real merit.

Both feelings are perfectly understandable. Both are also entirely misguided. Let me explain why with a true story about small – very small – data…

Railway engines are very costly and complex assets. Typically, they are owned by finance companies who lease them to franchise operators.

Once an engine is ‘passed down the line’, owners lose sight of their multi-million-pound asset. They don’t know where, when or how frequently it’s operating. They don’t know whether maintenance programmes are being honoured or whether their depreciation models are even remotely accurate. In short, they know next to nothing. And that’s why one of Clustre’s expert teams was commissioned to track down the answers.

The full and fascinating account is detailed in our new ‘Big Data Point of View’ (free to download at the end of this article). But to cut a longer story short, this team used the latest machine-learning techniques to deliver outstanding insights from truly microscopic datasets…

Every five minutes, a train’s fuel-tank sensors and GPS ‘ping’ fuel level and train location to a central data store. Once this data is merged with timetable data available from Network Rail, a very credible and revealing picture of train usage emerges. The team were able to deduce stationary periods for the train to be recommissioned or refuelled… running periods between stops… the frequency of stations… the distance between these stations… and even the probable route and type of service.
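To give a flavour of how much a handful of five-minute pings can reveal, here is a minimal sketch. It is not the team’s machine-learning approach: it swaps in a simple rule-based heuristic, leaves out the timetable merge entirely, and every name, threshold and data point in it (Ping, summarise, still_km, refuel_litres, the sample readings) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Ping:
    """One sensor reading (hypothetical schema, not the real feed format)."""
    time: datetime
    lat: float
    lon: float
    fuel_litres: float


def distance_km(a: Ping, b: Ping) -> float:
    """Great-circle (haversine) distance between two pings in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def summarise(pings, still_km=0.1, refuel_litres=50.0):
    """Return (stationary_periods, refuel_times) from time-ordered pings.

    still_km and refuel_litres are illustrative thresholds, not measured values.
    """
    stationary, refuels = [], []
    start = None  # start time of the current stationary run, if any
    for prev, cur in zip(pings, pings[1:]):
        if distance_km(prev, cur) < still_km:
            if start is None:
                start = prev.time
        elif start is not None:
            # The train moved, so the stationary run ended at the previous ping.
            stationary.append((start, prev.time))
            start = None
        # A large rise in fuel level between pings suggests a refuelling stop.
        if cur.fuel_litres - prev.fuel_litres > refuel_litres:
            refuels.append(cur.time)
    if start is not None:
        stationary.append((start, pings[-1].time))
    return stationary, refuels


# Made-up sample data: six pings, five minutes apart.
t0 = datetime(2020, 1, 1, 6, 0)
step = timedelta(minutes=5)
pings = [
    Ping(t0 + 0 * step, 51.5300, -0.1230, 900.0),   # standing still
    Ping(t0 + 1 * step, 51.5300, -0.1230, 900.0),
    Ping(t0 + 2 * step, 51.5800, -0.1000, 880.0),   # running
    Ping(t0 + 3 * step, 51.6400, -0.0800, 860.0),
    Ping(t0 + 4 * step, 51.6400, -0.0800, 1400.0),  # standing still, fuel jumps
    Ping(t0 + 5 * step, 51.6400, -0.0800, 1400.0),
]
stationary, refuels = summarise(pings)
print(stationary)
print(refuels)
```

Even this crude version separates standing time from running time and flags a likely refuelling stop. Matching the stops against a timetable is the step that would turn such fragments into routes and service types.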

Five years ago, the dearth of data would have derailed all attempts to find the truth. Working with only a small amount of data per engine, our team generated insights that would be impossible to establish manually. Even on microscopic datasets, machine-learning techniques outstrip and outperform anything that humans can deliver. Small data really does matter.

Myth No.3: The only data we can trust is our own

Can I trust my data? This nagging doubt is not confined to business data alone. Let me explain…

As a young Physics student, I became worried that one of my experiments could have been compromised by poor data or a wrong methodology. When I shared this concern with my lecturer, he offered me a valuable nugget of advice that has been my touchstone ever since…

“If you ever doubt the integrity of your experiment, look for a totally different method and data source to verify it. If the new approach and dataset delivers the same – or close to the same – results, there’s a bloody good chance your first answer was right.”

Wise counsel that can be equally applied to data.
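In data terms, that advice boils down to deriving the same figure from two independent sources and checking that the answers land close together. A minimal sketch follows; the function name, the 5% tolerance and the sample counts are all assumptions made up for illustration.

```python
from math import isclose


def estimates_agree(first: float, second: float, rel_tol: float = 0.05) -> bool:
    """True if two independently derived figures agree within rel_tol (5% by default)."""
    return isclose(first, second, rel_tol=rel_tol)


# Made-up figures: the same monthly customer count taken from an
# in-house export and from an independent external feed.
internal_count = 10_400
external_count = 10_150

print(estimates_agree(internal_count, external_count))
```

How tight the tolerance should be is a judgment call that depends on what the figure will be used for.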

Myth No.4: Data is too difficult to get hold of

Considering the insane amount of data that we all generate, this is another seemingly incomprehensible myth… but it’s real. The next case study explains why…

A major global bank recently approached one of our Clustre firms with an innovation request:

‘We need a robust approach to experimentation. We need to develop a process for taking a problem or challenge to a working prototype for a pressing business case in just four weeks.’

Our team rose to the challenge. Within a month, they had delivered the approach, trained an experimentation team within the bank and were casting around for a suitable data trial. And that’s when they hit a brick wall. It would take the bank four months to find the test data!

This is not abnormal. But it doesn’t have to be a terminal, game-over obstacle. Our team’s answer was to create a sandbox in the cloud that they filled with anonymised data. Bypassing formal channels, data could then be easily and swiftly accessed. Within four weeks they had delivered a proven process for sandbox experimentation in the cloud.
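The article does not record how the sandbox data was anonymised, so the following is only one plausible sketch of the idea: drop direct identifiers and replace the customer ID with a keyed hash before a record enters the sandbox. The field names, the key and the pseudonymise function are all hypothetical. Strictly speaking this is pseudonymisation, which is weaker than full anonymisation.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored outside the sandbox.
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical names of direct-identifier fields to drop entirely.
DROP_FIELDS = {"name", "email"}


def pseudonymise(record: dict, id_field: str = "customer_id") -> dict:
    """Return a copy with direct identifiers removed and the ID replaced by a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    digest = hmac.new(SECRET_KEY, str(record[id_field]).encode(), hashlib.sha256)
    # A shortened hex digest keeps records joinable without exposing the real ID.
    cleaned[id_field] = digest.hexdigest()[:16]
    return cleaned


# Made-up record for illustration.
row = {"customer_id": 1042, "name": "A. Example", "email": "a@example.com", "balance": 2500.0}
safe_row = pseudonymise(row)
print(safe_row)
```

Because the hash is keyed and deterministic, the same customer maps to the same token across tables, so joins still work inside the sandbox.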

Myth No.5: The data’s old and out of date

It always is!

Almost by definition, data is out of date from the moment it is created, but that doesn’t tarnish or devalue it. Clearly, data such as current addresses, employment and medical records needs to be checked meticulously for contemporary accuracy. However, most data is not time sensitive. For example, a significant amount of data is now used to map consumer behaviours and plot market trends such as: how often people move home… when and where they are most likely to move… what sort of home they are likely to buy, etc.

These are all critical consumer insights, but they are drawn from data that is often weeks, months or even years old.

Myth No.6: We can only use structured data

Any company that clings to this myth is short-changing itself. And there’s one simple, irrefutable truth behind this statement… 80% of the data we now generate is unstructured.

If you are serious about data, you simply have to accept and embrace the fact that most of it is unstructured. With such a mass of rich data, can you really afford to miss out?

(As an interesting footnote, there is a great case study about unstructured data and how to exploit it in our new Big Data Point-of-View. Download your free copy at the end of this article.)

Myth No.7: The data is historical, so what’s the point?

This may appear to be returning to a previous myth, but there is a very valid reason. I want to share a story about the ubiquitous power of AI and historic data to enrich our lives. Imagine this scenario…

A man returns home from work and is met at the door by his wife. She throws her arms around him in a genuine rush of affection and announces that she is preparing his favourite ‘surf and turf’ dinner.

The husband is both delighted and mystified. That very morning, he had left home after a rare door-slamming argument with his wife. So, he had not been expecting the warmest of welcomes.

He wanders into the lounge and there, dominating the centre of the room, is the most beautiful display of flowers. Beside the lavish arrangement is a greetings card bearing the inscription: ‘I love you’.

His confusion is now tinged with a new edge of anxiety. The man closes the door and whispers: “Alexa, what the hell is happening here?” Alexa’s reply is revealing…

The sound of the earlier heated exchange was picked up by Alexa. After the only previous altercation, the husband had instructed Alexa to select a florist and to deliver a bouquet together with a simple but affectionate message. So, Alexa seized the initiative and repeated the instructions… checking the wife’s diary first to ensure that the flowers were delivered at precisely the right moment.

This charming story is based upon authentic events. AI-empowered lifestyle devices are proactively exploiting data to enrich our lives. Humans are creatures of habit, so machines are programmed to interpret received intelligence, interrogate historical archives and anticipate our future needs. This is not some fanciful vision of tomorrow’s world – it’s happening, here and now.

Myth No.8: We have so much data, we don’t know where to start

After the myth about insufficient data, we now shift to the polar extreme: an excess of data.

Some companies really are confused by the abundance of information that is now available, hence the phrase “we are drowning in data”. But instead of looking at their data and wondering how they might use it, they should be looking at their customers and asking themselves “how can we serve them better?”

first direct has been voted the top company in the UK for delivering high quality customer service (The Institute of Customer Service). It is a well-deserved reward for a brand that has consistently broken the mould in banking.

But success is no lucky accident. This bank has a very clear idea of where it’s going and how it will get there. And it’s all predicated on a very radical vision: ‘People don’t want banks – they want a financial lifestyle service built around their individual needs. A service that removes friction and reduces money stress’.

While most banks tinker around the edges by offering better interest rates or cheaper mortgages, first direct has confronted the big issue head-on…

People want a stress-free financial life. To deliver this vision, the bank is digitising its business to drive game-changing levels of service. It is crunching acres of data to better understand customer pain-points and so anticipate their true needs. But make no mistake, without a clear focus on customer service, first direct would be just another bank struggling – and failing – to understand its customers.

Myth No.9: We are already fully utilising our data

Congratulations – you are a member of a very exclusive and elite global club. I look upon Google, Facebook, Amazon, eBay and Netflix (who use data to save $1 billion a year on customer retention) as five of the greatest data-exploiting exemplars.

But that’s about as far as it goes. Because beyond this quintet there’s a huge gulf in data optimisation – let me explain what makes them so very special…

Complacency plays no part in their thinking. These prime movers are never content with their performance. They are constantly striving for improvement. And this is a lesson that all senior managers should take to heart…

Anyone who believes they have ‘cracked’ the challenge and are fully utilising their data resources is dangerously delusional. You will never encounter this attitude among the five leading exemplars and that is why they remain at the top of their game.

Myth No.10: Our data is not having any effect

This is the ultimate delusional myth. When all else fails, blame the data. It is shamefully misguided…

Data will have a profound impact on your business – but only if three factors converge at precisely the right moment:

Use the right data
At the right time
And present it in the right way.

Here is a tale of two very different outcomes…

Ignaz Semmelweis was a pioneering 19th-century Hungarian doctor and scientist. In 1847, while working at the Vienna General Hospital, he noticed that the incidence of ‘Childbed Fever’ – a frequently fatal sickness – was three times higher in the maternity wards looked after by junior doctors than by midwives.

When a colleague was accidentally cut during a dissection and died from the minor injury, Ignaz wondered whether the junior doctors might also be carrying infection from cadavers into the maternity ward. The simple remedy of persuading doctors to wash their hands in a chlorinated lime solution before entering the ward had an instant and dramatic impact. Cases of infection plummeted and, by rights, Ignaz Semmelweis should have been hailed as a hero. Unfortunately, in an age that still practised purging and bloodletting as principal cure-alls, his findings were widely dismissed as ‘spurious’ aberrations…

Stung by such criticism, he angrily attacked the short-sightedness of his fellow professionals. He published a book full of complex tables that served only to confuse and antagonise the wider medical community. Ignaz was fighting a losing battle. He had all the right data at exactly the right time – but his data presentation was woefully inept. Tragically, the stress affected his health. He suffered a mental breakdown, was consigned to an asylum and died from sepsis-infected wounds inflicted by his guards.

Now contrast those events with a very different experience…

Florence Nightingale also fought a bitter battle with the medical establishment. She, too, was confronted by self-exonerating physicians who refused to accept any responsibility for the appalling contamination and fatality rates in field hospitals. Many more soldiers died from disease and negligence than from their battlefield injuries.

On her own initiative and using her own funds, Florence and her nursing team waged war on the real killers: typhoid, cholera, dysentery, malaria, gangrene and frostbite. She introduced triaging techniques and new hygiene regimes – tirelessly bathing the injured, laundering linen and feeding the sick with nourishing meals. And just as importantly, she kept meticulous records.

In 1857, her report to Parliament exposed the systemic failings of army bureaucrats. Most telling of all, Florence used simple ‘petal’ and ‘coxcomb’ graphics to illustrate the squalid truth. This shocking evidence prompted a radical reform of the British army’s medical treatment of soldiers. The right data. The right time. The right presentation. The right result.

The Strike: My top 3 recommendations

Well, that is my list of Top 10 data myths. Let me close with three simple pieces of advice that have shaped my thinking on data…

Look up and not down. Don’t look down at your data lake and wonder what to do with it. ‘Stare at the sky’ and seek inspiration for the infinite possibilities.
Don’t be constrained by what you have. You will be amazed by how much more you can make of your own data. And remember, there’s a wealth of data to be tapped from outside your organisation.
Focus on individual user needs. Don’t try to boil the ocean. Build pipelines from source to user. And make sure the data is displayed in an actionable format.

I hope you have found this article useful and interesting. For deeper insights into data, simply download our free and extensively revised ‘Big & Small Data Point-of-View’.