The moment I’ve been waiting for has finally arrived – I can talk to my computer!
No-one is born with a keyboard or smartphone in their hands. But that obvious truth hasn’t stopped these two accessories becoming as indispensable as breathing. Before today’s new-born baby reaches its second birthday, it’s virtually certain to be using one or both of these devices. What’s more, it will probably do so instinctively without need of any guidance or user tutorials.
But is this progress or natural evolution?
Voice-driven IT will be more than a game-changer – it will be a revolution
I’ve been working with technology since 1969 (I was a teenage programmer in the very early days of UK computing). Although I’ve seen massive changes, IT has not yet developed into an intuitively natural act. Despite such game-changing developments as touchscreens, we still end up using a screen-based or physical keyboard. But this is all about to change. We are on the verge of voice-driven IT and, believe me, this will be more than a game-changer – it will be a revolution. And who is leading this technology coup? Amazingly, it’s everyone’s ‘favourite book retailer’ – Amazon!
Last year, Amazon launched Alexa in two visible manifestations: Echo and Echo Dot. The only differences between the two variants are the price and the size of the device/speaker.
Functionally, they are identical. Both provide access to a voice-driven interface called Alexa – a genuinely clever piece of software that runs on Amazon Web Services (AWS).
Alexa can recognise the voice of almost everyone who talks to her. Amazon’s ‘Evangelist’ for Alexa and Echo tells me that Alexa is currently achieving 95% accuracy (i.e. she correctly transcribes words 95% of the time). Now that’s an extraordinary statistic – but, for Amazon, it’s simply not good enough. By 2020, the company is confident that Automatic Speech Recognition (ASR) improvements will have boosted that figure to 99%. And this seemingly small uplift of four percentage points will make a greater difference than most people imagine – it cuts the error rate from 5% to 1%, a fivefold reduction… it will be a game-changer.
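To see why those four points matter so much, look at the error rate rather than the accuracy. A quick back-of-the-envelope calculation (the 20-word sentence length and the assumption that word errors are independent are my own simplifications, not Amazon’s figures):

```python
# Accuracy vs. error rate: a 95% -> 99% jump means five times fewer errors.
for accuracy in (0.95, 0.99):
    error_rate = 1 - accuracy
    # Chance of transcribing a 20-word utterance with no errors at all,
    # assuming (simplistically) that word errors are independent.
    clean_sentence = accuracy ** 20
    print(f"accuracy {accuracy:.0%}: error rate {error_rate:.0%}, "
          f"error-free 20-word sentence {clean_sentence:.1%}")
```

On those toy assumptions, a 95%-accurate system transcribes only about a third of 20-word sentences perfectly, while a 99%-accurate one gets more than four-fifths of them right – which is the difference between an assistant you correct constantly and one you barely notice.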
Understanding the spoken word is the really big challenge – cue NLP
But ASR is actually the easy part. By far the hardest challenge is to understand what the words actually mean. This is the area of Natural Language Processing (NLP) that is now focusing most of Amazon’s energies and resources.
By creating more endpoints, more users and more feedback loops, the NLP system can learn appreciably quicker. Alexa listens to the user’s speech… converts it into text via ASR… understands the message’s meaning and sentiment with NLP… and then selects the relevant application to generate a response, which is finally converted back into speech. And it does all of this in under half a second!
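The loop described above can be sketched in a few lines of Python. Everything here is a placeholder of my own invention – real ASR, NLP and skill selection are vastly more complex and run on AWS – but it shows the shape of the listen–transcribe–understand–respond cycle and its half-second budget:

```python
import time

# Illustrative stand-ins only -- these are not Amazon's actual APIs.
def asr(audio: bytes) -> str:
    return "what is the weather today"          # speech -> text

def nlp_intent(text: str) -> str:
    return "GetWeather"                         # text -> intent/sentiment

def run_skill(intent: str) -> str:
    return "It is sunny and 21 degrees."        # intent -> reply text

def tts(text: str) -> bytes:
    return text.encode("utf-8")                 # reply text -> speech audio

def handle_utterance(audio: bytes) -> bytes:
    start = time.perf_counter()
    reply = tts(run_skill(nlp_intent(asr(audio))))
    elapsed = time.perf_counter() - start
    assert elapsed < 0.5   # the end-to-end budget quoted above
    return reply
```

The interesting design point is that each stage hands a progressively more abstract representation to the next – raw audio, then text, then an intent – which is what lets the heavy lifting be farmed out to cloud-scale services.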
This is hugely impressive because, as many readers will know, NLP normally chews processing power. You would expect this insatiable appetite to seriously impact response speeds. But it doesn’t, because Amazon has drawn upon the vast latent potential of AWS. That’s why my Echo is able to recognise and respond to anything I say in fractions of a second (mind you, Alexa isn’t infallible and she does struggle to answer some questions).
But it’s not just Amazon that’s showing us the power of voice recognition. Some of you may be using a package called ‘Dragon NaturallySpeaking’ to help you transition from keyed text to voice…
I first tried Dragon software a few years ago and, to be brutally honest, I wasn’t impressed. The voice recognition was clunky and patchy. I was constantly having to check and correct the text – a chore that took longer than keying it in from scratch. So, in frustration, I stopped using it. However, spurred on by the Alexa experience, I decided to dig it out, dust it down and give it one last chance. And guess what? It was transformed. It is now so damned good that I use it pretty well all the time for creating emails and Word documents!
So, on the strength of that experience, what does the future hold?
Well, only last week, I was told about a lift system that uses the sound made by the door opening and closing to predict – very precisely – when an elevator requires maintenance. Now that is a really clever use of AI and sound recognition.
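That lift anecdote boils down to acoustic condition monitoring: record the sound of a door cycle, extract a feature from it, and flag any drift from a known-healthy baseline. A deliberately toy sketch – the RMS loudness feature, the baseline and the 25% tolerance are all my own inventions, and a real system would use far richer acoustic models:

```python
import math

def rms(samples: list) -> float:
    # Root-mean-square level of an audio clip (a list of float samples).
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def door_needs_service(clip: list, baseline_rms: float,
                       tolerance: float = 0.25) -> bool:
    # Flag the door mechanism when its sound level drifts more than
    # `tolerance` (here 25%) away from the known-healthy baseline.
    return abs(rms(clip) - baseline_rms) / baseline_rms > tolerance
```

A door that has started grinding twice as loudly as its healthy baseline would be flagged for maintenance long before a human inspector noticed anything.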
Also, for those killjoy critics who bleat and tweet that Alexa only works in the home, there’s some very bad news… the brazen hussy is putting herself about! Three global car companies have already signed up to use Alexa as their default in-car ASR system. Amazon is deliberately letting companies have free access to Alexa – allowing them to use it as the default front end to their systems, or to develop skills which can then be accessed through Alexa. It’s a clever strategy – one that’s calculated to establish Alexa as the world’s pre-eminent ASR.
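For readers wondering what one of these ‘skills’ actually looks like under the bonnet: the backend is typically just an AWS Lambda function that returns a response envelope in the Alexa Skills Kit JSON format. A minimal sketch (the greeting text is, of course, my own invention):

```python
def lambda_handler(event, context):
    # Minimal backend for a custom Alexa skill: whatever the user asks,
    # reply with a fixed greeting. The envelope follows the Alexa Skills
    # Kit response format (version, outputSpeech, shouldEndSession).
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "Hello from my first skill!",
            },
            "shouldEndSession": True,
        },
    }
```

The text in `outputSpeech` is what Alexa speaks back to the user – which is precisely why handing this format to third-party developers is such a shrewd land-grab: every new skill makes Alexa more useful at zero cost to Amazon.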
I confidently predict that before very long you will be accessing Alexa on your Android smartphone (Amazon use Android). But this raises an intriguing question: where does that leave Google and, especially, Apple?
I imagine Google must feel pretty miffed (and that’s putting it mildly) that they have been caught napping by a ‘jumped-up book retailer’. But the omens for Apple look no better. Siri is monogamously wedded to the iPhone and tightly micro-managed by Apple. By contrast, Alexa is flirtatious and fancy-free – keen to build relationships with anyone who is serious about ASR. And, in the end, that could be Apple’s undoing. Apple are trapped in their own micro-climate, and Amazon may well rain on their parade!
However, for the rest of us, the future looks awesome. We are on the cusp of true voice-driven interactions with our IT. No more keyboards! Oh joy. Bring it on.
PS This article was written entirely – and totally free of errors – using the Dragon voice recognition system. Proof that it really does work.
PPS If you’re watching any Russian TV dramas – such as the BBC’s ‘War and Peace’ – do remember to switch off Alexa. The constant refrain – “I don’t understand your question” – does become slightly tedious!