The future is voice control

iPhone users are already speaking to their mobiles via 'Siri'. Voice technology is progressing fast, with car manufacturers taking it on board, writes DAVIN O'DWYER

FROM THE malevolent HAL in 2001: A Space Odyssey to the helpful Kitt in Knight Rider, the computers and robots of our dreams for the future have one thing in common – they talk to us and, almost more importantly, they listen. That vision of holding conversations with sentient computers might seem like the preserve of science fiction but, amazingly, it’s a future that is already showing signs of having arrived.

At Nuance, the leaders in voice-recognition technology, they call it the “Siri effect” – people are asking questions of their iPhones in their millions, the first sign that voice-recognition technology is ready to go mainstream.

The firm has just opened its international headquarters in Dublin, creating more than 40 jobs and, judging from the technologies that Nuance is working on, this is a company poised to transform how we interact with technology – it’s not going to stop at smartphones, not by a long shot.

One of the most obvious applications for this advanced interaction paradigm is the car. At Apple’s Worldwide Developers Conference last week, the Cupertino tech giant announced the “Eyes Free” version of its voice-activated personal assistant, which will make it ideal for implementation in vehicles – BMW, General Motors, Mercedes, Land Rover, Jaguar, Audi, Toyota, Chrysler and Honda were said to be preparing to adopt the technology with an activation button on the steering wheel. Eyes stay on the road and hands on the wheel, rather than on the touch-screen device.

One notable absentee from the list of early partners with Apple’s “Eyes Free” scheme was Ford, which has been pioneering its own similar technology for a few years now. Recently, Ford and Nuance gave a display of some of the technologies they are co-operating on at Ford’s European research centre in Aachen in western Germany.

The event demonstrated the degree to which the automobile industry is betting on a voice-controlled future – the ability to choose music, get directions and reply to text messages, all things that people are used to doing with frictionless ease these days, becomes considerably safer when conducted via voice control.

Ford introduced its Ford Sync technology in its US models back in 2007, but will be introducing the integrated communications and entertainment system in European models only later this year. Built by Microsoft and powered by Nuance’s voice-recognition technology, it allows for an impressively futuristic level of engagement with the car. It’s not quite Kitt, obviously, but it illustrates the simple power, and promise, of voice control.

“Mobility has changed from the Model T to the iPhone,” says Pim van der Jagt, managing director of the Aachen research centre. Solving the problem of how to integrate the mobile communications technology of today and tomorrow with transport technology is a key challenge for the big car makers.

Ford Sync uses Bluetooth to communicate with iPhones, Blackberries and Android smartphones, allowing for calls to be initiated by spoken command and text messages to be dictated.

Van der Jagt is frank about Ford’s recent woes, acknowledging that product quality was a key reason for its financial problems. With that in mind, focusing on improving the driving experience through the application of bleeding-edge technology is a cornerstone of its continued recovery, he says.

Of course, it’s not easy for a car maker to become a tech company, and the latest generation of Ford’s in-car communication and control system, featuring an 8in touchscreen, and dubbed MyFord Touch, has been beset by reliability and usability problems.

The difficulties with MyFord Touch illustrate the challenge all companies face as they move towards a software-dependent future. Still, Ford Sync has so far been deployed on four million vehicles in the US, and the company is aiming for 13 million Sync customers worldwide by 2015, including 3.5 million in Europe.

Ford is also pitching plenty of ideas about how imminent technology might radically improve the driving experience, only some of which rely on voice recognition.

According to van der Jagt, Ford and other car manufacturers are co-operating on a secure communication protocol between cars that will allow for real-time traffic and safety information to be transmitted between vehicles, potentially vastly improving road safety. (A vaguely similar technology featured in an aspirational, and overly optimistic, Ford promotional video all the way back in 1966.)

The realistic timeline for the implementation of those ambitious plans is many years in the future, but voice control is a more immediate addition to our driving environment – the problems posed by parsing human speech patterns, eliminating background noise and understanding various accents are hugely challenging, but the solutions have improved dramatically in recent years.

“Voice technology has matured beyond simply recognising what has been said, to now include natural language processing that understands what we mean, to access content and achieve specific outcomes,” says Stefan Ortmanns, a senior vice-president of mobile engineering at Nuance.

Ortmanns suggests that the technological limitations that for so long restricted voice recognition to rudimentary functions such as answer-machine services are being quickly overcome – the Siri effect is evidence enough of that.

Massachusetts-based Nuance has maintained a relatively low profile for a company that will help shape our technological future.

Born out of numerous mergers and acquisitions – more than 40 at the last count – its patent portfolio makes it the undisputed leader in the voice-control industry.

It powers the voice-recognition services of a huge range of companies, from healthcare dictation firms to airlines and smartphone manufacturers – and while it refuses to admit it openly, given Apple’s penchant for secrecy, Nuance also powers Siri, the iPhone’s personal assistant software that will soon be the voice of a fleet of vehicles from other companies.

Its own Dragon Dictate software is the market leader in dictation for good reason – it is uncannily accurate at understanding a range of accents.

At the event in Aachen, Dragon Dictate didn’t hesitate even when faced with heavily accented English. Its smartphone microphone app even lets Dragon Dictate users speak into their phone and see their words appear, as if by magic, on their computers.

But it’s not just stenographers who should be worried about their future – Nuance has also incorporated voice controls into TV sets and coffee makers. While the benefit of asking your snazzy Jura Impressa Z7 One Touch Voice coffee machine to churn out a frothy cappuccino is somewhat compromised by having to position a cup underneath the nozzle and press a button anyway – baristas can breathe a sigh of relief – the potential for voice-controlled television is huge.

A Nuance representative demonstrated the power already available in its Dragon TV software – sitting on a couch a few feet from his TV set, he asked for CNN. The TV obliged, changing to the news network.

“Dragon TV, what Harrison Ford films are on tonight?” he asked. No Indiana Jones movies on, unfortunately, but a verbal request to record The Devil’s Own to the DVR was quickly confirmed.

Even from this relatively simple, albeit impressive, display of voice recognition, it’s possible to imagine a future where we instruct our homes to adjust the temperature and dim the lights, ask our computers to find a recipe and read it aloud, initiate phone calls and dictate messages hands-free and, yes, have conversations with our cars.

Where is all this voice technology leading? “Towards artificial intelligence,” Ortmanns suggests, matter-of-factly. The challenges to be met before then are huge – it will be many years before our speech is understood perfectly, never mind the development of artificial intelligence. But we are facing the reality that the ability to talk, for so long a distinctly human quality, might not be restricted to our species for much longer. Say hello to the future, and the future is likely to answer back.