Voice control is the future. I see it in science fiction movies and series. I see it when my colleagues ask Siri to find a song. I see it in presentations at tech shows. In recent years, voice control has taken a quantum leap and is handling a growing share of our daily communication with our technology. Even so, social norms and rules of conduct often set strict limits on how we use this technology, in spite of its obvious advantages. I will return to this shortly. First, some words about making technology obey your voice.
In 1952, Bell Laboratories introduced “Audrey”, a primitive computer that could understand single-digit numbers spoken by a human voice. In 1962, IBM presented “Shoebox”, which could understand no fewer than 16 words. During the 1970s, the US Department of Defense developed “Harpy”, which could understand around a thousand English words and had a search function. In the 1980s, voice-recognition technology snuck into the first commercial products, including the doll “Julie”, which could answer simple questions. In 1996, VAL was born – the forerunner of the voice-activated menus we meet today when calling customer service.
Voice control finally became decent around 2010, when it was connected to the data required to answer difficult questions. That year, Google added ‘personalised recognition’ to its Voice Search function (and in 2011 to the Chrome browser). In 2011, Siri was introduced on the iPhone 4S. Since then, voice recognition technology has evolved rapidly. The latest software can access almost unimaginable amounts of data, and the underlying algorithms are rapidly getting better at learning what we want. The most important step in the development of voice control was integrating the technology into smartphones.
Deliver us from screens
Technology can exist for several years with the potential to make the world a better place, yet still not quite get there. One example is genetic technology, which could revolutionise our health system if we all had our genomes sequenced. Another example is how we could all subscribe to free calls and data all over Europe if telecoms weren’t nationally based. When it comes to voice control, the technology’s popularity is hampered by strong social norms. One of the obvious advantages of voice control is that it can liberate us from the small screens that have locked our gaze, prevented eye contact, and ruined our night’s sleep since smartphones and tablets became ubiquitous. Even so, I find it hard not to cringe when someone loudly asks their phone to find the name of a pop tune, call their mother, or find the best Thai restaurant in the neighbourhood. My reaction has to do with the fact that I’m not a first mover in technology. Besides, I am acutely aware of how people behave in public spaces, and talking to your technology crosses many boundaries of what I find socially acceptable. I still read angry letters in the newspapers about other people’s loud phone conversations on crowded trains – just think how many angry letters would be written if everybody talked to Siri.
First movers in technology always run the risk of ridicule. Remember the poor geeks who thought they were taking a step into the future with Google Glass and Bluetooth headsets – they became known as Glassholes and Blue-douches, respectively.
So far, voice control is just a bonus in our interaction with technology, but an important breakthrough has been made with the introduction of Amazon Echo (also known as Alexa) and Google Now. Both offer hands-free home computers that control a ‘smart home’ and can also buy anything from food to new socks with a single voice command. Alexa and Now can also remember our appointments, recite recipes, play music and answer questions. Of course, this is not without its teething problems, and a recent news item tells how Alexa ordered dollhouses after overhearing a TV broadcast.
However, voice control will improve significantly in the near future with the continued development of deep learning (a tool to improve a system’s ability to decode voices, recognise images, analyse languages, etc.; ed.). At the same time, more of us are trying out voice control, and the more people who use voice interfaces, the more data is collected – and more data means better algorithms. One of the more conservative predictions about voice control is that before long, maybe within a decade, we will have artificial intelligence that, like the computers in Star Trek, listens in as a matter of course in order to satisfy our smallest wish 24 hours a day – maybe even before we realise what we want.