Is voice becoming the new text (again)?
On a recent episode of the TV show “Modern Family,” a character named Mitchell gets in his car and does something that’s frustratingly familiar for early adopters of technology:
He tries to operate the machine by talking to it.
“CD player: next track,” he says.
“Say a command,” a robotic voice responds.
“CD player: NEXT. TRACK,” he says, clearly annoyed.
“Air conditioner on,” the car responds.
The idea that people should be able to talk to computers, and that the computers should understand what we’re saying, has been coming in and out of vogue since the 1970s. The technology never really went mainstream, though, and to this day, it’s often talked about as a joke.
In recent months, however — despite the pop-culture parodies and the increasing popularity of the text message — researchers say voice-activated technologies have entered a renaissance of sorts.
The technological resurgence is happening in part because of smartphones, those handheld devices with tiny keyboards or awkward touchscreens that some big-fingered adults would rather yell at than type on.
So why not channel those frustrations into spoken navigation commands and text messages?
Increasingly, that seems to be what’s happening.
Mobile voice-recognition technology now allows people to send text messages to friends by talking instead of typing; to scan through transcriptions of voice mail instead of taking time to listen to them all; to tell their phones what they’re looking for on the Web; and, soon, to post to Twitter from their cars by speaking, allowing drivers to keep their eyes on the road.
“It’s now possible to pick up your phone and press a single button and say, ‘I want the Yelp.com review of the Capital Grille in Burlington, Massachusetts. Period,’ ” said Vlad Sejnoha, chief speech scientist at Nuance Communications, a major producer of voice-to-text software.
The phone should then know exactly which Web link to find, he said, and the user should get a result without ever typing.
Additionally, Bing and Google both have mobile applications that let people search the Web by talking.
The voice-recognition software itself is getting better, too.
The longer computers listen to us talk, the better they can predict what we’re going to say and understand how we say things, researchers said. Some believe that computers are getting almost as good at listening as we are.
“If you compare us to human performance, we are rapidly closing the gap,” said David Nahamoo, IBM’s chief technology officer for voice research.
The technology works by listening to a voice, translating it into digital data and then anticipating what sorts of sounds or words will come next. That’s different from early models of voice-recognition technology, which tried to understand every sound and used huge amounts of computing power as a result, he said.
Now, it’s more of a guessing game. Each voice-recognition program has a number of equations that analyze speech and use statistics to decide what noises match up to what letters.
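The “guessing game” the article describes is, at its core, statistical prediction: given what has been heard so far, the recognizer ranks which words are most likely to come next. A minimal sketch of the idea, using a toy bigram model trained on a few made-up voice commands (all data here is illustrative, not from any real product):

```python
from collections import defaultdict

# Toy training corpus of spoken commands (illustrative only).
corpus = [
    "call home",
    "call the office",
    "text the office",
    "call home now",
]

# Count bigrams: how often each word follows another.
bigram_counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

def predict_next(prev_word):
    """Return candidate next words, ranked by estimated probability."""
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return sorted(
        ((word, count / total) for word, count in counts.items()),
        key=lambda pair: -pair[1],
    )

print(predict_next("call"))  # "home" is the most likely continuation
```

A real recognizer combines a prediction like this with acoustic scores for the sounds it heard, but the principle is the same: bet on the statistically likely continuation instead of analyzing every sound from scratch.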
Every year, the accuracy of these programs improves, said Bill Meisel, an independent consultant who has been working in the voice-recognition industry since the early 1980s.
In a recent comparison test of four programs, Meisel found that technologies that translate voice into text are roughly 80 to 90 percent accurate. That’s good enough for many common functions, like transcribing voice mail, he said.
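Accuracy figures like these are typically computed as one minus the word error rate: the number of substituted, inserted, and deleted words in the transcript, divided by the length of the correct reference. A small sketch of that calculation (the example sentences are invented, not from Meisel’s test):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the
    number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word out of five -> 20% error, i.e. 80% accuracy.
print(word_error_rate("please call me back soon",
                      "please call me back tomorrow"))
```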
“All the systems were almost perfect with phone numbers,” he said.
Still, a number of technological hurdles remain.
One, especially for voice recognition on the go, is background noise. A phone listening to a person on a bus, for example, can hear street noise and other conversations in addition to the person who is trying to give a voice command. It’s difficult for voice-recognition software to differentiate between all of those noises.
New hardware may help address that issue. Google’s Nexus One phone comes fitted with two microphones: one that records a voice and another that records interference noise and then subtracts it from the voice file, making it easier for the phone to determine what noise is human and what isn’t.
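The subtraction step can be pictured with a few lines of code. This is a deliberately simplified sketch: real handsets use adaptive filtering rather than raw sample-by-sample subtraction, and the signal values below are invented for illustration.

```python
def cancel_noise(voice_mic, noise_mic):
    """Subtract the noise-reference signal, sample by sample.
    (Real systems use adaptive filters; this is the bare idea.)"""
    return [v - n for v, n in zip(voice_mic, noise_mic)]

speech = [0.0, 0.5, 1.0, 0.5, 0.0]    # the talker (illustrative samples)
noise  = [0.1, -0.2, 0.1, -0.2, 0.1]  # street noise picked up by the second mic

voice_mic = [s + n for s, n in zip(speech, noise)]  # primary mic hears both
cleaned = cancel_noise(voice_mic, noise)

print(cleaned)  # recovers the original speech samples
```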
Another problem is that no two people speak alike.
Even if we’re saying the same words, we tend to pronounce them different ways. And, often, even if we’re asked to say the same sentence twice, we might add different inflections or sounds that can throw computers off.
It’s “the whole thing of ‘I say toMAYto and you say toMAHto,’ ” said Nahamoo, who is Iranian. “I come from a foreign country, and some of the phonetic nuances that a native person learns, I don’t learn and I can’t reproduce.
“They all add up essentially to make me sound different.”
Over time, computers are getting better at recognizing those differences, he said, especially when an accent is fairly common. He said that is one of the major achievements of voice technology since the ’70s.
To be understood by computers, it’s more important to speak clearly and consistently than to have a perfectly neutral accent, he said.
Another issue: Not all phones have the computing power to handle voice recognition, said Tuong Nguyen, a principal analyst at Gartner, the research firm.
“The biggest limitation that I see right now … is processing power,” he said. “It is fairly intense, so you do need a better, higher-end phone to do it. And then a lot of people speak with accents or colloquialisms or different languages or stuff like that, which provides some challenges as well.
“But overall, I’m pretty positive about the technology.”
Nguyen said it’s especially handy when he’s driving. In that situation, typing isn’t a safe alternative.
Meisel, the consultant, said voice may be the new way we interact with computers.
We’re already able to “have a conversation” with the technology to some degree, he said.