{"82151":{"#nid":"82151","#data":{"type":"news","title":"Communicating with Machines: What the Next Generation of Speech Recognizers Will Be Able to Do","body":[{"value":"\u003Cp\u003EWhen the motion picture \u00222001: A Space Odyssey\u0022 opened in 1968, that conversation between a stranded astronaut and a malevolent computer named HAL seemed plausible for the year 2001 -- then more than three decades in the future. \n\u003C\/p\u003E\n\u003Cp\u003EBut as any user of today\u0027s automatic speech recognition technology can attest, that future hasn\u0027t quite arrived yet. \n\u003C\/p\u003E\n\u003Cp\u003EAs a scientist at AT\u0026amp;T Bell Labs, B.H. \u0022Fred\u0022 Juang helped create the current generation of speech recognition technology that routinely handles \u0022operator-assisted\u0022 calls and a host of other simple tasks, including accessing credit card information. Proud of that pioneering work, Juang today is working to help create the next generation of speech technology -- one that would facilitate natural communication between humans and machines.\n\u003C\/p\u003E\n\u003Cp\u003ENow a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology, Juang presented his vision of next-generation speech systems Saturday, February 14 at the annual meeting of the American Association for the Advancement of Science (AAAS). \n\u003C\/p\u003E\n\u003Cp\u003E\u0022If we want to communicate with a machine as we would with a human, the basic assumptions underlying today\u0027s automated speech recognition systems are wrong,\u0022 he said. \u0022To have real human-machine communication, the machine must be able to detect the intention of the speaker by compiling all the linguistic cues in the acoustic wave. 
That\u0027s much more difficult than what the existing technology was designed to do: convert speech to text.\u0022\n\u003C\/p\u003E\n\u003Cp\u003ETo make the speech recognition problem solvable in the 1970s, researchers made certain assumptions. For instance, they assumed that all the sounds coming to the recognizer would be human speech -- from just one speaker. They also assumed the output would be text, and that recognizer algorithms could acceptably match speech signals to the \u0022closest\u0022 word in a stored database. \n\u003C\/p\u003E\n\u003Cp\u003EBut in the real world, human speech mixes with noise -- which may include the speech of another person. Speaking pace varies, and people group words in unpredictable ways while peppering their conversations with \u0022ums\u0022 and \u0022ahs.\u0022\n\u003C\/p\u003E\n\u003Cp\u003ESpeech researchers chose statistical models known as Hidden Markov Models to match sounds to words and place them into grammatical outlines. That approach has performed well for simple tasks, but often produces errors that make the result of speech-to-text conversion difficult for humans to understand -- and even worse for natural human-machine communication.\n\u003C\/p\u003E\n\u003Cp\u003E\u0022It doesn\u0027t matter what you give the system, it just picks the closest sounding word and gives that to you as text,\u0022 explained Juang, who holds the Motorola Foundation Chair at Georgia Tech and is a Georgia Research Alliance Eminent Scholar in Advanced Communications. \u0022But that\u0027s quite wrong if you are interested in general communications. When you talk, a lot of information is lost if you use the current methods.\u0022\n\u003C\/p\u003E\n\u003Cp\u003EIn addition, current machines cannot understand \u0022reference,\u0022 a linguistic shorthand people use to communicate. 
When discussing a technical issue such as electrical resistance, for instance, a group of engineers may use the word \u0022it\u0022 in referring to Ohm\u0027s Law. Humans easily understand that, but machines don\u0027t.\n\u003C\/p\u003E\n\u003Cp\u003E\u0022If every time we began to discuss one term, we had to define it, conversation would be very awkward,\u0022 Juang noted. \u0022Being able to understand reference is very important for natural communication. If we can create a system to do that, the machine would behave much more like a human and communicate more like a human.\u0022\n\u003C\/p\u003E\n\u003Cp\u003EThe next generation of speech recognizers, he says, will have to go beyond conversion to text. \n\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cem\u003E\u0022Open the pod bay doors please, HAL.\u0022\u003C\/em\u003E","format":"limited_html"}],"field_summary_sentence":"","uid":"27303","created_gmt":"2004-02-18 01:00:00","changed_gmt":"2016-10-08 03:03:38","author":"John Toon","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2004-02-18T00:00:00-05:00","iso_date":"2004-02-18T00:00:00-05:00","tz":"America\/New_York"},"extras":[],"related_links":[{"url":"http:\/\/gtresearchnews.gatech.edu\/newsrelease\/speech.htm","title":"Changes needed"}],"groups":[{"id":"1188","name":"Research Horizons"}],"categories":[{"id":"135","name":"Research"}],"keywords":[],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cstrong\u003EJohn Toon\u003C\/strong\u003E\u003Cbr \/\u003EResearch News \u0026amp; Publications Office\u003Cbr \/\u003E\u003Ca href=\u0022http:\/\/www.gatech.edu\/contact\/index.html?id=jt7\u0022\u003EContact John Toon\u003C\/a\u003E\u003Cbr 
\/\u003E\u003Cstrong\u003E404-894-6986\u003C\/strong\u003E","format":"limited_html"}],"email":["jtoon@gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}