IF YOU WATCH spy thrillers you have probably seen the hero or villain defeat the fail-safe "voice recognition system". A research group at University College Dublin hopes to make automated voice recognition a reality, confirming identity by analysing a person's speech, writes Dick Ahlstrom.
"It is an old idea. You have heard of the voice print, but there is actually no such thing," states Dr Fred Cummins, a lecturer in the School of Computer Science and Informatics. He is the principal investigator in a Science Foundation Ireland-funded project to develop real-life voice recognition.
"The basic area is speaker identification, recognising a speaker from their voice. Our goal is to use new computer-based analysis methods for identifying speech."
Secure entry systems are only one possible use for this technology. It could help identify a criminal's voice on a phone call demanding a ransom, or it could confirm that a sound recording of, say, Osama bin Laden was actually made by him, Dr Cummins says. "Humans aren't very good at it but the object is to do this better than humans."
The project started with the creation of a "speech corpus", a recorded body of speech available for analysis, he explains. He enlisted 36 speakers, each of whom recorded about 15 minutes of speech, and the recordings now provide a test bed for automated recognition systems.
"It is the foundation stone of the project and it was released at a conference in Russia in 2005. We give it away to researchers free of charge; we and SFI wanted everyone to make use of it."
One inherent challenge within the research is finding a way to identify a person when they change from their normal style of speech. "What we are seeking to do is retain a grasp of the identity of a speaker as they adopt different ways of speaking."
For this reason they had the 36 subjects adopt a variety of styles, first providing examples of normal speech from a fixed text and then other styles, including spontaneous speech, quick speech, whispered speech and imitated speech, in which the speaker attempts to copy another voice.
He and post-doctoral research fellow Dr Marco Grimaldi are now using the speech corpus to conduct experiments. "We have found new ways of analysing the speech which seem to be better at dealing with style switches." The method relies on "machine learning techniques", says Dr Cummins. "We train [the computer] on a series of utterances and it has to pick out which speaker it is."
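For readers curious what "train on utterances, then pick out the speaker" can look like in practice, here is a minimal sketch in Python. It is not the UCD group's method, which the article does not describe in detail. It assumes each utterance has already been reduced to a short list of numbers (a hypothetical "feature vector"), and it uses a deliberately simple stand-in technique, a nearest-centroid classifier. The speaker labels and feature values are invented toy numbers.

```python
import math


def train(utterances):
    """Average the feature vectors of each speaker's training utterances.

    `utterances` is a list of (speaker_label, feature_vector) pairs.
    Returns a dict mapping each speaker to their mean feature vector.
    """
    sums = {}
    counts = {}
    for speaker, features in utterances:
        if speaker not in sums:
            sums[speaker] = [0.0] * len(features)
            counts[speaker] = 0
        sums[speaker] = [s + f for s, f in zip(sums[speaker], features)]
        counts[speaker] += 1
    return {sp: [s / counts[sp] for s in total] for sp, total in sums.items()}


def identify(model, features):
    """Return the speaker whose mean vector is closest (Euclidean distance)."""
    return min(model, key=lambda sp: math.dist(model[sp], features))


if __name__ == "__main__":
    # Invented toy features: two numbers per utterance, two speakers.
    training = [
        ("speaker_a", [1.0, 2.0]),
        ("speaker_a", [1.2, 1.8]),
        ("speaker_b", [4.0, 5.0]),
        ("speaker_b", [4.2, 5.2]),
    ]
    model = train(training)
    print(identify(model, [1.1, 2.1]))
```

A real system would face the problem the researchers highlight: the same speaker's features shift when they whisper, hurry or imitate someone, so the averages learned from normal speech may no longer be the closest match.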
The approach is based on detecting "instantaneous frequencies" in the voice and "how the frequency changes over time", he explains.
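One textbook way to measure an "instantaneous frequency" is to build the so-called analytic signal of a recording and track how quickly its phase advances from one sample to the next. The Python sketch below illustrates that general idea only. The article does not say how the UCD team computes its frequencies, so the function names, the synthetic test tone and the sample rate here are all assumptions made for illustration.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short demo signals."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def analytic_signal(samples):
    """Drop negative frequencies and double positive ones (Hilbert method)."""
    spectrum = dft(samples)
    n = len(spectrum)
    for k in range(1, n):
        if k < n / 2:
            spectrum[k] *= 2      # positive frequencies
        elif k > n / 2:
            spectrum[k] = 0       # negative frequencies
    return dft(spectrum, inverse=True)


def instantaneous_frequency(samples, sample_rate):
    """Frequency in Hz between consecutive samples, from the phase advance."""
    z = analytic_signal(samples)
    return [cmath.phase(z[t + 1] * z[t].conjugate()) * sample_rate / (2 * math.pi)
            for t in range(len(z) - 1)]


if __name__ == "__main__":
    rate = 400  # samples per second (assumed for the demo)
    # Synthetic tone gliding upward from 50 Hz at 100 Hz per second.
    chirp = [math.cos(2 * math.pi * (50 * (t / rate) + 50 * (t / rate) ** 2))
             for t in range(200)]
    freqs = instantaneous_frequency(chirp, rate)
    for t in (50, 100, 150):
        print(f"t = {t / rate:.3f} s: about {freqs[t]:.1f} Hz")
```

The printed values rise over time, which is the kind of "how the frequency changes over time" information the quote refers to. Real speech is far messier than a single gliding tone, and estimates near the start and end of a recording are less reliable with this method.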
He is also studying other properties of speech, for example associated hand gestures. "Speech is a movement skill, one of the most complex movement skills in the body."