[Asterisk-Users] Anybody using Sphinx

Wed Nov 19 05:55:50 MST 2003

Anthony Wood wrote:

>On Wed, Nov 19, 2003 at 10:22:55AM +0800, Steve Underwood wrote:
>
>>Arnold Ligtvoet wrote:
>>
>>>Since I would like the user names to be auto-generated by the system, I
>>>would guess that this could best be done using festival with a localized
>>>voice. I think there is a Dutch voice for Mbrola which should integrate into
>>>festival ( note to self : need bigger harddisk :-) )
>>
>>Speech recognition accuracy is not great under ideal conditions. Doing 
>>what you suggest seems unlikely to achieve any meaningful accuracy. 
>>Speech recognition training systems require many occurrences of a word or 
>>phrase, clearly spoken, before their accuracy becomes useful. A one-shot 
>>utterance from Festival seems to fail on both counts :-)
>
>Sphinx isn't doing general speech recognition, it is determining which
>of a list of patterns it has you said, like mobile phones do.
>
That is essentially all that any voice recognition currently does. There 
is little meaningful context-directed recognition (a "phrase locked 
loop", to use an old in-joke) in anything available today.
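
As a rough illustration of the "pick one from a short list" idea, here is a
minimal sketch. It is a text-only analogy and an assumption on my part: it
compares strings with Python's difflib, whereas a recogniser such as Sphinx
scores audio against acoustic models. The function name pick_from_list and
the 0.6 threshold are hypothetical, chosen only for the example.

```python
from difflib import SequenceMatcher


def pick_from_list(heard, candidates, min_score=0.6):
    """Return (best_candidate, score); best_candidate is None if nothing is close.

    'heard' stands in for a noisy hypothesis. Every candidate is scored and the
    highest-scoring one wins, but only if it clears the rejection threshold.
    """
    scored = [
        (SequenceMatcher(None, heard.lower(), name.lower()).ratio(), name)
        for name in candidates
    ]
    score, best = max(scored)
    return (best if score >= min_score else None), score


names = ["Jennifer", "Frank"]
print(pick_from_list("jenifer", names))  # close to one entry, so it is accepted
print(pick_from_list("xyz", names))      # close to nothing, so it is rejected
```

The rejection threshold is the part that matters in practice: with only two
candidates, the system still has to decide whether the input matched either of
them well enough, or whether to ask again.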

>So it's fairly easy to tell between "Jennifer" and "Frank" if there
>are no other options.
>
Many commercial on-line recognisers have serious trouble distinguishing 
"yes" from "no" when those are the only two acceptable answers.

>When you call directory assistance in Australia, the IVR asks you what name
>you want, and gives you a suggestion out of the top 100 or 200 names, which you
>can accept or reject.  Makes for ridicule, but beats waiting on hold.
>
Beware that many of these systems are actually a human operator hiding 
behind an IVR. I've had people tell me about amazing automated 
directory enquiry systems in the US, which turn out to be a human 
masquerading as an IVR. If the list is known to be short, that may not 
be the case here.

>>Bottom line: the very best speech recognition still sucks. As a British 
>>speaker I never get more than about 40% accuracy speaking into a US-trained 
>>recogniser. I have never had better than about 70-80% accuracy 
>>on a British-trained recogniser. Strangely, my terrible Cantonese gets 
>>nearly 100% on a SpeechWorks recogniser. :-\
>
>This is true for general speech recognition, where the computer
>has a much larger dictionary to match the sound waves against.
>
Only a speaker-trained system could even begin to approach these 
accuracies for general text input. The accuracies I gave are for 
phone-based systems expecting a very limited set of responses from an 
arbitrary caller.

Humans really don't do that much better at raw word recognition, but we 
heavily apply context to improve things.

Regards,
Steve