[Asterisk-Users] Speech Recognition

Race Vanderdecken asteriskusers at codetyrant.com
Fri Feb 11 18:53:30 MST 2005


Ahem,

Being one who has programmed, consulted and argued to points beyond
violence about the subjects of your first paragraph, I shall now
expound.

Expounding begins:

I worked on several projects with a company named Intellivoice that did
so-called "voice dialing", i.e. voice activated dialing (VAD), as a bread
and butter product in the PSTN/T-1 world.

The product was good at about 3-5 recognitions so long as they were
distinct enough that your well trained dog could understand them as
different commands.

I was a first-hand witness to many sales and customer meetings. I rode in
the car of the "inventor", ate lunch with the VAD developers, and beat
them often with questions about how they did it and why it did not work.

Personally, I have a Mid-Western trained Mid-Atlantic accent, i.e. no
accent to speak of, so speech and voice recognition engines like me. I
am even tempered and have been working in telecommunications from 22 wpm
Morse code, to Tech Plus, to before NETBIOS, SNB, and 256K twisted pair
Ethernet on 9DB, through voice and right back into VoIP before it was an
acronym. I have been to college to study communications. I have an ear
for dialects and can place most people within a 100 mile range within
their State. I coded the Persona project. I have pushed Sphinx down
Festival's throat, and I have worked with Dave. I was working to create a
voice X/HTML/XML browser before they were committees. I have pushed speech
and voice and dictation since I got my hands on a computer. I love speech
recognition and generation, period.

So, when I say that you are out of your mind if you think you can get
VAD or SAD to work across the wire if there is an analog device in the
path, you should take heed.

VAD on cell phones works now because the reco is in the phone, for the
most part. VAD over the analog wires can be done but is of no use to
anyone unless they like to scream at the phone from time to time. By the
way, screaming at voice-reco engines only makes them angry. So angry, in
fact, that they will either repeatedly ask you to "please say the name
again" until you calm down or they will deliberately misdial the number
for you. Machines just don't like to be yelled at; ask Woody Allen about
the time he beat up his television and the elevator incident.

If you are digital from speaker to reco then you have a chance. If you
are G.711 all the way you have a chance. And by chance I mean if you use
grammar based recognition and have a caller with an IQ greater than the
first two digits of their area code.
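
To make "grammar based recognition" concrete, here is a minimal sketch of
the closed-set idea: only phrases the grammar can produce are accepted, and
everything else is rejected. The command words, names, and function names
are hypothetical, and the sketch filters a text hypothesis after the fact,
whereas a real engine would apply the grammar while decoding the audio.

```python
import itertools

# Hypothetical toy grammar of the form: <command> <name>
COMMANDS = ("call", "dial")
NAMES = ("alice", "bob", "sales")


def expand_grammar(commands, names):
    """Enumerate every phrase the toy grammar allows."""
    return {f"{c} {n}" for c, n in itertools.product(commands, names)}


def accept(hypothesis, phrases):
    """Return the normalized phrase if the grammar allows it, else None."""
    # Lowercase and collapse whitespace so " Call   Bob " matches "call bob".
    normalized = " ".join(hypothesis.lower().split())
    return normalized if normalized in phrases else None


PHRASES = expand_grammar(COMMANDS, NAMES)
```

With a grammar this small there are only six legal utterances, which is
roughly the scale at which the products described above behaved well.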

Zhong's thesis work is interesting and I will state he is on the right
track and I enjoin him to continue his work. Neural networks are the right
path, but he needs to re-read Stroustrup on objects in Tries. But short
utterances don't work in communication models; see "trying to
communicate with teenage son" and HDLC.

My Expounding ends.

Yes, what you want can be done, and it would be easy, no, I say trivial
to accomplish.

Follow the dynamic context example. Listen for reco, then create a
dynamic context, then forward to that dynamic context. Easy peasy.
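
The three steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the "dynamic context example" itself
is not reproduced in this message, the "reco-" context naming, the
directory mapping, and all function names are hypothetical, and how the
rendered context gets loaded into Asterisk is left out. The only Asterisk
pieces relied on are the extensions.conf line format and the AGI commands
SET CONTEXT, SET EXTENSION, and SET PRIORITY.

```python
import re


def render_context(context, dial_target):
    """Render extensions.conf-style text for a one-step context."""
    return f"[{context}]\nexten => s,1,Dial({dial_target})\n"


def agi_redirect(context, extension="s", priority=1):
    """AGI commands asking the dialplan to continue in the given context."""
    return [
        f"SET CONTEXT {context}",
        f"SET EXTENSION {extension}",
        f"SET PRIORITY {priority}",
    ]


def handle_recognition(recognized, directory):
    """Map a recognized name to (dialplan text, AGI commands), or None."""
    name = recognized.strip().lower()
    target = directory.get(name)
    if target is None:
        return None  # Unknown name: caller should re-prompt.
    # Keep the context name to letters, digits, and dashes.
    context = "reco-" + re.sub(r"[^a-z0-9]+", "-", name)
    return render_context(context, target), agi_redirect(context)
```

For example, handle_recognition("Bob", {"bob": "SIP/bob"}) yields a
"[reco-bob]" context that dials SIP/bob, plus the three AGI commands that
point the call at it; an AGI script would write those commands to stdout.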

Please contact me, asterisk at codetyrant, if you would like to create a
project to do this. I would love to see VAD on VoIP come to fruition in
my lifetime.

Race "The Tyrant" Vanderdecken

-----Original Message-----
From: asterisk-users-bounces at lists.digium.com
[mailto:asterisk-users-bounces at lists.digium.com] On Behalf Of Robert
Rozman
Sent: Saturday, January 29, 2005 12:16 PM
To: Jon Radon; Asterisk Users Mailing List - Non-Commercial Discussion
Subject: Re: [Asterisk-Users] Speech Recognition

Hi,

I probably won't be of much help, but I'm also looking for a speech
recognition solution. We're actually looking at two problems:
- one would be so-called voice dialing (similar to cellular phones): one
records one's own spoken names and speaks them afterwards to call a certain
person. This problem is much easier to solve. Recently I found an
interesting project that could be easily integrated for such functionality
(http://www.princeton.edu/~lzhong/DNN.html). I'd like to start doing this
but don't know much about Asterisk and its eagi interface to get sound out
of it. I guess someone with more insight could easily integrate this code.
This solution could probably be used for simple 1-word recognition tasks
(like speaking a name for an outgoing call, or maybe saying "sales" to get
the sales department), but as said this is a speaker dependent solution.

- using speaker independent solutions for other stuff. I guess that Sphinx
is at the moment the most serious candidate. There is already some work on
connecting speech recognition to MH and I'm sure that the guys will help
with other uses.

What would be most desirable from the Asterisk community is some skeleton
code for the eagi interface... I also have a question:
does eagi based recognition take place in parallel with other dialplan
activities (like dtmf recognition, actions, etc...)?

Regards,

Rob.





