[asterisk-users] Simple speech recognition for driving IVR - "press or say one".

Wed Dec 6 08:33:06 CST 2017

Briefly: I want to be able to have "press or say (number)", with
Asterisk listening for a spoken number, but accepting a DTMF digit,
too.

I'm posting everything I've found so far here, partly to show my
working, but also in case anyone else finds it useful. So, moving on....

This looked hopeful for a moment until I realised that it doesn't do DTMF:
https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+Application_SpeechBackground

So then there's
https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+Application_Record,
which can terminate on any DTMF key with "y", but according to the
docs, "RECORD_STATUS" only sets a flag of "DTMF" (A terminating DTMF
was received ('#' or '*', depending upon option 't')).
So, I don't get to know which key was pressed via that method, either.

There's very little information I can find about the built-in
functions for speech recognition.
https://wiki.asterisk.org/wiki/display/AST/Speech+Recognition+API
doesn't actually explain how to integrate the actual speech engines.

In this previous forum post,
https://community.asterisk.org/t/asterisk-15-jack-streams-speech-recognition-so-many-questions/72108/2
, jcolp explained that most people don't use the speech interface
anyway, because
"Asterisk modules are written in C, and it’s more difficult to do
things in that fashion. Using the Record and ship it off using Python,
etc, is just easier and gets the job done for a lot of people to where
they find it acceptable."
So, AGI it is! But I'm still stuck on how I record for speech AND get
a DTMF if it was dialled.
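
One possible approach from AGI, as a minimal sketch only: the AGI
"RECORD FILE" command takes a list of escape digits, and its reply is
assumed here to carry the ASCII code of the digit that stopped the
recording. The reply shapes in the comments, the "sln" format choice
and the function names are assumptions for illustration, not tested
against a live Asterisk.

```python
import re
import sys

# Assumed reply shapes for RECORD FILE (check against your Asterisk version):
#   200 result=49 (dtmf) endpos=8000     caller pressed "1" (ASCII 49)
#   200 result=0 (timeout) endpos=24000  no key pressed before recording ended
#   200 result=-1 (hangup) endpos=1200   caller hung up
_REPLY = re.compile(r"^200 result=(-?\d+)(?: \((\w+)\))?")


def parse_record_reply(line):
    """Return (reason, digit); digit is None unless a DTMF key ended it."""
    match = _REPLY.match(line.strip())
    if match is None:
        return ("error", None)
    code = int(match.group(1))
    reason = match.group(2) or "unknown"
    if reason == "dtmf" and code > 0:
        return ("dtmf", chr(code))
    return (reason, None)


def record_or_digit(path, fmt="sln", timeout_ms=4000, silence_s=2,
                    agi_in=None, agi_out=None):
    """Send RECORD FILE over AGI and report why the recording stopped.

    Assumes the agi_* environment header has already been read from stdin.
    """
    agi_in = agi_in or sys.stdin
    agi_out = agi_out or sys.stdout
    agi_out.write('RECORD FILE "%s" %s "0123456789*#" %d s=%d\n'
                  % (path, fmt, timeout_ms, silence_s))
    agi_out.flush()
    return parse_record_reply(agi_in.readline())
```

If the reason comes back as "dtmf", the script can use the digit
directly; otherwise the recorded file is what gets shipped off for
recognition.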

Regarding speech in general, even "Asterisk - The Definitive Guide" just says:

"Asterisk does not have speech recognition built in, but there are
many third-party speech recognition packages that integrate with
Asterisk. Much of that is outside of the scope of this book, as those
applications are external to Asterisk" - helpful!

The speech-rec mailing list at
http://lists.digium.com/pipermail/asterisk-speech-rec/ hasn't been
posted to since 2013.

Someone else asked about speech recognition and unimrcp in this post:
http://lists.digium.com/pipermail/asterisk-users/2017-February/290875.html

UniMRCP: https://mojolingo.com/blog/2015/speech-rec-asterisk-get-started/
http://www.unimrcp.org/manuals/html/AsteriskManual.html#_Toc424230605
This has a Google Speech Recogniser plugin, but it's $50 per channel:
http://www.unimrcp.org/gsr

*Reasons to use Lex over Google speech recognition*
• Has just been released in eu-west-1:
https://forums.aws.amazon.com/ann.jspa?annID=5186
• Supports 8 kHz telephony: https://forums.aws.amazon.com/ann.jspa?annID=4775
• Is in the core AWS SDK
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/LexRuntime.html
• Has a number slot type:
http://docs.aws.amazon.com/lex/latest/dg/built-in-slot-number.html
 - this means no accidental recognition of "won", "one" or "juan" instead of 1!
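
For comparison, with a plain transcription service the script would
have to normalise homophones itself. A rough sketch of what that might
look like; the word list is an illustrative assumption and is nowhere
near exhaustive, and the function name is hypothetical.

```python
# Illustrative homophone table (an assumption, not a complete list).
SPOKEN_DIGITS = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1", "juan": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3",
    "four": "4", "for": "4", "fore": "4",
    "five": "5", "six": "6", "seven": "7",
    "eight": "8", "ate": "8",
    "nine": "9",
}


def transcript_to_digit(transcript):
    """Return the first digit found in a free-text transcript, else None."""
    cleaned = transcript.lower().replace(".", " ").replace(",", " ")
    for word in cleaned.split():
        if word in SPOKEN_DIGITS:
            return SPOKEN_DIGITS[word]
        if len(word) == 1 and word in "0123456789":
            return word
    return None
```

Note how fragile this is: any sentence containing "to" or "for" would
be read as a digit, which is exactly the problem a number slot type
avoids.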

The pricing is definitely right: "The cost for 1,000 speech requests
would be $4.00, and 1,000 text requests would cost $0.75. From the
date you get started with Amazon Lex, you can process up to 10,000
text requests and 5,000 speech requests per month for free for the
first year".
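
To put numbers on that, here is a small worked calculation using only
the quoted prices. It assumes each "press or say" prompt counts as one
speech request, which is a guess about how the IVR would be billed.

```python
from decimal import Decimal

SPEECH_PER_REQUEST = Decimal("4.00") / 1000  # $4.00 per 1,000 speech requests
TEXT_PER_REQUEST = Decimal("0.75") / 1000    # $0.75 per 1,000 text requests
FREE_SPEECH_PER_MONTH = 5000                 # free tier, first year only
FREE_TEXT_PER_MONTH = 10000                  # free tier, first year only


def monthly_cost(speech_requests, text_requests=0, free_tier=True):
    """Estimated monthly Lex bill in dollars for the given request counts."""
    free_speech = FREE_SPEECH_PER_MONTH if free_tier else 0
    free_text = FREE_TEXT_PER_MONTH if free_tier else 0
    billable_speech = max(0, speech_requests - free_speech)
    billable_text = max(0, text_requests - free_text)
    return (billable_speech * SPEECH_PER_REQUEST
            + billable_text * TEXT_PER_REQUEST)


if __name__ == "__main__":
    # 20,000 spoken prompts in a month, with and without the free tier.
    print(monthly_cost(20000))                   # 15,000 billable requests
    print(monthly_cost(20000, free_tier=False))  # all 20,000 billable
```

So 20,000 spoken prompts a month would come to $60 inside the free
tier's first year and $80 afterwards.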

Amazon Transcribe looks promising too, but is only available by
developer invitation at this time:
https://aws.amazon.com/transcribe/ https://aws.amazon.com/transcribe/pricing/

But all I need now is the simplest way to send Lex a short 8 kHz file
and get a single digit back, as quickly and reliably as possible.
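
A minimal sketch of what that call might look like from Python, under
several assumptions: the audio is raw 16-bit little-endian 8 kHz PCM
(for example an Asterisk "sln" recording), the bot has a slot using the
number slot type, and the client object behaves like the boto3
"lex-runtime" client's post_content. The bot name, alias, user id and
the slot name "Number" are all hypothetical.

```python
import json

# Content type for 8 kHz, 16-bit, mono, little-endian PCM audio.
LPCM_8K = ("audio/lpcm; sample-rate=8000; sample-size-bits=16; "
           "channel-count=1; is-big-endian=false")


def digit_from_lex_reply(reply, slot_name="Number"):
    """Pull a single digit out of a Lex reply, or None if there isn't one."""
    slots = reply.get("slots") or {}
    if isinstance(slots, str):
        # Assumption: some clients may hand the slots back as a JSON string.
        slots = json.loads(slots)
    value = str(slots.get(slot_name) or "").strip()
    if len(value) == 1 and value in "0123456789":
        return value
    return None


def ask_lex_for_digit(client, audio_path, bot_name, bot_alias, user_id,
                      slot_name="Number"):
    """Send one short recording to Lex and return the recognised digit."""
    with open(audio_path, "rb") as audio:
        payload = audio.read()
    reply = client.post_content(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        contentType=LPCM_8K,
        accept="text/plain; charset=utf-8",
        inputStream=payload,
    )
    return digit_from_lex_reply(reply, slot_name)
```

With boto3 installed, the client would be created with something like
boto3.client("lex-runtime", region_name="eu-west-1") and passed in as
the first argument; keeping it as a parameter also makes the function
easy to try out with a stub before any AWS account is involved.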

Before I travel too far down this road, can someone point me in the
right direction and possibly steer me away from the wrong path?!