[asterisk-app-dev] Asterisk and UniMRCP Licensing

Joshua Colp jcolp at digium.com
Fri Sep 5 12:29:56 CDT 2014

Ben Klang wrote:


> Is it really required to use res_speech? If so, can we change the
> interfaces that ARI presents?
> Over the last few years we’ve evaluated res_speech vs. the various
> UniMRCP applications (SynthAndRecog primarily). We’ve always come to the
> conclusion that the res_speech API either couldn’t give us what we
> needed, or was not as performant. SynthAndRecog isn’t perfect, but it
> does a couple of crucial things, perhaps most importantly is the
> combined lifecycle of TTS + ASR so that you can “barge” into a TTS
> playback before it is finished.

The res_speech module and API form a very thin wrapper over common
speech recognition concepts. That wrapper does some helpful things, like
handling transcoding and keeping a state machine, but otherwise it relies
on the underlying speech technology to do everything. It doesn't provide
anything to the dialplan, nor does it even know about channels.

What you probably found limiting was the interface provided to the 
dialplan/AGI for speech recognition, with the dialplan applications 
taking care of things. These wouldn't get used in ARI. We're free to 
make the interface there whatever we want.

During lunch, though, I gave this some more thought, and I think that
speech recognition should always be a passive action on a channel (or
heck, a bridge). It would sit in the media path, feeding audio to the
speech recognition engine and raising events, but it would not block.
This would allow it to easily cooperate with every other possible thing
in ARI without requiring a developer to use a Snoop channel and manage
it. It also doesn't put the "well, if they start speaking, what do I do"
logic inside of Asterisk - it gives that power to the developer.
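
To make the passive idea a bit more concrete, here is a rough Python
sketch. It is not ARI and not res_speech: PassiveSpeechTap, Playback, the
detect callback, and the SpeechStarted/SpeechEnded event names are all
made up for illustration. The tap looks at each frame, raises events, and
passes the frame through untouched, so it never blocks anything else on
the channel. What to do when the caller starts talking (here, stop the
playback) is decided entirely by the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PassiveSpeechTap:
    """Hypothetical passive recognizer sitting in a channel's media path."""
    # Stand-in for a real engine: returns True if the frame contains speech.
    detect: Callable[[bytes], bool]
    handlers: Dict[str, List[Callable[[dict], None]]] = field(default_factory=dict)
    speaking: bool = False

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register an application callback for an event name."""
        self.handlers.setdefault(event, []).append(handler)

    def _emit(self, event: str) -> None:
        for handler in self.handlers.get(event, []):
            handler({"type": event})

    def feed(self, frame: bytes) -> bytes:
        """Inspect one frame, raise events on state changes, pass it through."""
        is_speech = self.detect(frame)
        if is_speech and not self.speaking:
            self.speaking = True
            self._emit("SpeechStarted")
        elif not is_speech and self.speaking:
            self.speaking = False
            self._emit("SpeechEnded")
        return frame  # media is never held back or altered


class Playback:
    """Hypothetical stand-in for a TTS playback the application started."""
    def __init__(self) -> None:
        self.active = True

    def stop(self) -> None:
        self.active = False


# Toy detector (assumption): any non-zero byte counts as speech.
tap = PassiveSpeechTap(detect=lambda frame: any(frame))
playback = Playback()
events: List[str] = []

# The barge-in policy lives in the application, not in the tap.
tap.on("SpeechStarted", lambda ev: playback.stop())
tap.on("SpeechStarted", lambda ev: events.append(ev["type"]))
tap.on("SpeechEnded", lambda ev: events.append(ev["type"]))

for frame in [b"\x00\x00", b"\x01\x02", b"\x03\x00", b"\x00\x00"]:
    tap.feed(frame)
```

An application that does not want barge-in would simply not register the
handler that stops the playback; the tap itself would not change.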


