[asterisk-app-dev] Asterisk and UniMRCP Licensing
Joshua Colp
jcolp at digium.com
Fri Sep 5 13:10:05 CDT 2014
Ben Klang wrote:
<snip>
>>
> Yes, that’s exactly what we found. It’s good to know that
> res_speech internally isn’t as limited as the Dialplan applications -
> I definitely thought of them as the same thing in my head, which
> sounds incorrect from your explanation.
>
> Can res_speech be extended to include TTS as well as ASR, assuming
> both are controllable via MRCP?
>
> If so, what about other MRCP functions like Call Progress Analysis or
> Answering Machine Detection?
There are no interfaces or anything defined in Asterisk for these, so
it's new stuff being added. Same caveats apply like everything new. ^_^
> CPA/AMD in particular behaves like ASR, and has similar variables (no
> input timer, final silence timer, can take a grammar document for
> input, etc).
>
>> During lunch though I gave this some more thought and think that
>> speech recognition should always be a passive action on a channel
>> (or heck, a bridge). It would sit in the media path feeding stuff
>> to the speech recognition and raising events but does not block.
>> This would allow it to easily cooperate with every other possible
>> thing in ARI without requiring a developer to use a Snoop channel
>> and manage it. It also doesn't put the "well if they start speaking
>> what do I do" logic inside of Asterisk - it gives that power to the
>> developer.
>>
>
> Yes, that sounds great. Async FTW.
>
> One observation to share: We often use something like SynthAndRecog
> (unimrcp-asterisk’s dialplan implementation) to handle both input and
> output in a single command. This allows prompts to be “barged”, or
> interrupted by speech or DTMF. What happens is that the speech
> recognizer is running while the synthesizer is playing back. When the
> caller speaks, it raises an MRCP event, which UniMRCP uses as a
> trigger to stop TTS playback. This works well enough, though
> occasionally the delay between start-of-speech and TTS getting stopped
> can be noticeable.
>
> What you’re proposing would mean letting the application stop TTS
> playback in response to a start-of-speech event. In our experience
> applications can get loaded down and delay those responses even more.
> Even in a best-case scenario, the latency for the application
> handling this kind of request would be significantly more than doing
> it inside of Asterisk. Since this is a very timing-critical operation
> (milliseconds count, as a human will pick up on the delay), it might
> be good to have an option that combines input with output for the
> purpose of barge.
>
> To borrow an example from a similar protocol: Rayo handles this by
> allowing all three kinds of commands: Input (for ASR or DTMF), Output
> (for TTS or audio file), and Prompt (for a combined Input + Output,
> where the Output is linked to stop on a start-of-input event). All 3
> actions are async, raising the appropriate events as things happen.
My fear with the Prompt case is that we would then have to somehow
jury-rig things to use the existing playback mechanism (to allow current
and future URI schemes) while letting it be influenced by the
recognition events. That's why I hesitated.
If we could come up with a clean way to do that, yeah.
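
To make the Prompt idea concrete, here is a rough sketch in Python of an
Input and an Output linked so that a start-of-input event stops playback
inside the same process, before the application is told about it. Every
name in it (Input, Output, Prompt, the event strings) is made up for
illustration; none of this is an existing Asterisk, ARI, Rayo, or UniMRCP
interface, and the playback object is only a stand-in for whatever the
real playback mechanism would be.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Output:
    """Hypothetical stand-in for a playback (TTS or an audio file)."""
    playing: bool = False

    def start(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False


@dataclass
class Input:
    """Hypothetical passive recognizer: it never blocks, it only raises events."""
    listeners: List[Callable[[str], None]] = field(default_factory=list)

    def on_event(self, listener: Callable[[str], None]) -> None:
        self.listeners.append(listener)

    def raise_event(self, name: str) -> None:
        for listener in list(self.listeners):
            listener(name)


class Prompt:
    """Input + Output, with the Output linked to stop on start-of-input."""

    def __init__(self, input_: Input, output: Output,
                 notify: Callable[[str], None]) -> None:
        self.input = input_
        self.output = output
        self.notify = notify  # how events reach the application
        self.input.on_event(self._handle)

    def start(self) -> None:
        self.output.start()

    def _handle(self, name: str) -> None:
        # Barge: stop playback locally first, then tell the application,
        # so application latency is not in the timing-critical path.
        if name == "start-of-input" and self.output.playing:
            self.output.stop()
            self.notify("output-stopped")
        self.notify(name)


def demo() -> List[str]:
    events: List[str] = []
    prompt = Prompt(Input(), Output(), events.append)
    prompt.start()
    prompt.input.raise_event("start-of-input")   # caller starts speaking
    prompt.input.raise_event("input-complete")   # recognizer finishes
    return events
```

The application still sees every event asynchronously; the only thing
done on its behalf is the stop. Whether a link like this can be layered
cleanly on the existing playback mechanism is exactly the open question.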
--
Joshua Colp
Digium, Inc. | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org