[asterisk-app-dev] WebSocket Stasis Control Best Practice

Krandon krandon.bruse at gmail.com
Tue Jul 8 13:40:02 CDT 2014


Hey guys,  

Scale testing has been going very well. I will have some prelim numbers soon. You guys are on top of it: I found one small bug (a crash related to the string length of the request), but when I went to JIRA to report it, after updating Asterisk from a three-week-old checkout to current, I saw a patch had already landed and the bug was fixed. A+!

I do have an implementation question. I am currently using WebSockets to create the call request and dump it into Stasis. However, if the initial A-leg call fails (for whatever reason), I never see the call come into the Stasis app. This is to be expected, as the app has not been invoked. What's the best way to get the status of that first call? (The SIP response code would be great, but it's not necessary the first time around.)
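
For reference, here is roughly what my originate side looks like today - a
minimal Python sketch, not authoritative. The endpoint, credentials and app
name are made up, and the idea that a ChannelDestroyed event would carry the
failure is just my guess:

import json
import requests              # pip install requests
import websocket             # pip install websocket-client

ARI = 'http://localhost:8088/ari'
APP = 'myStasisApp'
USER, PASS = 'user', 'pass'

# Open the app's event WebSocket first so no events are missed.
ws = websocket.create_connection(
    'ws://localhost:8088/ari/events?app=%s&api_key=%s:%s' % (APP, USER, PASS))

# Originate the A leg straight into the Stasis app, passing args as JSON.
resp = requests.post(ARI + '/channels', auth=(USER, PASS), params={
    'endpoint': 'SIP/vendor/15555551234',
    'app': APP,
    'appArgs': json.dumps({'soundFile': 'blah'}),
})
resp.raise_for_status()
channel_id = resp.json()['id']

# When the A leg fails, StasisStart never fires. My working assumption
# (please correct me) is that the originate subscribes the channel to the
# app, so a ChannelDestroyed event with the hangup cause still arrives here.
while True:
    event = json.loads(ws.recv())
    if event.get('channel', {}).get('id') != channel_id:
        continue
    if event['type'] == 'StasisStart':
        print('A leg is up and in the app')
        break
    if event['type'] == 'ChannelDestroyed':
        print('A leg failed: cause %s (%s)'
              % (event.get('cause'), event.get('cause_txt')))
        break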

Thanks!  

--  
KB


On Thursday, June 19, 2014 at 11:23 AM, Ben Klang wrote:

> On Jun 19, 2014, at 11:57 AM, Matthew Jordan <mjordan at digium.com> wrote:
>  
> > On Wed, Jun 18, 2014 at 12:16 PM, Ben Klang <bklang at mojolingo.com> wrote:
> > > Excuse my somewhat tardy reply to this thread, but since you brought up AMD:
> > >  
> > > On Jun 16, 2014, at 11:47 AM, Ben Langfeld <ben at langfeld.me> wrote:
> > >  
> > > On Sun, Jun 15, 2014 at 9:24 PM, Krandon <krandon.bruse at gmail.com> wrote:
> > > >  
> > > > Hello Asterisk friends,
> > > >  
> > > > I am currently interfacing with Asterisk through ARI and loving the
> > > > experience so far. I have successfully originated calls and dumped them into
> > > > my Stasis app. I am trying to figure out what the best way is to send a
> > > > channel into an Application. The current architecture for
> > > > /channels/{id}/play works well for the majority of my app, but I am running
> > > > into a block figuring out how to interact with Asterisk dialplan
> > > > applications.
> > > >  
> > > > To give an example - I submit an originate to go to SIP/vendor/phoneNumber
> > > > - with the other leg going to App: myStasisApp, {"soundFile":"blah"}. That
> > > > works fine (with the proper quote escaping). Now my Stasis app has received
> > > > the channelID to which we can do a lot of neat stuff. Say I play a sound to
> > > > the user but then want to call the app WaitForSilence. What's the best way
> > > > to do this? I may be misinterpreting the intended use of both Stasis and ARI
> > > > - but I am curious to see what your thoughts are.
> > > >  
> > > > Also, for the stasis app to get a list of arguments, I am passing it
> > > > through as JSON. So far that is working fine - but I wanted to see if there
> > > > was a better way to get a list/array of app args to Stasis.
> > > >  
> > > > Forgive me if there is an easy solution - through digging and poking the
> > > > last few days, I have not been able to find the intended use case or even a
> > > > use case.
> > > >  
> > > >  
> > >  
> > > Well, the solution for this just got added into the Asterisk 12 branch, and
> > > so it hasn't made it into a release yet. It should be coming soon in
> > > Asterisk 12.4.0.
> > >  
> > > The TALK_DETECT [1] function enables AMI/ARI events [2] [3] [4] [5] on a
> > > channel, such that a connected ARI application receives notifications over
> > > the WebSocket when a person starts or stops talking. This lets you
> > > asynchronously 'know' when talking or silence has occurred - obviating the
> > > need for the WaitForSilence/WaitForNoise dialplan applications. Plus,
> > > because it is asynchronous, if you decide you don't *want* to wait for
> > > silence, you don't have to!
> > >  
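> > > Arming it from an ARI application looks roughly like this (an untested
> > > sketch, not gospel - host, credentials, app name and channel id are all
> > > placeholders):
> > >
> > > import json
> > > import requests              # pip install requests
> > > import websocket             # pip install websocket-client
> > >
> > > ARI = 'http://localhost:8088/ari'
> > > AUTH = ('user', 'pass')
> > > channel_id = '1404837037.12'   # a channel already in your Stasis app
> > >
> > > # Listen on the app's WebSocket before arming the detector.
> > > ws = websocket.create_connection(
> > >     'ws://localhost:8088/ari/events?app=myStasisApp&api_key=user:pass')
> > >
> > > # Writing the TALK_DETECT(set) function through the channel variable
> > > # operation turns the detector on for that channel.
> > > requests.post(ARI + '/channels/%s/variable' % channel_id, auth=AUTH,
> > >               params={'variable': 'TALK_DETECT(set)', 'value': ''})
> > >
> > > while True:
> > >     event = json.loads(ws.recv())
> > >     if event['type'] == 'ChannelTalkingStarted':
> > >         print('caller started talking')
> > >     elif event['type'] == 'ChannelTalkingFinished':
> > >         print('caller stopped talking after %s ms' % event.get('duration'))
> > >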
> > > With a bit of manipulation, you could also construct AMD from this - but
> > > I'll admit that's a bit more challenging. I'd be interested in people's
> > > experiences with attempting to do that, and in whether an asynchronous
> > > "IS_HUMAN" detection function is needed or not.
> > >  
> > >  
> > > We are in the process right now of creating an application that needs
> > > asynchronous AMD.  Specifically, we are implementing LumenVox’s CPA
> > > product[1] and the use case is this:
> > >  
> > > * Reminder call is placed to recipient
> > > * Recipient answers (don’t yet know if it is a human or a machine)
> > > * Outgoing message begins to play
> > > * If a human is detected, stop playback and connect to an agent
> > > * If a machine is detected, keep playing back until…
> > > * If a beep is detected, stop and restart playback
> > >  
> > > The only way to achieve this is if we can have an async speech recognizer
> > > running while simultaneously playing output, which isn’t possible with
> > > dialplan today, and which would require a specialized app even if it were
> > > implemented that way. Instead, we are hoping to have a lower-level
> > > primitive to do signal detection and playback asynchronously.
> >  
> > You are correct that dialplan constructs (such as SpeechBackground) do
> > not readily translate to asynchronous handling of events. While there
> > are limited situations where 'event handlers' have crept into the
> > dialplan - such as pre-dial/hangup handlers - those are (a) special
> > cases and (b) not readily applicable to a large swath of events in
> > Asterisk.
> >  
> > Dialplan/AGI being synchronous was a large motivating factor for ARI.
> > While synchronous interfaces are easier to understand, we don't live
> > in a synchronous world.
>  
> This has been our main stumbling block on AGI in the past, so this move to asynchronous primitives is very welcome!
>  
> >  
> > > In an ideal world, ARI would provide primitives for playback (file or TTS)
> > > and input (DTMF or ASR). Some more background from a discussion related to
> > > our project, courtesy of Ben Langfeld:
> > >  
> > > The asynchronous example is more complex. While Adhearsion sees both the
> > > input and output components as being asynchronous, this is a fake facility
> > > provided by Punchblock to make Asterisk look like an async server when it is
> > > not. Both components are implemented atop synchronous Asterisk dialplan
> > > applications:
> > >  
> > > For output: Playback() or MRCPSynth()
> > > For input: MRCPRecog()
> > >  
> > > This means that given the simplest approach to implementation discussed
> > > above, the output would be executed, followed by the input being queued and
> > > executed once the output had completed. If we were to swap the two, not only
> > > would we have a coordination problem - where we have to queue cancellation
> > > of the output to paper over the race condition introduced by potentially
> > > being asked to stop it before we have a handle on it - but we would also
> > > have the same blocking problem with MRCPRecog().
> > >  
> > > So that rules out combining one of the UniMRCP dialplan applications with
> > > the Playback() application in this fashion. There are two other remaining
> > > solutions that come to mind:
> >  
> > Agreed; the existing mechanisms in the dialplan do not translate well
> > to asynchronous control of a channel.
> >  
> > Without punting all the way to ARI, in Asterisk 12 you do also have
> > the option of asynchronously stopping media during a Playback
> > operation via the AMI action ControlPlayback [1]. Granted, that
> > doesn't solve many of the synchronous issues you are describing, but
> > it would at least let you cancel a Playback operation.
> >  
> > [1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+12+ManagerAction_ControlPlayback
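> >
> > From memory, the action looks something like this (untested - check the
> > wiki page above for the exact parameter names and allowed Control values):
> >
> > Action: ControlPlayback
> > ActionID: 4321
> > Channel: SIP/vendor-00000001
> > Control: stop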
>  
> I have looked at ControlPlayback before. I’ve not tried it in combination with the MRCP* family of commands. I may give that a shot, but you’re right that it’s really only a partial answer. I’d be more interested in investing effort into getting the async ARI commands specified, like TALK_DETECT.
>  
> By the way - I wasn’t familiar with TALK_DETECT until this discussion, but it’s very cool/useful!
>  
> >  
> > > A prompt command to combine the output and input into a single dialplan
> > > application invocation (MRCPRecog() for native file playback,
> > > SynthAndRecog() for TTS). This avoids the problem of multiple dialplan
> > > applications blocking one another, but introduces a fresh one: these
> > > applications terminate output as soon as recognition completes (or earlier
> > > if barge-in is enabled). There is no opportunity to inject logic to filter
> > > the recognition result prior to terminating the output, nor do I think this
> > > would make sense.
> > >  
> > > The Asterisk Speech API (SpeechLoadGrammar(), SpeechActivateGrammar(),
> > > SpeechStart(), SpeechBackground(), etc.). SpeechBackground() would seem to
> > > be the obvious solution, but it unfortunately is not. SpeechBackground()
> > > actually sits in a loop, directing audio frames to the recognizer while
> > > simultaneously rendering frames of audio (its first argument is a file path).
> > > The app does not return until recognition has completed, so it cannot be
> > > combined with Playback(). Upon recognition completion, the output will be
> > > terminated, regardless of the recognition result, so this suffers the same
> > > problem as Rayo Prompt. It is also not possible to use any other output
> > > renderer, such as a TTS engine via MRCP.
> > >  
> > > Can we implement Asterisk/LumenVox CPA in a way that is compatible with the
> > > adhearsion-cpa controller methods API?
> > >  
> > > The problems stated above leave us with only one option: extra capability
> > > must be introduced to Asterisk in order to handle simultaneous dialplan
> > > applications, or to introduce a true async version of SpeechBackground().
> > > The viability of this is something that must be discussed with the Asterisk
> > > project / Digium. Note that FreeSWITCH already has this capability and
> > > would need only less invasive changes to cope with LumenVox CPA as stated
> > > above; a far more approachable task.
> >  
> > A few thoughts here:
> > (1) I'm not sure that introducing a dialplan variant of
> > SpeechBackground with some asynchronous capabilities would buy
> > much. At the end of the day, you're still stuck in the dialplan -
> > which has a synchronous model of operation. To do everything
> > described here, you need:
> >  (a) Asynchronous results from the speech engine
> >  (b) Asynchronous capabilities to control media operations
> >  (c) Asynchronous capabilities to control the speech recognition
> > While (b) does exist in the previously mentioned AMI action, we're now
> > once again requiring a combination of AGI/dialplan + AMI - which is
> > clunky. It's the reason why we wrote ARI in the first place!
> > (2) The good news is, the speech API in Asterisk is not synchronous.
> > The current APIs that expose it certainly are, but there is no
> > implicit long running blocking operation involved with
> > ast_speech_write (or any of the other C API functions involved in
> > res_speech). Building an asynchronous function that emits events
> > (similar to TALK_DETECT) or adding this as an explicit operation to an
> > ARI resource is not a very hard task. In fact, using audiohooks is a
> > fairly painless way of passing audio frames from a channel (regardless
> > of where they are) into ast_speech_write, and would be a simple way of
> > passing media into the speech engine in an asynchronous fashion.
> > (3) I think it'd be nice if this was a native operation in ARI. Unlike
> > TALK_DETECT - which is a relatively simple on/off use case - there's a
> > lot of subtlety to speech recognition. Some of the existing operations
> > (such as engine creation/enabling) could probably be hidden under an
> > operation on a channel resource, but the ability to activate certain
> > grammars while speech recognition is enabled on a channel would
> > certainly be nice. I'd imagine this would be somewhat similar to the
> > /play operation, where what you are handed back is a resource that has
> > some additional properties that can be manipulated independently.
> > Something like:
> >  
> > POST /channels/{id}/recognizeSpeech?speechId=12345&default_grammar=yes_no
> >  
> > A speech resource (maybe a different name? We typically use a plural
> > form for this - speechInstances?) could be used to manipulate an
> > active speech recognition process on a channel:
> >  
> > DELETE /speech/12345/  (stop speech recognition)
> >  
> > POST /speech/12345/grammar?name=moar_grammars
> >  
> > POST /speech/12345/parameter?name=engine_specific_property&value=foobar
> >  
> > Or other things along those lines.
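> >
> > To make that shape a bit more concrete, client code against those
> > (entirely made-up) operations might look something like this - only the
> > /play call exists today; everything under /speech is hypothetical:
> >
> > import requests
> >
> > ARI = 'http://localhost:8088/ari'
> > AUTH = ('user', 'pass')
> > chan = '1404837037.12'
> >
> > # Start a playback and, at the same time, a recognizer on the channel.
> > requests.post(ARI + '/channels/%s/play' % chan, auth=AUTH,
> >               params={'media': 'sound:your-prompt'})
> > requests.post(ARI + '/channels/%s/recognizeSpeech' % chan, auth=AUTH,
> >               params={'speechId': '12345', 'default_grammar': 'yes_no'})
> >
> > # Later: activate another grammar, tweak an engine property, tear it down.
> > requests.post(ARI + '/speech/12345/grammar', auth=AUTH,
> >               params={'name': 'moar_grammars'})
> > requests.post(ARI + '/speech/12345/parameter', auth=AUTH,
> >               params={'name': 'engine_specific_property', 'value': 'foobar'})
> > requests.delete(ARI + '/speech/12345', auth=AUTH)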
> >  
> >  
> This proposal sounds great to me. If you abstract input a bit further, note that we may need more than one recognizer at a time. At a minimum, we need the ability to specify an input grammar for both Speech and DTMF (e.g. “Say or enter the extension you want”). In an ideal case, we could run simultaneous inputs with different speech grammars. The use case here would be something like hotword detection + directed IVR, or hotword detection + voice biometrics. I don’t think your proposal conflicts with those goals; I just wanted to mention them since the events emitted as part of these recognition activities should include a reference to the ID of the action that started them.
>  
> /BAK/
>  
> --  
> Ben Klang
> Principal/Technology Strategist, Mojo Lingo
> bklang at mojolingo.com
> +1.404.475.4841
>  
> Mojo Lingo -- Voice applications that work like magic
> http://mojolingo.com
>  
> Twitter: @MojoLingo
>  
>  
>  




More information about the asterisk-app-dev mailing list