[asterisk-speech-rec] New ASR

Mon Jan 26 15:49:52 CST 2009

I now have my speech recognizer integrated with Asterisk!

The engine is called Audimus and it's made by VoiceInteraction.pt.
It supports the following languages:
 - European Portuguese
 - Brazilian Portuguese
 - Angolan Portuguese
 - European Spanish
 - Some Latin American Spanish variants
 - US English

Thanks, Joshua and Allann, for the valuable help you provided!

Renato

Allann Jones wrote:
>
> You must set the recognizer state (speech->state) to
> AST_SPEECH_STATE_READY for the recognizer to be ready. Your engine
> controls the recognizer states: when it can start (ready), when it is
> waiting for results, when it has received the results, and when it
> must stop. app_speech_utils.c works based on these states.
>
> I have implemented a recognition module for the company I work for,
> and the API works better with the SpeechBackground application (see
> the speech_background() function in app_speech_utils.c). SpeechStart
> (speech_start() in app_speech_utils.c) is only a basic implementation
> and does not set up some of the resources that speech_background()
> does. SpeechBackground is more complete and convenient: you call it
> and change the recognizer states as needed. If you don't want
> background playback, you can use an empty audio file. (Playback can
> cause echo problems depending on the telephony card; echo
> cancellation, if supported, can fix them.) See the references to
> AST_SPEECH_STATE_READY inside speech_background() in
> app_speech_utils.c and you will see the solution.
>
> The hierarchical tree is:
>
> app_speech_utils
>     |
>     v
> res_speech
>
> app_speech_utils.c implements functions that call the res_speech.c
> functions. Pay attention to the explanation of the recognizer states.
> Begin by implementing AST_SPEECH_STATE_READY,
> AST_SPEECH_STATE_NOT_READY and AST_SPEECH_STATE_DONE for a basic
> implementation, then add the other states.
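
A minimal sketch of that three-state starting point, in plain C so it compiles on its own. The enum, struct and function names are stand-ins invented for illustration (they are not the Asterisk types or callbacks), and the 320-byte threshold only simulates an engine deciding it has a result:

```c
#include <stddef.h>

/* Stand-ins for the AST_SPEECH_STATE_* values; NOT the real Asterisk enum. */
enum demo_state { DEMO_NOT_READY, DEMO_READY, DEMO_WAIT, DEMO_DONE };

/* Hypothetical, simplified speech object for this sketch only. */
struct demo_speech {
    enum demo_state state;
    size_t bytes_received;   /* audio accepted since the last start */
};

/* "start" step: reset the counter and mark the recognizer ready for audio. */
int demo_engine_start(struct demo_speech *speech)
{
    speech->bytes_received = 0;
    speech->state = DEMO_READY;
    return 0;
}

/* "write" step: accept audio only while READY.
 * Assumption for the demo: a result exists once 320 bytes have arrived. */
int demo_engine_write(struct demo_speech *speech, const void *data, size_t len)
{
    (void)data;                       /* the sketch does not decode audio */
    if (speech->state != DEMO_READY)
        return -1;                    /* refuse audio in any other state */
    speech->bytes_received += len;
    if (speech->bytes_received >= 320)
        speech->state = DEMO_DONE;    /* results available */
    return 0;
}
```

The point of the sketch is only the ordering: nothing is accepted before the start step moves the state to READY, and once the state is DONE the start step has to run again before more audio is taken.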
>
> The API only works with signed linear audio, as shown in the code from
> speech_create() in app_speech_utils.c:
> speech = ast_speech_new(data, AST_FORMAT_SLINEAR);
>
>   
>
> See these parts of the documentation (pay attention to the recognizer
> state information):
>
> -----
> ast_speech_start(speech);
>
>   This essentially tells the speech recognition engine that you will
> be feeding audio to it from then on. It MUST be called every time
> before you start feeding audio to the speech structure.
>
> - Send audio to be recognized:
>
> int ast_speech_write(struct ast_speech *speech, void *data, int len)
>
> res = ast_speech_write(speech, fr->data, fr->datalen);
>
>   This writes audio to the speech structure that will then be
> recognized. It must be written as signed linear only at this time. In
> the future other formats may be supported.
>
> - Checking for results:
>
>   The way the generic speech recognition API is written is that the
> speech structure will undergo state changes to indicate progress of
> recognition. The states are outlined below:
>
> AST_SPEECH_STATE_NOT_READY - The speech structure is not ready to
>   accept audio
> AST_SPEECH_STATE_READY - You may write audio to the speech structure
> AST_SPEECH_STATE_WAIT - No more audio should be written, and results
>   will be available soon.
> AST_SPEECH_STATE_DONE - Results are available and the speech structure
>   can only be used again by calling ast_speech_start
>
>   It is up to you to monitor these states. Current state is available
> via a variable on the speech structure. (state)
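
Seen from the calling side, monitoring those states could look roughly like the loop below. This is a self-contained sketch: the rec_* names and the fake engine that switches to WAIT after three frames are assumptions made up for the example, not the code in app_speech_utils.c.

```c
#include <stddef.h>

/* Local stand-ins for the four documented states (not the Asterisk header). */
enum rec_state { REC_NOT_READY, REC_READY, REC_WAIT, REC_DONE };

/* Hypothetical recognizer handle used only in this sketch. */
struct rec {
    enum rec_state state;
    int frames_seen;
};

/* Fake engine write: moves to WAIT after 3 frames. A real engine would
 * decide this from its own end-of-speech detection. */
void rec_write(struct rec *r)
{
    if (++r->frames_seen == 3)
        r->state = REC_WAIT;
}

/* Fake engine poll: pretends decoding finishes on the first poll. */
void rec_poll(struct rec *r)
{
    if (r->state == REC_WAIT)
        r->state = REC_DONE;
}

/* Run up to max_steps iterations, checking the state before each one.
 * Returns the number of frames written, or -1 if never started. */
int feed_until_done(struct rec *r, int max_steps)
{
    int written = 0;
    int i;

    if (r->state == REC_NOT_READY)
        return -1;                 /* the start step was not called */

    for (i = 0; i < max_steps && r->state != REC_DONE; i++) {
        switch (r->state) {
        case REC_READY:            /* audio may be written */
            rec_write(r);
            written++;
            break;
        case REC_WAIT:             /* stop writing, wait for results */
            rec_poll(r);
            break;
        default:
            break;
        }
    }
    return written;
}
```

Checking the state on every iteration is what keeps the caller from writing audio after the engine has moved to WAIT.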
>
>
> - SpeechBackground(Sound File|Timeout):
>
>   This application plays a sound file and waits for the person to
> speak. Once they start speaking playback of the file stops, and
> silence is heard. Once they stop talking the processing sound is
> played to indicate the speech recognition engine is working. Note it
> is possible to have more than one result. The first argument is the
> sound file and the second is the timeout. Note the timeout will only
> start once the sound file has stopped playing.
> -----
>
>
> On Sat, Jan 24, 2009 at 2:42 PM, Renato Cassaca
> <renato.cassaca at voiceinteraction.pt> wrote:
> > I started to test the integration of my speech recognizer but not
> > everything is going as expected....
> >
> > - Is ast_speech_engine->start supposed to be a synchronous function?
> >         exten => 1000,1,Answer()
> >         exten => 1000,n,SpeechCreate(Audimus)
> >         exten => 1000,n,SpeechActivateGrammar(digitos-unidades)
> >         exten => 1000,n,SpeechStart()
> >         exten => 1000,n,Background(hello-world)
> >         exten => 1000,n,SpeechDeactivateGrammar(digitos-unidades)
> >         exten => 1000,n,Goto(internal-${SPEECH_TEXT(0)})
> >
> >  I have the above dialplan (copied from the docs) and what is happening is:
> >     -- Executing [1000@phones:1] Answer("SIP/1000-0334ea70", "") in new stack
> >     -- Executing [1000@phones:2] SpeechCreate("SIP/1000-0334ea70", "Audimus") in new stack
> >     -- Executing [1000@phones:3] SpeechActivateGrammar("SIP/1000-0334ea70", "digitos-unidades") in new stack
> >     -- Executing [1000@phones:4] SpeechStart("SIP/1000-0334ea70", "") in new stack
> >     -- Executing [1000@phones:5] BackGround("SIP/1000-0334ea70", "hello-world") in new stack
> >     -- <SIP/1000-0334ea70> Playing 'hello-world' (language 'en')
> >     -- Executing [1000@phones:6] SpeechDeactivateGrammar("SIP/1000-0334ea70", "digitos-unidades") in new stack
> >
> >  There is no explicit wait for engine results and there's no call to
> > ast_speech_engine->write (no audio is being sent to the ASR).
> >
> >  Which of the functions in ast_speech_engine should be synchronous?
> >  How does ast_speech->state affect the Asterisk behavior? (If you point
> > me to the source file, I can check it myself.)
> >  What else should be done to have audio streamed to my engine?
> >
> > Renato
> >
> >
> > Joshua Colp wrote:
> >
> > ----- "Renato Cassaca" wrote:
> >
> >  
> >
> > I'm finishing the ASR integration and I have a few more questions
> > (hopefully, the last ones):
> > - ast_speech_engine->get(...): does it return the next available result
> > or all pending available results?
> >    
> >
> > It returns a linked list of results sorted by score.
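
For an engine author, building such a list could be sketched as below. The struct and function are hypothetical and only mirror the idea of text, score and a next pointer; they are not the Asterisk result structure. Putting the highest score first is also an assumption, since the answer above only says the list is sorted by score.

```c
#include <stddef.h>

/* Hypothetical result node for this sketch (not the Asterisk structure). */
struct demo_result {
    const char *text;            /* recognized text */
    int score;                   /* engine confidence, higher is better */
    struct demo_result *next;
};

/* Insert node so that higher scores come first (assumed ordering).
 * Returns the head of the list, which changes if node is the new best. */
struct demo_result *demo_result_insert(struct demo_result *head,
                                       struct demo_result *node)
{
    struct demo_result **link = &head;

    /* Walk past every entry that scores at least as well as node. */
    while (*link != NULL && (*link)->score >= node->score)
        link = &(*link)->next;

    node->next = *link;
    *link = node;
    return head;
}
```

Inserting each hypothesis this way keeps the list ordered at all times, so the head can be handed back as soon as results are requested.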
> >  
> >  
> >
> > - ast_speech_engine->dtmf(...): what is the expected engine behavior?
> > - stop the recognition, ignoring the results that are being processed
> > (but not finalized yet)
> > - stop the recognition but produce all results that are being
> > processed (and can be finalized with the received audio)
> >    
> >
> > This callback is purely informational. You do not need to implement it.
> >  
> >  
> >
> > - ast_speech_engine->list: it's managed by Asterisk, so I don't have to
> > do anything with it. Right?
> >    
> >
> > Right.
> >  
> >  
> >
> > - ast_speech_engine->activate(...grammar...): is the activated grammar
> > exclusive or incremental?
> > That is, if the ASR already has an active grammar, should the new one
> > be added to it, or should all current ASR grammars be replaced by the
> > new one? The interpretation of this will influence the
> > implementation of deactivate...
> >    
> >
> > This depends on the engine itself... you can implement it whichever way
> > you want. I would say have it so that you can have multiple grammars at
> > once though. This is what people would probably expect.
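
One way to get the "multiple grammars at once" behavior is to keep a small set of active grammar names per recognizer, so that deactivate removes only the named grammar. The sketch below is self-contained; the struct, the function names, the fixed capacity of 8 and the "yes-no" grammar name in the usage note are all assumptions for illustration, not part of the Asterisk API.

```c
#include <string.h>

#define DEMO_MAX_GRAMMARS 8      /* arbitrary capacity for the sketch */

/* Hypothetical per-recognizer set of active grammar names. */
struct demo_grammars {
    const char *names[DEMO_MAX_GRAMMARS];
    int count;
};

/* Return the index of name in the set, or -1 if it is not active. */
int demo_find(const struct demo_grammars *g, const char *name)
{
    int i;

    for (i = 0; i < g->count; i++)
        if (strcmp(g->names[i], name) == 0)
            return i;
    return -1;
}

/* Incremental activate: add the grammar and keep the others active.
 * Activating an already active grammar is treated as success. */
int demo_activate(struct demo_grammars *g, const char *name)
{
    if (demo_find(g, name) >= 0)
        return 0;
    if (g->count == DEMO_MAX_GRAMMARS)
        return -1;               /* set is full */
    g->names[g->count++] = name;
    return 0;
}

/* Deactivate only the named grammar; returns -1 if it was not active. */
int demo_deactivate(struct demo_grammars *g, const char *name)
{
    int i = demo_find(g, name);

    if (i < 0)
        return -1;
    g->names[i] = g->names[--g->count];   /* move the last entry into the gap */
    return 0;
}
```

With this layout, activating "digitos-unidades" and then "yes-no" leaves both active, and deactivating the first leaves the second untouched.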
> >
> >  
> >
> > _______________________________________________
> > --Bandwidth and Colocation Provided by http://www.api-digital.com--
> >
> > asterisk-speech-rec mailing list
> > To UNSUBSCRIBE or update options visit:
> >   http://lists.digium.com/mailman/listinfo/asterisk-speech-rec
> >
>
>
>
> -- 
> _______________________________
> Allann J. O. Silva
>
> "I received the fundamentals of my education in school, but that was 
> not enough. My real education, the superstructure, the details, the 
> true architecture, I got out of the public library. For an 
> impoverished child whose family could not afford to buy books, the 
> library was the open door to wonder and achievement, and I can never 
> be sufficiently grateful that I had the wit to charge through that 
> door and make the most of it." (from I. Asimov, 1994)
>