[asterisk-speech-rec] New ASR
Renato Cassaca
renato.cassaca at voiceinteraction.pt
Mon Jan 26 15:49:52 CST 2009
I now have my speech recognizer integrated with Asterisk!
The engine is called Audimus and it's made by VoiceInteraction.pt.
It supports the following languages:
- European Portuguese
- Brazilian Portuguese
- Angolan Portuguese
- European Spanish (Spain)
- Some Latin American Spanish variants
- US English
Thanks, Joshua and Allann, for the valuable help you gave me!
Renato
Allann Jones wrote:
>
> You must set the recognizer state (speech->state) to
> AST_SPEECH_STATE_READY for the recognizer to be ready. You control the
> recognizer states: when it is ready to start, when it is waiting for
> results, when it has received the results, and when it must stop.
> app_speech_utils.c works based on these states. I have implemented a
> recognition module for the company I work for, and the API works better
> with SpeechBackground (see the speech_background() function in
> app_speech_utils.c). SpeechStart (speech_start() in app_speech_utils.c)
> is only a basic implementation and does not start some of the resources
> that speech_background() does. SpeechBackground is more complete and
> convenient: you call it and change the recognizer states as needed. If
> you don't want background playback (which can cause echo problems
> depending on the telephony card; these can be fixed with echo
> cancellation if the card supports it), you can use an empty audio
> file. See the references to AST_SPEECH_STATE_READY inside
> speech_background() in app_speech_utils.c and you will see the solution.
>
> The call hierarchy is:
>
> app_speech_utils
> |
> v
> res_speech
>
> app_speech_utils.c implements functions that call the res_speech.c
> functions. Pay attention to the explanation of the recognizer states.
> Begin by implementing AST_SPEECH_STATE_READY, AST_SPEECH_STATE_NOT_READY
> and AST_SPEECH_STATE_DONE for a basic implementation, then add the
> other states.
>
> The API only works with signed linear audio, as shown in the code from
> speech_create() in app_speech_utils.c:
> speech = ast_speech_new(data, AST_FORMAT_SLINEAR);
>
>
>
> See these parts of the documentation (pay attention to the recognizer
> state information):
>
> -----
> ast_speech_start(speech);
>
> This essentially tells the speech recognition engine that you will be
> feeding audio to it from then on. It MUST be called every time before
> you start feeding audio to the speech structure.
>
> - Send audio to be recognized:
>
> int ast_speech_write(struct ast_speech *speech, void *data, int len)
>
> res = ast_speech_write(speech, fr->data, fr->datalen);
>
> This writes audio to the speech structure that will then be
> recognized. The audio must be written as signed linear only at this
> time. In the future other formats may be supported.
>
> - Checking for results:
>
> The way the generic speech recognition API is written is that the
> speech structure will undergo state changes to indicate progress of
> recognition. The states are outlined below:
>
> AST_SPEECH_STATE_NOT_READY - The speech structure is not ready to
>     accept audio
> AST_SPEECH_STATE_READY - You may write audio to the speech structure
> AST_SPEECH_STATE_WAIT - No more audio should be written, and results
>     will be available soon
> AST_SPEECH_STATE_DONE - Results are available and the speech structure
>     can only be used again by calling ast_speech_start
>
> It is up to you to monitor these states. The current state is
> available via a variable on the speech structure (state).
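
The state monitoring described above can be sketched as a small caller-side loop. This is only an illustration under stated assumptions: the `mock_*` names, the `end_of_speech` threshold, and the `feed_until_done()` helper are hypothetical stand-ins, not the real types or functions from Asterisk's speech.h. A real engine changes `state` on its own; here `mock_finish()` simulates that step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the four states; NOT the real Asterisk enum. */
enum mock_state {
	MOCK_STATE_NOT_READY,
	MOCK_STATE_READY,
	MOCK_STATE_WAIT,
	MOCK_STATE_DONE
};

/* Hypothetical speech structure carrying the state the caller must monitor. */
struct mock_speech {
	enum mock_state state;
	size_t bytes_written;  /* audio accepted so far */
	size_t end_of_speech;  /* mock engine stops listening after this many bytes */
};

/* Mock of "start": the engine becomes ready to accept audio. */
void mock_start(struct mock_speech *s)
{
	s->bytes_written = 0;
	s->state = MOCK_STATE_READY;
}

/* Mock of "write": audio is accepted only while READY. Once enough audio
 * has arrived the engine moves to WAIT. Returns 0 on success, -1 otherwise. */
int mock_write(struct mock_speech *s, const void *data, size_t len)
{
	(void)data; /* the mock does not inspect the samples */
	if (s->state != MOCK_STATE_READY)
		return -1;
	s->bytes_written += len;
	if (s->bytes_written >= s->end_of_speech)
		s->state = MOCK_STATE_WAIT;
	return 0;
}

/* Mock of the engine finishing its processing: WAIT becomes DONE. */
void mock_finish(struct mock_speech *s)
{
	if (s->state == MOCK_STATE_WAIT)
		s->state = MOCK_STATE_DONE;
}

/* Caller-side loop: write frames while READY, poll while WAIT, stop at DONE.
 * Returns the number of frames written, or -1 on an unexpected state. */
int feed_until_done(struct mock_speech *s, const short *frame,
		    size_t frame_bytes, int max_frames)
{
	int written = 0;
	int i;

	assert(s != NULL && frame != NULL);
	mock_start(s);
	for (i = 0; i < max_frames && s->state != MOCK_STATE_DONE; i++) {
		switch (s->state) {
		case MOCK_STATE_READY:
			if (mock_write(s, frame, frame_bytes) == 0)
				written++;
			break;
		case MOCK_STATE_WAIT:
			mock_finish(s); /* a real engine would do this itself */
			break;
		default:
			return -1;
		}
	}
	return written;
}
```

The point of the sketch is that writing audio is gated on the READY state, and that nothing more is written once the engine has moved to WAIT or DONE.
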
>
>
> - SpeechBackground(Sound File|Timeout):
>
> This application plays a sound file and waits for the person to
> speak. Once they start speaking, playback of the file stops, and
> silence is heard. Once they stop talking, the processing sound is
> played to indicate the speech recognition engine is working. Note it
> is possible to have more than one result. The first argument is the
> sound file and the second is the timeout. Note the timeout will only
> start once the sound file has stopped playing.
> -----
>
>
> On Sat, Jan 24, 2009 at 2:42 PM, Renato Cassaca
> <renato.cassaca at voiceinteraction.pt> wrote:
> > I started to test the integration of my speech recognizer but not
> > everything is going as expected....
> >
> > - Is ast_speech_engine->start supposed to be a synchronous function?
> > exten => 1000,1,Answer()
> > exten => 1000,n,SpeechCreate(Audimus)
> > exten => 1000,n,SpeechActivateGrammar(digitos-unidades)
> > exten => 1000,n,SpeechStart()
> > exten => 1000,n,Background(hello-world)
> > exten => 1000,n,SpeechDeactivateGrammar(digitos-unidades)
> > exten => 1000,n,Goto(internal-${SPEECH_TEXT(0)})
> >
> > I have the above dialplan (copied from the docs) and what is happening is:
> > -- Executing [1000@phones:1] Answer("SIP/1000-0334ea70", "") in new stack
> > -- Executing [1000@phones:2] SpeechCreate("SIP/1000-0334ea70", "Audimus") in new stack
> > -- Executing [1000@phones:3] SpeechActivateGrammar("SIP/1000-0334ea70", "digitos-unidades") in new stack
> > -- Executing [1000@phones:4] SpeechStart("SIP/1000-0334ea70", "") in new stack
> > -- Executing [1000@phones:5] BackGround("SIP/1000-0334ea70", "hello-world") in new stack
> > -- <SIP/1000-0334ea70> Playing 'hello-world' (language 'en')
> > -- Executing [1000@phones:6] SpeechDeactivateGrammar("SIP/1000-0334ea70", "digitos-unidades") in new stack
> >
> > There is no explicit wait for engine results and there's no call to
> > ast_speech_engine->write (no audio is being sent to the ASR).
> >
> > Which of the functions in ast_speech_engine should be synchronous?
> > How does ast_speech->state affect Asterisk's behavior? (If you point
> > me to the source file, I can check it myself.)
> > What else should be done to have audio streamed to my engine?
> >
> > Renato
> >
> >
> > Joshua Colp wrote:
> >
> > ----- "Renato Cassaca" wrote:
> >
> >
> >
> > I'm finishing the ASR integration and I have a few more questions
> > (hopefully, the last ones):
> > - ast_speech_engine->get(...): returns the next available result or
> > all pending available results?
> >
> >
> > It returns a linked list of results sorted by score.
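
As an illustration of what "a linked list of results sorted by score" could look like on the engine side, here is a minimal sketch. It assumes that a higher score means a better match and that the best result comes first; the `mock_result` struct and the `result_insert()` / `result_free()` helpers are hypothetical and are not the real struct ast_speech_result or any Asterisk function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result node; the real Asterisk result structure may differ. */
struct mock_result {
	char *text;               /* recognized text (owned by the node) */
	int score;                /* assumed: higher means more confident */
	struct mock_result *next; /* next result, with an equal or lower score */
};

/* Insert a copy of (text, score) so the list stays sorted by descending
 * score. Returns the possibly new head; on allocation failure the list is
 * returned unchanged. */
struct mock_result *result_insert(struct mock_result *head,
				  const char *text, int score)
{
	struct mock_result *node, **link;
	size_t len;

	assert(text != NULL);
	node = malloc(sizeof(*node));
	if (!node)
		return head;
	len = strlen(text) + 1;
	node->text = malloc(len);
	if (!node->text) {
		free(node);
		return head;
	}
	memcpy(node->text, text, len);
	node->score = score;

	/* Walk to the first entry whose score is lower than the new one. */
	for (link = &head; *link && (*link)->score >= score; link = &(*link)->next)
		;
	node->next = *link;
	*link = node;
	return head;
}

/* Release every node and its text. */
void result_free(struct mock_result *head)
{
	while (head) {
		struct mock_result *next = head->next;
		free(head->text);
		free(head);
		head = next;
	}
}
```

With this ordering the caller can take the head of the list as the best hypothesis and walk `next` for the alternatives.
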
> >
> >
> >
> > - ast_speech_engine->dtmf(...): what is the expected engine behavior?
> > - stop the recognition, ignoring the results that are being processed
> > (but not finalized yet)
> > - stop the recognition but produce all results that are being
> > processed (and can be finalized with the received audio)
> >
> >
> > This callback is purely informational. You do not need to implement it.
> >
> >
> >
> > - ast_speech_engine->list: it's managed by Asterisk, so I don't have
> > to do anything with it. Right?
> >
> >
> > Right.
> >
> >
> >
> > - ast_speech_engine->activate(...grammar...): is the activated grammar
> > exclusive or incremental?
> > That is, if the ASR already has an active grammar, should the new one
> > be added alongside it, or should all current ASR grammars be replaced
> > by the new one? The interpretation of this will influence the
> > implementation of deactivate...
> >
> >
> > This depends on the engine itself... you can implement it whichever
> > way you want. I would say have it so that you can have multiple
> > grammars at once though. This is what people would probably expect.
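
One way to read the "multiple grammars at once" suggestion is for the engine to keep a set of active grammar names, so that activate adds to the set and deactivate removes only the named grammar. The sketch below is an assumption about how an engine might track this internally; `grammar_set`, the fixed limits, and the three helpers are hypothetical and are not part of the Asterisk API.

```c
#include <assert.h>
#include <string.h>

#define MAX_ACTIVE 8   /* hypothetical limit on simultaneously active grammars */
#define MAX_NAME   64  /* hypothetical limit on a grammar name, including NUL */

/* Hypothetical per-recognizer set of active grammar names. */
struct grammar_set {
	char names[MAX_ACTIVE][MAX_NAME];
	int count;
};

/* Return the index of name in the set, or -1 if it is not active. */
int grammar_find(const struct grammar_set *set, const char *name)
{
	int i;

	for (i = 0; i < set->count; i++)
		if (strcmp(set->names[i], name) == 0)
			return i;
	return -1;
}

/* Incremental activation: add name to the set and keep the others.
 * Returns 0 on success (or if already active), -1 if the set is full
 * or the name is too long. */
int grammar_activate(struct grammar_set *set, const char *name)
{
	assert(set != NULL && name != NULL);
	if (grammar_find(set, name) >= 0)
		return 0;
	if (set->count >= MAX_ACTIVE || strlen(name) >= MAX_NAME)
		return -1;
	strcpy(set->names[set->count++], name);
	return 0;
}

/* Deactivate only the named grammar. Returns 0 on success, -1 if the
 * grammar was not active. */
int grammar_deactivate(struct grammar_set *set, const char *name)
{
	int i;

	assert(set != NULL && name != NULL);
	i = grammar_find(set, name);
	if (i < 0)
		return -1;
	/* Fill the hole with the last entry; order is not preserved. */
	set->count--;
	if (i != set->count)
		strcpy(set->names[i], set->names[set->count]);
	return 0;
}
```

An exclusive design would instead clear the set inside the activate step, which is why the choice affects how deactivate has to behave.
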
> >
> >
> >
> > _______________________________________________
> > --Bandwidth and Colocation Provided by http://www.api-digital.com--
> >
> > asterisk-speech-rec mailing list
> > To UNSUBSCRIBE or update options visit:
> > http://lists.digium.com/mailman/listinfo/asterisk-speech-rec
> >
>
>
>
> --
> _______________________________
> Allann J. O. Silva
>
> "I received the fundamentals of my education in school, but that was
> not enough. My real education, the superstructure, the details, the
> true architecture, I got out of the public library. For an
> impoverished child whose family could not afford to buy books, the
> library was the open door to wonder and achievement, and I can never
> be sufficiently grateful that I had the wit to charge through that
> door and make the most of it." (from I. Asimov, 1994)
>