[asterisk-speech-rec] New ASR

Allann Jones allanjos at gmail.com
Sat Jan 24 23:37:59 CST 2009


You must change the recognizer state to AST_SPEECH_STATE_READY
(speech->state) for the recognizer to become ready. You control the
recognizer states throughout its lifecycle: when it is ready to start, when
it is waiting for results, when it has received results, and when it must
stop; app_speech_utils.c works based on those states. I have implemented a
recognition module for the enterprise I work for, and the API works best
with the SpeechBackground application (see the implementation of
speech_background() in app_speech_utils.c). SpeechStart (speech_start() in
app_speech_utils.c) is a basic implementation and does not start some of the
resources that speech_background() does. SpeechBackground is more complete
and convenient: you call it and change the recognizer states as needed. If
you don't want background playback, you can use an empty audio file (echo
problems can occur at this point depending on the telephony card; they can
be fixed with echo cancellation if it is supported). Look at the references
to AST_SPEECH_STATE_READY inside speech_background() in app_speech_utils.c
and you will see the solution.

The hierarchical tree is:

app_speech_utils
    |
    v
res_speech

app_speech_utils.c implements functions that call the res_speech.c
functions. Pay attention to the explanation of the recognizer states. For a
basic implementation, begin by implementing AST_SPEECH_STATE_READY,
AST_SPEECH_STATE_NOT_READY and AST_SPEECH_STATE_DONE; add the other states
after that.

The API only works with short signed linear audio, as shown in the code from
speech_create() in app_speech_utils.c:

speech = ast_speech_new(data, AST_FORMAT_SLINEAR);

(ast_speech_new:
http://www.asterisk.org/doxygen/1.4/speech_8h.html#92756eef3e31400803fd6fb93c3eaaab
AST_FORMAT_SLINEAR:
http://www.asterisk.org/doxygen/1.4/frame_8h.html#a68ce7f14882005613a3e1fb0f4181b7)


See these parts of the documentation (pay attention to the recognizer state
information):

-----
ast_speech_start(speech);

  This essentially tells the speech recognition engine that you will be
feeding audio to it from then on. It MUST be called every time before you
start feeding audio to the speech structure.

- Send audio to be recognized:

int ast_speech_write(struct ast_speech *speech, void *data, int len)

res = ast_speech_write(speech, fr->data, fr->datalen);

  This writes audio to the speech structure that will then be recognized.
It must be written as signed linear only at this time. In the future other
formats may be supported.

- Checking for results:

  The way the generic speech recognition API is written is that the speech
structure will undergo state changes to indicate the progress of
recognition. The states are outlined below:

AST_SPEECH_STATE_NOT_READY - The speech structure is not ready to accept
audio
AST_SPEECH_STATE_READY - You may write audio to the speech structure
AST_SPEECH_STATE_WAIT - No more audio should be written, and results will
be available soon
AST_SPEECH_STATE_DONE - Results are available and the speech structure can
only be used again by calling ast_speech_start

  It is up to you to monitor these states. The current state is available
via the state variable on the speech structure.


- SpeechBackground(Sound File|Timeout):

  This application plays a sound file and waits for the person to speak.
Once they start speaking, playback of the file stops and silence is heard.
Once they stop talking, the processing sound is played to indicate that the
speech recognition engine is working. Note that it is possible to have more
than one result. The first argument is the sound file and the second is the
timeout. Note that the timeout will only start once the sound file has
stopped playing.
-----


On Sat, Jan 24, 2009 at 2:42 PM, Renato Cassaca <
renato.cassaca at voiceinteraction.pt> wrote:
> I started to test the integration of my speech recognizer but not
> everything is going as expected....
>
> - Is ast_speech_engine->start supposed to be a synchronous function?
>         exten => 1000,1,Answer()
>         exten => 1000,n,SpeechCreate(Audimus)
>         exten => 1000,n,SpeechActivateGrammar(digitos-unidades)
>         exten => 1000,n,SpeechStart()
>         exten => 1000,n,Background(hello-world)
>         exten => 1000,n,SpeechDeactivateGrammar(digitos-unidades)
>         exten => 1000,n,Goto(internal-${SPEECH_TEXT(0)})
>
>  I have the above dialplan (copied from the docs) and what is happening is:
>     -- Executing [1000@phones:1] Answer("SIP/1000-0334ea70", "") in new
> stack
>     -- Executing [1000@phones:2] SpeechCreate("SIP/1000-0334ea70",
> "Audimus") in new stack
>     -- Executing [1000@phones:3] SpeechActivateGrammar("SIP/1000-0334ea70",
> "digitos-unidades") in new stack
>     -- Executing [1000@phones:4] SpeechStart("SIP/1000-0334ea70", "") in
> new stack
>     -- Executing [1000@phones:5] BackGround("SIP/1000-0334ea70",
> "hello-world") in new stack
>     -- <SIP/1000-0334ea70> Playing 'hello-world' (language 'en')
>     -- Executing [1000@phones:6]
> SpeechDeactivateGrammar("SIP/1000-0334ea70", "digitos-unidades") in new
> stack
>
>  There is no explicit wait for engine results and there's no call to
> ast_speech_engine->write (no audio is being sent to the ASR).
>
>  From the functions in ast_speech_engine which of them should be
> synchronous?
>  How is ast_speech->state affecting the Asterisk behavior? (if you point
> me to the source file, I can check it myself)
>  What else should be done to have audio streamed to my engine?
>
> Renato
>
>
> Joshua Colp wrote:
>
> ----- "Renato Cassaca" wrote:
>
>
>
> I'm finishing the ASR integration and I have a few more questions
> (hopefully, the last ones):
> - ast_speech_engine->get(...): returns the next available result or
> all pending available results?
>
>
> It returns a linked list of results sorted by score.
>
>
>
> - ast_speech_engine->dtmf(...): what is the expected engine behavior?
> - stop the recognition, ignoring the results that are being processed
> (but not finalized yet)
> - stop the recognition but produce all results that are being
> processed (and can be finalized with the received audio)
>
>
> This callback is purely informational. You do not need to implement it.
>
>
>
> - ast_speech_engine->list: it's managed by Asterisk, I don't have to
> do anything with it. Right?
>
>
> Right.
>
>
>
> - ast_speech_engine->activate(...grammar...): the activated grammar is
> exclusive or incremental?
> That means, if the ASR has already an activated grammar, should the
> new one be added to them or should all current ASR grammars be
> replaced by the new one? The interpretation of this will influence the
> implementation of deactivate...
>
>
> This depends on the engine itself... you can implement it whichever way
> you want. I would say have it so that you can have multiple grammars at
> once though. This is what people would probably expect.
>
>
>



-- 
_______________________________
Allann J. O. Silva

"I received the fundamentals of my education in school, but that was not
enough. My real education, the superstructure, the details, the true
architecture, I got out of the public library. For an impoverished child
whose family could not afford to buy books, the library was the open door to
wonder and achievement, and I can never be sufficiently grateful that I had
the wit to charge through that door and make the most of it." (from I.
Asimov, 1994)

