<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
I now have my speech recognizer integrated with Asterisk!<br>
<br>
The engine is called Audimus and it's made by VoiceInteraction.pt.<br>
It supports the following languages:<br>
- European Portuguese<br>
- Brazilian Portuguese<br>
- Angolan Portuguese<br>
- European Spanish<br>
- Some Latin American Spanish variants<br>
- US English<br>
<br>
Thanks, Joshua and Allann, for the valuable help you provided!<br>
<br>
Renato<br>
<br>
<br>
Allann Jones wrote:
<blockquote
cite="mid:7a2435710901242137m3e418b4sa0e2aeaee929599a@mail.gmail.com"
type="cite"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">You must set the
recognizer state (speech->state) to AST_SPEECH_STATE_READY for the
recognizer to be ready. You control the recognizer states: when it is
ready to start, when it is waiting for results, when it has received
the results, and when it must stop. app_speech_utils.c works based on
these states. I have implemented a recognition module for the company
I work for, and the API works better with the SpeechBackground
application (see its implementation in the speech_background() function
in app_speech_utils.c). SpeechStart (speech_start() in
app_speech_utils.c) is only a basic implementation of the recognizer and
does not start some of the resources that speech_background() does.
SpeechBackground is more complete and convenient: you call it and
change the recognizer states as needed. If you don't want background
playback (at this point echo problems can occur, depending on the
telephony card; they can be fixed with echo cancellation if it is
supported), you can use an empty audio file. See
the references to </span><span
style="font-family: courier new,monospace;">AST_SPEECH_STATE_READY
inside speech_background() in app_speech_utils.c and you will see the
solution.<br style="font-family: courier new,monospace;">
</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">The call
hierarchy is:</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">app_speech_utils</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> |</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> v</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">res_speech</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">app_speech_utils.c
implements functions that call the res_speech.c functions. Pay attention
to the explanation of the recognizer states. For a basic implementation,
begin by handling AST_SPEECH_STATE_READY, AST_SPEECH_STATE_NOT_READY and
AST_SPEECH_STATE_DONE, then add the
other states.</span><br style="font-family: courier new,monospace;">
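<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">A minimal sketch of
that first step, written as standalone C so it compiles without the
Asterisk headers. The demo_* names, the local enum (standing in for
AST_SPEECH_STATE_NOT_READY, AST_SPEECH_STATE_READY and
AST_SPEECH_STATE_DONE) and the 640-byte threshold are assumptions made
for illustration; they are not part of res_speech:</span><br
style="font-family: courier new,monospace;">

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the three states named above. Real engine code
 * would use the AST_SPEECH_STATE_* values from asterisk/speech.h. */
enum demo_state { DEMO_NOT_READY, DEMO_READY, DEMO_DONE };

/* Hypothetical, stripped-down speech structure: state plus a byte count. */
struct demo_speech {
    enum demo_state state;
    size_t bytes_received;
};

/* Assumed threshold: the mock "decides" once this many bytes have arrived. */
#define DEMO_BYTES_NEEDED 640

/* A freshly created structure is not ready to accept audio. */
void demo_create(struct demo_speech *speech)
{
    assert(speech != NULL);
    speech->state = DEMO_NOT_READY;
    speech->bytes_received = 0;
}

/* "start": reset the counter and mark the recognizer READY so the
 * caller begins feeding audio. */
int demo_start(struct demo_speech *speech)
{
    assert(speech != NULL);
    speech->bytes_received = 0;
    speech->state = DEMO_READY;
    return 0;
}

/* "write": accept audio only while READY, and switch to DONE once enough
 * audio has arrived. A real engine would do end-of-speech detection and
 * decoding here instead of counting bytes. */
int demo_write(struct demo_speech *speech, const void *data, size_t len)
{
    assert(speech != NULL);
    (void)data; /* the mock never inspects the samples */
    if (speech->state != DEMO_READY) {
        return -1; /* audio offered in the wrong state is rejected */
    }
    speech->bytes_received += len;
    if (speech->bytes_received >= DEMO_BYTES_NEEDED) {
        speech->state = DEMO_DONE;
    }
    return 0;
}
```

<span style="font-family: courier new,monospace;">In a real engine
module the same transitions would presumably be applied to
speech->state from the engine's own start and write callbacks.</span><br
style="font-family: courier new,monospace;">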
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">The API only works
with signed linear audio, as shown in this code from speech_create() in
app_speech_utils.c:</span><br
style="font-family: courier new,monospace;">
<pre style="font-family: courier new,monospace;" class="fragment">speech = <a
moz-do-not-send="true" class="code"
href="http://www.asterisk.org/doxygen/1.4/speech_8h.html#92756eef3e31400803fd6fb93c3eaaab"
title="Create a new speech structure.">ast_speech_new</a>(data, <a
moz-do-not-send="true" class="code"
href="http://www.asterisk.org/doxygen/1.4/frame_8h.html#a68ce7f14882005613a3e1fb0f4181b7">AST_FORMAT_SLINEAR</a>);
</pre>
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">See these parts of
the documentation (pay attention to the recognizer state information):</span><br
style="font-family: courier new,monospace;">
<br>
-----<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">ast_speech_start(speech);</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> This essentially
tells the speech recognition engine that you will be feeding audio to
it from </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">then on. It MUST be
called every time before you start feeding audio to the speech
structure.</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">- Send audio to be
recognized:</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> int
ast_speech_write(struct ast_speech *speech, void *data, int len)</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> res =
ast_speech_write(speech, fr->data, fr->datalen);</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> This writes audio
to the speech structure that will then be recognized. It must be
written </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">signed linear only
at this time. In the future other formats may be supported.</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">- Checking for
results:</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> The way the
generic speech recognition API is written is that the speech structure
will </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">undergo state
changes to indicate progress of recognition. The states are outlined
below:</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">
AST_SPEECH_STATE_NOT_READY - The speech structure is not ready to
accept audio</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">
AST_SPEECH_STATE_READY - You may write audio to the speech structure</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">
AST_SPEECH_STATE_WAIT - No more audio should be written, and results
will be available soon.</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">
AST_SPEECH_STATE_DONE - Results are available and the speech structure
can only be used again by </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> calling
ast_speech_start</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> It is up to you
to monitor these states. Current state is available via a variable on
the speech </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">structure. (state)</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">-
SpeechBackground(Sound File|Timeout):</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> This application
plays a sound file and waits for the person to speak. Once they start </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">speaking playback
of the file stops, and silence is heard. Once they stop talking the </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">processing sound is
played to indicate the speech recognition engine is working. Note it is
</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">possible to have
more than one result. The first argument is the sound file and the
second is the </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">timeout. Note the
timeout will only start once the sound file has stopped playing.</span><br
style="font-family: courier new,monospace;">
-----<br style="font-family: courier new,monospace;">
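<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">The state monitoring
described in the excerpt above could be sketched as follows. This is
standalone C with a mock recognizer, not Asterisk code: the demo_*
names, the local enum mirroring the four AST_SPEECH_STATE_* values, and
the rule that the mock moves to WAIT after three frames and to DONE on
the second poll are all assumptions chosen for illustration.</span><br
style="font-family: courier new,monospace;">

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the four states listed in the excerpt. */
enum demo_state { DEMO_NOT_READY, DEMO_READY, DEMO_WAIT, DEMO_DONE };

/* Hypothetical mock recognizer: it only counts frames and polls. */
struct demo_speech {
    enum demo_state state;
    int frames_written;
    int polls_while_waiting;
};

/* Must be called before feeding audio, and again after DONE. */
void demo_start(struct demo_speech *s)
{
    assert(s != NULL);
    s->state = DEMO_READY;
    s->frames_written = 0;
    s->polls_while_waiting = 0;
}

/* Accept a frame only while READY. After the third frame the mock
 * pretends the caller stopped talking and moves to WAIT. */
int demo_write(struct demo_speech *s, const short *samples, size_t count)
{
    assert(s != NULL && samples != NULL);
    (void)count; /* the mock ignores the frame length */
    if (s->state != DEMO_READY) {
        return -1;
    }
    if (++s->frames_written == 3) {
        s->state = DEMO_WAIT;
    }
    return 0;
}

/* While in WAIT, results become available on the second poll. */
void demo_poll(struct demo_speech *s)
{
    assert(s != NULL);
    if (s->state == DEMO_WAIT && ++s->polls_while_waiting == 2) {
        s->state = DEMO_DONE;
    }
}

/* Caller-side loop: write audio while READY, stop writing and poll while
 * WAIT, finish on DONE. Returns the number of frames written, or -1 if
 * DONE is not observed within max_iterations. */
int demo_recognize(struct demo_speech *s, const short *frame,
                   size_t count, int max_iterations)
{
    int i;

    demo_start(s);
    for (i = 0; i < max_iterations; i++) {
        switch (s->state) {
        case DEMO_READY:
            demo_write(s, frame, count);
            break;
        case DEMO_WAIT:
            demo_poll(s); /* no more audio should be written here */
            break;
        case DEMO_DONE:
            return s->frames_written;
        case DEMO_NOT_READY:
            return -1;
        }
    }
    return -1;
}
```

<span style="font-family: courier new,monospace;">With this mock,
demo_recognize() hands three frames to the recognizer before results
are reported. In Asterisk the engine itself decides when each
transition happens, so the caller only has to check the state before
every write.</span><br style="font-family: courier new,monospace;">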
<br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">On Sat, Jan 24,
2009 at 2:42 PM, Renato Cassaca <<a moz-do-not-send="true"
href="mailto:renato.cassaca@voiceinteraction.pt">renato.cassaca@voiceinteraction.pt</a>>
wrote:</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> I started to
test the integration of my speech recognizer but not everything</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> is going as
expected....</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> - Is
ast_speech_engine->start supposed to be a synchronous function?</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten
=> 1000,1,Answer()</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten
=> 1000,n,SpeechCreate(Audimus)</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten
=> 1000,n,SpeechActivateGrammar(digitos-unidades)</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten
=> 1000,n,SpeechStart()</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten
=> 1000,n,Background(hello-world)</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten
=> 1000,n,SpeechDeactivateGrammar(digitos-unidades)</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten
=> 1000,n,Goto(internal-${SPEECH_TEXT(0)})</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> I have the
above dialplan (copied from docs) and what is happening is:</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --
Executing [1000@phones:1] Answer("SIP/1000-0334ea70", "") in new</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> stack</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --
Executing [1000@phones:2] SpeechCreate("SIP/1000-0334ea70",</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> "Audimus") in
new stack</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --
Executing [1000@phones:3] SpeechActivateGrammar("SIP/1000-0334ea70",</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">>
"digitos-unidades") in new stack</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --
Executing [1000@phones:4] SpeechStart("SIP/1000-0334ea70", "") in new</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> stack</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --
Executing [1000@phones:5] BackGround("SIP/1000-0334ea70",</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> "hello-world")
in new stack</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --
<SIP/1000-0334ea70> Playing 'hello-world' (language 'en')</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --
Executing [1000@phones:6]</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">>
SpeechDeactivateGrammar("SIP/1000-0334ea70", "digitos-unidades") in new</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> stack</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> There is no
explicit wait for engine results and there's no call to</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">>
ast_speech_engine->write (no audio is being sent to the ASR).</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> Which of the
functions in ast_speech_engine should be</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> synchronous?</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> How does
ast_speech->state affect Asterisk's behavior? (If you point</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> me to the source
file, I can check it myself.)</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> What else
should be done to have audio streamed to my engine?</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> Renato</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> Joshua Colp
wrote:</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> ----- "Renato
Cassaca" wrote:</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> I'm finishing
the ASR integration and I have a few more questions</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> (hopefully,
the last ones):</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> -
ast_speech_engine->get(...): returns the next available result or</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> all pending
available results?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> It returns a
linked list of results sorted by score.</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> -
ast_speech_engine->dtmf(...): what is the expected engine behavior?</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> - stop the
recognition, ignoring the results that are being processed</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> (but not
finalized yet)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> - stop the
recognition but produce all results that are being</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> processed (and
can be finalized with the received audio)</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> This callback
is purely informational. You do not need to implement it.</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> -
ast_speech_engine->list: it's managed by Asterisk, I don't have to</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> do anything
with it. Right?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> Right.</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> -
ast_speech_engine->activate(...grammar...): the activated grammar is</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exclusive or
incremental?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> That means, if
the ASR already has an activated grammar, should the</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> new one be
added to them or should all current ASR grammars be</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> replaced by
the new one? The interpretation of this will influence the</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> implementation
of deactivate...</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> This depends
on the engine itself... you can implement it whichever way you</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> want. I would
say have it so that you can have multiple grammars at once</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> though. This
is what people would probably expect. </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">>
_______________________________________________</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> --Bandwidth
and Colocation Provided by <a moz-do-not-send="true"
href="http://www.api-digital.com">http://www.api-digital.com</a>--</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">>
asterisk-speech-rec mailing list</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> To UNSUBSCRIBE
or update options visit:</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> <a
moz-do-not-send="true"
href="http://lists.digium.com/mailman/listinfo/asterisk-speech-rec">http://lists.digium.com/mailman/listinfo/asterisk-speech-rec</a></span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">-- </span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">_______________________________</span><br
style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">Allann J. O. Silva</span><br
style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">"I received the
fundamentals of my education in school, but that was not enough. My
real education, the superstructure, the details, the true architecture,
I got out of the public library. For an impoverished child whose family
could not afford to buy books, the library was the open door to wonder
and achievement, and I can never be sufficiently grateful that I had
the wit to charge through that door and make the most of it." (from I.
Asimov, 1994)</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">
</blockquote>
</body>
</html>