<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">For the recognizer to be ready, you must change the recognizer state (speech->state) to AST_SPEECH_STATE_READY. Your engine controls the recognizer states: when it is ready to start, when it is waiting for results, when it has received the results, and when it must stop. app_speech_utils.c works based on these states. I have implemented a recognition module for the company I work for, and the API works better with the SpeechBackground application (see the speech_background() function in app_speech_utils.c). SpeechStart (speech_start() in app_speech_utils.c) is only a basic implementation and does not start some of the resources that speech_background() does. SpeechBackground is more complete and convenient: you call it and change the recognizer states as needed. If you don't want background playback, you can use an empty audio file. (With playback, echo problems can occur depending on the telephony card; echo cancellation can fix them if the card supports it.) See the references to </span><span style="font-family: courier new,monospace;">AST_SPEECH_STATE_READY inside speech_background() in app_speech_utils.c and you will see the solution.<br style="font-family: courier new,monospace;">
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">The hierarchical tree is:</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">app_speech_utils</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> |</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> v</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">res_speech</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">app_speech_utils.c implements functions that call the res_speech.c functions. Pay attention to the explanation of the recognizer states. For a basic implementation, begin with AST_SPEECH_STATE_READY, AST_SPEECH_STATE_NOT_READY and AST_SPEECH_STATE_DONE, then add the other states.</span><br style="font-family: courier new,monospace;">
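<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">As a rough illustration of that state handling, here is a small self-contained C sketch. It does not use the Asterisk headers: every demo_* name is a hypothetical stand-in, and the "utterance ends after bytes_needed bytes" rule is invented only to have something that triggers DONE. In a real engine you would change speech->state from your callbacks instead (as far as I know, res_speech provides ast_speech_change_state() for this).</span><br style="font-family: courier new,monospace;">

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the recognizer states; the real ones are the
 * AST_SPEECH_STATE_* values used by res_speech. */
enum demo_state {
    DEMO_STATE_NOT_READY, /* not accepting audio */
    DEMO_STATE_READY,     /* audio may be written */
    DEMO_STATE_WAIT,      /* no more audio, results pending (unused here) */
    DEMO_STATE_DONE       /* results available, start again to reuse */
};

/* Hypothetical stand-in for the speech structure. */
struct demo_speech {
    enum demo_state state;
    size_t bytes_seen;   /* audio accepted since the last start */
    size_t bytes_needed; /* invented rule: utterance ends after this much */
};

/* Like engine creation: the structure begins in NOT_READY. */
void demo_create(struct demo_speech *s, size_t bytes_needed)
{
    assert(s != NULL);
    s->state = DEMO_STATE_NOT_READY;
    s->bytes_seen = 0;
    s->bytes_needed = bytes_needed;
}

/* Like the engine start callback: reset and move to READY. */
void demo_start(struct demo_speech *s)
{
    s->bytes_seen = 0;
    s->state = DEMO_STATE_READY;
}

/* Like the engine write callback: accept audio only while READY.
 * Returns 0 if the audio was accepted, -1 if it was rejected. */
int demo_write(struct demo_speech *s, const void *data, size_t len)
{
    (void)data; /* a real engine would pass this to the recognizer */
    if (s->state != DEMO_STATE_READY)
        return -1;
    s->bytes_seen += len;
    if (s->bytes_seen >= s->bytes_needed)
        s->state = DEMO_STATE_DONE;
    return 0;
}

/* Application side, in the spirit of speech_background(): keep feeding
 * frames only while the state is READY. Returns the frames consumed. */
size_t demo_feed(struct demo_speech *s, const short *samples,
                 size_t nframes, size_t frame_samples)
{
    size_t i;
    for (i = 0; i < nframes && s->state == DEMO_STATE_READY; i++)
        demo_write(s, samples + i * frame_samples,
                   frame_samples * sizeof(short));
    return i;
}
```

<span style="font-family: courier new,monospace;">The point of the sketch is only the ordering: nothing is written before the start call sets READY, the feeding loop stops as soon as the engine leaves READY, and after DONE the structure must be started again before it accepts more audio.</span><br style="font-family: courier new,monospace;">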
<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">The API only works with signed linear audio, as shown in the code from speech_create in app_speech_utils.c:</span><br style="font-family: courier new,monospace;">
<pre style="font-family: courier new,monospace;" class="fragment">speech = <a class="code" href="http://www.asterisk.org/doxygen/1.4/speech_8h.html#92756eef3e31400803fd6fb93c3eaaab" title="Create a new speech structure.">ast_speech_new</a>(data, <a class="code" href="http://www.asterisk.org/doxygen/1.4/frame_8h.html#a68ce7f14882005613a3e1fb0f4181b7">AST_FORMAT_SLINEAR</a>);<br>
</pre><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">See these parts of the documentation (pay attention to the recognizer state information):</span><br style="font-family: courier new,monospace;">
<br>-----<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">ast_speech_start(speech);</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> This essentially tells the speech recognition engine that you will be feeding audio to it from </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">then on. It MUST be called every time before you start feeding audio to the speech structure.</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">- Send audio to be recognized:</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        int ast_speech_write(struct ast_speech *speech, void *data, int len)</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        res = ast_speech_write(speech, fr->data, fr->datalen);</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> This writes audio to the speech structure that will then be recognized. It must be written </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">signed linear only at this time. In the future other formats may be supported.</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">- Checking for results:</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> The way the generic speech recognition API is written is that the speech structure will </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">undergo state changes to indicate progress of recognition. The states are outlined below:</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        AST_SPEECH_STATE_NOT_READY - The speech structure is not ready to accept audio</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">        AST_SPEECH_STATE_READY - You may write audio to the speech structure</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">        AST_SPEECH_STATE_WAIT - No more audio should be written, and results will be available soon.</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">        AST_SPEECH_STATE_DONE - Results are available and the speech structure can only be used again by </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">                                calling ast_speech_start</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> It is up to you to monitor these states. Current state is available via a variable on the speech </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">structure. (state)</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">- SpeechBackground(Sound File|Timeout):</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> This application plays a sound file and waits for the person to speak. Once they start </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">speaking playback of the file stops, and silence is heard. Once they stop talking the </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">processing sound is played to indicate the speech recognition engine is working. Note it is </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">possible to have more than one result. The first argument is the sound file and the second is the </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">timeout. Note the timeout will only start once the sound file has stopped playing.</span><br style="font-family: courier new,monospace;">
-----<br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">On Sat, Jan 24, 2009 at 2:42 PM, Renato Cassaca <<a href="mailto:renato.cassaca@voiceinteraction.pt">renato.cassaca@voiceinteraction.pt</a>> wrote:</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> I started to test the integration of my speech recognizer but not everything</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> is going as expected....</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> - Is ast_speech_engine->start supposed to be a synchronous function?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten => 1000,1,Answer()</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> exten => 1000,n,SpeechCreate(Audimus)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten => 1000,n,SpeechActivateGrammar(digitos-unidades)</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> exten => 1000,n,SpeechStart()</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten => 1000,n,Background(hello-world)</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> exten => 1000,n,SpeechDeactivateGrammar(digitos-unidades)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> exten => 1000,n,Goto(internal-${SPEECH_TEXT(0)})</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> I have the above dialplan (copied from docs) and what is happening is:</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> -- Executing [1000@phones:1] Answer("SIP/1000-0334ea70", "") in new</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> stack</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> -- Executing [1000@phones:2] SpeechCreate("SIP/1000-0334ea70",</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> "Audimus") in new stack</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> -- Executing [1000@phones:3] SpeechActivateGrammar("SIP/1000-0334ea70",</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> "digitos-unidades") in new stack</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> -- Executing [1000@phones:4] SpeechStart("SIP/1000-0334ea70", "") in new</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> stack</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> -- Executing [1000@phones:5] BackGround("SIP/1000-0334ea70",</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> "hello-world") in new stack</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> -- <SIP/1000-0334ea70> Playing 'hello-world' (language 'en')</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> -- Executing [1000@phones:6]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> SpeechDeactivateGrammar("SIP/1000-0334ea70", "digitos-unidades") in new</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> stack</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> There is no explicit wait for engine results and there's no call to</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> ast_speech_engine->write (no audio is being sent to the ASR).</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> From the functions in ast_speech_engine which of them should be</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> synchronous?</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> How is ast_speech->state affecting the Asterisk behavior? (if you indicate</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> me the source file, I can check it myself)</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> What else should be done to have audio streamed to my engine?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> Renato</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> Joshua Colp wrote:</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> ----- "Renato Cassaca" wrote:</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> I'm finishing the ASR integration and I have a few more questions</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> (hopefully, the last ones):</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> - ast_speech_engine->get(...): returns the next available result or</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> all pending available results?</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> It returns a linked list of results sorted by score.</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> - ast_speech_engine->dtmf(...): what is the expected engine behavior?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> - stop the recognition, ignoring the results that are being processed</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> (but not finalized yet)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> - stop the recognition but produce all results that are being</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> processed (and can be finalized with the received audio)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> This callback is purely informational. You do not need to implement it.</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> - ast_speech_engine->list: it's managed by Asterisk, I don't have to</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> do anything with it. Right?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> Right.</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> - ast_speech_engine->activate(...grammar...): the activated grammar is</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> exclusive or incremental?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> That means, if the ASR has already an activated grammar, should the</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> new one be added to them or should all current ASR grammars be</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> replaced by the new one? The interpretation of this will influence the</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> implementation of deactivate...</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> This depends on the engine itself... you can implement it whichever way you</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> want. I would say have it so that you can have multiple grammars at once</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> though. This is what people would probably expect. </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> asterisk-speech-rec mailing list</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">> To UNSUBSCRIBE or update options visit:</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">> <a href="http://lists.digium.com/mailman/listinfo/asterisk-speech-rec">http://lists.digium.com/mailman/listinfo/asterisk-speech-rec</a></span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">></span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">-- </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">_______________________________</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">Allann J. O. Silva</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">"I received the fundamentals of my education in school, but that was not enough. My real education, the superstructure, the details, the true architecture, I got out of the public library. For an impoverished child whose family could not afford to buy books, the library was the open door to wonder and achievement, and I can never be sufficiently grateful that I had the wit to charge through that door and make the most of it." (from I. Asimov, 1994)</span><br style="font-family: courier new,monospace;">
<br style="font-family: courier new,monospace;">