[asterisk-speech-rec] Speech Recognition Problems

Mon May 19 08:29:22 CDT 2008

Joshua,

Thanks for your insight.

I started looking at your app_speech_utils.c and wanted to use it as the base
code to customize, since my first version of the Nuance decoder just takes a
file and outputs the result.

After looking at the code and making a first attempt, I realized that it's
much easier to add a connector module than to customize SpeechBackground,
as it relies on so many things (Speech_create, etc.).

I made a dummy module which just prints the status of each callback. Here is
the trace.

   -- Executing [3010 at default:1] SpeechCreate("SIP/station1-08ce3998",
"nuance") in new stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:94 create: Creating Speech Engine
[May 19 18:27:17] WARNING[8638]: app_mc.c:96 create: nuance
    -- Executing [3010 at default:2]
SpeechActivateGrammar("SIP/station1-08ce3998", "company-directory") in new
stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:137 activate: Activating Grammar
Name [May 19 18:27:17] WARNING[8638]: app_mc.c:139 activate: nuance
company-directory
    -- Executing [3010 at default:3] SpeechStart("SIP/station1-08ce3998", "")
in new stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:163 start: Starting Engine [May 19
18:27:17] WARNING[8638]: app_mc.c:165 start: nuance
    -- Executing [3010 at default:4] SpeechBackground("SIP/station1-08ce3998",
"AppointmentTomorrow") in new stack
[May 19 18:27:17] WARNING[8638]: app_mc.c:163 start: Starting Engine [May 19
18:27:17] WARNING[8638]: app_mc.c:165 start: nuance
[May 19 18:27:17] WARNING[8638]: format_wav.c:156 check_header: Unexpected
freqency 16000
[May 19 18:27:17] WARNING[8638]: file.c:316 fn_wrapper: Unable to open
format wav
    -- Saved useragent "SJphone/1.65.377a (SJ Labs)" for peer station1
[May 19 18:28:00] WARNING[8638]: app_mc.c:105 destroy: Destroying Speech
Engine [May 19 18:28:00] WARNING[8638]: app_mc.c:107 destroy: nuance
    --
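
For readers following along, a dummy module of this kind is essentially a
table of callbacks that each log their name. The sketch below is only an
illustration: demo_engine, demo_speech and the simplified signatures are
stand-ins I made up so the example compiles on its own. They are not the real
Asterisk structures, which should be checked in the speech header of your
source tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct demo_speech;

/* Stand-in for the engine structure: a name plus one pointer per callback.
 * Hypothetical and simplified; the real structure has more members. */
struct demo_engine {
    const char *name;
    int (*create)(struct demo_speech *speech);
    int (*destroy)(struct demo_speech *speech);
    int (*activate)(struct demo_speech *speech, const char *grammar_name);
    int (*start)(struct demo_speech *speech);
    int (*write)(struct demo_speech *speech, const void *data, int len);
};

/* Stand-in for the speech object handed to every callback. */
struct demo_speech {
    const struct demo_engine *engine;
    int calls; /* number of callbacks invoked so far */
};

/* Shared helper: count the call and print which callback ran. */
static int demo_trace(struct demo_speech *speech, const char *what)
{
    assert(speech != NULL && speech->engine != NULL);
    speech->calls++;
    printf("%s: %s\n", what, speech->engine->name);
    return 0;
}

static int demo_create(struct demo_speech *s)  { return demo_trace(s, "create"); }
static int demo_destroy(struct demo_speech *s) { return demo_trace(s, "destroy"); }
static int demo_start(struct demo_speech *s)   { return demo_trace(s, "start"); }

static int demo_activate(struct demo_speech *s, const char *grammar_name)
{
    (void)grammar_name; /* a real engine would look the grammar up here */
    return demo_trace(s, "activate");
}

static int demo_write(struct demo_speech *s, const void *data, int len)
{
    (void)data;
    (void)len; /* a real engine would feed the audio to its decoder here */
    return demo_trace(s, "write");
}

/* The callback table a host would call through. */
static const struct demo_engine demo_nuance = {
    .name = "nuance",
    .create = demo_create,
    .destroy = demo_destroy,
    .activate = demo_activate,
    .start = demo_start,
    .write = demo_write,
};
```

A host would then call through the table, for example
demo_nuance.create(&speech) followed by demo_nuance.start(&speech).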

I see two problems and have a few questions. The start callback is called
twice: once for SpeechStart and again from SpeechBackground. I am not sure
why it is called the second time.

int start(struct ast_speech *speech)
{
        ast_log(LOG_WARNING, "Starting Engine ");
        if (speech != NULL && speech->engine != NULL)
                ast_log(LOG_WARNING, "%s\n", speech->engine->name);

        return 0;
}

The write callback is never called. I see in your speech_utils code that you
write to the engine only when the state is AST_SPEECH_STATE_READY. When is
the right time to set this? Should we set it in SpeechCreate? Which function
call is the right way to do it? I cannot use the change callback, as it will
in turn call me again.
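
To make the question concrete, here is a small self-contained model of the
gating as I understand it. Everything in it is a stand-in: DEMO_STATE_READY
plays the role of AST_SPEECH_STATE_READY, and demo_background is my guess at
how the host decides whether to call start and write, not a copy of
app_speech_utils.c. Under that assumption, a start callback that never marks
the object ready gets called again and never sees audio, which matches the
trace above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in states; the real enum lives in the Asterisk speech header. */
enum demo_state {
    DEMO_STATE_NOT_READY = 0, /* engine is not accepting audio */
    DEMO_STATE_READY          /* plays the role of AST_SPEECH_STATE_READY */
};

struct demo_speech {
    enum demo_state state;
    int start_calls;    /* how many times a start callback ran */
    int frames_written; /* how many frames reached the write callback */
};

typedef int (*demo_start_fn)(struct demo_speech *speech);

/* Like the dummy module: logs (here, counts) but leaves the state alone. */
static int demo_start_noop(struct demo_speech *speech)
{
    assert(speech != NULL);
    speech->start_calls++;
    return 0;
}

/* Assumed fix: the start callback marks the object ready for audio. */
static int demo_start_ready(struct demo_speech *speech)
{
    assert(speech != NULL);
    speech->start_calls++;
    speech->state = DEMO_STATE_READY;
    return 0;
}

static int demo_write(struct demo_speech *speech, const void *data, int len)
{
    assert(speech != NULL);
    (void)data;
    (void)len;
    speech->frames_written++;
    return 0;
}

/* Hypothetical host loop: start the engine if it is not ready, then
 * forward each audio frame only while the state is ready. */
static void demo_background(struct demo_speech *speech, demo_start_fn start,
                            int nframes)
{
    static const char frame[160] = {0};
    int i;

    if (speech->state != DEMO_STATE_READY)
        start(speech);

    for (i = 0; i < nframes; i++) {
        if (speech->state == DEMO_STATE_READY)
            demo_write(speech, frame, (int)sizeof(frame));
    }
}
```

With demo_start_noop the loop calls start once more and writes nothing; with
demo_start_ready every frame reaches demo_write.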

I thought the internal engine would set this automatically when it detects
that the caller has started talking.

if (ast_test_flag(speech, AST_SPEECH_QUIET)) ;; Who sets this flag? Does
the underlying DSP engine in Asterisk process the audio and set it, or is the
connector module in charge of detecting speech with its own algorithm and
then setting it? I hope it's the DSP engine. I ask because the speech didn't
stop in my case at all.
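
If it does fall to the connector, as your earlier note that the engine sets
the flags suggests, I imagine the write callback would look roughly like the
sketch below. All of it is hypothetical: the DEMO_ flags stand in for
AST_SPEECH_QUIET and AST_SPEECH_HAVE_RESULTS, and the energy threshold and
trailing-silence count are arbitrary placeholders for whatever detection the
real decoder provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_FLAG_QUIET        (1u << 0) /* stands in for AST_SPEECH_QUIET */
#define DEMO_FLAG_HAVE_RESULTS (1u << 1) /* stands in for AST_SPEECH_HAVE_RESULTS */

#define DEMO_ENERGY_THRESHOLD 500 /* arbitrary mean amplitude cutoff */
#define DEMO_TRAILING_SILENCE 3   /* silent frames that end an utterance */

struct demo_speech {
    unsigned int flags;
    int silent_frames; /* consecutive silent frames since last speech */
};

/* Crude detector: mean absolute amplitude of 16-bit samples. */
static int demo_frame_is_speech(const int16_t *samples, size_t count)
{
    long total = 0;
    size_t i;

    if (count == 0)
        return 0;
    for (i = 0; i < count; i++)
        total += labs((long)samples[i]);
    return (total / (long)count) > DEMO_ENERGY_THRESHOLD;
}

/* Write callback sketch: the engine itself raises the flags. */
static int demo_write(struct demo_speech *speech, const int16_t *samples,
                      size_t count)
{
    assert(speech != NULL);

    if (demo_frame_is_speech(samples, count)) {
        /* Caller is talking: ask the host to quiet the prompt. */
        speech->flags |= DEMO_FLAG_QUIET;
        speech->silent_frames = 0;
    } else if (speech->flags & DEMO_FLAG_QUIET) {
        /* Speech was heard earlier; enough silence ends the utterance. */
        if (++speech->silent_frames >= DEMO_TRAILING_SILENCE)
            speech->flags |= DEMO_FLAG_HAVE_RESULTS;
    }
    return 0;
}
```

In this model nothing stops unless the connector raises the flags itself,
which would explain why the speech never ended with my dummy module.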

I'm getting there. Thanks Joshua.

On Thu, May 15, 2008 at 9:44 PM, Joshua Colp <jcolp at digium.com> wrote:

> ----- "praveen kumar" <pbx.kumar at gmail.com> wrote:
>
> > Hello,
> >
> > Kudos to the asterisk community.
> >
>
> Greetings and salutations.
>
> > - The present solution has a big drawback. The users have to wait till
> > the end of the prompt before they can start recording. They cannot
> > speak in between, and this is causing a problem. Is there a way to stop
> > playback of the prompt when speech is detected from the callee end?
> >
>
> There's nothing really built in to do speaker detection in this instance
> and stop it.
>
> > - I see that the speech API is available and connectors can be written,
> > but there is no proper documentation. Can we check how LumenVox has done
> > the connector? Since it's GPL licensed, I am assuming it should be
> > shared.
>
> The Lumenvox connector is a binary module and is not under a GPL license.
> The source is therefore not available. As for documentation for the API, it
> is correct that there is no example connector module, but the API is
> intuitive enough that you should be able to figure it out if you are a
> developer.
>
> You create an ast_speech_engine structure with callbacks to everything that
> your engine can handle. Create/destroy callbacks exist for when a speech
> object is created and destroyed. Load/unload/activate/deactivate callbacks
> exist for grammars. The start callback is called when res_speech is going to
> start feeding audio into your engine. The write callback gets called with
> audio that should be fed into your engine. The get callback is called when
> res_speech wants to get results of the decode. It is up to your engine to set
> flags on the speech object to indicate various things. AST_SPEECH_QUIET
> signals that the person is speaking and AST_SPEECH_HAVE_RESULTS signals that
> your engine has results from the decode.
>
> This is a rough view of things.
>
> > - BackgroundDetect does stop during play, but it jumps to the talk
> > extension. We want the entire speech to be recorded, and leaving out
> > the first fragment, which triggered the jump to the talk extension, will
> > not serve the purpose in speech detection. Further, since it's AGI, it's
> > not extension driven.
> >
>
> If you want to approach it this way you will need to do some custom coding.
>
> Joshua Colp
> Software Developer
> Digium, Inc.