[asterisk-speech-rec] Speech Recognition Problems

Thu May 15 11:14:28 CDT 2008

----- "praveen kumar" <pbx.kumar at gmail.com> wrote:

> Hello,
> 
> Kudos to the asterisk community.
> 

Greetings and salutations.

> - the present solution has a big drawback. The users have to wait till
> the end of the prompt before they can start recording. They cannot
> speak in between and this is causing a problem. Is there a way to stop
> playback of the prompt when speech is detected from the callee end?
>

There's nothing really built in that detects the caller speaking in this instance and stops the prompt.

> - I see that speech api is available and connectors can be written but
> there is no proper documentation. Can we check how lumenvox has done
> the connector? Since it's GPL licensed - I am assuming it should be
> shared.

The Lumenvox connector is a binary module and is not under a GPL license, so the source is not available. As for documentation for the API, you are correct that there is no example connector module, but the API is intuitive enough that you should be able to figure it out if you are a developer.

You create an ast_speech_engine structure with callbacks for everything that your engine can handle. Create/destroy callbacks exist for when a speech object is created and destroyed. Load/unload/activate/deactivate callbacks exist for grammars. The start callback is called when res_speech is going to start feeding audio into your engine. The write callback gets called with audio that should be fed into your engine. The get callback is called when res_speech wants to get the results of the decode. It is up to your engine to set flags on the speech object to indicate various things: AST_SPEECH_QUIET signals that the person is speaking, and AST_SPEECH_HAVE_RESULTS signals that your engine has results from the decode.

This is a rough view of things.
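
To make the callback flow above a little more concrete, here is a minimal, self-contained sketch in plain C. It does not include the Asterisk headers: struct sketch_engine, struct sketch_speech, the SKETCH_* flags, the toy_* functions and run_demo are all hypothetical stand-ins that only mirror the shape described above (the real definitions are in Asterisk's speech.h and may differ in names and signatures). The "recognizer" is a toy that treats any non-zero byte as speech and reports a fixed result after 320 bytes of it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the AST_SPEECH_* flags described above. */
enum sketch_flags {
    SKETCH_QUIET        = 1 << 0, /* caller is speaking, prompt should go quiet */
    SKETCH_HAVE_RESULTS = 1 << 1  /* the decode has produced results */
};

/* Hypothetical stand-in for the per-call speech object. */
struct sketch_speech {
    unsigned int flags;
    size_t bytes_heard;       /* non-silent audio seen since start */
    const char *result_text;  /* toy result, set when the decode finishes */
};

/* Hypothetical stand-in for the engine structure: one callback per event. */
struct sketch_engine {
    const char *name;
    int (*create)(struct sketch_speech *s);
    int (*destroy)(struct sketch_speech *s);
    int (*load)(struct sketch_speech *s, const char *name, const char *grammar);
    int (*unload)(struct sketch_speech *s, const char *name);
    int (*activate)(struct sketch_speech *s, const char *name);
    int (*deactivate)(struct sketch_speech *s, const char *name);
    int (*start)(struct sketch_speech *s);
    int (*write)(struct sketch_speech *s, const void *data, int len);
    const char *(*get)(struct sketch_speech *s);
};

static int toy_create(struct sketch_speech *s)
{
    s->flags = 0;
    s->bytes_heard = 0;
    s->result_text = NULL;
    return 0;
}

static int toy_destroy(struct sketch_speech *s) { (void)s; return 0; }

/* The toy engine accepts any grammar and ignores it. */
static int toy_load(struct sketch_speech *s, const char *name, const char *grammar)
{
    (void)s; (void)name; (void)grammar;
    return 0;
}

static int toy_grammar_op(struct sketch_speech *s, const char *name)
{
    (void)s; (void)name;
    return 0;
}

/* Called before audio is fed: reset state for a fresh decode. */
static int toy_start(struct sketch_speech *s)
{
    s->flags = 0;
    s->bytes_heard = 0;
    s->result_text = NULL;
    return 0;
}

/* Called with each chunk of audio; the engine sets flags on the speech object. */
static int toy_write(struct sketch_speech *s, const void *data, int len)
{
    const unsigned char *p = data;
    int i;

    for (i = 0; i < len; i++) {
        if (p[i] != 0) {            /* toy "speech detection" */
            s->flags |= SKETCH_QUIET;
            break;
        }
    }
    if (s->flags & SKETCH_QUIET) {
        s->bytes_heard += (size_t)len;
    }
    if (s->bytes_heard >= 320) {    /* toy "decode finished" threshold */
        s->result_text = "hello";
        s->flags |= SKETCH_HAVE_RESULTS;
    }
    return 0;
}

/* Called when the core wants the results of the decode. */
static const char *toy_get(struct sketch_speech *s)
{
    return (s->flags & SKETCH_HAVE_RESULTS) ? s->result_text : NULL;
}

static const struct sketch_engine toy_engine = {
    "toy", toy_create, toy_destroy, toy_load, toy_grammar_op,
    toy_grammar_op, toy_grammar_op, toy_start, toy_write, toy_get
};

/* Drives the engine the way the core would: create, start, write, get, destroy. */
const char *run_demo(unsigned int *flags_out)
{
    struct sketch_speech speech;
    unsigned char silence[160] = {0};
    unsigned char voice[160];
    const char *result;

    memset(voice, 0x55, sizeof(voice));
    toy_engine.create(&speech);
    toy_engine.start(&speech);
    toy_engine.write(&speech, silence, (int)sizeof(silence));
    toy_engine.write(&speech, voice, (int)sizeof(voice));
    toy_engine.write(&speech, voice, (int)sizeof(voice));
    *flags_out = speech.flags;
    result = toy_engine.get(&speech);
    toy_engine.destroy(&speech);
    return result;
}
```

In a real connector the write callback would hand the audio to the recognizer rather than inspect bytes, and the structure would be registered with res_speech instead of being called directly as run_demo does here.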

> - backgrounddetect does stop during play but it jumps to talk
> extension. we want the entire speech to be recorded and leaving out
> the first fragment which triggered to jump to talk extension will not
> serve the purpose in speech detection. Further, since it's AGI, it's
> not extension driven.
> 

If you want to approach it this way you will need to do some custom coding.

Joshua Colp
Software Developer
Digium, Inc.