[asterisk-dev] Text-to-Speech and Speech-to-Text

Ben Ford bford at digium.com
Mon Mar 22 16:10:47 CDT 2021


To build on Josh's response, app_config is essentially an arbitrary data
field that can pass any information you want from Asterisk to the
application. We intentionally left this open-ended, with the idea in mind
that it could be used for things other than TTS and STT.
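As a rough illustration of what "opaque" means here, the Python sketch below
shows an external application pulling its own settings out of app_config.
It assumes the configuration arrives as a JSON object; the key names
(language, voice) and their values are hypothetical, since the protocol
deliberately does not define them.

```python
import json

# Hypothetical message received by an external TTS/STT application.
# The keys inside "app_config" are illustrative only: the protocol treats
# this section as opaque and leaves its contents to the user and the app.
raw = '{"app_config": {"language": "en-US", "voice": "female-1"}}'


def parse_app_config(message: str) -> dict:
    """Return the opaque app_config section, or an empty dict if absent."""
    data = json.loads(message)
    config = data.get("app_config", {})
    if not isinstance(config, dict):
        raise ValueError("app_config must be a JSON object")
    return config


config = parse_app_config(raw)
# Each application decides its own defaults for keys the user did not set.
language = config.get("language", "en-US")
voice = config.get("voice")  # None if no voice was configured
print(language, voice)
```

Because Asterisk only carries the section through, adding a new option
(for example, a voice name for a particular TTS engine) would need no
protocol change, only agreement between the user and the application.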

> Will there be a mechanism to stop the TTS on one stream when speech to text
> detects someone speaking?
>

We'll update the wiki page for this! There should be another response type
to handle this, with defaults in place if Asterisk doesn't receive this
particular kind of response from the application.
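A minimal Python sketch of the fallback behavior described above. The
response type name talk_detected and the capability flag are hypothetical
placeholders for whatever the protocol ends up defining; the local-detection
path stands in for the func_talkdetect-style fallback Josh mentions below.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Response:
    # Hypothetical message from the external application. The type name
    # "talk_detected" used below is illustrative, not part of any spec.
    type: str


def should_stop_playback(
    responses: Iterable[Response],
    app_supports_detection: bool,
    local_talk_detected: bool,
) -> bool:
    """Decide whether TTS playback should stop because the caller spoke.

    If the application has said it can detect speech, trust its
    (hypothetical) "talk_detected" response. Otherwise fall back to
    detection done on the Asterisk side.
    """
    if app_supports_detection:
        return any(r.type == "talk_detected" for r in responses)
    return local_talk_detected


# The application does not support detection, so the local fallback decides.
print(should_stop_playback([], app_supports_detection=False, local_talk_detected=True))
```

Keeping the capability flag separate from the event itself means Asterisk
can tell "the application never sends this" apart from "the caller has not
spoken yet", which is what makes a safe default possible.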

On Mon, Mar 22, 2021 at 4:01 PM Joshua C. Colp <jcolp at digium.com> wrote:

> On Mon, Mar 22, 2021 at 5:54 PM Dan Cropp <dan at amtelco.com> wrote:
>
>> Thank you Ben.
>>
>>
>>
>> Looking at the TTS, would that language property be language and
>> country?  For example, en-US, en-GB, etc.
>>
>> Will we use SSML to specify a specific voice for the language?  For example,
>> Amazon Polly's en-US language supports 4 female and 4 male voices.  Or might
>> this be an additional parameter (similar to the language)?
>>
>
> These are arbitrary example values. The values in app_config aren't
> defined within the protocol, they are opaque:
>
> The app_config section contains arbitrary configuration options that are
> not defined by this protocol. They can be set by the user and then
> consumed by the external application.
>
> Depending on the external applications and how things develop, we could
> probably standardize some of them.
>
>
>>
>>
>> Will there be a mechanism to stop the TTS on one stream when speech to
>> text detects someone speaking?  Many people will interrupt automated phone
>> systems.  For example, the system answers the call and plays something; a
>> person familiar with the system starts speaking and expects the
>> TTS/prompts to stop playing.
>>
>
> This is a good point. It's probably something we should add to the
> protocol, so the external application can communicate back (1) whether it
> can do it and (2) when it occurs. We can use the same mechanism
> func_talkdetect uses as a fallback if the external application doesn't
> support it. The core speech support itself already handles detecting when
> the caller speaks and stopping playback.
>
> --
> Joshua C. Colp
> Asterisk Technical Lead
> Sangoma Technologies
> Check us out at www.sangoma.com and www.asterisk.org
> --



-- 
Benjamin Ford
Software Engineer
256-428-6147
Check us out at www.sangoma.com and www.asterisk.org
