[asterisk-dev] Text-to-Speech and Speech-to-Text

Joshua C. Colp jcolp at digium.com
Mon Mar 22 16:01:24 CDT 2021

On Mon, Mar 22, 2021 at 5:54 PM Dan Cropp <dan at amtelco.com> wrote:

> Thank you Ben.
> Looking at the TTS, would that language property be language and country?
> Example en-US, en-GB, etc.
> Will we use SSML to specify a specific voice for the language?  Example,
> Amazon Polly en-US language supports 4 female and 4 male voices.  Or might
> this be an additional parameter (similar to the language)?

These are arbitrary example values. The values in app_config aren't defined
within the protocol; they are opaque:

The app_config section contains arbitrary configuration options and are not
defined by this protocol. They will be able to be set by the user, and then
consumed by the external application.

Depending on which external applications appear and how they develop, we
could probably standardize some of these options.
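
To illustrate what "opaque" means here, a rough sketch follows. It is not
part of the protocol draft: the message shape, the "request" field, the
function names, and the "language"/"voice" keys are all made up for the
example. The only point is that the Asterisk side forwards app_config
unchanged and the external application alone decides what the values mean.

```python
import json


def build_setup_message(app_config):
    """Asterisk side: forward app_config untouched, without interpreting it."""
    # "request" and "setup" are hypothetical; only app_config is named in the draft.
    return json.dumps({"request": "setup", "app_config": app_config})


def read_tts_options(raw_message):
    """External application side: give the opaque values a meaning."""
    app_config = json.loads(raw_message).get("app_config", {})
    return {
        "language": app_config.get("language", "en-US"),  # assumed default
        "voice": app_config.get("voice"),  # None lets the engine pick a voice
    }


# Example values only; a user could set any keys the application understands.
message = build_setup_message({"language": "en-GB", "voice": "Amy"})
options = read_tts_options(message)
```

With this shape, a voice could be carried as one more app_config key next to
the language, with no protocol change, if the external application chooses
to look for it.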

> Will there be a mechanism to stop the TTS on one stream when speech to
> text detects someone speaking?  Many people will interrupt automated phone
> systems.  Example, the system answers the call and plays something, a
> person familiar with the system will start speaking and they expect the
> TTS/prompts to stop playing.

This is a good point. It's probably something we should add to the
protocol, so the external application can communicate back (1) whether it
can do this and (2) when it occurs. If the external application doesn't
support it, we can fall back to the same mechanism func_talkdetect uses.
The core speech code itself already handles detecting when the caller
speaks and stopping playback.
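
As a rough sketch of how that could fit together (nothing here exists in
the protocol yet; the class, method, and source names are hypothetical):
the external application reports up front whether it can detect speech,
and speech-start events from the chosen source stop any playback in
progress. Local detection is only honored when the application cannot
report speech itself.

```python
class BargeInController:
    """Stops playback when the caller starts speaking (hypothetical sketch)."""

    def __init__(self, app_detects_speech):
        # Capability the external application would report during setup (assumed).
        self.app_detects_speech = app_detects_speech
        self.playing = False

    def start_playback(self):
        self.playing = True

    def on_speech_started(self, source):
        """Handle a speech-start event; return True if playback was stopped.

        source is "external" (reported by the application) or "local"
        (fallback detection on the Asterisk side).
        """
        trusted = "external" if self.app_detects_speech else "local"
        if source != trusted or not self.playing:
            return False
        self.playing = False
        return True


# Application cannot detect speech, so the local fallback is used.
fallback = BargeInController(app_detects_speech=False)
fallback.start_playback()
stopped_by_local = fallback.on_speech_started("local")

# Application can detect speech, so local events are ignored.
capable = BargeInController(app_detects_speech=True)
capable.start_playback()
ignored_local = capable.on_speech_started("local")
stopped_by_app = capable.on_speech_started("external")
```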

Joshua C. Colp
Asterisk Technical Lead
Sangoma Technologies
Check us out at www.sangoma.com and www.asterisk.org