[asterisk-dev] Text-to-Speech and Speech-to-Text

Mon Mar 22 15:37:02 CDT 2021

Hi Ben,

I'm really excited to see this and wanted to bring up a couple of use cases
that we have implemented, to see whether you think they align with the
goals of the Asterisk implementation. We primarily do real-time
speech-to-text for calls and conferences; however, a big limitation with
Asterisk and speech APIs is speaker diarization. We have gotten around this
with phone calls by assigning each party of the call to the left or right
channel of a stereo recording, which some speech APIs support. However,
this is inadequate for multi-party conferences. Being able to use this new
speech-to-text feature on a particular channel, and then including that
channel ID in the protocol, would be helpful. Or, perhaps even better, the
protocol could support a generic user_data JSON property that lets us pass
custom application-specific data to the external application.
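
A minimal sketch of the idea, for illustration only. The message shapes
and the field names "type", "channel_id", "text" and "user_data" are
assumptions made up for this example; they are not taken from the wiki
protocol. It shows a stream being tagged with its channel and opaque
application data, and the external application grouping transcripts by
channel so that each conference participant's words stay separate.

```python
import json
from collections import defaultdict


def build_start_message(channel_id, user_data=None):
    """Serialize a hypothetical 'start' message tagged with its channel."""
    message = {"type": "start", "channel_id": channel_id}
    if user_data is not None:
        # Opaque application data; the sender does not interpret it.
        message["user_data"] = user_data
    return json.dumps(message)


def group_transcripts_by_channel(raw_messages):
    """Group hypothetical 'transcript' messages by originating channel."""
    by_channel = defaultdict(list)
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("type") == "transcript":
            by_channel[msg["channel_id"]].append(msg["text"])
    return dict(by_channel)
```

With one stream per channel, diarization reduces to a lookup on
channel_id rather than something the speech API has to infer from
mixed audio.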

I think another interesting use case would be real-time translation of a
phone call. For example, the external application could receive audio
from Asterisk and send back audio that has been translated into another
language, which would be very powerful. The translated audio could be sent
to a separate channel so that speakers of different languages could hear
what was being said without a translator.
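
To make the routing side of this concrete, here is a small sketch under
stated assumptions: the function name, the channel names and the
per-channel language table are all hypothetical, and the translation
itself is left to whatever speech service sits behind the external
application. It only works out which languages a speaker's audio needs to
be translated into and which channels should receive each version.

```python
def plan_translations(speaker_channel, channel_languages):
    """Map each target language to the channels that should hear it.

    channel_languages is a hypothetical table of channel name to the
    language spoken on that channel.
    """
    source = channel_languages[speaker_channel]
    plan = {}
    for channel, language in channel_languages.items():
        if channel == speaker_channel or language == source:
            continue  # same-language listeners need no translation
        plan.setdefault(language, []).append(channel)
    return plan


languages = {"chan-a": "en", "chan-b": "es", "chan-c": "es", "chan-d": "en"}
plan = plan_translations("chan-a", languages)
```

Here only one translation (to "es") is needed for chan-a's audio, and it
is shared by both Spanish-speaking channels.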

Lastly, I'd love some clarification on the intended use cases of this
versus the AudioSocket application and EAGI; perhaps those are the more
appropriate tools for these use cases.

Benjamin Fitzgerald

On Mon, Mar 22, 2021 at 12:14 PM Ben Ford <bford at digium.com> wrote:

> Hello everyone,
>
> The Asterisk team has been working on planning better text-to-speech and
> speech-to-text functionality for Asterisk. We’ll be using a speech service
> in conjunction with an external application that connects it to Asterisk.
> More information on the protocol used for this and the overall project can
> be found here:
>
> https://wiki.asterisk.org/wiki/pages/viewpage.action?pageId=45482453
>
> After reading the wiki page, if there is anything you feel could be
> improved, we’d love to hear about it. The goal for the protocol is to make
> it generic enough that we can use it for things other than
> text-to-speech and speech-to-text in the future. This means it
> should remain as simple as possible. We tried to come up with basic
> scenarios and give examples of what it might look like, but this may not
> cover all bases. If you see a case that the protocol would not be able to
> handle, we want to hear about that, too!
>
>
> --
> Benjamin Ford
> Software Engineer
> 256-428-6147
> Check us out at www.sangoma.com and www.asterisk.org
>
> --
> _____________________________________________________________________
> -- Bandwidth and Colocation Provided by http://www.api-digital.com --
>
> asterisk-dev mailing list
> To UNSUBSCRIBE or update options visit:
>    http://lists.digium.com/mailman/listinfo/asterisk-dev