[asterisk-dev] Text-to-Speech and Speech-to-Text

Joshua C. Colp jcolp at digium.com
Mon Mar 22 15:46:33 CDT 2021

On Mon, Mar 22, 2021 at 5:37 PM Benjamin Fitzgerald <ben at letscorp.us> wrote:

> Hi Ben,
> I'm really excited to see this and wanted to bring up a couple use-cases
> that we have implemented to see if it's something you think aligns with the
> goals of the Asterisk implementation. We primarily do real time
> speech-to-text for calls or conferences however a big limitation with
> Asterisk and Speech APIs is speaker diarization. We have gotten around this
> with phone calls by assigning each party of the call to the left and right
> channel of a stereo recording, which some Speech APIs support. However,
> this is inadequate for multi-party conferences. Being able to use this new
> speech to text feature on a particular channel, and then including that
> channel ID in the protocol would be helpful. Or perhaps even better,
> support a generic user_data JSON property for us to pass custom application
> specific data to the external application.

The implementation targets the existing speech functionality in Asterisk,
which covers speech to text, so this is not a current use case; it could be
expanded in the future. The protocol does allow generic parameters to be
passed to the external application, and users will be able to provide them
through the speech functionality.

> I think another interesting use case would be real-time translation of a
> phone call. For example, if the external application was receiving audio
> from Asterisk and then sending back audio that's been translated to another
> language, it would be very powerful. The audio could be sent to a separate
> channel so that speakers of different languages could hear what was being
> said without a translator.

This is not a current use case, but the protocol is purposely generic and
easy to extend. If core functionality to support this were added, the
protocol could be extended as needed (for example, with a translation type)
and then used.

> Lastly, I'd love some clarification on the intended use cases of this
> versus the Audio_Socket Application and EAGI, perhaps those are the more
> appropriate tools for these use cases.

If you're purely conveying audio back and forth, then AudioSocket would
most likely be the best current solution. EAGI could potentially work for
speech to text, but AGI can carry the cost of forking a process, and EAGI
itself is not commonly used.
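
For anyone weighing the AudioSocket route, here is a minimal sketch of the
framing an external application would need to handle. It assumes the layout
as I understand it from the AudioSocket documentation: a 1-byte kind, a
2-byte big-endian payload length, then the payload, with kinds 0x00
(hangup), 0x01 (call UUID), 0x10 (audio) and 0xff (error). Please verify
this against your Asterisk version. The helper names encode_frame and
decode_frames are made up for illustration and are not part of Asterisk.

```python
import struct

# Frame kinds, assumed from the AudioSocket protocol notes; verify before use.
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10
KIND_ERROR = 0xFF

# Assumed header layout: 1-byte kind, 2-byte big-endian payload length.
HEADER = struct.Struct("!BH")


def encode_frame(kind, payload=b""):
    """Build one frame: header followed by the raw payload (hypothetical helper)."""
    return HEADER.pack(kind, len(payload)) + payload


def decode_frames(buffer):
    """Split a byte buffer into complete (kind, payload) frames.

    Returns (frames, remainder); the remainder holds any trailing partial
    frame so the caller can prepend it to the next chunk read from the socket.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((kind, bytes(buffer[offset + HEADER.size:end])))
        offset = end
    return frames, bytes(buffer[offset:])
```

In a real server you would call decode_frames on the bytes accumulated from
the TCP connection, treat the KIND_UUID payload as the call identifier, hand
KIND_AUDIO payloads to your speech engine, and close the connection on
KIND_HANGUP.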

Joshua C. Colp
Asterisk Technical Lead
Sangoma Technologies
Check us out at www.sangoma.com and www.asterisk.org