[Asterisk-Dev] Text to speech on Asterisk

astdev at newww.com
Thu Nov 13 09:51:24 MST 2003


I'd like to kick around some ideas about using various text to speech
engines with *.

First a question: a few weeks ago someone made a comment about using
Cepstral with *. Is the Cepstral TTS for Linux a direct replacement
for Festival, or is there an app_cepstral that exists somewhere? The
quality of Cepstral is such that I wouldn't mind spending $30 just to
try it out, so I'm looking for info on how to use it with *.

Second - I'd like to get a discussion going about TTS with Asterisk in
general.

I've been using * since mid-summer and got Festival working more or
less right away. I don't want to start a battle over the "quality" of
speech generated by Festival, but as configured right out of the
box it's not good enough for my application. Of course, assessing the
"quality" of TTS is rather subjective - it depends on the nature of
the text being spoken. For instance, I've used Festival to read long
blocks of text, such as email or fiction, and it is quite
comprehensible, but when it reads names and addresses, I often
can't understand it.

There was a thread back in July with this post:
<http://lists.digium.com/pipermail/asterisk-users/2003-July/015962.html>
that mentions that Festival out of the box is using a "demo" voice and
that one can create new voices, embed markup, etc. to improve
Festival. I think most of us don't have the time or expertise to get
this done and would rather contribute/fund/test or buy a commercial
package. Is anyone on this list who has worked on improving Festival
willing to share?

Festival appears to be the only open-source TTS option for use with
*. There are other commercial TTS engines that range in price and
quality, several of which are based on Festival. It seems to me that it
would be a good approach to create a generic app_tts that could then use
whatever backend engine the developer desires. This would be similar
to the way Perl handles database engines through the DBI/DBD
modules. Anyone could write a TTS "driver" for a particular engine
that could then be plugged in / specified by the app_tts. If the
general opinion is that this is a good idea, I'd be happy to take the
project on.
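
To make the DBI/DBD comparison concrete, here is a rough sketch of the
split between a generic front end and pluggable drivers. It is written
in Python only to keep it short; a real app_tts would be an Asterisk
module. Every name in it (TTSDriver, register_driver, connect, say,
SilenceDriver) is made up for illustration, and the one driver shown
just emits silence rather than calling any real engine.

```python
import io
import wave
from abc import ABC, abstractmethod


class TTSDriver(ABC):
    """Hypothetical interface that every backend driver would implement."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio for `text` as WAV bytes."""


# Registry mapping an engine name to its driver class (the "DBD" side).
_DRIVERS = {}


def register_driver(name, driver_cls):
    """Make a driver available under a case-insensitive engine name."""
    _DRIVERS[name.lower()] = driver_cls


def connect(name, **options):
    """Instantiate the driver registered under `name` (the "DBI" side)."""
    try:
        driver_cls = _DRIVERS[name.lower()]
    except KeyError:
        raise LookupError(f"no TTS driver registered as {name!r}") from None
    return driver_cls(**options)


class SilenceDriver(TTSDriver):
    """Stand-in backend: emits silence sized to the text, no real speech."""

    def __init__(self, sample_rate=8000, ms_per_char=60):
        self.sample_rate = sample_rate
        self.ms_per_char = ms_per_char

    def synthesize(self, text):
        n_frames = self.sample_rate * self.ms_per_char * len(text) // 1000
        buf = io.BytesIO()
        with wave.open(buf, "wb") as w:
            w.setnchannels(1)      # mono
            w.setsampwidth(2)      # 16-bit samples
            w.setframerate(self.sample_rate)
            w.writeframes(b"\x00\x00" * n_frames)
        return buf.getvalue()


register_driver("silence", SilenceDriver)


def say(text, engine="silence", **options):
    """Generic entry point: the caller picks the engine by name only."""
    return connect(engine, **options).synthesize(text)
```

The caller only ever touches say() or connect(); swapping engines means
registering another TTSDriver subclass and changing the engine name,
and a vendor could ship such a subclass without publishing its source.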

SO - What TTS engines are currently being used with Asterisk? So far
I've only found evidence of Festival and Cepstral. Others mentioned in
the list are AT&T Natural Voice, ScanSoft RealSpeak.

The other side of this is getting the commercial vendors to work with
us. I've started talking to both Rhetorical and Elanspeech - with
minor success. They need to be convinced that there is enough demand
to make working with us viable, and in general they want NDAs
signed. With an architecture that splits the driver from the
interface, companies that insist on keeping their APIs private could
choose not to release the driver source. I think that if enough interest is shown, a
company could be convinced to support the development of a TTS driver
and wouldn't mind having their API exposed to the extent it was used
in the driver (the writer of the driver wouldn't be publishing the API
docs, the driver would just be an example of using the commercial API).

Depending on where this all leads, we would need to document /
demonstrate real (i.e. revenue-generating) interest in commercial
TTS. This is likely another thread in the future.

Please don't take this as an Open vs. Commercial fight - I'd like to
see the TTS choices for Asterisk, both commercial and open, greatly
increased and I believe that this would be a very good thing for
Asterisk, in much the same way as the choice of backend database is
wide open for Perl. (oh yeah, it would also be nice to have a similar
way to plug any database into *).

What do you think?

         Dave





