[Asterisk-Dev] Text to speech on Asterisk

Fri Nov 14 11:07:16 MST 2003

It's great to see all of this interest in TTS for *! 

I've added a list of companies that I have info for at the end of this
post. Cepstral is by far the cheapest solution but is currently limited
to English. ElanSpeech is high quality and covers a lot of languages,
and of course is more expensive. Rhetorical has
"pre-packaged libraries containing vocabulary and functions for
specific applications" (whatever that means). They have a demo for
addresses which I assume is an example of this. Eric Wieling mentioned
IBM's ViaVoice for Linux, but it appears that it is no longer available
from ScanSoft.

I didn't see any comments on the idea of a "pluggable" TTS engine
architecture. I think this is the way to go as it allows the use of
festival for development and by folks that don't want to spend the
bucks for commercial TTS. Any comments on being able to change/select
TTS engines without having to modify apps that use TTS? The tts_app
would provide a uniform way to use multiple voices, speak numbers,
addresses, read files, speak prompts etc. Not sure how much is needed
or practical but it seems like writing code to change "N. Main Street"
to "North Main Street" should only be done once. 
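To make the "pluggable" idea a bit more concrete, here is a rough sketch in
Python (not an existing Asterisk API, just an illustration). The names
TTSEngine, EchoEngine, register, speak and normalize are all made up for
this example, and the abbreviation table is a tiny hypothetical one. The
point is that text cleanup happens once, in front of whichever engine is
selected.

```python
from abc import ABC, abstractmethod

# Hypothetical abbreviation table; a real one would be much larger
# and would need context to handle ambiguous cases.
_ABBREVIATIONS = {
    "N.": "North",
    "S.": "South",
    "E.": "East",
    "W.": "West",
}


def normalize(text):
    """Expand known abbreviations token by token (engine independent)."""
    return " ".join(_ABBREVIATIONS.get(tok, tok) for tok in text.split())


class TTSEngine(ABC):
    """Interface each engine driver would implement (hypothetical)."""

    @abstractmethod
    def synthesize(self, text, voice):
        """Return audio data for the given text and voice."""


class EchoEngine(TTSEngine):
    """Stand-in driver: returns the text as bytes instead of real audio."""

    def synthesize(self, text, voice):
        return "[{}] {}".format(voice, text).encode("utf-8")


_ENGINES = {}


def register(name, engine):
    """Make an engine selectable by name, e.g. from a config file."""
    _ENGINES[name] = engine


def speak(text, engine="echo", voice="default"):
    """Normalize once, then hand off to whichever engine is selected."""
    return _ENGINES[engine].synthesize(normalize(text), voice)


register("echo", EchoEngine())
```

With something like this, speak("N. Main Street", voice="Emily") hands
"North Main Street" to the engine, and swapping Festival for a commercial
engine would mean registering a different driver, not touching the apps.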

Languages - As I expected, there is demand for a fair selection of
languages, since * is being used all over the world. Of the companies
I have checked out, ElanSpeech has the best coverage of multiple
languages. I only speak English so I have no idea how good the other
languages are. Maybe some of you that are interested in other
languages could try the demos and give some feedback.

Costs - Cepstral is the least expensive of the bunch, but is only
available in English. Actual license costs for each of the TTS engines
depend on how you intend to use it. These guys charge several thousand
dollars for high volume / high port TTS use. Additionally, most want
you to sign an NDA before you get started. I believe that if we
demonstrate enough interest (meaning the willingness to purchase
runtime licenses) that one or more of these companies will work with
us, possibly with lower costs for small volume use. 

Maybe I'm a little ahead of myself here - but it appears that there
are a number of people on the list that are using Asterisk in high
volume / large installations, and might be willing to spend a few $K on
TTS.

Perhaps the best way for me to proceed from here would be to develop a
TTS app that can use either Festival or Cepstral. This would allow the
community to choose free Festival or $30 Cepstral. It would also
provide a way to add other more expensive TTS engines in the future.

The goal would be to release an open source TTS "driver" for Cepstral.
Presumably one would have to buy the SDK to build the driver.
Depending on the licensing for the SDK, it might be possible to
release a binary that was linked against the SDK libs.

The other option would be to skip the SDK APIs and just build an
AGI/Perl module that talks to a TTS engine, as Eric Wieling mentioned
he is working on.
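For anyone who hasn't written an AGI, here is a minimal sketch of what the
AGI side could look like, in Python only for illustration (this is not
Eric's code). It relies on the basic AGI conventions: Asterisk sends
"agi_name: value" lines ending with a blank line on stdin, and the script
writes commands such as STREAM FILE to stdout. The function names and the
/tmp/tts-<uniqueid> path are assumptions made up for this example, and the
step where a TTS engine actually writes the audio file is left out.

```python
import sys


def read_agi_env(stream):
    """Read the 'agi_name: value' header lines Asterisk sends at startup.

    The header block ends at the first blank line.
    """
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def stream_file_command(basename, escape_digits=""):
    """Format the AGI command that plays a sound file (no extension)."""
    return 'STREAM FILE {} "{}"\n'.format(basename, escape_digits)


def main():
    env = read_agi_env(sys.stdin)
    # Assumption: an external TTS step has already written this file
    # in a format Asterisk can play. The path is hypothetical.
    basename = "/tmp/tts-" + env.get("agi_uniqueid", "unknown")
    sys.stdout.write(stream_file_command(basename))
    sys.stdout.flush()
    sys.stdin.readline()  # Asterisk's reply, e.g. "200 result=0 ..."
```

To use it as an actual AGI script, main() would be called at the bottom of
the file. The appeal of this route is that nothing gets linked against a
vendor SDK, so the licensing question above mostly goes away.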

If there is enough interest in an app_cepstral on its own, I'll ask
Cepstral about distributing a binary built against the SDK (if that's
how it works out), then I'll ask folks to chip in and help buy the SDK.

   Dave

TTS that I've investigated follows 

(if you have info on other TTS engines, please send it in a similar format):

----------------------------------------------------------------------
Cepstral <http://www.cepstral.com/>

Dev kit $299, each voice $29.99, (I'm guessing a 4 port license?  web
site doesn't say). Voices available:

  English:
    Duncan
    Emily
    Frank
    Linda
    Robin
    Walter

  French Canadian:
    Isabelle
    Jean-Pierre

  The French Canadian voices are listed as "Beta" in the demo section
  and are not available in the online store. So it looks like Cepstral
  is English only (American accent/dialect?). If anyone knows better
  chime in.

----------------------------------------------------------------------
ElanSpeech <http://www.elanspeech.com>

Tempo 

  Language coverage (with both male and female voices)
    American English
    British English
    German
    Dutch
    Continental French
    Castilian Spanish
    Italian
    Polish
    Russian
    Arabic
    Latin American Spanish
    Brazilian Portuguese

  They state that Tempo is for high density applications. I believe
  that it is a little more "computer" sounding than their other TTS
  product but it uses less resources and is hence "high density".

Sayso

  Language coverage 
    American English
    French
    Spanish
    German

    (additional languages are under development)

  Is for "very high end applications". I listened to the English voice
  at SpeechTek and it was very good, although it was a little
  sing-songish and sounded slightly like Dracula!

  The Linux SDK costs $1500. I imagine that runtime licenses cost in
  the thousands and depend on the number of ports.

----------------------------------------------------------------------
Rhetorical <http://www.rhetorical.com/>

rVoice

  Language coverage (20 voices available)
    English
    Greek
    German
    Spanish

  They state that they have flexible pricing that can be per-port,
  usage-based or server-based. I believe the SDK and licensing costs
  are similar to ElanSpeech's.