[Asterisk-Dev] Text to speech on Asterisk
astdev at newww.com
Fri Nov 14 11:07:16 MST 2003
It's great to see all of this interest in TTS for *!
I've added a list of companies that I have info for at the end of this
post. Cepstral is by far the cheapest solution but is currently limited
to English. ElanSpeech is high quality and covers a lot of
languages, but is of course more expensive. Rhetorical has
"pre-packaged libraries containing vocabulary and functions for
specific applications" (whatever that means). They have a demo for
addresses which I assume is an example of this. Eric Wieling mentioned
IBM's ViaVoice for Linux, but it appears that it is no longer available
from ScanSoft.
I didn't see any comments on the idea of a "pluggable" TTS engine
architecture. I think this is the way to go as it allows the use of
festival for development and by folks that don't want to spend the
bucks for commercial TTS. Any comments on being able to change/select
TTS engines without having to modify apps that use TTS? The tts_app
would provide a uniform way to use multiple voices, speak numbers,
addresses, read files, speak prompts etc. Not sure how much is needed
or practical but it seems like writing code to change "N. Main Street"
to "North Main Street" should only be done once.
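As a rough illustration of doing that expansion once, in front of whatever engine is plugged in, here is a minimal Python sketch. The function name expand_address and the two abbreviation tables are made up for this example and are nowhere near complete; a real version would need a much larger table and some context handling (for instance "St." can also mean "Saint").

```python
import re

# Hypothetical, deliberately tiny abbreviation tables for illustration only.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
STREET_TYPES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Blvd": "Boulevard"}


def expand_address(text):
    """Expand a few common address abbreviations before the text reaches a TTS engine."""
    # "N. Main" -> "North Main": a single direction letter, a period,
    # whitespace, and then a capitalized word.
    text = re.sub(
        r"\b([NSEW])\.\s+(?=[A-Z])",
        lambda m: DIRECTIONS[m.group(1)] + " ",
        text,
    )
    # "Ave" or "Ave." -> "Avenue": a known street-type abbreviation as a
    # whole word, with an optional trailing period.
    text = re.sub(
        r"\b(St|Ave|Rd|Blvd)\b\.?",
        lambda m: STREET_TYPES[m.group(1)],
        text,
    )
    return text


print(expand_address("N. Main Street"))
```

Because the expansion happens before the text is handed to an engine, Festival and any commercial engine would read the same address the same way. This naive version will mangle some input (an initial such as "S." inside other text, for example), so treat it as a starting point only.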
Languages - As I expected, there is demand for a fair selection of
languages, since * is being used all over the world. Of the companies I have
checked out, ElanSpeech has the best coverage of multiple languages. I
only speak English, so I have no idea how good the other languages
are. Maybe some of you who are interested in other languages could try
the demos and give some feedback.
Costs - Cepstral is the least expensive of the bunch, but is only
available in English. Actual license costs for each of the TTS engines
depend on how you intend to use it. These guys charge thousands of
dollars for high volume / high port TTS use. Additionally, most want
you to sign an NDA before you get started. I believe that if we
demonstrate enough interest (meaning the willingness to purchase
runtime licenses), one or more of these companies will work with
us, possibly with lower costs for small volume use.
Maybe I'm a little ahead of myself here - but it appears that there
are a number of people on the list that are using Asterisk in high
volume / large installations, and might be willing to spend a few $K on
TTS.
Perhaps the best way for me to proceed from here would be to develop a
TTS app that can use either Festival or Cepstral. This would allow the
community to choose free Festival or $30 Cepstral. It would also
provide a way to add other more expensive TTS engines in the future.
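To make the "pluggable" idea a bit more concrete, here is a minimal Python sketch of an engine registry. All of the names in it (TTSEngine, register_engine, get_engine, SilenceEngine) are invented for this example. The Festival driver assumes the text2wave script that ships with Festival is on the PATH; the "silence" driver is only a stand-in so the sketch runs without any TTS installed. A Cepstral driver would be one more subclass, and I have not tried to guess at what its SDK calls look like.

```python
import subprocess
import wave
from abc import ABC, abstractmethod

_ENGINES = {}  # engine name -> driver class


def register_engine(name):
    """Class decorator that makes a driver selectable by name."""
    def decorator(cls):
        _ENGINES[name] = cls
        return cls
    return decorator


class TTSEngine(ABC):
    """Interface every driver implements; apps only ever see this."""

    @abstractmethod
    def synthesize(self, text, wav_path):
        """Write speech for `text` to the file `wav_path`."""


@register_engine("festival")
class FestivalEngine(TTSEngine):
    def synthesize(self, text, wav_path):
        # Assumes Festival's text2wave script is installed; it reads the
        # text to speak from stdin when no input file is given.
        subprocess.run(["text2wave", "-o", wav_path],
                       input=text.encode(), check=True)


@register_engine("silence")
class SilenceEngine(TTSEngine):
    """Stand-in driver: writes one second of silence (8 kHz, 16-bit mono)."""

    def synthesize(self, text, wav_path):
        with wave.open(wav_path, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)
            w.setframerate(8000)
            w.writeframes(b"\x00\x00" * 8000)


def get_engine(name):
    """Look up a driver by name, e.g. from a config file setting."""
    if name not in _ENGINES:
        raise KeyError("unknown TTS engine: %s" % name)
    return _ENGINES[name]()
```

An app would call something like get_engine("festival").synthesize("Hello", "/tmp/hello.wav") and never care which engine is behind the name, so switching engines becomes a configuration change rather than a code change.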
The goal would be to release an open source TTS "driver" for Cepstral.
Presumably one would have to buy the SDK to build the
driver. Depending on the licensing for the SDK, it might be possible
to release a binary that was linked against the SDK libs.
The other option would be to skip the SDK APIs and just build an
AGI/Perl module that talks to a TTS engine, as Eric Wieling mentioned
he is working on.
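For anyone who has not written an AGI script, the plumbing is small: Asterisk sends a block of "agi_name: value" lines on the script's stdin, ended by a blank line, and the script then writes commands such as STREAM FILE to stdout and reads a one-line reply for each. Below is a rough sketch in Python rather than Perl. The helper names are invented for this example, and the step that actually produces the WAV file is left to a caller-supplied function, since that part depends on the engine.

```python
import sys


def read_agi_env(stream):
    """Read the 'agi_name: value' header block Asterisk sends at startup."""
    env = {}
    for line in stream:
        line = line.strip()
        if not line:  # a blank line ends the header block
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def agi_command(command, out=None, inp=None):
    """Send one AGI command and return Asterisk's one-line reply."""
    out = out if out is not None else sys.stdout
    inp = inp if inp is not None else sys.stdin
    out.write(command + "\n")
    out.flush()  # Asterisk waits for the full line before it replies
    return inp.readline().strip()


def stream_file_command(wav_path):
    """Build a STREAM FILE command; Asterisk wants the name without extension."""
    base = wav_path[:-4] if wav_path.endswith(".wav") else wav_path
    return 'STREAM FILE %s ""' % base


def speak(text, synthesize, wav_path, out=None, inp=None):
    """Render `text` with a caller-supplied synthesize(text, wav_path)
    function (hypothetical), then ask Asterisk to play the result."""
    synthesize(text, wav_path)
    return agi_command(stream_file_command(wav_path), out=out, inp=inp)
```

A real script would call read_agi_env(sys.stdin) first and then speak(...) with whatever engine wrapper it has. The generated file also has to be in a format and sample rate that Asterisk can play, which this sketch does not check.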
If there is enough interest in an app_cepstral on its own - I'll ask
Cepstral about distributing a binary built against the SDK (if that's how
it works out), then I'll ask folks to chip in and help buy the SDK.
Dave
The TTS engines that I've investigated follow
(if you have info on other TTS engines, please send it in a similar format):
----------------------------------------------------------------------
Cepstral <http://www.cepstral.com/>
Dev kit $299, each voice $29.99 (I'm guessing a 4 port license? The web
site doesn't say). Voices available:
English:
Duncan
Emily
Frank
Linda
Robin
Walter
French Canadian:
Isabelle
Jean-Pierre
The French Canadian voices are listed as "Beta" in the demo section
and are not available in the online store. So it looks like Cepstral
is English only (American accent/dialect?). If anyone knows better,
chime in.
----------------------------------------------------------------------
ElanSpeech <http://www.elanspeech.com>
Tempo
Language coverage (with both male and female voices)
American English
British English
German
Dutch
Continental French
Castilian Spanish
Italian
Polish
Russian
Arabic
Latin American Spanish
Brazilian Portuguese
They state that Tempo is for high density applications. I believe
that it is a little more "computer" sounding than their other TTS
product, but it uses fewer resources and is hence "high density".
Sayso
Language coverage
American English
French
Spanish
German
(additional languages are under development)
It is for "very high end applications". I listened to the English voice
at SpeechTEK and it was very good, although it was a little
sing-songish and sounded slightly like Dracula!
The Linux SDK costs $1500. I imagine that runtime licenses cost in
the thousands and depend on the number of ports.
----------------------------------------------------------------------
Rhetorical <http://www.rhetorical.com/>
rVoice
Language coverage (20 voices available)
English
Greek
German
Spanish
They state that they have flexible pricing that can be
per-port, usage-based or server-based. I believe the SDK and
licensing costs are similar to ElanSpeech.