[Asterisk-Users] Text to Speech - Someone needs to do this
Chris Albertson
chrisalbertson90278 at yahoo.com
Wed Jul 16 10:11:40 MST 2003
--- Moshe Yudkowsky <speech at pobox.com> wrote:
<SNIP>
>
> The real trick is to get the correct prosody. Here are three sentences
> with the same words but each with different prosody:
>
> "I said 'yes.'"
>
> "I said yes?"
>
> "_I_ said '_yes_'"???!!
>
> Both formant and concatenative systems add prosody. Adding prosody to
> whole-word concatenative systems is difficult.
The thing is that _people_ don't do text to speech. If you were to
simply read one word at a time, you'd sound bad too.
Try it: if, ... you, ... were, ... to, ... simply, ... read, ...
You sound like a robot. No, we people know what it is we are
trying to communicate. If you want a synthetic voice to sound
natural, you will have to tell the software the _intent_ of the words,
not just the words. You would need a markup language for that:
<emph> I </emph> said <quote><questionword> yes </questionword></quote>
Now the system can apply some transformations to the pitch, speed,
and loudness. For interactive systems, markup works because the
software generating the text "knows" _why_ it is generating the text.
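
To make the idea concrete, here is a minimal sketch in Python of how a
synthesizer front end might turn that markup into per-word pitch, speed,
and loudness adjustments. The tag names come from the example above;
everything else (the TAG_EFFECTS table, its numbers, the Word class, and
the annotate function) is hypothetical and not taken from Festival or any
other real TTS engine.

```python
import re
from dataclasses import dataclass

# Hypothetical tag -> (pitch, speed, loudness) multipliers. The numbers are
# illustrative only; a real engine would tune or model these.
TAG_EFFECTS = {
    "emph": (1.2, 0.9, 1.3),
    "quote": (1.0, 0.95, 1.0),
    "questionword": (1.4, 1.0, 1.0),
}


@dataclass
class Word:
    text: str
    pitch: float = 1.0
    speed: float = 1.0
    loudness: float = 1.0


# Matches an opening tag, a closing tag, or a run of plain text.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<\s]+)")


def annotate(marked_up):
    """Return one Word per spoken word, scaled by every tag open around it."""
    open_tags = []
    words = []
    for closing, tag, text in TOKEN.findall(marked_up):
        if text:
            word = Word(text)
            for name in open_tags:
                pitch, speed, loudness = TAG_EFFECTS.get(name, (1.0, 1.0, 1.0))
                word.pitch *= pitch
                word.speed *= speed
                word.loudness *= loudness
            words.append(word)
        elif closing:
            # Close by name so slightly mis-nested markup is tolerated.
            if tag in open_tags:
                open_tags.remove(tag)
        else:
            open_tags.append(tag)
    return words


if __name__ == "__main__":
    line = "<emph> I </emph> said <quote><questionword> yes </questionword></quote>"
    for w in annotate(line):
        print(w)
```

The point of the sketch is only that the multipliers are driven by the
tags, not by the words: "said" is left alone, while "I" and "yes" get
different treatment even though the text itself never says why.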
Reading a book for the blind is a much harder problem. The
TTS system has to do the same job as a voice actor, which even
includes understanding the emotions of characters in a novel. Very
hard to do for a computer.
But interactive systems can use markup to get the "expression"
right.
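
As a rough illustration of the interactive case, the sketch below shows
an application choosing the markup from its own intent, using the three
"I said yes" readings quoted earlier. The Intent enum and the render
function are hypothetical names made up for this example, and the tag
set is just the ad hoc one from this message, not a real markup standard.

```python
from enum import Enum


class Intent(Enum):
    STATEMENT = "statement"    # "I said 'yes.'"
    QUESTION = "question"      # "I said yes?"
    DISBELIEF = "disbelief"    # "_I_ said '_yes_'"???!!


def render(intent, subject, answer):
    """Wrap the same words in different markup depending on the intent."""
    if intent is Intent.STATEMENT:
        return f"{subject} said <quote> {answer} </quote>"
    if intent is Intent.QUESTION:
        return f"{subject} said <questionword> {answer} </questionword>"
    if intent is Intent.DISBELIEF:
        return (
            f"<emph> {subject} </emph> said "
            f"<quote><questionword> {answer} </questionword></quote>"
        )
    raise ValueError(f"unknown intent: {intent!r}")


if __name__ == "__main__":
    for intent in Intent:
        print(intent.name, "->", render(intent, "I", "yes"))
```

The words passed in are identical each time; only the intent changes,
which is exactly the information a book-reading system would have to
infer on its own.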
And don't put down Festival. Many (most?) of the commercial systems
_are_ Festival.
you,
=====
Chris Albertson
Home: 310-376-1029 chrisalbertson90278 at yahoo.com
Cell: 310-990-7550
Office: 310-336-5189 Christopher.J.Albertson at aero.org
KG6OMK