[Asterisk-Users] Text to Speech - Someone needs to do this

Wed Jul 16 10:11:40 MST 2003

--- Moshe Yudkowsky <speech at pobox.com> wrote:
<SNIP>
> 
> The real trick is to get the correct prosody. Here are three sentences
> with the same words but each with different prosody:
> 
> "I said 'yes.'"
> 
> "I said yes?"
> 
> "_I_ said '_yes_'"???!!
> 
> Both formant and concatenative systems add prosody. Adding prosody
> to whole-word concatenative systems is difficult.

The thing is that _people_ don't do text to speech.  If you were to 
simply read one word at a time you'd sound bad too.

Try it:  if, ... you, ... were, ... to, ... simply, ... read, ...
You sound like a robot.  No, we people know what it is we are
trying to communicate.  If you want a synthetic voice to sound
natural, you will have to tell the software the _intent_ of the words,
not just the words.  You would need a markup language for that:

<emph> I </emph> said <quote><questionword> yes </questionword></quote>

Now the system can apply some transformations to the pitch, speed
and loudness.  For interactive systems markup works because the
software generating the text "knows" _why_ it is generating the text.
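
To make that concrete, here is a rough Python sketch of the idea, not
anything a real TTS engine does.  The tag names are just the made-up
ones from my example, and the pitch/speed/loudness multipliers are
numbers I am assuming purely for illustration.  It walks the marked-up
text, tracks which tags are open, and scales each word's settings
accordingly.  It also complains if the tags are not properly nested.

```python
import re
from dataclasses import dataclass

# Hypothetical tag -> (pitch, rate, volume) multipliers.  The tag names
# and the numbers are illustrative assumptions, not from any real engine.
TAG_EFFECTS = {
    "emph": (1.15, 0.90, 1.25),
    "questionword": (1.30, 1.00, 1.00),
    "quote": (1.05, 0.95, 1.00),
}

@dataclass
class Word:
    text: str
    pitch: float = 1.0   # relative to the voice's neutral pitch
    rate: float = 1.0    # relative speaking speed
    volume: float = 1.0  # relative loudness

# Matches either an opening/closing tag or a plain word.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<\s]+)")

def annotate(marked_up: str) -> list[Word]:
    """Return the words with the effects of every enclosing tag applied."""
    open_tags = []
    words = []
    for closing, tag, text in TOKEN.findall(marked_up):
        if tag and closing:
            # A closing tag must match the most recently opened tag.
            if not open_tags or open_tags[-1] != tag:
                raise ValueError(f"mismatched </{tag}>")
            open_tags.pop()
        elif tag:
            if tag not in TAG_EFFECTS:
                raise ValueError(f"unknown tag <{tag}>")
            open_tags.append(tag)
        else:
            word = Word(text)
            # Nested tags compound: multiply the effects together.
            for t in open_tags:
                pitch, rate, volume = TAG_EFFECTS[t]
                word.pitch *= pitch
                word.rate *= rate
                word.volume *= volume
            words.append(word)
    if open_tags:
        raise ValueError(f"unclosed <{open_tags[-1]}>")
    return words

if __name__ == "__main__":
    line = "<emph> I </emph> said <quote><questionword> yes </questionword></quote>"
    for w in annotate(line):
        print(w)
```

A real synthesizer would turn those per-word numbers into pitch
contours and durations; the point is only that the markup hands it the
intent it cannot guess from the bare words.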

Reading a book for the blind is a much harder problem.  The
TTS system has to do the same job as a voice actor, which even
includes understanding the emotions of characters in a novel.  Very
hard to do for a computer.

But interactive systems can use markup to get the "expression"
right.

And don't put down Festival.  Many (most?) of the commercial systems
_are_ Festival.

you,

=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278 at yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson at aero.org
  KG6OMK
