[asterisk-dev] SaySentence/SoundPack Proposal

Wed Nov 27 14:14:12 CST 2013

My responses:

On Wed, Nov 27, 2013 at 10:14 AM, Tzafrir Cohen <tzafrir.cohen at xorcom.com> wrote:

> On Tue, Nov 26, 2013 at 10:04:58PM -0700, Steve Murphy wrote:
> > Hello--
> >
> > Boy, it's been a long time since I posted to the dev mailing list!
> >
> > I'd like to announce a proposal to the Asterisk Community, that I
> > introduced at Astridevcon last month. It is a new API for playing sound
> > files (mainly speech). A pdf describing the Proposal in some detail is at:
> >
> > http://www.gmvoices.com/downloads/steve/sayscripts.pdf
>
> My initial reaction to this is that it is a reimplementation of
> text-to-speech.
>

The proposal will make Asterisk capable of using larger sound file sets
that sound more natural than even text to speech.

Someday text to speech packages (Ivona, for example, which is plain
AWESOME) will be indistinguishable from natural speech. But they haven't
quite reached that status yet. Until they do, better quality sound sets
will be possible in Asterisk. Not only for special IVRs, but for
Asterisk itself.

But, that being said, this is far from being its sole purpose. Right
now, translations are very difficult to generate, both within Asterisk
and in IVRs. The SaySentence stuff will make all this much easier.

> It also removes some existing functionality (saying a date and such).
>

Absolutely false! All the date pronouncing capabilities presented
by the Say stuff are covered in the SaySentence spec. Check it out.
It may even cover all this better than the assembled existing say stuff.

>
> The proposal also ignores say.conf . When talking with Steve at the
> DevConf I made the mistake of assuming that as say.conf is only
> referenced in app_playback, it is only used by Playback(), which is not
> the case.
>

Wrong again. I didn't IGNORE it. This proposal replaces it. I built
the whole spec around what already exists in the various "say" stuff.
If I missed anything, let me know. As far as I know, I added some cool
stuff to the mix, like special combinations that have a different
pronunciation than just the concatenation of the individual pieces, for
example...
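
To make the "special combinations" point concrete, here is a minimal
sketch in Python. It is not taken from the spec or from say.conf: the
function name, the token strings and the file names are all made up
for illustration. It only shows the idea of preferring a recording of
a whole combination over stringing the individual pieces together.

```python
# Hypothetical sketch: prefer a recorded combination over concatenated pieces.
# All names here (pick_sound_files, the "pieces/..." and "combos/..." paths)
# are assumptions for illustration, not part of Asterisk or the SaySentence spec.

def pick_sound_files(tokens, recordings):
    """Greedy longest match: at each position use the longest run of tokens
    that has its own recording; otherwise fall back to a single token."""
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate run first, shrinking down to one token.
        for end in range(len(tokens), i, -1):
            key = " ".join(tokens[i:end])
            if key in recordings or end == i + 1:
                # A single token with no recording is passed through as-is.
                result.append(recordings.get(key, key))
                i = end
                break
    return result


recordings = {
    "january": "pieces/january",
    "twenty": "pieces/twenty",
    "one": "pieces/one",
    "twenty one": "combos/twenty-one",  # recorded as one natural utterance
}

print(pick_sound_files(["january", "twenty", "one"], recordings))
```

With the combined entry present, "twenty one" is played from the single
recording; remove that entry and the same call falls back to the two
separate pieces.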

>
> Languages in say.conf:
>
> $ grep '^\[' configs/say.conf.sample
> [general]
> [digit-base](!)         ; base rule for digit strings
> [date-base](!)          ; base rules for dates and times
> [en-base](!)
> [en_GB](date-base,digit-base,en-base)
> [it](digit-base,date-base)
> [en](en-base,date-base,digit-base)
> [de](date-base,digit-base)
> [hu](digit-base,date-base)
> [fr](date-base,digit-base)
> [es](date-base,digit-base)
> [da](date-base,digit-base)
>
> Why isn't it more commonly used?
>

Personally, after reviewing all the stuff in the say.conf package,
my first guess is that it is not very well documented or advertised.
It is not used in the Asterisk core, and does not help you with stuff
hardwired into the Asterisk source.

Not only that, it is still oriented around the "group by phrase" concept
instead of "group by full sentence". In the end, while this stuff might
help with localizing some of the substructure, it isn't tied directly to
full sentences. How can you translate anything properly without knowing
the full context? Most of the complexity introduced by gender/topic
variations in rendering dates, numbers, etc., and codified in C, falls
away when you step back and pronounce sentences as a single unit. The
basics *are* all in the Say and say.conf related stuff, but it lacks the
unifying concepts. It is not algorithmic, and it still depends on the
C-coded source in say.c, and adding to that in Asterisk means months of
work and waiting as new releases roll out.

Soundpacks, on the other hand, completely encapsulate a new language. No
code submissions to Asterisk. Just put the sound pack together, debug,
and publish. If all goes well, you may be able to publish sound sets not
only for Asterisk, but for other phone systems as well, using the same
methodology. But that's for the future.
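
As a rough illustration of "group by full sentence" with the language
living entirely in data, here is a small Python sketch. To be clear,
this is NOT the syntax from sayscripts.pdf; the pack layout, the
sentence name and the sound names are invented for the example. The
point is only that the rendering code has no per-language branches, so
a new language is a new data entry rather than a code change.

```python
# Hypothetical data-only "sound packs": whole sentences are defined per language.
# The layout and every name below are assumptions made for illustration;
# they do not come from the SaySentence spec or from Asterisk.

SOUND_PACKS = {
    "en": {
        "new_messages": {
            1: ["you-have", "one", "new-message"],
            "default": ["you-have", "{count}", "new-messages"],
        },
    },
    "es": {
        "new_messages": {
            1: ["tiene", "un", "mensaje-nuevo"],
            "default": ["tiene", "{count}", "mensajes-nuevos"],
        },
    },
}


def render(language, sentence, count):
    """Return the list of sound names for a whole sentence.

    The pack decides which counts get a special form; this function only
    looks the form up and fills in the {count} placeholder.
    """
    forms = SOUND_PACKS[language][sentence]
    form = forms.get(count, forms["default"])
    return [part.format(count=count) for part in form]


print(render("en", "new_messages", 3))
print(render("es", "new_messages", 1))
```

Because each pack lists its own special cases, a language with more
than two forms would simply add more keys to its entry.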

>
> (On a personal note: writing the language support is a nice janitorial
> task for a native speaker of the language with a moderate level of C, as
> it is easy to test and has no inherent concurrency issues)
>

The same applies to SaySentence. Personally, I think the whole say.c
methodology is a sinking ship. It's good stuff, don't misunderstand me,
but it needs a new base, a new philosophy, and some reorganization.

And, yes, there is a lot of locale-specific stuff to glean from the various
language-specific routines now in Asterisk (see say.c, app_voicemail.c,
etc.). It is a bit of work to do that, and the existing code is a bit buggy
on top of that. Whether to throw it away and start from scratch, or to
glean, is up to those willing to work on these issues. I can do some, but
I will need help.

> --
>                Tzafrir Cohen
>
-- 

Steve Murphy
ParseTree Corporation
✉ murf at parsetree.com
☎ 307-899-5535