revision of 'say' Re: [asterisk-dev] bugs a plenty - discuss....

Fri Mar 10 11:30:33 MST 2006

Posting here, though perhaps this should also go to -users,
since people there might be able to help as well.

This is a call to all non-English speakers to
help design the config files for the text-based
"say" implementation.

What we basically need is a set of rules that map
numbers, dates and times into their individual components.
These rules are written in a way that is similar to
dialplan entries: each rule has on the left
side a pattern to match some numbers/dates/times, and
on the right side a sequence of components that should be
spelled out.
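
Put another way, a rule is a pattern, an '=>' and a comma-separated list of components. The Python sketch below shows that split. parse_rule is a hypothetical helper, not code from the branch, and it assumes that ';' always starts a comment and ',' always separates components:

```python
def parse_rule(line):
    """Split 'pattern => comp1, comp2, ...' into (pattern, [comp1, comp2, ...]).

    Assumptions: ';' starts a comment and ',' separates components.
    """
    line = line.split(";", 1)[0]  # drop a trailing comment, if any
    pattern, sep, actions = line.partition("=>")
    if not sep:
        raise ValueError(f"not a rule: {line!r}")
    return pattern.strip(), [action.strip() for action in actions.split(",")]


pattern, components = parse_rule("_[2-9][1-9] => digits/${SAY:0:1}0, say:${SAY:1}")
print(pattern)     # _[2-9][1-9]
print(components)  # ['digits/${SAY:0:1}0', 'say:${SAY:1}']
```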

As an example, I am attaching a simple configuration for
English and Italian numbers and enumerations (ordinals).

If you have a look at the comments, perhaps you can
come up with a description for your language,
and point out exceptions that you don't know how
to represent with this scheme (so that we can find
a solution or extend the scheme to support your
requirements).

Keep in mind that, in the end, pronouncing a number,
a date or a time in Asterisk means mapping it into
a sequence of sound files, each holding one component,
that are played out in order.

Your feedback is needed. If someone feels like posting
this to -users, please do so.

Feedback to me or to the list, as you like.

And if you want to try the code that implements this
stuff, it is in my branch team/rizzo/base, with the
configuration file in configs/say.conf.sample
(to be copied into say.conf), and the actions can
be triggered by a dialplan line such as one of these:

	exten => _X.,1,PlayBack(${EXTEN}|say)	; for numbers
	exten => _X.,1,PlayBack(date|say)	; for dates

Remember, you have to fill say.conf with your own patterns.

	cheers
	luigi

----------------------------------------------------

The configuration for each language is in a section
of the file say.conf named after the language.
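
The layout resembles an INI file, so as a rough illustration only: Python's configparser is not Asterisk's config parser, and the sample below is a cut-down, made-up subset of the rules in this mail. With '=>' as the separator, case-preserving keys and interpolation turned off, it reads one section per language:

```python
import configparser

# A cut-down say.conf with one section per language.
SAMPLE = """
[en]
; single digits, and 10..19, are one word each
_X => digits/${SAY}
_1X => digits/${SAY}
_[2-9]0 => digits/${SAY}

[it]
_X => digits/${SAY}
"""

parser = configparser.ConfigParser(
    delimiters=("=>",),              # rules separate pattern and components with '=>'
    comment_prefixes=(";",),         # full-line comments
    inline_comment_prefixes=(";",),  # trailing comments after whitespace
    interpolation=None,              # keep ${SAY} and %d exactly as written
)
parser.optionxform = str             # keep patterns case-sensitive (X is not x)
parser.read_string(SAMPLE)

for language in parser.sections():
    print(language, dict(parser[language]))
```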

Take the case of English numbers. We have the following:
- the section name is

	[en]

  (for Italian it would be [it])

- leading zeros are not significant, so we skip them and
  pronounce the remaining part. The corresponding rule is

        _0. => say:${SAY:1}

  where the left side matches a 0 digit followed by one or more characters,
  and the right side is just a recursive invocation of 'say' whose
  argument is the string (held in a variable named SAY) minus its
  first character (ordinary Asterisk variable syntax).

- single-digit numbers are pronounced as they are, so the rule is

        _X => digits/${SAY}

  where the pattern on the left matches any single digit (including 0)
  and the right side is a filename

- two-digit numbers between 10 and 19 are pronounced as a single word, so

        _1X => digits/${SAY}

  As above, the left pattern matches those numbers and the right-hand
  side maps to a filename.

- multiples of 10 are also a single word, so we have a similar rule

        _[2-9]0 =>  digits/${SAY}

- other two-digit numbers are two words, e.g. 83 is 80 followed by 3,
  and the rule is the following:

        _[2-9][1-9] =>  digits/${SAY:0:1}0, say:${SAY:1}

  Here, as you see, the right-hand side has two parts, a file name and
  a recursive invocation. I could equally have written the rule as

        _[2-9][1-9] =>  digits/${SAY:0:1}0, digits/${SAY:1:1}

- three-digit numbers are made of three parts, as follows

        _XXX => say:${SAY:0:1}, digits/hundred, say:${SAY:1}

  or equivalently

        _XXX => digits/${SAY:0:1}, digits/hundred, say:${SAY:1}

  Note that in writing this rule we rely on the fact that Asterisk
  has a 'shortest pattern match' algorithm, i.e. a number such as
  053 would also match the _0. pattern, which is shorter and thus gets selected.
  If we don't want to rely on that, we should write the patterns as

        _0XX => say:${SAY:1}
        _[1-9]XX => say:${SAY:0:1}, digits/hundred, say:${SAY:1}

- and so on for thousands, millions and billions:

        _XXXX => say:${SAY:0:1}, digits/thousand, say:${SAY:1}
        _XXXXX => say:${SAY:0:2}, digits/thousand, say:${SAY:2}
        _XXXXXX => say:${SAY:0:3}, digits/thousand, say:${SAY:3}

        _XXXXXXX => say:${SAY:0:1}, digits/million, say:${SAY:1}
        _XXXXXXXX => say:${SAY:0:2}, digits/million, say:${SAY:2}
        _XXXXXXXXX => say:${SAY:0:3}, digits/million, say:${SAY:3}

        _XXXXXXXXXX => say:${SAY:0:1}, digits/billion, say:${SAY:1}
        _XXXXXXXXXXX => say:${SAY:0:2}, digits/billion, say:${SAY:2}
        _XXXXXXXXXXXX => say:${SAY:0:3}, digits/billion, say:${SAY:3}
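
To check that the rules above fit together, here is a small Python interpreter for them. It is a sketch, not the code in team/rizzo/base, and the names RULES, WILDCARDS, pattern_to_regex, expand and say are hypothetical. It handles only the pattern features used here (uppercase X, Z, N, [..] ranges and the trailing '.'), supports ${SAY:offset:length} with non-negative numbers only, and, instead of Asterisk's own pattern ordering, simply takes the first matching rule, with _0. listed first so that 053 goes to the leading-zero rule:

```python
import re

# English number rules from the list above. Assumption: the first matching
# rule wins, so _0. is listed first to take precedence over _XXX for "053".
RULES = [
    ("_0.", "say:${SAY:1}"),
    ("_X", "digits/${SAY}"),
    ("_1X", "digits/${SAY}"),
    ("_[2-9]0", "digits/${SAY}"),
    ("_[2-9][1-9]", "digits/${SAY:0:1}0, say:${SAY:1}"),
    ("_XXX", "say:${SAY:0:1}, digits/hundred, say:${SAY:1}"),
    ("_XXXX", "say:${SAY:0:1}, digits/thousand, say:${SAY:1}"),
    ("_XXXXX", "say:${SAY:0:2}, digits/thousand, say:${SAY:2}"),
    ("_XXXXXX", "say:${SAY:0:3}, digits/thousand, say:${SAY:3}"),
    ("_XXXXXXX", "say:${SAY:0:1}, digits/million, say:${SAY:1}"),
    ("_XXXXXXXX", "say:${SAY:0:2}, digits/million, say:${SAY:2}"),
    ("_XXXXXXXXX", "say:${SAY:0:3}, digits/million, say:${SAY:3}"),
    ("_XXXXXXXXXX", "say:${SAY:0:1}, digits/billion, say:${SAY:1}"),
    ("_XXXXXXXXXXX", "say:${SAY:0:2}, digits/billion, say:${SAY:2}"),
    ("_XXXXXXXXXXXX", "say:${SAY:0:3}, digits/billion, say:${SAY:3}"),
]

# Pattern characters handled by this sketch; anything else is literal.
WILDCARDS = {"X": "[0-9]", "Z": "[1-9]", "N": "[2-9]", ".": ".+"}


def pattern_to_regex(pattern):
    """Translate an '_' pattern such as _[2-9][1-9] into a compiled regex."""
    out, i = [], 1  # skip the leading '_'
    while i < len(pattern):
        if pattern[i] == "[":  # copy a [..] range verbatim
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])
            i = end + 1
        else:
            out.append(WILDCARDS.get(pattern[i], re.escape(pattern[i])))
            i += 1
    return re.compile("".join(out))


def expand(template, value):
    """Substitute ${SAY}, ${SAY:offset} and ${SAY:offset:length} (non-negative only)."""
    def repl(match):
        offset = int(match.group(1) or 0)
        if match.group(2) is None:
            return value[offset:]
        return value[offset:offset + int(match.group(2))]
    return re.sub(r"\$\{SAY(?::(\d+))?(?::(\d+))?\}", repl, template)


def say(value, rules=RULES):
    """Return the sound files that pronounce `value` under `rules`."""
    for pattern, actions in rules:
        if pattern_to_regex(pattern).fullmatch(value):
            files = []
            for action in actions.split(","):
                action = expand(action.strip(), value)
                if action.startswith("say:"):
                    files.extend(say(action[len("say:"):], rules))  # recursive call
                else:
                    files.append(action)  # a plain file name
            return files
    raise ValueError(f"no rule matches {value!r}")


print(say("83"))
print(say("1234567"))
```

Run on 83, this gives digits/80 followed by digits/3. 1234567 breaks down into one, million, two, hundred, thirty, four, thousand, five, hundred, sixty, seven. Under this sketch's semantics, the rules taken literally render 100 as digits/1, digits/hundred, digits/0. The remainder 00 is reduced to 0 by the leading-zero rule and then spoken, so trailing zeros probably need rules of their own.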

Enumerations (ordinals) are identified by the special prefix 'enum:', but other
than that the same reasoning applies: we select the pattern on the
left and play (directly or recursively) its components, which can
be plain numbers or enumerations. So the rules start with

        ; enumeration
        ; single digit
        _enum:X => digits/h-${SAY}
        ; tenth..nineteenth
        _enum:1X => digits/h-${SAY}
        ; twentieth, thirtieth...
        _enum:[2-9]0 => digits/h-${SAY}
        ; twenty first, twenty second... ninety ninth
        _enum:[2-9][1-9] => say:${SAY:0:1}0, digits/h-${SAY:1}
        ; X hundred twenty fifth ...
        _enum:[1-9]XX => say:${SAY:0:1}, digits/hundred, say:enum:${SAY:1}
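
The same kind of interpreter can show how the enum: rules combine with plain numbers. This is again a hypothetical Python sketch (first matching rule wins, only uppercase X, Z, N are wildcards so the letters of "enum" stay literal). It assumes that, for a prefixed argument such as enum:25, SAY holds only the part after the first ':', as the text says for date components:

```python
import re

RULES = [
    # plain numbers (the two-digit subset of the English rules)
    ("_0.", "say:${SAY:1}"),
    ("_X", "digits/${SAY}"),
    ("_1X", "digits/${SAY}"),
    ("_[2-9]0", "digits/${SAY}"),
    ("_[2-9][1-9]", "digits/${SAY:0:1}0, say:${SAY:1}"),
    # enumerations, copied from the rules above
    ("_enum:X", "digits/h-${SAY}"),
    ("_enum:1X", "digits/h-${SAY}"),
    ("_enum:[2-9]0", "digits/h-${SAY}"),
    ("_enum:[2-9][1-9]", "say:${SAY:0:1}0, digits/h-${SAY:1}"),
    ("_enum:[1-9]XX", "say:${SAY:0:1}, digits/hundred, say:enum:${SAY:1}"),
]

# Only uppercase X, Z, N (plus '.') are wildcards, so "enum:" stays literal.
WILDCARDS = {"X": "[0-9]", "Z": "[1-9]", "N": "[2-9]", ".": ".+"}


def pattern_to_regex(pattern):
    """Translate an '_' pattern such as _enum:[2-9]0 into a compiled regex."""
    out, i = [], 1  # skip the leading '_'
    while i < len(pattern):
        if pattern[i] == "[":  # copy a [..] range verbatim
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])
            i = end + 1
        else:
            out.append(WILDCARDS.get(pattern[i], re.escape(pattern[i])))
            i += 1
    return re.compile("".join(out))


def expand(template, value):
    """Substitute ${SAY}, ${SAY:offset} and ${SAY:offset:length} (non-negative only)."""
    def repl(match):
        offset = int(match.group(1) or 0)
        if match.group(2) is None:
            return value[offset:]
        return value[offset:offset + int(match.group(2))]
    return re.sub(r"\$\{SAY(?::(\d+))?(?::(\d+))?\}", repl, template)


def say(value, rules=RULES):
    """Return the sound files for `value`, e.g. "42" or "enum:42"."""
    # Assumption: SAY is the argument with any "prefix:" removed.
    say_value = value.split(":", 1)[1] if ":" in value else value
    for pattern, actions in rules:
        if pattern_to_regex(pattern).fullmatch(value):
            files = []
            for action in actions.split(","):
                action = expand(action.strip(), say_value)
                if action.startswith("say:"):
                    files.extend(say(action[len("say:"):], rules))  # recursive call
                else:
                    files.append(action)  # a plain file name
            return files
    raise ValueError(f"no rule matches {value!r}")


print(say("enum:21"))
print(say("enum:125"))
```

Here enum:21 becomes digits/20, digits/h-1 ("twenty first"). In this sketch, enum:105 fails because nothing matches the remainder enum:05, which fits the rules above being only the start of the set.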

For dates and times, we have special prefixes.
say_date or say_time translates into the pattern 'date' or 'time',
and its components (day, day of week, minutes...) in the right-hand
side can be identified by %x, where x is a character from strftime
(e.g. %Y means 'the full year number'). This parameter then matches
rules with the prefix _x: (e.g. _Y:. for the year), and at this point
the variable SAY contains a number which can be used to build a
file name or pronounce a number. As an example, the syntax for dates
in Italian order (day, month, year) is

        ; any date is pronounced as day month year
        _date => say:%d, say:%m, say:%Y

        ; any day irrespective of the value is pronounced as a number
        _d:. => say:${SAY}

        ; in fact if we are picky, the first of the month is an ordinal
        ; so the rule should be

        _d:1 => digits/h-1
        _d:[2-9] => digits/${SAY}
        _d:[1-3][0-9] => say:${SAY}

        ; then the month is just the month's name
        _m:. => digits/mon-${SAY}

        ; and the year is just a number
        _Y:. => say:${SAY}

        ; whereas in English it would be something like
        _Y:1[1-9]XX => say:${SAY:0:2}, digits/hundred, say:${SAY:2:2}
        _Y:20XX => say:2000, say:${SAY:2:2}

etc.
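
A hypothetical Python sketch of the date case follows, using the simple rules above (_date, _d:., _m:., _Y:.) plus a few number rules. Two assumptions go beyond the text: each %x is replaced by "x:" followed by the strftime value (so say:%Y becomes say:Y:2006), and SAY is the part after the first ':'. As before, the first matching rule wins:

```python
import re
from datetime import date

RULES = [
    ("_date", "say:%d, say:%m, say:%Y"),  # Italian order: day, month, year
    ("_d:.", "say:${SAY}"),
    ("_m:.", "digits/mon-${SAY}"),
    ("_Y:.", "say:${SAY}"),
    # plain numbers, enough for days and years
    ("_0.", "say:${SAY:1}"),
    ("_X", "digits/${SAY}"),
    ("_1X", "digits/${SAY}"),
    ("_[2-9]0", "digits/${SAY}"),
    ("_[2-9][1-9]", "digits/${SAY:0:1}0, say:${SAY:1}"),
    ("_XXX", "say:${SAY:0:1}, digits/hundred, say:${SAY:1}"),
    ("_XXXX", "say:${SAY:0:1}, digits/thousand, say:${SAY:1}"),
]

# Only uppercase X, Z, N (plus '.') are wildcards; d:, m:, Y: stay literal.
WILDCARDS = {"X": "[0-9]", "Z": "[1-9]", "N": "[2-9]", ".": ".+"}


def pattern_to_regex(pattern):
    """Translate an '_' pattern such as _Y:. into a compiled regex."""
    out, i = [], 1  # skip the leading '_'
    while i < len(pattern):
        if pattern[i] == "[":  # copy a [..] range verbatim
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])
            i = end + 1
        else:
            out.append(WILDCARDS.get(pattern[i], re.escape(pattern[i])))
            i += 1
    return re.compile("".join(out))


def expand(template, value, when):
    """Substitute ${SAY...}, then turn each %x into "x:<strftime value>"."""
    def repl(match):
        offset = int(match.group(1) or 0)
        if match.group(2) is None:
            return value[offset:]
        return value[offset:offset + int(match.group(2))]
    text = re.sub(r"\$\{SAY(?::(\d+))?(?::(\d+))?\}", repl, template)
    # Assumption: %Y -> "Y:2006", %d -> "d:10", and so on.
    return re.sub(r"%([A-Za-z])",
                  lambda m: m.group(1) + ":" + when.strftime("%" + m.group(1)),
                  text)


def say(value, when, rules=RULES):
    """Return the sound files for `value` ("date", "d:10", "2006", ...)."""
    say_value = value.split(":", 1)[1] if ":" in value else value
    for pattern, actions in rules:
        if pattern_to_regex(pattern).fullmatch(value):
            files = []
            for action in actions.split(","):
                action = expand(action.strip(), say_value, when)
                if action.startswith("say:"):
                    files.extend(say(action[len("say:"):], when, rules))
                else:
                    files.append(action)  # a plain file name
            return files
    raise ValueError(f"no rule matches {value!r}")


print(say("date", date(2006, 3, 10)))
```

strftime zero-pads %d and %m. A day such as 05 is still fine, because the leading-zero rule reduces it to digits/5. The month, however, comes out as digits/mon-03. As far as I know, the stock Asterisk prompts are numbered digits/mon-0 to digits/mon-11, so either the engine or the rule would have to bridge that difference.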

Hope you get the idea.