[asterisk-dev] [svn-commits] murf: branch murf/utf8-whatif r89574 - /team/murf/utf8-whatif/

Steve Murphy murf at parsetree.com
Mon Nov 26 18:32:10 CST 2007


On Mon, 2007-11-26 at 20:19 +0100, Johansson Olle E wrote:
> 26 nov 2007 kl. 16.19 skrev SVN commits to the Digium repositories:
> 
> > Author: murf
> > Date: Mon Nov 26 09:19:17 2007
> > New Revision: 89574
> >
> > URL: http://svn.digium.com/view/asterisk?view=rev&rev=89574
> > Log:
> > I'm creating this branch for i18n experiments. The Doc by Olle &  
> > Leif was intriguing.
> > (alphanumeric extensions) It says:
> >
> > • SIP uri’s are UTF 8
> > • Caller ID names are ISO8859-1
> > • Asterisk dial patterns are not standardized, but can
> >  be defined to be ASCII (we guess)
> > • IAX2 dial strings are then also ASCII
> >
> > so... what if all config files are decreed to be utf-8?


> A decision that was made based on this was that IAX2 is now all UTF8,
> including extensions, contexts and caller ID names.
> 
> Which in fact means a lot of changes to Asterisk.
> 
> Anyone with experience of Unicode/Utf8 development out there?
> Known pitfalls?

I have some experience. UTF-8 is the least intrusive way to support
multiple character sets. It's pretty transparent, and a LOT of utilities
on Linux (and even Windows) support it, including editors, viewers,
terminal emulators, etc. You don't really have to change anything
unless you need to analyze things character by character; it may take
from 1 to 4 bytes to encode a single character (code point).
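
To make the "1 to 4 bytes" point concrete, here is a minimal sketch in C.
The names utf8_seq_len and utf8_count are hypothetical helpers written for
this mail, not existing Asterisk functions, and the counter assumes its
input is already valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
 * with lead byte c, or 0 if c cannot start a sequence. */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;               /* 0xxxxxxx: plain ASCII */
    if (c >= 0xC2 && c <= 0xDF)
        return 2;               /* 110xxxxx (0xC0/0xC1 would be overlong) */
    if (c >= 0xE0 && c <= 0xEF)
        return 3;               /* 1110xxxx */
    if (c >= 0xF0 && c <= 0xF4)
        return 4;               /* 11110xxx, capped at U+10FFFF */
    return 0;                   /* continuation byte or invalid lead byte */
}

/* Hypothetical helper: count code points in a NUL-terminated string by
 * counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes the input is valid UTF-8; it does no validation. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

Code like this is only needed where we really care about characters, such
as truncating a name to N characters for display. strlen() still gives the
right answer for buffer sizing, because that is a byte count.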

I don't see why contexts can't be UTF-8; extensions, I can understand.
That's very character-specific stuff, and character ranges can now
involve tens of thousands (or more) of elements, etc. As I see it, we
should convert the ISO 8859-1 caller ID at the boundaries, so it is
proper UTF-8 like everything else internally.
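
A rough sketch of what such a boundary conversion could look like, in C.
latin1_to_utf8 is a hypothetical name, not an existing Asterisk function.
It relies on the fact that ISO 8859-1 bytes map one-to-one onto code points
U+0000 through U+00FF, so every byte at or above 0x80 becomes exactly two
UTF-8 bytes.

```c
#include <stddef.h>

/* Hypothetical boundary helper: convert a NUL-terminated ISO 8859-1 string
 * into UTF-8. Returns the number of bytes written (not counting the NUL),
 * or -1 if dst is too small. On -1 the contents of dst are unspecified. */
int latin1_to_utf8(const char *src, char *dst, size_t dstlen)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t o = 0;

    for (; *p; p++) {
        if (*p < 0x80) {
            /* ASCII passes through unchanged; keep room for the NUL */
            if (o + 1 >= dstlen)
                return -1;
            dst[o++] = (char)*p;
        } else {
            /* U+0080..U+00FF: 110000xx 10xxxxxx */
            if (o + 2 >= dstlen)
                return -1;
            dst[o++] = (char)(0xC0 | (*p >> 6));
            dst[o++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    if (o >= dstlen)
        return -1;
    dst[o] = '\0';
    return (int)o;
}
```

In the worst case the output is twice the input length plus one byte for
the NUL, so a caller can size the destination buffer up front.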

Internally, operations on UTF-8 strings are the same as they are today.
They are strings of non-zero bytes, so there is usually nothing special
to do. All of the current fixed strings we use in Asterisk for matching
config file values/entries are plain ASCII, so they are byte-for-byte the
same in UTF-8.
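
As an illustration of why the byte-oriented code keeps working: every byte
of a multi-byte UTF-8 sequence has the high bit set, so an ASCII delimiter
can never show up in the middle of a character. The sketch below is in C;
context_part is a hypothetical helper, and the "exten@context" form is only
used as an example input.

```c
#include <string.h>

/* Hypothetical helper: given "exten@context", return a pointer to the
 * context part, or NULL if there is no '@'. Plain strchr() is safe on
 * UTF-8 input because '@' (0x40) is below 0x80, while every byte of a
 * multi-byte sequence is 0x80 or above. */
const char *context_part(const char *target)
{
    const char *at = strchr(target, '@');

    return at ? at + 1 : NULL;
}
```

The same reasoning covers strcmp(), strcasecmp() against ASCII keywords,
and splitting on ',' or '|': none of them needs to know the data is UTF-8.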

There will be some issues, but... I don't see any show-stoppers. We just
have to review things and look for places where there might be trouble.


It does look like the Pastry Chef on IRC has trouble with utf8, tho! Try
to find the mangled output of the above list in the commit messages on
#asterisk-commits!


murf

