[Asterisk-Dev] IAX spec: Text formats and character sets

Tilghman Lesher tilghman at mail.jeffandtilghman.com
Fri Apr 29 21:00:10 MST 2005


On Friday 29 April 2005 21:43, Michael Giagnocavo wrote:
> >Michael Giagnocavo wrote:
> >> Hmm, you're right. That doesn't look bad at all.
> >>
> >> But... what about for comparisons and other Unicode operations?
> >> Do the libraries available support some UTF-8 version of strcmp,
> >> strchr, strcasecmp, etc.?
> >
> >Some of them are easy (strcmp, for example). Most of them are
> > harder, because they either need to know character boundaries, or
> > need case mappings (strcasecmp, for example). Any function that
> > searches for a 'char' in a string also won't work if the
> > character being searched for is a multi-byte one.
>
> Not even strcmp works, because Unicode can represent the same
> character with different sequences of code points that are still
> considered equivalent; say, a Latin o with an accent mark. Using wide
> chars internally solves these issues, and is most likely faster,
> depending on the data.
>
> >I think it's safe to document that the on-wire format is UTF-8,
> > but that the current implementations only support the single-byte
> > subset of UTF-8. Any implementation is free to be extended to
> > fully support the entire UTF-8 character space, provided
> > suitable libraries can be found (or written).
>
> Shouldn't we say that implementations SHOULD (or preferably MUST)
> support UTF-8?
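
To make the two objections above concrete, here is a minimal C sketch. It assumes UTF-8 encoded byte strings; the names bytes_equal, find_utf8_char, composed_o_acute and decomposed_o_acute are hypothetical and come from no IAX or Asterisk code. It shows that strcmp() reports two canonically equivalent spellings of "ó" as different, and that a multi-byte character can be located with strstr() on its encoded byte sequence even though strchr() cannot search for it. Whichever internal representation is used, treating the two spellings as equal requires normalizing both strings (for example to Unicode NFC) before comparing them.

```c
#include <string.h>

/* "ó" as the single precomposed code point U+00F3, UTF-8 encoded. */
static const char composed_o_acute[] = "\xC3\xB3";

/* "o" followed by U+0301 COMBINING ACUTE ACCENT: canonically
   equivalent in Unicode, but a different byte sequence. */
static const char decomposed_o_acute[] = "o\xCC\x81";

/* Hypothetical helper: a plain byte-wise comparison, which is all
   strcmp() can do. It knows nothing about Unicode equivalence. */
int bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: locate a (possibly multi-byte) UTF-8 character
   by searching for its whole encoded byte sequence. strchr() takes a
   single char value, so it cannot do this for non-ASCII characters.
   Because UTF-8 lead bytes and continuation bytes occupy disjoint
   ranges, a match of a complete valid sequence always starts on a
   character boundary. Returns the byte offset, or -1 if absent. */
long find_utf8_char(const char *haystack, const char *utf8_char)
{
    const char *hit = strstr(haystack, utf8_char);
    return hit ? (long)(hit - haystack) : -1L;
}
```

For example, bytes_equal(composed_o_acute, decomposed_o_acute) returns 0, and find_utf8_char("se\xC3\xB1or", "\xC3\xB1") (searching "señor" for "ñ") returns byte offset 2, not character index 2 by coincidence of this input only.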

Since it's obvious that this is an unresolved issue, we should
avoid the issue at this juncture in the IAX2 spec and simply specify
that ASCII is the character format.  If at some point in the future
these arguments are resolved, then at that time a revision may be made
to the IAX2 specification allowing UTF-8 or another character set.
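
A minimal sketch of what "ASCII is the character format" could mean for a receiver, assuming the text arrives as a length-delimited byte buffer (IAX2 information elements carry an explicit length byte rather than a NUL terminator). The function name iax_text_is_ascii is hypothetical and is not part of any Asterisk API. It only checks that every byte is 7-bit; whether control characters should also be rejected is left open here.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if all len bytes of a text payload
   are 7-bit ASCII, 0 otherwise. An empty payload counts as ASCII. */
int iax_text_is_ascii(const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* A set high bit cannot be ASCII; in UTF-8 text it marks a
           lead or continuation byte of a multi-byte character. */
        if (data[i] > 0x7F)
            return 0;
    }
    return 1;
}
```

Since every ASCII string is also a valid UTF-8 string, a later revision that widens the format to UTF-8 would not invalidate text produced under an ASCII-only rule.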

-- 
Tilghman