[Asterisk-Dev] IAX spec: Text formats and character sets
Michael Giagnocavo
mgg-digium at atrevido.net
Fri Apr 29 19:43:46 MST 2005
>Michael Giagnocavo wrote:
>> Hmm, you're right. That doesn't look bad at all.
>>
>> But... what about for comparisons and other Unicode operations? Do the
>> libraries available support some UTF-8 version of strcmp, strchr,
>> strcasecmp, etc.?
>>
>
>Some of them are easy (strcmp, for example). Most of them are harder,
>because they either need to know character boundaries, or need case
>mappings (strcasecmp, for example). Any function that searches for a
>'char' in a string also won't work if the character being searched for
>is a multi-byte one.
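Not even strcmp works reliably, because Unicode lets you represent the same
character with different code point sequences that are still considered
equivalent. Say, a Latin o with an accent mark: it can be a single
precomposed code point, or a plain o followed by a combining accent. Using
wide char internally helps with the character-boundary and single-character
search problems (equivalent sequences still need normalizing before they
compare equal), and is most likely faster, depending on the data.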
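
To make the strcmp point concrete, here is a minimal sketch in C. The names
are made up for this example and are not from Asterisk or the IAX spec. Both
strings are valid UTF-8 for an o with an acute accent, yet a byte-wise
comparison reports them as different:

```c
#include <string.h>

/* U+00F3 LATIN SMALL LETTER O WITH ACUTE (precomposed), encoded as UTF-8. */
static const char O_ACUTE_PRECOMPOSED[] = "\xC3\xB3";

/* 'o' followed by U+0301 COMBINING ACUTE ACCENT, encoded as UTF-8. */
static const char O_ACUTE_DECOMPOSED[] = "o\xCC\x81";

/* Hypothetical helper: returns 1 if the two strings are byte-for-byte
 * identical, 0 otherwise. This is all strcmp can tell us; it knows
 * nothing about Unicode canonical equivalence. */
static int bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

bytes_equal(O_ACUTE_PRECOMPOSED, O_ACUTE_DECOMPOSED) returns 0 even though
both render as the same character. Getting them to compare equal would
require normalizing both strings to the same form first, which the C
standard library does not do.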
>I think it's safe to document that the on-wire format is UTF-8, but that
>the current implementations only support the single-byte subset of
>UTF-8. Any implementation is free to be extended to fully support the
>entire UTF-8 character space, providing suitable libraries can be found
>(or written).
Shouldn't it say that implementations SHOULD (or preferably MUST) support
UTF-8?
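
For reference, telling whether a string stays inside the single-byte subset
mentioned above is cheap. A rough sketch in C, assuming NUL-terminated input;
is_single_byte_utf8 is a hypothetical name, not an existing Asterisk or IAX
function:

```c
/* Hypothetical helper: returns 1 if every byte of s lies in the
 * single-byte subset of UTF-8 (0x00-0x7F, i.e. plain ASCII), and 0 as
 * soon as a byte belonging to a multi-byte sequence is seen. */
static int is_single_byte_utf8(const char *s)
{
    /* Use unsigned char so bytes >= 0x80 are not seen as negative. */
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0;
    }
    return 1;
}
```

An implementation that only handles the single-byte subset could use a check
like this to detect, rather than silently mangle, text it cannot process.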
-Michael