[Asterisk-Dev] IAX spec: Text formats and character sets

Michael Giagnocavo mgg-digium at atrevido.net
Fri Apr 29 19:43:46 MST 2005


>Michael Giagnocavo wrote:
>> Hmm, you're right. That doesn't look bad at all.
>> 
>> But... what about for comparisons and other Unicode operations? Do the
>> libraries available support some UTF-8 version of strcmp, strchr,
>> strcasecmp, etc.?
>>
>
>Some of them are easy (strcmp, for example). Most of them are harder, 
>because they either need to know character boundaries, or need case 
>mappings (strcasecmp, for example). Any function that searches for a 
>'char' in a string also won't work if the character being searched for 
>is a multi-byte one.

Not even strcmp works, because Unicode lets you represent the same character
with different code point sequences that are still considered equivalent.
Say, a Latin o with an accent mark, which can be a single precomposed code
point or a plain o followed by a combining accent. Using wide chars internally
makes these issues easier to handle, and is most likely faster, depending on
the data.

>I think it's safe to document that the on-wire format is UTF-8, but that 
>the current implementations only support the single-byte subset of 
>UTF-8. Any implementation is free to be extended to fully support the 
>entire UTF-8 character space, providing suitable libraries can be found 
>(or written).

Shouldn't we say that implementations SHOULD (or preferably MUST) support
full UTF-8?

-Michael




