[Asterisk-Dev] IAX spec: Text formats and character sets

Sat Apr 30 03:07:16 MST 2005

Michael Giagnocavo wrote:

>>Michael Giagnocavo wrote:
>>
>>>Hmm, you're right. That doesn't look bad at all.
>>>
>>>But... what about for comparisons and other Unicode operations? Do the
>>>libraries available support some UTF-8 version of strcmp, strchr,
>>>strcasecmp, etc.?
>>>
>>Some of them are easy (strcmp, for example). Most of them are harder, 
>>because they either need to know character boundaries, or need case 
>>mappings (strcasecmp, for example). Any function that searches for a 
>>'char' in a string also won't work if the character being searched for 
>>is a multi-byte one.
>
>Not even strcmp works, because Unicode lets you represent the same character
>with different code point sequences that are still considered equal. Say, a
>Latin o with an accent mark, which can be one precomposed code point or a
>plain o followed by a combining accent. Using wide chars internally solves
>these issues, and is most likely faster, depending on the data.
>
Too right. Look at IBM's internationalisation classes for Unicode. It 
takes megabytes of code to compare two strings.
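
To make the two problems above concrete, here is a minimal sketch in plain C. It assumes the strings are valid UTF-8; the helper names bytes_equal and find_utf8_char are made up for illustration and are not part of any library. It only shows where the byte-oriented functions stop being enough. It does not do normalisation or case mapping, which is the part that needs the big tables.

```c
#include <string.h>

/* "o with acute" as the single precomposed code point U+00F3, in UTF-8. */
static const char precomposed[] = "\xC3\xB3";
/* The same character as 'o' followed by U+0301 COMBINING ACUTE ACCENT. */
static const char decomposed[] = "o\xCC\x81";

/* Hypothetical helper: nonzero when the two strings are byte-for-byte
 * identical. This is all strcmp() can tell you; it knows nothing about
 * canonically equivalent sequences. */
static int bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: find one UTF-8 encoded character, passed as a
 * NUL-terminated byte sequence, inside a UTF-8 string. strchr() cannot do
 * this for multi-byte characters because it searches for a single char;
 * strstr() can, since a complete UTF-8 sequence never matches starting in
 * the middle of another character in valid UTF-8 text. */
static const char *find_utf8_char(const char *haystack, const char *ch)
{
    return strstr(haystack, ch);
}
```

With these, bytes_equal(precomposed, decomposed) returns 0 even though both strings display as the same accented o, and find_utf8_char("caf\xC3\xA9 au lait", "\xC3\xA9") returns a pointer to byte offset 3, where the two-byte e with acute starts.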

Regards,
Steve