[Asterisk-Dev] IAX spec: Text formats and character sets

Steve Underwood steveu at coppice.org
Sat Apr 30 03:08:53 MST 2005


Tilghman Lesher wrote:

>On Friday 29 April 2005 21:43, Michael Giagnocavo wrote:
>  
>
>>>Michael Giagnocavo wrote:
>>>      
>>>
>>>>Hmm, you're right. That doesn't look bad at all.
>>>>
>>>>But... what about for comparisons and other Unicode operations?
>>>>Do the libraries available support some UTF-8 version of strcmp,
>>>>strchr, strcasecmp, etc.?
>>>>        
>>>>
>>>Some of them are easy (strcmp, for example). Most of them are
>>>harder, because they either need to know character boundaries, or
>>>need case mappings (strcasecmp, for example). Any function that
>>>searches for a 'char' in a string also won't work if the
>>>character being searched for is a multi-byte one.
>>>      
>>>
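To make the strcmp and strchr point concrete, here is a minimal sketch. It is in Python only for brevity; the byte-level behaviour is the same in C. It assumes strings are held as raw UTF-8 bytes, and the helper names (utf8_find_char, bytewise_sorted) are hypothetical, not part of any existing library. A byte-wise comparison, which is what strcmp performs, orders UTF-8 strings the same way as a code point comparison. A search for a multi-byte character has to be a substring search (strstr-style) rather than a single-byte search (strchr-style).

```python
def utf8_find_char(haystack: bytes, ch: str) -> int:
    """Return the byte offset of character `ch` in UTF-8 `haystack`, or -1.

    The character is encoded to its 1 to 4 byte UTF-8 sequence and located
    with a substring search, which is what strstr() would do in C.
    """
    needle = ch.encode("utf-8")
    return haystack.find(needle)


def bytewise_sorted(words):
    """Sort strings by their UTF-8 bytes, as a strcmp-based sort would."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


# "naive cafe" with i-diaeresis (U+00EF) and e-acute (U+00E9).
text = "na\u00efve caf\u00e9".encode("utf-8")

# Substring search finds the whole multi-byte character.
offset_of_e_acute = utf8_find_char(text, "\u00e9")

# A single-byte search can only find a lead byte. 0xC3 starts both
# U+00EF and U+00E9, so it cannot tell the two characters apart.
first_lead_byte = text.find(b"\xc3")
```

The offsets returned are byte offsets, not character counts, so anything that needs character positions still has to walk the string.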
>>Not even strcmp works, because Unicode can represent the same
>>character with different code point sequences that are still
>>considered equivalent. Say, a Latin o with an accent mark, which
>>can be one precomposed code point or an o followed by a combining
>>accent. Using wide char internally solves these issues, and is
>>most likely faster, depending on the data.
>>
>>    
>>
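The accented o case can be sketched as follows, again in Python for brevity. This assumes canonical equivalence is decided by normalizing both strings to NFC before comparing; the function name canonical_equal is hypothetical. unicodedata.normalize is the standard library call, and a C implementation would need a comparable library routine.

```python
import unicodedata


def canonical_equal(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 byte strings after NFC normalization."""
    na = unicodedata.normalize("NFC", a.decode("utf-8"))
    nb = unicodedata.normalize("NFC", b.decode("utf-8"))
    return na == nb


# The same visible character, encoded two ways.
precomposed = "\u00f3".encode("utf-8")   # U+00F3, bytes C3 B3
decomposed = "o\u0301".encode("utf-8")   # U+006F U+0301, bytes 6F CC 81

bytes_match = precomposed == decomposed                  # strcmp-style comparison
normalized_match = canonical_equal(precomposed, decomposed)
```

Here the byte-wise comparison reports a mismatch while the normalized comparison reports a match. The two forms also differ when held as code point (wide character) strings, so the normalization step seems to be needed whatever the internal representation is.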
>>>I think it's safe to document that the on-wire format is UTF-8,
>>>but that the current implementations only support the single-byte
>>>subset of UTF-8. Any implementation is free to be extended to
>>>fully support the entire UTF-8 character space, providing
>>>suitable libraries can be found (or written).
>>>      
>>>
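As a rough illustration of "the single-byte subset of UTF-8": every byte below 0x80 is a complete UTF-8 character, so an implementation limited to that subset only has to reject input containing a byte with the high bit set. The sketch below is Python, and both function names are hypothetical. Neither is taken from the IAX2 draft or from Asterisk.

```python
def is_single_byte_utf8(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. plain ASCII.

    Such data is valid UTF-8 in which each character occupies one byte.
    """
    return all(b < 0x80 for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as well-formed UTF-8 (full character space)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A receiver restricted to the subset would accept only data passing the first check, while a fully extended implementation would use the second. Anything that passes the first check also passes the second, so existing single-byte senders stay valid if receivers are extended later.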
>>Shouldn't it say that implementations SHOULD (or preferably MUST)
>>support UTF-8?
>>    
>>
>
>Since it's obvious that this is an unresolved issue, we should
>avoid the issue at this juncture in the IAX2 spec and simply specify
>that ASCII is the character format.  If at some point in the future
>these arguments are resolved, then at that time a revision may be made
>to the IAX2 specification allowing UTF-8 or another character set.
>
>  
>
That is the worst option of all. If you don't pin it down solidly now,
people will internationalise in 1000 different incompatible ways.

Regards,
Steve



