[Asterisk-Dev] IAX spec: Text formats and character sets

Michael Giagnocavo mgg-digium at atrevido.net
Fri Apr 29 08:25:14 MST 2005


>> Well, it is easy to implement our own strncpy_utf8() that copies only up
>> to and including the last utf-8 character not going over the maximum
>> specified byte length. Then we could also fix it to actually
>> zero-terminate the copy (strncpy() doesn't always zero-terminate the
>> destination as I am _sure_ everyone remembers :-).
>
>I think 'easy' is an overstatement here. Any function that does this 
>needs to understand the _entire_ UTF-8 space to know which characters 
>are multibyte, and how many bytes they take up. This is not trivial, 
>although it's also not very complicated... just some tables and keeping 
>track of where you are so you can backtrack if needed.
>
>The bigger issue is the performance hit this function will cause... if 
>we do it at all, it will have to be compile-time selectable as to 
>whether it uses raw strncpy() or utf8strncpy().
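
For reference, here is a minimal sketch of the truncating copy discussed in the quoted text. It is an illustration, not code from Asterisk. The name strncpy_utf8() comes from the quote, but the signature and return value are assumptions. It also assumes the source is already valid, NUL-terminated UTF-8. Under that assumption no lookup tables are needed, because every continuation byte has the bit pattern 10xxxxxx, so the copy only has to back up past continuation bytes at the cut point. Validating untrusted input would take more work than this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (signature is an assumption): copy src into dst,
 * whose capacity is dst_size bytes including the terminator, without
 * splitting a multibyte UTF-8 sequence. Assumes src is valid,
 * NUL-terminated UTF-8. Always terminates dst when dst_size > 0.
 * Returns the number of bytes copied, not counting the terminator. */
size_t strncpy_utf8(char *dst, const char *src, size_t dst_size)
{
    size_t len;

    if (dst_size == 0)
        return 0;

    len = strlen(src);
    if (len > dst_size - 1) {
        len = dst_size - 1;
        /* src[len] is the first byte that does not fit. If it is a
         * continuation byte (10xxxxxx), the cut falls inside a character,
         * so back up to that character's lead byte and drop it whole. */
        while (len > 0 && ((unsigned char)src[len] & 0xC0) == 0x80)
            len--;
    }

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With a 3-byte buffer and the source "a", U+00E9 (two bytes), "b", this copies only "a", since the two-byte character would not fit alongside the terminator. The extra cost over strncpy() is one strlen() plus at most three backtracking steps.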

Well, what if you use wide chars? UTF-8 is great for a common-denominator,
on-the-wire format, but it's less than ideal for manipulation. With wide
chars you can use wcsncpy and the rest of the wc* functions.
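
A rough sketch of what the wide-char version might look like is below. The helper name wcs_copy_bounded() is made up for illustration. Note that wcsncpy() has the same termination behaviour as strncpy(), so the wrapper still has to terminate the destination itself. Truncation can only avoid splitting a character where wchar_t holds a full code point (32 bits on most Unix systems). With a 16-bit wchar_t, a surrogate pair can still be cut in half. Converting from the on-the-wire UTF-8 is a separate step and is not shown here.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: bounded wide-character copy that always terminates.
 * dst_len is the capacity of dst in wchar_t elements, not bytes.
 * Returns the number of wide characters stored, not counting the
 * terminator. */
size_t wcs_copy_bounded(wchar_t *dst, const wchar_t *src, size_t dst_len)
{
    if (dst_len == 0)
        return 0;

    /* wcsncpy() pads with L'\0' when src is short, but writes no
     * terminator when src has dst_len - 1 or more characters, so the
     * last element is set explicitly. */
    wcsncpy(dst, src, dst_len - 1);
    dst[dst_len - 1] = L'\0';
    return wcslen(dst);
}
```

Here the length limit counts characters rather than bytes, so a 4-element buffer holds three characters plus the terminator regardless of how many bytes each one takes in UTF-8.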

-Michael




