[Asterisk-Dev] IAX spec: Text formats and character sets
Michael Giagnocavo
mgg-digium at atrevido.net
Fri Apr 29 08:25:14 MST 2005
>> Well, it is easy to implement our own strncpy_utf8() that copies only up
>> to and including the last utf-8 character not going over the maximum
>> specified byte length. Then we could also fix it to actually
>> zero-terminate the copy (strncpy() doesn't always zero-terminate the
>> destination as I am _sure_ everyone remembers :-).
>
>I think 'easy' is an overstatement here. Any function that does this
>needs to understand the _entire_ UTF-8 space to know which characters
>are multibyte, and how many bytes they take up. This is not trivial,
>although it's also not very complicated... just some tables and keeping
>track of where you are so you can backtrack if needed.
>
>The bigger issue is the performance hit this function will cause... if
>we do it at all, it will have to be compile-time selectable as to
>whether it uses raw strncpy() or utf8strncpy().
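
For reference, here is a minimal sketch of the truncating copy discussed above. It is only an illustration, not anything that exists in Asterisk: the name strncpy_utf8() comes from the quoted message, while the signature and return value are my own assumptions. It assumes the source is already valid UTF-8, and it relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so the cut point can be found by backing up over them.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (signature is an assumption, not an existing API).
 * Copies src into dst, using at most dstsize bytes including the
 * terminating NUL, and never splits a multibyte UTF-8 sequence.
 * Assumes src is valid UTF-8. Returns the number of bytes copied,
 * not counting the NUL.
 */
size_t strncpy_utf8(char *dst, const char *src, size_t dstsize)
{
    size_t n;

    if (dstsize == 0)
        return 0;               /* no room even for the terminator */

    n = strlen(src);
    if (n >= dstsize) {
        n = dstsize - 1;        /* leave one byte for the NUL */
        /*
         * If the first byte we would drop is a continuation byte
         * (10xxxxxx), the cut falls inside a character. Back up
         * until the cut sits on a lead byte or an ASCII byte.
         */
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }

    memcpy(dst, src, n);
    dst[n] = '\0';              /* always terminated, unlike strncpy() */
    return n;
}
```

With the six-byte string "h\xC3\xA9llo" and a three-byte destination, this copies only "h", because a plain two-byte cut would leave a dangling 0xC3 lead byte. Note that the sketch calls strlen() on the whole source, so it does not address the performance concern raised above.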
Well, what if you use wide chars? UTF-8 is great as a common-denominator,
on-the-wire format, but it's less than ideal for manipulation. With wide
chars you can use wcsncpy() and the rest of the wc* functions.
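
A rough sketch of what I mean, assuming the text has already been converted from UTF-8 to wchar_t at the edge. The helper name wcs_copy_bounded() is made up for illustration. Lengths here are counted in wchar_t elements rather than bytes, and since wcsncpy() has the same no-terminator behaviour as strncpy(), the wrapper terminates explicitly.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (name and signature are assumptions).
 * Copies src into dst, using at most dstlen wchar_t elements
 * including the terminating L'\0'. Returns the number of wide
 * characters in dst after the copy, not counting the terminator.
 */
size_t wcs_copy_bounded(wchar_t *dst, const wchar_t *src, size_t dstlen)
{
    if (dstlen == 0)
        return 0;                   /* no room even for the terminator */

    /* wcsncpy() pads with L'\0' if src is short, but does not
     * terminate if src is too long, so terminate by hand. */
    wcsncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = L'\0';

    return wcslen(dst);
}
```

Truncating at an element boundary is only safe where one wchar_t holds one full code point. On platforms where wchar_t is 16 bits, characters outside the Basic Multilingual Plane take two elements, so the same splitting problem can come back there.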
-Michael