[Asterisk-Dev] IAX spec: Text formats and character sets

Kevin P. Fleming kpfleming at digium.com
Fri Apr 29 07:26:48 MST 2005


Kristian Nielsen wrote:

> Well, it is easy to implement our own strncpy_utf8() that copies only up
> to and including the last utf-8 character not going over the maximum
> specified byte length. Then we could also fix it to actually
> zero-terminate the copy (strncpy() doesn't always zero-terminate the
> destination as I am _sure_ everyone remembers :-).

I think 'easy' is an overstatement here. Any function that does this 
needs to understand the _entire_ UTF-8 space to know which characters 
are multibyte, and how many bytes they take up. This is not trivial, 
although it's also not very complicated... just some tables and keeping 
track of where you are so you can backtrack if needed.
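
To make the backtracking part concrete, here is a rough sketch only, not 
a proposed patch. The name utf8strncpy() and its signature are made up 
for illustration. It assumes the source string is already well-formed 
UTF-8, so it only looks at the continuation-byte bit pattern (10xxxxxx) 
to find a character boundary and does no table-driven validation at all; 
rejecting malformed input would take more work than this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an existing Asterisk function.
 * Copies at most size-1 bytes of src into dst without splitting a
 * multibyte UTF-8 sequence, and always zero-terminates dst.
 * ASSUMPTION: src is well-formed UTF-8; nothing is validated here.
 * Returns the number of bytes copied, not counting the terminator. */
size_t utf8strncpy(char *dst, const char *src, size_t size)
{
    size_t len;

    if (size == 0)
        return 0;

    len = strlen(src);
    if (len > size - 1) {
        len = size - 1;
        /* src[len] is the first byte that will be dropped. If it is a
         * continuation byte (10xxxxxx), the cut falls inside a
         * character, so back up until the cut sits on a lead byte. */
        while (len > 0 && ((unsigned char)src[len] & 0xC0) == 0x80)
            len--;
    }

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With a 3-byte buffer and the source "a" followed by a two-byte character, 
this copies only the "a" and terminates, instead of leaving half a 
character at the end of the buffer.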

The bigger issue is the performance hit this function will cause... if 
we do it at all, it will have to be compile-time selectable as to 
whether it uses raw strncpy() or utf8strncpy().
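
The compile-time switch could look something like the following. This is 
a sketch under assumptions: UTF8_SAFE_COPY is a made-up build flag and 
iax_copy_string() is a made-up macro name, neither exists in the tree 
today, and the UTF-8-aware function is assumed to be provided elsewhere.

```c
#include <stddef.h>
#include <string.h>

/* UTF8_SAFE_COPY is a hypothetical build flag (e.g. -DUTF8_SAFE_COPY);
 * iax_copy_string() is a hypothetical macro name used for illustration. */
#ifdef UTF8_SAFE_COPY
/* Assumed to be implemented elsewhere; truncates on a character
 * boundary and always zero-terminates. */
size_t utf8strncpy(char *dst, const char *src, size_t size);
#define iax_copy_string(dst, src, size) \
    ((void) utf8strncpy((dst), (src), (size)))
#else
/* Raw strncpy(): fast, but may split a multibyte character and does
 * not zero-terminate when src is size bytes or longer. */
#define iax_copy_string(dst, src, size) \
    ((void) strncpy((dst), (src), (size)))
#endif
```

Callers would then use iax_copy_string() everywhere, and only builds that 
ask for the UTF-8-aware copy would pay for it.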


