[asterisk-dev] Unicode in Text frames - how to fix?

Gunnar Hellström gunnar.hellstrom at omnitor.se
Mon Apr 30 03:08:48 MST 2007


The text frames could be defined to contain UTF-8. That would match your
proposal no. 2, to encode it in some way, since UTF-8 is a transformation
format (an encoding) of Unicode.

Strictly speaking, UTF-8 is not completely free from null bytes: U+0000 is
encoded as the single byte 0x00. No other character transforms to a 0x00
byte in UTF-8, so if we can say that U+0000 shall not be used within text,
we are safe.
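
The claim about null bytes can be checked by brute force. The following is
only a sketch in Python, not Asterisk code; the helper name
utf8_has_null_byte is made up for this illustration.

```python
def utf8_has_null_byte(text: str) -> bool:
    """Return True if the UTF-8 encoding of text contains a 0x00 byte."""
    return b"\x00" in text.encode("utf-8")


# Walk every code point except U+0000. Surrogates (U+D800..U+DFFF) are
# skipped because they are not valid in UTF-8 and cannot be encoded.
offenders = [
    cp
    for cp in range(1, 0x110000)
    if not (0xD800 <= cp <= 0xDFFF) and utf8_has_null_byte(chr(cp))
]

# offenders stays empty: only U+0000 itself produces a 0x00 byte.
```

So a '\0' terminator remains unambiguous for UTF-8 text as long as U+0000
is excluded from the text itself.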

UTF-8 would also suit the T.140 use of text frames, because T.140 is defined
to always be UTF-8 coded.

Gunnar
-------------------------------------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom at omnitor.se
----------------------------------------------------------------------
Subject: Re: [asterisk-dev] Unicode in Text frames - how to fix?


On 30 Apr 2007, at 11:28, Tim Panton wrote:

> Asterisk's handling of text frames does not support Unicode.
>
> We discovered this by accident: our Java IAX stack sends IAX text frames
> in Unicode (ASCII is deprecated in Java) without a terminating '\0' byte.
> The IAX draft RFC (link) says that text frames should be in Unicode.
> Asterisk, however, requires (but doesn't test for) a '\0' byte as the
> traditional 'C' end-of-string marker, and determines the length of the
> text string with strlen(data).
>
> Although we found it in the case of IAX text frames it looks like
> this is a general problem.
>
> At first glance it looks easy to fix: just add a length attribute to the
> text frame. However, this would change the channel API, so it isn't to be
> done lightly.
>
> Other options would be:
> 	1) change the IAX RFC to state that text frames are null-terminated
> 	   ASCII and reject any packets that aren't (i.e. drop Unicode).
> 	2) carry the Unicode by encoding it in some way (like in HTML) and
> 	   mandate this.
> 	3) ??? ideas ????
>
As Mark said, the IAX draft is the specification. Now it's up to the
developer community to make sure that the Asterisk implementation supports
the draft, which it currently does not.
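
To make the strlen(data) problem concrete, here is a small Python sketch
that imitates how a C-style length scan treats a text payload. It is an
illustration only, not Asterisk code: c_strlen is a made-up helper, and
the choice of UTF-16 as a counter-example is an assumption about what
"Unicode" from a Java stack might look like on the wire.

```python
def c_strlen(buf: bytes) -> int:
    """Length up to the first 0x00 byte, as C strlen() would report it.

    Raises ValueError when the buffer holds no 0x00 at all; real C code
    would instead keep reading past the end of the buffer.
    """
    return buf.index(b"\x00")


text = "Hellstr\u00f6m"  # 9 characters, one of them non-ASCII

# UTF-8 payload with the '\0' terminator that Asterisk expects.
utf8_frame = text.encode("utf-8") + b"\x00"

# UTF-16 (big-endian) payload: every ASCII character carries a 0x00 byte.
utf16_frame = text.encode("utf-16-be") + b"\x00\x00"

utf8_len = c_strlen(utf8_frame)    # covers the whole UTF-8 payload
utf16_len = c_strlen(utf16_frame)  # stops at the very first byte
```

With UTF-8 the scan sees the complete payload (10 bytes here), while with
UTF-16 it stops immediately. A payload sent without any terminator has no
defined length under this scheme at all.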

We really need to take a deeper look into character sets, both for text
frames and for Caller ID names. SIP Caller ID names (display names) are also
UTF-8, like the IAX protocol. The ZAP Caller ID names are not, so we will
need transcoding between these character sets.
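
A rough idea of what such transcoding could look like, sketched in Python.
Everything here is an assumption for illustration: the function name
transcode_caller_id is invented, and the target character set for ZAP
Caller ID names is not stated in this thread, so ASCII and ISO-8859-1 are
used only as stand-ins.

```python
def transcode_caller_id(name_utf8: bytes, target: str = "ascii") -> bytes:
    """Convert a UTF-8 Caller ID name to a (hypothetical) target charset.

    Malformed UTF-8 input and characters the target cannot represent are
    replaced with '?' rather than causing the call setup to fail.
    """
    text = name_utf8.decode("utf-8", errors="replace")
    return text.encode(target, errors="replace")


sip_name = "Gunnar Hellstr\u00f6m".encode("utf-8")

as_ascii = transcode_caller_id(sip_name)              # b"Gunnar Hellstr?m"
as_latin1 = transcode_caller_id(sip_name, "latin-1")  # b"Gunnar Hellstr\xf6m"
```

Replacing unrepresentable characters is lossy; whether that is acceptable,
or whether transliteration would be better, is an open design question.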

/O
