[asterisk-bugs] [Asterisk 0016299]: [patch] pedantic sip checking needed to generate valid messages (but broken)

Wed Dec 23 15:43:29 CST 2009

A NOTE has been added to this issue. 
====================================================================== 
https://issues.asterisk.org/view.php?id=16299 
====================================================================== 
Reported By:                wdoekes
Assigned To:                dvossel
====================================================================== 
Project:                    Asterisk
Issue ID:                   16299
Category:                   Channels/chan_sip/General
Reproducibility:            always
Severity:                   trivial
Priority:                   normal
Status:                     acknowledged
Target Version:             1.6.1.13
Asterisk Version:           SVN 
JIRA:                       SWP-451 
Regression:                 No 
Reviewboard Link:            
SVN Branch (only for SVN checkouts, not tarball releases):  trunk 
SVN Revision (number only!):  
Request Review:              
====================================================================== 
Date Submitted:             2009-11-21 15:03 CST
Last Modified:              2009-12-23 15:43 CST
====================================================================== 
Summary:                    [patch] pedantic sip checking needed to generate
valid messages (but broken)
Description: 
In function 'initreqprep' in channels/chan_sip.c, the following code can be
found:

if (sip_cfg.pedanticsipchecking) {
  ast_uri_encode(n, tmp_n, sizeof(tmp_n), 0);
  n = tmp_n;
  ast_uri_encode(l, tmp_l, sizeof(tmp_l), 0);
  l = tmp_l;
}
<...snip...>
snprintf(from, sizeof(from), "\"%s\" <sip:%s@%s>;tag=%s", n, l, d,
p->tag);

The function ast_uri_encode encodes chars < 32 and > 127 as %HH hex
escapes (perhaps one should replace that test with
((signed char)*ptr < 32) ;-).
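
The signed-char shortcut mentioned here can be checked with a small sketch.
The helper names below are hypothetical and not taken from chan_sip.c. The
sketch assumes 8-bit chars on a two's complement platform, where converting
0x80..0xFF to signed char yields a negative value (strictly speaking, that
conversion is implementation-defined in C).

```c
/* Hypothetical helpers for illustration only, not Asterisk code. */

/* The rule as described: escape bytes below 32 and above 127. */
static int needs_escape_range(unsigned char c)
{
    return c < 32 || c > 127;
}

/* The shortcut: bytes with the eighth bit set become negative when
 * viewed as signed char (assumption: two's complement, 8-bit char),
 * so a single comparison covers both ranges. */
static int needs_escape_signed(unsigned char c)
{
    return (signed char)c < 32;
}
```

On such a platform both tests should agree for all 256 byte values. Note
that neither of them flags '%' or 0x7F, which is the first problem listed
in this report.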

A couple of problems (all minor):
- ast_uri_encode forgets to escape % and 0x7F (RFC2396 2.4.2 and 2.4.3)
- ast_uri_encode does not escape <, >, @ and some other characters that
'l' would've liked to be escaped
- 'n' is not supposed to be hex-escaped (RFC4475 3.1.1.5 writes """The
display name portion of the To and From header fields is "%Z%45". Note that
this is not the same as %ZE.""")
- 'n' does however like the double-quote to be escaped, by a backslash
- ast_uri_decode is called on entire messages, not on the individual
parts after the message has been broken up
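
To illustrate why the order of decoding and splitting matters, here is a
minimal sketch. It does not use ast_uri_decode itself: pct_decode,
extract_user and the two wrapper functions are hypothetical stand-ins, and
the URI format is simplified to "sip:user@host".

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a percent-decoder: decodes valid %HH
 * sequences in place and leaves anything else untouched. */
static void pct_decode(char *s)
{
    char *out = s;
    while (*s) {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
                && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Copies the raw user part of "sip:user@host" into dst.
 * Returns 0 on success, -1 on a parse error or a too-small buffer. */
static int extract_user(const char *uri, char *dst, size_t dstlen)
{
    const char *start, *at;
    size_t len;

    if (strncmp(uri, "sip:", 4) != 0)
        return -1;
    start = uri + 4;
    at = strchr(start, '@');
    if (!at)
        return -1;
    len = (size_t)(at - start);
    if (len >= dstlen)
        return -1;
    memcpy(dst, start, len);
    dst[len] = '\0';
    return 0;
}

/* Decode the whole URI first, then split: an escaped '@' becomes a
 * real '@' and is mistaken for the user/host separator. */
static int user_decode_first(const char *uri, char *dst, size_t dstlen)
{
    char copy[256];

    if (strlen(uri) >= sizeof(copy))
        return -1;
    strcpy(copy, uri);
    pct_decode(copy);
    return extract_user(copy, dst, dstlen);
}

/* Split first, then decode only the extracted part. */
static int user_split_first(const char *uri, char *dst, size_t dstlen)
{
    if (extract_user(uri, dst, dstlen) != 0)
        return -1;
    pct_decode(dst);
    return 0;
}
```

For the input "sip:a%40b@example.com", user_decode_first yields the user
"a", while user_split_first yields "a@b".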

Browsing through chan_sip.c, I see pedanticsipchecking used in these
cases:
- allow blanks between the header key and the colon
- allow multiline sip headers
- compare the from-tag/to-tag/branches as well instead of only the
call-id
- check that a packet really is for us (handle_incoming)
- encode/decode reserved characters

In my humble opinion, I don't think creating valid output (correctly
encoding illegal characters) should be enabled only by a flag that is
reported as being 'slow'. And, not as relevant to me in this case, but
decoding valid hex-escapes from peers does not sound like too much to ask,
either.

What to do?
- I can easily write a patch that fixes my minor issue: always -- not
dependent on the pedanticsipchecking -- run a s/"/\\"/g (instead of
ast_uri_encode) on the name part in the From.
- I can also easily fix ast_uri_encode to escape %, 0x7f and the others as
mentioned in RFC2396 2.4.3.
- Fixing all ast_uri_decode calls to operate only after the data has been
broken up is a bit more tedious, so I can't promise I'll do that.
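
The first two options could look roughly like the sketch below. This is not
the actual patch: quote_escape and pct_encode are hypothetical names, and
two choices go beyond what is proposed above. quote_escape also escapes the
backslash itself (otherwise a trailing backslash would escape the closing
quote), and pct_encode conservatively escapes every byte outside the
RFC2396 "unreserved" set, which covers %, 0x7F, <, > and @ but escapes more
than strictly required. ASCII is assumed.

```c
#include <stddef.h>
#include <string.h>

/* Backslash-escape '"' and '\' so src can be placed inside a quoted
 * display name. Returns 0 on success, -1 if dst is too small. */
static int quote_escape(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;

    for (; *src; src++) {
        if (*src == '"' || *src == '\\') {
            if (o + 2 >= dstlen)   /* two bytes plus the final NUL */
                return -1;
            dst[o++] = '\\';
        } else if (o + 1 >= dstlen) {
            return -1;
        }
        dst[o++] = *src;
    }
    if (o >= dstlen)
        return -1;
    dst[o] = '\0';
    return 0;
}

/* RFC2396 "unreserved": alphanumerics and the marks -_.!~*'() */
static int is_unreserved(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9'))
        return 1;
    return c != '\0' && strchr("-_.!~*'()", c) != NULL;
}

/* Percent-encode every byte that is not unreserved.
 * Returns 0 on success, -1 if dst is too small. */
static int pct_encode(const char *src, char *dst, size_t dstlen)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (is_unreserved(c)) {
            if (o + 1 >= dstlen)
                return -1;
            dst[o++] = (char)c;
        } else {
            if (o + 3 >= dstlen)   /* "%HH" plus the final NUL */
                return -1;
            dst[o++] = '%';
            dst[o++] = hex[c >> 4];
            dst[o++] = hex[c & 0x0F];
        }
    }
    if (o >= dstlen)
        return -1;
    dst[o] = '\0';
    return 0;
}
```

In initreqprep terms, quote_escape would be applied to 'n' regardless of
pedanticsipchecking and pct_encode to 'l', before the snprintf that builds
the From header.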

Regards,
Walter Doekes
OSSO B.V.
====================================================================== 

---------------------------------------------------------------------- 
 (0115752) wdoekes (reporter) - 2009-12-23 15:43
 https://issues.asterisk.org/view.php?id=16299#c115752 
---------------------------------------------------------------------- 
(Pretty much all of my arguments below depend on my assumptions that
asterisk is encoding-agnostic and does not forcefully strip the eighth bit
in the core of the application. If those assumptions do not hold true, you
can ignore the rest of this post.)

- You do realize that the beauty of UTF8 is that an application that is
encoding-agnostic can do UTF8 without any work at all? As far as I can tell
-- please correct me in this if I'm wrong -- asterisk is for the most part
encoding-agnostic (exceptions being things like mail content encoding in
app_voicemail). So having to change "ALL strings" is not necessarily the
case, unless you're referring to the increased size you *might* need. That
should be looked at, but it is a moot point, since you mention the current
usage of other multibyte encodings that also use more bytes than characters.

- That people might want an anglified name might be the case, but that's
also a moot point. They do not have two names in the current setup either.

- I did already notice that the Linksys on my desktop does not do UTF8
like it should. Yes, if the phone for some bad reason does speak latin1,
forcing utf8 in the chan_sip.c output does make it problematic to send
extended characters to the phone. But do note that the <=127 range still
works fine.

- For the record: it is not me who is requiring asterisk to know about
UTF8. I only speak SIP at the moment and as long as asterisk doesn't mangle
my bytes I can speak UTF8 without asterisk having to know about it. And
yes, even though I claim that you do not need to touch all strings, I do
realise that this requires many many hours to get done. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2009-12-23 15:43 wdoekes        Note Added: 0115752                          
======================================================================