[asterisk-bugs] [Asterisk 0016299]: [patch] pedantic sip checking needed to generate valid messages (but broken)

Wed Dec 23 14:28:09 CST 2009

A NOTE has been added to this issue. 
====================================================================== 
https://issues.asterisk.org/view.php?id=16299 
====================================================================== 
Reported By:                wdoekes
Assigned To:                dvossel
====================================================================== 
Project:                    Asterisk
Issue ID:                   16299
Category:                   Channels/chan_sip/General
Reproducibility:            always
Severity:                   trivial
Priority:                   normal
Status:                     acknowledged
Target Version:             1.6.1.13
Asterisk Version:           SVN 
JIRA:                       SWP-451 
Regression:                 No 
Reviewboard Link:            
SVN Branch (only for SVN checkouts, not tarball releases):  trunk 
SVN Revision (number only!):  
Request Review:              
====================================================================== 
Date Submitted:             2009-11-21 15:03 CST
Last Modified:              2009-12-23 14:28 CST
====================================================================== 
Summary:                    [patch] pedantic sip checking needed to generate
valid messages (but broken)
Description: 
In function 'initreqprep' in channels/chan_sip.c, the following code can be
found:

if (sip_cfg.pedanticsipchecking) {
  ast_uri_encode(n, tmp_n, sizeof(tmp_n), 0);
  n = tmp_n;
  ast_uri_encode(l, tmp_l, sizeof(tmp_l), 0);
  l = tmp_l;
}
<...snip...>
snprintf(from, sizeof(from), "\"%s\" <sip:%s@%s>;tag=%s", n, l, d,
p->tag);

The function ast_uri_encode encodes chars < 32 and > 127 -- perhaps one
should replace that with ((signed char)*ptr < 32) ;-) -- as %HH hex
escapes.
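
A minimal sketch of the cast suggested above, to show why one comparison
can replace two. The function names are hypothetical and do not exist in
Asterisk. It assumes the conversion of a byte above 127 to signed char
wraps to a negative value, which is implementation-defined in ISO C but is
what the usual two's complement compilers do.

```c
#include <assert.h>

/* Hypothetical helper: the two-comparison form (< 32 or > 127). */
int escape_two_tests(char ch)
{
    unsigned char c = (unsigned char) ch;
    return c < 32 || c > 127;
}

/* Hypothetical helper: the single-comparison form. Bytes 128..255 become
 * negative as signed char (assumed two's complement), so they also
 * compare as < 32. */
int escape_one_test(char ch)
{
    return (signed char) ch < 32;
}

/* Self-check: both forms agree for every possible byte value. */
void check_tests_agree(void)
{
    int i;
    for (i = 0; i < 256; i++) {
        char ch = (char) (unsigned char) i;
        assert(escape_two_tests(ch) == escape_one_test(ch));
    }
}
```

Note that neither form selects 0x7F, since 127 is neither < 32 nor > 127.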

A couple of problems (all minor):
- ast_uri_encode forgets to escape % and 0x7F (RFC2396 2.4.2 and 2.4.3)
- ast_uri_encode does not escape <, >, @ and some other characters that
'l' would've liked to be escaped
- 'n' is not supposed to be hex-escaped (RFC4475 3.1.1.5 writes """The
display name portion of the To and From header fields is "%Z%45". Note that
this is not the same as %ZE.""")
- 'n' does however like the double-quote to be escaped, by a backslash
- ast_uri_decode is called on entire messages, not on already broken up
parts
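
To make the first two points concrete, here is a rough sketch of a
stricter encoder for the user part. It is not the real ast_uri_encode; the
names sketch_must_escape and sketch_uri_encode are made up for
illustration. The escaped set is my reading of the "excluded" characters
in RFC2396 2.4.3 plus '@', and the hex digits are kept lower case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: returns 1 if c has to be written as %HH.
 * Covers control chars, space, 0x7F, bytes above 127, the RFC2396 2.4.3
 * delims and unwise characters, and '@'. */
int sketch_must_escape(unsigned char c)
{
    if (c < 0x21 || c >= 0x7f)
        return 1;
    return strchr("%<>#\"{}|\\^[]`@", c) != NULL;
}

/* Hypothetical: encode 'in' into 'out' (at most outlen bytes including
 * the NUL). Stops early instead of writing a partial escape when the
 * buffer is full. */
char *sketch_uri_encode(const char *in, char *out, size_t outlen)
{
    char *p = out;

    assert(out != NULL && outlen > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;
        size_t left = outlen - (size_t) (p - out);

        if (sketch_must_escape(c)) {
            if (left < 4)          /* "%HH" plus NUL */
                break;
            snprintf(p, left, "%%%02x", (unsigned int) c);
            p += 3;
        } else {
            if (left < 2)          /* one char plus NUL */
                break;
            *p++ = (char) c;
        }
    }
    *p = '\0';
    return out;
}
```

This only addresses 'l'. As noted above, 'n' should not go through a
%HH encoder at all.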

Browsing through chan_sip.c, I see pedanticsipchecking used in these
cases:
- allow blanks between the header key and the colon
- allow multiline sip headers
- compare the from-tag/to-tag/branches as well instead of only the
call-id
- check that a packet really is for us (handle_incoming)
- encode/decode reserved characters

In my humble opinion, creating valid output (correctly encoding illegal
characters) should not be enabled only by a flag that is reported as being
'slow'. And, though less relevant to me in this case, decoding valid
hex-escapes from peers does not sound like too much to ask either.

What to do?
- I can easily write a patch that fixes my minor issue: always -- not
dependent on the pedanticsipchecking -- run a s/"/\\"/g (instead of
ast_uri_encode) on the name part in the From.
- I can also easily fix ast_uri_encode to escape %, 0x7f and the others as
mentioned in RFC2396 2.4.3.
- Changing every ast_uri_decode call to run only after the data has been
broken up is a bit more tedious, so I can't promise I'll do that.
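
For the first option, the s/"/\\"/g could look roughly like the sketch
below. The function name is hypothetical and this is not the patch itself.
It also escapes the backslash, which is my addition: the quoted-string
grammar in RFC3261 does not allow a bare backslash either.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: copy 'in' to 'out' for use inside a quoted display name,
 * putting a backslash before every double-quote and backslash. Nothing
 * is hex-escaped, so "%Z%45" passes through untouched. Stops early when
 * 'out' (outlen bytes including the NUL) is full. */
char *sketch_quote_display_name(const char *in, char *out, size_t outlen)
{
    char *p = out;

    assert(out != NULL && outlen > 0);
    for (; *in != '\0'; in++) {
        size_t left = outlen - (size_t) (p - out);

        if (strchr("\"\\", *in) != NULL) {
            if (left < 3)          /* backslash, char, NUL */
                break;
            *p++ = '\\';
        } else if (left < 2) {     /* char, NUL */
            break;
        }
        *p++ = *in;
    }
    *p = '\0';
    return out;
}
```

The result would then be passed as 'n' to the existing snprintf, regardless
of the pedanticsipchecking setting.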

Regards,
Walter Doekes
OSSO B.V.
====================================================================== 

---------------------------------------------------------------------- 
 (0115750) wdoekes (reporter) - 2009-12-23 14:28
 https://issues.asterisk.org/view.php?id=16299#c115750 
---------------------------------------------------------------------- 
dvossel:

(4) If we're going to poke around in the ast_uri_encode function, I'd say
that upper case hex chars are more common and would suggest replacing the
"%%02x" with "%%02X". This might trigger breakage for users of bad code
however ;-)

pkempgen:

I'm with Nick_Lewis on the utf2ascii issue:
Normalization could be possible, but I think it will do too little good
for too much work. For Western European languages, where only a few
characters are >127, a ? here and there might be inconvenient yet readable.
For Eastern European or (worse) Asian languages, I fear the meaning could
get lost completely if all accents are dropped, and simply displaying ????
would be just as meaningful.

Nick_Lewis / oej:

My two cents on the UTF8 issue:
If you take away the problem of backward compatibility(*), the easiest way
to go is to:
- declare that all strings used in Asterisk are UTF8 and,
- ensure that all protocols that do not speak UTF8 are encoded to and
decoded from their specific encodings(**) just before they leave and enter
Asterisk, respectively.
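
As an illustration of converting at the boundary, here is a small sketch
for the "enter" direction. It assumes, purely as an example, that the
peer speaks ISO-8859-1; the function name is hypothetical and nothing
like it is claimed to exist in Asterisk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: convert an ISO-8859-1 string to UTF8 on the way in.
 * Bytes below 0x80 are copied; bytes 0x80..0xFF become two UTF8 bytes.
 * Returns the number of bytes written, not counting the NUL, and stops
 * early when 'out' (outlen bytes including the NUL) is full. */
size_t sketch_latin1_to_utf8(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    assert(out != NULL && outlen > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;

        if (c < 0x80) {
            if (outlen - n < 2)    /* one byte plus NUL */
                break;
            out[n++] = (char) c;
        } else {
            if (outlen - n < 3)    /* two bytes plus NUL */
                break;
            out[n++] = (char) (0xC0 | (c >> 6));
            out[n++] = (char) (0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

The opposite direction would need a policy for characters the target
encoding cannot represent, which is where the ? discussion comes back in.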

One alternative is using wide characters internally, but that suffers from
the same issues and is far more work to implement.

(*) and (**) are mostly a problem in homogeneous setups where everyone
speaks the same encoding but no one has told the application which encoding
that is. 

In my humble opinion, to continue this fruitfully, it would be wise to
identify where exactly breakage will occur when you switch and how this can
be mitigated:
- SIP shouldn't be an issue, you are already speaking UTF8 if you're using
the international characters.
- Other protocols (I am too unfamiliar with other protocols) might need an
explicit character set setting and users should be forced/warned *early* to
set this correctly.

Regards,
Walter 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2009-12-23 14:28 wdoekes        Note Added: 0115750                          
======================================================================