[asterisk-dev] [Code Review] Introduce function for parsing ABNF structure {name-andor-addr = name-addr / addr-spec} in sip messages

Mark Michelson mmichelson at digium.com
Fri Mar 12 09:34:07 CST 2010



> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > This is my first pass through this review request. Even though there are many areas that need improvement, I have to say that this is an excellent addition to the code, and you get bonus points from me for including unit tests along with the code changes.
> > 
> > In addition to what I marked below, please also make an effort when posting your next diff to remove as much trailing whitespace as you can from the diff so that there aren't as many red blotches on the review.
> > 
> > Thanks!
> 
> Nick Lewis wrote:
>     Thank you very much for all your comments
>     
>     Are there any formatting tools that you would recommend for whitespace, tabbing etc?

I don't have any sort of recommendation there. I use vim as my editor, and its default behavior is to indent using a tab instead of X number of spaces. doc/CODING-GUIDELINES has a sample command that you can use for the 'indent' utility that may be helpful for you.


> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > trunk/channels/sip/include/reqresp_parser.h, line 54
> > <https://reviewboard.asterisk.org/r/549/diff/1/?file=8526#file8526line54>
> >
> >     I have a suggestion for this function.
> >     
> >     Seeing the unruly number of parameters to this function, I think a better approach would be to wrap all the output parameters in a struct instead of having each individual one returned.
> >     
> >     Also, "remainder" is spelled incorrectly here.
> 
> Mark Michelson wrote:
>     In case the sheer number of parameters wasn't enough of a reason for you to make this change, there are also some other reasons, mostly having to do with the fact that all output pointers can be specified NULL. There are several cases where you can end up with less-than-desirable results.
>     
>     For instance let's take the following URI into account:
>     
>     sip:user at host:5060;param1=value1?header1=value2
>     
>     Let's look at a particular combination that has some unexpected results:
>     If you call parse_uri_full with headers == NULL and uriparams == NULL, but remainder (or residue as it's called in the function definition) non-NULL, then on returning, residue will be set to "host" and port will be set to "5060;param1=value1?header1=value2"
>     
>     If you change the function prototype to take a struct into which all the output parameters are placed, then there's no worry about the various combinations of NULL and non-NULL pointers to have to deal with.
> 
> Nick Lewis wrote:
>     I think you have described a bug with residue="host". Would my change at line 133 of the .c fix this?
>     
>     I was copying two aspects of the original parse_uri function here. The first was the separate variables and the second was the ability to pass NULL if one did not want to terminate the string at the detected uri component. The former made it a bit easier to reimplement the parse_uri function and, although I have no idea why one would want to do it, the latter is retained for backward compatibility.
>     
>     If the requirement for passing NULL **char pointers can be dropped then I will change to a struct
>

> "I think you have described a bug with residue="host". Would my change at line 133 of the .c fix this?"

No, that will not help here. On a second look, I made a small mistake in my analysis, but the results are still less than ideal. When the port is parsed, you set "uri" to point to the port. Then, if params and headers are both NULL (and by that, I mean the function parameters with those names), then there is no further parsing done. The result is that the port is not properly null-terminated, so that it is set to "5060;param1=value1?header1=value2." In addition, since we never updated the uri pointer any further, residue points to the same string as port, meaning it also is set to "5060;param1=value1?header1=value2."


> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > trunk/channels/sip/reqresp_parser.c, lines 162-205
> > <https://reviewboard.asterisk.org/r/549/diff/1/?file=8528#file8528line162>
> >
> >     It is somewhat common for certain parameters to not contain an "=" sign, particularly the "lr" parameter. If I have read this loop correctly, we will
> >     
> >     1) Not properly parse any parameter that does not contain an "="
> >     2) Not parse any further parameters if we come across one that does not contain an "="
> >     
> >     I think this loop needs to be rewritten in a manner that does not depend on an "=" being present for all parameters.
> 
> Nick Lewis wrote:
>     Thanks for identifying this issue. I had not realised that "lr" was a special case. I will try again

Yeah, "lr" is kind of a strange thing to deal with. There have been several threads on the sip-implementors mailing list that deal with ways this is used. Some UAs will simply just place "lr" in the URI, while others will do something like "lr=on". From what I have gleaned, it's the presence or absence of this header that is meaningful rather than its value. With that in mind, it may be a good idea to change the uriparams structure so that the "lr" field is an int indicating whether the parameter is present rather than a string that stores the parameter's value.


> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > trunk/channels/sip/reqresp_parser.c, lines 211-213
> > <https://reviewboard.asterisk.org/r/549/diff/1/?file=8528#file8528line211>
> >
> >     Based on sip_parse_uri_fully_test, this may actually be intended behavior, but I'll still remark on this anyway.
> >     
> >     If you have a uri like:
> >     
> >     name at host;param1=apple;transport=tcp;param2=banana
> >     
> >     Then the transport value will be stored in the uriparams struct, and "param2=banana" will be stored in residue. However, "param1=apple" is just lost in this case.
> >     
> >     If this is being done on purpose, can you add a comment somewhere, preferably to the doxygen for this function why this is being done this way?
> 
> Nick Lewis wrote:
>     Yes this is on purpose. Many message-headers contain their own parameters after the uri which need to be returned for further parsing. If the uri is in addr-spec format then the divide between uri parameters and message-header parameters can only be determined after uri parsing e.g. 
>     
>     To: user at host;param1=apple;transport=tcp;tag=1234
>     
>     hence the need for the residue. param1 however is unambiguously not a message-header parameter but an unsupported uri parameter

All right. Thank you for the explanation.


> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > trunk/channels/sip/reqresp_parser.c, line 221
> > <https://reviewboard.asterisk.org/r/549/diff/1/?file=8528#file8528line221>
> >
> >     So, I made all my comments on your other unit test first, but most of them apply here, especially with regards to
> >     
> >     1) Indentation inconsistency
> >     2) Use of an open coded linked list
> >     3) Odd structure of the while loop
> >     4) Use of a giant compound if statement instead of many small ones.
> 
> Nick Lewis wrote:
>     Can you please help me a little more to understand comments 2&3 
>     - scrub that, it is clear below
>     
>     (I think 4 is consistent with other unit tests in this file) 
>     
>

Sorry if you were trying for consistency. I'd say for now that you can just ignore number 4 then. A small-scale improvement to the file for a future date would be to make such a change to all tests in the file. For now, consistency is better.


> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > trunk/channels/sip/reqresp_parser.c, lines 918-920
> > <https://reviewboard.asterisk.org/r/549/diff/1/?file=8528#file8528line918>
> >
> >     This change makes no sense to me.
> >     
> >     Edit: Okay, after looking further in the function, I see what you are doing here, although I personally like the way it was better.
> 
> Nick Lewis wrote:
>     I will check this but I think I needed first_bracket incremented irrespective of the if statement

The old way was to have first_bracket actually point to the first bracket. Then, when we search for items after the first bracket, we would do something like 
strchr(first_bracket + 1, '>');
This, to me is more clear because first_bracket actually points to the first bracket instead of the position immediately after the first bracket.


> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > trunk/channels/sip/reqresp_parser.c, lines 932-934
> > <https://reviewboard.asterisk.org/r/549/diff/1/?file=8528#file8528line932>
> >
> >     Can you give an example where this happens?
> 
> Nick Lewis wrote:
>     I was/am looking at whether the display-name can be identified in-place rather than copied into another part of memory as done currently by get_calleridname(). In so doing I found that the trailing quote of a quoted display-name and the trailing space of a token display-name could be used for the string terminator. A common error though is omitting the trailing space in a token display name e.g. 
>     Nick Lewis<1234 at myself.com>
>     Rather than reject this it would be better if the display-name were terminated at the bracket.

Great example. Thank you!


> On 2010-03-11 14:09:12, Mark Michelson wrote:
> > trunk/channels/sip/reqresp_parser.c, lines 1217-1227
> > <https://reviewboard.asterisk.org/r/549/diff/1/?file=8528#file8528line1217>
> >
> >     It would be better if each of the conditions between ||s could be in its own if statement. That way, you can use ast_test_status_update to more specifically tell which part of the parsing failed.
> 
> Nick Lewis wrote:
>     The code was lifted from pre-existing unit tests in this file but I agree that it would be far more useful with specific test fail messages so will change

As I said in another comment, if the other unit tests in the file act this way, it's a better idea to go for consistency for now. A separate code change on a later date can then break up the giant compound if statements in ALL the unit tests instead of having to pick and choose which ones to work on.


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/549/#review1659
-----------------------------------------------------------


On 2010-03-11 03:12:54, Nick Lewis wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/549/
> -----------------------------------------------------------
> 
> (Updated 2010-03-11 03:12:54)
> 
> 
> Review request for Asterisk Developers and David Vossel.
> 
> 
> Summary
> -------
> 
> Many sip headers in many sip methods contain the ABNF structure
>  name-andor-addr = name-addr / addr-spec
> Examples include the to-header, from-header, contact-header, replyto-header, referto-header, referredby-header, and passertedid-header
> 
> At the moment chan_sip.c makes various different attempts to parse this name-andor-addr structure for each header type and for each sip method with sometimes limited degrees of success.
> 
> I here introduce a dedicated function for parsing the name-andor-addr ABNF structure that can be used irrespective of the specific method or header that contains the structure
> 
> (The function is also suited to parsing the name-addr ABNF structure without the addr-spec option. Examples in chan_sip.c include the recordroute-header, route-header, remotepartyid-header and diversion-header.)
> 
> 
> This addresses bug 16708.
>     https://issues.asterisk.org/view.php?id=16708
> 
> 
> Diffs
> -----
> 
>   trunk/channels/sip/include/reqresp_parser.h 249060 
>   trunk/channels/sip/include/sip.h 249060 
>   trunk/channels/sip/reqresp_parser.c 249060 
> 
> Diff: https://reviewboard.asterisk.org/r/549/diff
> 
> 
> Testing
> -------
> 
> Tested using the corresponding unit test function
> 
> 
> Thanks,
> 
> Nick
> 
>




More information about the asterisk-dev mailing list