[asterisk-dev] [Code Review] 3990: CDRs/Dial: Fix an assertion caused by advancing a neutral state channel straight into dial pending without going through dial

Matt Jordan reviewboard at asterisk.org
Tue Sep 16 12:21:48 CDT 2014


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3990/#review13312
-----------------------------------------------------------



/branches/12/include/asterisk/dial.h
<https://reviewboard.asterisk.org/r/3990/#comment23790>

    Ideally, none of this should be public.
    
    The fact that we have to manage a masquerade datastore during a dial operation should be as hidden as possible. With the current structure of dial, that means we have two options:
    (1) Implement the datastore fix ups in each place that performs a dial operation (Dial, FollowMe, Queue, and the dial API)
    (2) Implement the datastore fix ups in some shared location that all of those use: as you've done here, in stasis_channels.
    
    The drawbacks for the first option are obvious. The drawbacks for the second are a bit less so - the largest being a jump in the scope of what stasis_channels does. Currently it is only responsible for publishing dial messages over the Stasis message bus: with this patch, it would expand to having a channel datastore/fixup placed on it, with some obvious locking concerns as well.
    
    My inclination is to go with option #2, although I can't say I love it. Ideally this would all just be in the dial API, but until everything is converted over to using that, it isn't a viable option (and in fact, the need for this patch is much more likely to go away quicker via an early bridge than said conversion).



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23793>

    I can tell you that this structure will (someday) be re-used for a masquerade in the opposite direction as well - namely, what occurs during call pickup. That's another bug for another day (and one with less bad ramifications than what you are fixing here).
    
    I'd rename this something more friendly to that however - namely, remove the word "breakdown".



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23791>

    Generally, we prefer to place comments before the items, particuarly when they are long:
    
    struct dial_foo {
        /* Indication that a masquerade already occurred */
        int consumed;
    
     etc.
    



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23792>

    The comment is a bit ambiguous. What was triggered? What was set?
    
    If what you mean is that a masquerade already occurred, why would we not just remove the datastore when that occurs?



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23799>

    You need to track n peer_channels, as a single channel may be in a parallel dial, and someone could perform a blonde transfer.
    
    Use a list of ast_channel/char* tuples:
    
    struct dial_target {
        struct ast_channel *peer;
        char *dialstring;
        AST_LIST_ENTRY(dial_target) entry;
    }
    
    struct dial_masquerade_datastore {
        AST_LIST_HEAD(dialled_peers, dial_target);
    }
    
    This will obviously ripple through your code here.



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23795>

    Instead of setting the consume flag, remove the datastore and free it:
    
    if (!ast_channel_datastore_remove(old_chan, ds)) {
        ast_datastore_free(ds);
    }
    
    Note that it is safe to do that here, as the list traversal in channel.c assumes that this can occur.
    
    That also removes the need for de-refing the peer_chan here, as your destructor will take care of it for you.



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23796>

    Remember that some things are going to want to do a string comparison against the type name. Use a more reasonable string name.
    
    Also: CDRs are a bad name here. You aren't doing this just for CDRs - you're doing it so that Stasis preserves a sane view of the world for all of its message consumers. CDRs just happens to be the noisiest consumer.



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23797>

    I'm with Richard, do the assignment out of the if statement:
    
    datastore = ast_channel_datastore_find(...)
    if (!datastore) {
        return NULL;
    }



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23798>

    Same findings here on allocation outside of the if statement.



/branches/12/main/dial.c
<https://reviewboard.asterisk.org/r/3990/#comment23800>

    If you already have a datastore on the channel, you don't want to remove the contents - you want to add another peer_chan to the list of channels being dialled.


- Matt Jordan


On Sept. 11, 2014, 4:59 p.m., Jonathan Rose wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/3990/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2014, 4:59 p.m.)
> 
> 
> Review request for Asterisk Developers, Matt Jordan and rmudgett.
> 
> 
> Bugs: ASTERISK-24237
>     https://issues.asterisk.org/jira/browse/ASTERISK-24237
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> Reproduction: 
> pj123 calls 1601
> pj123 does a SIP blonde transfer to 1603
> 1603 answers
> FRACK occurs
> all are PJSIP endpoints.
> 
> Basically what happens is there is a second outbound dial from another pj123 channel. Before the dial is answered, the pj123 gets masqueraded out of the picture and replaced with a channel that isn't in the dial state.
> 
> This patch fixes that by advancing the incoming channel to the dial state in the channel breakdown function of a datastore on the pj123 channel. Honestly, I'm not sure this is entirely adequate, but it seems to work in all of the conditions I've tried so far and it doesn't impede normal attended transfers. I might need to try and figure out what happens if I masquerade in a channel that is already dialing though. I'm not sure if that's something we can really expect to happen under normal conditions, but that seems like something that would screw up this approach.
> 
> Note that I think this patch is the right idea, I just don't know if I need to account for the possibility that the channel that masquerades into pj123's dialing channel might not be in the neutral state.
> 
> 
> Diffs
> -----
> 
>   /branches/12/main/stasis_channels.c 422882 
>   /branches/12/main/dial.c 422882 
>   /branches/12/include/asterisk/dial.h 422882 
> 
> Diff: https://reviewboard.asterisk.org/r/3990/diff/
> 
> 
> Testing
> -------
> 
> Ran against reproduction of the above scenario. Assertion was gone and the CDR results were as follows:
> 
> "","123","1601","default",""""" <123>","PJSIP/pj123-00000000","PJSIP/1601-00000001","Dial","PJSIP/1601,,tT","2014-09-11 21:48:51","2014-09-11 21:48:53","2014-09-11 21:48:57",5,4,"ANSWERED","DOCUMENTATION","1410472131.0",""
> "","123","","default",""""" <123>","PJSIP/pj123-00000002","PJSIP/1603-00000003","Dial","PJSIP/1603","2014-09-11 21:48:55",,"2014-09-11 21:48:57",2,0,"NO ANSWER","DOCUMENTATION","1410472135.6",""
> "","1601","1603","default",""""" <1601>","PJSIP/1601-00000001","PJSIP/1603-00000003","AppDial","(Outgoing Line)","2014-09-11 21:48:57","2014-09-11 21:48:57","2014-09-11 21:49:04",6,6,"ANSWERED","DOCUMENTATION","1410472131.1",""
> 
> And dial events reported by AMI:
> http://pastebin.com/tWuwL7xa
> (note that there is a mismatch between the number of dial end and dial begin events... not sure if this is a problem, but I might be able to fix it just by updating the old chan, not sure what status code to use though).
> 
> Ran against normal attended transfer, feature attended transfers, and blind transfers with no noticeable effect.
> 
> Ran against testsuite blonde transfer tests, some attended transfer tests, some blind transfer tests, and all pjsip transfer tests (at least ones that will run on my box... a few won't due to sipp version requirements that I really need to get around to fixing eventually) for comparison purposes. All passed exhibiting the same behavior as before as far as warning messages and such go.
> 
> 
> Thanks,
> 
> Jonathan Rose
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20140916/98f9f840/attachment-0001.html>


More information about the asterisk-dev mailing list