[asterisk-bugs] [JIRA] (ASTERISK-21356) Segfault during bridge channel proxy inspection in a masquerade caused by an AMI Redirect of two channels

William luke (JIRA) noreply at issues.asterisk.org
Tue Apr 2 11:43:01 CDT 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=204863#comment-204863 ] 

William luke commented on ASTERISK-21356:
-----------------------------------------

This issue has occurred again today, so it's certainly happening a LOT more frequently than I'd initially suspected.

I won't attach another backtrace, as looking it over it's almost identical to the first.

One thing I did notice (not sure if it's of relevance) is that in each crash the channel which seemed to lead to the null pointer being dereferenced after a dual redirect (the SIP/gl-asgw... channel) had started off life in a Queue. It had then been ripped out of the Queue by means of an AMI bridge. The two channels in the bridge had been speaking away for some time before then being Dual Redirected and crashing. This *may* just be coincidence, as I suspect 90% or so of our calls which are Dual Redirected in this manner come from a Queue, but I thought I'd mention it anyway.

As this is happening with alarming frequency now, I'd be immensely grateful for any insight into ways I could go about avoiding the issue pending a more long-term fix. I am just not familiar enough with the code, and especially the masquerade stuff, to tell what might help and what might not.
Would doing a single redirect rather than a dual redirect, combined with some dialplan logic to get the second channel there, be a possible fix? I'm thinking we'd still end up inside "__ast_channel_masquerade" and so hit the same problem.

At first glance it seems the section of code where things go a bit crazy is checking for a channel proxy (chan_agent is mentioned in the comments); since I'm not using any chan_agent channels, can I comment this code out?

Upon deeper inspection, this must be some sort of race condition (not sure how, as we've locked both original and clonechan), but:

    if (ast_channel_internal_bridged_channel(clonechan)
            && (ast_channel_internal_bridged_channel(clonechan) != ast_bridged_channel(clonechan))
            && (ast_channel_internal_bridged_channel(ast_channel_internal_bridged_channel(clonechan)) != clonechan)) {
        final_clone = ast_channel_internal_bridged_channel(clonechan);
    }

So the check is done first against ast_channel_internal_bridged_channel(clonechan) being NULL (and it must not be NULL at that point, or else short-circuit evaluation would stop right there).
Then by the time we evaluate the third condition, the same call is returning NULL, and that NULL is dereferenced when passed to ast_channel_internal_bridged_channel as an argument.
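
One way to rule out the "non-NULL on the first read, NULL on the third" sequence described above would be to read the bridged-channel pointer once and test only the local copy. The following is a sketch, not Asterisk code: `struct chan`, `get_internal_peer()`, `get_public_peer()` and `pick_final_clone()` are hypothetical stand-ins for `struct ast_channel`, `ast_channel_internal_bridged_channel()`, `ast_bridged_channel()` and the quoted block. It assumes the peer stays allocated while the snapshot is in use, and it does nothing about whichever thread is changing the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ast_channel (only what the check needs). */
struct chan {
    struct chan *internal_peer; /* value the internal accessor would return */
    struct chan *public_peer;   /* value the proxy-aware lookup would return */
};

/* Stand-in for ast_channel_internal_bridged_channel(): no NULL check,
 * matching the behaviour described in the comment above. */
static struct chan *get_internal_peer(const struct chan *c)
{
    return c->internal_peer;
}

/* Stand-in for ast_bridged_channel(). */
static struct chan *get_public_peer(const struct chan *c)
{
    return c->public_peer;
}

/* Same three tests as the quoted condition, but the internal peer is read
 * exactly once, so it cannot be non-NULL for one test and NULL for the next. */
static struct chan *pick_final_clone(struct chan *clonechan)
{
    struct chan *final_clone = clonechan;
    struct chan *peer;

    assert(clonechan != NULL);
    peer = get_internal_peer(clonechan); /* single snapshot */

    if (peer
            && peer != get_public_peer(clonechan)
            && get_internal_peer(peer) != clonechan) {
        final_clone = peer;
    }
    return final_clone;
}
```

The snapshot only removes the re-read; if another thread can clear or free the peer without taking the lock, the underlying race would still need a proper fix.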

So... could we add a NULL-pointer check inside ast_channel_internal_bridged_channel itself? Or would that be equally susceptible to whatever rogue thread is changing things without getting a lock first?
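
To make the question concrete, here is a rough sketch of what a NULL-tolerant accessor might look like. The names are hypothetical (`struct chan` and `get_internal_peer_safe()` stand in for `struct ast_channel` and `ast_channel_internal_bridged_channel()`); this is not the real Asterisk implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ast_channel. */
struct chan {
    struct chan *internal_peer;
};

/* NULL-tolerant accessor: a NULL argument yields NULL instead of a crash. */
static struct chan *get_internal_peer_safe(const struct chan *c)
{
    return c ? c->internal_peer : NULL;
}

/* The nested lookup from the third condition: the peer of the peer. */
static struct chan *peer_of_peer(const struct chan *c)
{
    return get_internal_peer_safe(get_internal_peer_safe(c));
}
```

With such a guard the nested call would no longer dereference NULL, but the quoted condition would still read the pointer several times. If the inner read came back NULL, the third test would compare NULL against clonechan, evaluate as true, and final_clone would then be assigned from yet another read that could itself be NULL. So the guard on its own seems likely to move the failure rather than remove it.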



                
> Segfault during bridge channel proxy inspection in a masquerade caused by an AMI Redirect of two channels 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-21356
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21356
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: General
>    Affects Versions: 11.3.0
>         Environment: CentOS 6.4
>            Reporter: William luke
>            Assignee: William luke
>            Severity: Critical
>         Attachments: backtrace.txt, debuglog.tar.gz
>
>
> Asterisk segfaulted during normal operation.
> After speaking to mjordan on #asterisk he advised that it looks like a redirect thread (Thread 1) is trying to access a bridged channel using a NULL pointer.
> The scenario is that two channels are in a bridge, and then we execute a dual Redirect (via AMI) to put them both into a ConfBridge, so that we can bring a 3rd party into the call.
> This happens successfully hundreds of times a day on this system, so must be some sort of edge case when it goes wrong.
> I also *should* have a full debug log available, but it's quite a busy system and it would be rather a large file (circa 50 GB/day). I'm sure I could make it available if it would help.
> [gl-agentconf]
> exten => _X.,1,Set(ConfBridgeNumber=99${EXTEN})
>         same => n,NoOp(Putting AGENT into ConfBridge ${ConfBridgeNumber})
> ;       same => n,Set(JITTERBUFFER(fixed)=default);Adaptive with defaults.
>         same => n,Answer()
>         same => n,ConfBridge(${ConfBridgeNumber})
>         same => n,NoOp(Putting AGENT back into their Dialling Hub. Exten: ${EXTEN})
>         same => n,Goto(gl-diallinghub,${EXTEN},agentinhub)
> [gl-customerconf]
> exten => _X.,1,Set(ConfBridgeNumber=99${EXTEN})
> ;       same => n,Set(JITTERBUFFER(fixed)=default);Adaptive with defaults.
>         same => n,NoOp(Putting CUSTOMER into ConfBridge ${ConfBridgeNumber})
>         same => n,Answer()
>         same => n,ConfBridge(${ConfBridgeNumber})

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.asterisk.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


