[asterisk-dev] Possible Masquerade/Masquerade Conflict? (Local optimise clone is Redirect original)
Dave Woolley
david.woolley at bts.co.uk
Fri May 14 10:31:07 CDT 2010
Masquerading is one of the difficult areas in understanding Asterisk. I
think I may have found a problem with it that is leading to a double
free of a channel, specifically where the same channel is both the clone
channel for a local channel optimisation, and the original channel for
an AMI Redirect. However, for a number of reasons (see below), I don't
think I'm yet in a position to submit a formal bug report. What I would
appreciate is some views on whether I'm on the right track, suggestions
of how to verify it in lab conditions, and some more insight into
masquerading (particularly the reason for having separate prepare and
execute steps). Obviously, if someone recognizes they have fixed this,
that is also useful information.
Local channel optimisation involves masquerading the channel bridged
from one side of the local channel (original) into the other side of the
channel (clone). My basic current hypothesis is that, because Local
channel optimisation uses an implicit do_masquerade and releases locks
after the preparation step, a Redirect applied to the clone channel can
intervene while the optimise still goes on to complete. The Redirect
takes the optimise clone channel and masquerades it into a newly created
channel; it explicitly does the do_masquerade, retaining locks through
the whole process. I speculate that this confuses the upstream channel
of the optimise clone, which sees a hangup because of the Redirect
masquerade. Given the right timing, by the time it tries to hang that
channel up, it is dealing with a channel whose information has been
overwritten by the optimise masquerade. The net effect is a double free
of the optimise clone channel, resulting in a call to abort().
The prepare stage of masquerading checks to see whether the original
channel is an original for any other masquerade and similarly whether
the clone is a clone for any other masquerade. However, my speculation
is that it should also check to see whether the original is a clone for
another one, and probably vice versa.
Details of the scenario
We have a call that is set up as an Originate into a local channel that
then runs the Queue application. The ;1 side of that local channel then
runs a second local channel that dials a SIP device. From now on, I
don't think the fact that there is a first local channel matters. It is
the thread that is running its ;1 side that ends up issuing abort and
there is no involvement of the ;2 side. In the CLI we see:
[2010-05-11 10:02:18.249] VERBOSE[10673] app_dial.c: -- SIP/siptrunk1-b6791708 answered Local/99999999999@xxx-e908;2
Which I believe starts the optimise sequence, although it requires an
ast_write to actually start the masquerade, and the trace doesn't show
where that happens. I hypothesize that the prepare step of the optimise
masquerade happens in this interval.
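As a rough illustration of the interval hypothesised above, the following C sketch models optimisation as two separate events: once the ;2 side is bridged, a write through the local pair only prepares the masquerade, and execution happens later, when the channel is next serviced. This is a hypothetical model, not chan_local code: struct chan, struct local_pair, local_write() and service() are invented names, and the assumption that the prepare is triggered by a write is taken from the text above rather than checked against the 1.6.1.0 source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not Asterisk's actual structures. */
struct chan {
    struct chan *bridge;   /* channel this one is bridged to, if any */
    struct chan *pending;  /* partner in a prepared but not yet executed masquerade */
};

struct local_pair {
    struct chan *one;      /* the ;1 side */
    struct chan *two;      /* the ;2 side */
    bool optimise_prepared;
};

/* Assumed trigger: a write through the pair runs the optimisation check.
 * It only *prepares* the masquerade; nothing is swapped yet. */
static void local_write(struct local_pair *p)
{
    struct chan *far = p->two->bridge;

    if (p->optimise_prepared || far == NULL)
        return;                      /* already done, or ;2 not bridged yet */
    if (p->one->pending != NULL || far->pending != NULL)
        return;                      /* either channel already has one pending */
    p->one->pending = far;
    far->pending = p->one;
    p->optimise_prepared = true;
    /* In the real system locks would be released here: the window in question. */
}

/* Execute step: only runs when the channel is next serviced.
 * Returns true if a pending masquerade was executed. */
static bool service(struct chan *c)
{
    struct chan *other = c->pending;

    if (other == NULL)
        return false;
    /* The real code would swap the channels' state here. */
    other->pending = NULL;
    c->pending = NULL;
    return true;
}
```

In this model, the answer alone (setting the bridge) prepares nothing; the next write prepares, and the masquerade stays pending until service() is called, which is the gap in which another masquerade could be attempted.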
[2010-05-11 10:02:18.273] VERBOSE[10674] pbx.c: -- Executing [702@yyy:1] NoOp("Local/99999999999@xxx-e908;1) in new stack
Which is the first step of the Redirected dialplan, indicating that a
complete masquerade of the ;1 side of the second local channel has
happened. We wouldn't have initiated the Redirect in this way if we had
known the call was answered, so there is a race condition here, but one
we would have coped with.
Further steps in this bit of dialplan are here, ending in a ParkedCall
call, which I don't think is specifically relevant.
Then things get messy. Note this thread is the one that should be
hanging up because of the Redirect masquerade - it is the thread running
Dial, which is dialing the channel that is becoming a zombie:
[2010-05-11 10:02:18.278] WARNING[10672] channel.c: Channel 'SIP/siptrunk1-b6791708' may not have been hung up properly
[2010-05-11 10:02:18.279] VERBOSE[10674] features.c: -- Channel Local/99999999999@xxx;1 connected to parked call 702
This channel name is that of the channel that is being optimised out
(although its original physical channel structure is the optimise
clone).
At this point 10672 aborts, in the channel free from the ast_hangup from
the normal end of call processing in app_dial.
The dump shows that it is SIP/siptrunk1-b6791708 that it is trying to
hang up. It has softhangup set and no bridge. This is the optimised
out state of the local channel. tech_pvt is still set (consistent with
the original SIP channel). My speculation is that the SIP channel never
hung up, but this is actually the optimise clone channel that has had
the cloning completed after it got soft hung up in its role as original
channel for the Redirect masquerade.
On this theory, checking for both masqr and masq in the preparation
stage for the masquerade may avoid the problem, but as the comments say,
masquerading is a seriously wacked out operation, and I don't understand
it well enough to be sure.
The other question is: when do you actually need to separate the
preparation for the masquerade from the do_masquerade? If the local
channel had done both, with the locks held, there would have been no gap
in which things could go wrong.
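For comparison, here is a sketch of the alternative the question implies: doing both steps while the locks stay held. It is a model under stated assumptions, not a proposal for Asterisk and not an answer to why the steps are separated: struct chan, masquerade_locked() and the zombie flag are hypothetical, and taking the two locks in a fixed address order is a generic deadlock-avoidance technique swapped in here; it does not reflect how Asterisk actually acquires channel locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model; not Asterisk's actual structures. */
struct chan {
    pthread_mutex_t lock;
    struct chan *masq;    /* set on a pending original */
    struct chan *masqr;   /* set on a pending clone */
    bool zombie;          /* clone structure retired by a completed masquerade */
};

static void chan_init(struct chan *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->masq = NULL;
    c->masqr = NULL;
    c->zombie = false;
}

/* Take both locks in a fixed (address) order to avoid ABBA deadlock. */
static void lock_pair(struct chan *x, struct chan *y)
{
    if ((uintptr_t)x > (uintptr_t)y) {
        struct chan *t = x;
        x = y;
        y = t;
    }
    pthread_mutex_lock(&x->lock);
    pthread_mutex_lock(&y->lock);
}

static void unlock_pair(struct chan *x, struct chan *y)
{
    pthread_mutex_unlock(&x->lock);
    pthread_mutex_unlock(&y->lock);
}

/* Prepare and execute without releasing the locks in between, so no other
 * masquerade can be prepared against either channel in the gap. */
static bool masquerade_locked(struct chan *original, struct chan *clone)
{
    bool ok = false;

    if (original == clone)
        return false;                /* would self-deadlock on one mutex */
    lock_pair(original, clone);
    if (!original->zombie && !clone->zombie &&
        original->masq == NULL && original->masqr == NULL &&
        clone->masq == NULL && clone->masqr == NULL) {
        original->masq = clone;      /* prepare */
        clone->masqr = original;
        /* execute: the real code would move the clone's state here */
        original->masq = NULL;
        clone->masqr = NULL;
        clone->zombie = true;
        ok = true;
    }
    unlock_pair(original, clone);
    return ok;
}
```

Because both steps happen under the same locks, a later attempt to masquerade the retired clone sees it already marked as a zombie and is refused, rather than interleaving with a half-finished masquerade.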
Reasons for not submitting this as a bug report yet are:
- it has only happened once in several weeks of use and we
don't know an efficient way of reproducing it;
- it was on a customer system which means, for example, we
can't run valgrind on it;
- that system is based on 1.6.1.0;
- going to a later version of Asterisk is likely to cost
several man weeks in terms of re-integrating local modifications and
re-running the system test, and we are reluctant to do that for anything
except the next long term stable release.
Also, I cannot see any code changes in the areas relating to our current
hypothesis.
As this is a public mailing list, please ignore the part of the
following relating to confidentiality.
--
Dave Woolley
BTS Holdings Plc
Tel: +44 (0)20 8401 9000 Fax: +44 (0)20 8401 9100
http://www.bts.co.uk
This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. Any views or opinions expressed are solely those of the author and do not necessarily represent those of the company. If you are not the intended recipient of this email, you must take no action based upon it, nor must you copy or show it to anyone. Please contact the sender if you believe you have received this email in error. In accordance with English Law, email communications may be monitored. All reasonable precautions have been taken to ensure that no viruses are present in this email; however, the company cannot accept responsibility for loss or damage arising from the use of this email. We recommend that you subject this email to your own virus checking procedures. BTS Holdings PLC is registered in England 1517630, VAT No 523 5092 66. Registered office, BTS House, Manor Road, Wallington, SM6 0DD