[asterisk-dev] [Code Review] [17949] Error that mutex 'dialog' has been free'd more than it's been locked in add_header_max_forwards
Russell Bryant
russell at digium.com
Mon Nov 8 12:00:37 CST 2010
On Sat, 2010-11-06 at 09:18 +0100, Olle E. Johansson wrote:
> I think we need to document the lessons learned here. When coding, I never thought twice about accessing a channel variable in that piece of the code. We do access channel variables in so many places... In this case, the issue was we did it during an ongoing call, which we don't do elsewhere.
>
> If I understand the situation right, in this case we have an existing pvt with an on-going call. The ->owner (ast_channel) can have a lot of parallell stuff going on, incoming and outgoing messages. We just couldn't figure out how to access the ast_channel properly for an outbound SIP request here. For this issue we found another way of solving it, since there's no point in supporting a change of the channel variable during the call.
>
> If we do not solve the locking issues involved here, it means that we can not add any function that changes headers in in-dialog messages based on channel variables unless we solve this. Like manager setting a channel variable DURING the call that will affect the BYE or an in-dialog re-invite, MESSAGE or any other request.
>
> Do we wan't to solve this or should we just document it as "kids, don't try this at home"?
It would be good to solve it, but it's a huge mess and we might as well
push it off until we hit a situation that we really need to do it during
the call. Here's the situation ...
It *is* safe to access a channel variable during the call. There is
nothing fundamentally wrong with that. The trick is getting the locking
right. In fact, if you look at the way add_header_max_forwards() was
written, the locking there is "correct". It did the right things to
ensure that it safely got a lock on the channel without the potential
for a deadlock.
The problem was that it assumed that the sip_pvt was locked prior to
calling into this function. Really, that's the right assumption to
make. The problem was that it wasn't always true. There are many code
paths that lead to that code, and some of them were not holding the
sip_pvt lock. When that happened, it would result in a deadlock.
At this point, we had two options to fix it. The first was fix the code
paths that didn't have the lock held (the first approach). The second
was what you proposed, which was to just move it to call setup, which
was far simpler and less risk for regression, so we just went that
route.
As for documenting lessons learned, I would summarize it with:
1) Locking in Asterisk is complicated. The issues with dealing with
locks between an ast_channel and a tech_pvt are covered in the
coding guidelines document already.
2) chan_sip has some locking inconsistencies. If the situation was
clear enough to document, the problem wouldn't have happened.
Let me know if you have any other comments or questions.
--
Russell Bryant
Digium, Inc. | Engineering Manager, Open Source Software
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
jabber: rbryant at digium.com -=- skype: russell-bryant
www.digium.com -=- www.asterisk.org -=- blogs.asterisk.org
More information about the asterisk-dev
mailing list