[asterisk-dev] [Code Review] Deadlock in channel masquerade handling

David Vossel dvossel at digium.com
Mon Oct 5 18:28:33 CDT 2009


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/387/
-----------------------------------------------------------

(Updated 2009-10-05 18:28:33.750250)


Review request for Asterisk Developers.


Changes
-------

Issues this update resolves...

1. tech_pvt locks held during a masquerade:  tech_pvt objects must be unlocked before a masquerade is performed, because holding one while the masquerade locks the channels violates the locking order (lock the channel first, then the channel's tech_pvt).

2. ast_do_masquerade race condition: A new race condition surfaced after requiring the channel to be unlocked before issuing a masquerade.  It is now quite possible for ast_do_masquerade to be called for the same masquerade by two different threads, leaving one waiting while the other completes.  As it was, this caused all sorts of trouble because ast_do_masquerade had no check to determine whether the masquerade had already taken place.  It could also result in the channels being linked or unlinked incorrectly.  The fix is to hold the channel ao2_container lock while determining whether a masquerade has already occurred.  If the masquerade is not required, we unlock and exit; if it is required, the function keeps holding the ao2_container lock until both channels are unlinked and locked.

The solution for issue 2 is rather complex, as it depends on ast_do_masquerade being used properly in order to stay efficient.  In frequently called functions that contain ast_do_masquerade(), such as ast_read() or ast_hangup(), the channel's masq pointer should always be checked before issuing the masquerade, as it is very expensive to lock the channel ao2_container just to check whether the masquerade is required.
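
The two-stage check described above can be sketched roughly as follows.  This is not the code from the diff: struct chan, container_lock, do_masquerade() and service_channel() are hypothetical stand-ins built on plain pthread mutexes, and unlinking from the container is left out.  The cheap check looks at the masq pointer under the channel lock only; the authoritative check is repeated under the container lock, so a second caller finds nothing left to do.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; not the real struct ast_channel. */
struct chan {
    pthread_mutex_t lock;
    struct chan *masq;   /* non-NULL while a masquerade is pending */
    int masq_count;      /* how many times a masquerade actually ran */
};

/* Stand-in for the lock of the channel container. */
pthread_mutex_t container_lock = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with no channel locks held.
 * Returns 1 if this call performed the masquerade, 0 if none was needed. */
int do_masquerade(struct chan *original)
{
    struct chan *clone;

    pthread_mutex_lock(&container_lock);
    clone = original->masq;
    if (clone == NULL) {
        /* Another thread already completed it: unlock and exit. */
        pthread_mutex_unlock(&container_lock);
        return 0;
    }

    /* Keep the container lock until both channels are locked
     * (the real code would also unlink them here). */
    pthread_mutex_lock(&original->lock);
    pthread_mutex_lock(&clone->lock);
    original->masq = NULL;   /* claim the work before others can look */
    pthread_mutex_unlock(&container_lock);

    original->masq_count++;  /* placeholder for the real masquerade work */

    pthread_mutex_unlock(&clone->lock);
    pthread_mutex_unlock(&original->lock);
    return 1;
}

/* Hot path (think ast_read): peek at masq under the cheap channel lock
 * and only pay for the container lock when a masquerade looks pending. */
int service_channel(struct chan *c)
{
    int pending;

    pthread_mutex_lock(&c->lock);
    pending = (c->masq != NULL);
    pthread_mutex_unlock(&c->lock);

    return pending ? do_masquerade(c) : 0;
}
```

In this sketch the channel lock is released before do_masquerade() is called, so the result of the cheap check can be stale by the time the container lock is taken; that is why the check is repeated there.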


Summary
-------

In trunk, channels are stored in an ao2_container.  When accessing an item within an ao2_container the proper locking order is to first lock the container, and then the items within it.
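
As a rough illustration of that locking order, here is a minimal sketch using plain pthread mutexes instead of the real ao2 API.  struct item, struct container and find_and_lock() are hypothetical names made up for this example; the only point is that the container lock is always taken before any item lock.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 8

/* Hypothetical stand-ins; not the real ao2 objects. */
struct item {
    pthread_mutex_t lock;
    char name[32];
};

struct container {
    pthread_mutex_t lock;
    struct item *items[MAX_ITEMS];
    size_t count;
};

/* Look an item up by name and return it locked, or NULL if not found.
 * Order: container lock first, then the lock of each item inspected. */
struct item *find_and_lock(struct container *c, const char *name)
{
    struct item *found = NULL;
    size_t i;

    pthread_mutex_lock(&c->lock);
    for (i = 0; i < c->count; i++) {
        struct item *it = c->items[i];

        pthread_mutex_lock(&it->lock);
        if (strcmp(it->name, name) == 0) {
            found = it;          /* keep this item locked for the caller */
            break;
        }
        pthread_mutex_unlock(&it->lock);
    }
    pthread_mutex_unlock(&c->lock);

    return found;
}
```

A caller that already holds an item lock and then calls find_and_lock() would take the locks in the opposite order, which is the pattern that can deadlock against another thread.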

In ast_do_masquerade both the clone and original channel must be locked for the entire duration of the function.  The problem with this is that it attempts to unlink and link these channels back into the ao2_container when one of the channels' names changes.  This is an invalid locking order, as unlinking and linking lock the ao2_container while the channels are locked!!!  Now, both channels in ast_do_masquerade are unlinked from the ao2_container and then locked for the entire function.  At the end of the function both channels are unlocked and linked back into the container with their new names as hash values.
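
The unlink, lock, work, unlock, relink sequence might look roughly like the sketch below.  It is a simplification and not the patch itself: the container is a small array guarded by a pthread mutex rather than a hashed ao2_container, the "masquerade" is reduced to swapping the two names, and all identifiers (struct chan, struct chan_container, container_link(), container_unlink(), masquerade()) are hypothetical.

```c
#include <pthread.h>
#include <string.h>

#define MAX_CHANS 8

/* Hypothetical stand-ins; not the real Asterisk structures. */
struct chan {
    pthread_mutex_t lock;
    char name[32];
};

struct chan_container {
    pthread_mutex_t lock;
    struct chan *chans[MAX_CHANS];
    int count;
};

/* Both helpers take the container lock themselves, so the caller
 * must not hold any channel lock when calling them. */
void container_link(struct chan_container *c, struct chan *ch)
{
    pthread_mutex_lock(&c->lock);
    if (c->count < MAX_CHANS)
        c->chans[c->count++] = ch;
    pthread_mutex_unlock(&c->lock);
}

void container_unlink(struct chan_container *c, struct chan *ch)
{
    int i;

    pthread_mutex_lock(&c->lock);
    for (i = 0; i < c->count; i++) {
        if (c->chans[i] == ch) {
            c->chans[i] = c->chans[--c->count]; /* fill the hole */
            break;
        }
    }
    pthread_mutex_unlock(&c->lock);
}

void masquerade(struct chan_container *c, struct chan *original,
                struct chan *clone)
{
    char tmp[sizeof(original->name)];

    /* 1. Unlink both channels while no channel lock is held. */
    container_unlink(c, original);
    container_unlink(c, clone);

    /* 2. Lock both channels for the whole body of the function. */
    pthread_mutex_lock(&original->lock);
    pthread_mutex_lock(&clone->lock);

    /* 3. Placeholder for the real work: the names change. */
    memcpy(tmp, original->name, sizeof(tmp));
    memcpy(original->name, clone->name, sizeof(tmp));
    memcpy(clone->name, tmp, sizeof(tmp));

    /* 4. Unlock first, then link back under the new names. */
    pthread_mutex_unlock(&clone->lock);
    pthread_mutex_unlock(&original->lock);
    container_link(c, original);
    container_link(c, clone);
}
```

Because the container lock is never requested while a channel lock is held, the container-then-item order is preserved throughout.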

This new requirement that all channels be unlocked before calling ast_do_masquerade or ast_change_name required several changes throughout the code base.  I started by fixing every instance where these two functions were used, and then attempted to spiral out from there, verifying that no additional channel locks were held outside of the functions that called them.  This was a complex task, and I believe I found all the obvious violations of this rule.  It is possible that, through some series of indirections, I missed code paths that could again cause a problem.


This addresses bug 15911.
    https://issues.asterisk.org/view.php?id=15911


Diffs (updated)
-----

  /trunk/channels/chan_misdn.c 222151 
  /trunk/channels/chan_sip.c 222151 
  /trunk/include/asterisk/channel.h 222151 
  /trunk/main/channel.c 222151 
  /trunk/main/features.c 222151 
  /trunk/main/pbx.c 222151 

Diff: https://reviewboard.asterisk.org/r/387/diff


Testing
-------

I completed an attended transfer in chan_sip.


Thanks,

David



