[asterisk-dev] Interlocking of individual channels and the channels list in chan_sip.

David Vossel dvossel at digium.com
Thu Jun 16 12:32:47 CDT 2011


----- Original Message -----
> From: "Kirill Katsnelson" <kkm at adaptiveai.com>
> To: "Asterisk Developers Mailing List" <asterisk-dev at lists.digium.com>
> Sent: Thursday, June 16, 2011 2:41:33 AM
> Subject: [asterisk-dev] Interlocking of individual channels and the channels list in chan_sip.
> I got a deadlock, and do not understand how it can be prevented.
> Either
> that's deep in the design, or I am misunderstanding the rules of
> function calling that are violated.
> 
> The do_monitor thread gets a REFER on a channel X, then in
> handle_refer() gets its bridged channels locked, and calls
> ast_async_goto() on that channel. The latter creates a new channel,
> attempts to lock the channels list to insert the newly created one,
> and
> deadlocks.
> 
> Because another thread is in do_devstate_changes(), calls
> ast_channel_get_by_name_prefix(), which goes into __ao2_find(channels,
> ...), that locks the channels list, then invokes ast_channel_cmp_cb()
> on
> every channel (including the channel X) and locks it. It deadlocks on
> an
> attempt to lock the channel. Now the 2 threads are in a first class
> deadlock.
> 
> I have not filed a bug yet, working on that. But I do not understand
> how
> this type of a deadlock is generally prevented. Since find_channel_...
> can be randomly called, the rule should be to never cause an attempted
> locking of the channels list while holding a lock on any channel.
> Correct?

Correct, We should never hold a channel lock, and then attempt to find a channel in the channel's container (which locks the channel container)

The locking order for everything in Asterisk is always.

1. Lock Container
2. Lock Element in container

There is no reason to ever hold a channel lock while searching the channels container.  Channel's are ref counted, so we don't have to worry about a channel lock being required to prevent channel destruction or anything weird like that.  Any place this sort of locking inversion occurs is a bug.

Part of the problem we have with locking is that our critical sections are not precise enough.  We tend to grab a lock and hold on to it as long as possible.  This makes it difficult to track down when something is locked or not and what all the lock is even necessary for.

-- 
David Vossel
Digium, Inc. | Software Developer, Open Source Software
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: www.digium.com & www.asterisk.org
The_Boy_Wonder in #asterisk-dev



More information about the asterisk-dev mailing list