<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>After doing some code review and responding to changes to this list, and then the following discussion on the #asterisk-dev channel, it has occurred to me that the fundamental locking rules for the boundary between channel drivers and the Asterisk core have not sufficiently been documented. Thus, I propose the following addition to the coding guidelines document.</div><div><br></div><div>I welcome any feedback. If anything is not clear, I will do my best to clarify it. If anything seems wrong, then we have an even bigger problem, because this describes how the code is written today. This is essentially an attempt to document some of what has come out of the evolution of the Asterisk code base, not what I would consider ideal.</div><div><br></div><div>---------------------------------------------------------</div><div>--- Locking</div>---------------------------------------------------------<br><div><br class="webkit-block-placeholder"></div><div>A) Locking Fundamentals</div><div><br></div><div>Asterisk is a heavily multithreaded application. It makes extensive use of locking to ensure safe access to shared resources between different threads.</div><div><br></div><div>When more that one lock is involved in a given code path, there is the potential for deadlocks. A deadlock occurs when a thread is stuck waiting for a resource that it will never acquire. Here is a classic example of a deadlock:</div><div><br></div><div> Thread 1 Thread 2</div><div> ------------ ------------</div><div> Holds Lock A Holds Lock B</div><div> Waiting for Lock B Waiting for Lock A</div><div><br></div><div>In this case, there is a deadlock between threads 1 and 2. However, this deadlock would have been avoided if both threads 1 and 2 had agreed that one must acquire Lock A before Lock B, or the other way around.</div><div><br></div><div>We have now arrived at the fundamental rule for dealing with multiple locks.</div><div><br></div><div> Thou must not violate the rules of locking order.</div><div><br></div><div>The agreed upon locking order depends on the code, but when using multiple locks, an order _must_ be established, and once established, must be respected.</div><div><br></div><div><br></div><div>B) Minding the boundary between channel drivers and the Asterisk core</div><div><br></div><div>The #1 cause of deadlocks in Asterisk is by not properly following the locking rules that exist at the boundary between channel drivers and the Asterisk core. The Asterisk core allocates an ast_channel, and channel drivers allocate technology specific private data that is associated with an ast_channel. Typically, both the ast_channel and technology pvt (private) data have their own lock. There are _many_ code paths that require both objects to be locked.</div><div><br></div><div>The locking order in this situation is the following:</div><div><br></div><div> 1) ast_channel</div><div> 2) technology pvt data</div><div><br></div><div>Channel drivers implement the ast_channel_tech interface to provide a channel implementation for Asterisk. Most of the channel_tech interface callbacks are called with the associated ast_channel locked. When accessing technology pvt data, the pvt can be locked directly.</div><div><br></div><div>The complication arises in the opposite code path. Consider the following situation:</div><div><br></div><div> 1) A message comes in over the "network"</div><div> 2) The CD (channel driver) monitor thread receives the message</div><div> 3) The CD associates the message with a pvt struct and locks the pvt</div><div> 4) While processing the message, the CD must do something that requires</div><div> locking the ast_channel</div><div><font class="Apple-style-span" face="'Courier New'"><br></font></div><div>This is the point that must be handled carefully. Given the specified locking order, it seems as if the following psuedo-code would be sufficient:</div><div><br></div><div> unlock(pvt);</div><div> lock(ast_channel);</div><div> lock(pvt);</div><div><br></div><div>Unfortunately, that's not good enough in Asterisk. All locks (in the case of mutexes, not rwlocks) are configured as recursive locks. That means that the same thread can lock the same lock multiple times, which essentially increments a counter. So, an unlock() may not actually make the thread release a lock.</div><div><br></div><div>So, instead, we rely on a different construct to ensure that both locks are acquired safely.</div><div><br></div><div> while (trylock(ast_channel) == FAILURE) {</div><div> unlock(pvt);</div><div> usleep(1); /* yield to other threads */</div><div> lock(pvt);</div><div> }</div><div><br></div><div>Technically, we are not following the specified locking order. We will acquire the pvt lock before the ast_channel lock. However, by looping in this fashion, we are safe from deadlocks. We never block waiting on the ast_channel lock. If we fail to acquire the ast_channel, we give other threads a chance to grab the pvt lock. This is to allow alternate code paths in other threads that have the ast_channel lock and call into the channel driver to safely acquire the pvt.</div><div><br></div><div><br></div><div>C) Locking multiple channels.</div><div><br></div><div>The next situation to consider is what to do when you need a lock on multiple ast_channels. If we did not use recursive mutexes, the following construct would be sufficient:</div><div><br></div><div> lock(MIN(chan1, chan2));</div><div> lock(MAX(chan1, chan2));</div><div><br></div><div>That type of code would follow an established locking order of always locking the channel that has a lower address first. Again, since we use recursive mutexes, this isn't safe. We use the same construct as described at the end of section B, instead.</div><div><br></div><div> lock(chan1);</div><div> while (trylock(chan2) == FAILURE) {</div><div> unlock(chan1);</div><div> usleep(1);</div><div> lock(chan1);</div><div> }</div><div><br></div><div>D) Documenting locking rules.</div><div><br></div><div>Finally, it is required that locking order rules are documented for every lock introduced into Asterisk. This is done almost nowhere in the existing code. However, it will be expected to be there for newly introduced code. Over time, this information should be added for all of the existing lock usage.</div><div><br></div><div><div>---------------------------------------------------------</div><div><div>---------------------------------------------------------</div><div><br></div></div></div><div> <span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>--</div><div>Russell Bryant</div><div>Senior Software Engineer</div><div>Open Source Team Lead</div><div>Digium, Inc.</div><div><br></div></div></span><br class="Apple-interchange-newline"></div></span><br class="Apple-interchange-newline"> </div><br></body></html>