<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="https://reviewboard.asterisk.org/r/1706/">https://reviewboard.asterisk.org/r/1706/</a>
</td>
</tr>
</table>
<br />
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('https://reviewboard.asterisk.org/media/rb/images/review_request_box_top_bg.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">
<tr>
<td>
<div>Review request for Asterisk Developers.</div>
<div>By a_villacis.</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">The affected system is running a script that periodically invokes the AMI action "Agents", which is handled by action_agents() in channels/chan_agent.c:1499. This function traverses the agent list, and for each one first takes a lock on struct agent_pvt *p (chan_agent.c:1516), then attempts to take a lock on p->owner (a channel of type Agent, I think) at chan_agent.c:1534, in order to check whether this is a bridged channel. This second lock is the one that is introduced by the patch that "fixes" ASTERISK-18092.
Meanwhile, in another thread, some frames need to be written to the Agent/xxxx channel, at ast_write() in main/channel.c:4767 . In channel.c:4774, a lock is taken on the channel (which happens to be the one at p->owner), and then the tech-specific write method is invoked at channel.c:5032. For Agent channels, this method is agent_write() at channels/chan_agent.c:691. This method extracts tech_pvt from the channel (which happens to be the one picked up in the other thread at line 1516), then attempts to take a lock on it. Therefore, a deadlock.
This patch adds the DEADLOCK_AVOIDANCE pattern to action_agents() (previously patched to fix ASTERISK-18092), and also to the handlers of the console commands "agent show" and "agent show online", which are vulnerable to the same crash as ASTERISK-18092, but currently unpatched. The DEADLOCK_AVOIDANCE pattern was used because it was the shortest patch that fixes the bug, and is also used elsewhere in chan_agent.c. The alternative requires a rethink of the locking order in the entirety of chan_agent.</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">RPM with patched chan_agent.c is now running at an user machine where the problem was originally spotted. Before the patch, asterisk deadlocked within one hour. Now two hours have passed without incident.</pre>
</td>
</tr>
</table>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>
<a href="https://issues.asterisk.org/jira/browse/ASTERISK-19285">ASTERISK-19285</a>
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>/branches/1.8/channels/chan_agent.c <span style="color: grey">(353648)</span></li>
</ul>
<p><a href="https://reviewboard.asterisk.org/r/1706/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>