[asterisk-dev] [Code Review] Fix deadlock between AMI action Agents and writing of frames to Agent channel

Mark Michelson reviewboard at asterisk.org
Wed Feb 1 15:02:47 CST 2012


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1706/#review5370
-----------------------------------------------------------


I'm going to close this review because review 1708 addresses this issue in a way that is more consistent with the preferred way of dealing with lock inversion.

- Mark


On Feb. 1, 2012, 12:38 p.m., a_villacis wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1706/
> -----------------------------------------------------------
> 
> (Updated Feb. 1, 2012, 12:38 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> The affected system is running a script that periodically invokes the AMI action "Agents", which is handled by action_agents() in channels/chan_agent.c:1499. This function traverses the agent list, and for each one first takes a lock on struct agent_pvt *p (chan_agent.c:1516), then attempts to take a lock on p->owner (a channel of type Agent, I think) at chan_agent.c:1534, in order to check whether this is a bridged channel. This second lock is the one that is introduced by the patch that "fixes" ASTERISK-18092.
> 
> Meanwhile, in another thread, some frames need to be written to the Agent/xxxx channel, at ast_write() in main/channel.c:4767 . In channel.c:4774, a lock is taken on the channel (which happens to be the one at p->owner), and then the tech-specific write method is invoked at channel.c:5032. For Agent channels, this method is agent_write() at channels/chan_agent.c:691. This method extracts tech_pvt from the channel (which happens to be the one picked up in the other thread at line 1516), then attempts to take a lock on it. Therefore, a deadlock.
> 
> This patch adds the DEADLOCK_AVOIDANCE pattern to action_agents() (previously patched to fix ASTERISK-18092), and also to the handlers of the console commands "agent show" and "agent show online", which are vulnerable to the same crash as ASTERISK-18092, but currently unpatched. The DEADLOCK_AVOIDANCE pattern was used because it was the shortest patch that fixes the bug, and is also used elsewhere in chan_agent.c. The alternative requires a rethink of the locking order in the entirety of chan_agent.
> 
> 
> This addresses bug ASTERISK-19285.
>     https://issues.asterisk.org/jira/browse/ASTERISK-19285
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/channels/chan_agent.c 353648 
> 
> Diff: https://reviewboard.asterisk.org/r/1706/diff
> 
> 
> Testing
> -------
> 
> RPM with patched chan_agent.c is now running at an user machine where the problem was originally spotted. Before the patch, asterisk deadlocked within one hour. Now two hours have passed without incident.
> 
> 
> Thanks,
> 
> a_villacis
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20120201/b3cbb579/attachment-0001.htm>


More information about the asterisk-dev mailing list