[asterisk-dev] [Code Review] Fix deadlock between AMI action Agents and writing of frames to Agent channel

a_villacis reviewboard at asterisk.org
Wed Feb 1 12:38:38 CST 2012


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1706/
-----------------------------------------------------------

Review request for Asterisk Developers.


Summary
-------

The affected system is running a script that periodically invokes the AMI action "Agents", which is handled by action_agents() in channels/chan_agent.c:1499. This function traverses the agent list, and for each one first takes a lock on struct agent_pvt *p (chan_agent.c:1516), then attempts to take a lock on p->owner (a channel of type Agent, I think) at chan_agent.c:1534, in order to check whether this is a bridged channel. This second lock is the one that is introduced by the patch that "fixes" ASTERISK-18092.

Meanwhile, in another thread, some frames need to be written to the Agent/xxxx channel, at ast_write() in main/channel.c:4767 . In channel.c:4774, a lock is taken on the channel (which happens to be the one at p->owner), and then the tech-specific write method is invoked at channel.c:5032. For Agent channels, this method is agent_write() at channels/chan_agent.c:691. This method extracts tech_pvt from the channel (which happens to be the one picked up in the other thread at line 1516), then attempts to take a lock on it. Therefore, a deadlock.

This patch adds the DEADLOCK_AVOIDANCE pattern to action_agents() (previously patched to fix ASTERISK-18092), and also to the handlers of the console commands "agent show" and "agent show online", which are vulnerable to the same crash as ASTERISK-18092, but currently unpatched. The DEADLOCK_AVOIDANCE pattern was used because it was the shortest patch that fixes the bug, and is also used elsewhere in chan_agent.c. The alternative requires a rethink of the locking order in the entirety of chan_agent.


This addresses bug ASTERISK-19285.
    https://issues.asterisk.org/jira/browse/ASTERISK-19285


Diffs
-----

  /branches/1.8/channels/chan_agent.c 353648 

Diff: https://reviewboard.asterisk.org/r/1706/diff


Testing
-------

RPM with patched chan_agent.c is now running at an user machine where the problem was originally spotted. Before the patch, asterisk deadlocked within one hour. Now two hours have passed without incident.


Thanks,

a_villacis

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20120201/acf39ca4/attachment-0001.htm>


More information about the asterisk-dev mailing list