[asterisk-dev] Safe to backport fix for ASTERISK-18101 to 1.6.2.20?

Alex Villacís Lasso a_villacis at palosanto.com
Thu Sep 22 11:06:26 CDT 2011


I know of a customer that is using Asterisk 1.6.2.20, with a few dozen SIP extensions, heavy use of queues, and static agents that log into those queues. Every one to two days, they experienced lockups of the asterisk process: extensions became
unresponsive and eventually logged out due to timeouts, the 'asterisk -r' command showed no output, and the only fix was to restart Asterisk. These events occurred mainly during prolonged use of the queues. I suspected that this was an instance of
https://issues.asterisk.org/jira/browse/ASTERISK-18101, originally reported for asterisk-1.8.x, and that this case showed that 1.6.2.20 was affected too. After some local tests, I backported the patch to 1.6.2.20 (which consisted of removing the
ao2_lock(queues) and ao2_unlock(queues) calls in places that were similar in 1.8.6.0 and 1.6.2.20), and then installed the patched 1.6.2.20 on the customer's server. For 4 days it worked correctly, but then they experienced a different kind of problem: extensions
went unresponsive for a few minutes and then "recovered", meaning they seemed responsive, but they could not call each other or log into the queues, even though they could place calls to a SIP trunk. The AMI console also seemed responsive, but the logs
showed no activity when an extension tried to contact another extension. Activity was shown when placing calls into the SIP trunk.
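
To illustrate the class of problem I suspect, here is a minimal sketch of a lock-order inversion between a container-wide lock and a per-queue lock. This is a generic pthread example, not code from app_queue.c: the names container_lock, queue_lock, take_two_locks and inverted_order_would_deadlock are hypothetical, and whether the lockups on the customer's server really follow this pattern is an assumption on my part. It uses pthread_mutex_trylock so that the conflict is reported instead of hanging the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins: one lock for the whole container, one for a single queue. */
static pthread_mutex_t container_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold;   /* reached once each thread holds its first lock */
static pthread_barrier_t both_tried;  /* reached once each thread has tried its second lock */

struct path {
    pthread_mutex_t *first;   /* lock this code path takes first */
    pthread_mutex_t *second;  /* lock this code path takes second */
    int second_busy;          /* set to 1 if the second lock was already held */
};

static void *take_two_locks(void *arg)
{
    struct path *p = arg;
    int rc;

    pthread_mutex_lock(p->first);
    pthread_barrier_wait(&both_hold);

    /* trylock instead of lock: a blocking lock here would wait forever */
    rc = pthread_mutex_trylock(p->second);
    p->second_busy = (rc == EBUSY);

    /* keep holding the first lock until the other thread has made its attempt */
    pthread_barrier_wait(&both_tried);
    if (rc == 0)
        pthread_mutex_unlock(p->second);
    pthread_mutex_unlock(p->first);
    return NULL;
}

/* Returns 1 if each code path found its second lock held by the other path. */
int inverted_order_would_deadlock(void)
{
    struct path a = { &container_lock, &queue_lock, 0 };  /* container, then queue */
    struct path b = { &queue_lock, &container_lock, 0 };  /* queue, then container */
    pthread_t ta, tb;
    int rc;

    rc = pthread_barrier_init(&both_hold, NULL, 2);
    assert(rc == 0);
    rc = pthread_barrier_init(&both_tried, NULL, 2);
    assert(rc == 0);

    rc = pthread_create(&ta, NULL, take_two_locks, &a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, take_two_locks, &b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    pthread_barrier_destroy(&both_hold);
    pthread_barrier_destroy(&both_tried);
    (void)rc;
    return a.second_busy && b.second_busy;
}
```

Calling inverted_order_would_deadlock() from a small main() (built with -pthread) reports that both paths were blocked on each other. With blocking pthread_mutex_lock calls in place of the trylock, neither thread would ever proceed, which would match the symptom of a process that stops responding until it is restarted.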

Was my analysis that ASTERISK-18101 affects 1.6.2.20 correct?
Was it correct to remove the ao2_lock/ao2_unlock calls in the corresponding places in app_queue.c of 1.6.2.20, as was done for 1.8.6.0? If so, can somebody give a clue on where in the code to investigate the new problem described above?


