[asterisk-dev] Safe to backport fix for ASTERISK-18101 to 1.6.2.20?
Alex Villacís Lasso
a_villacis at palosanto.com
Thu Sep 22 11:06:26 CDT 2011
I know of a customer running Asterisk 1.6.2.20 with a few dozen SIP extensions, heavy use of queues, and static agents that log into those queues. Every one to two days they experienced lockups of the asterisk process: extensions became
unresponsive and eventually logged out due to timeouts, the 'asterisk -r' command showed no output, and the only fix was to restart Asterisk. These events occurred mainly during prolonged use of the queues. I suspected that this was an instance of
https://issues.asterisk.org/jira/browse/ASTERISK-18101, originally reported for asterisk-1.8.x, and that this case showed 1.6.2.20 to be affected too. After some local tests, I backported the patch to 1.6.2.20 (it consisted of removing
ao2_lock(queues) and ao2_unlock(queues) in places that were similar in 1.8.6.0 and 1.6.2.20), and then installed the patched 1.6.2.20 on the customer's server. It worked correctly for 4 days, but then they experienced a different kind of problem: extensions
went unresponsive for a few minutes and then "recovered", meaning they seemed responsive, but they could not call each other or log into the queues, even though they could place calls to a SIP trunk. The AMI console also seemed responsive, but the logs
showed no activity when an extension tried to contact another extension. Activity was shown when placing calls into the SIP trunk.
Was my analysis that ASTERISK-18101 affects 1.6.2.20 correct?
Was it correct to remove the ao2_lock/ao2_unlock calls in the corresponding places in app_queue.c of 1.6.2.20, as was done for 1.8.6.0? If so, can somebody give a clue on where in the code to investigate the new problem?