[asterisk-bugs] [JIRA] (ASTERISK-26755) Random queues disappear on "core reload queue all"
Kirill Katsnelson (JIRA)
noreply at issues.asterisk.org
Wed Jan 25 21:52:10 CST 2017
Kirill Katsnelson created ASTERISK-26755:
--------------------------------------------
Summary: Random queues disappear on "core reload queue all"
Key: ASTERISK-26755
URL: https://issues.asterisk.org/jira/browse/ASTERISK-26755
Project: Asterisk
Issue Type: Bug
Security Level: None
Components: Applications/app_queue
Affects Versions: 13.13.1
Environment: $ uname -a
Linux qa1-asterisk1 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Kirill Katsnelson
We have 500+ queues, and the "core reload queue all" command is sent every 2 minutes. Sometimes a queue disappears on reload: it is defined in queues.conf, but is missing from Asterisk until the next reload.
----
The issue is very easy to reproduce in a matter of seconds. First, create 1000 queues:
{code}
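#!/bin/bash
# Generate queues.conf with 1000 queue sections, [Q000] through [Q999].
ASTROOT=~/asterisk/myroot
(
cat << EOF
[general]
persistentmembers = no
autofill = yes
updatecdr = no
EOF
# One section header per queue.
seq -f "[Q%03.0f]" 0 999
# These options follow the last header, so they land in [Q999] only.
cat << EOF
timeout = 1
retry = 1
autopause = no
ringinuse = no
setqueuevar = yes
strategy = random
announce-frequency = 0
EOF
) > "${ASTROOT}/etc/asterisk/queues.conf"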
{code}
Then make two torturously tight loops; the first, in extensions.ael, keeps trying to enter the queue:
{code}
context from-sip {
796 => {
Queue(Q999,,,,0.01);
jump ${EXTEN};
}
}
{code}
and the second, a shell script, keeps reloading the queue configuration:
{code}
#!/bin/bash
ASTROOT=~/asterisk/myroot
while :; do
# Reload queues
touch ${ASTROOT}/etc/asterisk/queues.conf
${ASTROOT}/sbin/asterisk -rx "queue reload parameters"
done
{code}
Call extension 796 to start the first loop, run the second, and Queue() will report many failures complaining that the queue Q999 does not exist.
-----
This is a race condition in app_queue.c. On reload, all queues are first marked dead, and each one is resurrected as soon as it is loaded from the config. Meanwhile, whenever the Queue() app returns, it checks the queue's dead flag so that a deleted queue can lame-duck out of service: the queue is unlinked once it has no calls, which is our case. Both pieces of code hold locks... but these are different locks!
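To make the interleaving concrete, here is a minimal Python sketch of the race as described above. It is a toy model, not the code in app_queue.c: the class names, the two lock names, and the assumption that a resurrected queue is not re-linked into the container are all hypothetical, chosen only to reproduce the symptom (queue missing until the next reload). The reload is written as a generator so that one specific interleaving can be replayed deterministically instead of depending on thread timing.

```python
import threading


class ToyQueue:
    def __init__(self, name):
        self.name = name
        self.dead = False   # "marked for deletion" flag
        self.calls = 0      # callers currently waiting in the queue


class ToyQueueTable:
    """Toy stand-in for the queue container (not Asterisk's real structures)."""

    def __init__(self, names):
        self.linked = {n: ToyQueue(n) for n in names}
        self.reload_lock = threading.Lock()     # hypothetical: taken by reload
        self.container_lock = threading.Lock()  # hypothetical: taken on Queue() exit

    def reload(self, configured):
        """Two-phase reload, written as a generator so the caller can pause it
        between the phases and replay one specific interleaving."""
        with self.reload_lock:
            known = dict(self.linked)       # references to the existing queues
            for q in known.values():
                q.dead = True               # phase 1: mark every queue dead
            yield                           # the window in which the race happens
            for name in configured:         # phase 2: resurrect or create
                if name in known:
                    known[name].dead = False    # resurrected, but NOT re-linked (assumption)
                else:
                    self.linked[name] = ToyQueue(name)
            # Anything still dead was removed from the config; drop it.
            for name, q in list(self.linked.items()):
                if q.dead and q.calls == 0:
                    del self.linked[name]

    def on_queue_exit(self, name):
        """What happens when a caller leaves: unlink a dead, empty queue."""
        with self.container_lock:           # a different lock than reload() holds
            q = self.linked.get(name)
            if q is not None and q.dead and q.calls == 0:
                del self.linked[name]

    def exists(self, name):
        return name in self.linked


def replay_race():
    table = ToyQueueTable(["Q998", "Q999"])
    config = ["Q998", "Q999"]

    step = table.reload(config)
    next(step)                      # reload has marked everything dead
    table.on_queue_exit("Q999")     # a caller leaves Q999 inside the window
    for _ in step:                  # let the reload finish
        pass
    missing_after_first = not table.exists("Q999")

    for _ in table.reload(config):  # an undisturbed second reload
        pass
    back_after_second = table.exists("Q999")
    return missing_after_first, back_after_second


if __name__ == "__main__":
    print(replay_race())            # prints (True, True)
```

In this model the window exists only because on_queue_exit() takes a different lock than reload(), so nothing stops it from running between the two phases. Whether the real code paths interleave in exactly this way is not something the sketch demonstrates.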
-----
I am sending a patch against the 13 branch that fixed the problem for us (under the above artificial test conditions). It is in QA now, not yet under a production load. I'll post updates on its progress.