[asterisk-bugs] [JIRA] (ASTERISK-26755) Random queues disappear on "core reload queue all"

Kirill Katsnelson (JIRA) noreply at issues.asterisk.org
Wed Jan 25 21:52:10 CST 2017


Kirill Katsnelson created ASTERISK-26755:
--------------------------------------------

             Summary: Random queues disappear on "core reload queue all"
                 Key: ASTERISK-26755
                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26755
             Project: Asterisk
          Issue Type: Bug
      Security Level: None
          Components: Applications/app_queue
    Affects Versions: 13.13.1
         Environment: $ uname -a
Linux qa1-asterisk1 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

            Reporter: Kirill Katsnelson


We have 500+ queues, and the "core reload queue all" command is sent every 2 minutes. Sometimes a queue disappears on reload: it is defined in queues.conf, but it is missing from the running system until the next reload.
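
One way to spot the symptom is to compare the queue names defined in queues.conf against the names the running server reports. The following is only a sketch: `configured_queues` and `missing_queues` are hypothetical helper names, not part of Asterisk, and the list of loaded names is assumed to be available in a file with one name per line.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of Asterisk or of this report.

# Print the queue names defined in a queues.conf-style file, skipping [general].
configured_queues() {
  sed -n 's/^\[\([^]]*\)\].*/\1/p' "$1" | grep -vx 'general'
}

# Print configured queues that are absent from a list of loaded queue names.
# $1 = path to queues.conf
# $2 = file with one loaded queue name per line (assumed to exist already)
missing_queues() {
  comm -23 <(configured_queues "$1" | sort) <(sort "$2")
}
```

The loaded names would have to come from the running server, for example by parsing the output of `asterisk -rx "queue show"`; that parsing depends on the output format and is left out here.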

----

The issue is very easy to reproduce in a matter of seconds. First, create 1000 queues:

{code}
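#!/bin/bash
# Generate a queues.conf with 1000 queue sections, Q000 through Q999.
ASTROOT=~/asterisk/myroot
(
  cat << EOF
[general]
persistentmembers = no
autofill = yes
updatecdr = no
EOF
  # Emit the section headers [Q000] ... [Q999].
  seq -f "[Q%03.0f]" 0 999
  # These settings follow the last header, so they apply to Q999 only.
  cat << EOF
timeout = 1
retry = 1
autopause = no
ringinuse = no
setqueuevar = yes
strategy = random
announce-frequency = 0
EOF
) > "${ASTROOT}/etc/asterisk/queues.conf"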
{code}

Then make two torturously tight loops; the first in extensions.ael trying to enter the queue:

{code}
context from-sip {
  796 => {
    Queue(Q999,,,,0.01);
    jump ${EXTEN};
  }
}
{code}

and the second, a shell script, reloading the queue configuration:

{code}
#!/bin/bash
ASTROOT=~/asterisk/myroot
while :; do
  # Bump the file's mtime so it is treated as changed, then reload the queues
  touch "${ASTROOT}/etc/asterisk/queues.conf"
  "${ASTROOT}/sbin/asterisk" -rx "queue reload parameters"
done
{code}

Call extension 796 to start the first loop, run the second script, and Queue() will report a lot of failures complaining that the queue Q999 does not exist.

-----

This is a race condition in app_queue.c. On reload, all queues are first marked dead, and each one is resurrected as soon as it is loaded from the config. Meanwhile, whenever the Queue() app returns, it checks the queue's dead flag so that a deleted queue can lame-duck out of service: a dead queue is unlinked once it has no calls, which is our case. Both pieces hold locks... but these are different locks!
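
The interleaving described above can be replayed deterministically with a toy model. This is a simplified sketch and not Asterisk's code: the variables `dead`, `linked` and `calls` and the functions `queue_app_exit` and `replay` are hypothetical names, and the two reload phases are reduced to plain assignments. It only shows that if the Queue() exit check runs between "mark dead" and "resurrect", the queue is unlinked even though it is still in queues.conf.

```shell
#!/bin/bash
# Simplified, hypothetical model of the race; not Asterisk's actual code.

# Queue() exit path: unlink a dead queue that has no callers.
# Relies on bash dynamic scoping to see the caller's local variables.
queue_app_exit() {
  if (( dead && calls == 0 )); then
    linked=0
  fi
}

# Replay one interleaving and print whether the queue survived the reload.
# $1 = "during": Queue() returns between the two reload phases
# $1 = "after":  Queue() returns once the reload has finished
replay() {
  local dead=0 linked=1 calls=0

  dead=1                                  # reload phase 1: mark the queue dead
  [ "$1" = during ] && queue_app_exit     # unsynchronized check sneaks in here
  dead=0                                  # reload phase 2: resurrect from config
  [ "$1" = after ] && queue_app_exit      # check after reload is harmless

  if (( linked )); then
    echo "queue kept"
  else
    echo "queue lost"
  fi
}

replay after    # prints "queue kept"
replay during   # prints "queue lost"
```

If both the reload and the exit check took the same lock, the "during" interleaving could not occur, because the check could run only before the mark or after the resurrection.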

-----

I am sending a patch against the 13 branch that fixed the problem for us (under the above artificial test conditions). It is in QA now, not yet under production load. I'll post updates on the progress.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


