[asterisk-bugs] [JIRA] (ASTERISK-28708) app_queue: Deadlock with "queue show" and "shared_lastcall" option

Joshua C. Colp (JIRA) noreply at issues.asterisk.org
Tue Jan 21 04:58:25 CST 2020


    [ https://issues.asterisk.org/jira/browse/ASTERISK-28708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=249501#comment-249501 ] 

Joshua C. Colp commented on ASTERISK-28708:
-------------------------------------------

I believe this is caused by the shared lastcall functionality. The device state callback needs the queues container so it can iterate over and update other queues, while the "queue show" CLI command holds the queues container lock the entire time it is iterating over and locking queues, causing a deadlock.

I think for this to be a problem you would need to have the "shared_lastcall" option set to "yes", which is not the default, and also run "queue show" at the same time as an extension state update.
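
The lock interaction described above can be modeled as a small wait-for graph. What follows is a minimal sketch in plain C, not Asterisk code: the lock and thread names are hypothetical, and it assumes the device state callback is already holding a queue lock when it asks for the queues container, which the comment implies but does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; these are not Asterisk identifiers. */
enum { LOCK_QUEUES_CONTAINER, LOCK_ONE_QUEUE, NUM_LOCKS };
enum { THREAD_QUEUE_SHOW, THREAD_DEVSTATE_CB, NUM_THREADS };
enum { NONE = -1 };

struct lock_state {
    int holder[NUM_LOCKS];        /* thread holding each lock, or NONE */
    int waiting_for[NUM_THREADS]; /* lock each thread is blocked on, or NONE */
};

/*
 * Follow the chain: thread -> lock it wants -> thread holding that lock.
 * The starting thread is deadlocked if the chain leads back to itself.
 */
bool is_deadlocked(const struct lock_state *s, int start)
{
    int t = start;

    assert(start >= 0 && start < NUM_THREADS);
    for (int steps = 0; steps < NUM_THREADS; steps++) {
        int wanted = s->waiting_for[t];
        if (wanted == NONE) {
            return false; /* this thread can still make progress */
        }
        t = s->holder[wanted];
        if (t == NONE) {
            return false; /* the wanted lock is free */
        }
        if (t == start) {
            return true;  /* cycle: everyone in the chain waits forever */
        }
    }
    return false;
}

/*
 * "queue show" holds the container and wants a queue, while the callback
 * (assumed to hold that queue) wants the container.
 */
struct lock_state inverted_order(void)
{
    struct lock_state s;

    s.holder[LOCK_QUEUES_CONTAINER] = THREAD_QUEUE_SHOW;
    s.holder[LOCK_ONE_QUEUE] = THREAD_DEVSTATE_CB;
    s.waiting_for[THREAD_QUEUE_SHOW] = LOCK_ONE_QUEUE;
    s.waiting_for[THREAD_DEVSTATE_CB] = LOCK_QUEUES_CONTAINER;
    return s;
}

/*
 * Same contention, but the callback holds no queue lock while it waits
 * for the container, so "queue show" can finish and release it.
 */
struct lock_state consistent_order(void)
{
    struct lock_state s;

    s.holder[LOCK_QUEUES_CONTAINER] = THREAD_QUEUE_SHOW;
    s.holder[LOCK_ONE_QUEUE] = THREAD_QUEUE_SHOW;
    s.waiting_for[THREAD_QUEUE_SHOW] = NONE;
    s.waiting_for[THREAD_DEVSTATE_CB] = LOCK_QUEUES_CONTAINER;
    return s;
}
```

Passing the state from inverted_order() to is_deadlocked() reports a cycle for either thread. The state from consistent_order() does not, because the callback then waits on a thread that can still make progress.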

> app_queue: Deadlock with "queue show" and "shared_lastcall" option
> ------------------------------------------------------------------
>
>                 Key: ASTERISK-28708
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28708
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_queue
>    Affects Versions: 16.7.0
>         Environment: Running asterisk-16.7.0.tar.gz on 'CentOS Linux release 7.7.1908 (Core)' hyper-v VM with 8 E5-2630 cores and 8GB RAM.
>            Reporter: Marc Ketel
>            Severity: Minor
>         Attachments: pbx-dev7-stage1 crash 2020-01-11 #3.zip
>
>
> In production, Asterisk freezes up once every 2 weeks or so. The Asterisk process is running, but console commands do not respond and no calls are being processed.
> In development I tried to make Asterisk freeze faster. That eventually succeeded, and I can now reproduce the issue in a few minutes.
> Setup: 
> 1: 40 agents in a queue that pick up the calls within a second or so and also hang up within a second or so. Call duration does not seem to influence the deadlock.
> 2: Another Asterisk system initiates about 2 to 5 calls per second.
> Let this run for a few minutes and a deadlock occurs.
> At some point you will get some (possibly unrelated) error messages:
> Jan 10 07:47:25 pbx-dev-7-stage1 asterisk[206462]: WARNING[42989][C-000049e1]: channel.c:1124 in __ast_queue_frame: Exceptionally long voice queue length queuing to Local/3166@queuebellen-002bccd8;1
> or
> Jan 10 07:47:57 pbx-dev-7-stage1 asterisk[206462]: WARNING[206485]: taskprocessor.c:1160 in taskprocessor_push: The 'stasis/m:devicestate:all-00000003' task processor queue reached 500 scheduled tasks.
> These error messages do not show up every time in the reproduction on development, but in production the 'Exceptionally long voice queue length queuing' error seems to be a reliable indicator of the deadlock.
> I recompiled Asterisk with: MENUSELECT_CFLAGS=LOADABLE_MODULES DONT_OPTIMIZE BETTER_BACKTRACES DEBUG_THREADS
> Reproduction must take place with as many CPU cores as you can get. With 2 cores the issue cannot be reproduced; 8 or more seem to be needed to reproduce it within minutes.
> In all reproductions I ended up with an empty locks.txt file, and only once did 'core show locks' on the Asterisk console output a lock overview.
> The workaround I implemented is to use the taskset command to restrict Asterisk to only 1 CPU core. This seems to prevent the deadlock. Using only 1 CPU core I cannot reproduce the issue.





