[asterisk-bugs] [JIRA] (ASTERISK-28708) app_queue: Deadlock with "queue show" and "shared_lastcall" option

Joshua C. Colp (JIRA) noreply at issues.asterisk.org
Tue Jan 21 04:58:25 CST 2020


     [ https://issues.asterisk.org/jira/browse/ASTERISK-28708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua C. Colp updated ASTERISK-28708:
--------------------------------------

    Summary: app_queue: Deadlock with "queue show" and "shared_lastcall" option  (was: app_queue: Possible deadlock between try_calling() and 'queue show')

> app_queue: Deadlock with "queue show" and "shared_lastcall" option
> ------------------------------------------------------------------
>
>                 Key: ASTERISK-28708
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28708
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_queue
>    Affects Versions: 16.7.0
>         Environment: Running asterisk-16.7.0.tar.gz on 'CentOS Linux release 7.7.1908 (Core)' hyper-v VM with 8 E5-2630 cores and 8GB RAM.
>            Reporter: Marc Ketel
>            Severity: Minor
>         Attachments: pbx-dev7-stage1 crash 2020-01-11 #3.zip
>
>
> In production, Asterisk freezes up about once every 2 weeks. The Asterisk process keeps running, but console commands do not respond and no calls are being processed.
> In development I tried to trigger the freeze faster. That eventually succeeded, and I can now reproduce the issue within a few minutes.
> Setup: 
> 1: 40 agents in a queue that pick up the calls within a second or so and also hang up within a second or so. Call duration does not seem to influence the deadlock.
> 2: Another Asterisk system initiates about 2 to 5 calls per second.
> Let this run for a few minutes and a deadlock occurs.
> At some point you will get some (possibly unrelated) error messages:
> Jan 10 07:47:25 pbx-dev-7-stage1 asterisk[206462]: WARNING[42989][C-000049e1]: channel.c:1124 in __ast_queue_frame: Exceptionally long voice queue length queuing to Local/3166 at queuebellen-002bccd8;1
> or
> Jan 10 07:47:57 pbx-dev-7-stage1 asterisk[206462]: WARNING[206485]: taskprocessor.c:1160 in taskprocessor_push: The 'stasis/m:devicestate:all-00000003' task processor queue reached 500 scheduled tasks.
> These error messages do not show up every time in the reproduction on development, but in production the 'Exceptionally long voice queue length queuing' error seems to be a reliable indicator of the deadlock.
> I recompiled Asterisk with: MENUSELECT_CFLAGS=LOADABLE_MODULES DONT_OPTIMIZE BETTER_BACKTRACES DEBUG_THREADS
> Reproduction must take place with as many CPU cores as you can get. With 2 cores the issue cannot be reproduced; 8 or more seem to be needed to reproduce it within minutes.
> In every reproduction I ended up with an empty locks.txt file, and only once did 'core show locks' on the Asterisk console output a lock overview.
> The workaround I implemented is to use the taskset command to restrict Asterisk to a single CPU core. This seems to prevent the deadlock: using only 1 CPU core, I cannot reproduce the issue.
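
The single-core workaround from the report could look roughly like the sketch below. It assumes the util-linux taskset utility is installed. The helper name pin_to_core is hypothetical, and the report does not show the exact command line that was used, so treat the invocation as one plausible form rather than the reporter's actual setup.

```shell
#!/bin/sh
# Sketch only: restrict an already-running process to a single CPU core.
# pin_to_core is a hypothetical helper name, not part of Asterisk.
pin_to_core() {
    pid="$1"
    core="${2:-0}"   # default to core 0 when no core is given
    # -a: apply to all threads of the process, not just the main one
    # -c: interpret the mask as a CPU list (e.g. "0")
    # -p: operate on an existing PID instead of launching a command
    taskset -a -c -p "$core" "$pid" >/dev/null
}

# Example (assumes exactly one running process named "asterisk"):
#   pin_to_core "$(pidof asterisk)" 0
```

The -a flag matters for a multi-threaded daemon: without it, taskset -p changes the affinity of the given thread only. Pinning trades throughput for stability, so it is a stopgap until the underlying deadlock is fixed.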



--
This message was sent by Atlassian JIRA
(v6.2#6252)


