[asterisk-bugs] [JIRA] (ASTERISK-28708) Possible deadlock between try_calling() and 'queue show'
Marc Ketel (JIRA)
noreply at issues.asterisk.org
Tue Jan 21 03:26:25 CST 2020
[ https://issues.asterisk.org/jira/browse/ASTERISK-28708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=249497#comment-249497 ]
Marc Ketel commented on ASTERISK-28708:
---------------------------------------
Oh wow, I forgot a major step in the reproduction:
3: Execute 'queue show' about 10,000 times a second. I used 10 scripts that execute 'queue show' in parallel.
In production 'queue show' is executed only once every 2 seconds, which explains why the freeze takes about 2 weeks there instead of minutes.
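For anyone repeating step 3, the parallel load could be generated roughly as follows. This is only a sketch and not the scripts used for the reproduction: the helper name hammer_cli and its default worker and iteration counts are made up for illustration, and it assumes the asterisk binary is on the PATH and that the current user may run remote console commands with 'asterisk -rx'.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


# Hypothetical helper, not the reporter's actual scripts.
def hammer_cli(command, workers=10, iterations=1000):
    """Run `command` `iterations` times in each of `workers` parallel loops.

    Returns the total number of invocations that exited with status 0.
    """
    def loop(worker_id):
        ok = 0
        for _ in range(iterations):
            # Output is discarded; only the exit status is counted.
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if result.returncode == 0:
                ok += 1
        return ok

    # One thread per loop; each thread blocks on its own child process,
    # so the invocations really do run side by side.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(loop, range(workers)))
```

Against a running test system this would be called as hammer_cli(["asterisk", "-rx", "queue show"]). As a rough plausibility check of the timing, and assuming the chance of hitting the deadlock scales with the number of 'queue show' executions: 10,000 per second is 20,000 times the production rate of one every 2 seconds, and 2 weeks (1,209,600 seconds) divided by 20,000 is about 60 seconds.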
> Possible deadlock between try_calling() and 'queue show'
> --------------------------------------------------------
>
> Key: ASTERISK-28708
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-28708
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Applications/app_queue
> Affects Versions: 16.7.0
> Environment: Running asterisk-16.7.0.tar.gz on 'CentOS Linux release 7.7.1908 (Core)' hyper-v VM with 8 E5-2630 cores and 8GB RAM.
> Reporter: Marc Ketel
> Attachments: pbx-dev7-stage1 crash 2020-01-11 #3.zip
>
>
> In production, Asterisk freezes about once every 2 weeks. The Asterisk process is still running, but console commands do not respond and no calls are being processed.
> In development I tried to make Asterisk freeze faster. That eventually succeeded, and I can now reproduce the issue in a few minutes.
> Setup:
> 1: 40 agents in a queue that pick up the calls within a second or so and also hang up within a second or so. Call duration does not seem to influence the deadlock.
> 2: Another Asterisk system initiates about 2 to 5 calls per second.
> Let this run for a few minutes and a deadlock occurs.
> At some point you will get some (possibly unrelated) error messages:
> Jan 10 07:47:25 pbx-dev-7-stage1 asterisk[206462]: WARNING[42989][C-000049e1]: channel.c:1124 in __ast_queue_frame: Exceptionally long voice queue length queuing to Local/3166@queuebellen-002bccd8;1
> or
> Jan 10 07:47:57 pbx-dev-7-stage1 asterisk[206462]: WARNING[206485]: taskprocessor.c:1160 in taskprocessor_push: The 'stasis/m:devicestate:all-00000003' task processor queue reached 500 scheduled tasks.
> These error messages do not show up in every reproduction on development, but in production the 'Exceptionally long voice queue length queuing' warning seems to be a reliable indicator of the deadlock.
> I recompiled Asterisk with: MENUSELECT_CFLAGS=LOADABLE_MODULES DONT_OPTIMIZE BETTER_BACKTRACES DEBUG_THREADS
> The reproduction needs as many CPU cores as you can get. With 2 cores the issue cannot be reproduced; 8 or more seem to be needed to reproduce it within minutes.
> In all reproductions I ended up with an empty locks.txt file; only once did 'core show locks' on the Asterisk console output a lock overview.
> The workaround I implemented is to use the taskset command to restrict Asterisk to a single CPU core. This seems to prevent the deadlock: using only 1 CPU core, I cannot reproduce the issue.
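The single-core workaround can also be expressed in code. The sketch below is an illustration only, assuming Linux (os.sched_getaffinity and os.sched_setaffinity are not available elsewhere); the helper name run_on_single_cpu is made up, and the report itself used the taskset command rather than anything like this.

```python
import os
import subprocess


# Hypothetical helper; the report itself used the taskset command.
def run_on_single_cpu(command, cpu=None, **kwargs):
    """Start `command` restricted to one CPU core and wait for it to exit.

    If `cpu` is None, the lowest-numbered core this process may use is chosen.
    Extra keyword arguments are passed through to subprocess.run().
    """
    if cpu is None:
        cpu = min(os.sched_getaffinity(0))

    def pin():
        # Runs in the child after fork() and before exec(). The affinity
        # mask survives exec and is inherited by threads created later.
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(command, preexec_fn=pin, **kwargs)
```

Starting Asterisk in the foreground with run_on_single_cpu(["asterisk", "-f"]) should be roughly comparable to launching it under taskset with a one-core CPU list. Setting the mask before the program starts matters: changing the affinity of an already running multi-threaded process by its PID only affects that one thread.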