[asterisk-bugs] [JIRA] (ASTERISK-28708) Possible deadlock between try_calling() and 'queue show'
Marc Ketel (JIRA)
noreply at issues.asterisk.org
Tue Jan 21 03:19:25 CST 2020
Marc Ketel created ASTERISK-28708:
-------------------------------------
Summary: Possible deadlock between try_calling() and 'queue show'
Key: ASTERISK-28708
URL: https://issues.asterisk.org/jira/browse/ASTERISK-28708
Project: Asterisk
Issue Type: Bug
Security Level: None
Components: Applications/app_queue
Affects Versions: 16.7.0
Environment: Running asterisk-16.7.0.tar.gz on 'CentOS Linux release 7.7.1908 (Core)' hyper-v VM with 8 E5-2630 cores and 8GB RAM.
Reporter: Marc Ketel
In production, Asterisk freezes up about once every 2 weeks. The Asterisk process keeps running, but console commands do not respond and no calls are being processed.
In development I tried to trigger the freeze faster. That eventually succeeded, and I can now reproduce the issue within a few minutes.
Setup:
1: 40 agents in a queue that pick up calls within a second or so and also hang up within a second or so. Call duration does not seem to influence the deadlock.
2: Another Asterisk system initiates about 2 to 5 calls per second.
Let this run for a few minutes and a deadlock occurs.
At some point you will get some (possibly unrelated) error messages:
Jan 10 07:47:25 pbx-dev-7-stage1 asterisk[206462]: WARNING[42989][C-000049e1]: channel.c:1124 in __ast_queue_frame: Exceptionally long voice queue length queuing to Local/3166 at queuebellen-002bccd8;1
or
Jan 10 07:47:57 pbx-dev-7-stage1 asterisk[206462]: WARNING[206485]: taskprocessor.c:1160 in taskprocessor_push: The 'stasis/m:devicestate:all-00000003' task processor queue reached 500 scheduled tasks.
These error messages do not show up in every reproduction on development, but in production the 'Exceptionally long voice queue length queuing' message seems to be a reliable indicator of the deadlock.
I recompiled Asterisk with: MENUSELECT_CFLAGS=LOADABLE_MODULES DONT_OPTIMIZE BETTER_BACKTRACES DEBUG_THREADS
Reproduction needs as many CPU cores as you can get. With 2 cores the issue cannot be reproduced; 8 or more seem to be needed to reproduce it within minutes.
Every reproduction left me with an empty locks.txt file, and only once did 'core show locks' on the Asterisk console output a lock overview.
I will try to attach the files from "ast_coredumper --RUNNING --no-default-search" some other way; I do not seem to have this option in the Create Issue screen.
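For readers unfamiliar with this class of failure, here is a minimal, generic sketch of a lock-order inversion, one common way for two threads to deadlock. It is not taken from the Asterisk source and is not a diagnosis of this issue: the names lock_a, lock_b, t1 and t2 are hypothetical, and Python is used only for illustration. Timeouts and barriers are used so the demo terminates instead of hanging.

```python
import threading

def demo_lock_order_inversion(timeout: float = 0.5) -> list:
    """Two threads take the same two locks in opposite order.

    Returns a sorted list of (thread_name, got_second_lock) tuples.
    """
    lock_a = threading.Lock()  # hypothetical, e.g. a container-wide lock
    lock_b = threading.Lock()  # hypothetical, e.g. a per-object lock
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = []

    def worker(name, first, second):
        with first:
            # Wait until each thread holds its first lock.
            both_hold_first.wait()
            # A real deadlock would block here forever; the timeout
            # lets the demo report the failure instead.
            got = second.acquire(timeout=timeout)
            if got:
                second.release()
            results.append((name, got))
            # Keep holding the first lock until both attempts are done,
            # so the outcome does not depend on timing.
            both_tried_second.wait()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return sorted(results)

if __name__ == "__main__":
    print(demo_lock_order_inversion())
```

In the sketch neither thread obtains its second lock. In a real program the window in which both threads hold their first lock is usually tiny, so how often it is hit depends heavily on timing and on how many threads truly run in parallel.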
The workaround I implemented is to use the taskset command to restrict Asterisk to 1 CPU core. This seems to prevent the deadlock: using only 1 CPU core, I cannot reproduce the issue.
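As a rough illustration of what the workaround does: taskset sets the CPU affinity of a process through the Linux sched_setaffinity system call. The sketch below does the same for the calling process from Python. It is Linux-only, the helper name pin_to_single_cpu is hypothetical, and it is not the exact command used on the affected systems.

```python
import os

def pin_to_single_cpu(pid: int = 0) -> set:
    """Restrict a task (0 = the caller) to one CPU it is already allowed to use.

    Returns the affinity set after the change.
    """
    allowed = os.sched_getaffinity(pid)   # CPUs the task may run on now
    cpu = min(allowed)                    # pick the lowest-numbered one
    os.sched_setaffinity(pid, {cpu})      # same system call taskset relies on
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    print("now restricted to CPUs:", pin_to_single_cpu())
```

Threads and child processes started after the call inherit the restricted affinity, which is why the restriction is best applied before the workload starts.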
--
This message was sent by Atlassian JIRA
(v6.2#6252)