[asterisk-bugs] [JIRA] (ASTERISK-28708) Possible deadlock between try_calling() and 'queue show'

Asterisk Team (JIRA) noreply at issues.asterisk.org
Tue Jan 21 03:19:25 CST 2020


    [ https://issues.asterisk.org/jira/browse/ASTERISK-28708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=249495#comment-249495 ] 

Asterisk Team commented on ASTERISK-28708:
------------------------------------------

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

> Possible deadlock between try_calling() and 'queue show'
> --------------------------------------------------------
>
>                 Key: ASTERISK-28708
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28708
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_queue
>    Affects Versions: 16.7.0
>         Environment: Running asterisk-16.7.0.tar.gz on 'CentOS Linux release 7.7.1908 (Core)' hyper-v VM with 8 E5-2630 cores and 8GB RAM.
>            Reporter: Marc Ketel
>
> In production once every 2 weeks or so Asterisk freezes up. Asterisk process is running, but console commands are not responding and no calls are beeing processed.
> In development I tried crashing Asterisk faster. That succeeded eventually and can now reproduce issue in a few minutes. 
> Setup: 
> 1: 40 agents in a queue that pickup the calls within a second or so and hangup also within a second or so. Call duration does not seem to influence the deadlock.
> 2: Another Asterisk system initiates about 2 to 5 calls per second.
> Let this run for a few minutes and a deadlock occurs.
> At some point you will get some (possibly unrelated) error messages:
> Jan 10 07:47:25 pbx-dev-7-stage1 asterisk[206462]: WARNING[42989][C-000049e1]: channel.c:1124 in __ast_queue_frame: Exceptionally long voice queue length queuing to Local/3166 at queuebellen-002bccd8;1
> or
> Jan 10 07:47:57 pbx-dev-7-stage1 asterisk[206462]: WARNING[206485]: taskprocessor.c:1160 in taskprocessor_push: The 'stasis/m:devicestate:all-00000003' task processor queue reached 500 scheduled tasks.
> These error messages do no show every time in the reproduction on development, but in production the 'Exceptionally long voice queue length queuing' error seems to be a reliable indictator of the deadlock.
> I recompiled Asterisk with: MENUSELECT_CFLAGS=LOADABLE_MODULES DONT_OPTIMIZE BETTER_BACKTRACES DEBUG_THREADS
> Reproduction must take place with as many cpu cores as you can get. With 2 cores the issue cannot be reproduced, 8 or more seem to be needed to reproduce within minutes.
> On all reproductions I ended up with an empty locks.txt file, but only once the 'core show locks' on the Asterisk console did output a lock overview.
> Let me try to somehow attach the files from "ast_coredumper --RUNNING --no-default-search", I do not seem to have this option in the Create Issue screen.
> The workaround I implemented is to use the taskset command to allow Asterisk to only 1 cpu core. This seems to prevent the deadlock. Using only 1 cpu core I cannot reproduce the issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list