[asterisk-bugs] [JIRA] (ASTERISK-22976) app_queue function queue_show() and find_queue_by_name_rt() cause deadlock

Leif Einar Aune (JIRA) noreply at issues.asterisk.org
Mon Sep 14 13:05:43 CDT 2020


    [ https://issues.asterisk.org/jira/browse/ASTERISK-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=252022#comment-252022 ] 

Leif Einar Aune commented on ASTERISK-22976:
--------------------------------------------

We are running 13.29.2 and also observe a deadlock in app_queue.c during the busy hour for incoming calls, in combination with extensive use of queue_show.

__queues_show() holds the global queues container mutex for a long time, which effectively blocks other threads from iterating over or finding queues. We therefore removed that lock and used a locking iterator instead of ao2_iterator_init(queues, AO2_ITERATOR_DONTLOCK).

With this change, the deadlock no longer occurs.

It seems safe to use the locking iterator in the __queues_show() function instead of locking the queues mutex during the complete iteration, and we are considering providing a patch with this fix via Gerrit.
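
To illustrate the difference between the two iteration styles, here is a minimal sketch using plain pthread mutexes. It is not Asterisk code: struct container, struct queue, iter_next() and the two show_* functions are hypothetical stand-ins for the queues container, a call queue, a locking iterator and __queues_show(). The sketch also leaves out the reference counting that astobj2 iterators perform. Each function returns how many times a queue lock was taken while the container lock was still held.

```c
#include <pthread.h>
#include <stddef.h>

#define NQUEUES 3

/* Hypothetical stand-ins for a call queue and the global queues container. */
struct queue { pthread_mutex_t lock; int callers; };
struct container { pthread_mutex_t lock; struct queue items[NQUEUES]; };

/* Per-walk bookkeeping, local to the calling thread. */
struct walk { int container_held; int nested; };

static void container_init(struct container *c)
{
    pthread_mutex_init(&c->lock, NULL);
    for (size_t i = 0; i < NQUEUES; i++) {
        pthread_mutex_init(&c->items[i].lock, NULL);
        c->items[i].callers = 0;
    }
}

/* Lock one queue, as the show code does before reading its state. */
static void visit(struct queue *q, struct walk *w)
{
    pthread_mutex_lock(&q->lock);
    if (w->container_held)
        w->nested++;            /* order taken: container, then queue */
    pthread_mutex_unlock(&q->lock);
}

/* Old pattern: the container lock is held for the whole walk. */
static int show_holding_container(struct container *c)
{
    struct walk w = { 0, 0 };

    pthread_mutex_lock(&c->lock);
    w.container_held = 1;
    for (size_t i = 0; i < NQUEUES; i++)
        visit(&c->items[i], &w);
    w.container_held = 0;
    pthread_mutex_unlock(&c->lock);
    return w.nested;
}

/* Locking-iterator pattern: the container lock is held only while
 * fetching the next element, never while a queue lock is taken. */
static struct queue *iter_next(struct container *c, size_t *pos)
{
    struct queue *q = NULL;

    pthread_mutex_lock(&c->lock);
    if (*pos < NQUEUES)
        q = &c->items[(*pos)++];
    pthread_mutex_unlock(&c->lock);
    return q;
}

static int show_with_locking_iterator(struct container *c)
{
    struct walk w = { 0, 0 };
    size_t pos = 0;
    struct queue *q;

    while ((q = iter_next(c, &pos)) != NULL)
        visit(q, &w);
    return w.nested;
}
```

In the second variant the walking thread never holds both locks at once, so it cannot take part in a container/queue lock-order cycle. The trade-off is that the container may change between two iterator steps.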

> app_queue function queue_show() and find_queue_by_name_rt() cause deadlock 
> ---------------------------------------------------------------------------
>
>                 Key: ASTERISK-22976
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-22976
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_queue
>    Affects Versions: 1.8.24.0, 13.18.4
>         Environment: CentOS5.8 X64 DellR410 Asterisk 1.8 trunk version
>            Reporter: Aaron An
>            Assignee: Unassigned
>            Severity: Critical
>
> I use "queue show xxxx" to monitor queue status, and I use realtime queues. Concurrency is about 100 calls. A deadlock occurs after 10-30 minutes.
> Analysis result:
> find_queue_by_name_rt() first locks the single queue with "ao2_lock(q);" and then locks the global queues container in "queues_t_unlink(queues, q, "Unused; removing from container");".
> __queues_show() first locks the global queues container with "ao2_lock(queues);" and then locks the single queue with "ao2_lock(q);".
> The two functions take the same two locks in opposite order, which causes the deadlock.
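
The lock-order inversion described in the report can be reproduced in isolation. The following is a minimal sketch with two pthread mutexes, assuming a POSIX system that provides pthread barriers. The names queues_lock, q_lock, take_in_order() and inversion_detected() are hypothetical and only mirror the roles of ao2_lock(queues) and ao2_lock(q). pthread_mutex_trylock() is used for the second lock so that the demo reports the conflict instead of hanging.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for ao2_lock(queues) and ao2_lock(q). */
static pthread_mutex_t queues_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t rendezvous;

struct step {
    pthread_mutex_t *first;   /* lock taken first by this thread */
    pthread_mutex_t *second;  /* lock this thread wants next */
    int second_rc;            /* result of trylock on the second lock */
};

static void *take_in_order(void *arg)
{
    struct step *s = arg;

    pthread_mutex_lock(s->first);
    pthread_barrier_wait(&rendezvous);   /* both threads hold their first lock */

    /* A blocking lock here would never return; trylock reports EBUSY. */
    s->second_rc = pthread_mutex_trylock(s->second);

    pthread_barrier_wait(&rendezvous);   /* wait until the peer has tried too */
    if (s->second_rc == 0)
        pthread_mutex_unlock(s->second);
    pthread_mutex_unlock(s->first);
    return NULL;
}

/* Returns 1 if each thread found its second lock held by the other. */
static int inversion_detected(void)
{
    struct step show = { &queues_lock, &q_lock, 0 };   /* order in __queues_show() */
    struct step find = { &q_lock, &queues_lock, 0 };   /* order in find_queue_by_name_rt() */
    pthread_t a, b;

    pthread_barrier_init(&rendezvous, NULL, 2);
    pthread_create(&a, NULL, take_in_order, &show);
    pthread_create(&b, NULL, take_in_order, &find);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&rendezvous);

    return show.second_rc == EBUSY && find.second_rc == EBUSY;
}
```

With blocking locks in place of trylock, both threads would wait on each other forever. Making both code paths take the locks in the same order, or never holding both at once, removes the cycle.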





