[asterisk-bugs] [JIRA] (ASTERISK-28125) app_queue: Revert broken queue channel reference patch

Friendly Automation (JIRA) noreply at issues.asterisk.org
Mon Dec 3 15:09:47 CST 2018


     [ https://issues.asterisk.org/jira/browse/ASTERISK-28125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Friendly Automation closed ASTERISK-28125.
------------------------------------------

    Resolution: Fixed

> app_queue: Revert broken queue channel reference patch
> ------------------------------------------------------
>
>                 Key: ASTERISK-28125
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28125
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_queue
>    Affects Versions: 16.0.0
>            Reporter: lvl
>            Assignee: lvl
>
> Commit 6409e7b11a2310196a9978b30a6b79e2760be592 introduces references from a queue to the caller & member channels, in an attempt to fix ASTERISK-25185. The patch is incomplete however and introduces regressions, so I propose reverting it until the root cause can be properly fixed. In the mean time, safely catch the fallout from this problem and report an error instead of crashing.
> This all started with ASTERISK-25185. It was reported that app_queue's handle_transfer methods are occasionally unable to retrieve a channel snapshot from stasis. This would segfault Asterisk because of referencing a NULL pointer. In an attempt to fix it, each queue call got a reference to the member_channel and caller_channel, which are supposed to get cleaned up after the queue call is done.
> However, that patch doesn't work properly. I'm not sure why, but despite the added channel references it's possible for `ast_channel_snapshot_get_latest(queue_data->caller_uniqueid)` or `ast_channel_snapshot_get_latest(queue_data->member_uniqueid)` to return NULL. I've experienced this myself on multiple occasions, and there are various other reports at ASTERISK-27006.
> Secondly, the patch introduces a regression where a queue call causes ghost channels that don't disappear until Asterisk is restarted. Again I'm not sure about the root cause, but apparently queue_stasis_data_alloc is not always executed for some scenarios. I suspect this is caused by the Local channel driver not always properly passing data through in case of transfers/bridges/masquerades/surrogates/etc. Other symptoms besides ghost channels are missing hangup events on local channels. This was already reported here: ASTERISK-25844
> I wrote a patch that reverts 6409e7b11a2310196a9978b30a6b79e2760be592 and adds proper NULL-checking for all queue event code, so a warning is logged when the scenario occurs. Not perfect, but certainly better than the segfaults that have been present for the past two years. I recommend merging this until the root cause for these stasis timing issues has been fixed. This patch passes all of my tests, I'll submit it to Gerrit so it's checked against the Asterisk testsuite as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list