[asterisk-bugs] [JIRA] (ASTERISK-28831) Leaking stasis subscriptions can linger indefinitely and brick Asterisk

Tue Apr 14 13:45:26 CDT 2020

     [ https://issues.asterisk.org/jira/browse/ASTERISK-28831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Joseph updated ASTERISK-28831:
-------------------------------------

    Status: Open  (was: Triage)

> Leaking stasis subscriptions can linger indefinitely and brick Asterisk
> -----------------------------------------------------------------------
>
>                 Key: ASTERISK-28831
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28831
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/Stasis
>    Affects Versions: 16.9.0
>            Reporter: lvl
>
> Split off from ASTERISK-28829. I thought this deserves a separate ticket for clarity.
> A scenario such as the one in ASTERISK-28829 causes subscriptions to {{ast_channel_topic_all()}} to linger around indefinitely. These subscriptions come with a dedicated taskprocessor per subscriber and will receive *all* events from *all* active channels.
> After running an affected Asterisk for a while (how long depends on how frequently your scenario occurs), you'll end up with hundreds of these lingering subscriptions. Running "core show taskprocessors" will show hundreds of "stasis/p:channel:all" taskprocessors, each with millions of processed events.
> At some point, Asterisk will be so busy delivering events to all these lingering subscribers that CPU usage will increase and regular call processing will start to fail.
> Currently, it's pretty hard to discover what's going on. If you have chan_pjsip configured to reject calls when its taskprocessor is overloaded, you would see "Taskprocessor overload alert", but only at the debug level. The generic "Taskprocessor '%s' triggered the high water alert." message will also trigger, but again only at the debug level.
> Unless you know exactly where to look, your Asterisk will shortly become completely unresponsive to everything that depends on stasis/task processors (pretty much everything), without any warnings at all.
> I propose that, at the very least, we add more noticeable warning messages. For example:
> * When a task processor has processed more than X (millions of) items
> * When there are more than X (hundreds of) task processors
> * When the high water alert is reached (for a sustained period)
> Ideally, we would also prevent this scenario, because even if the root cause of ASTERISK-28829 is found and fixed, there might be more scenarios like it. For example:
> * Have a stasis subscription automatically detect that no one is really listening anymore
> .. but I am unsure how hard this would be.
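
The three warning conditions proposed in the description could be sketched roughly as follows. This is only an illustration, not Asterisk code: the struct tps_snapshot and struct tps_limits types, the check_taskprocessors() function, and all thresholds are hypothetical names made up for this example, and the snapshot data is assumed to be gathered elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical snapshot of one task processor; NOT an Asterisk structure. */
struct tps_snapshot {
    const char *name;             /* e.g. "stasis/p:channel:all" */
    unsigned long processed;      /* total items processed so far */
    bool high_water;              /* currently above its high water mark */
    unsigned int high_water_secs; /* seconds spent in the alert state */
};

/* Hypothetical thresholds, the "X" values from the proposal. */
struct tps_limits {
    unsigned long max_processed;      /* per-processor item count */
    size_t max_processors;            /* total number of processors */
    unsigned int max_high_water_secs; /* sustained alert duration */
};

/*
 * Check every snapshot against the limits and write one warning line
 * per violated condition to 'out'. Returns the number of warnings.
 */
static int check_taskprocessors(const struct tps_snapshot *tps, size_t count,
                                const struct tps_limits *lim, FILE *out)
{
    int warnings = 0;

    assert(lim != NULL && out != NULL);
    assert(tps != NULL || count == 0);

    /* Condition: more than X task processors exist. */
    if (count > lim->max_processors) {
        fprintf(out, "WARNING: %zu task processors exist (limit %zu)\n",
                count, lim->max_processors);
        warnings++;
    }

    for (size_t i = 0; i < count; i++) {
        /* Condition: a processor has handled more than X items. */
        if (tps[i].processed > lim->max_processed) {
            fprintf(out, "WARNING: '%s' has processed %lu items (limit %lu)\n",
                    tps[i].name, tps[i].processed, lim->max_processed);
            warnings++;
        }
        /* Condition: the high water alert has been active for a sustained period. */
        if (tps[i].high_water
            && tps[i].high_water_secs >= lim->max_high_water_secs) {
            fprintf(out, "WARNING: '%s' in high water alert for %u seconds\n",
                    tps[i].name, tps[i].high_water_secs);
            warnings++;
        }
    }

    return warnings;
}
```

A periodic task could call such a check with a fresh snapshot and log the results at WARNING level rather than debug level, so an operator sees the problem before call processing starts to fail.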


--
This message was sent by Atlassian JIRA
(v6.2#6252)