[asterisk-bugs] [JIRA] (ASTERISK-29882) Occasional segfaults in production

William ML Leslie (JIRA) noreply at issues.asterisk.org
Sun Oct 23 20:41:09 CDT 2022


    [ https://issues.asterisk.org/jira/browse/ASTERISK-29882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=260485#comment-260485 ] 

William ML Leslie commented on ASTERISK-29882:
----------------------------------------------

Greetings. We have been hitting this issue too, and I've found what's going on. Here's the relevant portion of a backtrace.

{code}
#7  0x00007f096c4439a5 in publish_chanspy_message (snoop=0x7f09540a8e58, start=0) at res_stasis_snoop.c:138
#8  0x00007f096c443f76 in snoop_hangup (chan=0x7f0954009900) at res_stasis_snoop.c:228
#9  0x0000564b94998096 in ast_hangup (chan=0x7f0954009900) at channel.c:2612
#10 0x00007f096c4447bf in stasis_app_control_snoop (chan=0x7f0968294b50, spy=STASIS_SNOOP_DIRECTION_IN, whisper=STASIS_SNOOP_DIRECTION_NONE, app=0x7f0954066340 "our-application", app_args=0x0, 
    snoop_id=0x7f095404f3d8 "1666316293.2651-in-snoop") at res_stasis_snoop.c:380
#11 0x00007f096c3709d8 in ari_channels_handle_snoop_channel (args_channel_id=0x7f0954026c8a "1666316293.2651", args_spy=0x7f095407ebf0 "in", args_whisper=0x7f09540accb0 "none", 
    args_app=0x7f0954066340 "our-application", args_app_args=0x0, args_snoop_id=0x7f095404f3d8 "1666316293.2651-in-snoop", response=0x7f092b7dab20) at ari/resource_channels.c:1638
{code}

The key point is that if stasis_app_control_snoop fails partway through, it hangs up the freshly created snoop channel. By then, snoop_tech has already been set as that channel's tech, on line 351 of res_stasis_snoop.c:
{code}
ast_channel_tech_set(snoop->chan, &snoop_tech);
{code}

As part of hangup, the channel tech's hangup callback is invoked, so hanging up this channel calls snoop_hangup on a snoop that has not been fully initialised; notably, its spyee_chan has not been set yet. snoop_hangup then calls publish_chanspy_message (frames #8 and #7 above), which is where the missing spyee_chan leads to the crash.
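To make the ordering problem concrete, here is a small self-contained C model of the failure path. It is not Asterisk code: model_chan, model_tech, model_snoop, model_hangup and model_control_snoop are hypothetical stand-ins for ast_channel, snoop_tech, struct stasis_app_snoop, ast_hangup and stasis_app_control_snoop, reduced to the one property that matters here: the tech's hangup callback runs on whatever state the snoop is in at the moment of hangup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT Asterisk's real types. */
struct model_chan;
struct model_snoop;

struct model_tech {
	/* Invoked by model_hangup(), like a channel tech's hangup callback. */
	int (*hangup)(struct model_chan *chan);
};

struct model_chan {
	const struct model_tech *tech;
	struct model_snoop *tech_pvt; /* back-pointer to the owning snoop */
};

struct model_snoop {
	struct model_chan *chan;       /* the snoop's own channel */
	struct model_chan *spyee_chan; /* only set once setup gets far enough */
};

/* Records whether the most recent hangup callback saw a NULL spyee_chan. */
static int hangup_saw_null_spyee;

static int model_snoop_hangup(struct model_chan *chan)
{
	struct model_snoop *snoop = chan->tech_pvt;

	/* The real publish path would use spyee_chan here; the model only
	 * records whether that pointer was still NULL. */
	hangup_saw_null_spyee = (snoop->spyee_chan == NULL);
	return 0;
}

static const struct model_tech model_snoop_tech = {
	.hangup = model_snoop_hangup,
};

/* Like ast_hangup() in the backtrace: invokes the tech's hangup callback. */
static void model_hangup(struct model_chan *chan)
{
	assert(chan != NULL);
	if (chan->tech && chan->tech->hangup) {
		chan->tech->hangup(chan);
	}
}

/* Models the setup order: the tech is installed before spyee_chan is set,
 * so a failure in between hangs up a half-initialised snoop.
 * Returns 0 on success, -1 on the simulated failure. */
static int model_control_snoop(struct model_chan *spyee, int fail_midway)
{
	struct model_snoop snoop = { NULL, NULL };
	struct model_chan chan = { NULL, NULL };

	snoop.chan = &chan;
	chan.tech = &model_snoop_tech; /* cf. ast_channel_tech_set(snoop->chan, &snoop_tech) */
	chan.tech_pvt = &snoop;

	if (fail_midway) {
		model_hangup(&chan); /* error path: spyee_chan is still NULL */
		return -1;
	}

	snoop.spyee_chan = spyee;
	/* Later teardown, done immediately here only to keep the model short. */
	model_hangup(&chan);
	return 0;
}
```

Calling model_control_snoop(&spyee, 1) leaves hangup_saw_null_spyee set to 1, which corresponds to the crashing case; the success path leaves it at 0.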

There are a couple of ways we could fix this. I'll put together a reproducer to see whether either can be ruled out.

One possibility is simply to check for a NULL snoop->spyee_chan in publish_chanspy_message. The function already handles messages with no spyee_snapshot, so such a message is valid in a schema sense. The real question is whether the resulting messages would be confusing: they would describe a channel that is being removed without associating it with any existing channel.
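A minimal sketch of that first option, again using hypothetical stand-ins rather than the real res_stasis_snoop.c types: model_chan, model_snoop, model_chanspy_msg and model_publish_chanspy are invented for illustration, and the message struct merely imitates a blob with an optional spyee entry. The only point is the guard: the spyee field is filled in only when spyee_chan is non-NULL, so a message is still built without dereferencing a missing channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT Asterisk's real types. */
struct model_chan {
	const char *uniqueid;
};

struct model_snoop {
	struct model_chan *chan;       /* the snoop channel (spyer) */
	struct model_chan *spyee_chan; /* may be NULL if setup failed early */
};

/* Stand-in for the published blob; spyee_id stays NULL when there is no spyee. */
struct model_chanspy_msg {
	const char *spyer_id;
	const char *spyee_id;
};

/* Option 1: tolerate a missing spyee instead of dereferencing it.
 * Returns 0 if a message was built, -1 if there is nothing to publish. */
static int model_publish_chanspy(const struct model_snoop *snoop,
	struct model_chanspy_msg *out)
{
	assert(snoop != NULL && out != NULL);

	if (!snoop->chan) {
		return -1; /* no spyer channel, nothing meaningful to report */
	}
	out->spyer_id = snoop->chan->uniqueid;

	/* The guard: only look at spyee_chan when it has been set. */
	out->spyee_id = snoop->spyee_chan ? snoop->spyee_chan->uniqueid : NULL;
	return 0;
}
```

On the error path the message still goes out; it simply lacks a spyee, which is exactly the potentially confusing case described above.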

Alternatively, we could set snoop->spyee_chan earlier in stasis_app_control_snoop and bump the spyee channel's reference count. This would produce more sensible messages, and the extra reference would be released when the function returns and deallocates the snoop. This is probably the better approach, although I may be missing other invariants that would prevent it from working.
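The second option can be sketched in the same hypothetical style. model_chan, model_chan_ref, model_chan_unref, model_snoop, model_snoop_destroy and model_control_snoop are invented stand-ins, and the plain counter only imitates reference counting. The sketch assumes, as the comment above does, that destroying the snoop releases its reference on the spyee. It takes that reference before any step that can fail, so the error-path hangup has a spyee to report and the count returns to its starting value afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted channel stand-in; NOT Asterisk's ast_channel. */
struct model_chan {
	const char *uniqueid;
	int refs;
};

static struct model_chan *model_chan_ref(struct model_chan *chan)
{
	chan->refs++;
	return chan;
}

static void model_chan_unref(struct model_chan *chan)
{
	if (chan) {
		assert(chan->refs > 0);
		chan->refs--;
	}
}

struct model_snoop {
	struct model_chan *spyee_chan; /* holds its own reference when set */
	const char *last_stop_spyee;   /* what the hangup path reported */
};

/* Stand-in for snoop_hangup() -> publish_chanspy_message(). */
static void model_snoop_hangup(struct model_snoop *snoop)
{
	snoop->last_stop_spyee = snoop->spyee_chan ? snoop->spyee_chan->uniqueid : NULL;
}

/* Assumption: deallocating the snoop drops the reference it took. */
static void model_snoop_destroy(struct model_snoop *snoop)
{
	model_chan_unref(snoop->spyee_chan);
	free(snoop);
}

/* Option 2: set spyee_chan (with a reference) before anything can fail.
 * Returns the snoop on success; on the simulated failure, stores the spyee
 * id the hangup path reported in *reported and returns NULL. */
static struct model_snoop *model_control_snoop(struct model_chan *spyee,
	int fail_midway, const char **reported)
{
	struct model_snoop *snoop = calloc(1, sizeof(*snoop));

	if (!snoop) {
		return NULL;
	}

	/* Take the spyee reference first, so every later error path sees it. */
	snoop->spyee_chan = model_chan_ref(spyee);

	if (fail_midway) {
		model_snoop_hangup(snoop); /* error path now has a spyee to report */
		*reported = snoop->last_stop_spyee;
		model_snoop_destroy(snoop); /* extra reference released here */
		return NULL;
	}

	return snoop; /* caller owns the snoop and destroys it later */
}
```

After a simulated failure the spyee's count is back where it started, and on success the snoop holds exactly one extra reference until it is destroyed.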

I'm assuming that we do intend to send these ChanSpy messages even when starting a snoop fails.

> Occasional segfaults in production
> ----------------------------------
>
>                 Key: ASTERISK-29882
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-29882
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: General
>    Affects Versions: 18.8.0
>         Environment: Debian Stretch in Docker
>            Reporter: Duncan
>            Assignee: Unassigned
>         Attachments: coreFile-thread1.txt, Screenshot from 2021-12-20 09-09-29.png
>
>
> Hi team,
> We are seeing Asterisk instances occasionally segfault in production. This happens in our voicemail-handling stasis app, which plays media to the caller and then records the caller, managed through ARI. We have never reproduced the issue ourselves, so we don't know the steps to trigger it, but we suspect some kind of race condition.
> I have attached coreFile-thread1.txt from the core dump. We have all the other core dump *.txt files but would prefer not to post them publicly in case they contain sensitive data; I would be happy to provide them directly to a maintainer via email. We’re happy to assist with debugging this issue if at all possible, and have some C experience but no experience working on Asterisk. Please get in touch if we can do anything to help troubleshoot further.
> Many thanks





