[asterisk-bugs] [JIRA] (ASTERISK-28972) FRACK! + task processor queue issue

Tue Jun 30 10:46:25 CDT 2020

    [ https://issues.asterisk.org/jira/browse/ASTERISK-28972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=251314#comment-251314 ] 

Cyril Ramière commented on ASTERISK-28972:
------------------------------------------

Hello Joshua,

The system handles calls from our customers.
Some of the impacted instances had very low traffic, around 20-50 calls @ ~1 CPS.
Others were busier, with 200-300 calls @ 2-3 CPS.
We use realtime, so the dialplan is a one-liner that goes to stasis when a call enters.
This never happened while testing this version, which has been deployed for about 10 days.
On the impacted instances, even without any active calls, an outgoing call made with "channel originate PJSIP [...]" (without stasis) showed a 9-10 second delay between the time I pressed Enter and the time our SBC received the INVITE.
We use WebRTC & recording.

After reviewing the logs, I found one instance on which we don't use WebRTC, and it has the issue too:

[Jun 30 09:00:47] WARNING[3924][C-00020378] taskprocessor.c: The 'stasis/m:rtp:all-00000646' task processor queue reached 500 scheduled tasks again.
[Jun 30 09:01:21] WARNING[2154] taskprocessor.c: The 'stasis/m:rtp:all-00000646' task processor queue reached 500 scheduled tasks again.
[Jun 30 09:01:41] ERROR[2889] chan_pjsip.c: Session already DISCONNECTED [reason=200 (OK)]
[Jun 30 09:01:56] WARNING[4521][C-000203a9] taskprocessor.c: The 'stasis/m:rtp:all-00000646' task processor queue reached 500 scheduled tasks again.
[Jun 30 09:02:39] WARNING[2154] taskprocessor.c: The 'stasis/m:rtp:all-00000646' task processor queue reached 500 scheduled tasks again.
[later .. FRACK/ bad magic error]

So it seems that the issue is not related to WebRTC.
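
To compare instances, the task processor warnings can be tallied per queue name. The following is only a minimal sketch, assuming the log line format shown in the excerpt above; the function name `count_queue_warnings` and the `sample` list are hypothetical and not part of Asterisk.

```python
import re
from collections import Counter

# Assumption: warnings follow the taskprocessor.c format seen in the excerpt above,
# with an optional call-id field such as [C-00020378] after the thread id.
TASKPROC_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] WARNING\[[^\]]+\](?:\[[^\]]+\])? taskprocessor\.c: "
    r"The '(?P<queue>[^']+)' task processor queue reached (?P<count>\d+) scheduled tasks"
)


def count_queue_warnings(lines):
    """Return a Counter mapping task processor queue name to number of warnings."""
    counts = Counter()
    for line in lines:
        match = TASKPROC_RE.match(line.strip())
        if match:
            counts[match.group("queue")] += 1
    return counts


# Hypothetical input: three lines copied from the excerpt above.
sample = [
    "[Jun 30 09:00:47] WARNING[3924][C-00020378] taskprocessor.c: The 'stasis/m:rtp:all-00000646' task processor queue reached 500 scheduled tasks again.",
    "[Jun 30 09:01:21] WARNING[2154] taskprocessor.c: The 'stasis/m:rtp:all-00000646' task processor queue reached 500 scheduled tasks again.",
    "[Jun 30 09:01:41] ERROR[2889] chan_pjsip.c: Session already DISCONNECTED [reason=200 (OK)]",
]

if __name__ == "__main__":
    for queue, hits in count_queue_warnings(sample).items():
        print(queue, hits)
```

In practice the lines would come from the Asterisk log file instead of the `sample` list, for example by passing an open file object to `count_queue_warnings`.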

> FRACK! + task processor queue issue
> -----------------------------------
>
>                 Key: ASTERISK-28972
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-28972
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_pjsip
>    Affects Versions: 16.10.0
>         Environment: AWS Ubuntu 16.04
>            Reporter: Cyril Ramière
>            Assignee: Cyril Ramière
>              Labels: webrtc
>
> Hello everyone,
> Today I had a big issue with multiple Asterisk instances across multiple machines.
> The instances did not crash, but they were in an unstable state (some calls working, some not) with an unusual 9-10 second delay on all calls.
> For incoming calls, when our SBC sent the INVITE, Asterisk took 9-10 seconds to reply to it.
> For outgoing calls (I even tried directly in the console using channel originate), the same delay was present between the time I hit Enter and the time the INVITE was actually sent.
> I saw multiple messages in the log related to the task processor queue; those messages continued even after there were no active calls.
> There seems to be no correlation between the issue and the number of calls my machines had; some had around 50 calls, others around 300.
> Here are some samples of error messages:
> ----------
> [Jun 30 08:44:20] ERROR[5871] res_pjsip_session.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f8ff11751b0 (0)
> [Jun 30 08:44:20] ERROR[5871] : Got 13 backtrace records
> # 0: /usr/sbin/asterisk() [0x45c977]
> # 1: /usr/sbin/asterisk() [0x45ff1b]
> # 2: /usr/sbin/asterisk(__ao2_find+0x28) [0x460108]
> # 3: /usr/lib/asterisk/modules/res_pjsip_session.so(ast_sip_session_get_datastore+0x31) [0x7f8f8d0a2491]
> # 4: /usr/lib/asterisk/modules/res_pjsip_header_funcs.so(+0x1f56) [0x7f8f8ba52f56]
> # 5: /usr/lib/asterisk/modules/res_pjsip.so(+0x11730) [0x7f8f8f51f730]
> # 6: /usr/sbin/asterisk(ast_taskprocessor_execute+0xce) [0x59b61e]
> # 7: /usr/sbin/asterisk() [0x5a2d10]
> # 8: /usr/sbin/asterisk(ast_taskprocessor_execute+0xce) [0x59b61e]
> # 9: /usr/sbin/asterisk() [0x5a34b0]
> #10: /usr/sbin/asterisk() [0x5ab34c]
> #11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f8ff96a76ba]
> #12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8ff8c8141d]
> [Jun 30 08:56:09] WARNING[28633][C-000085dc] taskprocessor.c: The 'stasis/m:rtp:all-0000064a' task processor queue reached 500 scheduled tasks.
> [Jun 30 09:01:48] WARNING[1977] taskprocessor.c: The 'stasis/m:rtp:all-0000064a' task processor queue reached 500 scheduled tasks again.
> [...]
> [Jun 30 09:41:38] ERROR[21458] res_pjsip_session.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f8ff0c6bc70 (0)
> [Jun 30 09:41:38] ERROR[21458] : Got 13 backtrace records
> # 0: /usr/sbin/asterisk() [0x45c977]
> # 1: /usr/sbin/asterisk() [0x45ff1b]
> # 2: /usr/sbin/asterisk(__ao2_find+0x28) [0x460108]
> # 3: /usr/lib/asterisk/modules/res_pjsip_session.so(ast_sip_session_get_datastore+0x31) [0x7f8f8d0a2491]
> # 4: /usr/lib/asterisk/modules/res_pjsip_header_funcs.so(+0x1f56) [0x7f8f8ba52f56]
> # 5: /usr/lib/asterisk/modules/res_pjsip.so(+0x11730) [0x7f8f8f51f730]
> # 6: /usr/sbin/asterisk(ast_taskprocessor_execute+0xce) [0x59b61e]
> # 7: /usr/sbin/asterisk() [0x5a2d10]
> # 8: /usr/sbin/asterisk(ast_taskprocessor_execute+0xce) [0x59b61e]
> # 9: /usr/sbin/asterisk() [0x5a34b0]
> #10: /usr/sbin/asterisk() [0x5ab34c]
> #11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f8ff96a76ba]
> #12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8ff8c8141d]
> ----------
> Since this is running in production, I don't have a coredump, and I haven't yet been able to reproduce the issue in development.
> I checked everything I could (instances, network, DNS resolution, ...) and found nothing unusual.
> Restarting Asterisk fixed the issue, but now I fear that it will happen again.
> Any thoughts? I don't know what I'm looking for; the FRACK messages mean nothing to me.
> Thanks
