[asterisk-bugs] [JIRA] (ASTERISK-26420) ARI: Crash when 1 of the channel is leaving a bridge with more participants

Javier Riveros (JIRA) noreply at issues.asterisk.org
Wed Sep 28 13:31:01 CDT 2016


     [ https://issues.asterisk.org/jira/browse/ASTERISK-26420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Javier Riveros  updated ASTERISK-26420:
---------------------------------------

    Description: 
Basically the following scenario is making asterisk crash,

1) userA is using a WebRTC app to make a outbound call with the same app the user could perform an attended transfer so userA put callee1 in moh then make a call to callee2, userA regrets of calling callee2 "mistake" so hangup that call 'this callee2 call is important because it enter into the same bridge but it is kick when the userA hangup that call'  then userA finally call the proper callee3 send a dtmf/unhold and finally include callee3 into the bridge. as for now everything is ok here, the problem comes when the userA leave this bridge and the operations/events for hanging up userA channel happens *all of this is controlled via ARI*.

With this scenario asterisk complains when the userA hangup the call,

{noformat}
ERROR[21457]: astobj2.c:126 INTERNAL_OBJ: bad magic number 0xdeaddead for object 0x7fdbb4000d68
{noformat}

Asterisk continue working fine but if you run the same scenario 20 times  eventually asterisk will crash.

the coredumps has this flags enabled
{{DONT_OPTIMIZE LOADABLE_MODULES BETTER_BACKTRACES BUILD_NATIVE OPTIONAL_API}}

I notice that the multiple times i make it crash the core dumps looks a bit different in 3 types which make me think there is a memory corruption or something related to it, i start seeing this crash in asterisk 13.9.1 and now i can reproduced in asterisk 13.11.2

independent of the version and how the coredumps looks like the important part is that the last ARI event reported by asterisk at the time “userA leave the bridge on the scenario described above” is when the crash start happening.

Last ARI log before crash start happening asterisk 13.9.1/13.11.2

{noformat}
[2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket string of length 563
[2016-09-28 17:37:27.086] DEBUG[11148] ari/ari_websockets.c: Examining ARI event (length 556):
{"channel":{"id":"i-000d96771fad70989-1475084108.626","state":"Up","name":"PJSIP/sip-webrtc-00000272","caller":{"name":"","number":"sip-webrtc"},"language":"en","connected":{"name":"Unknown","number":""},"accountcode":"","dialplan":{"context":"default","exten":"+18018852300","priority":4},"creationtime":"2016-09-28T17:35:09.003+0000"},"application":"5-call-e6cd57c6-85a1-11e6-9c63-bb94080041a5-v3","variable":"BRIDGEPEER","type":"ChannelVarset","value":"PJSIP/sip-standard-000002a6,PJSIP/sip-standard-00000273","timestamp":"2016-09-28T17:37:27.086+0000"}
[2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket text frame, length 563
[2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket string of length 556
[2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket text frame, length 556
{noformat}


The coredumps commonly in asterisk 13.9.1/13.11.2 would look like this

{noformat}
#0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
#1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
#2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
    sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
#3  0x00000000005db45f in varset_to_json (message=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:1048
#4  0x00000000005de309 in stasis_message_to_json (msg=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_message.c:195
#5  0x00007f985a8988f1 in sub_default_handler (data=0x7f9810189778, sub=0x7f9810461ad8, message=0x7f9810267328) at stasis/app.c:346
#6  0x00000000005de971 in router_dispatch (data=0x7f98109957c8, sub=0x7f9810461ad8, message=0x7f9810267328)
    at stasis_message_router.c:201
#7  0x00000000005cd5ce in subscription_invoke (sub=0x7f9810461ad8, message=0x7f9810267328) at stasis.c:433
#8  0x00000000005ce1b4 in dispatch_exec_async (local=0x7f97ed940db0) at stasis.c:714
#9  0x00000000005eb43f in ast_taskprocessor_execute (tps=0x7f9810461bb8) at taskprocessor.c:936
#10 0x00000000005e988a in default_tps_processing_function (data=0x7f98105a0e28) at taskprocessor.c:183
#11 0x00000000006003aa in dummy_start (data=0x7f98109a9420) at utils.c:1235
#12 0x00007f985d6a2184 in start_thread (arg=0x7f97ed941700) at pthread_create.c:312
#13 0x00007f985c96237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
#0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
No symbol table info available.
#1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
No locals.
#2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
    sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
        out = 0x0
        channel_blob = 0x7f98107ab438
        blob = 0x7f9810602538
        snapshot = 0x7f9810a77200
        tv = 0x7f9810267328
        res = 0
{noformat}

also the kernel logs the segfault like this.

{noformat}
2016-09-28 17:37:27.087 | info | kernel[-] |[ 2589.878464] asterisk[11170]: segfault at a ip 00007f985df018e5 sp 00007f97ed940b80 error 4 in libjansson.so.4.5.0[7f985defa000+a000]
{noformat}

I will attach the coredumps/logs and important information for this crash for asterisk 13.9.1 and asterisk 13.11.2 if that helps, if you need i enable or repeat the crash with an extra cflag let me know this could be reproduce in 20 min and the most interesting part of this crash is if i compile with {{MALLOC_DEBUG}} I’m unable to make it crash again ( at least in 24h test window).


  was:

Basically the following scenario is making asterisk crash,

1) userA is using a WebRTC app to make a outbound call with the same app the user could perform an attended transfer so userA put callee1 in moh then make a call to callee2, userA regrets of calling callee2 "mistake" so hangup that call 'this callee2 call is important because it enter into the same bridge but it is kick when the userA hangup that call'  then userA finally call the proper callee3 send a dtmf/unhold and finally include callee3 into the bridge. as for now everything is ok here, the problem comes when the userA leave this bridge and the operations/events for hanging up userA channel happens *all of this is controlled via ARI*.

With this scenario asterisk complains when the userA hangup the call,

{noformat}
ERROR[21457]: astobj2.c:126 INTERNAL_OBJ: bad magic number 0xdeaddead for object 0x7fdbb4000d68
{noformat}

Asterisk continue working fine but if you run the same scenario 20 times  eventually asterisk will crash.

the coredumps has this flags enabled
{{DONT_OPTIMIZE LOADABLE_MODULES BETTER_BACKTRACES BUILD_NATIVE OPTIONAL_API }}

I notice that the multiple times i make it crash the core dumps looks a bit different in 3 types which make me think there is a memory corruption or something related to it, i start seeing this crash in asterisk 13.9.1 and now i can reproduced in asterisk 13.11.2

independent of the version and how the coredumps looks like the important part is that the last ARI event reported by asterisk at the time “userA leave the bridge on the scenario described above” is when the crash start happening.

Last ARI log before crash start happening asterisk 13.9.1/13.11.2

{noformat}
[2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket string of length 563
[2016-09-28 17:37:27.086] DEBUG[11148] ari/ari_websockets.c: Examining ARI event (length 556):
{"channel":{"id":"i-000d96771fad70989-1475084108.626","state":"Up","name":"PJSIP/sip-webrtc-00000272","caller":{"name":"","number":"sip-webrtc"},"language":"en","connected":{"name":"Unknown","number":""},"accountcode":"","dialplan":{"context":"default","exten":"+18018852300","priority":4},"creationtime":"2016-09-28T17:35:09.003+0000"},"application":"5-call-e6cd57c6-85a1-11e6-9c63-bb94080041a5-v3","variable":"BRIDGEPEER","type":"ChannelVarset","value":"PJSIP/sip-standard-000002a6,PJSIP/sip-standard-00000273","timestamp":"2016-09-28T17:37:27.086+0000"}
[2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket text frame, length 563
[2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket string of length 556
[2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket text frame, length 556
{noformat}


The coredumps commonly in asterisk 13.9.1/13.11.2 would look like this

{noformat}
#0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
#1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
#2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
    sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
#3  0x00000000005db45f in varset_to_json (message=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:1048
#4  0x00000000005de309 in stasis_message_to_json (msg=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_message.c:195
#5  0x00007f985a8988f1 in sub_default_handler (data=0x7f9810189778, sub=0x7f9810461ad8, message=0x7f9810267328) at stasis/app.c:346
#6  0x00000000005de971 in router_dispatch (data=0x7f98109957c8, sub=0x7f9810461ad8, message=0x7f9810267328)
    at stasis_message_router.c:201
#7  0x00000000005cd5ce in subscription_invoke (sub=0x7f9810461ad8, message=0x7f9810267328) at stasis.c:433
#8  0x00000000005ce1b4 in dispatch_exec_async (local=0x7f97ed940db0) at stasis.c:714
#9  0x00000000005eb43f in ast_taskprocessor_execute (tps=0x7f9810461bb8) at taskprocessor.c:936
#10 0x00000000005e988a in default_tps_processing_function (data=0x7f98105a0e28) at taskprocessor.c:183
#11 0x00000000006003aa in dummy_start (data=0x7f98109a9420) at utils.c:1235
#12 0x00007f985d6a2184 in start_thread (arg=0x7f97ed941700) at pthread_create.c:312
#13 0x00007f985c96237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
#0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
No symbol table info available.
#1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
No locals.
#2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
    sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
        out = 0x0
        channel_blob = 0x7f98107ab438
        blob = 0x7f9810602538
        snapshot = 0x7f9810a77200
        tv = 0x7f9810267328
        res = 0
{noformat}

also the kernel logs the segfault like this.

{noformat}
2016-09-28 17:37:27.087 | info | kernel[-] |[ 2589.878464] asterisk[11170]: segfault at a ip 00007f985df018e5 sp 00007f97ed940b80 error 4 in libjansson.so.4.5.0[7f985defa000+a000]
{noformat}

I will attach the coredumps/logs and important information for this crash for asterisk 13.9.1 and asterisk 13.11.2 if that helps, if you need i enable or repeat the crash with an extra cflag let me know this could be reproduce in 20 min and the most interesting part of this crash is if i compile with {{MALLOC_DEBUG}} I’m unable to make it crash again ( at least in 24h test window).



> ARI: Crash when 1 of the channel is leaving a bridge with more participants
> ---------------------------------------------------------------------------
>
>                 Key: ASTERISK-26420
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26420
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_ari, Resources/res_pjsip
>    Affects Versions: 13.9.1, 13.11.2
>         Environment: UBUNTU 14.04 
> 4.4.0-38-generic #57~14.04.1-Ubuntu SMP UTC 2016 x86_64 
> AWS server c3.large
>            Reporter: Javier Riveros 
>         Attachments: asterisk_core_dumps_13_11_2.zip, asterisk_core_dumps_13_9_1.zip
>
>
> Basically the following scenario is making asterisk crash,
> 1) userA is using a WebRTC app to make a outbound call with the same app the user could perform an attended transfer so userA put callee1 in moh then make a call to callee2, userA regrets of calling callee2 "mistake" so hangup that call 'this callee2 call is important because it enter into the same bridge but it is kick when the userA hangup that call'  then userA finally call the proper callee3 send a dtmf/unhold and finally include callee3 into the bridge. as for now everything is ok here, the problem comes when the userA leave this bridge and the operations/events for hanging up userA channel happens *all of this is controlled via ARI*.
> With this scenario asterisk complains when the userA hangup the call,
> {noformat}
> ERROR[21457]: astobj2.c:126 INTERNAL_OBJ: bad magic number 0xdeaddead for object 0x7fdbb4000d68
> {noformat}
> Asterisk continue working fine but if you run the same scenario 20 times  eventually asterisk will crash.
> the coredumps has this flags enabled
> {{DONT_OPTIMIZE LOADABLE_MODULES BETTER_BACKTRACES BUILD_NATIVE OPTIONAL_API}}
> I notice that the multiple times i make it crash the core dumps looks a bit different in 3 types which make me think there is a memory corruption or something related to it, i start seeing this crash in asterisk 13.9.1 and now i can reproduced in asterisk 13.11.2
> independent of the version and how the coredumps looks like the important part is that the last ARI event reported by asterisk at the time “userA leave the bridge on the scenario described above” is when the crash start happening.
> Last ARI log before crash start happening asterisk 13.9.1/13.11.2
> {noformat}
> [2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket string of length 563
> [2016-09-28 17:37:27.086] DEBUG[11148] ari/ari_websockets.c: Examining ARI event (length 556):
> {"channel":{"id":"i-000d96771fad70989-1475084108.626","state":"Up","name":"PJSIP/sip-webrtc-00000272","caller":{"name":"","number":"sip-webrtc"},"language":"en","connected":{"name":"Unknown","number":""},"accountcode":"","dialplan":{"context":"default","exten":"+18018852300","priority":4},"creationtime":"2016-09-28T17:35:09.003+0000"},"application":"5-call-e6cd57c6-85a1-11e6-9c63-bb94080041a5-v3","variable":"BRIDGEPEER","type":"ChannelVarset","value":"PJSIP/sip-standard-000002a6,PJSIP/sip-standard-00000273","timestamp":"2016-09-28T17:37:27.086+0000"}
> [2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket text frame, length 563
> [2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket string of length 556
> [2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket text frame, length 556
> {noformat}
> The coredumps commonly in asterisk 13.9.1/13.11.2 would look like this
> {noformat}
> #0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
> #1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
> #2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
>     sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
> #3  0x00000000005db45f in varset_to_json (message=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:1048
> #4  0x00000000005de309 in stasis_message_to_json (msg=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_message.c:195
> #5  0x00007f985a8988f1 in sub_default_handler (data=0x7f9810189778, sub=0x7f9810461ad8, message=0x7f9810267328) at stasis/app.c:346
> #6  0x00000000005de971 in router_dispatch (data=0x7f98109957c8, sub=0x7f9810461ad8, message=0x7f9810267328)
>     at stasis_message_router.c:201
> #7  0x00000000005cd5ce in subscription_invoke (sub=0x7f9810461ad8, message=0x7f9810267328) at stasis.c:433
> #8  0x00000000005ce1b4 in dispatch_exec_async (local=0x7f97ed940db0) at stasis.c:714
> #9  0x00000000005eb43f in ast_taskprocessor_execute (tps=0x7f9810461bb8) at taskprocessor.c:936
> #10 0x00000000005e988a in default_tps_processing_function (data=0x7f98105a0e28) at taskprocessor.c:183
> #11 0x00000000006003aa in dummy_start (data=0x7f98109a9420) at utils.c:1235
> #12 0x00007f985d6a2184 in start_thread (arg=0x7f97ed941700) at pthread_create.c:312
> #13 0x00007f985c96237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> #0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
> No symbol table info available.
> #1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
> No locals.
> #2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
>     sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
>         out = 0x0
>         channel_blob = 0x7f98107ab438
>         blob = 0x7f9810602538
>         snapshot = 0x7f9810a77200
>         tv = 0x7f9810267328
>         res = 0
> {noformat}
> also the kernel logs the segfault like this.
> {noformat}
> 2016-09-28 17:37:27.087 | info | kernel[-] |[ 2589.878464] asterisk[11170]: segfault at a ip 00007f985df018e5 sp 00007f97ed940b80 error 4 in libjansson.so.4.5.0[7f985defa000+a000]
> {noformat}
> I will attach the coredumps/logs and important information for this crash for asterisk 13.9.1 and asterisk 13.11.2 if that helps, if you need i enable or repeat the crash with an extra cflag let me know this could be reproduce in 20 min and the most interesting part of this crash is if i compile with {{MALLOC_DEBUG}} I’m unable to make it crash again ( at least in 24h test window).



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list