[asterisk-bugs] [JIRA] (ASTERISK-26420) res_ari: Crash when 1 of the channel is leaving a bridge with more participants

Leonardo Santana (JIRA) noreply at issues.asterisk.org
Wed Dec 7 14:47:10 CST 2016


    [ https://issues.asterisk.org/jira/browse/ASTERISK-26420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=234118#comment-234118 ] 

Leonardo Santana commented on ASTERISK-26420:
---------------------------------------------

Hi All,

Is there any update on this thread? I'm asking because my asterisk 13.13.0 is eventually crashing and the core dump is similar to the one in this post. See below:

#0  0x00007fee17f63d50 in ?? () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
No symbol table info available.
#1  0x00007fee17f68ec0 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
No symbol table info available.
#2  0x000000000052f69c in ast_json_copy (value=0x7fede81dbf08) at json.c:835
No locals.
#3  0x00000000005cd661 in channel_blob_to_json (message=0x7fede823fa30, type=0x65a5c7 "ChannelVarset", sanitize=0x7fedb3fff250 <app_sanitizer>) at stasis_channels.c:981
        out = 0x0
        channel_blob = 0x7fede811b2c0
        blob = 0x7fede81dbf08
        snapshot = 0x7fede817e028
        tv = 0x7fede823fa30
        res = 0
#4  0x00000000005cd8a9 in varset_to_json (message=0x7fede823fa30, sanitize=0x7fedb3fff250 <app_sanitizer>) at stasis_channels.c:1048
No locals.
#5  0x00000000005d0708 in stasis_message_to_json (msg=0x7fede823fa30, sanitize=0x7fedb3fff250 <app_sanitizer>) at stasis_message.c:195
No locals.
#6  0x00007fedb3defecb in sub_default_handler (data=0x7fee04002fb0, sub=0x7fee04001430, message=0x7fede823fa30) at stasis/app.c:349
        app = 0x7fee04002fb0
        json = 0x0
#7  0x00000000005d0d7c in router_dispatch (data=0x7fee04001380, sub=0x7fee04001430, message=0x7fede823fa30) at stasis_message_router.c:201
        router = 0x7fee04001380
        route = {message_type = 0x0, callback = 0x7fedb3defe40 <sub_default_handler>, data = 0x7fee04002fb0}
#8  0x00000000005bf349 in subscription_invoke (sub=0x7fee04001430, message=0x7fede823fa30) at stasis.c:434
        __PRETTY_FUNCTION__ = "subscription_invoke"
#9  0x00000000005bff24 in dispatch_exec_async (local=0x7fee15c05da0) at stasis.c:715
        sub = 0x7fee04001430
        message = 0x7fede823fa30
#10 0x00000000005dd261 in ast_taskprocessor_execute (tps=0x7fee04001510) at taskprocessor.c:965
        local = {local_data = 0x7fee04001430, data = 0x7fede823fa30}
        t = 0x7fede806fc70
        size = 10
        __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#11 0x00000000005db5df in default_tps_processing_function (data=0x7fee04003710) at taskprocessor.c:185
        listener = 0x7fee04003710
        tps = 0x7fee04001510
        pvt = 0x7fee04003120
        sem_value = 364928720
        res = 0
        __PRETTY_FUNCTION__ = "default_tps_processing_function"
#12 0x00000000005f1569 in dummy_start (data=0x7fee04003480) at utils.c:1235
        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, 2328158857826215227, 0, 140660380275264, 17, 140660543874816, 2328158857817826619, -2336414809991338693}, __mask_was_saved = 0}}, __pad = {0x7fee15c05ef0, 0x0, 0x0,
            0x0}}
        __cancel_routine = 0x450365 <ast_unregister_thread>
        __cancel_arg = 0x7fee15c06700
        __not_first_call = 0
        ret = 0x7fee1676e878
        a = {start_routine = 0x5db549 <default_tps_processing_function>, data = 0x7fee04003710, name = 0x7fee04001640 "default_tps_processing_function started at [  202] taskprocessor.c default_listener_start()"}
#13 0x00007fee175030a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#14 0x00007fee164b504d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

I'm not able to reproduce or find the root cause of this issue however in my environment there are a lot of CALL_HOLD/MUTE/UNHOLD request.

Please, advise.

Thanks a lot,

Leo

> res_ari: Crash when 1 of the channel is leaving a bridge with more participants
> -------------------------------------------------------------------------------
>
>                 Key: ASTERISK-26420
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26420
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_ari, Resources/res_pjsip
>    Affects Versions: 13.9.1, 13.11.2
>         Environment: UBUNTU 14.04 
> 4.4.0-38-generic #57~14.04.1-Ubuntu SMP UTC 2016 x86_64 
> AWS server c3.large
>            Reporter: Javier Riveros 
>            Assignee: Unassigned
>         Attachments: asterisk_core_dumps_13_11_2.zip, asterisk_core_dumps_13_9_1.zip
>
>
> Basically the following scenario is making asterisk crash,
> 1) userA is using a WebRTC app to make a outbound call with the same app the user could perform an attended transfer so userA put callee1 in moh then make a call to callee2, userA regrets of calling callee2 "mistake" so hangup that call 'this callee2 call is important because it enter into the same bridge but it is kick when the userA hangup that call'  then userA finally call the proper callee3 send a dtmf/unhold and finally include callee3 into the bridge. as for now everything is ok here, the problem comes when the userA leave this bridge and the operations/events for hanging up userA channel happens *all of this is controlled via ARI*.
> With this scenario asterisk complains when the userA hangup the call,
> {noformat}
> ERROR[21457]: astobj2.c:126 INTERNAL_OBJ: bad magic number 0xdeaddead for object 0x7fdbb4000d68
> {noformat}
> Asterisk continue working fine but if you run the same scenario 20 times  eventually asterisk will crash.
> the coredumps has this flags enabled
> {{DONT_OPTIMIZE LOADABLE_MODULES BETTER_BACKTRACES BUILD_NATIVE OPTIONAL_API}}
> I notice that the multiple times i make it crash the core dumps looks a bit different in 3 types which make me think there is a memory corruption or something related to it, i start seeing this crash in asterisk 13.9.1 and now i can reproduced in asterisk 13.11.2
> independent of the version and how the coredumps looks like the important part is that the last ARI event reported by asterisk at the time “userA leave the bridge on the scenario described above” is when the crash start happening.
> Last ARI log before crash start happening asterisk 13.9.1/13.11.2
> {noformat}
> [2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket string of length 563
> [2016-09-28 17:37:27.086] DEBUG[11148] ari/ari_websockets.c: Examining ARI event (length 556):
> {"channel":{"id":"i-000d96771fad70989-1475084108.626","state":"Up","name":"PJSIP/sip-webrtc-00000272","caller":{"name":"","number":"sip-webrtc"},"language":"en","connected":{"name":"Unknown","number":""},"accountcode":"","dialplan":{"context":"default","exten":"+18018852300","priority":4},"creationtime":"2016-09-28T17:35:09.003+0000"},"application":"5-call-e6cd57c6-85a1-11e6-9c63-bb94080041a5-v3","variable":"BRIDGEPEER","type":"ChannelVarset","value":"PJSIP/sip-standard-000002a6,PJSIP/sip-standard-00000273","timestamp":"2016-09-28T17:37:27.086+0000"}
> [2016-09-28 17:37:27.086] DEBUG[13555] res_http_websocket.c: Writing websocket text frame, length 563
> [2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket string of length 556
> [2016-09-28 17:37:27.086] DEBUG[11148] res_http_websocket.c: Writing websocket text frame, length 556
> {noformat}
> The coredumps commonly in asterisk 13.9.1/13.11.2 would look like this
> {noformat}
> #0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
> #1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
> #2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
>     sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
> #3  0x00000000005db45f in varset_to_json (message=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:1048
> #4  0x00000000005de309 in stasis_message_to_json (msg=0x7f9810267328, sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_message.c:195
> #5  0x00007f985a8988f1 in sub_default_handler (data=0x7f9810189778, sub=0x7f9810461ad8, message=0x7f9810267328) at stasis/app.c:346
> #6  0x00000000005de971 in router_dispatch (data=0x7f98109957c8, sub=0x7f9810461ad8, message=0x7f9810267328)
>     at stasis_message_router.c:201
> #7  0x00000000005cd5ce in subscription_invoke (sub=0x7f9810461ad8, message=0x7f9810267328) at stasis.c:433
> #8  0x00000000005ce1b4 in dispatch_exec_async (local=0x7f97ed940db0) at stasis.c:714
> #9  0x00000000005eb43f in ast_taskprocessor_execute (tps=0x7f9810461bb8) at taskprocessor.c:936
> #10 0x00000000005e988a in default_tps_processing_function (data=0x7f98105a0e28) at taskprocessor.c:183
> #11 0x00000000006003aa in dummy_start (data=0x7f98109a9420) at utils.c:1235
> #12 0x00007f985d6a2184 in start_thread (arg=0x7f97ed941700) at pthread_create.c:312
> #13 0x00007f985c96237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> #0  0x00007f985df018e5 in json_copy () from /usr/lib/x86_64-linux-gnu/libjansson.so.4
> No symbol table info available.
> #1  0x0000000000535df8 in ast_json_copy (value=0x7f9810602538) at json.c:714
> No locals.
> #2  0x00000000005db206 in channel_blob_to_json (message=0x7f9810267328, type=0x668ca7 "ChannelVarset",
>     sanitize=0x7f985aaa88b0 <app_sanitizer>) at stasis_channels.c:981
>         out = 0x0
>         channel_blob = 0x7f98107ab438
>         blob = 0x7f9810602538
>         snapshot = 0x7f9810a77200
>         tv = 0x7f9810267328
>         res = 0
> {noformat}
> also the kernel logs the segfault like this.
> {noformat}
> 2016-09-28 17:37:27.087 | info | kernel[-] |[ 2589.878464] asterisk[11170]: segfault at a ip 00007f985df018e5 sp 00007f97ed940b80 error 4 in libjansson.so.4.5.0[7f985defa000+a000]
> {noformat}
> I will attach the coredumps/logs and important information for this crash for asterisk 13.9.1 and asterisk 13.11.2 if that helps, if you need i enable or repeat the crash with an extra cflag let me know this could be reproduce in 20 min and the most interesting part of this crash is if i compile with {{MALLOC_DEBUG}} I’m unable to make it crash again ( at least in 24h test window).



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list