[asterisk-bugs] [JIRA] (ASTERISK-25941) chan_pjsip: Crash on an immediate SIP final response

Javier Riveros (JIRA) noreply at issues.asterisk.org
Mon May 16 13:53:56 CDT 2016


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=230613#comment-230613 ] 

Javier Riveros  commented on ASTERISK-25941:
--------------------------------------------

Hey Mark thanks for taking a look to this i just try this diff with asterisk 13.9.1 and i was able to reproduce again and get a crash.

{noformat}
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: Function session_inv_on_state_changed called on event TSX_STATE
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The state change pertains to the endpoint 'sip-standard(PJSIP/sip-standard-00000177)'
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The inv session still has an invite_tsx (0x7fcb9805a5a8)
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: There is no transaction involved in this state change
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The current inv state is DISCONNCTD
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: Source of transaction state change is RX_MSG
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: Received response
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: Response is 486 Busy Here
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: Function session_inv_on_tsx_state_changed called on event TSX_STATE
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The state change pertains to the endpoint 'sip-standard(PJSIP/sip-standard-00000177)'
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The inv session still has an invite_tsx (0x7fcb9805a5a8)
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The UAC INVITE transaction involved in this state change is 0x7fcb9805a5a8
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The current transaction state is Completed
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The transaction state change event is RX_MSG
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: The current inv state is DISCONNCTD
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: Received response
[May 16 17:57:47] DEBUG[26658] res_pjsip_session.c: Response is 486 Busy Here
[May 16 17:57:47] DEBUG[2577] channel.c: Hanging up channel 'PJSIP/sip-standard-00000177'
[May 16 17:58:22] Asterisk 13.9.1 built by root @ ip-172-19-4-82 on a x86_64 running Linux on 2016-05-16 16:04:27 UTC
{noformat}

I will attach the coredumps and full logs you will se something like this.

{code}
#0  0x00007fcba5f22cc9 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fcba5f260d8 in __GI_abort () at abort.c:89
#2  0x00007fcba5f1bb86 in __assert_fail_base (fmt=0x7fcba606c830 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion at entry=0x7fcba6d30a5c "mutex->__data.__owner == 0", file=file at entry=0x7fcba6d30a28 "../nptl/pthread_mutex_lock.c",
    line=line at entry=116, function=function at entry=0x7fcba6d30b60 <__PRETTY_FUNCTION__.8500> "__pthread_mutex_lock") at assert.c:92
#3  0x00007fcba5f1bc32 in __GI___assert_fail (assertion=0x7fcba6d30a5c "mutex->__data.__owner == 0", file=0x7fcba6d30a28 "../nptl/pthread_mutex_lock.c",
    line=116, function=0x7fcba6d30b60 <__PRETTY_FUNCTION__.8500> "__pthread_mutex_lock") at assert.c:101
#4  0x00007fcba6d2852f in __GI___pthread_mutex_lock (mutex=0xfefefefefefefe00) at ../nptl/pthread_mutex_lock.c:116
#5  0x0000000000539d25 in __ast_pthread_mutex_lock (filename=0x6324eb "channel.c", lineno=1036,
    func=0x6379e0 <__PRETTY_FUNCTION__.18225> "__ast_queue_frame", mutex_name=0x6336f0 "chan", t=0x7fcb94d36780) at lock.c:315
#6  0x000000000045d089 in __ao2_lock (user_data=0x7fcb94d367d8, lock_how=AO2_LOCK_REQ_MUTEX, file=0x6324eb "channel.c",
    func=0x6379e0 <__PRETTY_FUNCTION__.18225> "__ast_queue_frame", line=1036, var=0x6336f0 "chan") at astobj2.c:167
#7  0x00000000004ae8df in __ast_queue_frame (chan=0x7fcb94d367d8, fin=0x7fcb680d47d0, head=0, after=0x0) at channel.c:1036
#8  0x00000000004af0e8 in ast_queue_frame (chan=0x7fcb94d367d8, fin=0x7fcb680d47d0) at channel.c:1145
#9  0x00000000004af6d4 in ast_queue_control_data (chan=0x7fcb94d367d8, control=AST_CONTROL_PVT_CAUSE_CODE, data=0x7fcb680d48a0, datalen=109)
    at channel.c:1242
#10 0x00007fcb587a9c20 in chan_pjsip_incoming_response (session=0x7fcb9c671608, rdata=<optimized out>) at chan_pjsip.c:2319
#11 0x00007fcb69656af0 in handle_incoming_response (type=<optimized out>, response_priority=AST_SIP_SESSION_AFTER_MEDIA, rdata=0x7fcb9821d538,
    session=0x7fcb9c671608) at res_pjsip_session.c:2385
#12 handle_incoming (session=session at entry=0x7fcb9c671608, rdata=0x7fcb9821d538, response_priority=response_priority at entry=AST_SIP_SESSION_AFTER_MEDIA,
    type=<optimized out>) at res_pjsip_session.c:2402
#13 0x00007fcb69658788 in session_inv_on_tsx_state_changed (inv=0x7fcb9c462af8, tsx=0x7fcb9805a5a8, e=0x7fcb680d4b00) at res_pjsip_session.c:2569
{code}


> chan_pjsip: Crash on an immediate SIP final response
> ----------------------------------------------------
>
>                 Key: ASTERISK-25941
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25941
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_ari, Resources/res_pjsip
>    Affects Versions: 13.8.0, 13.8.1, 13.8.2, 13.9.0
>            Reporter: Javier Riveros 
>         Attachments: backtrace.txt, full_debug, gdb.txt, kern.log, session_end_response.diff
>
>
> *Description*
> I start experience constant asterisk crashes on our pro asterisk boxes seems like when the box is a little busy (+10 calls) asterisk is more susceptible.
> So the issue seems to be for outbound/inbound calls, i see this more frequent in outbound calls. Doing a outbound call into the stasis app and get a fast busy signal causes race condition opening the door to  crash occur.
> 1) outbound call  endpoint -> asterisk -> trunk.
> 2) trunk (486 Busy Here) -> asterisk   (crash).
> You will see a segfault like this :
> {noformat}
> 2016-04-19 18:41:30.783 | info | kernel[-] |[436624.469629] asterisk[3737]: segfault at c0 ip 00000000004ce35b sp 00007f00ee7446d0 error 4 in asterisk[400000+2bc000]
> {noformat}
> And a coredumps like this:
> {noformat}
> #0  0x00000000004ce35b in ast_channel_dialed_causes_add (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel_internal_api.c:1448
>         ao2_cause_code = 0x0
> #1  0x00000000004b7423 in ast_channel_hangupcause_hash_set (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel.c:4356
>         causevar = '\000' <repeats 16 times>, "@\t\224\272\226\177\000\000X\264\f\354\226\177\000\000p\a\224\272\226\177\000\000\236\341J\000\000\000\000\000\240\a\224\272\226\177\000\000h\316\f\354\226\177\000\000@\b\224\272\226\177\000\000\212\347J\000\000\000\000\000m\000\000\000\000\000\000\000`\b\224\272\226\177\000\000\260\a\224\272!\000\000\000h\316\f\354\226\177\000\000\004\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "m", '\000' <repeats 39 times>, "`\b\224\272\226\177", '\000' <repeats 59 times>...
>         __PRETTY_FUNCTION__ = "ast_channel_hangupcause_hash_set"
> #2  0x00007f96c35293cf in chan_pjsip_incoming_response (session=0x7f96ec0cb458, rdata=<optimized out>) at chan_pjsip.c:2319
>         status = {code = 486, reason = {ptr = <optimized out>, slen = 9}}
>         cause_code = 0x7f96ba940860
>         data_size = 109
>         __PRETTY_FUNCTION__ = "chan_pjsip_incoming_response"
> #3  0x00007f96e33cf846 in handle_incoming_response (type=<optimized out>, response_priority=AST_SIP_SESSION_AFTER_MEDIA, rdata=0x7f97049443d8,
>     session=0x7f96ec0cb458) at res_pjsip_session.c:2323
>         supplement = 0x7f96ec0ccab0
>         status = {code = <optimized out>, reason = {ptr = 0x7f9704945740 "Busy HereV\224\004\227\177", slen = 9}}
> #4  handle_incoming (session=session at entry=0x7f96ec0cb458, rdata=0x7f97049443d8, response_priority=response_priority at entry=AST_SIP_SESSION_AFTER_MEDIA,
>     type=<optimized out>) at res_pjsip_session.c:2337
> No locals.
> #5  0x00007f96e33d15e8 in session_inv_on_tsx_state_changed (inv=0x7f96ec0cb218, tsx=0x7f9718010868, e=0x7f96ba940ad0) at res_pjsip_session.c:2512
>         cb = 0x0
>         session = 0x7f96ec0cb458
>         tdata = 0x7f96ba940ad0
>         __PRETTY_FUNCTION__ = "session_inv_on_tsx_state_changed"
> #6  0x00007f972448b2d4 in mod_inv_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip-ua/sip_inv.c:699
>         dlg = 0x7f9718048e08
>         inv = 0x7f96ec0cb218
> #7  0x00007f97244d0fb0 in pjsip_dlg_on_tsx_state (dlg=0x7f9718048e08, tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_dialog.c:2013
>         i = 2
> #8  0x00007f97244d17c0 in mod_ua_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_ua_layer.c:178
>         dlg = 0x7f9718048e08
> #9  0x00007f97244ca70c in tsx_set_state (tsx=0x7f9718010868, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_RX_MSG,
>     event_src=0x7f97049443d8) at ../src/pjsip/sip_transaction.c:1213
>         e = {prev = 0x7f9704004ef0, next = 0x7f97040050c8, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7f97049443d8}, tsx_state = {src = {
>                 rdata = 0x7f97049443d8, tdata = 0x7f97049443d8, timer = 0x7f97049443d8, status = 76825560, data = 0x7f97049443d8},
>               tsx = 0x7f9718010868, prev_state = 3, type = PJSIP_EVENT_RX_MSG}, tx_msg = {tdata = 0x7f97049443d8}, tx_error = {tdata = 0x7f97049443d8,
>               tsx = 0x7f9718010868}, rx_msg = {rdata = 0x7f97049443d8}, user = {user1 = 0x7f97049443d8, user2 = 0x7f9718010868, user3 = 0x300000003,
>               user4 = 0x7f97049443d8}}}
>         prev_state = PJSIP_TSX_STATE_PROCEEDING
> ....
> {noformat}
> *To reproduce*
> I was able to reproduce this easy i just do the following. with two asterisk boxes one acting as a carrier/trunk "Asterisk A" and another with my basic stasis app "Asterisk B".
> 1) on "Asterisk A" i just create a simple dialplan that response 486 Busy or another 4XX code, this simulating our carrier given a fast 486 busy signal.
> {code}
> exten => _[+][1][7]X.,1,Noop("test")
>  same => n,Busy()
>  same => n,Hangup()
> exten => _[+][1][8]X.,1,Noop("test")
>  same => n,Dial(SIP/1002)  ; this endpoint isn't register
>  same => n,Hangup()
> {code} 
> 2) on "Asterisk B" in other to add a little of load i just stress the box a little bit until load average were like 2 "normal busy load average" {{sudo stress -d 2 --hdd-bytes 512M}}
> 3) on "Asterisk B" once the the box is busy i just make couple of outbound calls and asterisk crash, "this simulate an outbound call to a carrier that response a fast busy signal"
> There is some external ways i'm workaround this but please asterisk don't let you crash in fast SIP ERROR codes :).
> Attached are the coredumps and debug logs anything else is needed let me know.
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list