[asterisk-bugs] [JIRA] (ASTERISK-25941) chan_pjsip: Crash on an immediate SIP final response

Mark Michelson (JIRA) noreply at issues.asterisk.org
Wed May 11 12:53:56 CDT 2016


     [ https://issues.asterisk.org/jira/browse/ASTERISK-25941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Michelson updated ASTERISK-25941:
--------------------------------------

    Attachment: session_end_response.diff

I am attaching session_end_response.diff to the issue.

I believe the problem here is that there is a race condition caused by the fact that the chan_pjsip channel driver is told that the session has ended, and then is later given the 486 response to handle. This change makes it so that the 486 gets handled first, then the session is ended. Please test this and see if you have the same problem.

> chan_pjsip: Crash on an immediate SIP final response
> ----------------------------------------------------
>
>                 Key: ASTERISK-25941
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25941
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_ari, Resources/res_pjsip
>    Affects Versions: 13.8.0, 13.8.1, 13.8.2, 13.9.0
>            Reporter: Javier Riveros 
>         Attachments: backtrace.txt, full_debug, gdb.txt, kern.log, session_end_response.diff
>
>
> *Description*
> I start experience constant asterisk crashes on our pro asterisk boxes seems like when the box is a little busy (+10 calls) asterisk is more susceptible.
> So the issue seems to be for outbound/inbound calls, i see this more frequent in outbound calls. Doing a outbound call into the stasis app and get a fast busy signal causes race condition opening the door to  crash occur.
> 1) outbound call  endpoint -> asterisk -> trunk.
> 2) trunk (486 Busy Here) -> asterisk   (crash).
> You will see a segfault like this :
> {noformat}
> 2016-04-19 18:41:30.783 | info | kernel[-] |[436624.469629] asterisk[3737]: segfault at c0 ip 00000000004ce35b sp 00007f00ee7446d0 error 4 in asterisk[400000+2bc000]
> {noformat}
> And a coredumps like this:
> {noformat}
> #0  0x00000000004ce35b in ast_channel_dialed_causes_add (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel_internal_api.c:1448
>         ao2_cause_code = 0x0
> #1  0x00000000004b7423 in ast_channel_hangupcause_hash_set (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel.c:4356
>         causevar = '\000' <repeats 16 times>, "@\t\224\272\226\177\000\000X\264\f\354\226\177\000\000p\a\224\272\226\177\000\000\236\341J\000\000\000\000\000\240\a\224\272\226\177\000\000h\316\f\354\226\177\000\000@\b\224\272\226\177\000\000\212\347J\000\000\000\000\000m\000\000\000\000\000\000\000`\b\224\272\226\177\000\000\260\a\224\272!\000\000\000h\316\f\354\226\177\000\000\004\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "m", '\000' <repeats 39 times>, "`\b\224\272\226\177", '\000' <repeats 59 times>...
>         __PRETTY_FUNCTION__ = "ast_channel_hangupcause_hash_set"
> #2  0x00007f96c35293cf in chan_pjsip_incoming_response (session=0x7f96ec0cb458, rdata=<optimized out>) at chan_pjsip.c:2319
>         status = {code = 486, reason = {ptr = <optimized out>, slen = 9}}
>         cause_code = 0x7f96ba940860
>         data_size = 109
>         __PRETTY_FUNCTION__ = "chan_pjsip_incoming_response"
> #3  0x00007f96e33cf846 in handle_incoming_response (type=<optimized out>, response_priority=AST_SIP_SESSION_AFTER_MEDIA, rdata=0x7f97049443d8,
>     session=0x7f96ec0cb458) at res_pjsip_session.c:2323
>         supplement = 0x7f96ec0ccab0
>         status = {code = <optimized out>, reason = {ptr = 0x7f9704945740 "Busy HereV\224\004\227\177", slen = 9}}
> #4  handle_incoming (session=session at entry=0x7f96ec0cb458, rdata=0x7f97049443d8, response_priority=response_priority at entry=AST_SIP_SESSION_AFTER_MEDIA,
>     type=<optimized out>) at res_pjsip_session.c:2337
> No locals.
> #5  0x00007f96e33d15e8 in session_inv_on_tsx_state_changed (inv=0x7f96ec0cb218, tsx=0x7f9718010868, e=0x7f96ba940ad0) at res_pjsip_session.c:2512
>         cb = 0x0
>         session = 0x7f96ec0cb458
>         tdata = 0x7f96ba940ad0
>         __PRETTY_FUNCTION__ = "session_inv_on_tsx_state_changed"
> #6  0x00007f972448b2d4 in mod_inv_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip-ua/sip_inv.c:699
>         dlg = 0x7f9718048e08
>         inv = 0x7f96ec0cb218
> #7  0x00007f97244d0fb0 in pjsip_dlg_on_tsx_state (dlg=0x7f9718048e08, tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_dialog.c:2013
>         i = 2
> #8  0x00007f97244d17c0 in mod_ua_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_ua_layer.c:178
>         dlg = 0x7f9718048e08
> #9  0x00007f97244ca70c in tsx_set_state (tsx=0x7f9718010868, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_RX_MSG,
>     event_src=0x7f97049443d8) at ../src/pjsip/sip_transaction.c:1213
>         e = {prev = 0x7f9704004ef0, next = 0x7f97040050c8, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7f97049443d8}, tsx_state = {src = {
>                 rdata = 0x7f97049443d8, tdata = 0x7f97049443d8, timer = 0x7f97049443d8, status = 76825560, data = 0x7f97049443d8},
>               tsx = 0x7f9718010868, prev_state = 3, type = PJSIP_EVENT_RX_MSG}, tx_msg = {tdata = 0x7f97049443d8}, tx_error = {tdata = 0x7f97049443d8,
>               tsx = 0x7f9718010868}, rx_msg = {rdata = 0x7f97049443d8}, user = {user1 = 0x7f97049443d8, user2 = 0x7f9718010868, user3 = 0x300000003,
>               user4 = 0x7f97049443d8}}}
>         prev_state = PJSIP_TSX_STATE_PROCEEDING
> ....
> {noformat}
> *To reproduce*
> I was able to reproduce this easy i just do the following. with two asterisk boxes one acting as a carrier/trunk "Asterisk A" and another with my basic stasis app "Asterisk B".
> 1) on "Asterisk A" i just create a simple dialplan that response 486 Busy or another 4XX code, this simulating our carrier given a fast 486 busy signal.
> {code}
> exten => _[+][1][7]X.,1,Noop("test")
>  same => n,Busy()
>  same => n,Hangup()
> exten => _[+][1][8]X.,1,Noop("test")
>  same => n,Dial(SIP/1002)  ; this endpoint isn't register
>  same => n,Hangup()
> {code} 
> 2) on "Asterisk B" in other to add a little of load i just stress the box a little bit until load average were like 2 "normal busy load average" {{sudo stress -d 2 --hdd-bytes 512M}}
> 3) on "Asterisk B" once the the box is busy i just make couple of outbound calls and asterisk crash, "this simulate an outbound call to a carrier that response a fast busy signal"
> There is some external ways i'm workaround this but please asterisk don't let you crash in fast SIP ERROR codes :).
> Attached are the coredumps and debug logs anything else is needed let me know.
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list