[asterisk-bugs] [JIRA] (ASTERISK-25941) ARI CRASH on a fast SIP ERROR response 4XX, 6XX, 5XX
Javier Riveros (JIRA)
noreply at issues.asterisk.org
Wed Apr 20 12:05:56 CDT 2016
[ https://issues.asterisk.org/jira/browse/ASTERISK-25941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Javier Riveros updated ASTERISK-25941:
---------------------------------------
Affects Version/s: 13.8.2
> ARI CRASH on a fast SIP ERROR response 4XX, 6XX, 5XX
> ----------------------------------------------------
>
> Key: ASTERISK-25941
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-25941
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Resources/res_ari, Resources/res_pjsip
> Affects Versions: 13.8.0, 13.8.1, 13.8.2
> Reporter: Javier Riveros
> Attachments: backtrace.txt, full_debug, gdb.txt, kern.log
>
>
> *Description*
> I start experience constant asterisk crashes on our pro asterisk boxes seems like when the box is a little busy (+10 calls) asterisk is more susceptible.
> So the issue seems to be for outbound/inbound calls, i see this more frequent in outbound calls. Doing a outbound call into the stasis app and get a fast busy signal causes race condition opening the door to crash occur.
> 1) outbound call endpoint -> asterisk -> trunk.
> 2) trunk (486 Busy Here) -> asterisk (crash).
> You will see a segfault like this :
> {noformat}
> 2016-04-19 18:41:30.783 | info | kernel[-] |[436624.469629] asterisk[3737]: segfault at c0 ip 00000000004ce35b sp 00007f00ee7446d0 error 4 in asterisk[400000+2bc000]
> {noformat}
> And a coredumps like this:
> {noformat}
> #0 0x00000000004ce35b in ast_channel_dialed_causes_add (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel_internal_api.c:1448
> ao2_cause_code = 0x0
> #1 0x00000000004b7423 in ast_channel_hangupcause_hash_set (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel.c:4356
> causevar = '\000' <repeats 16 times>, "@\t\224\272\226\177\000\000X\264\f\354\226\177\000\000p\a\224\272\226\177\000\000\236\341J\000\000\000\000\000\240\a\224\272\226\177\000\000h\316\f\354\226\177\000\000@\b\224\272\226\177\000\000\212\347J\000\000\000\000\000m\000\000\000\000\000\000\000`\b\224\272\226\177\000\000\260\a\224\272!\000\000\000h\316\f\354\226\177\000\000\004\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "m", '\000' <repeats 39 times>, "`\b\224\272\226\177", '\000' <repeats 59 times>...
> __PRETTY_FUNCTION__ = "ast_channel_hangupcause_hash_set"
> #2 0x00007f96c35293cf in chan_pjsip_incoming_response (session=0x7f96ec0cb458, rdata=<optimized out>) at chan_pjsip.c:2319
> status = {code = 486, reason = {ptr = <optimized out>, slen = 9}}
> cause_code = 0x7f96ba940860
> data_size = 109
> __PRETTY_FUNCTION__ = "chan_pjsip_incoming_response"
> #3 0x00007f96e33cf846 in handle_incoming_response (type=<optimized out>, response_priority=AST_SIP_SESSION_AFTER_MEDIA, rdata=0x7f97049443d8,
> session=0x7f96ec0cb458) at res_pjsip_session.c:2323
> supplement = 0x7f96ec0ccab0
> status = {code = <optimized out>, reason = {ptr = 0x7f9704945740 "Busy HereV\224\004\227\177", slen = 9}}
> #4 handle_incoming (session=session at entry=0x7f96ec0cb458, rdata=0x7f97049443d8, response_priority=response_priority at entry=AST_SIP_SESSION_AFTER_MEDIA,
> type=<optimized out>) at res_pjsip_session.c:2337
> No locals.
> #5 0x00007f96e33d15e8 in session_inv_on_tsx_state_changed (inv=0x7f96ec0cb218, tsx=0x7f9718010868, e=0x7f96ba940ad0) at res_pjsip_session.c:2512
> cb = 0x0
> session = 0x7f96ec0cb458
> tdata = 0x7f96ba940ad0
> __PRETTY_FUNCTION__ = "session_inv_on_tsx_state_changed"
> #6 0x00007f972448b2d4 in mod_inv_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip-ua/sip_inv.c:699
> dlg = 0x7f9718048e08
> inv = 0x7f96ec0cb218
> #7 0x00007f97244d0fb0 in pjsip_dlg_on_tsx_state (dlg=0x7f9718048e08, tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_dialog.c:2013
> i = 2
> #8 0x00007f97244d17c0 in mod_ua_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_ua_layer.c:178
> dlg = 0x7f9718048e08
> #9 0x00007f97244ca70c in tsx_set_state (tsx=0x7f9718010868, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_RX_MSG,
> event_src=0x7f97049443d8) at ../src/pjsip/sip_transaction.c:1213
> e = {prev = 0x7f9704004ef0, next = 0x7f97040050c8, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7f97049443d8}, tsx_state = {src = {
> rdata = 0x7f97049443d8, tdata = 0x7f97049443d8, timer = 0x7f97049443d8, status = 76825560, data = 0x7f97049443d8},
> tsx = 0x7f9718010868, prev_state = 3, type = PJSIP_EVENT_RX_MSG}, tx_msg = {tdata = 0x7f97049443d8}, tx_error = {tdata = 0x7f97049443d8,
> tsx = 0x7f9718010868}, rx_msg = {rdata = 0x7f97049443d8}, user = {user1 = 0x7f97049443d8, user2 = 0x7f9718010868, user3 = 0x300000003,
> user4 = 0x7f97049443d8}}}
> prev_state = PJSIP_TSX_STATE_PROCEEDING
> ....
> {noformat}
> *To reproduce*
> I was able to reproduce this easy i just do the following. with two asterisk boxes one acting as a carrier/trunk "Asterisk A" and another with my basic stasis app "Asterisk B".
> 1) on "Asterisk A" i just create a simple dialplan that response 486 Busy or another 4XX code, this simulating our carrier given a fast 486 busy signal.
> {code}
> exten => _[+][1][7]X.,1,Noop("test")
> same => n,Busy()
> same => n,Hangup()
> exten => _[+][1][8]X.,1,Noop("test")
> same => n,Dial(SIP/1002) ; this endpoint isn't register
> same => n,Hangup()
> {code}
> 2) on "Asterisk B" in other to add a little of load i just stress the box a little bit until load average were like 2 "normal busy load average" {{sudo stress -d 2 --hdd-bytes 512M}}
> 3) on "Asterisk B" once the the box is busy i just make couple of outbound calls and asterisk crash, "this simulate an outbound call to a carrier that response a fast busy signal"
> There is some external ways i'm workaround this but please asterisk don't let you crash in fast SIP ERROR codes :).
> Attached are the coredumps and debug logs anything else is needed let me know.
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list