[asterisk-bugs] [JIRA] (ASTERISK-25941) ARI crash on a fast SIP ERROR response 4XX, 6XX, 5XX
Javier Riveros (JIRA)
noreply at issues.asterisk.org
Tue Apr 19 15:04:56 CDT 2016
Javier Riveros created ASTERISK-25941:
------------------------------------------
Summary: ARI crash on a fast SIP ERROR response 4XX, 6XX, 5XX
Key: ASTERISK-25941
URL: https://issues.asterisk.org/jira/browse/ASTERISK-25941
Project: Asterisk
Issue Type: Bug
Security Level: None
Components: Resources/res_ari, Resources/res_pjsip
Affects Versions: 13.8.1, 13.8.0
Reporter: Javier Riveros
*Description*
I start experience constant asterisk crashes on our pro asterisk boxes seems like when the box is a little busy (+10 calls) asterisk is more susceptible.
So the issue seems to be for outbound/inbound calls, i see this more frequent in outbound calls. Doing a outbound call into the stasis app and get a fast busy signal causes race condition opening the door to crash occur.
1) outbound call endpoint -> asterisk -> trunk.
2) trunk (486 Busy Here) -> asterisk (crash).
You will see a segfault like this :
{noformat}
2016-04-19 18:41:30.783 | info | kernel[-] |[436624.469629] asterisk[3737]: segfault at c0 ip 00000000004ce35b sp 00007f00ee7446d0 error 4 in asterisk[400000+2bc000]
{noformat}
And a coredumps like this:
{noformat}
#0 0x00000000004ce35b in ast_channel_dialed_causes_add (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel_internal_api.c:1448
ao2_cause_code = 0x0
#1 0x00000000004b7423 in ast_channel_hangupcause_hash_set (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel.c:4356
causevar = '\000' <repeats 16 times>, "@\t\224\272\226\177\000\000X\264\f\354\226\177\000\000p\a\224\272\226\177\000\000\236\341J\000\000\000\000\000\240\a\224\272\226\177\000\000h\316\f\354\226\177\000\000@\b\224\272\226\177\000\000\212\347J\000\000\000\000\000m\000\000\000\000\000\000\000`\b\224\272\226\177\000\000\260\a\224\272!\000\000\000h\316\f\354\226\177\000\000\004\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "m", '\000' <repeats 39 times>, "`\b\224\272\226\177", '\000' <repeats 59 times>...
__PRETTY_FUNCTION__ = "ast_channel_hangupcause_hash_set"
#2 0x00007f96c35293cf in chan_pjsip_incoming_response (session=0x7f96ec0cb458, rdata=<optimized out>) at chan_pjsip.c:2319
status = {code = 486, reason = {ptr = <optimized out>, slen = 9}}
cause_code = 0x7f96ba940860
data_size = 109
__PRETTY_FUNCTION__ = "chan_pjsip_incoming_response"
#3 0x00007f96e33cf846 in handle_incoming_response (type=<optimized out>, response_priority=AST_SIP_SESSION_AFTER_MEDIA, rdata=0x7f97049443d8,
session=0x7f96ec0cb458) at res_pjsip_session.c:2323
supplement = 0x7f96ec0ccab0
status = {code = <optimized out>, reason = {ptr = 0x7f9704945740 "Busy HereV\224\004\227\177", slen = 9}}
#4 handle_incoming (session=session at entry=0x7f96ec0cb458, rdata=0x7f97049443d8, response_priority=response_priority at entry=AST_SIP_SESSION_AFTER_MEDIA,
type=<optimized out>) at res_pjsip_session.c:2337
No locals.
#5 0x00007f96e33d15e8 in session_inv_on_tsx_state_changed (inv=0x7f96ec0cb218, tsx=0x7f9718010868, e=0x7f96ba940ad0) at res_pjsip_session.c:2512
cb = 0x0
session = 0x7f96ec0cb458
tdata = 0x7f96ba940ad0
__PRETTY_FUNCTION__ = "session_inv_on_tsx_state_changed"
#6 0x00007f972448b2d4 in mod_inv_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip-ua/sip_inv.c:699
dlg = 0x7f9718048e08
inv = 0x7f96ec0cb218
#7 0x00007f97244d0fb0 in pjsip_dlg_on_tsx_state (dlg=0x7f9718048e08, tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_dialog.c:2013
i = 2
#8 0x00007f97244d17c0 in mod_ua_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_ua_layer.c:178
dlg = 0x7f9718048e08
#9 0x00007f97244ca70c in tsx_set_state (tsx=0x7f9718010868, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_RX_MSG,
event_src=0x7f97049443d8) at ../src/pjsip/sip_transaction.c:1213
e = {prev = 0x7f9704004ef0, next = 0x7f97040050c8, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7f97049443d8}, tsx_state = {src = {
rdata = 0x7f97049443d8, tdata = 0x7f97049443d8, timer = 0x7f97049443d8, status = 76825560, data = 0x7f97049443d8},
tsx = 0x7f9718010868, prev_state = 3, type = PJSIP_EVENT_RX_MSG}, tx_msg = {tdata = 0x7f97049443d8}, tx_error = {tdata = 0x7f97049443d8,
tsx = 0x7f9718010868}, rx_msg = {rdata = 0x7f97049443d8}, user = {user1 = 0x7f97049443d8, user2 = 0x7f9718010868, user3 = 0x300000003,
user4 = 0x7f97049443d8}}}
prev_state = PJSIP_TSX_STATE_PROCEEDING
....
{noformat}
*To reproduce*
I was able to reproduce this easy i just do the following. with two asterisk boxes one acting as a carrier/trunk "Asterisk A" and another with my basic stasis app "Asterisk B".
1) on "Asterisk A" i just create a simple dialplan that response 486 Busy or another 4XX code, this simulating our carrier given a fast 486 busy signal.
{code}
exten => _[+][1][7]X.,1,Noop("test")
same => n,Busy()
same => n,Hangup()
exten => _[+][1][8]X.,1,Noop("test")
same => n,Dial(SIP/1002) ; this endpoint isn't register
same => n,Hangup()
{code}
2) on "Asterisk B" in other to add a little of load i just stress the box a little bit until load average were like 2 "normal busy load average" {{sudo stress -d 2 --hdd-bytes 512M}}
3) on "Asterisk B" once the the box is busy i just make couple of outbound calls and asterisk crash, "this simulate an outbound call to a carrier that response a fast busy signal"
There is some external ways i'm workaround this but please asterisk don't let you crash in fast SIP ERROR codes :).
Attached are the coredumps and debug logs anything else is needed let me know.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list