[asterisk-bugs] [JIRA] (ASTERISK-25941) ARI crash on a fast SIP ERROR response 4XX, 6XX, 5XX

Javier Riveros (JIRA) noreply at issues.asterisk.org
Tue Apr 19 15:04:56 CDT 2016


Javier Riveros  created ASTERISK-25941:
------------------------------------------

             Summary: ARI crash on a fast SIP ERROR response 4XX, 6XX, 5XX
                 Key: ASTERISK-25941
                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25941
             Project: Asterisk
          Issue Type: Bug
      Security Level: None
          Components: Resources/res_ari, Resources/res_pjsip
    Affects Versions: 13.8.1, 13.8.0
            Reporter: Javier Riveros 


*Description*
I started experiencing constant Asterisk crashes on our production Asterisk boxes. Asterisk seems more susceptible when the box is a little busy (10+ calls).

The issue seems to affect both outbound and inbound calls, but I see it more frequently on outbound calls. Making an outbound call into the Stasis app and getting a fast busy signal causes a race condition that opens the door to a crash.

1) outbound call  endpoint -> asterisk -> trunk.
2) trunk (486 Busy Here) -> asterisk   (crash).

You will see a segfault like this:
{noformat}
2016-04-19 18:41:30.783 | info | kernel[-] |[436624.469629] asterisk[3737]: segfault at c0 ip 00000000004ce35b sp 00007f00ee7446d0 error 4 in asterisk[400000+2bc000]
{noformat}

And a core dump like this:

{noformat}
#0  0x00000000004ce35b in ast_channel_dialed_causes_add (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel_internal_api.c:1448
        ao2_cause_code = 0x0
#1  0x00000000004b7423 in ast_channel_hangupcause_hash_set (chan=0x0, cause_code=0x7f96ba940860, datalen=109) at channel.c:4356
        causevar = '\000' <repeats 16 times>, "@\t\224\272\226\177\000\000X\264\f\354\226\177\000\000p\a\224\272\226\177\000\000\236\341J\000\000\000\000\000\240\a\224\272\226\177\000\000h\316\f\354\226\177\000\000@\b\224\272\226\177\000\000\212\347J\000\000\000\000\000m\000\000\000\000\000\000\000`\b\224\272\226\177\000\000\260\a\224\272!\000\000\000h\316\f\354\226\177\000\000\004\000\000\000\000\000\000\000!", '\000' <repeats 23 times>, "m", '\000' <repeats 39 times>, "`\b\224\272\226\177", '\000' <repeats 59 times>...
        __PRETTY_FUNCTION__ = "ast_channel_hangupcause_hash_set"
#2  0x00007f96c35293cf in chan_pjsip_incoming_response (session=0x7f96ec0cb458, rdata=<optimized out>) at chan_pjsip.c:2319
        status = {code = 486, reason = {ptr = <optimized out>, slen = 9}}
        cause_code = 0x7f96ba940860
        data_size = 109
        __PRETTY_FUNCTION__ = "chan_pjsip_incoming_response"
#3  0x00007f96e33cf846 in handle_incoming_response (type=<optimized out>, response_priority=AST_SIP_SESSION_AFTER_MEDIA, rdata=0x7f97049443d8,
    session=0x7f96ec0cb458) at res_pjsip_session.c:2323
        supplement = 0x7f96ec0ccab0
        status = {code = <optimized out>, reason = {ptr = 0x7f9704945740 "Busy HereV\224\004\227\177", slen = 9}}
#4  handle_incoming (session=session@entry=0x7f96ec0cb458, rdata=0x7f97049443d8, response_priority=response_priority@entry=AST_SIP_SESSION_AFTER_MEDIA,
    type=<optimized out>) at res_pjsip_session.c:2337
No locals.
#5  0x00007f96e33d15e8 in session_inv_on_tsx_state_changed (inv=0x7f96ec0cb218, tsx=0x7f9718010868, e=0x7f96ba940ad0) at res_pjsip_session.c:2512
        cb = 0x0
        session = 0x7f96ec0cb458
        tdata = 0x7f96ba940ad0
        __PRETTY_FUNCTION__ = "session_inv_on_tsx_state_changed"
#6  0x00007f972448b2d4 in mod_inv_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip-ua/sip_inv.c:699
        dlg = 0x7f9718048e08
        inv = 0x7f96ec0cb218
#7  0x00007f97244d0fb0 in pjsip_dlg_on_tsx_state (dlg=0x7f9718048e08, tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_dialog.c:2013
        i = 2
#8  0x00007f97244d17c0 in mod_ua_on_tsx_state (tsx=0x7f9718010868, e=0x7f96ba940ad0) at ../src/pjsip/sip_ua_layer.c:178
        dlg = 0x7f9718048e08
#9  0x00007f97244ca70c in tsx_set_state (tsx=0x7f9718010868, state=PJSIP_TSX_STATE_COMPLETED, event_src_type=PJSIP_EVENT_RX_MSG,
    event_src=0x7f97049443d8) at ../src/pjsip/sip_transaction.c:1213
        e = {prev = 0x7f9704004ef0, next = 0x7f97040050c8, type = PJSIP_EVENT_TSX_STATE, body = {timer = {entry = 0x7f97049443d8}, tsx_state = {src = {
                rdata = 0x7f97049443d8, tdata = 0x7f97049443d8, timer = 0x7f97049443d8, status = 76825560, data = 0x7f97049443d8},
              tsx = 0x7f9718010868, prev_state = 3, type = PJSIP_EVENT_RX_MSG}, tx_msg = {tdata = 0x7f97049443d8}, tx_error = {tdata = 0x7f97049443d8,
              tsx = 0x7f9718010868}, rx_msg = {rdata = 0x7f97049443d8}, user = {user1 = 0x7f97049443d8, user2 = 0x7f9718010868, user3 = 0x300000003,
              user4 = 0x7f97049443d8}}}
        prev_state = PJSIP_TSX_STATE_PROCEEDING

....
{noformat}

*To reproduce*
I was able to reproduce this easily with two Asterisk boxes: one acting as a carrier/trunk ("Asterisk A") and another running my basic Stasis app ("Asterisk B").

1) On "Asterisk A", I created a simple dialplan that responds with 486 Busy or another 4XX code, simulating our carrier giving a fast 486 busy signal.

{code}
exten => _[+][1][7]X.,1,Noop("test")
 same => n,Busy()
 same => n,Hangup()

exten => _[+][1][8]X.,1,Noop("test")
 same => n,Dial(SIP/1002)  ; this endpoint isn't registered
 same => n,Hangup()
{code} 

2) On "Asterisk B", to add a little load, I stressed the box until the load average was around 2 (a "normal busy" load average): {{sudo stress -d 2 --hdd-bytes 512M}}

3) On "Asterisk B", once the box is busy, I make a couple of outbound calls and Asterisk crashes. This simulates an outbound call to a carrier that responds with a fast busy signal.

There are some external ways I'm working around this, but please, Asterisk, don't crash on fast SIP error codes :).

Attached are the core dumps and debug logs. If anything else is needed, let me know.

 





