[asterisk-bugs] [JIRA] (ASTERISK-26344) Asterisk 13.11.0 + PJSIP crash

Richard Mudgett (JIRA) noreply at issues.asterisk.org
Mon Oct 17 18:10:04 CDT 2016


    [ https://issues.asterisk.org/jira/browse/ASTERISK-26344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=232755#comment-232755 ] 

Richard Mudgett commented on ASTERISK-26344:
--------------------------------------------

Both ASTERISK-26387 and ASTERISK-26344 are likely the same issue. I have been studying the logs from both. In the ASTERISK-26387 logs, OPTIONS ping responses for endpoint qualification are being processed by a different serializer than the one that sent the request, which can cause reentrancy problems (e.g. crashes): the outgoing OPTIONS requests go out on a pjsip/default serializer, while the responses are processed by a pjsip/distributor serializer because the distributor cannot find the original serializer that sent the request. I also noticed that when this happened, the updated contact status was being reported for an endpoint that needed DNS resolution (sip:sbc.anveno.com was one). In the ASTERISK-26344 logs a similar thing is happening, but for outbound registration requests needing DNS resolution: the REGISTER response is being processed by a pjsip/distributor serializer while the request went out on a pjsip/outreg serializer.
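
To illustrate the hazard class, here is a generic sketch in plain C of the serializer idea (not Asterisk source; all names are illustrative). A serializer is a single-threaded task queue: every task that touches one SIP object is pushed onto the same queue, so the request, its timers, and its response handling can never run concurrently. The crash scenario above is equivalent to pushing handle_response() onto a second, unrelated queue.

{noformat}
/*
 * Generic sketch only (not Asterisk source).  A "serializer" here is a
 * single-threaded task queue: every task touching one SIP object goes
 * onto the same queue, so request, timer, and response handling cannot
 * run concurrently.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct task {
    void (*fn)(void *);
    void *data;
    struct task *next;
};

struct serializer {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct task *head, *tail;
    int stop;
};

static void *serializer_run(void *arg)
{
    struct serializer *s = arg;

    for (;;) {
        struct task *t;

        pthread_mutex_lock(&s->lock);
        while (!s->head && !s->stop) {
            pthread_cond_wait(&s->cond, &s->lock);
        }
        if (!s->head) {          /* stopped and drained */
            pthread_mutex_unlock(&s->lock);
            return NULL;
        }
        t = s->head;
        s->head = t->next;
        if (!s->head) {
            s->tail = NULL;
        }
        pthread_mutex_unlock(&s->lock);

        t->fn(t->data);          /* strictly one task at a time */
        free(t);
    }
}

static void serializer_push(struct serializer *s, void (*fn)(void *), void *data)
{
    struct task *t = malloc(sizeof(*t));

    t->fn = fn;
    t->data = data;
    t->next = NULL;
    pthread_mutex_lock(&s->lock);
    if (s->tail) {
        s->tail->next = t;
    } else {
        s->head = t;
    }
    s->tail = t;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static void send_register(void *data)   { printf("REGISTER out: %s\n", (char *) data); }
static void handle_response(void *data) { printf("response in:  %s\n", (char *) data); }

int main(void)
{
    struct serializer s = { .lock = PTHREAD_MUTEX_INITIALIZER,
                            .cond = PTHREAD_COND_INITIALIZER };
    pthread_t thread;

    pthread_create(&thread, NULL, serializer_run, &s);

    /* Correct: both tasks for this registration use the SAME queue.
     * The bug described above is equivalent to pushing handle_response
     * onto a different serializer than send_register. */
    serializer_push(&s, send_register, "reg0");
    serializer_push(&s, handle_response, "reg0");

    pthread_mutex_lock(&s.lock);
    s.stop = 1;
    pthread_cond_signal(&s.cond);
    pthread_mutex_unlock(&s.lock);
    pthread_join(thread, NULL);
    return 0;
}
{noformat}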

> Asterisk 13.11.0 + PJSIP crash
> ------------------------------
>
>                 Key: ASTERISK-26344
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26344
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: pjproject/pjsip
>    Affects Versions: 13.11.0
>         Environment: Centos 6.8 (64-bit) + Asterisk-13.11.0 (with bundled 2.5.5 pjsip) + libsrtp-1.5.4
>            Reporter: Ian Gilmour
>            Assignee: Richard Mudgett
>         Attachments: 0001-r5400-pjsip_tx_data_dec_ref.patch, cli-and-gdb-3-crashes.tgz, cli-and-gdb-bt-on-destroying_tx_data.tgz, cli-and-gdb-inc-dec-ref-logging.tgz, cli-and-gdb.tgz
>
>
> Hi,
> I have a development Asterisk 13.11.0 test setup (uses the bundled pjsip-2.5.5).
> Environment is Centos 6.8 (64-bit) + Asterisk-13.11.0 + libsrtp-1.5.4.
> On startup Asterisk registers 5 PJSIP users with a remote OpenSIPS server over TLS. As part of the test, all 5 PJSIP users are re-registered with the OpenSIPS server every couple of minutes.
> All outgoing/incoming PJSIP call media is encrypted using SRTP and relayed via an external RTPPROXY running alongside the external OpenSIPS server.
> Asterisk is also configured to use chan_sip on 127.0.0.1:5060 to allow calls from a locally run SIPp process. All SIPp calls are TCP+RTP.
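>
> (Illustrative only: a pjsip.conf sketch of this kind of outbound registration setup, not the actual config. Section names, credentials, the transport name, and the expiration are placeholders.)
> {noformat}
> [transport-tls]
> type=transport
> protocol=tls
> bind=0.0.0.0:5061
> ; plus cert_file/ca_list_file etc. for TLS
>
> [auth-user1]
> type=auth
> auth_type=userpass
> username=user1
> password=secret
>
> [reg-user1]
> type=registration
> transport=transport-tls
> outbound_auth=auth-user1
> server_uri=sip:opensips.example.com
> client_uri=sip:user1@opensips.example.com
> expiration=120
> {noformat}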
> I use SIPp to run multiple concurrent loopback calls (of varying duration) through Asterisk to the OpenSIPS server and back to an echo() service running on the same Asterisk.
> i.e.
> {noformat}
>   SIPp <-TCP/RTP-> chan_sip <-> chan_pjsip <-TLS/SRTP->
>       OpenSIPS server (+ rtpproxy) <-TLS/SRTP-> chan_pjsip (echo service).
> {noformat}
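>
> (For reference, an illustrative SIPp invocation of this shape; the dialed extension, pacing, and call counts are placeholders, not the exact test:)
> {noformat}
>   sipp 127.0.0.1:5060 -sn uac_pcap -t t1 -s echo -r 2 -l 20 -m 1000 -d 20000
> {noformat}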
> Initially I see all chan_pjsip registrations and re-registrations for all 5 PJSIP users go out through a single TCP port. I then start a SIPp test running multiple concurrent calls. At some point into the test the Asterisk PJSIP TCP port gets closed and reopened, and shortly afterwards Asterisk crashes. Possibly significantly(?), the crash happened around the time one of the PJSIP users should have re-registered after the outgoing TCP port change (the log shows all 5 PJSIP users re-registering after the port change, but only 4 of the 5 re-registering a second time before the crash).
> {noformat}
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffa2814700 (LWP 7166)]
> __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
> 50        unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
> (gdb) bt
> #0  __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
> #1  0x00007ffff78e9d9b in pj_mutex_lock (mutex=0x44492d6c6c6143) at ../src/pj/os_core_unix.c:1265
> #2  0x00007ffff78e9e39 in pj_atomic_dec_and_get (atomic_var=0x7fffd8074630) at ../src/pj/os_core_unix.c:962
> #3  0x00007ffff787d7e0 in pjsip_tx_data_dec_ref (tdata=0x7fff8c3bfab8) at ../src/pjsip/sip_transport.c:495
> #4  0x00007ffff788a087 in tsx_shutdown (tsx=0x7fff94060a98) at ../src/pjsip/sip_transaction.c:1062
> #5  0x00007ffff788b4bc in tsx_set_state (tsx=0x7fff94060a98, state=PJSIP_TSX_STATE_DESTROYED, event_src_type=PJSIP_EVENT_TIMER, event_src=0x7fff94060c50, flag=0) at ../src/pjsip/sip_transaction.c:1271
> #6  0x00007ffff788b88e in tsx_on_state_terminated (tsx=<value optimized out>, event=<value optimized out>) at ../src/pjsip/sip_transaction.c:3337
> #7  0x00007ffff788bcd5 in tsx_timer_callback (theap=<value optimized out>, entry=0x7fff94060c50) at ../src/pjsip/sip_transaction.c:1171
> #8  0x00007ffff78fc449 in pj_timer_heap_poll (ht=0x1137950, next_delay=0x7fffa2813d30) at ../src/pj/timer.c:643
> #9  0x00007ffff7875b19 in pjsip_endpt_handle_events2 (endpt=0x1137668, max_timeout=0x7fffa2813d70, p_count=0x0) at ../src/pjsip/sip_endpoint.c:712
> #10 0x00007ffff1320b00 in monitor_thread_exec (endpt=<value optimized out>) at res_pjsip.c:3889
> #11 0x00007ffff78ea5d6 in thread_main (param=0x114dee8) at ../src/pj/os_core_unix.c:541
> #12 0x00007ffff5a8faa1 in start_thread (arg=0x7fffa2814700) at pthread_create.c:301
> #13 0x00007ffff509baad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> (gdb) bt full
> #0  __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
>         type = <value optimized out>
>         id = <value optimized out>
> #1  0x00007ffff78e9d9b in pj_mutex_lock (mutex=0x44492d6c6c6143) at ../src/pj/os_core_unix.c:1265
>         status = <value optimized out>
> #2  0x00007ffff78e9e39 in pj_atomic_dec_and_get (atomic_var=0x7fffd8074630) at ../src/pj/os_core_unix.c:962
>         new_value = <value optimized out>
> #3  0x00007ffff787d7e0 in pjsip_tx_data_dec_ref (tdata=0x7fff8c3bfab8) at ../src/pjsip/sip_transport.c:495
> No locals.
> #4  0x00007ffff788a087 in tsx_shutdown (tsx=0x7fff94060a98) at ../src/pjsip/sip_transaction.c:1062
> No locals.
> #5  0x00007ffff788b4bc in tsx_set_state (tsx=0x7fff94060a98, state=PJSIP_TSX_STATE_DESTROYED, event_src_type=PJSIP_EVENT_TIMER, event_src=0x7fff94060c50, flag=0) at ../src/pjsip/sip_transaction.c:1271
>         prev_state = PJSIP_TSX_STATE_TERMINATED
> #6  0x00007ffff788b88e in tsx_on_state_terminated (tsx=<value optimized out>, event=<value optimized out>) at ../src/pjsip/sip_transaction.c:3337
> No locals.
> #7  0x00007ffff788bcd5 in tsx_timer_callback (theap=<value optimized out>, entry=0x7fff94060c50) at ../src/pjsip/sip_transaction.c:1171
>         event = {prev = 0x7fff8c5f4908, next = 0x1bfe, type = PJSIP_EVENT_TIMER, body = {timer = {entry = 0x7fff94060c50}, tsx_state = {src = {rdata = 0x7fff94060c50, tdata = 0x7fff94060c50, timer = 0x7fff94060c50, status = -1811542960, data = 0x7fff94060c50}, 
>               tsx = 0x7fffa2813c90, prev_state = -1568588592, type = 32767}, tx_msg = {tdata = 0x7fff94060c50}, tx_error = {tdata = 0x7fff94060c50, tsx = 0x7fffa2813c90}, rx_msg = {rdata = 0x7fff94060c50}, user = {user1 = 0x7fff94060c50, user2 = 0x7fffa2813c90, 
>               user3 = 0x7fffa2813cd0, user4 = 0x0}}}
>         tsx = 0x7fff94060a98
> #8  0x00007ffff78fc449 in pj_timer_heap_poll (ht=0x1137950, next_delay=0x7fffa2813d30) at ../src/pj/timer.c:643
>         node = 0x7fff94060c50
>         grp_lock = 0x7fffd8000ab8
>         now = {sec = 613363, msec = 925}
>         count = 2
> #9  0x00007ffff7875b19 in pjsip_endpt_handle_events2 (endpt=0x1137668, max_timeout=0x7fffa2813d70, p_count=0x0) at ../src/pjsip/sip_endpoint.c:712
>         timeout = {sec = 0, msec = 0}
>         count = 0
>         net_event_count = 0
>         c = <value optimized out>
> #10 0x00007ffff1320b00 in monitor_thread_exec (endpt=<value optimized out>) at res_pjsip.c:3889
>         delay = {sec = 0, msec = 10}
> #11 0x00007ffff78ea5d6 in thread_main (param=0x114dee8) at ../src/pj/os_core_unix.c:541
>         rec = 0x114dee8
>         result = <value optimized out>
> #12 0x00007ffff5a8faa1 in start_thread (arg=0x7fffa2814700) at pthread_create.c:301
>         __res = <value optimized out>
>         pd = 0x7fffa2814700
>         now = <value optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735919769344, -4896504223120570676, 140737488337344, 140735919770048, 0, 3, 4896356555646224076, 4896525551845689036}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, 
>               canceltype = 0}}}
>         not_first_call = <value optimized out>
>         pagesize_m1 = <value optimized out>
>         sp = <value optimized out>
>         freesize = <value optimized out>
> #13 0x00007ffff509baad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> No locals.
> {noformat}
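>
> For context, pjsip_tx_data_dec_ref() (frame #3) destroys the tdata when its reference count drops to zero. A sketch of the intended ownership pattern (illustrative only, not the failing code):
> {noformat}
> #include <pjsip.h>
>
> /*
>  * Illustrative sketch of the tx_data reference pattern that frames
>  * #3/#4 above rely on.  A caller that wants to keep using a tdata
>  * after handing it to the transaction layer must hold its own
>  * reference.  Releasing that reference twice (or from two racing
>  * threads) destroys the tdata while the transaction timer still
>  * points at it, so the later pjsip_tx_data_dec_ref() in
>  * tsx_shutdown() reads freed, reused memory.  The garbage mutex
>  * pointer in frame #0 (0x44492d6c6c6143 is the ASCII bytes of
>  * "Call-ID") is consistent with exactly that.
>  */
> static void keep_tdata_for_later(pjsip_tx_data *tdata)
> {
>     /* Take our own reference before the transaction layer sends it. */
>     pjsip_tx_data_add_ref(tdata);
>
>     /* ... tdata is sent; the transaction holds its own reference ... */
>
>     /* Release exactly once when finished.  A second dec_ref here, or
>      * one from another thread racing this one, is the use-after-free
>      * class discussed in this issue. */
>     pjsip_tx_data_dec_ref(tdata);
> }
> {noformat}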



--
This message was sent by Atlassian JIRA
(v6.2#6252)


