[asterisk-bugs] [JIRA] (ASTERISK-26344) Asterisk 13.11.0 + PJSIP crash

Richard Mudgett (JIRA) noreply at issues.asterisk.org
Tue Oct 25 17:45:02 CDT 2016


     [ https://issues.asterisk.org/jira/browse/ASTERISK-26344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Mudgett updated ASTERISK-26344:
---------------------------------------

    Attachment: 0006_jira_asterisk_26344_v13_fix.patch

[^0006_jira_asterisk_26344_v13_fix.patch] - This is an initial attempt to fix the crashes resulting from tdata being unreffed too many times.  Copy this file as is into the third-party/pjproject/patches directory.
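
To illustrate the class of bug named above, here is a minimal sketch of a reference-counted object that is released once more often than it was acquired. The `fake_tdata` type and its helpers are hypothetical stand-ins, not pjproject's `pjsip_tx_data` API, and the sketch only flags the unbalanced release rather than freeing memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted transmit buffer.
 * This is NOT pjproject's pjsip_tx_data; all names are illustrative. */
struct fake_tdata {
    int ref_cnt;    /* number of outstanding references */
    int destroyed;  /* set once the count has dropped to zero */
};

static void fake_tdata_add_ref(struct fake_tdata *t)
{
    assert(t != NULL);
    t->ref_cnt++;
}

/* Returns 1 if this call destroyed the object, 0 if references remain,
 * and -1 if the object had already been destroyed (an unbalanced unref).
 * A real implementation would release the memory at zero, so the -1 case
 * would touch freed memory instead of being reported. */
static int fake_tdata_dec_ref(struct fake_tdata *t)
{
    assert(t != NULL);
    if (t->destroyed) {
        return -1;
    }
    if (--t->ref_cnt == 0) {
        t->destroyed = 1;
        return 1;
    }
    return 0;
}

/* Two references are taken, but three releases happen. */
static int demo_over_unref(void)
{
    struct fake_tdata t = { 0, 0 };

    fake_tdata_add_ref(&t);
    fake_tdata_add_ref(&t);
    fake_tdata_dec_ref(&t);         /* 2 -> 1 */
    fake_tdata_dec_ref(&t);         /* 1 -> 0, object destroyed */
    return fake_tdata_dec_ref(&t);  /* one unref too many */
}
```

In this sketch the third release is caught by the `destroyed` flag. Without such a guard, the same call would operate on memory that may already have been reused.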

> Asterisk 13.11.0 + PJSIP crash
> ------------------------------
>
>                 Key: ASTERISK-26344
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26344
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: pjproject/pjsip
>    Affects Versions: 13.11.0
>         Environment: Centos 6.8 (64-bit) + Asterisk-13.11.0 (with bundled 2.5.5 pjsip) + libsrtp-1.5.4
>            Reporter: Ian Gilmour
>            Assignee: Richard Mudgett
>         Attachments: 0001-r5400-pjsip_tx_data_dec_ref.patch, 0006_jira_asterisk_26344_v13_fix.patch, cli-and-gdb-2016-10-24-crash2.tgz, cli-and-gdb-2016-10-24.tgz, cli-and-gdb-2016-10-25-crash1.tgz, cli-and-gdb-3-crashes.tgz, cli-and-gdb-bt-on-destroying_tx_data.tgz, cli-and-gdb-inc-dec-ref-logging.tgz, cli-and-gdb.tgz, cli-and-gdb.tgz, jira_asterisk_26344_v13_debuging.patch
>
>
> Hi,
> I have a development Asterisk 13.11.0 test setup (uses the bundled pjsip-2.5.5).
> Environment is Centos 6.8 (64-bit) + Asterisk-13.11.0 + libsrtp-1.5.4.
> On startup Asterisk registers 5 Asterisk users with a remote OpenSIPS server, over TLS, using PJSIP. As part of the test all 5 Asterisk PJSIP users are reregistered with the OpenSIPS server every couple of minutes.
> All outgoing/incoming pjsip call media is encrypted using SRTP and routed via an external RTPPROXY running alongside the external OpenSIPS server.
> Asterisk is also configured to use chan_sip on 127.0.0.1:5060 to allow calls from a locally run SIPp process. All SIPp calls are TCP+RTP.
> I use SIPp to run multiple concurrent loopback calls (calls vary in duration) through Asterisk to the OpenSIPS server and back to an echo() service running on the same Asterisk,
> i.e.
> {noformat}
>   SIPp <-TCP/RTP-> chan_sip <-> chan_pjsip <-TLS/SRTP->
>       OpenSIPS server (+ rtpproxy) <-TLS/SRTP-> chan_pjsip (echo service).
> {noformat}
> Initially I see all chan_pjsip registrations and reregistrations for all 5 PJSIP users go out through a single TCP port. I then start a SIPp test running multiple concurrent calls. At some point during the test the Asterisk PJSIP TCP port gets closed and reopened, and I see Asterisk crash shortly afterwards. Possibly significantly, the crash occurred around the time one of the PJSIP users should have reregistered after the outgoing TCP port change. (The log shows all 5 PJSIP users reregistering after the PJSIP TCP port change, but only 4 of the 5 reregistering twice before the crash.)
> {noformat}
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffa2814700 (LWP 7166)]
> __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
> 50        unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
> (gdb) bt
> #0  __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
> #1  0x00007ffff78e9d9b in pj_mutex_lock (mutex=0x44492d6c6c6143) at ../src/pj/os_core_unix.c:1265
> #2  0x00007ffff78e9e39 in pj_atomic_dec_and_get (atomic_var=0x7fffd8074630) at ../src/pj/os_core_unix.c:962
> #3  0x00007ffff787d7e0 in pjsip_tx_data_dec_ref (tdata=0x7fff8c3bfab8) at ../src/pjsip/sip_transport.c:495
> #4  0x00007ffff788a087 in tsx_shutdown (tsx=0x7fff94060a98) at ../src/pjsip/sip_transaction.c:1062
> #5  0x00007ffff788b4bc in tsx_set_state (tsx=0x7fff94060a98, state=PJSIP_TSX_STATE_DESTROYED, event_src_type=PJSIP_EVENT_TIMER, event_src=0x7fff94060c50, flag=0) at ../src/pjsip/sip_transaction.c:1271
> #6  0x00007ffff788b88e in tsx_on_state_terminated (tsx=<value optimized out>, event=<value optimized out>) at ../src/pjsip/sip_transaction.c:3337
> #7  0x00007ffff788bcd5 in tsx_timer_callback (theap=<value optimized out>, entry=0x7fff94060c50) at ../src/pjsip/sip_transaction.c:1171
> #8  0x00007ffff78fc449 in pj_timer_heap_poll (ht=0x1137950, next_delay=0x7fffa2813d30) at ../src/pj/timer.c:643
> #9  0x00007ffff7875b19 in pjsip_endpt_handle_events2 (endpt=0x1137668, max_timeout=0x7fffa2813d70, p_count=0x0) at ../src/pjsip/sip_endpoint.c:712
> #10 0x00007ffff1320b00 in monitor_thread_exec (endpt=<value optimized out>) at res_pjsip.c:3889
> #11 0x00007ffff78ea5d6 in thread_main (param=0x114dee8) at ../src/pj/os_core_unix.c:541
> #12 0x00007ffff5a8faa1 in start_thread (arg=0x7fffa2814700) at pthread_create.c:301
> #13 0x00007ffff509baad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> (gdb) bt full
> #0  __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
>         type = <value optimized out>
>         id = <value optimized out>
> #1  0x00007ffff78e9d9b in pj_mutex_lock (mutex=0x44492d6c6c6143) at ../src/pj/os_core_unix.c:1265
>         status = <value optimized out>
> #2  0x00007ffff78e9e39 in pj_atomic_dec_and_get (atomic_var=0x7fffd8074630) at ../src/pj/os_core_unix.c:962
>         new_value = <value optimized out>
> #3  0x00007ffff787d7e0 in pjsip_tx_data_dec_ref (tdata=0x7fff8c3bfab8) at ../src/pjsip/sip_transport.c:495
> No locals.
> #4  0x00007ffff788a087 in tsx_shutdown (tsx=0x7fff94060a98) at ../src/pjsip/sip_transaction.c:1062
> No locals.
> #5  0x00007ffff788b4bc in tsx_set_state (tsx=0x7fff94060a98, state=PJSIP_TSX_STATE_DESTROYED, event_src_type=PJSIP_EVENT_TIMER, event_src=0x7fff94060c50, flag=0) at ../src/pjsip/sip_transaction.c:1271
>         prev_state = PJSIP_TSX_STATE_TERMINATED
> #6  0x00007ffff788b88e in tsx_on_state_terminated (tsx=<value optimized out>, event=<value optimized out>) at ../src/pjsip/sip_transaction.c:3337
> No locals.
> #7  0x00007ffff788bcd5 in tsx_timer_callback (theap=<value optimized out>, entry=0x7fff94060c50) at ../src/pjsip/sip_transaction.c:1171
>         event = {prev = 0x7fff8c5f4908, next = 0x1bfe, type = PJSIP_EVENT_TIMER, body = {timer = {entry = 0x7fff94060c50}, tsx_state = {src = {rdata = 0x7fff94060c50, tdata = 0x7fff94060c50, timer = 0x7fff94060c50, status = -1811542960, data = 0x7fff94060c50}, 
>               tsx = 0x7fffa2813c90, prev_state = -1568588592, type = 32767}, tx_msg = {tdata = 0x7fff94060c50}, tx_error = {tdata = 0x7fff94060c50, tsx = 0x7fffa2813c90}, rx_msg = {rdata = 0x7fff94060c50}, user = {user1 = 0x7fff94060c50, user2 = 0x7fffa2813c90, 
>               user3 = 0x7fffa2813cd0, user4 = 0x0}}}
>         tsx = 0x7fff94060a98
> #8  0x00007ffff78fc449 in pj_timer_heap_poll (ht=0x1137950, next_delay=0x7fffa2813d30) at ../src/pj/timer.c:643
>         node = 0x7fff94060c50
>         grp_lock = 0x7fffd8000ab8
>         now = {sec = 613363, msec = 925}
>         count = 2
> #9  0x00007ffff7875b19 in pjsip_endpt_handle_events2 (endpt=0x1137668, max_timeout=0x7fffa2813d70, p_count=0x0) at ../src/pjsip/sip_endpoint.c:712
>         timeout = {sec = 0, msec = 0}
>         count = 0
>         net_event_count = 0
>         c = <value optimized out>
> #10 0x00007ffff1320b00 in monitor_thread_exec (endpt=<value optimized out>) at res_pjsip.c:3889
>         delay = {sec = 0, msec = 10}
> #11 0x00007ffff78ea5d6 in thread_main (param=0x114dee8) at ../src/pj/os_core_unix.c:541
>         rec = 0x114dee8
>         result = <value optimized out>
> #12 0x00007ffff5a8faa1 in start_thread (arg=0x7fffa2814700) at pthread_create.c:301
>         __res = <value optimized out>
>         pd = 0x7fffa2814700
>         now = <value optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735919769344, -4896504223120570676, 140737488337344, 140735919770048, 0, 3, 4896356555646224076, 4896525551845689036}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, 
>               canceltype = 0}}}
>         not_first_call = <value optimized out>
>         pagesize_m1 = <value optimized out>
>         sp = <value optimized out>
>         freesize = <value optimized out>
> #13 0x00007ffff509baad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> No locals.
> {noformat}
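
One detail of the backtrace above can be checked mechanically: the `mutex` argument in frames #0 and #1, `0x44492d6c6c6143`, is not a plausible heap address. Read least significant byte first, as it would sit in memory on x86_64, its bytes are printable ASCII. The sketch below decodes it; the helper names are illustrative only. The result is consistent with the atomic variable's memory having been overwritten with SIP message text, although the backtrace alone does not prove how that happened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the 8 bytes of value, least significant byte first, into out as a
 * NUL-terminated string. Explicit shifts keep the result independent of
 * the byte order of the machine running this sketch. */
static void u64_to_le_text(uint64_t value, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        out[i] = (char)((value >> (8 * i)) & 0xff);
    }
    out[8] = '\0';
}

/* Returns 1 if the bogus mutex pointer from the backtrace decodes to the
 * SIP header name "Call-ID", 0 otherwise. */
static int mutex_pointer_is_call_id(void)
{
    char text[9];

    u64_to_le_text(UINT64_C(0x44492d6c6c6143), text);
    return strcmp(text, "Call-ID") == 0;
}
```

The bytes are 0x43 0x61 0x6c 0x6c 0x2d 0x49 0x44 0x00, which spell "Call-ID" followed by a terminating NUL.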



--
This message was sent by Atlassian JIRA
(v6.2#6252)


