[asterisk-bugs] [JIRA] (ASTERISK-26344) Asterisk 13.11.0 + PJSIP crash

Ian Gilmour (JIRA) noreply at issues.asterisk.org
Thu Oct 27 04:42:10 CDT 2016


     [ https://issues.asterisk.org/jira/browse/ASTERISK-26344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Gilmour updated ASTERISK-26344:
-----------------------------------

    Attachment: cli-and-gdb-2016-10-27.tgz

With the latest patch in place I saw no immediate crash, so I left it running overnight. Still no crashes by morning, so good progress. :-)

However, I suspect the following logs indicate we now have a memory leak.

{noformat}
grep -e ":40:0[012]" -B 3 screenlog.2 | grep -e ":40:0" -e 31339 | sed 's/[[:space:]]*$//'
31339 iang      20   0 2778m  63m  13m S  0.3  0.8   0:47.22 asterisk
top - 21:40:01 up 14 days, 13:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  63m  13m S  0.7  0.8   1:03.18 asterisk
top - 22:40:02 up 14 days, 14:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  64m  13m S  1.7  0.8   1:19.21 asterisk
top - 23:40:01 up 14 days, 15:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  65m  13m S  0.3  0.8   1:36.02 asterisk
top - 00:40:02 up 14 days, 16:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  67m  13m S  0.0  0.9   1:52.29 asterisk
top - 01:40:00 up 14 days, 17:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  67m  13m S  0.7  0.9   2:08.39 asterisk
top - 02:40:01 up 14 days, 18:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  68m  13m S  0.7  0.9   2:24.50 asterisk
top - 03:40:02 up 14 days, 19:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  68m  13m S  0.3  0.9   2:41.09 asterisk
top - 04:40:01 up 14 days, 20:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  69m  13m S  0.3  0.9   2:59.74 asterisk
top - 05:40:02 up 14 days, 21:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  70m  13m S  0.3  0.9   3:16.87 asterisk
top - 06:40:00 up 14 days, 22:39,  5 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  72m  13m S  0.7  0.9   3:33.20 asterisk
top - 07:40:01 up 14 days, 23:39,  6 users,  load average: 0.00, 0.00, 0.00
31339 iang      20   0 2778m  73m  13m S  0.3  0.9   3:49.29 asterisk
top - 08:40:02 up 15 days, 39 min,  6 users,  load average: 0.00, 0.00, 0.00
{noformat}
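The hourly top samples above can be reduced to a resident-size trend programmatically. A minimal sketch (a hypothetical helper, not part of the attached logs) that parses process lines in the format shown and reports RES growth in MiB:

```python
# Parse top(1) process lines like:
#   "31339 iang      20   0 2778m  63m  13m S  0.3  0.8   0:47.22 asterisk"
# and return the resident-set sizes (RES column) in MiB for a given PID.
def res_trend(lines, pid="31339"):
    sizes = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == pid:
            res = fields[5]            # RES column, e.g. "63m"
            sizes.append(int(res.rstrip("mkg")))
    return sizes

samples = [
    "31339 iang      20   0 2778m  63m  13m S  0.3  0.8   0:47.22 asterisk",
    "31339 iang      20   0 2778m  73m  13m S  0.3  0.9   3:49.29 asterisk",
]
trend = res_trend(samples)
print(trend, "growth:", trend[-1] - trend[0], "MiB")
```

Against the full log this shows RES climbing steadily (63m to 73m over roughly 11 hours), which is what suggests a slow leak rather than a one-off spike.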

and,

{noformat}
grep -nH -B 5 -e "bytes in all allocations and deferred free allocations" screenlog.0 
screenlog.0-56326-
screenlog.0-56327-  12590115 bytes in all allocations
screenlog.0-56328-   1409194 bytes in deferred free large allocations
screenlog.0-56329-     35735 bytes in deferred free small allocations
screenlog.0-56330-   1444929 bytes in deferred free allocations
screenlog.0:56331:  14035044 bytes in all allocations and deferred free allocations
--
screenlog.0-56594-
screenlog.0-56595-  12591138 bytes in all allocations
screenlog.0-56596-   1405537 bytes in deferred free large allocations
screenlog.0-56597-     35642 bytes in deferred free small allocations
screenlog.0-56598-   1441179 bytes in deferred free allocations
screenlog.0:56599:  14032317 bytes in all allocations and deferred free allocations
--
screenlog.0-59106-
screenlog.0-59107-  12638906 bytes in all allocations
screenlog.0-59108-   1416980 bytes in deferred free large allocations
screenlog.0-59109-     36774 bytes in deferred free small allocations
screenlog.0-59110-   1453754 bytes in deferred free allocations
screenlog.0:59111:  14092660 bytes in all allocations and deferred free allocations
--
screenlog.0-66080-
screenlog.0-66081-  12690106 bytes in all allocations
screenlog.0-66082-   1441081 bytes in deferred free large allocations
screenlog.0-66083-     36137 bytes in deferred free small allocations
screenlog.0-66084-   1477218 bytes in deferred free allocations
screenlog.0:66085:  14167324 bytes in all allocations and deferred free allocations
--
screenlog.0-67593-
screenlog.0-67594-  12700346 bytes in all allocations
screenlog.0-67595-   1439375 bytes in deferred free large allocations
screenlog.0-67596-     36300 bytes in deferred free small allocations
screenlog.0-67597-   1475675 bytes in deferred free allocations
screenlog.0:67598:  14176021 bytes in all allocations and deferred free allocations
--
screenlog.0-285757-
screenlog.0-285758-  15318479 bytes in all allocations
screenlog.0-285759-   1413605 bytes in deferred free large allocations
screenlog.0-285760-     36032 bytes in deferred free small allocations
screenlog.0-285761-   1449637 bytes in deferred free allocations
screenlog.0:285762:  16768116 bytes in all allocations and deferred free allocations
--
screenlog.0-325786-
screenlog.0-325787-  15715874 bytes in all allocations
screenlog.0-325788-   1381982 bytes in deferred free large allocations
screenlog.0-325789-     36186 bytes in deferred free small allocations
screenlog.0-325790-   1418168 bytes in deferred free allocations
screenlog.0:325791:  17134042 bytes in all allocations and deferred free allocations
{noformat}
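The MALLOC_DEBUG summaries above can likewise be diffed over time. A sketch (assuming log lines in the `grep -nH` format shown) that extracts the "bytes in all allocations" totals, skipping the combined "and deferred free" line, and computes the overall delta:

```python
import re

# Extract "NNN bytes in all allocations" totals from grep-style log
# output. The $ anchor excludes the combined
# "... and deferred free allocations" summary lines.
def alloc_totals(lines):
    pat = re.compile(r"(\d+) bytes in all allocations$")
    return [int(m.group(1)) for line in lines if (m := pat.search(line))]

log = [
    "screenlog.0-56327-  12590115 bytes in all allocations",
    "screenlog.0:56331:  14035044 bytes in all allocations and deferred free allocations",
    "screenlog.0-325787-  15715874 bytes in all allocations",
]
totals = alloc_totals(log)
print(totals, "delta:", totals[-1] - totals[0])
```

For the samples in the log above, the live-allocation total grows from 12,590,115 to 15,715,874 bytes while the deferred-free totals stay roughly flat, which points at a genuine leak rather than deferred-free accumulation.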


> Asterisk 13.11.0 + PJSIP crash
> ------------------------------
>
>                 Key: ASTERISK-26344
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26344
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: pjproject/pjsip
>    Affects Versions: 13.11.0
>         Environment: Centos 6.8 (64-bit) + Asterisk-13.11.0 (with bundled 2.5.5 pjsip) + libsrtp-1.5.4
>            Reporter: Ian Gilmour
>            Assignee: Ian Gilmour
>         Attachments: 0001-r5400-pjsip_tx_data_dec_ref.patch, 0006_jira_asterisk_26344_v13_fix_cache.patch, 0006_jira_asterisk_26344_v13_fix.patch, cli-and-gdb-2016-10-24-crash2.tgz, cli-and-gdb-2016-10-24.tgz, cli-and-gdb-2016-10-25-crash1.tgz, cli-and-gdb-2016-10-26-crash1-after-fix.tgz, cli-and-gdb-2016-10-27.tgz, cli-and-gdb-3-crashes.tgz, cli-and-gdb-bt-on-destroying_tx_data.tgz, cli-and-gdb-inc-dec-ref-logging.tgz, cli-and-gdb.tgz, cli-and-gdb.tgz, jira_asterisk_26344_v13_debuging.patch
>
>
> Hi,
> I have a development Asterisk 13.11.0 test setup (uses the bundled pjsip-2.5.5).
> Environment is Centos 6.8 (64-bit) + Asterisk-13.11.0 + libsrtp-1.5.4.
> On startup Asterisk registers 5 Asterisk users with a remote OpenSIPS server, over TLS, using PJSIP. As part of the test all 5 Asterisk PJSIP users are reregistered with the OpenSIPS server every couple of minutes.
> All outgoing/incoming pjsip call media is encrypted using SRTP and via an external RTPPROXY running alongside the external OpenSIPS Server.
> Asterisk is also configured to use chan_sip on 127.0.0.1:5060 to allow calls from a locally run SIPp process. All SIPp calls are TCP+RTP.
> I use SIPp to run multiple concurrent loopback calls (calls vary in
> duration) through Asterisk to the OpenSIPS server and back to an echo() service running on the same Asterisk.
> i.e.
> {noformat}
>   SIPp <-TCP/RTP-> chan_sip <-> chan_pjsip <-TLS/SRTP->
>       OpenSIPS server (+ rtpproxy) <-TLS/SRTP-> chan_pjsip (echo service).
> {noformat}
> Initially I see all chan_pjsip registrations and reregistrations for all 5 PJSIP users go out through a single TCP port. I then start a SIPp test running multiple concurrent calls. At some point into the test the Asterisk PJSIP TCP port gets closed and reopened; when it does so I see Asterisk crash shortly afterwards. Possibly significantly (?), the time of the crash was around the time one of the PJSIP users should have reregistered after the outgoing TCP port change (the log shows all 5 PJSIP users reregistering after the PJSIP TCP port change, but only 4 of the 5 reregistering twice before the crash).
> {noformat}
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffa2814700 (LWP 7166)]
> __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
> 50        unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
> (gdb) bt
> #0  __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
> #1  0x00007ffff78e9d9b in pj_mutex_lock (mutex=0x44492d6c6c6143) at ../src/pj/os_core_unix.c:1265
> #2  0x00007ffff78e9e39 in pj_atomic_dec_and_get (atomic_var=0x7fffd8074630) at ../src/pj/os_core_unix.c:962
> #3  0x00007ffff787d7e0 in pjsip_tx_data_dec_ref (tdata=0x7fff8c3bfab8) at ../src/pjsip/sip_transport.c:495
> #4  0x00007ffff788a087 in tsx_shutdown (tsx=0x7fff94060a98) at ../src/pjsip/sip_transaction.c:1062
> #5  0x00007ffff788b4bc in tsx_set_state (tsx=0x7fff94060a98, state=PJSIP_TSX_STATE_DESTROYED, event_src_type=PJSIP_EVENT_TIMER, event_src=0x7fff94060c50, flag=0) at ../src/pjsip/sip_transaction.c:1271
> #6  0x00007ffff788b88e in tsx_on_state_terminated (tsx=<value optimized out>, event=<value optimized out>) at ../src/pjsip/sip_transaction.c:3337
> #7  0x00007ffff788bcd5 in tsx_timer_callback (theap=<value optimized out>, entry=0x7fff94060c50) at ../src/pjsip/sip_transaction.c:1171
> #8  0x00007ffff78fc449 in pj_timer_heap_poll (ht=0x1137950, next_delay=0x7fffa2813d30) at ../src/pj/timer.c:643
> #9  0x00007ffff7875b19 in pjsip_endpt_handle_events2 (endpt=0x1137668, max_timeout=0x7fffa2813d70, p_count=0x0) at ../src/pjsip/sip_endpoint.c:712
> #10 0x00007ffff1320b00 in monitor_thread_exec (endpt=<value optimized out>) at res_pjsip.c:3889
> #11 0x00007ffff78ea5d6 in thread_main (param=0x114dee8) at ../src/pj/os_core_unix.c:541
> #12 0x00007ffff5a8faa1 in start_thread (arg=0x7fffa2814700) at pthread_create.c:301
> #13 0x00007ffff509baad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> (gdb) bt full
> #0  __pthread_mutex_lock (mutex=0x44492d6c6c6143) at pthread_mutex_lock.c:50
>         type = <value optimized out>
>         id = <value optimized out>
> #1  0x00007ffff78e9d9b in pj_mutex_lock (mutex=0x44492d6c6c6143) at ../src/pj/os_core_unix.c:1265
>         status = <value optimized out>
> #2  0x00007ffff78e9e39 in pj_atomic_dec_and_get (atomic_var=0x7fffd8074630) at ../src/pj/os_core_unix.c:962
>         new_value = <value optimized out>
> #3  0x00007ffff787d7e0 in pjsip_tx_data_dec_ref (tdata=0x7fff8c3bfab8) at ../src/pjsip/sip_transport.c:495
> No locals.
> #4  0x00007ffff788a087 in tsx_shutdown (tsx=0x7fff94060a98) at ../src/pjsip/sip_transaction.c:1062
> No locals.
> #5  0x00007ffff788b4bc in tsx_set_state (tsx=0x7fff94060a98, state=PJSIP_TSX_STATE_DESTROYED, event_src_type=PJSIP_EVENT_TIMER, event_src=0x7fff94060c50, flag=0) at ../src/pjsip/sip_transaction.c:1271
>         prev_state = PJSIP_TSX_STATE_TERMINATED
> #6  0x00007ffff788b88e in tsx_on_state_terminated (tsx=<value optimized out>, event=<value optimized out>) at ../src/pjsip/sip_transaction.c:3337
> No locals.
> #7  0x00007ffff788bcd5 in tsx_timer_callback (theap=<value optimized out>, entry=0x7fff94060c50) at ../src/pjsip/sip_transaction.c:1171
>         event = {prev = 0x7fff8c5f4908, next = 0x1bfe, type = PJSIP_EVENT_TIMER, body = {timer = {entry = 0x7fff94060c50}, tsx_state = {src = {rdata = 0x7fff94060c50, tdata = 0x7fff94060c50, timer = 0x7fff94060c50, status = -1811542960, data = 0x7fff94060c50}, 
>               tsx = 0x7fffa2813c90, prev_state = -1568588592, type = 32767}, tx_msg = {tdata = 0x7fff94060c50}, tx_error = {tdata = 0x7fff94060c50, tsx = 0x7fffa2813c90}, rx_msg = {rdata = 0x7fff94060c50}, user = {user1 = 0x7fff94060c50, user2 = 0x7fffa2813c90, 
>               user3 = 0x7fffa2813cd0, user4 = 0x0}}}
>         tsx = 0x7fff94060a98
> #8  0x00007ffff78fc449 in pj_timer_heap_poll (ht=0x1137950, next_delay=0x7fffa2813d30) at ../src/pj/timer.c:643
>         node = 0x7fff94060c50
>         grp_lock = 0x7fffd8000ab8
>         now = {sec = 613363, msec = 925}
>         count = 2
> #9  0x00007ffff7875b19 in pjsip_endpt_handle_events2 (endpt=0x1137668, max_timeout=0x7fffa2813d70, p_count=0x0) at ../src/pjsip/sip_endpoint.c:712
>         timeout = {sec = 0, msec = 0}
>         count = 0
>         net_event_count = 0
>         c = <value optimized out>
> #10 0x00007ffff1320b00 in monitor_thread_exec (endpt=<value optimized out>) at res_pjsip.c:3889
>         delay = {sec = 0, msec = 10}
> #11 0x00007ffff78ea5d6 in thread_main (param=0x114dee8) at ../src/pj/os_core_unix.c:541
>         rec = 0x114dee8
>         result = <value optimized out>
> #12 0x00007ffff5a8faa1 in start_thread (arg=0x7fffa2814700) at pthread_create.c:301
>         __res = <value optimized out>
>         pd = 0x7fffa2814700
>         now = <value optimized out>
>         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735919769344, -4896504223120570676, 140737488337344, 140735919770048, 0, 3, 4896356555646224076, 4896525551845689036}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, 
>               canceltype = 0}}}
>         not_first_call = <value optimized out>
>         pagesize_m1 = <value optimized out>
>         sp = <value optimized out>
>         freesize = <value optimized out>
> #13 0x00007ffff509baad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> No locals.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list