[asterisk-bugs] [JIRA] (ASTERISK-29232) Memory Leak since 16.13.0

Wed Nov 10 13:32:49 CST 2021

    [ https://issues.asterisk.org/jira/browse/ASTERISK-29232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=256839#comment-256839 ] 

Luke Escude commented on ASTERISK-29232:
----------------------------------------

Additionally, here's something interesting that may be unrelated (and that I didn't notice back when we used UDP transport).

When everything is running normally (Asterisk making outbound registrations to our SIP proxies, and inbound registrations from devices arriving at Asterisk as usual) and the connection to the SIP proxy is suddenly broken or restarted, the following occurs:

WARNING[3360]: taskprocessor.c:1160 taskprocessor_push: The 'pjsip/outreg/dev-chicago-1-0000019f' task processor queue reached 500 scheduled tasks.
[Nov 10 19:27:47] DEBUG[6508]: res_pjsip_outbound_registration.c:860 reregister_immediately_cb: Outbound registration transport to server 'sip:104.238.165.75:4242' from client 'sip:999999 at dev-chicago-sip1-int2.primevox.net' shutdown
[Nov 10 19:27:47] DEBUG[6508]: res_pjsip_outbound_registration.c:608 handle_client_registration: Outbound REGISTER attempt 1 to 'sip:104.238.165.75:4242' with client 'sip:999999 at dev-chicago-sip1-int2.primevox.net'

Those last two messages then repeat, probably 200+ times.

So, is it possible there's some build-up of events in PJSIP outbound registration?
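
One way to check for such a build-up is to count the repeated REGISTER attempts per server/client pair and to record the deepest task processor queue warning seen. The following is a minimal sketch, assuming the log lines keep the format shown above. The function name `summarize` and the `SAMPLE` list are hypothetical helpers introduced for illustration, not part of Asterisk.

```python
import re
from collections import Counter

# Patterns written against the log lines quoted in this comment; other
# Asterisk versions may word these messages differently (assumption).
REGISTER_RE = re.compile(
    r"handle_client_registration: Outbound REGISTER attempt (\d+) "
    r"to '([^']+)' with client '([^']+)'"
)
QUEUE_RE = re.compile(
    r"taskprocessor_push: The '([^']+)' task processor queue "
    r"reached (\d+) scheduled tasks"
)


def summarize(lines):
    """Return (REGISTER attempts per (server, client), max queue depth per task processor)."""
    registers = Counter()
    queue_depth = {}
    for line in lines:
        match = REGISTER_RE.search(line)
        if match:
            registers[(match.group(2), match.group(3))] += 1
            continue
        match = QUEUE_RE.search(line)
        if match:
            name, depth = match.group(1), int(match.group(2))
            queue_depth[name] = max(queue_depth.get(name, 0), depth)
    return registers, queue_depth


# Two lines copied from the log excerpt above, used only as a smoke test.
SAMPLE = [
    "WARNING[3360]: taskprocessor.c:1160 taskprocessor_push: The "
    "'pjsip/outreg/dev-chicago-1-0000019f' task processor queue reached 500 scheduled tasks.",
    "[Nov 10 19:27:47] DEBUG[6508]: res_pjsip_outbound_registration.c:608 "
    "handle_client_registration: Outbound REGISTER attempt 1 to "
    "'sip:104.238.165.75:4242' with client 'sip:999999 at dev-chicago-sip1-int2.primevox.net'",
]

registers, queue_depth = summarize(SAMPLE)
for (server, client), count in registers.most_common():
    print(f"{count} REGISTER attempt(s) to {server} from {client}")
for name, depth in queue_depth.items():
    print(f"{name}: queue reached {depth} scheduled tasks")
```

To run it against a real log, pass an open file object to `summarize` in place of `SAMPLE`. A single server/client pair with hundreds of attempts inside the same second would support the build-up theory, though it would not by itself tie that to the memory growth.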

> Memory Leak since 16.13.0
> -------------------------
>
>                 Key: ASTERISK-29232
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-29232
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/PBX
>    Affects Versions: 16.15.0, 16.20.0
>         Environment: CentOS 7 x64
>            Reporter: Luke Escude
>            Assignee: Luke Escude
>            Severity: Major
>              Labels: fax
>         Attachments: Analysis.xlsx, Apex-Analysis.xlsx, Container Leak Tracking.xlsx, cw1-memchart.png, Jan6-1401.csv, nw1-memchart.png, PW3-Memchart.png
>
>
> So we have around 100 instances of Asterisk 16.13.0 that have been running for over 2 months under normal load (small businesses with fewer than 30 users each) without issue.
> We have another 350 instances of Asterisk 16.15.0 on which we've started seeing a very linear increase in memory consumption over time. Specifically, higher-load instances (150+ users) last only a few days before hitting our artificial 3GB ceiling and getting restarted by the OOM killer.
> There are very few differences in our implementation of the 16.13 and 16.15 versions. All versions are set up as follows:
> - CentOS 7 64-bit
> - Voicemail over ODBC
> - unixODBC 2.3.1
> - MariaDB Connector (instead of the crappy mysql connector)
> - CDR over MySQL
> - SIP Trunks are registered every 2 minutes, qualified every 15 seconds.
> - User devices register every 10 minutes, qualified every 15 seconds.
> - User devices connect via TCP more often than UDP.
> - I have NO pjsip threadpool configuration options defined. I think the default is 50 threads?
> Here is what I am about to test within the next week:
> 1. unixODBC updated to 2.3.9
> 2. Longer SIP Trunk Registration period - Maybe PJSIP is working too hard?
> 3. Longer qualify timeout - Maybe PJSIP is working too hard?
> One of my first questions: Is it SAFE to compile Asterisk with MALLOC_DEBUG and just leave it on permanently? I am scared to enable it and suddenly have a bunch of users experiencing issues because I've enabled something that should only be enabled in Dev.
> Sorry for the length of the post, trying to cover as much ground as possible.
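
For the "Maybe PJSIP is working too hard?" question in the description, a rough estimate of the steady-state request rate can be worked out from the intervals quoted there. This is a back-of-the-envelope sketch only. It assumes one device per user, uses the "150+ users" figure as a lower bound, and leaves SIP trunks out because their count is not given. The helper `messages_per_second` is a hypothetical name introduced for illustration.

```python
def messages_per_second(endpoints: int, interval_seconds: float) -> float:
    """Average request rate when each endpoint triggers one request per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return endpoints / interval_seconds


# Figures taken from the issue description. One device per user is an
# assumption; the description only says "150+ users".
DEVICES = 150
QUALIFY_INTERVAL = 15            # seconds between qualify checks per device
DEVICE_REGISTER_INTERVAL = 600   # devices register every 10 minutes

qualify_rate = messages_per_second(DEVICES, QUALIFY_INTERVAL)
register_rate = messages_per_second(DEVICES, DEVICE_REGISTER_INTERVAL)

print(f"qualify requests/s:  {qualify_rate:.2f}")
print(f"device REGISTERs/s:  {register_rate:.2f}")
```

Under these assumptions the qualify traffic is far larger than the registration traffic, so lengthening the qualify interval would change the steady-state load much more than lengthening the registration period.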
