[asterisk-bugs] [JIRA] (ASTERISK-29232) Memory Leak since 16.13.0

Luke Escude (JIRA) noreply at issues.asterisk.org
Wed Nov 10 10:15:49 CST 2021


    [ https://issues.asterisk.org/jira/browse/ASTERISK-29232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=256835#comment-256835 ] 

Luke Escude commented on ASTERISK-29232:
----------------------------------------

Thanks George!

Here's what I am going to start doing today:
1. Run "malloc trim" from an hourly cron job. It doesn't return all of the leaked memory, but it does return a good chunk.
2. Only create dialplan hints that will actually be used by customer devices (instead of creating hints for everything we monitor).
3. Move SUBSCRIBE/NOTIFY handling to Kamailio, using Asterisk's PUBLISH capabilities so Asterisk no longer has to handle subscriptions.
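
For item 1, here is a minimal Python sketch of how the effect of each trim could be measured, so the hourly job also records how much memory actually comes back. It assumes a Linux host with /proc, and it assumes the build exposes the "malloc trim" CLI command via `asterisk -rx`. The function names (`parse_vm_rss_kb`, `read_rss_kb`, `trim_and_measure`) are hypothetical and not part of Asterisk.

```python
import re
import subprocess


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kb(pid):
    """Read the current resident set size (kB) of a running process."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())


def trim_and_measure(pid, asterisk_bin="asterisk"):
    """Run 'malloc trim' and return how many kB of RSS were released.

    Assumption: the Asterisk build provides the 'malloc trim' CLI command.
    The result can be negative if the process grew during the call.
    """
    before = read_rss_kb(pid)
    subprocess.run([asterisk_bin, "-rx", "malloc trim"], check=True)
    after = read_rss_kb(pid)
    return before - after
```

Calling `trim_and_measure(pid)` from the hourly job and logging the result would show whether the reclaimed amount shrinks over time, which would suggest the remainder is a true leak rather than heap fragmentation.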

Besides enabling MALLOC_DEBUG at compile time, I couldn't figure out how to do anything else for memory diagnostics. I tried ASan/LSan but got no output from it, and I have zero experience working with it. I know there's a tool called Valgrind, but I haven't looked into how to use it yet.
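
Until one of those tools is working, a rough external measurement is still possible. The sketch below fits a straight line to periodic RSS samples and estimates the hours left before an instance reaches the 3GB ceiling mentioned in the original report. It is only a trend estimate under the assumption that growth stays linear, not a leak detector, and the function names and sample format are hypothetical.

```python
def growth_kb_per_hour(samples):
    """Least-squares slope of RSS (kB) against time (hours).

    samples: list of (hours_since_start, rss_kb) tuples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    return cov / var_t


def hours_until_ceiling(samples, ceiling_kb):
    """Estimate hours from the last sample until RSS reaches ceiling_kb.

    Returns None when memory is flat or shrinking.
    """
    slope = growth_kb_per_hour(samples)
    if slope <= 0:
        return None
    _, last_rss = samples[-1]
    return max(0.0, (ceiling_kb - last_rss) / slope)
```

Comparing the slope between 16.13 and 16.15 instances, or before and after trimming the hints, would give a number to attach to each change.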

> Memory Leak since 16.13.0
> -------------------------
>
>                 Key: ASTERISK-29232
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-29232
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/PBX
>    Affects Versions: 16.15.0, 16.20.0
>         Environment: CentOS 7 x64
>            Reporter: Luke Escude
>            Assignee: Luke Escude
>            Severity: Major
>              Labels: fax
>         Attachments: Analysis.xlsx, Apex-Analysis.xlsx, Container Leak Tracking.xlsx, cw1-memchart.png, Jan6-1401.csv, nw1-memchart.png, PW3-Memchart.png
>
>
> So we have around 100 instances of Asterisk 16.13.0 that have been running for over 2 months under normal load (small businesses with fewer than 30 users each) without issue.
> We have another 350 instances of Asterisk 16.15.0 on which we've started seeing a very linear increase in memory consumption over time. Specifically, higher-load instances (150+ users) last only a few days before hitting our artificial 3GB ceiling and getting restarted by the OOM killer.
> There are very few differences in our implementation of the 16.13 and 16.15 versions. All versions are set up as the following:
> - CentOS 7 64-bit
> - Voicemail over ODBC
> - unixODBC 2.3.1
> - MariaDB Connector (instead of the crappy mysql connector)
> - CDR over MySQL
> - SIP Trunks are registered every 2 minutes, qualified every 15 seconds.
> - User devices register every 10 minutes, qualified every 15 seconds.
> - User devices connect via TCP more often than UDP.
> - I have NO pjsip threadpool configuration options defined. I think the default is 50 threads?
> Here is what I am about to test within the next week:
> 1. unixODBC updated to 2.3.9
> 2. Longer SIP Trunk Registration period - Maybe PJSIP is working too hard?
> 3. Longer qualify timeout - Maybe PJSIP is working too hard?
> One of my first questions: Is it SAFE to compile Asterisk with MALLOC_DEBUG and just leave it on permanently? I am scared to enable it and suddenly have a bunch of users experiencing issues because I've enabled something that should only be enabled in dev.
> Sorry for the length of the post, trying to cover as much ground as possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


