[asterisk-bugs] [JIRA] (ASTERISK-25905) Memory leak during perf testing

Robert McGilvray (JIRA) noreply at issues.asterisk.org
Thu Apr 14 09:22:56 CDT 2016


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=230257#comment-230257 ] 

Robert McGilvray commented on ASTERISK-25905:
---------------------------------------------

Richard,

First, thank you for looking at this. Your analysis makes sense, but it does raise an issue for me: I need high concurrency and I also need CDR logs. Once you pointed out that CDR was at the root of the issue I did more testing. Without CDR I can easily run 1800 concurrent calls and Asterisk is happy, never going beyond 500M RES.

Granted, my load tests generate a higher call rate than normal (60s call duration), so under normal circumstances we wouldn't have as many calls in the same time period. However, it seems CDR and high concurrency don't mix at all.
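
As a rough sanity check on the load profile, the relationship between call rate, call duration and concurrency can be sketched as below. The 25 calls/sec and 60s figures come from the issue description quoted further down; the variable names are only illustrative, and the steady-state formula (rate times duration) assumes calls arrive evenly and all last the full 60s.

```python
# Rough load-profile arithmetic (illustrative names, approximate inputs).
call_rate_per_s = 25    # sipp call rate from the issue description
call_duration_s = 60    # each call sits in the ConfBridge for 60s

# Steady-state concurrency = arrival rate * time each call stays up.
steady_state_concurrency = call_rate_per_s * call_duration_s

# Call rate that would be needed to hold 1800 concurrent 60s calls.
rate_for_1800 = 1800 / call_duration_s

print("steady-state concurrency:", steady_state_concurrency)
print("calls/sec needed for 1800 concurrent:", rate_for_1800)
```

Real traffic with longer calls would reach the same concurrency at a much lower call rate, which is the point made above about the load test being harsher than normal.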

I just did a test where I ran 1500 concurrent calls for a total of around 15k calls. The task processor queue grew to a depth of over 9 million (9127584 to be exact). Almost 25 minutes after pausing all new calls, the queue is still larger than the total number of tasks processed.

93777b48-6756-4e4b-a723-65ffb8358e2d             4758483      5909429      9127584
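
To put those three numbers in perspective, here is a back-of-the-envelope sketch. It assumes the columns are processed, in queue and max depth (in that order), that the queue peaked at roughly the moment new calls were paused, and that "around 15k" calls and "almost 25 minutes" are close enough to use directly. None of this is measured; it is only arithmetic on the line above.

```python
# Figures copied from the taskprocessor line above.
# Column order (processed, in queue, max depth) is an assumption.
processed = 4_758_483
in_queue = 5_909_429
max_depth = 9_127_584

total_calls = 15_000        # "around 15k", approximate
minutes_since_pause = 25    # "almost 25 minutes", approximate

# Every task was either processed already or is still waiting.
total_tasks = processed + in_queue
tasks_per_call = total_tasks / total_calls

# Assumes the queue was at max depth when new calls were paused.
drained_since_pause = max_depth - in_queue
drain_rate_per_s = drained_since_pause / (minutes_since_pause * 60)
minutes_to_empty = in_queue / drain_rate_per_s / 60

print("tasks queued per call (approx):", round(tasks_per_call))
print("drain rate, tasks/sec (approx):", round(drain_rate_per_s))
print("minutes until queue is empty (approx):", round(minutes_to_empty))
```

If those assumptions hold, each call generated several hundred queued tasks and the backlog would need well over half an hour more to clear at the observed rate.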

I see around 1500 PJSIP channels in 'cdr show active' but I'm not sure what the box is actually doing. It's not writing the CDRs anywhere (CSVs are empty, no backends loaded), but one of my CPUs is pegged at 100% while the other 7 are idle. It seems the code that handles the CDR records and processes the backlog is single-threaded, CPU-bound and incredibly inefficient.

If this isn't a memory leak, can anything be done about the single-threaded code, or about the speed at which Asterisk processes the backlog? I don't think I'm doing anything crazy here: 1000-1500 concurrent calls shouldn't be an issue with the hardware I'm throwing at Asterisk (8 vCPUs @ 3.30 GHz and 32 GB of RAM).

In my production environment we peak at 400 concurrent calls daily with chan_sip and MeetMe. We have 3 Call manager clusters (>5000 users) that all feed into Asterisk for conferencing, so I need this to scale. Any help you guys can provide to make that happen would be much appreciated.

Regards




> Memory leak during perf testing
> -------------------------------
>
>                 Key: ASTERISK-25905
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25905
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_confbridge, pjproject/pjsip
>    Affects Versions: 13.8.0
>         Environment: Red Hat Enterprise Linux Server release 7.2 (Maipo)
> Linux ykt1cfbprd1 3.10.0-327.13.1.el7.x86_64
> certified/13.8-cert1-rc1
> pjproject-2.4.5
>            Reporter: Robert McGilvray
>            Assignee: Richard Mudgett
>            Severity: Minor
>         Attachments: loadtest.txt, memory-summary.txt
>
>
> ** I've been testing against the certified branch, last cloned yesterday with certified/13.8-cert1-rc1. It would not allow me to select that as a version however ** 
> While using sipp as a generator to load test Asterisk I've come across a memory leak that very quickly exhausts the host's resources.
> The testing methodology is pretty simple: use sipp to launch 1500 concurrent calls to Asterisk with a call rate of 25/sec. On the Asterisk side, use the RAND function to generate two numbers: one is the confbridge number and the other (either 0 or 1) determines whether to use the moderator or participant profile. The call is then dropped into a ConfBridge for 60s and hung up.
> After a few thousand completed calls the memory usage grows and eventually exhausts the host resources. I recompiled with MALLOC_DEBUG enabled, the output of memory show allocations is attached. It looks like the allocations are in stasis_channels, well after all channels have been disconnected. 
> {noformat}
> ykt1cfbprd1:/home/netops# ps -C asterisk u
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> asterisk 20927  113  9.2 6292308 3029204 ?     Sl   16:32  24:47 /home/asterisk/asterisk-cert-13.8/sbin/asterisk -f -C /home/asterisk/asterisk-cert
> root     32052  0.0  0.0  47428  2840 pts/0    S+   16:43   0:00 rasterisk risk/asterisk-cert-13.8/sbin/asterisk -r
> {noformat}
> Please let me know if you need any further information.
> Thanks!!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


