[asterisk-bugs] [JIRA] (ASTERISK-24837) chan_sip calls to Asterisk result in file descriptors growing exponentially while channels remain up

Thu Oct 8 18:10:33 CDT 2015

    [ https://issues.asterisk.org/jira/browse/ASTERISK-24837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=227830#comment-227830 ] 

Private Name commented on ASTERISK-24837:
-----------------------------------------

The handle leak is unchanged in Asterisk GIT-11-966265dM.
I have an application that plays IVRs and dials out (only a few calls), waits for DTMF, etc.
There are almost always about 1300 calls up:
core show channels count
1292 active channels
1284 active calls
core show uptime
System uptime: 2 hours, 29 minutes, 32 seconds
Last reload: 1 hour, 27 minutes, 20 seconds

The operating system shows
ss -s
Total: 29574 (kernel 30296)
TCP: 28 (estab 9, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 22

Transport Total     IP        IPv6
*         30296     -         -
RAW       0         0         0
UDP       29441     29433     8
TCP       28        20        8
INET      29469     29453     16
FRAG      0         0         0

The count keeps growing until no more ports can be opened and the app has to be restarted; the same thing may happen again within 3 hours.
This app cannot be debugged because it handles too many calls per second and it is in production.
However, if there is anything I can do to help get to the bottom of this, let me know.
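
One low-impact way to gather more data on a production box might be to tally the process's open descriptors by type straight from /proc, which does not require attaching a debugger or raising the log level. The following is only a sketch under assumptions: `fd_tally` is a hypothetical helper name, it relies on the Linux /proc/<pid>/fd layout, and using `pidof asterisk` to find the process is an assumption about the local setup.

```shell
#!/bin/sh
# Tally the open descriptors of one process by target type, using only
# /proc/<pid>/fd (Linux). Link targets look like "socket:[123]",
# "pipe:[456]", "anon_inode:[eventpoll]" or a plain path such as /dev/null.
fd_tally() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # A descriptor can close between the glob and readlink; skip it then.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c
}

# Hypothetical usage against Asterisk: fd_tally "$(pidof asterisk)"
# Demonstrated here on the current shell so the sketch runs on any Linux box.
fd_tally "$$"
```

Running this every few minutes and appending the output to a file, together with the output of `core show channels count`, would show whether the growth is in sockets, pipes, or both, and how it tracks the channel count.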

core show settings

PBX Core settings
-----------------
Version: GIT-11-966265dM
Build Options: AST_DEVMODE, LOADABLE_MODULES, G711_NEW_ALGORITHM, G711_REDUCED_BRANCHING
Maximum calls: Not set
Maximum open file handles: Not set
Root console verbosity: 0
Current console verbosity: 0
Debug level: 0
Maximum load average: 0.000000
Minimum free memory: 0 MB
Startup time: 16:27:47
Last reload time: 17:30:00
System: Linux/2.6.32-573.3.1.el6.x86_64 built by root on x86_64 2015-10-08 19:31:32 UTC
System name: X2
Entity ID: 00:50:56:b1:24:65
Default language: en
Language prefix: Enabled
User name and group: /
Executable includes: Disabled
Transcode via SLIN: Enabled
Transmit silence during rec: Disabled
Generic PLC: Enabled
Min DTMF duration:: 80

    Subsystems
    -------------
    Manager (AMI): Enabled
    Web Manager (AMI/HTTP): Disabled
    Call data records: Enabled
    Realtime Architecture (ARA): Disabled

    Directories
    -------------
    Configuration file:
    Configuration directory: /etc/asterisk
    Module directory: /usr/lib/asterisk/modules
    Spool directory: /var/spool/asterisk
    Log directory: /var/log/asterisk
    Run/Sockets directory: /var/run/asterisk
    PID file: /var/run/asterisk/asterisk.pid
    VarLib directory: /var/lib/asterisk
    Data directory: /var/lib/asterisk
    ASTDB: /var/lib/asterisk/astdb
    IAX2 Keys directory: /var/lib/asterisk/keys
    AGI Scripts directory: /var/lib/asterisk/agi-bin

> chan_sip calls to Asterisk result in file descriptors growing exponentially while channels remain up
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-24837
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-24837
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>    Affects Versions: 13.2.0
>         Environment: Linux 64
>            Reporter: Private Name
>            Assignee: Unassigned
>         Attachments: asterisktest.sh, core_show_fd_120_channels.txt, sip.conf, trace.txt, trace.txt, trace.txt, valgrind.txt, valgrind.txt, valgrind.txt, valgrind.txt
>
>
> [Edit by Rusty - Please use the wiki formatting available to make reports easy to read - I'm going to clean this up for now.]
> When I originate several hundred calls using a call file, no dialplan, using app Echo() or app MusicOnHold, the calls connect, but when the other side starts to send media, after some 200+ calls I get several errors:
> {noformat}
> "ERROR[2433] res_rtp_asterisk.c: RTCP SR transmission error to 208.78.162.174:34443, rtcp halted Operation not permitted"
> {noformat}
> and the handle count explodes, measured at the end
> {noformat}
> lsof | grep asterisk | grep FIFO | wc -l
> 1025046
> {noformat}
> Yes, one million and change. The handle count never decreases as long as the channels are open. There are no active calls, only 500 channels.
> The call files are all identical:
> {noformat}
> Channel: SIP/0000000000@demo
> CallerID: "0000000000" <>
> WaitTime: 45
> MaxRetries: 0
> RetryTime: 0
> Application: Echo
> Data:
> Archive: no
> {noformat}
> where demo is a simple peer like this
> {noformat}
> [demo]
> host=xxx.xxx.xxx.xx 
> type=peer
> insecure=port,invite
> context=reject
> disallow=all
> allow=ulaw
> allow=g729
> session-timers=accept
> port=5060
> faxdetect=no
> transport=udp
> directmedia=yes
> {noformat}
> *Note 1:* The caller ID may vary call by call; it makes no difference. The issue here is the very high handle count, which slows down and kills the machine, and the errors, which show that many calls do not connect.
> *Note 2:* The calls do not go across the internet; they go to a local box.
> I need to send this many calls in real life, with media, so this is a killer for my business model.
> If I set debug=10 and verbose=20, I get thousands of identical lines like these:
> {noformat}
> [Feb 26 23:47:13] DEBUG[2433]: acl.c:963 ast_find_ourip: Attached to given IP address
> [Feb 26 23:47:13] DEBUG[5763]: res_rtp_asterisk.c:3958 ast_rtcp_read: Got RTCP report of 64 bytes
> [Feb 26 23:47:13] DEBUG[5763]: acl.c:958 ast_find_ourip: Not an IPv4 nor IPv6 address, cannot get port.
> [Feb 26 23:47:13] DEBUG[5763]: netsock2.c:172 ast_sockaddr_split_hostport: Splitting 'dasaro' into...
> [Feb 26 23:47:13] DEBUG[5763]: netsock2.c:226 ast_sockaddr_split_hostport: ...host 'dasaro' and port ''.
> [Feb 26 23:47:13] DEBUG[5763]: acl.c:958 ast_find_ourip: Not an IPv4 nor IPv6 address, cannot get port.
> [Feb 26 23:47:13] DEBUG[5763]: acl.c:963 ast_find_ourip: Attached to given IP address
> {noformat}
> {noformat}
>  ulimit -a
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 1048576
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1048576
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) unlimited
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> {noformat}
> *Note 3:* Please note how the handles start to grow with each additional call
> {noformat}
> Count idle
> 46
> 1 call
> lsof | grep asterisk | grep FIFO | wc -l
> 104
> 2 calls
> lsof | grep asterisk | grep FIFO | wc -l
> 168
> 3 calls
> lsof | grep asterisk | grep FIFO | wc -l
> 240
> 4 calls
> lsof | grep asterisk | grep FIFO | wc -l
> 320
> {noformat}
> As you can see, the handles do not grow linearly, but close to exponentially.
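
To make the growth in Note 3 easier to see, the increase between consecutive samples can be computed directly. This is a minimal sketch; `deltas` is a hypothetical helper name, and the input values are the five FIFO counts quoted in Note 3.

```shell
#!/bin/sh
# Print the increase between consecutive samples, one per line.
deltas() {
    prev=$1
    shift
    for cur in "$@"; do
        echo $((cur - prev))
        prev=$cur
    done
}

# FIFO handle counts from Note 3: idle, then 1 to 4 calls.
deltas 46 104 168 240 320
```

For these figures the increments are 58, 64, 72 and 80, so each additional call costs more handles than the one before it. That is what rules out linear growth, although five samples are too few to pin down the exact growth law.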
