[asterisk-bugs] [JIRA] (ASTERISK-24768) res_timing_pthread: file descriptor leak

Sun Feb 22 08:53:34 CST 2015

    [ https://issues.asterisk.org/jira/browse/ASTERISK-24768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=225053#comment-225053 ] 

Private Name edited comment on ASTERISK-24768 at 2/22/15 8:51 AM:
------------------------------------------------------------------

The problem is back. I am using SIP, not PJSIP. With 220 open calls playing lots of voice prompts (it is a free roulette game over the phone; you can test it at 1 573 764 2870), after about half an hour I get:
lsof | grep asterisk | grep FIFO | wc -l
646110

Note: SVN-branch-11-r432098M has the same issue, so this is a regression.
I think this may be a new leak, unknown so far. I went back to rev SVN-branch-13-r431807M, and with 120 calls it shows
lsof | grep asterisk | grep FIFO | wc -l
190950
The only things I do are play prompts and constantly read variables.
Note:
This is getting simpler. With a completely empty dialplan (not even a single line) and these channels:

SIP/demo-0000037d    (None)               Up      Echo()
SIP/demo-0000037e    (None)               Up      Echo()
SIP/demo-0000037f    (None)               Up      Echo()
501 active channels
0 active calls
0 calls processed
lsof | grep asterisk | grep FIFO | wc -l
1028105

This is SVN-branch-13-r432154M.
Is this normal?
By the way, no codecs are loaded other than those that ship with Asterisk.
All the channels are originated using a call file and the Echo application.
I did some experiments with two boxes. If one box originates 500 calls and drops them into Echo(), and the other receives them and also runs Echo(), the FIFO count is 44 on both ends. No FIFO problem.
However, if the receiving end plays music on hold on the 500 calls, the FIFO count goes to 500K. This indicates that the issue is in RTP, not the timers. I also tried loading each timing module in turn, pthread and timerfd, and it made no difference.
I also tried changing the codec from ulaw to g729, and the FIFO count, all else being equal, went down to one quarter.
When the call count gets to 500, I get this error several times:
ERROR[14107]: res_rtp_asterisk.c:3031 ast_rtcp_write_report: RTCP RR transmission error to X.X.X.X:10585, rtcp halted Operation not permitted

Can somebody give instructions on how to debug this in gdb?
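
Before going into gdb, it may help to cross-check the lsof numbers against the process's own descriptor table. Depending on the lsof version and options, the same descriptor may be listed once per thread, so the grep count can be much larger than the number of descriptors that are really open. Below is a minimal sketch, assuming Linux with /proc mounted; `count_pipe_fds` is a hypothetical helper name, and it counts only anonymous pipes (targets shown as `pipe:[inode]`), not named FIFOs.

```python
import os


def count_pipe_fds(pid="self"):
    """Count descriptors of one process that point at anonymous pipes.

    Assumes a Linux /proc filesystem. Reading another user's process
    needs the same user or root.
    """
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("pipe:"):
            count += 1
    return count


if __name__ == "__main__":
    import sys

    # Pass the Asterisk PID as the first argument; defaults to this process.
    print(count_pipe_fds(sys.argv[1] if len(sys.argv) > 1 else "self"))
```

Running it a few times against the Asterisk PID while calls come and go should show whether the per-process count keeps growing after channels hang up, which is the signature of a real leak.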

was (Author: falves11):
The problem is back. I am using SIP, not PJSIP. With 220 open calls playing lots of voice prompts (it is a free roulette game over the phone; you can test it at 1 573 764 2870), after about half an hour I get:
lsof | grep asterisk | grep FIFO | wc -l
646110

Note: SVN-branch-11-r432098M has the same issue, so this is a regression.
I think this may be a new leak, unknown so far. I went back to rev SVN-branch-13-r431807M, and with 120 calls it shows
lsof | grep asterisk | grep FIFO | wc -l
190950
The only things I do are play prompts and constantly read variables.
Note:
This is getting simpler. With a completely empty dialplan (not even a single line) and these channels:

SIP/demo-0000037d    (None)               Up      Echo()
SIP/demo-0000037e    (None)               Up      Echo()
SIP/demo-0000037f    (None)               Up      Echo()
501 active channels
0 active calls
0 calls processed
lsof | grep asterisk | grep FIFO | wc -l
1028105

This is SVN-branch-13-r432154M.
Is this normal?
By the way, no codecs are loaded other than those that ship with Asterisk.
All the channels are originated using a call file and the Echo application.
I did some experiments with two boxes. If one box originates 500 calls and drops them into Echo(), and the other receives them and also runs Echo(), the FIFO count is 44 on both ends. No FIFO problem.
However, if the receiving end plays music on hold on the 500 calls, the FIFO count goes to 500K. This indicates that the issue is in RTP, not the timers. I also tried loading each timing module in turn, pthread and timerfd, and it made no difference.

Can somebody give instructions on how to debug this in gdb?

> res_timing_pthread: file descriptor leak
> ----------------------------------------
>
>                 Key: ASTERISK-24768
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-24768
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_timing_pthread
>    Affects Versions: 13.2.0
>         Environment: Current Debian (jessie/testing), i386, up-to-date
>            Reporter: Matthias Urlichs
>            Assignee: Joshua Colp
>         Attachments: timer.patch
>
>
> Pthread timers are never deallocated because their link into the pthread_timers chain is never undone.
> This causes a file descriptor leak (at least two per incoming call).
> The locking in this patch probably needs review; the ao2_unlink() call does not. :-P
> [Edit:] *Inline patch removed*
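
The mechanism in the description above (an object that stays linked into a container never gets destroyed, so its descriptors stay open) can be modelled with a toy reference-counted registry. This is only an illustrative sketch in Python, not the Asterisk code and not the removed patch: `ToyTimer`, `registry`, `timer_open`, `timer_close` and `is_open` are hypothetical names, and it assumes each timer owns one pipe (two descriptors) that is closed only when the last reference is dropped.

```python
import os


class ToyTimer:
    """Illustrative stand-in for a timer that owns both ends of a pipe."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()  # two descriptors per timer
        self.refs = 1                   # reference held by whoever opened it

    def unref(self):
        """Drop one reference; close the pipe when the last one goes away."""
        self.refs -= 1
        if self.refs == 0:
            os.close(self.rfd)
            os.close(self.wfd)


registry = []  # plays the role of the container of live timers


def timer_open():
    timer = ToyTimer()
    registry.append(timer)
    timer.refs += 1  # the registry holds its own reference
    return timer


def timer_close(timer, unlink):
    """Close a timer; with unlink=False the registry reference is never released."""
    if unlink:
        registry.remove(timer)
        timer.unref()  # give back the registry's reference
    timer.unref()      # give back the opener's reference


def is_open(fd):
    """Return True if fd still refers to an open descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False
```

With `unlink=False` the reference count stops at 1 and both pipe ends stay open for the life of the process; with `unlink=True` the count reaches 0 and the pipe is closed. In this toy model that is the difference between leaking two descriptors per timer and leaking none.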

--
This message was sent by Atlassian JIRA
(v6.2#6252)