[asterisk-bugs] [JIRA] (ASTERISK-21872) high CPU usage ~15 seconds into call if rtpkeepalive set on channels when Asterisk is in a generic bridge and passing RFC2833 DTMF

hristo (JIRA) noreply at issues.asterisk.org
Fri Jul 12 09:48:03 CDT 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=207939#comment-207939 ] 

hristo commented on ASTERISK-21872:
-----------------------------------

I just finished testing with SVN-branch-1.8-r391778 on the Ubuntu server. rtpkeepalive still doesn't seem to make any difference in my setup; only the pause between DTMFs matters. As soon as I set that pause to 40 ms, I start seeing the problem, even with a single call.

Maybe setting rtpkeepalive simply triggers the problem faster but is not its root cause? Since this is most probably a performance problem, I can imagine that the CPU and the rest of the system also play a role. Could it be that the system I test on already has CPUs slow enough that it doesn't need rtpkeepalive enabled to hit the problem? Just speculating.

I used a Citrix virtual server with a single core for the last set of tests:
{code}
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
stepping	: 2
microcode	: 0x13
cpu MHz		: 2394.056
cache size	: 12288 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 48
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc up rep_good nopl pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat dtherm
bogomips	: 4788.11
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:
{code}

However, on a 4-core physical server running Debian 6 and 1.8.22.0 I get the same results. Unfortunately, I can neither test the SVN version on the physical server nor install Ubuntu on it.
                
> high CPU usage ~15 seconds into call if rtpkeepalive set on channels when Asterisk is in a generic bridge and passing RFC2833 DTMF
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-21872
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21872
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/General
>    Affects Versions: SVN, 1.8.17.0, 1.8.19.1, 1.8.20.0, 1.8.22.0
>         Environment: Debian 6.0 64-bit
>            Reporter: hristo
>            Severity: Minor
>         Attachments: 2-calls-one-sending-many-dtmfs-asterisk-debug.txt, forward-stream-first-call-after-asterisk.pcap.txt, forward-stream-first-call-before-asterisk.pcap.txt, full.txt, sample-config.diff, trafficdump.pcap, vmstat.txt
>
>
> If I send several DTMFs to Asterisk in quick succession, it blocks other voice RTP packets for as long as several hundred milliseconds. This seems to affect *all* RTP streams on a server.
> I can say for sure that Asterisk is not dropping the RTP packets, because after a while it sends all of them at once. It seems as if they are being held by something while the DTMFs are being processed/forwarded.
> This only occurs in non Packet2Packet mode.
> Originally I saw the problem when several people were connecting to a conference at about the same time and entering their PINs at about the same time, therefore producing a lot of DTMFs. The conference runs on dedicated hardware and is unrelated. Asterisk just sits in the middle and bridges the calls. I have managed to reproduce this with only two calls and as few as 10-15 DTMFs, provided they are sent fast enough.
> Attached is a debug console log from the following call scenario. In this case both calls were generated from a dedicated server and terminated on another dedicated server.
> Call 1:
>  A (IP 1.1.1.1) dials 1000 --> Asterisk (IP 2.2.2.2) ---> B (IP 3.3.3.3)
> Call 2:
>  A' (IP 1.1.1.1) dials 2000 --> Asterisk (IP 2.2.2.2) ---> B' (IP 3.3.3.3)
> Both calls are active at this point. A' on Call 2 starts sending DTMFs (in this case 40 of them). As a result RTP packets from Call *1* in both directions are delayed by 150-160 ms and are being sent in bursts.
> In the logs I often see:
> res_timing_timerfd.c:225 timerfd_timer_ack: Expected to acknowledge 1 ticks but got 5 instead
> and the CPU is close to 100% (caused by the asterisk process). As soon as all DTMFs are sent, the RTP streams return back to normal with asterisk sending one packet every 20 ms on average.
> Also attached is a filtered packet capture that shows only the forward RTP stream on Call 1 from A -> Asterisk and from Asterisk -> B. "Time" represents the delta from the previous packet. Under normal conditions this should be close to 0.020 s (or 20 ms).
> One example of the problem can be seen at line 1235 in 'forward-stream-first-call-after-asterisk.pcap.txt'. The packet there has been held for ~160 ms, then sent together with the next 7 packets all at once.
> The RTP packets from the corresponding call leg (before asterisk) start at line 1244 in 'forward-stream-first-call-before-asterisk.pcap.txt' and are all equally spaced at about 20 ms.
> There are many such examples - simply search for 0.000 (deltas which are less than 1 ms) to identify groups of packets that are sent together. The same problem is present in the backward stream too (not attached).
> How to reproduce - add the following to the dialplan:
> exten => _X.,n,Dial(SIP/B@3.3.3.3,,t)
> The 't' option is important, because it effectively disables the Packet2Packet mode. Connect 2 calls (2 sets of telephones) and start dialing DTMFs as fast as you can on one of them (or simply generate 2 calls and send the DTMFs as I did). This will disrupt the call between the other set of phones if done fast enough.
> I have tested this on 3 servers (2 physical and one virtual). All of them were running the same OS (Debian 6), so this may end up being an OS or res_timing_timerfd problem after all, but I really cannot test it on a different distribution.
> I tested with the following versions and was able to reproduce the problem with all of them:
> 1.8.22.0
> 1.8.20.0
> 1.8.19.1
> 1.8.17.0
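
The timerfd warning quoted in the description can be counted across a debug log to see how often the timer falls behind. The following is only a rough sketch in Python: the function name missed_ticks is hypothetical, and the pattern assumes the message text is exactly as quoted above, with any prefix (timestamp, log level) ignored.

```python
import re
from typing import Iterable, List, Tuple

# Pattern taken from the warning quoted in the issue description.
# Anything before "timerfd_timer_ack" (timestamp, level, file:line) is ignored.
TICK_RE = re.compile(
    r"timerfd_timer_ack: Expected to acknowledge (\d+) ticks but got (\d+) instead"
)


def missed_ticks(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (expected, actual) tick counts for every matching log line."""
    found = []
    for line in lines:
        match = TICK_RE.search(line)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


sample = [
    "res_timing_timerfd.c:225 timerfd_timer_ack: "
    "Expected to acknowledge 1 ticks but got 5 instead",
    "some unrelated log line",
]
print(missed_ticks(sample))
```

A large gap between the two numbers means several timer expirations accumulated before Asterisk got around to acknowledging them.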

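The "search for 0.000" step from the description can also be automated. Below is a minimal sketch in Python that assumes the "Time" column of the filtered capture has already been extracted into a list of deltas in seconds; the function name find_bursts and the 1 ms threshold parameter are illustrative choices, not part of any Asterisk or Wireshark tooling.

```python
from typing import List, Tuple


def find_bursts(
    deltas: List[float], burst_threshold: float = 0.001
) -> List[Tuple[int, int, float]]:
    """Find groups of packets that were sent together.

    deltas[i] is the time in seconds between packet i and the previous one.
    A burst is a packet followed by one or more packets whose delta is
    below burst_threshold. Returns (start_index, packet_count, hold_time),
    where hold_time is the delta of the first packet in the burst.
    """
    bursts = []
    i = 0
    while i < len(deltas):
        j = i + 1
        # Extend the group while the following packets arrive "at once".
        while j < len(deltas) and deltas[j] < burst_threshold:
            j += 1
        if j - i > 1:
            bursts.append((i, j - i, deltas[i]))
        i = j
    return bursts


# Made-up deltas shaped like the case described at line 1235 of the capture:
# one packet held for ~160 ms, then 7 more packets sent with it.
example = [0.020, 0.020, 0.160] + [0.000] * 7 + [0.020]
print(find_bursts(example))
```

Running the same function over the stream captured before Asterisk should return an empty list if the incoming packets really are evenly spaced at about 20 ms.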