[asterisk-bugs] [JIRA] (ASTERISK-21872) high CPU usage ~15 seconds into call if rtpkeepalive set on channels when Asterisk is in a generic bridge and passing RFC2833 DTMF

Tue Nov 5 13:58:03 CST 2013

    [ https://issues.asterisk.org/jira/browse/ASTERISK-21872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=211533#comment-211533 ] 

Modulus commented on ASTERISK-21872:
------------------------------------

We can reproduce this on a KVM-based virtual machine running asterisk with the following characteristics:

||Architecture| x86_64 (KVM)|
||Number of virtual CPUs| 6|
||Distribution| Debian Wheezy 7.2|
||Asterisk| 10.12.1|
||Kernel| 3.2.0-4-amd64 (debian package version), 3.2.51 (upstream release)|
||CPUs on host machine| 2 x 8-core AMD Opteron(tm) Processor 4284|

To reproduce this automatically we use another asterisk server to originate a call to the target asterisk machine and send DTMFs using the following command:

{code}CLI> originate Local/2********@originate extension s at senddtmf{code}

with the following dialplan context:
{code}[senddtmf]
exten => s,1,Wait(10)
exten => s,n,SendDTMF(824718273487126345871263486128374619872364897632984716238947,25,100)
exten => s,n,Goto(s,1){code}

With timeouts of 25ms up to 40ms between DTMF tones (40ms being the minimum according to RFC4733) the problem is automatically reproduced all the time, with one CPU going at 100% (more CPUs can go 100% if there are more than one calls sending DTMF tones concurrently). Although it cannot be reproduced with a SendDTMF() timeout of 45ms or above, it is quite easy to reproduce it manually by just fingering digits fast enough and this is very important because it reveals that the problem can be triggered by normal use. It should be also noted that it takes just one call to reproduce the problem and cause one CPU to go 100%. rtpkeepalive is not set.

Running the [perf|https://perf.wiki.kernel.org] tool we get the following distribution of CPU cycles inside the asterisk process:
{code}Events: 12K cpu-clock, DSO: asterisk             
+  30.47%  asterisk  ast_dummy_channel_destructor
+  18.92%  asterisk  ast_channel_destructor      
+  12.79%  asterisk  ast_dsp_process             
+  11.60%  asterisk  ast_party_redirecting_set_init   
+   5.45%  asterisk  ast_set_hangupsource             
+   2.78%  asterisk  generator_force                  
+   2.62%  asterisk  manager_state_cb                 
+   2.21%  asterisk  ast_get_enum                     
+   1.88%  asterisk  0x4024f                          
+   1.68%  asterisk  ast_party_redirecting_copy       
+   1.44%  asterisk  action_originate                 
+   1.32%  asterisk  ast_sendtext                     
+   1.12%  asterisk  action_aocmessage                
+   1.02%  asterisk  ast_senddigit_begin              
+   0.99%  asterisk  msg_send_exec                    
+   0.94%  asterisk  ast_settimeout                   
+   0.59%  asterisk  ast_read_generator_actions       
+   0.31%  asterisk  ast_register_application2        
+   0.22%  asterisk  __ast_manager_event_multichan    
+   0.21%  asterisk  pthread_mutex_lock at plt           
+   0.20%  asterisk  __init_manager                   
+   0.19%  asterisk  ast_channel_clear_softhangup     
+   0.19%  asterisk  action_coreshowchannels          
+   0.14%  asterisk  generic_thread_loop              
+   0.12%  asterisk  parse_naptr                      
+   0.07%  asterisk  private_enum_init                
+   0.06%  asterisk  read at plt                         
+   0.06%  asterisk  data_search_generate             
+   0.05%  asterisk  ast_register_switch              
+   0.04%  asterisk  pthread_self at plt                 
+   0.04%  asterisk  ast_bridge_depart                
+   0.03%  asterisk  pthread_rwlock_unlock at plt        
+   0.02%  asterisk  fcntl at plt                        
+   0.02%  asterisk  ebl_callback                     
+   0.02%  asterisk  ast_stun_request                 
+   0.02%  asterisk  write at plt                        
+   0.02%  asterisk  devstate_event                   
+   0.02%  asterisk  ast_dsp_init                     
+   0.02%  asterisk  msg_tech_hash                    
+   0.02%  asterisk  pbx_builtin_execiftime           
+   0.02%  asterisk  ast_pbx_outgoing_app             
+   0.02%  asterisk  ast_get_srv                      
+   0.01%  asterisk  poll at plt                         
+   0.01%  asterisk  gettimeofday at plt                 
+   0.01%  asterisk  ast_parse_arg                    
+   0.01%  asterisk  ast_dsp_silence_noise_with_energy
+   0.01%  asterisk  __ast_dsp_call_progress          
+   0.01%  asterisk  ast_dsp_set_features             
+   0.01%  asterisk  ast_destroy_timing               
+   0.01%  asterisk  ast_say_date_with_format_it      
+   0.01%  asterisk  ssl_lock                         
+   0.01%  asterisk  tmcomp                           
+   0.01%  asterisk  _history_expand_command{code}

> high CPU usage ~15 seconds into call if rtpkeepalive set on channels when Asterisk is in a generic bridge and passing RFC2833 DTMF
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-21872
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21872
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/General
>    Affects Versions: SVN, 1.8.17.0, 1.8.19.1, 1.8.20.0, 1.8.22.0
>         Environment: Debian 6.0 64-bit
>            Reporter: hristo
>            Severity: Minor
>         Attachments: 2-calls-one-sending-many-dtmfs-asterisk-debug.txt, forward-stream-first-call-after-asterisk.pcap.txt, forward-stream-first-call-before-asterisk.pcap.txt, full.txt, sample-config.diff, trafficdump.pcap, vmstat.txt
>
>
> If I send several DTMFs to Asterisk, one after the other, fast enough, it blocks other voice RTP packets for as long as several hundred milliseconds. This seems to affects *all* RTP streams on a server.
> I can say for sure, that Asterisk is not dropping the RTP packets, because after a while it sends all of them at once. It seems as if they are being held by something, while the DTMFs are being processed/forwarded.
> This only occurs in non Packet2Packet mode.
> Originally I've seen the problem when several people were connecting to a conference at about the same time and were entering the PIN numbers at about the same time, therefore producing a lot of DTMFs. The conference runs on a dedicate hardware und is unrelated. Asterisk just sits in the middle and bridges the calls. I have managed to reproduce this with only two calls with as little as 10-15 DTMFs, provided they are send fast enough.
> Attaching is a debug console log from the following call scenario. In this case both calls were genereted from a dedicated server and terminated on another dedicated server.
> Call 1:
>  A (IP 1.1.1.1) dials 1000 --> Asterisk (IP 2.2.2.2) ---> B (IP 3.3.3.3)
> Call 2:
>  A' (IP 1.1.1.1) dials 2000 --> Asterisk (IP 2.2.2.2) ---> B' (IP 3.3.3.3)
> Both calls are active at this point. A' on Call 2 starts sending DTMFs (in this case 40 of them). As a result RTP packets from Call *1* in both directions are delayed by 150-160 ms and are being sent in bursts.
> In the logs I often see:
> res_timing_timerfd.c:225 timerfd_timer_ack: Expected to acknowledge 1 ticks but got 5 instead
> and the CPU is close to 100% (caused by the asterisk process). As soon as all DTMFs are sent, the RTP streams return back to normal with asterisk sending one packet every 20 ms on average.
> Attached is also a filtered packet capture that shows only the forward RTP stream on Call 1 from A -> Asterisk and from Asterisk -> B. "Time" represents the delta from the previos packet. Under normal conditions this should be close to 0.020 s (or 20 ms).
> One example of the problem can be seen at line 1235 in 'forward-stream-first-call-after-asterisk.pcap.txt'. The packet there has been held for ~160 ms, then sent together with the next 7 packets all at once.
> The RTP packets from the corresponding call leg (before asterisk) start at line 1244 in 'forward-stream-first-call-before-asterisk.pcap.txt" and are all equally spaced at about 20 ms.
> There are many such examples - simply search for 0.000 (deltas which are less than 1 ms) to identify groups of packets that are sent together. The same problem is present in the backward stream too (not attached).
> How to reproduce - add the following to the dialplan:
> exten => _X.,n,Dial(SIP/B at 3.3.3.3,,t)
> The 't' option is important, because it effectively disables the Packet2Packet mode. Connect 2 calls (2 sets of telephones) and start dialing DTMFs as fast as you can on one of them (or simply generate 2 calls and send the DTMFs as I did). This will disrupt the call between the other set of phones if done fast enough.
> I have tested this on 3 servers (2 physical and one virtual). All of them were running the same OS (Debian 6), so this may end up being an OS or res_timing_timerfd problem after all, but I really cannot test it on a different distribution.
> I tested with the following versions and was able to reproduce the problem with all of them:
> 1.8.22.0
> 1.8.20.0
> 1.8.19.1
> 1.8.17.0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.asterisk.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira