[asterisk-dev] [Code Review] Improve event API cache performance

Russell Bryant russell at digium.com
Tue Mar 24 13:40:11 CDT 2009


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviewboard.digium.com/r/205/
-----------------------------------------------------------

(Updated 2009-03-24 13:40:11.585747)


Review request for Asterisk Developers.


Changes
-------

Add mantis issue number


Summary
-------

I have learned a lot since I originally wrote event.c a couple of years ago.  Lately, I have been reflecting on some of the performance implications of some of the design decisions that I made.  Then, this past Friday, I did some load testing for unrelated reasons and found that the conversion of device state handling to use the event API in 1.6.1 introduced a significant performance hit for the test that I was doing.

Specifically, the test I was doing at the time was sending many thousands of IAX2 registrations at an Asterisk box.  Asterisk 1.6.0 could handle it just fine.  1.6.1 choked.  The reason was the performance of the event API cache with handling that many entries in the cache.  In 1.6.0, when the devicestate code needed to find out the state of an IAX2 device, it would call into chan_iax2, which could determine the state by looking up the peer in a hash table.  The performance gained by using the hash table was lost in 1.6.1, mostly due to the way the cache was implemented.

Here is a summary of the changes that have been made, in order of both invasiveness and performance impact, from smallest to largest.

1) Asterisk 1.6.1 introduces some additional logic to be able to handle distributed device state.  This functionality comes at a cost.  One relatively minor change in this patch is that the extra processing required for distributed device state is now completely bypassed if it's not needed.

2) One of the things that I noticed when profiling this code was that a _lot_ of time was spent doing string comparisons.  I changed the way strings are represented in an event to include a hash value at the front.  So, before doing a string comparison, we do an integer comparison on the hash.

3) Finally, the code that handles the event cache has been re-written.  I tried to do this in a such a way that it had minimal impact on the API.  I did have to change one API call, though - ast_event_queue_and_cache().  However, the way it works now is nicer, IMO.  Each type of event that can be cached (MWI, device state) has its own hash table and rules for hashing and comparing objects.  This by far made the biggest impact on performance.


To be clear, I am proposing that these changes be merged into 1.6.1, 1.6.2, and trunk.  I consider the performance issues here a regression.


This addresses bug 14738.
    http://bugs.digium.com/view.php?id=14738


Diffs
-----

  /branches/1.6.1/apps/app_voicemail.c 183987 
  /branches/1.6.1/channels/chan_dahdi.c 183987 
  /branches/1.6.1/channels/chan_iax2.c 183987 
  /branches/1.6.1/channels/chan_mgcp.c 183987 
  /branches/1.6.1/channels/chan_sip.c 183987 
  /branches/1.6.1/channels/chan_skinny.c 183987 
  /branches/1.6.1/channels/chan_unistim.c 183987 
  /branches/1.6.1/include/asterisk/_private.h 183987 
  /branches/1.6.1/include/asterisk/devicestate.h 183987 
  /branches/1.6.1/include/asterisk/event.h 183987 
  /branches/1.6.1/include/asterisk/strings.h 183987 
  /branches/1.6.1/main/asterisk.c 183987 
  /branches/1.6.1/main/devicestate.c 183987 
  /branches/1.6.1/main/event.c 183987 
  /branches/1.6.1/res/ais/evt.c 183987 

Diff: http://reviewboard.digium.com/r/205/diff


Testing
-------

The first set of tests I did were functionality tests.  

For device state, I set up some hints and verified that when I changed the state of the associated device, the hint would change state.  I ran this test with and without the distributed devstate processing code turned on.

For MWI, I set up a mailbox, left a message, and verified that a NOTIFY was sent by chan_sip to my phone assigned to that mailbox.


The second set of tests was performance.  The box I was using for this has an Intel Core 2 Duo CPU @ 2.33 GHz.  It has 2 GB of RAM.  When I mention the CPU usage percentage for the Asterisk process, it is out of 200%, since this is a dual core box.

The test I ran is that I ran a tool which sent 12000 IAX2 registrations to Asterisk, evenly spaced out over 1 minute (one every 5 ms).  IAX2 dynamic realtime was enabled, with caching turned on.

The oprofile results include all functions that were taking up 0.5% of Asterisk's CPU time or higher.

----------------------------------------

Version: Asterisk SVN-branch-1.6.0-r183915

Average CPU usage of asterisk: 3% to 5%

oprofile results:

samples  %        image name               app name                 symbol name
26088    44.3357  asterisk                 asterisk                 ast_sched_add_variable
8226     13.9798  chan_iax2.so             chan_iax2.so             network_thread
4731      8.0402  asterisk                 asterisk                 _ast_sched_del
4262      7.2431  chan_iax2.so             chan_iax2.so             __find_callno
3785      6.4325  chan_iax2.so             chan_iax2.so             socket_process
1842      3.1304  chan_iax2.so             chan_iax2.so             pvt_destructor
1652      2.8075  [vdso] (tgid:24198 range:0xb7f64000-0xb7f65000) asterisk                 (no symbols)
617       1.0486  asterisk                 asterisk                 cd_cb
523       0.8888  asterisk                 asterisk                 bt_broot
339       0.5761  chan_iax2.so             chan_iax2.so             build_peer
337       0.5727  res_config_odbc.so       res_config_odbc.so       realtime_odbc
329       0.5591  chan_iax2.so             chan_iax2.so             register_verify

----------------------------------------

Version: Asterisk SVN-branch-1.6.1-r183916

The CPU usage by Asterisk increased steadily throughout this test and peaked at about 140%.

oprofile results:

samples  %        image name               app name                 symbol name
317713   49.4110  asterisk                 asterisk                 match_ie_val
75913    11.8061  asterisk                 asterisk                 ast_event_queue_and_cache
58659     9.1227  asterisk                 asterisk                 schedule
51157     7.9560  asterisk                 asterisk                 ast_event_get_cached
44859     6.9765  asterisk                 asterisk                 ast_event_dump_cache
31661     4.9240  chan_iax2.so             chan_iax2.so             network_thread
15696     2.4411  asterisk                 asterisk                 .plt
8601      1.3376  chan_iax2.so             chan_iax2.so             socket_process
7524      1.1701  chan_iax2.so             chan_iax2.so             __find_callno
6401      0.9955  chan_iax2.so             chan_iax2.so             pvt_destructor
4282      0.6659  chan_iax2.so             chan_iax2.so             __attempt_transmit

----------------------------------------

Version: Asterisk SVN-russell-event_performance-r183985-/branches/1.6.1

Average CPU usage of asterisk: 3% to 5%

oprofile results:

samples  %        image name               app name                 symbol name
42952    44.9665  asterisk                 asterisk                 schedule
18687    19.5634  chan_iax2.so             chan_iax2.so             network_thread
6553      6.8603  chan_iax2.so             chan_iax2.so             __find_callno
6444      6.7462  chan_iax2.so             chan_iax2.so             socket_process
3882      4.0641  chan_iax2.so             chan_iax2.so             pvt_destructor
2261      2.3670  [vdso] (tgid:22953 range:0xb7f99000-0xb7f9a000) asterisk                 (no symbols)
1658      1.7358  chan_iax2.so             chan_iax2.so             __attempt_transmit
977       1.0228  asterisk                 asterisk                 __ao2_callback
682       0.7140  asterisk                 asterisk                 __ao2_ref
549       0.5747  asterisk                 asterisk                 bt_broot

----------------------------------------

Analysis of results:

As you can see, the current code in Asterisk 1.6.1 has some performance issues, and the clear culprit is the event code.  After this set of changes, the event code drops off of th profiling map, and we're back to what we would expect to be the biggest consumers of CPU time during this test.

Additional Commentary:

The scheduler is the next big offender in CPU consumption here.  Luckily, this code has been heavily modified in trunk and 1.6.2 (http://reviewboard.digium.com/r/160/), so once these changes make their way around, it will be interesting to do this test again in 1.6.2.  After the scheduler, the IAX2 network_thread is the next biggest consumer of CPU.  When looking closer at that code, the offending piece of code is actually the handling of the IAX2 frame cache.  There are some very easy changes we could make to significantly reduce the CPU consumption there, but I'll save that for another day.  :-)


Thanks,

Russell




More information about the asterisk-dev mailing list