[asterisk-dev] [Code Review] Improve event API cache performance
Mark Michelson
mmichelson at digium.com
Tue Mar 24 15:51:07 CDT 2009
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviewboard.digium.com/r/205/#review616
-----------------------------------------------------------
/branches/1.6.1/main/event.c
<http://reviewboard.digium.com/r/205/#comment1572>
If alloc_event_ref fails, be sure to free dup_event.
- Mark
On 2009-03-24 13:40:11, Russell Bryant wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviewboard.digium.com/r/205/
> -----------------------------------------------------------
>
> (Updated 2009-03-24 13:40:11)
>
>
> Review request for Asterisk Developers.
>
>
> Summary
> -------
>
> I have learned a lot since I originally wrote event.c a couple of years ago. Lately, I have been reflecting on some of the performance implications of some of the design decisions that I made. Then, this past Friday, I did some load testing for unrelated reasons and found that the conversion of device state handling to use the event API in 1.6.1 introduced a significant performance hit for the test that I was doing.
>
> Specifically, the test I was doing at the time was sending many thousands of IAX2 registrations at an Asterisk box. Asterisk 1.6.0 could handle it just fine. 1.6.1 choked. The reason was the performance of the event API cache with handling that many entries in the cache. In 1.6.0, when the devicestate code needed to find out the state of an IAX2 device, it would call into chan_iax2, which could determine the state by looking up the peer in a hash table. The performance gained by using the hash table was lost in 1.6.1, mostly due to the way the cache was implemented.
>
> Here is a summary of the changes that have been made, in order of both invasiveness and performance impact, from smallest to largest.
>
> 1) Asterisk 1.6.1 introduces some additional logic to be able to handle distributed device state. This functionality comes at a cost. One relatively minor change in this patch is that the extra processing required for distributed device state is now completely bypassed if it's not needed.
>
> 2) One of the things that I noticed when profiling this code was that a _lot_ of time was spent doing string comparisons. I changed the way strings are represented in an event to include a hash value at the front. So, before doing a string comparison, we do an integer comparison on the hash.
>
> 3) Finally, the code that handles the event cache has been re-written. I tried to do this in a such a way that it had minimal impact on the API. I did have to change one API call, though - ast_event_queue_and_cache(). However, the way it works now is nicer, IMO. Each type of event that can be cached (MWI, device state) has its own hash table and rules for hashing and comparing objects. This by far made the biggest impact on performance.
>
>
> To be clear, I am proposing that these changes be merged into 1.6.1, 1.6.2, and trunk. I consider the performance issues here a regression.
>
>
> This addresses bug 14738.
> http://bugs.digium.com/view.php?id=14738
>
>
> Diffs
> -----
>
> /branches/1.6.1/apps/app_voicemail.c 183987
> /branches/1.6.1/channels/chan_dahdi.c 183987
> /branches/1.6.1/channels/chan_iax2.c 183987
> /branches/1.6.1/channels/chan_mgcp.c 183987
> /branches/1.6.1/channels/chan_sip.c 183987
> /branches/1.6.1/channels/chan_skinny.c 183987
> /branches/1.6.1/channels/chan_unistim.c 183987
> /branches/1.6.1/include/asterisk/_private.h 183987
> /branches/1.6.1/include/asterisk/devicestate.h 183987
> /branches/1.6.1/include/asterisk/event.h 183987
> /branches/1.6.1/include/asterisk/strings.h 183987
> /branches/1.6.1/main/asterisk.c 183987
> /branches/1.6.1/main/devicestate.c 183987
> /branches/1.6.1/main/event.c 183987
> /branches/1.6.1/res/ais/evt.c 183987
>
> Diff: http://reviewboard.digium.com/r/205/diff
>
>
> Testing
> -------
>
> The first set of tests I did were functionality tests.
>
> For device state, I set up some hints and verified that when I changed the state of the associated device, the hint would change state. I ran this test with and without the distributed devstate processing code turned on.
>
> For MWI, I set up a mailbox, left a message, and verified that a NOTIFY was sent by chan_sip to my phone assigned to that mailbox.
>
>
> The second set of tests was performance. The box I was using for this has an Intel Core 2 Duo CPU @ 2.33 GHz. It has 2 GB of RAM. When I mention the CPU usage percentage for the Asterisk process, it is out of 200%, since this is a dual core box.
>
> The test I ran is that I ran a tool which sent 12000 IAX2 registrations to Asterisk, evenly spaced out over 1 minute (one every 5 ms). IAX2 dynamic realtime was enabled, with caching turned on.
>
> The oprofile results include all functions that were taking up 0.5% of Asterisk's CPU time or higher.
>
> ----------------------------------------
>
> Version: Asterisk SVN-branch-1.6.0-r183915
>
> Average CPU usage of asterisk: 3% to 5%
>
> oprofile results:
>
> samples % image name app name symbol name
> 26088 44.3357 asterisk asterisk ast_sched_add_variable
> 8226 13.9798 chan_iax2.so chan_iax2.so network_thread
> 4731 8.0402 asterisk asterisk _ast_sched_del
> 4262 7.2431 chan_iax2.so chan_iax2.so __find_callno
> 3785 6.4325 chan_iax2.so chan_iax2.so socket_process
> 1842 3.1304 chan_iax2.so chan_iax2.so pvt_destructor
> 1652 2.8075 [vdso] (tgid:24198 range:0xb7f64000-0xb7f65000) asterisk (no symbols)
> 617 1.0486 asterisk asterisk cd_cb
> 523 0.8888 asterisk asterisk bt_broot
> 339 0.5761 chan_iax2.so chan_iax2.so build_peer
> 337 0.5727 res_config_odbc.so res_config_odbc.so realtime_odbc
> 329 0.5591 chan_iax2.so chan_iax2.so register_verify
>
> ----------------------------------------
>
> Version: Asterisk SVN-branch-1.6.1-r183916
>
> The CPU usage by Asterisk increased steadily throughout this test and peaked at about 140%.
>
> oprofile results:
>
> samples % image name app name symbol name
> 317713 49.4110 asterisk asterisk match_ie_val
> 75913 11.8061 asterisk asterisk ast_event_queue_and_cache
> 58659 9.1227 asterisk asterisk schedule
> 51157 7.9560 asterisk asterisk ast_event_get_cached
> 44859 6.9765 asterisk asterisk ast_event_dump_cache
> 31661 4.9240 chan_iax2.so chan_iax2.so network_thread
> 15696 2.4411 asterisk asterisk .plt
> 8601 1.3376 chan_iax2.so chan_iax2.so socket_process
> 7524 1.1701 chan_iax2.so chan_iax2.so __find_callno
> 6401 0.9955 chan_iax2.so chan_iax2.so pvt_destructor
> 4282 0.6659 chan_iax2.so chan_iax2.so __attempt_transmit
>
> ----------------------------------------
>
> Version: Asterisk SVN-russell-event_performance-r183985-/branches/1.6.1
>
> Average CPU usage of asterisk: 3% to 5%
>
> oprofile results:
>
> samples % image name app name symbol name
> 42952 44.9665 asterisk asterisk schedule
> 18687 19.5634 chan_iax2.so chan_iax2.so network_thread
> 6553 6.8603 chan_iax2.so chan_iax2.so __find_callno
> 6444 6.7462 chan_iax2.so chan_iax2.so socket_process
> 3882 4.0641 chan_iax2.so chan_iax2.so pvt_destructor
> 2261 2.3670 [vdso] (tgid:22953 range:0xb7f99000-0xb7f9a000) asterisk (no symbols)
> 1658 1.7358 chan_iax2.so chan_iax2.so __attempt_transmit
> 977 1.0228 asterisk asterisk __ao2_callback
> 682 0.7140 asterisk asterisk __ao2_ref
> 549 0.5747 asterisk asterisk bt_broot
>
> ----------------------------------------
>
> Analysis of results:
>
> As you can see, the current code in Asterisk 1.6.1 has some performance issues, and the clear culprit is the event code. After this set of changes, the event code drops off of th profiling map, and we're back to what we would expect to be the biggest consumers of CPU time during this test.
>
> Additional Commentary:
>
> The scheduler is the next big offender in CPU consumption here. Luckily, this code has been heavily modified in trunk and 1.6.2 (http://reviewboard.digium.com/r/160/), so once these changes make their way around, it will be interesting to do this test again in 1.6.2. After the scheduler, the IAX2 network_thread is the next biggest consumer of CPU. When looking closer at that code, the offending piece of code is actually the handling of the IAX2 frame cache. There are some very easy changes we could make to significantly reduce the CPU consumption there, but I'll save that for another day. :-)
>
>
> Thanks,
>
> Russell
>
>
More information about the asterisk-dev
mailing list