[asterisk-dev] [Code Review] Improve scheduler performance under high load

Sun Feb 15 21:15:04 CST 2009

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviewboard.digium.com/r/160/#review424
-----------------------------------------------------------


/trunk/include/asterisk/heap.h
<http://reviewboard.digium.com/r/160/#comment1009>

    Can this be amended to replace -1 with any negative value and 1 with any positive value? It appears that the code in heap.c works like this anyway.
    
    To justify why I ask for this, I point to the fact that a string-indexed heap would likely want to just use strcmp or strcasecmp directly as its comparison function.


/trunk/include/asterisk/heap.h
<http://reviewboard.digium.com/r/160/#comment1011>

    From looking at the code, index_offset is *not* optional at all. Sure, the only time it is actually used for anything useful is in ast_heap_remove, but it is also used any time that heap_set is called, which means that the bubble-up and bubble-down operations that are performed during an ast_heap_push and ast_heap_pop will result in the index_offset of the element being written to.
    
    There are a couple of ways around this problem
    
    1. Change the documentation to state that the index_offset is not optional
    2. Allow a reserved value to be used to indicate that the index_offset provided to ast_heap_create is invalid and should not actually be referenced. For me, -1 seems like an obvious choice. You'd have to change the type of index_offset to be a signed type and add some extra sanity checks inside any heap function which tries to reference the index_offset.
    
    My vote is for number 2.


/trunk/include/asterisk/heap.h
<http://reviewboard.digium.com/r/160/#comment1010>

    "unsigned int" is in quotation marks here, which to me means that a literal unsigned int type must be used for the heap's  index tracking. However, the actual API seems to use size_t instead. It's a nit really, but it's a case where the docs seem to contradict what's actually used in the code.
    
    Of course the content of my previous note may result in this field being a signed type...


/trunk/main/heap.c
<http://reviewboard.digium.com/r/160/#comment1012>

    There is absolutely no locking implemented in a heap right now, and the operations are anything but atomic. It seems like ast_heap should have a lock so that multiple threads may make use of the heap safely.


- Mark


On 2009-02-15 19:38:53, Russell Bryant wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviewboard.digium.com/r/160/
> -----------------------------------------------------------
> 
> (Updated 2009-02-15 19:38:53)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> This patch implements the binary max-heap data structure and makes use of it in the Asterisk scheduler.  There are also two new test modules which we were used for development and testing of this code.  The heap test module verifies that the heap is behaving as expected.  The scheduler test module times how long it takes to do adds and deletes from a scheduler context.
> 
> More information about the heap data structure can be found here: http://en.wikipedia.org/wiki/Heap_(data_structure)
> 
> In the rest of the description and testing discussion, let N be the number of entries in the scheduler.
> 
> The data structure that has been replaced in sched.c is a sorted doubly linked list (DLL).  
> 
> With the DLL, adding an entry into the scheduler runs in O(N) time.  If the existing entries have an even distribution over time, then the average case is that N/2 element comparisons will be required to find the proper place to insert the new entry.  In the worst case, it's N comparisons.  Removing an element from the scheduler runs in O(1) time, since it simply requires unlinking it from its current position in the list.
> 
> With the heap, adding an entry into the scheduler runs in O(log_2 N) time.  Removing an entry is also O(log_2 N).
> 
> So, when switching to the heap implementation, you theoretically gain performance on additions, but take a hit on deletes.  But, how much?  See the "Testing Done" section to find out!
> 
> 
> Diffs
> -----
> 
>   /trunk/include/asterisk/heap.h PRE-CREATION 
>   /trunk/main/Makefile 175921 
>   /trunk/main/heap.c PRE-CREATION 
>   /trunk/main/sched.c 175921 
>   /trunk/tests/test_heap.c PRE-CREATION 
>   /trunk/tests/test_sched.c PRE-CREATION 
> 
> Diff: http://reviewboard.digium.com/r/160/diff
> 
> 
> Testing
> -------
> 
> Since it may not be immediately clear what the benefits would be of this, testing has been done to measure the performance benefit for different values of N.
> 
> Useful things to note:
> 
> 1) An addition into the scheduler also requires a malloc().  A deletion requires a call to free().  
> 
> 2) Both operations require a mutex lock() and unlock().
> 
> So, the testing done was to figure out:
> 
> A) Do the benefits in add() outweigh the penalty in del() ?
> 
> B) At what value of N do these changes make enough of an impact to be noticed compared to how much time it takes for the malloc(), free(), and mutex lock()/unlock() calls?
> 
> The following tests were done on my primary development machine, which has an Intel Core 2 Duo @ 2.33 GHz.  The results were gathered using the "sched test <N>" CLI command from the test_sched_module.
> 
> Test results:
> 
> Average running time for N calls to ast_sched_add():
> 
>              trunk            team/russell/heap
> 
> N=1 million  686.7 seconds    2.7 seconds          
> N=100k       53.8 seconds     .278 seconds
> N=10k        149.8 ms.        29.9 ms.
> N=5k         47.0 ms.         15.0 ms.
> N=1k         3.9 ms.          2.9 ms.
> N=500        1.6 ms.          1.5 ms.
> N=100        0.3 ms.          0.3 ms.
> 
> Average running time for N calls to ast_sched_del():
> 
>              trunk            team/russell/heap
> 
> N=1 million  436.3 ms.        603.0 ms.
> N=100k       42.0 ms.         60.3 ms.
> N=10k        3.4 ms.          4.4 ms.
> N=5k         1.8 ms.          2.2 ms.
> N=1k         0.38 ms.         0.44 ms.
> N=500        0.19 ms.         0.21 ms.
> N=100        0.034 ms.        0.040 ms.
> 
> Total average running time for N calls to ast_sched_add() and ast_sched_del():
> 
>              trunk            team/russell/heap
> 
> N=1 million  686.1 seconds    3.3 seconds
> N=100k       53.8 seconds     .338 seconds
> N=10k        153.2 ms.        34.2 ms.
> N=5k         48.8 ms.         17.2 ms.
> N=1k         4.28 ms.         3.34 ms.
> N=500        1.79 ms.         1.71 ms.
> N=100        0.334 ms.        0.340 ms.
> 
> Average running time of team/russell/heap as a percentage of the average running time from trunk:
> 
>               Percentage of time
> N=1 million   0.4 %
> N=100k        0.6 %
> N=10k         22.3 %
> N=5k          35.2 %
> N=1k          78.0 %
> N=500         95.5 %
> N=100         101.8 %
> 
> Analysis of results:
> 
> It appears that in this test environment, somewhere near N=100 and below, the code in trunk will perform better.  However, at that point, we're talking about very small fractions of a second.  As N gets larger, the code in team/russell/heap clearly starts to perform much better.
> 
> With how the scheduler API is used today, a module such as chan_sip or chan_iax2 could easily have a scheduler context with N in the thousands.  Every registration will result in a scheduler entry.  Also, every active channel will result in at least one, if not more scheduler entries.
> 
> Given these results, I do believe that the changes have proven to be beneficial overall.
> 
> 
> Thanks,
> 
> Russell
> 
>