[asterisk-dev] [Code Review] Improve scheduler performance under high load

Sun Feb 15 21:33:31 CST 2009

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviewboard.digium.com/r/160/
-----------------------------------------------------------

(Updated 2009-02-15 21:33:31.307386)

Review request for Asterisk Developers.

Summary
-------

This patch implements the binary max-heap data structure and makes use of it in the Asterisk scheduler.  There are also two new test modules which we were used for development and testing of this code.  The heap test module verifies that the heap is behaving as expected.  The scheduler test module times how long it takes to do adds and deletes from a scheduler context.

More information about the heap data structure can be found here: http://en.wikipedia.org/wiki/Heap_(data_structure)

In the rest of the description and testing discussion, let N be the number of entries in the scheduler.

The data structure that has been replaced in sched.c is a sorted doubly linked list (DLL).  

With the DLL, adding an entry into the scheduler runs in O(N) time.  If the existing entries have an even distribution over time, then the average case is that N/2 element comparisons will be required to find the proper place to insert the new entry.  In the worst case, it's N comparisons.  Removing an element from the scheduler runs in O(1) time, since it simply requires unlinking it from its current position in the list.

With the heap, adding an entry into the scheduler runs in O(log_2 N) time.  Removing an entry is also O(log_2 N).

So, when switching to the heap implementation, you theoretically gain performance on additions, but take a hit on deletes.  But, how much?  See the "Testing Done" section to find out!

Diffs (updated)
-----

  /trunk/include/asterisk/heap.h PRE-CREATION 
  /trunk/main/Makefile 176017 
  /trunk/main/heap.c PRE-CREATION 
  /trunk/main/sched.c 176017 
  /trunk/tests/test_heap.c PRE-CREATION 
  /trunk/tests/test_sched.c PRE-CREATION 

Diff: http://reviewboard.digium.com/r/160/diff

Testing
-------

Since it may not be immediately clear what the benefits would be of this, testing has been done to measure the performance benefit for different values of N.

Useful things to note:

1) An addition into the scheduler also requires a malloc().  A deletion requires a call to free().  

2) Both operations require a mutex lock() and unlock().

So, the testing done was to figure out:

A) Do the benefits in add() outweigh the penalty in del() ?

B) At what value of N do these changes make enough of an impact to be noticed compared to how much time it takes for the malloc(), free(), and mutex lock()/unlock() calls?

The following tests were done on my primary development machine, which has an Intel Core 2 Duo @ 2.33 GHz.  The results were gathered using the "sched test <N>" CLI command from the test_sched_module.

Test results:

Average running time for N calls to ast_sched_add():

             trunk            team/russell/heap

N=1 million  686.7 seconds    2.7 seconds          
N=100k       53.8 seconds     .278 seconds
N=10k        149.8 ms.        29.9 ms.
N=5k         47.0 ms.         15.0 ms.
N=1k         3.9 ms.          2.9 ms.
N=500        1.6 ms.          1.5 ms.
N=100        0.3 ms.          0.3 ms.

Average running time for N calls to ast_sched_del():

             trunk            team/russell/heap

N=1 million  436.3 ms.        603.0 ms.
N=100k       42.0 ms.         60.3 ms.
N=10k        3.4 ms.          4.4 ms.
N=5k         1.8 ms.          2.2 ms.
N=1k         0.38 ms.         0.44 ms.
N=500        0.19 ms.         0.21 ms.
N=100        0.034 ms.        0.040 ms.

Total average running time for N calls to ast_sched_add() and ast_sched_del():

             trunk            team/russell/heap

N=1 million  686.1 seconds    3.3 seconds
N=100k       53.8 seconds     .338 seconds
N=10k        153.2 ms.        34.2 ms.
N=5k         48.8 ms.         17.2 ms.
N=1k         4.28 ms.         3.34 ms.
N=500        1.79 ms.         1.71 ms.
N=100        0.334 ms.        0.340 ms.

Average running time of team/russell/heap as a percentage of the average running time from trunk:

              Percentage of time
N=1 million   0.4 %
N=100k        0.6 %
N=10k         22.3 %
N=5k          35.2 %
N=1k          78.0 %
N=500         95.5 %
N=100         101.8 %

Analysis of results:

It appears that in this test environment, somewhere near N=100 and below, the code in trunk will perform better.  However, at that point, we're talking about very small fractions of a second.  As N gets larger, the code in team/russell/heap clearly starts to perform much better.

With how the scheduler API is used today, a module such as chan_sip or chan_iax2 could easily have a scheduler context with N in the thousands.  Every registration will result in a scheduler entry.  Also, every active channel will result in at least one, if not more scheduler entries.

Given these results, I do believe that the changes have proven to be beneficial overall.

Thanks,

Russell