[asterisk-dev] MTX_PROFILE tutorial & work

Steve Murphy murf at digium.com
Thu Jun 12 01:23:14 CDT 2008


WARNING: if you don't give a hoot about how much
time mutex locking is absorbing in asterisk, then
move on to the next letter.


In the midnight hours, I played with the implementation of
the MTX_PROFILE code.

It's all in svn:

  http://svn.digium.com/svn/asterisk/team/murf/mtxprof

What does the MTX_PROFILE code do for you?

Well, in my case, I'm seeing inexplicable slowdowns
in code that should be running faster. One possible
explanation is that lock contention is slowing
things down. Even compiling the code with
-pg isn't showing lock acquisition time.
The MTX_PROFILE code could be used
to show the amount of time Asterisk waited
for locks, and either support or reject the
lock contention theory.

But it's not working the way I hoped it would,
so I've been hacking at it. On my own time, in
the wee hours of the night.

Why? 

Well, if you use "make menuselect" and check the MTX_PROFILE
option in the compiler flags section, and NOT have the
DEBUG_THREAD option turned on, you can use the "core show profiles"
command to see how much time it's taking to get locks in Asterisk.

But, there are problems:
  1. one 'marker' per module (each marker holds a scale factor,
                              an event count, a total time
                              (in cpu ticks).
     all ast_mutex_lock calls within the module
     will use that same marker.

  2. the profiling code uses a 'trylock' call
     instead of the pthread_mutex_lock.

The above 'problems' seem to be a reasonable 
compromise to these problems:

1. general substitution of ast_mutex_lock() calls
   that return a value. This makes it hard
   to wrap calls with curly brackets via a macro.
2. the func to get the cpu tick count (rdtsc) returns
   a 64-bit value. (you can't use the fetchadd
   funcs on anything but an int). Therefore,
   this code is not easily thread-safe... for
   instance, while one thread is timing one
   lock, another may be writing over its stored
   cpu tick-count. Thus, keeping the time
   between ast_mark to a minimum is important.
   This may explain the use of trylock. Also
   to consider is that the DETECT_DEADLOCK
   code is built around the trylock, and using
   the same techniques in both allow them both
   to be used at the same time.

Here is what I WANTED:

1. a seperate 'marker' for each ast_mutex_lock
   call, with the call file and line number
   as part of the name of the 'marker'.
2. local storage of the time before the
   lock was attempted, so we can have thread-
   safe operation via the stack.
3. Storage of an 32-bit int in the 'marker' struct
   instead of a 64-bit value, so we can safely
   update the counts using the fetchadd_int
   facility.

Here's what I had to do:

Change the MTX_PROF macro to call ast_mark2 to
start timing, and ast_mark3 to finish the
process and update the profile ('marker')
struct's counts. A local 64-bit value
is used to store the cpu ticks before
the lock is attempted. ast_mark2() fetches the
cpu ticks via rdtsc(). ast_mark3() fetches the current
tick value and, passed the previous tick value,
subtracts the two, turns the result into an
int and uses the fetchadd_int facility
to update the 'marker' event counter and
the total time value.

The local start-time and profile 'marker'
index are done by turning the ast_mutex_lock()
call into a macro that uses curly-braces to
define  some local vars, which are then passed
into the MTX_PROF macro, etc. Also, the
file and line numbers are passed in at run
time. This means that the ast_mutex_lock() 
call is now a macro expansion that involves
introducing a new curly-block, which makes
it so we can't use this call as part of
an assignment, or a simple return statement.

So, I added an ast_mutex_lock_assign() macro
that would allow you to assign the return value
to a variable.

Then I went thru all of asterisk, and changed
all occurrences of:

     retvar = ast_mutex_lock(...);

to

     ast_mutex_lock_assign(..., retvar);

and all occurrences of:

      return ast_mutex_lock(...);

to

      ast_mutex_lock_assign(...,retvar);
      return retvar;

and a few other similar situations.


Luckily, the overwhelming majority of usages are simple
calls:

       ast_mutex_lock(...);

I only had to change these files:
        channels/chan_local.c
        channels/chan_mgcp.c
        channels/chan_phone.c
        channels/chan_unistim.c
        funcs/func_lock.c
        include/asterisk/linkedlists.h
        include/asterisk/lock.h
        include/asterisk.h
        main/asterisk.c
        main/astobj2.c
        main/channel.c
        main/pbx.c
        pbx/pbx_config.c
        res/res_smdi.c
        utils/hashtest.c
        utils/hashtest2.c
        utils/refcounter.c

Also, because I do not time via trylock any more, 
I've made the MTX_PROFILE option incompatible with
the DETECT_DEADLOCK option...

So, what do I get now?

*CLI> core show profile
profile values (111, allocated 130)
-------------------
 ID  Scale Events   Value Average Name
  0: [ 0]    65     62939    968  config.c__config_text_file_load__1205
  1: [ 0]     1      2604   2604  config.c__clear_config_maps__1780
  2: [ 0]     1      3811   3811  utils.c__test_for_thread_safety__265
  3: [ 0]     1       168    168  utils.c__test_for_thread_safety__266
  4: [ 0]     1       200    200  utils.c__test_for_thread_safety__268
  5: [ 0]     1    922719 922719  utils.c__test_thread_body__242
  6: [ 0]     1       462    462  utils.c__test_thread_body__246
  7: [ 0]     1     12201  12201  utils.c__test_thread_body__250
  8: [ 0]   180     79432    441  astobj2.c__ao2_lock__154
  9: [ 0]    20      4946    247
taskprocessor.c__tps_processing_function__288
 10: [ 0]  3868   1880453    486  logger.c__logger_thread__967
 11: [ 0]  1911    423793    221  logger.c____ast_verbose__1284
 12: [ 0]  2034   1224604    602  stdtime/localtime.c__ast_tzset__1018
 13: [ 0]     1       494    494  stdtime/localtime.c__ast_tzset__1035
 14: [ 0]  2034    908226    446  logger.c__ast_log__1157
 15: [ 0]    26     12129    466  config.c__find_engine__1942
 16: [ 0]     2      1343    671  loader.c__load_modules__784
 17: [ 0]     1      1040   1040  dnsmgr.c__do_reload__371
 18: [ 0]     1       452    452  manager.c__append_event__3139
 19: [ 0]     6      8684   1447  manager.c__purge_sessions__3102
 20: [ 0]     6      4736    789  manager.c__purge_events__307
 21: [ 0]     1      1501   1501  cdr.c__do_reload__1343
 22: [ 0]     6      3307    551
devicestate.c__run_devstate_collector__691
... <snipped out the remaining of the 110 listed>...
*CLI>   

In the above, the name is: <filename>__<funcname>__<lineno>

As opposed to the current output:

*CLI> core show profile
profile values (412, allocated 430)
-------------------
ID  Scale Events    Value Average  Name
 0: [ 0]      6     3223     537  mtx_lock_utils.c
 1: [ 0]      0        0       0  mtx_lock_ulaw.c
 2: [ 0]      0        0       0  mtx_lock_udptl.c
 3: [ 0]      0        0       0  mtx_lock_translate.c
 4: [ 0]      0        0       0  mtx_lock_term.c
 5: [ 0]      0        0       0  mtx_lock_tdd.c
 6: [ 0]      0        0       0  mtx_lock_tcptls.c
 7: [ 0]     20     8095     404  mtx_lock_taskprocessor.c
 8: [ 0]   2060   990368     480  mtx_lock_stdtime/localtime.c
 9: [ 0]      0        0       0  mtx_lock_srv.c
10: [ 0]      0        0       0  mtx_lock_slinfactory.c
11: [ 0]    173    99930     577  mtx_lock_sched.c
12: [ 0]      0        0       0  mtx_lock_say.c
13: [ 0]      0        0       0  mtx_lock_rtp.c
14: [ 0]      0        0       0  mtx_lock_privacy.c
15: [ 0]      0        0       0  mtx_lock_plc.c
16: [ 0]      0        0       0  mtx_lock_pbx.c
17: [ 0]      3     1774     591  mtx_lock_netsock.c
18: [ 0]      0        0       0  mtx_lock_md5.c
19: [ 0]      7     3875     553  mtx_lock_manager.c
20: [ 0]   7263  2831240     389  mtx_lock_logger.c
21: [ 0]    721   320148     444  mtx_lock_loader.c
22: [ 0]      0        0       0  mtx_lock_jitterbuf.c
...

Notice in the above, the one-counter-per-module.
As an example, see the first entry, for utils.c,
which reports 6 lock events with a total wait
time of 3223 clicks.

In the new output (see further above), each of
those 6 locks is now accounted for separately:
  2: [ 0]     1      3811   3811  utils.c__test_for_thread_safety__265
  3: [ 0]     1       168    168  utils.c__test_for_thread_safety__266
  4: [ 0]     1       200    200  utils.c__test_for_thread_safety__268
  5: [ 0]     1    922719 922719  utils.c__test_thread_body__242
  6: [ 0]     1       462    462  utils.c__test_thread_body__246
  7: [ 0]     1     12201  12201  utils.c__test_thread_body__250
with the 'real' lock times. (Hmmm, might (maybe) be interesting
to know why the lock at line 242 is held so long!)



What's the status?

It looks like things compile with and without MTX_PROFILE
defined... so it's pretty close to done... 

Enjoy!

murf

-- 
Steve Murphy
Software Developer
Digium
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 3227 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-dev/attachments/20080612/919c9fee/attachment.bin 


More information about the asterisk-dev mailing list