[svn-commits] murf: branch murf/bug11210 r103695 - in /team/murf/bug11210: channels/ includ...

SVN commits to the Digium repositories svn-commits at lists.digium.com
Thu Feb 14 17:13:04 CST 2008


Author: murf
Date: Thu Feb 14 17:13:03 2008
New Revision: 103695

URL: http://svn.digium.com/view/asterisk?view=rev&rev=103695
Log:
Added some documentation to astobj2.h; spruced up things.

Modified:
    team/murf/bug11210/channels/chan_sip.c
    team/murf/bug11210/include/asterisk/astobj2.h
    team/murf/bug11210/utils/refcounter.c

Modified: team/murf/bug11210/channels/chan_sip.c
URL: http://svn.digium.com/view/asterisk/team/murf/bug11210/channels/chan_sip.c?view=diff&rev=103695&r1=103694&r2=103695
==============================================================================
--- team/murf/bug11210/channels/chan_sip.c (original)
+++ team/murf/bug11210/channels/chan_sip.c Thu Feb 14 17:13:03 2008
@@ -164,7 +164,18 @@
 #include "asterisk/utils.h"
 #include "asterisk/file.h"
 #include "asterisk/astobj.h"
-#define  REF_DEBUG 1
+/* 
+   Uncomment the define below,  if you are having refcount related memory leaks.
+   With this uncommented, this module will generate a file, /tmp/refs, which contains
+   a history of the ao2_ref() calls. To be useful, all calls to ao2_* functions should
+   be modified to ao2_t_* calls, and include a tag describing what is happening with 
+   enough detail, to make pairing up a reference count increment with its corresponding decrement.
+   The refcounter program in utils/ can be invaluable in highlighting objects that are not
+   balanced, along with the complete history for that object.
+   In normal operation, the macros defined will throw away the tags, so they do not 
+   affect the speed of the program at all. They can be considered to be documentation.
+*/
+/* #define  REF_DEBUG 1 */
 #include "asterisk/astobj2.h"
 #include "asterisk/dnsmgr.h"
 #include "asterisk/devicestate.h"

Modified: team/murf/bug11210/include/asterisk/astobj2.h
URL: http://svn.digium.com/view/asterisk/team/murf/bug11210/include/asterisk/astobj2.h?view=diff&rev=103695&r1=103694&r2=103695
==============================================================================
--- team/murf/bug11210/include/asterisk/astobj2.h (original)
+++ team/murf/bug11210/include/asterisk/astobj2.h Thu Feb 14 17:13:03 2008
@@ -133,6 +133,245 @@
 - \ref astobj2.h All documentation for functions and data structures
 
  */
+
+/*
+\note DEBUGGING REF COUNTS BIBLE:
+An interface to help debug refcounting is provided 
+in this package. It is dependent on the REF_DEBUG macro being
+defined in a source file, before the #include of astobj2.h, 
+and in using variants of the normal ao2_xxxx functions
+that are named ao2_t_xxxx instead, with an extra argument, a string,
+that will be printed out into /tmp/refs when the refcount for an
+object is changed.
+
+  these ao2_t_xxxx variants are provided:
+
+ao2_t_alloc(arg1, arg2, arg3)
+ao2_t_ref(arg1,arg2,arg3)
+ao2_t_container_alloc(arg1,arg2,arg3,arg4)
+ao2_t_link(arg1, arg2, arg3)
+ao2_t_unlink(arg1, arg2, arg3)
+ao2_t_callback(arg1,arg2,arg3,arg4,arg5) 
+ao2_t_find(arg1,arg2,arg3,arg4)
+ao2_t_iterator_next(arg1, arg2)
+
+If you study each argument list, you will see that these functions all have
+one extra argument that their ao2_xxx counterpart. The last argument in
+each case is supposed to be a string pointer, a "tag", that should contain
+enough of an expanation, that you pair operations that increment the 
+ref count, with operations that are meant to decrement the refcount.
+
+Each of these calls will generate at least one line of output in /tmp/refs.
+These lines look like this:
+...
+0x8756f00 =1   chan_sip.c:22240:load_module (allocate users)
+0x86e3408 =1   chan_sip.c:22241:load_module (allocate peers)
+0x86dd380 =1   chan_sip.c:22242:load_module (allocate peers_by_ip)
+0x822d020 =1   chan_sip.c:22243:load_module (allocate dialogs)
+0x8930fd8 =1   chan_sip.c:20025:build_peer (allocate a peer struct)
+0x8930fd8 +1   chan_sip.c:21467:reload_config (link peer into peer table) [@1]
+0x8930fd8 -1   chan_sip.c:2370:unref_peer (unref_peer: from reload_config) [@2]
+0x89318b0 =1   chan_sip.c:20025:build_peer (allocate a peer struct)
+0x89318b0 +1   chan_sip.c:21467:reload_config (link peer into peer table) [@1]
+0x89318b0 -1   chan_sip.c:2370:unref_peer (unref_peer: from reload_config) [@2]
+0x8930218 =1   chan_sip.c:20025:build_peer (allocate a peer struct)
+0x8930218 +1   chan_sip.c:21539:reload_config (link peer into peers table) [@1]
+0x868c040 -1   chan_sip.c:2424:dialog_unlink_all (unset the relatedpeer->call field in tandem with relatedpeer field itself) [@2]
+0x868c040 -1   chan_sip.c:2443:dialog_unlink_all (Let's unbump the count in the unlink so the poor pvt can disappear if it is time) [@1]
+0x868c040 **call destructor** chan_sip.c:2443:dialog_unlink_all (Let's unbump the count in the unlink so the poor pvt can disappear if it is time)
+0x8cc07e8 -1   chan_sip.c:2370:unref_peer (unsetting a dialog relatedpeer field in sip_destroy) [@3]
+0x8cc07e8 +1   chan_sip.c:3876:find_peer (ao2_find in peers table) [@2]
+0x8cc07e8 -1   chan_sip.c:2370:unref_peer (unref_peer, from sip_devicestate, release ref from find_peer) [@3]
+...
+
+The first column is the object address. 
+The second column reflects how the operation affected the ref count 
+    for that object. Creation sets the ref count to 1 (=1). 
+    increment or decrement and amount are specified (-1/+1).
+The remainder of the line specifies where in the file the call was made,
+    and the function name, and the tag supplied in the function call.
+
+The **call destructor** is specified when the the destroy routine is
+run for an object. It does not affect the ref count, but is important
+in debugging, because it is possible to have the astobj2 system run it
+multiple times on the same object, commonly fatal to asterisk.
+
+Sometimes you have some helper functions to do object ref/unref
+operations. Using these normally hides the place where these
+functions were called. To get the location where these functions
+were called to appear in /tmp/refs, you can do this sort of thing:
+
+#ifdef REF_DEBUG
+#define dialog_ref(arg1,arg2) dialog_ref_debug((arg1),(arg2), __FILE__, __LINE__, __PRETTY_FUNCTION__)
+#define dialog_unref(arg1,arg2) dialog_unref_debug((arg1),(arg2), __FILE__, __LINE__, __PRETTY_FUNCTION__)
+static struct sip_pvt *dialog_ref_debug(struct sip_pvt *p, char *tag, char *file, int line, const char *func)
+{
+	if (p)
+		ao2_ref_debug(p, 1, tag, file, line, func);
+	else
+		ast_log(LOG_ERROR, "Attempt to Ref a null pointer\n");
+	return p;
+}
+
+static struct sip_pvt *dialog_unref_debug(struct sip_pvt *p, char *tag, char *file, int line, const char *func)
+{
+	if (p)
+		ao2_ref_debug(p, -1, tag, file, line, func);
+	return NULL;
+}
+#else
+static struct sip_pvt *dialog_ref(struct sip_pvt *p, char *tag)
+{
+	if (p)
+		ao2_ref(p, 1);
+	else
+		ast_log(LOG_ERROR, "Attempt to Ref a null pointer\n");
+	return p;
+}
+
+static struct sip_pvt *dialog_unref(struct sip_pvt *p, char *tag)
+{
+	if (p)
+		ao2_ref(p, -1);
+	return NULL;
+}
+#endif
+
+In the above code, note that the "normal" helper funcs call ao2_ref() as
+normal, and the "helper" functions call ao2_ref_debug directly with the 
+file, function, and line number info provided. You might find this
+well worth the effort to help track these function calls in the code.
+
+To find out why objects are not destroyed (a common bug), you can 
+edit the source file to use the ao2_t_* variants, add the #define REF_DEBUG 1 
+before the #include "asterisk/astobj2.h" line, and add a descriptive 
+tag to each call. Recompile, and run Asterisk, exit asterisk with
+"stop gracefully", which should result in every object being destroyed.
+Then, you can "sort -k 1 /tmp/refs > x1" to get a sorted list of
+all the objects, or you can use "util/refcounter" to scan the file
+for you and output any problems it finds.
+
+The above may seem astronomically more work than it is worth to debug
+reference counts, which may be true in "simple" situations, but for 
+more complex situations, it is easily worth 100 times this effort to
+help find problems.
+
+To debug, pair all calls so that each call that increments the 
+refcount is paired with a corresponding call that decrements the
+count for the same reason. Hopefully, you will be left with one 
+or more unpaired calls. This is where you start your search!
+
+For instance, here is an example of this for a dialog object in 
+chan_sip, that was not getting destroyed, after I moved the lines around
+to pair operations:
+
+   0x83787a0 =1   chan_sip.c:5733:sip_alloc (allocate a dialog(pvt) struct)
+   0x83787a0 -1   chan_sip.c:19173:sip_poke_peer (unref dialog at end of sip_poke_peer, obtained from sip_alloc, just before it goes out of scope) [@4]
+
+   0x83787a0 +1   chan_sip.c:5854:sip_alloc (link pvt into dialogs table) [@1]
+   0x83787a0 -1   chan_sip.c:19150:sip_poke_peer (About to change the callid -- remove the old name) [@3]
+   0x83787a0 +1   chan_sip.c:19152:sip_poke_peer (Linking in under new name) [@2]
+   0x83787a0 -1   chan_sip.c:2399:dialog_unlink_all (unlinking dialog via ao2_unlink) [@5]
+
+   0x83787a0 +1   chan_sip.c:19130:sip_poke_peer (copy sip alloc from p to peer->call) [@2]
+
+
+   0x83787a0 +1   chan_sip.c:2996:__sip_reliable_xmit (__sip_reliable_xmit: setting pkt->owner) [@3]
+   0x83787a0 -1   chan_sip.c:2425:dialog_unlink_all (remove all current packets in this dialog, and the pointer to the dialog too as part of __sip_destroy) [@4]
+
+   0x83787a0 +1   chan_sip.c:22356:unload_module (iterate thru dialogs) [@4]
+   0x83787a0 -1   chan_sip.c:22359:unload_module (toss dialog ptr from iterator_next) [@5]
+
+
+   0x83787a0 +1   chan_sip.c:22373:unload_module (iterate thru dialogs) [@3]
+   0x83787a0 -1   chan_sip.c:22375:unload_module (throw away iterator result) [@2]
+
+   0x83787a0 +1   chan_sip.c:2397:dialog_unlink_all (Let's bump the count in the unlink so it doesn't accidentally become dead before we are done) [@4]
+   0x83787a0 -1   chan_sip.c:2436:dialog_unlink_all (Let's unbump the count in the unlink so the poor pvt can disappear if it is time) [@3]
+
+As you can see, only one unbalanced operation is in the list, a ref count increment when
+the peer->call was set, but no corresponding decrement was made...
+
+Hopefully this helps you narrow your search and find those bugs.
+
+THE ART OF REFERENCE COUNTING
+(by Steve Murphy)
+SOME TIPS for complicated code, and ref counting:
+
+1. Theoretically, passing a refcounted object pointer into a function 
+call is an act of copying the reference, and could be refcounted.
+But, upon examination, this sort of refcounting will explode the amount
+of code you have to enter, and for no tangible benefit, beyond
+creating more possible failure points/bugs. It will even 
+complicate your code and make debugging harder, slow down your program
+doing useless increments and decrements of the ref counts.
+
+2. It is better to track places where a ref counted pointer
+is copied into a structure or stored. Make sure to decrement the refcount
+of any previous pointer that might have been there, if setting
+this field might erase a previous pointer. ao2_find and iterate_next
+internally increment the ref count when they return a pointer, so 
+you need to decrement the count before the pointer goes out of scope.
+
+3. Any time you decrement a ref count, it may be possible that the
+object will be destroyed (freed) immediately by that call. If you
+are destroying a series of fields in a refcounted object, and 
+any of the unref calls might possibly result in immediate destruction,
+you can first increment the count to prevent such behavior, then 
+after the last test, decrement the pointer to allow the object
+to be destroyed, if the refcount would be zero.
+
+Example:
+
+	dialog_ref(dialog, "Let's bump the count in the unlink so it doesn't accidentally become dead before we are done");
+
+	ao2_t_unlink(dialogs, dialog, "unlinking dialog via ao2_unlink");
+
+	*//* Unlink us from the owner (channel) if we have one *//*
+	if (dialog->owner) {
+		if (lockowner)
+			ast_channel_lock(dialog->owner);
+		ast_debug(1, "Detaching from channel %s\n", dialog->owner->name);
+		dialog->owner->tech_pvt = dialog_unref(dialog->owner->tech_pvt, "resetting channel dialog ptr in unlink_all");
+		if (lockowner)
+			ast_channel_unlock(dialog->owner);
+	}
+	if (dialog->registry) {
+		if (dialog->registry->call == dialog)
+			dialog->registry->call = dialog_unref(dialog->registry->call, "nulling out the registry's call dialog field in unlink_all");
+		dialog->registry = registry_unref(dialog->registry, "delete dialog->registry");
+	}
+    ...
+ 	dialog_unref(dialog, "Let's unbump the count in the unlink so the poor pvt can disappear if it is time");
+   
+In the above code, the ao2_t_unlink could end up destroying the dialog
+object; if this happens, then the subsequent usages of the dialog
+pointer could result in a core dump. So, we 'bump' the
+count upwards before beginning, and then decrementing the count when
+we are finished. This is analogous to 'locking' or 'protecting' operations
+for a short while.
+
+4. One of the most insidious problems I've run into when converting
+code to do ref counted automatic destruction, is in the destruction
+routines. Where a "destroy" routine had previously been called to
+get rid of an object in non-refcounted code, the new regime demands
+that you tear that "destroy" routine into two pieces, one that will
+tear down the links and 'unref' them, and the other to actually free
+and reset fields. A destroy routine that does any reference deletion
+for its own object, will never be called. Another insidious problem
+occurs in mutually referenced structures. As an example, a dialog contains
+a pointer to a peer, and a peer contains a pointer to a dialog. Watch
+out that the destruction of one doesn't depend on the destruction of the
+other, as in this case a dependency loop will result in neither being
+destroyed!
+
+Given the above, you should be ready to do a good job!
+
+murf
+
+*/
+
+
 
 /*! \brief
  * Typedef for an object destructor. This is called just before freeing

Modified: team/murf/bug11210/utils/refcounter.c
URL: http://svn.digium.com/view/asterisk/team/murf/bug11210/utils/refcounter.c?view=diff&rev=103695&r1=103694&r2=103695
==============================================================================
--- team/murf/bug11210/utils/refcounter.c (original)
+++ team/murf/bug11210/utils/refcounter.c Thu Feb 14 17:13:03 2008
@@ -17,13 +17,24 @@
  */
 /*! \file
  *
- *  \brief A program to thoroughly thrash a hash table, testing
- *         out locking safety, and making sure all functionality
- *         is functioning. Run with 5 or more threads to get that
- *         fully intense firestorm of activity. If your
- *         hash tables don't crash, lock up, or go weird, it must
- *         be good code! Even features some global counters
- *         that will get slightly behind because they aren't lock-protected.
+ *  \brief A program to read in the /tmp/refs file generated
+ *         by astobj2 code when the REF_DEBUG macro is defined.
+ *         It will read in the file line by line, and
+ *         sort the data out by object, and check to see
+ *         if the refcounts balance to zero, and the object
+ *         was destroyed just once. Any problems that are 
+ *         found are reported to stdout and the objects
+ *         ref count history is printed out. If all is well,
+ *         this program reads in the /tmp/refs file and 
+ *         generates no output. No news is good news.
+ *  The contents of the /tmp/refs file looks like this:
+ *
+0x84fd718 -1   astobj2.c:926:cd_cb_debug (deref object via container destroy) [@1]
+0x84fd718 =1   chan_sip.c:19760:build_user (allocate a user struct)
+0x84fd718 +1   chan_sip.c:21558:reload_config (link user into users table) [@1]
+0x84fd718 -1   chan_sip.c:2376:unref_user (Unref the result of build_user. Now, the table link is the only one left.) [@2]
+0x84fd718 **call destructor** astobj2.c:926:cd_cb_debug (deref object via container destroy)
+ *
  *
  *  \author Steve Murphy <murf at digium.com>
  */




More information about the svn-commits mailing list