[asterisk-commits] dlee: trunk r396842 - in /trunk: include/asterisk/ main/ tests/

SVN commits to the Asterisk project asterisk-commits at lists.digium.com
Fri Aug 16 11:03:42 CDT 2013


Author: dlee
Date: Fri Aug 16 11:03:34 2013
New Revision: 396842

URL: http://svnview.digium.com/svn/asterisk?view=rev&rev=396842
Log:
Stasis: address refcount races; implementation comments

Change r395954 reordered some stasis object destruction, which should
have been fine. Unfortunately, it caused some hard to reproduce issues
related to objects being accessed after they had been destroyed. The
patch in r396329 fixed the destruction order problem; this patch
addresses the underlying issue. A few other stasis-related fixes were
also added.

 * Add ref-bumps around areas where objects may get transitively
   destroyed. (For example, where we lock a topic, unref a subscription,
   which unrefs the topic, which explodes the topic when we try to
   unlock it.)

 * Wrote an extensive doxygen page about Stasis implementation,
   relationships between objects, lifecycles of objects, how the
   refcounting works, etc. Many other comments were added, corrected, or
   cleaned up.

 * Added an assert to the topic dtor to catch extra ref decrements.

 * Fixed type used after destruction errors for graceful shutdown in
   stasis_channels.c.

 * I added two unit tests in an attempt to catch destruction order
   issues. Since the underlying cause is a race condition, though, the
   tests rarely failed even when the code was wrong.

 * Fixed a leak in stasis_cache_pattern.c.

(closes issue ASTERISK-22243)
Review: https://reviewboard.asterisk.org/r/2746/

Modified:
    trunk/include/asterisk/astobj2.h
    trunk/main/stasis.c
    trunk/main/stasis_cache.c
    trunk/main/stasis_cache_pattern.c
    trunk/main/stasis_channels.c
    trunk/tests/test_stasis.c

Modified: trunk/include/asterisk/astobj2.h
URL: http://svnview.digium.com/svn/asterisk/trunk/include/asterisk/astobj2.h?view=diff&rev=396842&r1=396841&r2=396842
==============================================================================
--- trunk/include/asterisk/astobj2.h (original)
+++ trunk/include/asterisk/astobj2.h Fri Aug 16 11:03:34 2013
@@ -509,6 +509,25 @@
 #define ao2_ref(o,delta)       __ao2_ref((o), (delta))
 
 #endif
+
+/*!
+ * \since 12
+ * \brief Bump refcount on an AO2 object by one, returning the object.
+ *
+ * This is useful for inlining a ref bump, and you don't care about the ref
+ * count. Also \c NULL safe, for even more convenience.
+ *
+ * \param obj AO2 object to bump the refcount on.
+ * \retval The given \a obj pointer.
+ */
+#define ao2_bump(obj)						\
+	({							\
+		typeof(obj) __obj_ ## __LINE__ = (obj);		\
+		if (__obj_ ## __LINE__) {			\
+			ao2_ref(__obj_ ## __LINE__, +1);	\
+		}						\
+		__obj_ ## __LINE__;				\
+	})
 
 int __ao2_ref_debug(void *o, int delta, const char *tag, const char *file, int line, const char *func);
 int __ao2_ref(void *o, int delta);

Modified: trunk/main/stasis.c
URL: http://svnview.digium.com/svn/asterisk/trunk/main/stasis.c?view=diff&rev=396842&r1=396841&r2=396842
==============================================================================
--- trunk/main/stasis.c (original)
+++ trunk/main/stasis.c Fri Aug 16 11:03:34 2013
@@ -39,6 +39,95 @@
 #include "asterisk/utils.h"
 #include "asterisk/uuid.h"
 
+/*!
+ * \page stasis-impl Stasis Implementation Notes
+ *
+ * \par Reference counting
+ *
+ * Stasis introduces a number of objects, which are tightly related to one
+ * another. Because we rely on ref-counting for memory management, understanding
+ * these relationships is important to understanding this code.
+ *
+ * \code{.txt}
+ *
+ *   stasis_topic <----> stasis_subscription
+ *             ^          ^
+ *              \        /
+ *               \      /
+ *               dispatch
+ *                  |
+ *                  |
+ *                  v
+ *            stasis_message
+ *                  |
+ *                  |
+ *                  v
+ *          stasis_message_type
+ *
+ * \endcode
+ *
+ * The most troubling thing in this chart is the cyclic reference between
+ * stasis_topic and stasis_subscription. This is both unfortunate, and
+ * necessary. Topics need the subscription in order to dispatch messages;
+ * subscriptions need the topics to unsubscribe and check subscription status.
+ *
+ * The cycle is broken by stasis_unsubscribe(). The unsubscribe will remove the
+ * topic's reference to a subscription. When the subcription is destroyed, it
+ * will remove its reference to the topic.
+ *
+ * This means that until a subscription has be explicitly unsubscribed, it will
+ * not be destroyed. Neither will a topic be destroyed while it has subscribers.
+ * The destructors of both have assertions regarding this to catch ref-counting
+ * problems where a subscription or topic has had an extra ao2_cleanup().
+ *
+ * The \ref dispatch object is a transient object, which is posted to a
+ * subscription's taskprocessor to send a message to the subscriber. They have
+ * short life cycles, allocated on one thread, destroyed on another.
+ *
+ * During shutdown, or the deletion of a domain object, there are a flurry of
+ * ao2_cleanup()s on subscriptions and topics, as the final in-flight messages
+ * are processed. Any one of these cleanups could be the one to actually destroy
+ * a given object, so care must be taken to ensure that an object isn't
+ * referenced after an ao2_cleanup(). This includes the implicit ao2_unlock()
+ * that might happen when a RAII_VAR() goes out of scope.
+ *
+ * \par Typical life cycles
+ *
+ *  \li stasis_topic - There are several topics which live for the duration of
+ *      the Asterisk process (ast_channel_topic_all(), etc.) but most of these
+ *      are actually fed by shorter-lived topics whose lifetime is associated
+ *      with some domain object (like ast_channel_topic() for a given
+ *      ast_channel).
+ *
+ *  \li stasis_subscription - Subscriptions have a similar mix of lifetimes as
+ *      topics, for similar reasons.
+ *
+ *  \li dispatch - Very short lived; just long enough to post a message to a
+ *      subscriber.
+ *
+ *  \li stasis_message - Short to intermediate lifetimes, but that is mostly
+ *      irrelevant. Messages are strictly data and have no behavior associated
+ *      with them, so it doesn't really matter if/when they are destroyed. By
+ *      design, a component could hold a ref to a message forever without any
+ *      ill consequences (aside from consuming more memory).
+ *
+ *  \li stasis_message_type - Long life cycles, typically only destroyed on
+ *      module unloading or _clean_ process exit.
+ *
+ * \par Subscriber shutdown sequencing
+ *
+ * Subscribers are sensitive to shutdown sequencing, specifically in how the
+ * reference message types. This is fully detailed on the wiki at
+ * https://wiki.asterisk.org/wiki/x/K4BqAQ.
+ *
+ * In short, the lifetime of the \a data (and \a callback, if in a module) must
+ * be held until the stasis_subscription_final_message() has been received.
+ * Depending on the structure of the subscriber code, this can be handled by
+ * using stasis_subscription_final_message() to free resources on the final
+ * message, or using stasis_subscription_join()/stasis_unsubscribe_and_join() to
+ * block until the unsubscribe has completed.
+ */
+
 /*! Initial size of the subscribers list. */
 #define INITIAL_SUBSCRIBERS_MAX 4
 
@@ -53,7 +142,7 @@
 /*! \internal */
 struct stasis_topic {
 	char *name;
-	/*! Variable length array of the subscribers (raw pointer to avoid cyclic references) */
+	/*! Variable length array of the subscribers */
 	struct stasis_subscription **subscribers;
 	/*! Allocated length of the subscribers array */
 	size_t num_subscribers_max;
@@ -67,6 +156,10 @@
 static void topic_dtor(void *obj)
 {
 	struct stasis_topic *topic = obj;
+
+	/* Subscribers hold a reference to topics, so they should all be
+	 * unsubscribed before we get here. */
+	ast_assert(topic->num_subscribers_current == 0);
 	ast_free(topic->name);
 	topic->name = NULL;
 	ast_free(topic->subscribers);
@@ -116,7 +209,7 @@
 	/*! Data pointer to be handed to the callback. */
 	void *data;
 
-	/*! Lock for joining with subscription. */
+	/*! Lock for completion flags \c final_message_{rxed,processed}. */
 	ast_mutex_t join_lock;
 	/*! Condition for joining with subscription. */
 	ast_cond_t join_cond;
@@ -131,8 +224,14 @@
 static void subscription_dtor(void *obj)
 {
 	struct stasis_subscription *sub = obj;
+
+	/* Subscriptions need to be manually unsubscribed before destruction
+	 * b/c there's a cyclic reference between topics and subscriptions */
 	ast_assert(!stasis_subscription_is_subscribed(sub));
+	/* If there are any messages in flight to this subscription; that would
+	 * be bad. */
 	ast_assert(stasis_subscription_is_done(sub));
+
 	ao2_cleanup(sub->topic);
 	sub->topic = NULL;
 	ast_taskprocessor_unreference(sub->mailbox);
@@ -221,7 +320,10 @@
 {
 	if (sub) {
 		size_t i;
-		struct stasis_topic *topic = sub->topic;
+		/* The subscription may be the last ref to this topic. Hold
+		 * the topic ref open until after the unlock. */
+		RAII_VAR(struct stasis_topic *, topic, ao2_bump(sub->topic),
+			ao2_cleanup);
 		SCOPED_AO2LOCK(lock_topic, topic);
 
 		for (i = 0; i < topic->num_subscribers_current; ++i) {
@@ -240,11 +342,6 @@
 	return NULL;
 }
 
-/*!
- * \brief Block until the final message has been received on a subscription.
- *
- * \param subscription Subscription to wait on.
- */
 void stasis_subscription_join(struct stasis_subscription *subscription)
 {
 	if (subscription) {
@@ -347,7 +444,12 @@
 		topic->num_subscribers_max *= 2;
 	}
 
-	/* Don't ref sub here or we'll cause a reference cycle. */
+	/* The reference from the topic to the subscription is shared with
+	 * the owner of the subscription, which will explicitly unsubscribe
+	 * to release it.
+	 *
+	 * If we bumped the refcount here, the owner would have to unsubscribe
+	 * and cleanup, which is a bit awkward. */
 	topic->subscribers[topic->num_subscribers_current++] = sub;
 	return 0;
 }
@@ -413,9 +515,13 @@
 	return 0;
 }
 
-void stasis_forward_message(struct stasis_topic *topic, struct stasis_topic *publisher_topic, struct stasis_message *message)
+void stasis_forward_message(struct stasis_topic *_topic, struct stasis_topic *publisher_topic, struct stasis_message *message)
 {
 	size_t i;
+	/* The topic may be unref'ed by the subscription invocation.
+	 * Make sure we hold onto a reference while dispatching. */
+	RAII_VAR(struct stasis_topic *, topic, ao2_bump(_topic),
+		ao2_cleanup);
 	SCOPED_AO2LOCK(lock, topic);
 
 	ast_assert(topic != NULL);
@@ -625,7 +731,7 @@
 	ast_log(LOG_ERROR, "Use of %s() before init/after destruction\n", name);
 }
 
-/*! \brief Cleanup function */
+/*! \brief Shutdown function */
 static void stasis_exit(void)
 {
 	ast_threadpool_shutdown(pool);

Modified: trunk/main/stasis_cache.c
URL: http://svnview.digium.com/svn/asterisk/trunk/main/stasis_cache.c?view=diff&rev=396842&r1=396841&r2=396842
==============================================================================
--- trunk/main/stasis_cache.c (original)
+++ trunk/main/stasis_cache.c Fri Aug 16 11:03:34 2013
@@ -59,8 +59,14 @@
 
 static void stasis_caching_topic_dtor(void *obj) {
 	struct stasis_caching_topic *caching_topic = obj;
+
+	/* Caching topics contain subscriptions, and must be manually
+	 * unsubscribed. */
 	ast_assert(!stasis_subscription_is_subscribed(caching_topic->sub));
+	/* If there are any messages in flight to this subscription; that would
+	 * be bad. */
 	ast_assert(stasis_subscription_is_done(caching_topic->sub));
+
 	ao2_cleanup(caching_topic->sub);
 	caching_topic->sub = NULL;
 	ao2_cleanup(caching_topic->cache);

Modified: trunk/main/stasis_cache_pattern.c
URL: http://svnview.digium.com/svn/asterisk/trunk/main/stasis_cache_pattern.c?view=diff&rev=396842&r1=396841&r2=396842
==============================================================================
--- trunk/main/stasis_cache_pattern.c (original)
+++ trunk/main/stasis_cache_pattern.c Fri Aug 16 11:03:34 2013
@@ -177,6 +177,9 @@
 	stasis_unsubscribe(one->forward_cached_to_all);
 	one->forward_cached_to_all = NULL;
 	stasis_caching_unsubscribe(one->topic_cached);
+	one->topic_cached = NULL;
+
+	ao2_cleanup(one);
 }
 
 struct stasis_topic *stasis_cp_single_topic(struct stasis_cp_single *one)

Modified: trunk/main/stasis_channels.c
URL: http://svnview.digium.com/svn/asterisk/trunk/main/stasis_channels.c?view=diff&rev=396842&r1=396841&r2=396842
==============================================================================
--- trunk/main/stasis_channels.c (original)
+++ trunk/main/stasis_channels.c Fri Aug 16 11:03:34 2013
@@ -912,6 +912,13 @@
 
 static void stasis_channels_cleanup(void)
 {
+	stasis_caching_unsubscribe_and_join(channel_by_name_topic);
+	channel_by_name_topic = NULL;
+	ao2_cleanup(channel_cache_by_name);
+	channel_cache_by_name = NULL;
+	ao2_cleanup(channel_cache_all);
+	channel_cache_all = NULL;
+
 	STASIS_MESSAGE_TYPE_CLEANUP(ast_channel_snapshot_type);
 	STASIS_MESSAGE_TYPE_CLEANUP(ast_channel_dial_type);
 	STASIS_MESSAGE_TYPE_CLEANUP(ast_channel_varset_type);
@@ -931,13 +938,6 @@
 	STASIS_MESSAGE_TYPE_CLEANUP(ast_channel_monitor_stop_type);
 	STASIS_MESSAGE_TYPE_CLEANUP(ast_channel_agent_login_type);
 	STASIS_MESSAGE_TYPE_CLEANUP(ast_channel_agent_logoff_type);
-
-	stasis_caching_unsubscribe_and_join(channel_by_name_topic);
-	channel_by_name_topic = NULL;
-	ao2_cleanup(channel_cache_by_name);
-	channel_cache_by_name = NULL;
-	ao2_cleanup(channel_cache_all);
-	channel_cache_all = NULL;
 }
 
 int ast_stasis_channels_init(void)

Modified: trunk/tests/test_stasis.c
URL: http://svnview.digium.com/svn/asterisk/trunk/tests/test_stasis.c?view=diff&rev=396842&r1=396841&r2=396842
==============================================================================
--- trunk/tests/test_stasis.c (original)
+++ trunk/tests/test_stasis.c Fri Aug 16 11:03:34 2013
@@ -1271,6 +1271,97 @@
 	return AST_TEST_PASS;
 }
 
+static void noop(void *data, struct stasis_subscription *sub,
+	struct stasis_topic *topic, struct stasis_message *message)
+{
+	/* no-op */
+}
+
+AST_TEST_DEFINE(dtor_order)
+{
+	RAII_VAR(struct stasis_topic *, topic, NULL, ao2_cleanup);
+	RAII_VAR(struct stasis_subscription *, sub, NULL, stasis_unsubscribe);
+
+	switch (cmd) {
+	case TEST_INIT:
+		info->name = __func__;
+		info->category = test_category;
+		info->summary = "Test that destruction order doesn't bomb stuff";
+		info->description = "Test that destruction order doesn't bomb stuff";
+		return AST_TEST_NOT_RUN;
+	case TEST_EXECUTE:
+		break;
+	}
+
+	topic = stasis_topic_create("test-topic");
+	ast_test_validate(test, NULL != topic);
+
+	sub = stasis_subscribe(topic, noop, NULL);
+	ast_test_validate(test, NULL != sub);
+
+	/* With any luck, this won't completely blow everything up */
+	ao2_cleanup(topic);
+	stasis_unsubscribe(sub);
+
+	/* These refs were cleaned up manually */
+	topic = NULL;
+	sub = NULL;
+
+	return AST_TEST_PASS;
+}
+
+static const char *noop_get_id(struct stasis_message *message)
+{
+	return NULL;
+}
+
+AST_TEST_DEFINE(caching_dtor_order)
+{
+	RAII_VAR(struct stasis_topic *, topic, NULL, ao2_cleanup);
+	RAII_VAR(struct stasis_cache *, cache, NULL, ao2_cleanup);
+	RAII_VAR(struct stasis_caching_topic *, caching_topic, NULL,
+		stasis_caching_unsubscribe);
+	RAII_VAR(struct stasis_subscription *, sub, NULL, stasis_unsubscribe);
+
+	switch (cmd) {
+	case TEST_INIT:
+		info->name = __func__;
+		info->category = test_category;
+		info->summary = "Test that destruction order doesn't bomb stuff";
+		info->description = "Test that destruction order doesn't bomb stuff";
+		return AST_TEST_NOT_RUN;
+	case TEST_EXECUTE:
+		break;
+	}
+
+	cache = stasis_cache_create(noop_get_id);
+	ast_test_validate(test, NULL != cache);
+
+	topic = stasis_topic_create("test-topic");
+	ast_test_validate(test, NULL != topic);
+
+	caching_topic = stasis_caching_topic_create(topic, cache);
+	ast_test_validate(test, NULL != caching_topic);
+
+	sub = stasis_subscribe(stasis_caching_get_topic(caching_topic), noop,
+		NULL);
+	ast_test_validate(test, NULL != sub);
+
+	/* With any luck, this won't completely blow everything up */
+	ao2_cleanup(cache);
+	ao2_cleanup(topic);
+	stasis_caching_unsubscribe(caching_topic);
+	stasis_unsubscribe(sub);
+
+	/* These refs were cleaned up manually */
+	cache = NULL;
+	topic = NULL;
+	caching_topic = NULL;
+	sub = NULL;
+
+	return AST_TEST_PASS;
+}
+
 static int unload_module(void)
 {
 	AST_TEST_UNREGISTER(message_type);
@@ -1290,6 +1381,8 @@
 	AST_TEST_UNREGISTER(to_json);
 	AST_TEST_UNREGISTER(no_to_ami);
 	AST_TEST_UNREGISTER(to_ami);
+	AST_TEST_UNREGISTER(dtor_order);
+	AST_TEST_UNREGISTER(caching_dtor_order);
 	return 0;
 }
 
@@ -1312,6 +1405,8 @@
 	AST_TEST_REGISTER(to_json);
 	AST_TEST_REGISTER(no_to_ami);
 	AST_TEST_REGISTER(to_ami);
+	AST_TEST_REGISTER(dtor_order);
+	AST_TEST_REGISTER(caching_dtor_order);
 	return AST_MODULE_LOAD_SUCCESS;
 }
 




More information about the asterisk-commits mailing list