[Asterisk-code-review] res sorcery memory cache.c: Fix deadlock with scheduler. (asterisk[certified/13.1])

Richard Mudgett asteriskteam at digium.com
Thu Oct 1 17:32:22 CDT 2015


Richard Mudgett has uploaded a new change for review.

  https://gerrit.asterisk.org/1365

Change subject: res_sorcery_memory_cache.c: Fix deadlock with scheduler.
......................................................................

res_sorcery_memory_cache.c: Fix deadlock with scheduler.

A deadlock can happen when a sorcery object is being expired from the
memory cache when at the same time another object is being placed into the
memory cache.  There are a couple other variations on this theme that
could cause the deadlock.  Basically if an object is being expired from
the sorcery memory cache at the same time as another thread tries to
update the next object expiration timer the deadlock can happen.

* Add a deadlock avoidance loop in expire_objects_from_cache() to check if
someone is trying to remove the scheduler callback from the scheduler.

ASTERISK-25441 #close

Change-Id: Iec7b0bdb81a72b39477727b1535b2539ad0cf4dc
---
M res/res_sorcery_memory_cache.c
1 file changed, 22 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.asterisk.org:29418/asterisk refs/changes/65/1365/1

diff --git a/res/res_sorcery_memory_cache.c b/res/res_sorcery_memory_cache.c
index 2e909dc..f2b14eb 100644
--- a/res/res_sorcery_memory_cache.c
+++ b/res/res_sorcery_memory_cache.c
@@ -122,6 +122,8 @@
 	struct ast_heap *object_heap;
 	/*! \brief Scheduler item for expiring oldest object. */
 	int expire_id;
+	/*! TRUE if trying to stop the oldest object expiration scheduler item. */
+	unsigned int del_expire:1;
 #ifdef TEST_FRAMEWORK
 	/*! \brief Variable used to indicate we should notify a test when we reach empty */
 	unsigned int cache_notify;
@@ -441,7 +443,21 @@
 	struct sorcery_memory_cache *cache = (struct sorcery_memory_cache *)data;
 	struct sorcery_memory_cached_object *cached;
 
-	ao2_wrlock(cache->objects);
+	/*
+	 * We need to do deadlock avoidance between a non-scheduler thread
+	 * blocking when trying to delete the scheduled entry for this
+	 * callback because the scheduler thread is running this callback
+	 * and this callback waiting for the cache->objects container lock
+	 * that the blocked non-scheduler thread already holds.
+	 */
+	while (ao2_trywrlock(cache->objects)) {
+		if (cache->del_expire) {
+			cache->expire_id = -1;
+			ao2_ref(cache, -1);
+			return 0;
+		}
+		sched_yield();
+	}
 
 	cache->expire_id = -1;
 
@@ -486,7 +502,9 @@
 	ao2_callback(cache->objects, OBJ_UNLINK | OBJ_NOLOCK | OBJ_NODATA | OBJ_MULTIPLE,
 		NULL, NULL);
 
+	cache->del_expire = 1;
 	AST_SCHED_DEL_UNREF(sched, cache->expire_id, ao2_ref(cache, -1));
+	cache->del_expire = 0;
 }
 
 /*!
@@ -579,18 +597,9 @@
 		return 0;
 	}
 
-	if (cache->expire_id != -1) {
-		/* If we can't unschedule this expiration then it is currently attempting to run,
-		 * so let it run - it just means that it'll be the one scheduling instead of us.
-		 */
-		if (ast_sched_del(sched, cache->expire_id)) {
-			return 0;
-		}
-
-		/* Since it successfully cancelled we need to drop the ref to the cache it had */
-		ao2_ref(cache, -1);
-		cache->expire_id = -1;
-	}
+	cache->del_expire = 1;
+	AST_SCHED_DEL_UNREF(sched, cache->expire_id, ao2_ref(cache, -1));
+	cache->del_expire = 0;
 
 	cached = ast_heap_peek(cache->object_heap, 1);
 	if (!cached) {

-- 
To view, visit https://gerrit.asterisk.org/1365
To unsubscribe, visit https://gerrit.asterisk.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iec7b0bdb81a72b39477727b1535b2539ad0cf4dc
Gerrit-PatchSet: 1
Gerrit-Project: asterisk
Gerrit-Branch: certified/13.1
Gerrit-Owner: Richard Mudgett <rmudgett at digium.com>



More information about the asterisk-code-review mailing list