[Asterisk-code-review] sched: Don't allow ast_sched_del to deadlock ast_sched_runq ... (asterisk[13])

Walter Doekes asteriskteam at digium.com
Tue Feb 12 10:47:46 CST 2019


Walter Doekes has uploaded this change for review. ( https://gerrit.asterisk.org/10991 )


Change subject: sched: Don't allow ast_sched_del to deadlock ast_sched_runq from same thread
......................................................................

sched: Don't allow ast_sched_del to deadlock ast_sched_runq from same thread

When fixing ASTERISK-24212, a change was made so that a scheduled callback
could not be removed while it was running. The caller of ast_sched_del would
have to wait until the callback had finished.

However, when the caller of ast_sched_del is the callback itself (however wrong
this might be), this new check would cause a deadlock: it would wait forever
for itself.

This changeset introduces an additional check: if ast_sched_del is called
by the callback itself, the call is rejected immediately (and an ERROR and a
backtrace are logged). Additionally, the AST_SCHED_DEL_UNREF macro is adjusted
so that the refcall that follows ast_sched_del is only run if ast_sched_del
returned success.
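
The same-thread check can be illustrated with a minimal, self-contained
sketch. It does not use the Asterisk API: struct demo_sched, demo_del,
demo_runq and the DEMO_DEL_* result codes are hypothetical stand-ins (the real
ast_sched_del returns -1 for every failure). As a further assumption, the
sketch compares thread IDs with pthread_equal() and tracks validity with a
separate flag, which is the portable way to handle a pthread_t; the patch
itself compares with == and resets the field to 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; the real ast_sched_del only returns 0 or -1. */
enum demo_del_result {
	DEMO_DEL_OK = 0,
	DEMO_DEL_NOT_FOUND = -1,
	DEMO_DEL_REJECTED = -2
};

/* Hypothetical, heavily simplified stand-in for a scheduler context. */
struct demo_sched {
	pthread_mutex_t lock;
	int pending_id;             /* single queued entry, -1 when empty */
	bool executing;             /* true while a callback is running */
	pthread_t executing_thread; /* only meaningful while executing is true */
};

typedef int (*demo_callback)(struct demo_sched *con, int id);

void demo_init(struct demo_sched *con)
{
	pthread_mutex_init(&con->lock, NULL);
	con->pending_id = -1;
	con->executing = false;
}

/* Delete an entry; refuse instead of waiting when called from the callback. */
enum demo_del_result demo_del(struct demo_sched *con, int id)
{
	enum demo_del_result res = DEMO_DEL_NOT_FOUND;

	pthread_mutex_lock(&con->lock);
	if (id > -1 && con->pending_id == id) {
		/* Still queued: plain removal. */
		con->pending_id = -1;
		res = DEMO_DEL_OK;
	} else if (con->executing
		&& pthread_equal(con->executing_thread, pthread_self())) {
		/* We are inside the running callback: waiting would never end. */
		res = DEMO_DEL_REJECTED;
	}
	pthread_mutex_unlock(&con->lock);
	return res;
}

/* Run the queued entry, recording which thread executes the callback. */
int demo_runq(struct demo_sched *con, demo_callback cb)
{
	int id, res;

	pthread_mutex_lock(&con->lock);
	id = con->pending_id;
	assert(id > -1); /* the sketch expects exactly one queued entry */
	con->pending_id = -1;
	con->executing = true;
	con->executing_thread = pthread_self();
	pthread_mutex_unlock(&con->lock); /* callback runs without the lock */

	res = cb(con, id);

	pthread_mutex_lock(&con->lock);
	con->executing = false;
	pthread_mutex_unlock(&con->lock);
	return res;
}

/* A misbehaving callback that tries to delete its own entry. */
int demo_self_deleting_callback(struct demo_sched *con, int id)
{
	return demo_del(con, id);
}
```

With one entry queued, demo_runq(&con, demo_self_deleting_callback) returns
DEMO_DEL_REJECTED instead of blocking on itself.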

This should fix the following sporadic race condition found in chan_sip:
- thread 1: schedule sip_poke_peer_now (using AST_SCHED_REPLACE)
- thread 2: run sip_poke_peer_now
- thread 2: blank out sched-ID (too soon!)
- thread 1: set sched-ID (too late!)
- thread 2: try to delete the currently running sched-ID

With this fix, an ERROR is logged, but neither deadlocks (in do_monitor) nor
excess calls to sip_unref_peer(peer) (causing double frees of rtp_instances and
other madness) should occur.
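
The effect of the adjusted macro on reference counts can be sketched as
follows. This is an illustration under stated assumptions, not the Asterisk
code: demo_sched_del, demo_running_id, struct demo_peer, demo_unref_peer and
DEMO_SCHED_DEL_UNREF are hypothetical names, the fake delete fails only for
the ID that is marked as running, and the usleep(1) and ast_log calls of the
real macro are left out.

```c
/* ID whose callback is "currently running"; deleting it fails (hypothetical). */
int demo_running_id = -1;

/* Stand-in for ast_sched_del: 0 on success, -1 on failure. */
int demo_sched_del(int id)
{
	return (id == demo_running_id) ? -1 : 0;
}

/* Hypothetical peer holding one reference on behalf of its scheduled task. */
struct demo_peer {
	int refcount;
	int sched_id;
};

void demo_unref_peer(struct demo_peer *peer)
{
	peer->refcount--;
}

/* Same shape as the patched macro: unref and clear the ID only on success. */
#define DEMO_SCHED_DEL_UNREF(id, refcall) \
	do { \
		int _count = 0, _id = (id); \
		while (_id > -1 && demo_sched_del(_id) && ++_count < 10) { \
			/* the real macro sleeps briefly here before retrying */ \
		} \
		if (_count == 10) { \
			/* delete failed: keep both the reference and the ID */ \
		} else if (_id > -1) { \
			refcall; \
			(id) = -1; \
		} \
	} while (0)

/* Cancel the peer's scheduled task, dropping its reference if that worked. */
void demo_cancel_poke(struct demo_peer *peer)
{
	DEMO_SCHED_DEL_UNREF(peer->sched_id, demo_unref_peer(peer));
}
```

When the delete keeps failing, the peer keeps its reference and its sched-ID,
so a later cancel can still release it exactly once. With the old macro the
unref would have run regardless, which matches the excess sip_unref_peer calls
described above.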

ASTERISK-28282

Change-Id: Ic26777fa0732725e6ca7010df17af77a012aa856
---
M include/asterisk/sched.h
M main/sched.c
2 files changed, 25 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.asterisk.org:29418/asterisk refs/changes/91/10991/1

diff --git a/include/asterisk/sched.h b/include/asterisk/sched.h
index 804b05c..8e2a990 100644
--- a/include/asterisk/sched.h
+++ b/include/asterisk/sched.h
@@ -71,20 +71,24 @@
 
 /*!
  * \brief schedule task to get deleted and call unref function
+ *
+ * Only calls unref function if the delete succeeded.
+ *
  * \sa AST_SCHED_DEL
  * \since 1.6.1
  */
 #define AST_SCHED_DEL_UNREF(sched, id, refcall)			\
 	do { \
-		int _count = 0; \
-		while (id > -1 && ast_sched_del(sched, id) && ++_count < 10) { \
+		int _count = 0, _id = id; \
+		while (_id > -1 && ast_sched_del(sched, _id) && ++_count < 10) { \
 			usleep(1); \
 		} \
-		if (_count == 10) \
-			ast_log(LOG_WARNING, "Unable to cancel schedule ID %d.  This is probably a bug (%s: %s, line %d).\n", id, __FILE__, __PRETTY_FUNCTION__, __LINE__); \
-		if (id > -1) \
+		if (_count == 10) { \
+			ast_log(LOG_WARNING, "Unable to cancel schedule ID %d.  This is probably a bug (%s: %s, line %d).\n", _id, __FILE__, __PRETTY_FUNCTION__, __LINE__); \
+		} else if (_id > -1) { \
 			refcall; \
-		id = -1; \
+			id = -1; \
+		} \
 	} while (0);
 
 /*!
diff --git a/main/sched.c b/main/sched.c
index e5a6e52..a092bf4 100644
--- a/main/sched.c
+++ b/main/sched.c
@@ -118,6 +118,7 @@
 	struct sched_thread *sched_thread;
 	/*! The scheduled task that is currently executing */
 	struct sched *currently_executing;
+	pthread_t currently_executing_on_thread_id;
 
 #ifdef SCHED_MAX_CACHE
 	AST_LIST_HEAD_NOLOCK(, sched) schedc;   /*!< Cache of unused schedule structures and how many */
@@ -626,6 +627,18 @@
 			ast_log(LOG_WARNING,"sched entry %d not in the sched heap?\n", s->sched_id->id);
 		}
 		sched_release(con, s);
+	} else if (con->currently_executing_on_thread_id == pthread_self()) {
+		/* We might trample on deleted memory at this point. Not good,
+		 * but it's better than a deadlock.
+		 * Thou shalt not reschedule things from a scheduled callback!
+		 */
+		ast_log(LOG_ERROR,
+			"BUG! Trying to delete sched %d from the same callback %p (sched %d). "
+			"Ignoring so we don't deadlock\n",
+			id, con->currently_executing->callback, con->currently_executing->sched_id->id);
+		ast_log_backtrace();
+		/* We'll return -1 below, because s is NULL. The caller
+		 * will rightly assume that the unscheduling failed. */
 	} else if (con->currently_executing && (id == con->currently_executing->sched_id->id)) {
 		s = con->currently_executing;
 		s->deleted = 1;
@@ -775,10 +788,12 @@
 		 */
 
 		con->currently_executing = current;
+		con->currently_executing_on_thread_id = pthread_self();
 		ast_mutex_unlock(&con->lock);
 		res = current->callback(current->data);
 		ast_mutex_lock(&con->lock);
 		con->currently_executing = NULL;
+		con->currently_executing_on_thread_id = 0;
 		ast_cond_signal(&current->cond);
 
 		if (res && !current->deleted) {

-- 
To view, visit https://gerrit.asterisk.org/10991
To unsubscribe, or for help writing mail filters, visit https://gerrit.asterisk.org/settings

Gerrit-Project: asterisk
Gerrit-Branch: 13
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic26777fa0732725e6ca7010df17af77a012aa856
Gerrit-Change-Number: 10991
Gerrit-PatchSet: 1
Gerrit-Owner: Walter Doekes <walter+asterisk at wjd.nu>