[Asterisk-code-review] sched: Don't allow ast_sched_del to deadlock ast_sched_runq from same... (...asterisk[master])

George Joseph asteriskteam at digium.com
Fri Jul 19 08:46:21 CDT 2019


George Joseph has submitted this change and it was merged. ( https://gerrit.asterisk.org/c/asterisk/+/11578 )

Change subject: sched: Don't allow ast_sched_del to deadlock ast_sched_runq from same thread
......................................................................

sched: Don't allow ast_sched_del to deadlock ast_sched_runq from same thread

When fixing ASTERISK-24212, a change was made so a scheduled callback could not
be removed while it was running. The caller of ast_sched_del would have to wait.

However, when the caller of ast_sched_del is the callback itself (however wrong
this might be), this new check would cause a deadlock: it would wait forever
for itself.

This changeset introduces an additional check: if ast_sched_del is called
by the callback itself, it is immediately rejected (along with an ERROR log and
a backtrace). Additionally, the AST_SCHED_DEL_UNREF macro is adjusted so the
after-ast_sched_del-refcall function is only run if ast_sched_del returned
success.
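
The self-delete check described above can be sketched in isolation. This is
only an illustration under stated assumptions, not the Asterisk code: the
toy_sched struct and the toy_sched_del(), toy_sched_run() and
demo_self_delete() names are hypothetical, the "scheduler" runs one callback
at a time, and the wait-for-another-thread path is left out. The sketch
compares thread ids with pthread_equal(), whereas the patch itself uses ==.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for the scheduler context. */
struct toy_sched {
	pthread_mutex_t lock;
	int executing_id;            /* -1 while no callback is running */
	pthread_t executing_thread;  /* valid only while executing_id != -1 */
};

/* Returns 0 if the delete is accepted, -1 if it is refused. */
static int toy_sched_del(struct toy_sched *con, int id)
{
	int res = 0;

	pthread_mutex_lock(&con->lock);
	if (con->executing_id == id) {
		if (pthread_equal(con->executing_thread, pthread_self())) {
			/* The callback is deleting itself; waiting for it to
			 * finish would mean waiting for ourselves forever. */
			fprintf(stderr, "BUG! sched %d deleted from its own callback\n", id);
			res = -1;
		}
		/* Otherwise a real scheduler would wait on a condition variable
		 * until the other thread's callback returns (not modelled). */
	}
	pthread_mutex_unlock(&con->lock);
	return res;
}

/* Runs one callback, recording which thread is executing it. */
static int toy_sched_run(struct toy_sched *con, int id,
	int (*cb)(struct toy_sched *con, int id))
{
	int res;

	pthread_mutex_lock(&con->lock);
	assert(con->executing_id == -1); /* the toy does not nest callbacks */
	con->executing_id = id;
	con->executing_thread = pthread_self();
	pthread_mutex_unlock(&con->lock);

	res = cb(con, id); /* the callback runs without the lock held */

	pthread_mutex_lock(&con->lock);
	con->executing_id = -1;
	pthread_mutex_unlock(&con->lock);
	return res;
}

/* A misbehaving callback that tries to unschedule itself. */
static int self_deleting_cb(struct toy_sched *con, int id)
{
	return toy_sched_del(con, id);
}

/* Returns what the self-delete attempt returned. */
static int demo_self_delete(void)
{
	struct toy_sched con = { .executing_id = -1 };
	int res;

	pthread_mutex_init(&con.lock, NULL);
	res = toy_sched_run(&con, 7, self_deleting_cb);
	pthread_mutex_destroy(&con.lock);
	return res;
}
```

Calling demo_self_delete() from a main() returns the refusal code instead of
hanging, which is the behaviour the changeset aims for.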

This should fix the following spurious race condition found in chan_sip:
- thread 1: schedule sip_poke_peer_now (using AST_SCHED_REPLACE)
- thread 2: run sip_poke_peer_now
- thread 2: blank out sched-ID (too soon!)
- thread 1: set sched-ID (too late!)
- thread 2: try to delete the currently running sched-ID

After this fix, an ERROR would be logged, but neither deadlocks (in do_monitor)
nor excess calls to sip_unref_peer(peer) (causing double frees of rtp_instances
and other madness) should occur.
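
Why the adjusted macro avoids the excess unref can be shown with a toy
version. This is a sketch under assumptions, not the real macro:
TOY_SCHED_DEL_UNREF, fake_sched_del(), fake_del_result, unref_calls and
demo_unref_count() are hypothetical names, the delete result is faked through
a global, and the short sleep between retries is omitted.

```c
#include <assert.h>
#include <stdio.h>

static int fake_del_result; /* what the hypothetical delete returns */
static int unref_calls;     /* how often the "refcall" ran */

/* Stand-in for the delete call: 0 means success, non-zero means failure. */
static int fake_sched_del(int id)
{
	(void) id;
	return fake_del_result;
}

/* Toy version of the adjusted macro: the id is read once per attempt into
 * _id, and refcall runs only when the delete succeeded. */
#define TOY_SCHED_DEL_UNREF(id, refcall) \
	do { \
		int _count = 0, _id; \
		while ((_id = (id)) > -1 && fake_sched_del(_id) && ++_count < 10) { \
			/* the real macro sleeps briefly here before retrying */ \
		} \
		if (_count == 10) { \
			fprintf(stderr, "Unable to cancel schedule ID %d\n", _id); \
		} else if (_id > -1) { \
			refcall; \
			(id) = -1; \
		} \
	} while (0)

/* Returns how many times the unref ran for a given delete result. */
static int demo_unref_count(int del_result)
{
	int sched_id = 42;

	fake_del_result = del_result;
	unref_calls = 0;
	TOY_SCHED_DEL_UNREF(sched_id, unref_calls++);
	return unref_calls;
}
```

With a successful delete (result 0) the unref runs once; when every attempt
fails, it does not run at all and the id is left untouched.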

(Thanks Richard Mudgett for reviewing/improving this "scary" change.)

Note that this change does not fix the observed race condition: unlocked
access to peer->pokeexpire (and potentially other scheduled items in chan_sip),
causing AST_SCHED_DEL_UNREF to look at a changing id. But it will make the
deadlock go away. And in the observed case, it will not have adverse effects
(like memory leaks) because the scheduled item is removed through a different
path.

ASTERISK-28282

Change-Id: Ic26777fa0732725e6ca7010df17af77a012aa856
---
M include/asterisk/sched.h
M main/sched.c
2 files changed, 32 insertions(+), 14 deletions(-)

Approvals:
  Richard Mudgett: Looks good to me, but someone else must approve
  George Joseph: Looks good to me, approved; Approved for Submit



diff --git a/include/asterisk/sched.h b/include/asterisk/sched.h
index 804b05c..7ea6709 100644
--- a/include/asterisk/sched.h
+++ b/include/asterisk/sched.h
@@ -71,20 +71,24 @@
 
 /*!
  * \brief schedule task to get deleted and call unref function
+ *
+ * Only calls unref function if the delete succeeded.
+ *
  * \sa AST_SCHED_DEL
  * \since 1.6.1
  */
 #define AST_SCHED_DEL_UNREF(sched, id, refcall)			\
 	do { \
-		int _count = 0; \
-		while (id > -1 && ast_sched_del(sched, id) && ++_count < 10) { \
+		int _count = 0, _id; \
+		while ((_id = id) > -1 && ast_sched_del(sched, _id) && ++_count < 10) { \
 			usleep(1); \
 		} \
-		if (_count == 10) \
-			ast_log(LOG_WARNING, "Unable to cancel schedule ID %d.  This is probably a bug (%s: %s, line %d).\n", id, __FILE__, __PRETTY_FUNCTION__, __LINE__); \
-		if (id > -1) \
+		if (_count == 10) { \
+			ast_log(LOG_WARNING, "Unable to cancel schedule ID %d.  This is probably a bug (%s: %s, line %d).\n", _id, __FILE__, __PRETTY_FUNCTION__, __LINE__); \
+		} else if (_id > -1) { \
 			refcall; \
-		id = -1; \
+			id = -1; \
+		} \
 	} while (0);
 
 /*!
diff --git a/main/sched.c b/main/sched.c
index d141e70..e3a7d30 100644
--- a/main/sched.c
+++ b/main/sched.c
@@ -116,6 +116,8 @@
 	struct sched_thread *sched_thread;
 	/*! The scheduled task that is currently executing */
 	struct sched *currently_executing;
+	/*! Valid while currently_executing is not NULL */
+	pthread_t executing_thread_id;
 
 #ifdef SCHED_MAX_CACHE
 	AST_LIST_HEAD_NOLOCK(, sched) schedc;   /*!< Cache of unused schedule structures and how many */
@@ -625,15 +627,26 @@
 		}
 		sched_release(con, s);
 	} else if (con->currently_executing && (id == con->currently_executing->sched_id->id)) {
-		s = con->currently_executing;
-		s->deleted = 1;
-		/* Wait for executing task to complete so that caller of ast_sched_del() does not
-		 * free memory out from under the task.
-		 */
-		while (con->currently_executing && (id == con->currently_executing->sched_id->id)) {
-			ast_cond_wait(&s->cond, &con->lock);
+		if (con->executing_thread_id == pthread_self()) {
+			/* The scheduled callback is trying to delete itself.
+			 * Not good as that is a deadlock. */
+			ast_log(LOG_ERROR,
+				"BUG! Trying to delete sched %d from within the callback %p.  "
+				"Ignoring so we don't deadlock\n",
+				id, con->currently_executing->callback);
+			ast_log_backtrace();
+			/* We'll return -1 below because s is NULL.
+			 * The caller will rightly assume that the unscheduling failed. */
+		} else {
+			s = con->currently_executing;
+			s->deleted = 1;
+			/* Wait for executing task to complete so that the caller of
+			 * ast_sched_del() does not free memory out from under the task. */
+			while (con->currently_executing && (id == con->currently_executing->sched_id->id)) {
+				ast_cond_wait(&s->cond, &con->lock);
+			}
+			/* Do not sched_release() here because ast_sched_runq() will do it */
 		}
-		/* Do not sched_release() here because ast_sched_runq() will do it */
 	}
 
 #ifdef DUMP_SCHEDULER
@@ -773,6 +786,7 @@
 		 */
 
 		con->currently_executing = current;
+		con->executing_thread_id = pthread_self();
 		ast_mutex_unlock(&con->lock);
 		res = current->callback(current->data);
 		ast_mutex_lock(&con->lock);

-- 
To view, visit https://gerrit.asterisk.org/c/asterisk/+/11578

Gerrit-Project: asterisk
Gerrit-Branch: master
Gerrit-Change-Id: Ic26777fa0732725e6ca7010df17af77a012aa856
Gerrit-Change-Number: 11578
Gerrit-PatchSet: 2
Gerrit-Owner: Walter Doekes <walter+asterisk at wjd.nu>
Gerrit-Reviewer: Friendly Automation
Gerrit-Reviewer: George Joseph <gjoseph at digium.com>
Gerrit-Reviewer: Richard Mudgett <rmudgett at digium.com>
Gerrit-MessageType: merged