<p>George Joseph <strong>merged</strong> this change.</p><p><a href="https://gerrit.asterisk.org/c/asterisk/+/10991">View Change</a></p><div style="white-space:pre-wrap">Approvals:
Richard Mudgett: Looks good to me, but someone else must approve
George Joseph: Looks good to me, approved; Approved for Submit
</div><pre style="font-family: monospace,monospace; white-space: pre-wrap;">sched: Don't allow ast_sched_del to deadlock ast_sched_runq from same thread<br><br>When fixing ASTERISK~24212, a change was done so a scheduled callback could not<br>be removed while it was running. The caller of ast_sched_del would have to wait.<br><br>However, when the caller of ast_sched_del is the callback itself (however wrong<br>this might be), this new check would cause a deadlock: it would wait forever<br>for itself.<br><br>This changeset introduces an additional check: if ast_sched_del is called<br>by the callback itself, it is immediately rejected (along with an ERROR log and<br>a backtrace). Additionally, the AST_SCHED_DEL_UNREF macro is adjusted so the<br>after-ast_sched_del-refcall function is only run if ast_sched_del returned<br>success.<br><br>This should fix the following spurious race condition found in chan_sip:<br>- thread 1: schedule sip_poke_peer_now (using AST_SCHED_REPLACE)<br>- thread 2: run sip_poke_peer_now<br>- thread 2: blank out sched-ID (too soon!)<br>- thread 1: set sched-ID (too late!)<br>- thread 2: try to delete the currently running sched-ID<br><br>After this fix, an ERROR would be logged, but no deadlocks (in do_monitor) nor<br>excess calls to sip_unref_peer(peer) (causing double frees of rtp_instances and<br>other madness) should occur.<br><br>(Thanks Richard Mudgett for reviewing/improving this "scary" change.)<br><br>Note that this change does not fix the observed race condition: unlocked<br>access to peer->pokeexpire (and potentially other scheduled items in chan_sip),<br>causing AST_SCHED_DEL_UNREF to look at a changing id. But it will make the<br>deadlock go away. 
And in the observed case, it will not have adverse effects<br>(like memory leaks) because the scheduled item is removed through a different<br>path.<br><br>ASTERISK-28282<br><br>Change-Id: Ic26777fa0732725e6ca7010df17af77a012aa856<br>---<br>M include/asterisk/sched.h<br>M main/sched.c<br>2 files changed, 32 insertions(+), 14 deletions(-)<br><br></pre><pre style="font-family: monospace,monospace; white-space: pre-wrap;"><span>diff --git a/include/asterisk/sched.h b/include/asterisk/sched.h</span><br><span>index 804b05c..7ea6709 100644</span><br><span>--- a/include/asterisk/sched.h</span><br><span>+++ b/include/asterisk/sched.h</span><br><span>@@ -71,20 +71,24 @@</span><br><span> </span><br><span> /*!</span><br><span> * \brief schedule task to get deleted and call unref function</span><br><span style="color: hsl(120, 100%, 40%);">+ *</span><br><span style="color: hsl(120, 100%, 40%);">+ * Only calls unref function if the delete succeeded.</span><br><span style="color: hsl(120, 100%, 40%);">+ *</span><br><span> * \sa AST_SCHED_DEL</span><br><span> * \since 1.6.1</span><br><span> */</span><br><span> #define AST_SCHED_DEL_UNREF(sched, id, refcall) \</span><br><span> do { \</span><br><span style="color: hsl(0, 100%, 40%);">- int _count = 0; \</span><br><span style="color: hsl(0, 100%, 40%);">- while (id > -1 && ast_sched_del(sched, id) && ++_count < 10) { \</span><br><span style="color: hsl(120, 100%, 40%);">+ int _count = 0, _id; \</span><br><span style="color: hsl(120, 100%, 40%);">+ while ((_id = id) > -1 && ast_sched_del(sched, _id) && ++_count < 10) { \</span><br><span> usleep(1); \</span><br><span> } \</span><br><span style="color: hsl(0, 100%, 40%);">- if (_count == 10) \</span><br><span style="color: hsl(0, 100%, 40%);">- ast_log(LOG_WARNING, "Unable to cancel schedule ID %d. 
This is probably a bug (%s: %s, line %d).\n", id, __FILE__, __PRETTY_FUNCTION__, __LINE__); \</span><br><span style="color: hsl(0, 100%, 40%);">- if (id > -1) \</span><br><span style="color: hsl(120, 100%, 40%);">+ if (_count == 10) { \</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_log(LOG_WARNING, "Unable to cancel schedule ID %d. This is probably a bug (%s: %s, line %d).\n", _id, __FILE__, __PRETTY_FUNCTION__, __LINE__); \</span><br><span style="color: hsl(120, 100%, 40%);">+ } else if (_id > -1) { \</span><br><span> refcall; \</span><br><span style="color: hsl(0, 100%, 40%);">- id = -1; \</span><br><span style="color: hsl(120, 100%, 40%);">+ id = -1; \</span><br><span style="color: hsl(120, 100%, 40%);">+ } \</span><br><span> } while (0);</span><br><span> </span><br><span> /*!</span><br><span>diff --git a/main/sched.c b/main/sched.c</span><br><span>index e5a6e52..a65d7e2 100644</span><br><span>--- a/main/sched.c</span><br><span>+++ b/main/sched.c</span><br><span>@@ -118,6 +118,8 @@</span><br><span> struct sched_thread *sched_thread;</span><br><span> /*! The scheduled task that is currently executing */</span><br><span> struct sched *currently_executing;</span><br><span style="color: hsl(120, 100%, 40%);">+ /*! 
Valid while currently_executing is not NULL */</span><br><span style="color: hsl(120, 100%, 40%);">+ pthread_t executing_thread_id;</span><br><span> </span><br><span> #ifdef SCHED_MAX_CACHE</span><br><span> AST_LIST_HEAD_NOLOCK(, sched) schedc; /*!< Cache of unused schedule structures and how many */</span><br><span>@@ -627,15 +629,26 @@</span><br><span> }</span><br><span> sched_release(con, s);</span><br><span> } else if (con->currently_executing && (id == con->currently_executing->sched_id->id)) {</span><br><span style="color: hsl(0, 100%, 40%);">- s = con->currently_executing;</span><br><span style="color: hsl(0, 100%, 40%);">- s->deleted = 1;</span><br><span style="color: hsl(0, 100%, 40%);">- /* Wait for executing task to complete so that caller of ast_sched_del() does not</span><br><span style="color: hsl(0, 100%, 40%);">- * free memory out from under the task.</span><br><span style="color: hsl(0, 100%, 40%);">- */</span><br><span style="color: hsl(0, 100%, 40%);">- while (con->currently_executing && (id == con->currently_executing->sched_id->id)) {</span><br><span style="color: hsl(0, 100%, 40%);">- ast_cond_wait(&s->cond, &con->lock);</span><br><span style="color: hsl(120, 100%, 40%);">+ if (con->executing_thread_id == pthread_self()) {</span><br><span style="color: hsl(120, 100%, 40%);">+ /* The scheduled callback is trying to delete itself.</span><br><span style="color: hsl(120, 100%, 40%);">+ * Not good as that is a deadlock. */</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_log(LOG_ERROR,</span><br><span style="color: hsl(120, 100%, 40%);">+ "BUG! Trying to delete sched %d from within the callback %p. 
"</span><br><span style="color: hsl(120, 100%, 40%);">+ "Ignoring so we don't deadlock\n",</span><br><span style="color: hsl(120, 100%, 40%);">+ id, con->currently_executing->callback);</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_log_backtrace();</span><br><span style="color: hsl(120, 100%, 40%);">+ /* We'll return -1 below because s is NULL.</span><br><span style="color: hsl(120, 100%, 40%);">+ * The caller will rightly assume that the unscheduling failed. */</span><br><span style="color: hsl(120, 100%, 40%);">+ } else {</span><br><span style="color: hsl(120, 100%, 40%);">+ s = con->currently_executing;</span><br><span style="color: hsl(120, 100%, 40%);">+ s->deleted = 1;</span><br><span style="color: hsl(120, 100%, 40%);">+ /* Wait for executing task to complete so that the caller of</span><br><span style="color: hsl(120, 100%, 40%);">+ * ast_sched_del() does not free memory out from under the task. */</span><br><span style="color: hsl(120, 100%, 40%);">+ while (con->currently_executing && (id == con->currently_executing->sched_id->id)) {</span><br><span style="color: hsl(120, 100%, 40%);">+ ast_cond_wait(&s->cond, &con->lock);</span><br><span style="color: hsl(120, 100%, 40%);">+ }</span><br><span style="color: hsl(120, 100%, 40%);">+ /* Do not sched_release() here because ast_sched_runq() will do it */</span><br><span> }</span><br><span style="color: hsl(0, 100%, 40%);">- /* Do not sched_release() here because ast_sched_runq() will do it */</span><br><span> }</span><br><span> </span><br><span> #ifdef DUMP_SCHEDULER</span><br><span>@@ -775,6 +788,7 @@</span><br><span> */</span><br><span> </span><br><span> con->currently_executing = current;</span><br><span style="color: hsl(120, 100%, 40%);">+ con->executing_thread_id = pthread_self();</span><br><span> ast_mutex_unlock(&con->lock);</span><br><span> res = current->callback(current->data);</span><br><span> ast_mutex_lock(&con->lock);</span><br><span></span><br></pre><p>To view, visit <a 
href="https://gerrit.asterisk.org/c/asterisk/+/10991">change 10991</a>.</p>
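<p>The same-thread check described in the commit message can be illustrated outside Asterisk with a minimal sketch. Everything below (<code>struct mini_sched</code>, <code>mini_sched_del</code>, <code>mini_sched_run</code>, <code>demo_self_delete</code>) is a hypothetical stand-in, not the code in main/sched.c: there is no task queue, only the "currently executing" bookkeeping. The sketch also compares thread IDs with <code>pthread_equal()</code>, the portable POSIX comparison, whereas the patch compares them with <code>==</code>.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a scheduler context. */
struct mini_sched {
    pthread_mutex_t lock;
    pthread_cond_t done;
    int executing_id;            /* -1 while no callback is running */
    pthread_t executing_thread;  /* valid only while executing_id != -1 */
};

typedef int (*mini_cb)(struct mini_sched *con, int id);

/* Returns 0 if the running task was waited for, -1 if the delete failed.
 * In this sketch only the currently executing task can be targeted. */
int mini_sched_del(struct mini_sched *con, int id)
{
    int res = -1;

    assert(con != NULL);
    pthread_mutex_lock(&con->lock);
    if (con->executing_id == id) {
        if (pthread_equal(con->executing_thread, pthread_self())) {
            /* The callback is deleting itself: waiting would never end,
             * so reject the request instead of blocking. */
            fprintf(stderr, "BUG: sched %d deleted from its own callback\n", id);
        } else {
            /* Another thread: wait until the callback has finished. */
            while (con->executing_id == id) {
                pthread_cond_wait(&con->done, &con->lock);
            }
            res = 0;
        }
    }
    pthread_mutex_unlock(&con->lock);
    return res;
}

/* Runs one callback, recording which thread executes it. */
int mini_sched_run(struct mini_sched *con, int id, mini_cb cb)
{
    int res;

    pthread_mutex_lock(&con->lock);
    con->executing_id = id;
    con->executing_thread = pthread_self();
    pthread_mutex_unlock(&con->lock);

    res = cb(con, id);  /* called without the lock held */

    pthread_mutex_lock(&con->lock);
    con->executing_id = -1;
    pthread_cond_broadcast(&con->done);
    pthread_mutex_unlock(&con->lock);
    return res;
}

/* A callback that (wrongly) tries to delete itself. */
static int self_deleting_cb(struct mini_sched *con, int id)
{
    return mini_sched_del(con, id);
}

/* Returns the result of the self-delete attempt. */
int demo_self_delete(void)
{
    struct mini_sched con = { .executing_id = -1 };
    int res;

    pthread_mutex_init(&con.lock, NULL);
    pthread_cond_init(&con.done, NULL);
    res = mini_sched_run(&con, 7, self_deleting_cb);
    pthread_cond_destroy(&con.done);
    pthread_mutex_destroy(&con.lock);
    return res;
}
```

<p>Calling <code>demo_self_delete()</code> returns instead of hanging: the delete request is refused with -1, which is the signal the adjusted AST_SCHED_DEL_UNREF macro relies on to skip the unref call.</p>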
<div style="display:none"> Gerrit-Project: asterisk </div>
<div style="display:none"> Gerrit-Branch: 13 </div>
<div style="display:none"> Gerrit-Change-Id: Ic26777fa0732725e6ca7010df17af77a012aa856 </div>
<div style="display:none"> Gerrit-Change-Number: 10991 </div>
<div style="display:none"> Gerrit-PatchSet: 4 </div>
<div style="display:none"> Gerrit-Owner: Walter Doekes <walter+asterisk@wjd.nu> </div>
<div style="display:none"> Gerrit-Reviewer: Friendly Automation </div>
<div style="display:none"> Gerrit-Reviewer: George Joseph <gjoseph@digium.com> </div>
<div style="display:none"> Gerrit-Reviewer: Joshua Colp <jcolp@digium.com> </div>
<div style="display:none"> Gerrit-Reviewer: Matthew Fredrickson <creslin@digium.com> </div>
<div style="display:none"> Gerrit-Reviewer: Richard Mudgett <rmudgett@digium.com> </div>
<div style="display:none"> Gerrit-Reviewer: Walter Doekes <walter+asterisk@wjd.nu> </div>
<div style="display:none"> Gerrit-MessageType: merged </div>
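<p>The adjusted AST_SCHED_DEL_UNREF logic (snapshot the id once per attempt, run the unref call only if the delete succeeded) can be tried in isolation with a small sketch. <code>fake_sched_del</code>, <code>DEMO_SCHED_DEL_UNREF</code> and <code>demo_unref_count</code> are hypothetical names invented for this illustration; the fake delete simply pretends that only even ids can be removed, and the retry loop omits the short sleep of the real macro.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for ast_sched_del(): 0 on success, -1 on failure.
 * For the demo, only even ids can be deleted. */
static int fake_sched_del(int id)
{
    return (id % 2 == 0) ? 0 : -1;
}

/* Same control flow as the patched macro, minus the scheduler argument. */
#define DEMO_SCHED_DEL_UNREF(id, refcall) \
    do { \
        int _count = 0, _id; \
        /* Read id once per attempt so the test and the delete agree. */ \
        while ((_id = (id)) > -1 && fake_sched_del(_id) && ++_count < 10) { \
            /* the real macro sleeps briefly between retries */ \
        } \
        if (_count == 10) { \
            fprintf(stderr, "Unable to cancel schedule ID %d\n", _id); \
        } else if (_id > -1) { \
            refcall; \
            id = -1; \
        } \
    } while (0)

/* Returns how many times the unref expression ran for a given start id. */
int demo_unref_count(int start_id)
{
    int id = start_id;
    int unrefs = 0;

    DEMO_SCHED_DEL_UNREF(id, unrefs++);
    /* The unref may only have run if the id was cleared. */
    assert(id == -1 || unrefs == 0);
    return unrefs;
}
```

<p>With these assumptions, a deletable id triggers exactly one unref and is reset to -1, while an id whose delete keeps failing logs the warning, keeps its value, and triggers no unref. That is the property that prevents the excess <code>sip_unref_peer(peer)</code> calls mentioned in the commit message.</p>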