<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="https://reviewboard.asterisk.org/r/2197/">https://reviewboard.asterisk.org/r/2197/</a>
</td>
</tr>
</table>
<br />
<div>
<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; border-collapse: collapse; margin: 2px; padding: 2px;">
<thead>
<tr>
<th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; font-size: 9pt; padding: 4px 8px; text-align: left;">
<a href="https://reviewboard.asterisk.org/r/2197/diff/2/?file=32171#file32171line3726" style="color: black; font-weight: bold; text-decoration: underline;">/trunk/main/asterisk.c</a>
<span style="font-weight: normal;">
(Diff revision 2)
</span>
</th>
</tr>
</thead>
<tbody style="background-color: #e4d9cb; padding: 4px 8px; text-align: center;">
<tr>
<td colspan="4"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">int main(int argc, char *argv[])</pre></td>
</tr>
</tbody>
<tbody>
<tr>
<th bgcolor="#ebb1ba" style="border-right: 1px solid #C0C0C0;" align="right"><font size="2">3723</font></th>
<td bgcolor="#ffc5ce" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">        <span class="k">if</span> <span class="p">(</span><span class="n">ast_opt_console</span> <span class="o">&&</span> <span class="o">!</span><span class="n">option_verbose</span><span class="p">)</span> <span class="p">{</span></pre></td>
<th bgcolor="#ebb1ba" style="border-left: 1px solid #C0C0C0; border-right: 1px solid #C0C0C0;" align="right"><font size="2"></font></th>
<td bgcolor="#ffc5ce" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>
</tr>
</tbody>
</table>
<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Did you mean to get rid of this, too?</pre>
</div>
<br />
<div>
<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; border-collapse: collapse; margin: 2px; padding: 2px;">
<thead>
<tr>
<th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; font-size: 9pt; padding: 4px 8px; text-align: left;">
<a href="https://reviewboard.asterisk.org/r/2197/diff/2/?file=32171#file32171line4045" style="color: black; font-weight: bold; text-decoration: underline;">/trunk/main/asterisk.c</a>
<span style="font-weight: normal;">
(Diff revision 2)
</span>
</th>
</tr>
</thead>
<tbody style="background-color: #e4d9cb; padding: 4px 8px; text-align: center;">
<tr>
<td colspan="4"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">int main(int argc, char *argv[])</pre></td>
</tr>
</tbody>
<tbody>
<tr>
<th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" align="right"><font size="2"></font></th>
<td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>
<th bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid #C0C0C0;" align="right"><font size="2">3986</font></th>
<td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">                        <span class="n">multi_thread_safe</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span></pre></td>
</tr>
</tbody>
</table>
<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Wasn't this already set above?</pre>
</div>
<br />
<p>- David</p>
<br />
<p>On November 16th, 2012, 3:44 p.m., Matt Jordan wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('https://reviewboard.asterisk.org/media/rb/images/review_request_box_top_bg.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">
<tr>
<td>
<div>Review request for Asterisk Developers.</div>
<div>By Matt Jordan.</div>
<p style="color: grey;"><i>Updated Nov. 16, 2012, 3:44 p.m.</i></p>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This is a very convoluted issue - for a full explanation, see ASTERISK-19463.
For a long time now, the Asterisk Test Suite has run into a problem where Asterisk appears to hit a deadlock during startup. This only occurs if DEBUG_THREADS is enabled; only occurs very rarely (on the 64-bit test system about 1 out of every 500 or so tests; on a 32-bit test system about 3-5 out of every 150 or so tests); occurs in the lock tracking object's recursive mutex (see the backtrace on ASTERISK-19463 for an example of what it looks like); and *only* occurs during startup. Once the Asterisk system is up and running, this error hasn't been seen.
Initially, we suspected that there was some imbalance in the lock/unlock of said mutex; however, inspection and enough testing showed that not to be the case. When we actually put in debug code to check the return value of the recursive mutex operations, we found that an unlock operation (a call to ast_reentrancy_unlock) was failing with an error code of EPERM, implying that the thread that was performing the unlock operation did not own the mutex. Note that the code where this occurs is very simple:
                ast_reentrancy_lock(lt);
                if (lt->reentrancy < AST_MAX_REENTRANCY) {
                        lt->file[lt->reentrancy] = filename;
                        lt->lineno[lt->reentrancy] = line;
                        lt->func[lt->reentrancy] = func;
                        lt->thread[lt->reentrancy] = pthread_self();
                        lt->reentrancy++;
                }
                ast_reentrancy_unlock(lt);
In a recursive pthread mutex, the thread that owns the mutex is tracked to determine if the lock/unlock call is recursive. There are very few ways that the thread identifier could change in a protected section of the code. One way in which it can happen is if the process is forked.
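As a rough illustration of the EPERM behaviour described above (a minimal sketch, not Asterisk code; the names unlock_without_owning and unlock_result_from_other_thread are hypothetical), a recursive mutex that is locked by one thread rejects an unlock attempted by another:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: runs in a second thread and tries to unlock a
 * mutex that this thread never locked. */
static void *unlock_without_owning(void *arg)
{
    pthread_mutex_t *mutex = arg;

    assert(mutex != NULL);
    /* Pass the int return code back through the thread's void * result. */
    return (void *)(intptr_t) pthread_mutex_unlock(mutex);
}

/* Hypothetical helper: returns the error code seen by the non-owning
 * thread, or -1 if the mutex or thread could not be set up. */
int unlock_result_from_other_thread(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    pthread_t other;
    void *result = NULL;
    int rc = -1;

    if (pthread_mutexattr_init(&attr) != 0) {
        return -1;
    }
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0
        || pthread_mutex_init(&mutex, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    /* The calling thread becomes the recorded owner of the mutex. */
    pthread_mutex_lock(&mutex);

    if (pthread_create(&other, NULL, unlock_without_owning, &mutex) == 0) {
        pthread_join(other, &result);
        rc = (int)(intptr_t) result;
    }

    /* Only the owner can release the lock successfully. */
    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

On a POSIX system, unlock_result_from_other_thread() should return EPERM, the same error code reported for ast_reentrancy_unlock above. This shows only the ownership check itself; it does not reproduce the fork-related path.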
When Asterisk starts up and is told to run in the background, it makes a call to daemon. In order to send the process into the background, the process is forked and the forked process is sent to the background, while the parent process exits. While we were unable to determine what code path was causing the thread identifier in the recursive mutex to change or be mangled, we were able to identify that it occurred in a call into the logger.
The patch attached does the following:
1) During startup, prior to calling daemon, all output methods use fprintf to send error messages to stderr. This prevents ever having to go into the Asterisk logger subsystem, thereby potentially accessing one of its mutexes.
2) The call to daemon has been moved up earlier in the startup sequence. Things in Asterisk that can be moved later have been moved further down in the startup sequence to minimize the risk that they call into the logger and access a recursive mutex before the fork.
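The first point can be sketched roughly as follows. This is an assumption-laden illustration rather than the code in the patch: early_report, mark_logger_ready and the logger_ready flag are hypothetical names, and the sketch only shows the idea of writing straight to a stdio stream until the process has daemonized and the logger is up.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical flag: set once the process has daemonized and the real
 * logger has been initialized. */
static int logger_ready = 0;

/* Hypothetical helper: formats a startup message directly to the given
 * stream (stderr in practice) without touching any logger state.
 * Returns the number of characters written, or -1 once the real logger
 * should be used instead. */
int early_report(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int written;

    assert(out != NULL);
    if (logger_ready) {
        return -1;
    }

    va_start(ap, fmt);
    written = vfprintf(out, fmt, ap);
    va_end(ap);
    return written;
}

/* Hypothetical hook: called after the fork, when it is safe to start
 * using the logger and its mutexes. */
void mark_logger_ready(void)
{
    logger_ready = 1;
}
```

A caller would use early_report(stderr, "Unable to open %s\n", path) before the fork and switch to the normal logging calls after mark_logger_ready().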
Additionally, this patch includes one more fix. When DEBUG_THREADS is enabled, the threadstorage used to store lock information is initialized. This must occur before the thread is registered, as the thread registration itself will attempt to use the threadstorage to store the lock information. This was not causing the problem, but initializing a mutex after it's been used is bad.</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">A lot.
A bamboo test agent (32-bit) was used to test the failures, as it would routinely reproduce the behavior (approximately 3 failures out of 150 or so tests) in every run. After this patch was applied, the tests were run an additional 13 times with no failures (more are being run currently).
Evidence for the patch resolving the failures can be seen here:
http://bamboo.asterisk.org/browse/ASTTEAM-MJORDAN-94
Note that this issue has caused numerous false test failures in the continuous integration tests, and resolving it will go a long way towards making that process more stable.</pre>
</td>
</tr>
</table>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>
<a href="https://issues.asterisk.org/jira/browse/ASTERISK-19463">ASTERISK-19463</a>
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>/trunk/main/asterisk.c <span style="color: grey">(376374)</span></li>
<li>/trunk/main/utils.c <span style="color: grey">(376374)</span></li>
</ul>
<p><a href="https://reviewboard.asterisk.org/r/2197/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>