<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="https://reviewboard.asterisk.org/r/3300/">https://reviewboard.asterisk.org/r/3300/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<p style="margin-top: 0;">On March 6th, 2014, 9:20 p.m. UTC, <b>Mark Michelson</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I'm not a fan of this change, because I think it's not really fixing the root problem. bridge_p2p_rtp_write() is called in only one place in res_rtp_asterisk.c, and it's called like this:
/* If we are directly bridged to another instance send the audio directly out */
if (ast_rtp_instance_get_bridged(instance) && !bridge_p2p_rtp_write(instance, rtpheader, res, hdrlen)) {
return &ast_null_frame;
}
In other words, before entering bridge_p2p_rtp_write(), ast_rtp_instance_get_bridged() was returning non-NULL. So something is NULLing out the pointer between its initial check and the call to bridge_p2p_rtp_write(). This means that adding a second check for the same thing is likely not to actually catch the problem and instead just make the crash happen less often.
Looking at the linked issue, the crash is occurring outside of bridging code, when a channel is read from in app_dial(). My guess is that this channel had been RTP bridged but was yanked out of the bridge via a masquerade and thrown into the dial application. The dial application attempted to read from the channel before the previous local bridge loop had finished with the channel, thus resulting in crashiness.
The crash on the linked issue seems to verify this. The crashing thread has channel 0x7f5e943a3cc0 being read from while simultaneously in thread 7380, the same channel is part of a masquerade that occured as a result of a SIP attended transfer.</pre>
</blockquote>
<p>On March 10th, 2014, 5:34 p.m. UTC, <b>Leif Madsen</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I have to agree, and have to thank you for your analysis of the situation. For now this patch may help us stabilize some of our platform, but I definitely would prefer to figure out what the underlying problem is.
We just had another crash today with the same type of issue. What else can I do to help debug this further?</pre>
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><leifmadsen> putnopvut: on r3300 you commented after reviewing the backtrace. We had another crash in the same part of the code today, and thought I'd check to see if any more information could be had through gdb to help?
<putnopvut> leifmadsen: yeah,I saw your reviewboard comment. I think at this stage, the information-gathering stage is pretty much done, so I can't think of anything else that you could give us that would help out. The problem is that there's a race condition that needs to be patched.
<putnopvut> From the provided backtrace, I can tell what the problem is, but fixing it is not going to be easy.
<putnopvut> That's also why I didn't do my usual thing and provide a suggestion on the review of a different approach.
<leifmadsen> indeed
<putnopvut> leifmadsen: as a temporary workaround, you can apply russellb's patch. It's not going to hurt anything, and it may lessen the chances of the crash occurring, but the possibility will still be there.</pre>
<br />
<p>- Mark</p>
<br />
<p>On March 5th, 2014, 6:49 p.m. UTC, Russell Bryant wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('https://reviewboard.asterisk.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">
<tr>
<td>
<div>Review request for Asterisk Developers and leifmadsen.</div>
<div>By Russell Bryant.</div>
<p style="color: grey;"><i>Updated March 5, 2014, 6:49 p.m.</i></p>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>
<a href="https://issues.asterisk.org/jira/browse/ASTERISK-23419">ASTERISK-23419</a>
</div>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt;">Repository: </b>
Asterisk
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This bug shows a crash in bridge_p2p_rtp_write(). There is no bridged rtp instance so it goes boom. The patch just catches the case and returns. I'm not really sure *why* there's no bridged instance, but this seems like a pretty safe function input sanity check.</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>/tags/1.8.18.1/res/res_rtp_asterisk.c <span style="color: grey">(405155)</span></li>
</ul>
<p><a href="https://reviewboard.asterisk.org/r/3300/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>