<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="https://reviewboard.asterisk.org/r/2567/">https://reviewboard.asterisk.org/r/2567/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<p style="margin-top: 0;">On May 30th, 2013, 10:46 p.m. UTC, <b>Mark Michelson</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This is a nifty idea, but I'm a bit skeptical about how well it's going to work in the case where, say, one or more of the bt addresses is in a loadable module that gets unloaded between the time the allocation occurs and when the memory error is detected. Since the ast_bt just stores addresses and those get interpreted to strings at the time the backtrace is printed, I don't know how smoothly the conversion to strings will go if the module has been unloaded. The same goes for if a module is unloaded and then loaded again.
What may work better is to store the strings in the ast_bt object rather than addresses.</pre>
</blockquote>
<p>On June 1st, 2013, 8:30 p.m. UTC, <b>Matt Jordan</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Rather than modifying the ast_bt structure - which is used elsewhere - I'd rather change the memory region in astmm to store strings. That should solve the possible issue without causing ripple effects everywhere else.</pre>
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">So this failed miserably.
When BETTER_BACKTRACES is enabled, we do quite a lot of work to get the symbols, filenames, functions, and line numbers. This includes opening the asterisk binary and doing quite a bit of processing on it. Suffice it to say that doing this on each allocation slowed Asterisk more than running it under valgrind. I suspect the impact is still quite heavy even without BETTER_BACKTRACES; however, since this patch is most useful when you have the compilation flag enabled, storing strings at allocation time is kind of a non-starter.
I took a look at both the bfd library calls as well as backtrace_symbols.
In the case of the bfd library calls, we explicitly parse over the sections in the asterisk binary, determining if they're within the range of the addresses in the ast_bt object. Since we base this off of the sections in the asterisk binary, we simply won't do anything with an address that isn't in a valid range. So, if we have an address out of range, we should be okay.
If an address is in a valid range, we use bfd_find_nearest_line to get the line. While there isn't really any documentation for this, the name alone suggests that it makes a best-effort match, which means the worst case here is incorrect information.
backtrace_symbols takes in the addresses generated from backtrace. Looking at execinfo.c where backtrace_symbols is defined, it (most likely) will use dladdr to resolve the address provided to a symbol. dladdr states:
"If no symbol matching addr could be found, then dli_sname and dli_saddr are set to NULL."
This return is handled in backtrace_symbols.
So, I think we're effectively okay even if we have:
* a dynamic module that returns memory
* that memory allocation creates a backtrace
* the module is unloaded
* a corruption occurs and we calculate the backtrace
The backtrace may be wrong or incomplete, but it shouldn't crash.
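A rough sketch of the split described above: record raw addresses on every allocation, and only resolve them to strings when a corruption is actually reported. This assumes glibc's backtrace() and backtrace_symbols() from execinfo.h; struct alloc_trace, MAX_FRAMES, and both function names are hypothetical and are not the ast_bt API or the code in this patch.

```c
#include "assert.h"
#include "execinfo.h"
#include "stdio.h"
#include "stdlib.h"

#define MAX_FRAMES 32  /* hypothetical cap on stored frames */

/* Hypothetical per-allocation record: raw return addresses only. */
struct alloc_trace {
    void *addresses[MAX_FRAMES];
    int num_frames;
};

/* Cheap step, run on every allocation: store addresses, resolve nothing. */
static void alloc_trace_capture(struct alloc_trace *trace)
{
    assert(trace != NULL);
    trace->num_frames = backtrace(trace->addresses, MAX_FRAMES);
}

/*
 * Expensive step, run only when a corruption is reported.
 * An address that no longer maps to a loaded object is still printed,
 * just without a useful symbol name. Returns the number of frames printed.
 */
static int alloc_trace_print(const struct alloc_trace *trace, FILE *out)
{
    char **symbols;
    int i;

    assert(trace != NULL);
    symbols = backtrace_symbols(trace->addresses, trace->num_frames);
    if (symbols == NULL) {
        /* backtrace_symbols() could not allocate its result array. */
        fprintf(out, "backtrace unavailable\n");
        return 0;
    }
    for (i = 0; i != trace->num_frames; i++) {
        fprintf(out, "#%d: %s\n", i, symbols[i]);
    }
    free(symbols);  /* one free() releases the whole array */
    return trace->num_frames;
}
```

In this sketch an allocator would call alloc_trace_capture() when it hands out a region and alloc_trace_print() from the corruption check, so the symbol lookup cost is only paid on the error path.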
</pre>
<br />
<p>- Matt</p>
<br />
<p>On May 27th, 2013, 6:33 p.m. UTC, Matt Jordan wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('https://reviewboard.asterisk.org/static/rb/images/review_request_box_top_bg.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">
<tr>
<td>
<div>Review request for Asterisk Developers and rmudgett.</div>
<div>By Matt Jordan.</div>
<p style="color: grey;"><i>Updated May 27, 2013, 6:33 p.m.</i></p>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt;">Repository: </b>
Asterisk
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Richard's patch that added a 'mini-valgrind' into Asterisk is incredibly useful. When you have a memory corruption, it will tell you the particular location in Asterisk that allocated the memory. Unfortunately, this often simply tells you the needle to look for in a stack of needles. For example, a memory corruption caused by improper JSON reference counting may just tell you this:
WARNING: Memory corrupted after free of 0x27eb8e0 allocated at json.c json_malloc() line 52
Since there's a whole mess of json_malloc calls, this is only so useful.
Luckily, we have backtrace generation in Asterisk - which is used primarily by DEBUG_THREADS and locations where Asterisk with DO_CRASH enabled will abort. This patch refactors the backtrace generation code into its own translation unit so that astmm.c can get at it safely, and adds an ast_bt object to the region memory structure. When a memory region is allocated or used, a backtrace is generated so that if the memory becomes corrupted, we know who originally allocated it.
That turns the previous line into this:
WARNING: Memory corrupted after free of 0x27eb8e0 allocated at json.c json_malloc() line 52
Memory allocation backtrace:
#0: [0x4593e5] main/astmm.c:498 __ast_malloc() (0x4593a9+3C)
#1: [0x532f3a] main/json.c:53 json_malloc()
#2: [0x7f9e93c3a8ca] src/value.c:40 json_object() (0x7f9e93c3a8b0+1A)
#3: [0x7f9e93c396dd] src/pack_unpack.c:91 pack_object()
#4: [0x7f9e93c39be0] src/pack_unpack.c:550 json_vpack_ex() (0x7f9e93c39b50+90)
#5: [0x533ce6] main/json.c:496 ast_json_vpack() (0x533caa+3C)
#6: [0x533c9a] main/json.c:488 ast_json_pack() (0x533bfb+9F)
#7: [0x44e580] main/asterisk.c:1168 publish_fully_booted()
#8: [0x4583bd] main/asterisk.c:4444 main()
</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Fixed two memory corruptions thanks to this patch. Yay MALLOC_DEBUG.</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>/trunk/include/asterisk/backtrace.h <span style="color: grey">(PRE-CREATION)</span></li>
<li>/trunk/include/asterisk/lock.h <span style="color: grey">(389768)</span></li>
<li>/trunk/include/asterisk/logger.h <span style="color: grey">(389768)</span></li>
<li>/trunk/main/astmm.c <span style="color: grey">(389768)</span></li>
<li>/trunk/main/astobj2.c <span style="color: grey">(389768)</span></li>
<li>/trunk/main/backtrace.c <span style="color: grey">(PRE-CREATION)</span></li>
<li>/trunk/main/logger.c <span style="color: grey">(389768)</span></li>
<li>/trunk/utils/extconf.c <span style="color: grey">(389768)</span></li>
</ul>
<p><a href="https://reviewboard.asterisk.org/r/2567/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>