<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="https://reviewboard.asterisk.org/r/2930/">https://reviewboard.asterisk.org/r/2930/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<p style="margin-top: 0;">On October 23rd, 2013, 4:39 a.m. CDT, <b>Guenther Kelleter</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I don't know whats wrong here, but I can crash asterisk with this patch on top of asterisk 11.2-cert2/Rev:397839 after a while in strcmp() with the dialplan from AST-1235 and executing "core restart gracefully" every 5 seconds:
(gdb) bt
#0 0x40088c10 in strcmp () from /lib/libc.so.6
#1 0x08144206 in ast_context_remove_extension_callerid2 (con=0x41418ed0, extension=0xba9ff014 "60.", priority=1,
callerid=0xba9fefc4 "", matchcallerid=0, registrar=0x0, already_locked=1) at pbx.c:6976
#2 0x08152ab1 in __ast_context_destroy (list=0x88a3f78, contexttab=0x41414d00, con=0x0, registrar=0x413115a0 "pbx_config")
at pbx.c:10607
#3 0x08152d21 in ast_context_destroy (con=0x0, registrar=0x413115a0 "pbx_config") at pbx.c:10673
#4 0x4130d88f in unload_module () at pbx_config.c:1407
#5 0x08119186 in ast_module_shutdown () at loader.c:519
#6 0x0807f726 in really_quit (num=0, niceness=-1163923436, restart=1) at asterisk.c:1774
#7 0x0807fc05 in handle_restart_gracefully (e=0xba9ff014, cmd=0, a=0x0) at asterisk.c:1692
#8 0x080cb91c in ast_cli_command_full (uid=0, gid=0, fd=50, s=0xba9ff3c4 "core restart gracefully") at cli.c:2560
#9 0x080cbac9 in ast_cli_command_multiple_full (uid=0, gid=0, fd=50, size=24, s=0xba9ff863 "core restart gracefully")
at cli.c:2583
#10 0x0807d6aa in netconsole (vconsole=0x82276a0) at asterisk.c:1336
#11 0x0819580b in dummy_start (data=0xba9ff014) at utils.c:1028
#12 0x40515d92 in pthread_detach () from /lib/libpthread.so.0
#13 0x400e6e4a in clone () from /lib/libc.so.6
(gdb) info registers
eax 0x0 0
ecx 0x0 0 <------
edx 0xba9ff014 -1163923436
ebx 0x414197a8 1094817704
esp 0xba9feae4 0xba9feae4
ebp 0xba9feaec 0xba9feaec
esi 0x0 0
edi 0x41419030 1094815792
eip 0x40088c10 0x40088c10
eflags 0x10202 66050
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x0 0
(gdb) disas
Dump of assembler code for function strcmp:
0x40088c00 <strcmp+0>: push %ebp
0x40088c01 <strcmp+1>: mov %esp,%ebp
0x40088c03 <strcmp+3>: push %eax
0x40088c04 <strcmp+4>: push %eax
0x40088c05 <strcmp+5>: mov 0x8(%ebp),%ecx
0x40088c08 <strcmp+8>: mov 0xc(%ebp),%edx
0x40088c0b <strcmp+11>: nop
0x40088c0c <strcmp+12>: lea 0x0(%esi),%esi
0x40088c10 <strcmp+16>: mov (%ecx),%al <------
0x40088c12 <strcmp+18>: inc %ecx
(gdb) bt full 4
#0 0x40088c10 in strcmp () from /lib/libc.so.6
No symbol table info available.
#1 0x08144206 in ast_context_remove_extension_callerid2 (con=0x41418ed0, extension=0xba9ff014 "60.", priority=1,
callerid=0xba9fefc4 "", matchcallerid=0, registrar=0x0, already_locked=1) at pbx.c:6976
x = (struct match_char *) 0x0
exten = (struct ast_exten *) 0x0
prev_exten = (struct ast_exten *) 0x0
peer = (struct ast_exten *) 0x414197a8
ex = {exten = 0xba9feb14 "60.", matchcid = 0, cidmatch = 0xba9fefc4 "", priority = 1, label = 0x413115a0 "pbx_config",
parent = 0xba9ff014, app = 0xba9fef5c "|?\237?\024?\237??\216AA", cached_app = 0x811c32b, data = 0x81db87a, datad = 0x296a,
peer = 0x81de187, peer_table = 0x3, peer_label_table = 0x0,
registrar = 0x81e0e48 "Remove %s/%s/%d, registrar=%s; con=%s(%p); con->root=%p\n", next = 0xba9fef78,
stuff = 0xba9fef50 "\001"}
exten2 = (struct ast_exten *) 0x0
exten3 = (struct ast_exten *) 0x0
dummy_name = "60.\000$\016R@@?Q@X\016R@@\016R@`?Q@H\016R@\000\000\000\000\002\000\000\000\000\000\000\000nD\b@\025\000\000\000.?Q@??Q@?\002\216@\020\000p@\204?\237???Q@\020\000p@\020\000p@\204?\237??sQ@ \000p@\020\000p@d\202\023@?rQ@d\202\023@?\002\216@\234?\237?\227g\b@\020\000p@\017\000\000\000\204?\237?\000\001\000\000??\237?\006P\030\b?\002\216@\236?Q@?\002\216@\204?\237?\236?Q@\016?Q@??\237?\016?Q@H\230\037\bX\213Q@p\030\000\000??Q@\210w\037\b*\227AA"...
previous_peer = (struct ast_exten *) 0x41418f50
next_peer = (struct ast_exten *) 0x0
found = 1
__PRETTY_FUNCTION__ = "ast_context_remove_extension_callerid2"
#2 0x08152ab1 in __ast_context_destroy (list=0x88a3f78, contexttab=0x41414d00, con=0x0, registrar=0x413115a0 "pbx_config")
at pbx.c:10607
extension = "60.\000??Q@\020\000p@\020\000p@D?\237??sQ@??\237???Q@??\237?\000\212\037\b\\?\237??oQ@\\?\237?\227g\b@\020\000p@d\202\023@\b?E\b??D\bl?\237?g \017@"
cidmatch = "\000\016R@`?Q@H\016R@??\023@\236?Q@$\016R@@?Q@X\016R@@\016R@`?Q@H\016R@\020\000p@$?\237?\000\000\000\000nD\b@\035\000\000\000.?Q@\236?Q@?LAA\020\000p@"
end_traversal = 1
tmp = (struct ast_context *) 0x41418ed0
---Type <return> to continue, or q <return> to quit---
tmpl = (struct ast_context *) 0x88a3f78
exten_item = (struct ast_exten *) 0x0
prio_item = (struct ast_exten *) 0x41419a38
__PRETTY_FUNCTION__ = "__ast_context_destroy"
#3 0x08152d21 in ast_context_destroy (con=0x0, registrar=0x413115a0 "pbx_config") at pbx.c:10673
No locals.
(More stack frames follow...)
-----------------------------------------------
And this crash also happens very often in this test:
(gdb) bt
#0 0x0814879d in ast_unregister_application (app=0x4067fc7f "Monitor") at pbx.c:8403
#1 0x4067f8d0 in unload_module () at res_monitor.c:939
#2 0x08119186 in ast_module_shutdown () at loader.c:519
#3 0x0807f726 in really_quit (num=0, niceness=1099017792, restart=1) at asterisk.c:1774
#4 0x0807fc05 in handle_restart_gracefully (e=0x4181ae40, cmd=1099007104, a=0x41818480) at asterisk.c:1692
#5 0x080cb91c in ast_cli_command_full (uid=0, gid=0, fd=50, s=0xba3ff3c4 "core restart gracefully") at cli.c:2560
#6 0x080cbac9 in ast_cli_command_multiple_full (uid=0, gid=0, fd=50, size=24, s=0xba3ff863 "core restart gracefully")
at cli.c:2583
#7 0x0807d6aa in netconsole (vconsole=0x82276a0) at asterisk.c:1336
#8 0x0819580b in dummy_start (data=0x4181ae40) at utils.c:1028
#9 0x40515d92 in pthread_detach () from /lib/libpthread.so.0
#10 0x400e6e4a in clone () from /lib/libc.so.6
this is here (comments by me):
static void unreference_cached_app(struct ast_app *app)
{
struct ast_context *context = NULL;
struct ast_exten *eroot = NULL, *e = NULL;
ast_rdlock_contexts(); // take a read-lock...
while ((context = ast_walk_contexts(context))) {
while ((eroot = ast_walk_context_extensions(context, eroot))) {
while ((e = ast_walk_extension_priorities(eroot, e))) {
//line 8304----> if (e->cached_app == app)
e->cached_app = NULL; // ...and then write???
}
}
}
ast_unlock_contexts();
return;
}
There is write operations under a read-lock?
</pre>
</blockquote>
<p>On October 23rd, 2013, 10:29 a.m. CDT, <b>rmudgett</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">If you look at the ast_rdlock_contexts() you will see that the contexts lock is not a rd/wr lock but actually a mutex. It has not been a rd/wr lock for quite some time. So something else is going on.</pre>
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">There is a mistake in that patch. However, even after correcting that I'm seeing additional problems via valgrind, even though it's not crashing for me. I will submit another patch after I've resolved these issues.</pre>
<br />
<p>- Scott</p>
<br />
<p>On October 21st, 2013, 4:18 p.m. CDT, Scott Griepentrog wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('https://reviewboard.asterisk.org/static/rb/images/review_request_box_top_bg.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">
<tr>
<td>
<div>Review request for Asterisk Developers and Guenther Kelleter.</div>
<div>By Scott Griepentrog.</div>
<p style="color: grey;"><i>Updated Oct. 21, 2013, 4:18 p.m.</i></p>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>
<a href="https://issues.asterisk.org/jira/browse/AST-1235">AST-1235</a>
</div>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt;">Repository: </b>
Asterisk
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This patch fixes a bug report where restarting Asterisk repeatedly would cause a crash due to accidentally deleting extensions that were still referenced in the context's root_table. This occurred due to confusion over matching or not matching extensions based on callerid, which is also being cleared up via an enum'd third state.
There are two values in an extension entry: matchcid which says whether cid should be matched, and cidmatch which is the string value to match. The matchcid value was previously non-zero when cidmatch was signficant. This is important since a null length callerid match must be kept distinct from a match-anything condition.
This patch adds an enum to define the AST_EXT_MATCHCID_OFF (0) or AST_EXT_MATCHCID_ON (1) behaviors, and adds AST_EXT_MATCHCID_ANY (2) which allows matching to both exten->matchcid=0 and exten->matchcid=1 cases. For example, all exten's with the same exten (number) can be deleted with ANY, but when looking for a dialplan match it must match to matchcid, and if set to ON, further match cidmatch.
The logic to match extensions has been reworked to implement these three types, and then matchcid values updated to use the enum for clarity.
The most significant change is in ast_context_remove_extension_callerid2() where after removing extensions from the hashtable, the deletion would destroy the extensions. This was deleting exten's that didn't have matchcid set but matched a null cidmatch, where they hadn't been removed from the hashtable.
- (!callerid || !matchcallerid || (matchcallerid && !strcmp(peer->cidmatch, callerid))) &&
+ (!callerid || (!matchcallerid && !peer->matchcid) || (matchcallerid && peer->matchcid && !strcmp(peer->cidmatch, callerid))) &&
</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This patch has been tested with the testsuite additions dialplan_reload and pbx/callerid_match (posted separately) to insure that the crash case has been resolved without any other unwanted changes in behavior.
</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>/branches/1.8/include/asterisk/pbx.h <span style="color: grey">(400531)</span></li>
<li>/branches/1.8/main/pbx.c <span style="color: grey">(400531)</span></li>
</ul>
<p><a href="https://reviewboard.asterisk.org/r/2930/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>