<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi All,</p>
<p>So I've received a few backtraces from a customer (Still using
ast 11 ... we're in progress of getting them migrated). They all
follow this pattern (full stack below):</p>
<p class="MsoNormal">#0 ast_hashtab_lookup_internal
(tab=0x7f41bc277920, tab=0x7f41bc277920, h=<optimized out>,
obj=0x7f4203176a50) at hashtab.c:602</p>
<b>b = 0x1041</b><b><br>
</b>#1 ast_hashtab_lookup (tab=0x7f41bc277920,
obj=obj@entry=0x7f4203176a50) at hashtab.c:548#2 0x000000000053af60
in pbx_find_extension (chan=chan@entry=0x0, bypass=bypass@entry=0x0,
q=q@entry=0x7f4203176e10, context=context@entry=0x7f4207f12c70
<sip_cfg+80> "sip_registrations",
exten=exten@entry=0x7f4203176d10 "filtered",
priority=priority@entry=1, label=label@entry=0x0,
callerid=callerid@entry=0x7f4207cdde03 "",
action=action@entry=E_MATCH) at pbx.c:3438 <br>
<p class="MsoNormal">
#3 0x00007f4207c4d7c1 in register_peer_exten
(peer=peer@entry=0x7f41bd4a0ca8, onoff=0) at chan_sip.c:5023</p>
#4 0x00007f4207c65bfa in sip_destroy_peer (peer=0x7f41bd4a0ca8) at
chan_sip.c:5084
<p class="MsoNormal">Note the pointer value of b which comes from
tab->array. In this particular case this was the actual value
from tab->array[1] ... a pointer which points to the first page
of memory, and an obvious invalid pointer.<br>
</p>
<p class="MsoNormal">After some digging I realized that the
following hashtab creations does NOT set the do_locking argument
to 1 (master branch, but verified against 13 too, so I'm assuming
15, 16 and 17 as well):</p>
<p class="MsoNormal">./main/pbx.c:</p>
<p class="MsoNormal">6197 contexts_table =
ast_hashtab_create(17,<br>
6198 ast_hashtab_compare_contexts,<br>
6199 ast_hashtab_resize_java,<br>
6200 ast_hashtab_newsize_java,<br>
6201 ast_hashtab_hash_contexts,<br>
<b>6202 0);</b></p>
<p class="MsoNormal">7512 tmp->peer_table =
ast_hashtab_create(13,<br>
7513 hashtab_compare_exten_numbers,<br>
7514 ast_hashtab_resize_java,<br>
7515 ast_hashtab_newsize_java,<br>
7516 hashtab_hash_priority,<br>
<b>7517 0);</b></p>
<p class="MsoNormal">7518 tmp->peer_label_table =
ast_hashtab_create(7,<br>
7519 hashtab_compare_exten_labels,<br>
7520 ast_hashtab_resize_java,<br>
7521 ast_hashtab_newsize_java,<br>
7522 hashtab_hash_labels,<br>
<b>7523 0);</b><br>
</p>
<p class="MsoNormal">After scoping out hashtab.c, and in spite of
various comments in the header, I'm by no means convinced that
this code is in any way threadsafe without that argument being set
to 1. For pbx.c it may be just fine in a read-heavy environment,
but with regcontext as per above that assumption goes out the
door. And I'm surprised the frequency of this particular crash is
approximately once a week over an 8-node setup.</p>
<p class="MsoNormal">In master (same in 13 branch) there is four
more uses in pbx/pbx_*.c which also does the same thing, and I'm
assuming will suffer the same consequences. The EOL 11 branch has
a few extra uses, mostly setting locking to on.<br>
</p>
<p class="MsoNormal">In 13 branch there is actually a few extra
users in main/pbx.c, three specifically, none of them sets
locking.</p>
<p class="MsoNormal">Since the dialplan code is accessed from
multiple threads - can we please get an explanation of how
hashtab.c protects against concurrent access given the lack of
do_locking? From the header:</p>
<p class="MsoNormal"> 66 6. Recently updated the hash routines to
use Doubly-linked lists for buckets,<br>
67 and added a doubly-linked list that threads thru every
bucket in the table.<br>
68 The list of all buckets is on the HashTab struct. The
Traversal was modified<br>
69 to go thru this list instead of searching the bucket array
for buckets.<br>
70 <b>This also should make it safe to remove a bucket
during the traversal.</b><br>
71 Removal and destruction routines will work faster.<br>
</p>
<p class="MsoNormal">On line 70 specifically, I don't see any way
this can be true.<br>
</p>
<p class="MsoNormal">May I suggest that the argument for do_locking
is discarded, and do_locking is forced on:</p>
<p class="MsoNormal">diff --git a/main/hashtab.c b/main/hashtab.c<br>
index 048dc79274..9e154e476f 100644<br>
--- a/main/hashtab.c<br>
+++ b/main/hashtab.c<br>
@@ -260,10 +260,9 @@ ast_hashtab_create<br>
ht->resize = resize;<br>
ht->newsize = newsize;<br>
ht->hash = hash;<br>
- ht->do_locking = do_locking;<br>
+ ht->do_locking = 1;<br>
<br>
- if (do_locking)<br>
- ast_rwlock_init(&ht->lock);<br>
+ ast_rwlock_init(&ht->lock);<br>
<br>
if (!ht->resize)<br>
ht->resize = ast_hashtab_resize_java;<br>
</p>
<br>
<p>Happy to submit both a bug report and code review based on
feedback from the ML. Alternatively I'd push a patch that sets
do_locking to 1 for all invocations of ast_hashtab_create. Or one
could even remove the do_locking variable completely. Happy to
follow any of these approaches, the above is probably the least
correct of the three, but the smallest change.<br>
</p>
<p class="MsoNormal">(gdb) bt full</p>
<p class="MsoNormal">#0 ast_hashtab_lookup_internal
(tab=0x7f41bc277920, tab=0x7f41bc277920, h=<optimized out>,
obj=0x7f4203176a50) at hashtab.c:602</p>
b = 0x1041#1 ast_hashtab_lookup (tab=0x7f41bc277920,
obj=obj@entry=0x7f4203176a50) at hashtab.c:548<br>
h = <optimized out><br>
__PRETTY_FUNCTION__ = "ast_hashtab_lookup"
<p class="MsoNormal">#2 0x000000000053af60 in pbx_find_extension
(chan=chan@entry=0x0, bypass=bypass@entry=0x0,
q=q@entry=0x7f4203176e10, context=context@entry=0x7f4207f12c70
<sip_cfg+80> "sip_registrations",
exten=exten@entry=0x7f4203176d10 "filtered",
priority=priority@entry=1, label=label@entry=0x0,
callerid=callerid@entry=0x7f4207cdde03 "",
action=action@entry=E_MATCH) at pbx.c:3438
<br>
match = 1</p>
x = <optimized out><br>
res = <optimized out>
<p class="MsoNormal"> tmp = <optimized out></p>
e = 0x0<br>
eroot = <optimized out>
<p class="MsoNormal"> i = 0x0</p>
sw = 0x0<br>
pattern = {exten = 0x0, name = 0x0, matchcid = 0, cidmatch =
0x0, cidmatch_display = 0x0, priority = 1, label = 0x0, parent =
0x0, app = 0x0, cached_app = 0x0, data = 0x0, datad = 0x0, peer =
0x0, peer_table = 0x0,
<p class="MsoNormal"> peer_label_table = 0x0, registrar =
0x0, next = 0x0, stuff = 0x7f4203176ad8
"\377\377\377\377\377\377\377\377}\202s_B\177"}</p>
score = {total_specificity = 0, total_length = 0, last_char
= 0 '\000', canmatch = 0, node = 0x0, canmatch_exten = 0x0, exten =
0x0}<br>
tmpdata = 0x0
<p class="MsoNormal">#3 0x00007f4207c4d7c1 in register_peer_exten
(peer=peer@entry=0x7f41bd4a0ca8, onoff=0) at chan_sip.c:5023</p>
multi =
"filtered\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000"...<br>
stringp = 0x0
<p class="MsoNormal"> ext = 0x7f4203176d10 "filtered"</p>
context = 0x7f4207f12c70 <sip_cfg+80>
"sip_registrations"<br>
q = {incstack = {0x0 <repeats 128 times>}, stacklen =
0, status = 3, swo = 0x0, data = 0x0, foundcontext = 0x0}
<p class="MsoNormal">#4 0x00007f4207c65bfa in sip_destroy_peer
(peer=0x7f41bd4a0ca8) at chan_sip.c:5084</p>
No locals.<br>
#5 sip_destroy_peer_fn (peer=0x7f41bd4a0ca8) at chan_sip.c:5048<br>
No locals.
<p class="MsoNormal">#6 0x000000000044a5f5 in internal_ao2_ref
(func=0x659bf3 <__FUNCTION__.9120> "__ao2_ref", line=551,
file=0x659853 "astobj2.c", delta=delta@entry=-1,
user_data=user_data@entry=0x7f41bd4a0ca8) at astobj2.c:469</p>
obj_mutex = <optimized out><br>
obj_rwlock = <optimized out>
<p class="MsoNormal"> current_value = 0</p>
#7 __ao2_ref (user_data=user_data@entry=0x7f41bd4a0ca8,
delta=delta@entry=-1) at astobj2.c:551<br>
No locals.
<p class="MsoNormal">#8 0x00007f4207c5e49e in sip_unref_peer
(peer=peer@entry=0x7f41bd4a0ca8, tag=tag@entry=0x7f4207cef7b8
"register_verify: sip_unref_peer: tossing stack peer pointer at
end of func") at chan_sip.c:3140</p>
No locals.<br>
#9 0x00007f4207ccba69 in register_verify (p=p@entry=0x7f41bd4a93d8,
addr=addr@entry=0x7f42031781b0, req=req@entry=0x7f4203178240,
uri=uri@entry=0x7f41bc45f731 "sip:filtered") at chan_sip.c:17280
<p class="MsoNormal"> res = AUTH_SUCCESSFUL</p>
peer = <optimized out><br>
tmp =
"<sip:filtered\000\000B\177\000\000\000\000\000\000\000\000\000\000ؓJ\275A\177\000\000\020\201\027\003B\177\000\000@\212\346\274A\177\000\000\200\210\346\274A\177\000\000
\000\000\274A\177\000\000\300\001\000\000\000\000\000\000P\002\000\000\000\000\000\000\320\373#\274A\177\000\000\362\360v_B\177\000\000\260u\027\003B\177\000\000\360t\027\003B\177",
'\000' <repeats 18 times>,
"@v\027\003B\177\000\000Pv\027\003B\177\000\000\001\000\000\000\000\000\000\000\220\210\346\274A\177\000\000
\000\000\274A\177\000\000G\002\000\000\000\000\000\000\200\210\346\274A\177\000\000P"...
<p class="MsoNormal"> c = <optimized out></p>
name = 0x7f4203177475 "filtered"<br>
unused_password = 0x7f4207cdde03 "filtered"
<p class="MsoNormal"> domain = 0x7f420317747d "filtered"</p>
uri2 = <optimized out><br>
send_mwi = <optimized out><br>
__PRETTY_FUNCTION__ = "register_verify"
<p class="MsoNormal">#10 0x00007f4207cce07e in
handle_request_register (e=0x7f41bc45f731 "sip:filtered",
addr=0x7f42031781b0, req=0x7f4203178240, p=0x7f41bd4a93d8) at
chan_sip.c:28358</p>
res = <optimized out><br>
#11 handle_incoming (p=p@entry=0x7f41bd4a93d8,
req=req@entry=0x7f4203178240, addr=addr@entry=0x7f42031781b0,
recount=recount@entry=0x7f4203178140,
nounlock=nounlock@entry=0x7f4203178150) at chan_sip.c:28666
<p class="MsoNormal"> cmd = 0x7f41bc45f728 "REGISTER"</p>
cseq = <optimized out><br>
useragent = <optimized out>
<p class="MsoNormal"> via = <optimized out></p>
callid = <optimized out><br>
via_pos = 2
<p class="MsoNormal"> seqno = 139</p>
len = 3<br>
respid = 0
<p class="MsoNormal"> res = 0</p>
e = 0x7f41bc45f731 "sip:filtered"<br>
error = 0
<p class="MsoNormal"> oldmethod = 2</p>
acked = 0<br>
__PRETTY_FUNCTION__ = "handle_incoming"
<p class="MsoNormal">#12 0x00007f4207cd01c7 in handle_request_do
(req=req@entry=0x7f4203178240, addr=addr@entry=0x7f42031781b0) at
chan_sip.c:28834</p>
p = 0x7f41bd4a93d8<br>
owner_chan_ref = 0x0
<p class="MsoNormal"> recount = 0</p>
nounlock = 0<br>
__PRETTY_FUNCTION__ = "handle_request_do"
<p class="MsoNormal">#13 0x00007f4207cd1a6d in sipsock_read
(id=<optimized out>, fd=<optimized out>,
events=<optimized out>, ignore=<optimized out>) at
chan_sip.c:28765</p>
req = {rlpart1 = 0, rlpart2 = 9, headers = 12, method = 2,
lines = 0, sdp_start = 0, sdp_count = 0, debug = 0 '\000',
has_to_tag = 0 '\000', ignore = 0 '\000', authenticated = 0 '\000',
header = {0, 35, 123, 141, 181, 223, 242,<br>
317, 334, 347, 379, 399, 558, 0 <repeats 51
times>}, line = {558, 0 <repeats 255 times>}, data =
0x7f41bc45f710, content = 0x0, socket = {type = SIP_TRANSPORT_UDP,
fd = -1, port = 50195, tcptls_session = 0x0,
<p class="MsoNormal"> ws_session = 0x0}, next = {next =
0x0}, reqsipoptions = 0}</p>
addr = {ss = {ss_family = 2,<br>
__ss_padding =
"\023\330\n.JK\000\000\000\000\000\000\000\000\364\000\000\000\000\000\000\000\062\001\000\000\000\000\000\000E\001",
'\000' <repeats 14 times>,
"\214\001\000\000\000\000\000\000\260}\027\003B\177\000\000\377\377\377\377\000\000\000\000\020\002\000\001
\000\000\000@\204\027\003B\177\000\000\200\360\252_B\177", '\000'
<repeats 25 times>, __ss_align = 0}, len = 16}
<p class="MsoNormal"> res = <optimized out></p>
readbuf = "REGISTER sip:filtered SIP/2.0\r\nVia: SIP/2.0/UDP
filtered:5080;rport;branch=z9hG4bK-1576026621656-MyPhone-filtered-70\r\nContent-Length:
0\r\nContact: <sip:filtered@filtered:5080>\r\nCall-ID:
157602"...<br>
__PRETTY_FUNCTION__ = "sipsock_read"
<p class="MsoNormal">#14 0x00000000004fbbea in ast_io_wait
(ioc=0x2630220, howlong=<optimized out>) at io.c:292</p>
x = <optimized out><br>
origcnt = <optimized out>
<p class="MsoNormal">#15 0x00007f4207cade9f in do_monitor
(data=data@entry=0x0) at chan_sip.c:29364</p>
res = <optimized out><br>
t = 1576026621
<p class="MsoNormal"> reloading = <optimized out></p>
__PRETTY_FUNCTION__ = "do_monitor"<br>
#16 0x000000000058e23a in dummy_start (data=<optimized out>)
at utils.c:1223
<p class="MsoNormal"> __cancel_buf = {__cancel_jmp_buf =
{{__cancel_jmp_buf = {41022448, -2003115773786864544, 0, 512000,
0, 139921496446720, 1925155791345796192, -2003116255834028960},
__mask_was_saved = 0}}, __pad = {0x7f4203178e30, 0x0, 0x0, 0x0}}</p>
__cancel_arg = 0x7f4203179700<br>
__not_first_call = <optimized out>
<p class="MsoNormal"> ret = <optimized out></p>
a = {start_routine = 0x7f4207cadbb0 <do_monitor>, data
= 0x0, name = 0x271f3f0 "do_monitor", ' ' <repeats 11 times>,
"started at [29397] chan_sip.c restart_monitor()"}<br>
#17 0x00007f425e0c6e65 in start_thread () from
/lib64/libpthread.so.0
<p><br>
</p>
<div class="moz-signature">
<style type="text/css">
* { padding: 0px; margin: 0px; }
body, html { font-family: Arial, San-Serif; font-size: small; color: black; padding-left: 10px; padding-top: 3px; }
a { text-decoration: none; color: #818285; }
h1 { font-size: large; }
table { font-size: 12px; }
p + p { padding-top: 1em; }
</style>
<p>Kind Regards,<br>
Jaco</p>
</div>
</body>
</html>