[asterisk-dev] hashtab memory corruption and resulting segfaults due to lack of thread safety

Jaco Kroon jaco at uls.co.za
Thu Dec 12 12:50:46 CST 2019


Hi All,

So I've received a few backtraces from a customer (Still using ast 11
... we're in progress of getting them migrated).  They all follow this
pattern (full stack below):

#0  ast_hashtab_lookup_internal (tab=0x7f41bc277920, tab=0x7f41bc277920,
h=<optimized out>, obj=0x7f4203176a50) at hashtab.c:602

        *b = 0x1041**
*#1  ast_hashtab_lookup (tab=0x7f41bc277920,
obj=obj at entry=0x7f4203176a50) at hashtab.c:548#2  0x000000000053af60 in
pbx_find_extension (chan=chan at entry=0x0, bypass=bypass at entry=0x0,
q=q at entry=0x7f4203176e10, context=context at entry=0x7f4207f12c70
<sip_cfg+80> "sip_registrations", exten=exten at entry=0x7f4203176d10
"filtered",    priority=priority at entry=1, label=label at entry=0x0,
callerid=callerid at entry=0x7f4207cdde03 "", action=action at entry=E_MATCH)
at pbx.c:3438

#3  0x00007f4207c4d7c1 in register_peer_exten
(peer=peer at entry=0x7f41bd4a0ca8, onoff=0) at chan_sip.c:5023

#4  0x00007f4207c65bfa in sip_destroy_peer (peer=0x7f41bd4a0ca8) at
chan_sip.c:5084

Note the pointer value of b which comes from tab->array.  In this
particular case this was the actual value from tab->array[1] ... a
pointer which points to the first page of memory, and an obvious invalid
pointer.

After some digging I realized that the following hashtab creations does
NOT set the do_locking argument to 1 (master branch, but verified
against 13 too, so I'm assuming 15, 16 and 17 as well):

./main/pbx.c:

6197             contexts_table = ast_hashtab_create(17,
6198                 ast_hashtab_compare_contexts,
6199                 ast_hashtab_resize_java,
6200                 ast_hashtab_newsize_java,
6201                 ast_hashtab_hash_contexts,
*6202                 0);*

7512         tmp->peer_table = ast_hashtab_create(13,
7513                         hashtab_compare_exten_numbers,
7514                         ast_hashtab_resize_java,
7515                         ast_hashtab_newsize_java,
7516                         hashtab_hash_priority,
*7517                         0);*

7518         tmp->peer_label_table = ast_hashtab_create(7,
7519                             hashtab_compare_exten_labels,
7520                             ast_hashtab_resize_java,
7521                             ast_hashtab_newsize_java,
7522                             hashtab_hash_labels,
*7523                             0);*

After scoping out hashtab.c, and in spite of various comments in the
header, I'm by no means convinced that this code is in any way
threadsafe without that argument being set to 1.  For pbx.c it may be
just fine in a read-heavy environment, but with regcontext as per above
that assumption goes out the door.  And I'm surprised the frequency of
this particular crash is approximately once a week over an 8-node setup.

In master (same in 13 branch) there is four more uses in pbx/pbx_*.c
which also does the same thing, and I'm assuming will suffer the same
consequences.  The EOL 11 branch has a few extra uses, mostly setting
locking to on.

In 13 branch there is actually a few extra users in main/pbx.c, three
specifically, none of them sets locking.

Since the dialplan code is accessed from multiple threads - can we
please get an explanation of how hashtab.c protects against concurrent
access given the lack of do_locking?  From the header:

 66  6. Recently updated the hash routines to use Doubly-linked lists
for buckets,
 67     and added a doubly-linked list that threads thru every bucket in
the table.
 68     The list of all buckets is on the HashTab struct. The Traversal
was modified
 69     to go thru this list instead of searching the bucket array for
buckets.
 70     *This also should make it safe to remove a bucket during the
traversal.*
 71     Removal and destruction routines will work faster.

On line 70 specifically, I don't see any way this can be true.

May I suggest that the argument for do_locking is discarded, and
do_locking is forced on:

diff --git a/main/hashtab.c b/main/hashtab.c
index 048dc79274..9e154e476f 100644
--- a/main/hashtab.c
+++ b/main/hashtab.c
@@ -260,10 +260,9 @@ ast_hashtab_create
        ht->resize = resize;
        ht->newsize = newsize;
        ht->hash = hash;
-       ht->do_locking = do_locking;
+       ht->do_locking = 1;
 
-       if (do_locking)
-               ast_rwlock_init(&ht->lock);
+       ast_rwlock_init(&ht->lock);
 
        if (!ht->resize)
                ht->resize = ast_hashtab_resize_java;


Happy to submit both a bug report and code review based on feedback from
the ML.  Alternatively I'd push a patch that sets do_locking to 1 for
all invocations of ast_hashtab_create.  Or one could even remove the
do_locking variable completely.  Happy to follow any of these
approaches, the above is probably the least correct of the three, but
the smallest change.

(gdb) bt full

#0  ast_hashtab_lookup_internal (tab=0x7f41bc277920, tab=0x7f41bc277920,
h=<optimized out>, obj=0x7f4203176a50) at hashtab.c:602

        b = 0x1041#1  ast_hashtab_lookup (tab=0x7f41bc277920,
obj=obj at entry=0x7f4203176a50) at hashtab.c:548
        h = <optimized out>
        __PRETTY_FUNCTION__ = "ast_hashtab_lookup"

#2  0x000000000053af60 in pbx_find_extension (chan=chan at entry=0x0,
bypass=bypass at entry=0x0, q=q at entry=0x7f4203176e10,
context=context at entry=0x7f4207f12c70 <sip_cfg+80> "sip_registrations",
exten=exten at entry=0x7f4203176d10 "filtered",   
priority=priority at entry=1, label=label at entry=0x0,
callerid=callerid at entry=0x7f4207cdde03 "", action=action at entry=E_MATCH)
at pbx.c:3438
        match = 1

        x = <optimized out>
        res = <optimized out>

        tmp = <optimized out>

        e = 0x0
        eroot = <optimized out>

        i = 0x0

        sw = 0x0
        pattern = {exten = 0x0, name = 0x0, matchcid = 0, cidmatch =
0x0, cidmatch_display = 0x0, priority = 1, label = 0x0, parent = 0x0,
app = 0x0, cached_app = 0x0, data = 0x0, datad = 0x0, peer = 0x0,
peer_table = 0x0,

          peer_label_table = 0x0, registrar = 0x0, next = 0x0, stuff =
0x7f4203176ad8 "\377\377\377\377\377\377\377\377}\202s_B\177"}

        score = {total_specificity = 0, total_length = 0, last_char = 0
'\000', canmatch = 0, node = 0x0, canmatch_exten = 0x0, exten = 0x0}
        tmpdata = 0x0

#3  0x00007f4207c4d7c1 in register_peer_exten
(peer=peer at entry=0x7f41bd4a0ca8, onoff=0) at chan_sip.c:5023

        multi =
"filtered\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000\002C\225\000\000\000\000\000"...
        stringp = 0x0

        ext = 0x7f4203176d10 "filtered"

        context = 0x7f4207f12c70 <sip_cfg+80> "sip_registrations"
        q = {incstack = {0x0 <repeats 128 times>}, stacklen = 0, status
= 3, swo = 0x0, data = 0x0, foundcontext = 0x0}

#4  0x00007f4207c65bfa in sip_destroy_peer (peer=0x7f41bd4a0ca8) at
chan_sip.c:5084

No locals.
#5  sip_destroy_peer_fn (peer=0x7f41bd4a0ca8) at chan_sip.c:5048
No locals.

#6  0x000000000044a5f5 in internal_ao2_ref (func=0x659bf3
<__FUNCTION__.9120> "__ao2_ref", line=551, file=0x659853 "astobj2.c",
delta=delta at entry=-1, user_data=user_data at entry=0x7f41bd4a0ca8) at
astobj2.c:469

        obj_mutex = <optimized out>
        obj_rwlock = <optimized out>

        current_value = 0

#7  __ao2_ref (user_data=user_data at entry=0x7f41bd4a0ca8,
delta=delta at entry=-1) at astobj2.c:551
No locals.

#8  0x00007f4207c5e49e in sip_unref_peer
(peer=peer at entry=0x7f41bd4a0ca8, tag=tag at entry=0x7f4207cef7b8
"register_verify: sip_unref_peer: tossing stack peer pointer at end of
func") at chan_sip.c:3140

No locals.
#9  0x00007f4207ccba69 in register_verify (p=p at entry=0x7f41bd4a93d8,
addr=addr at entry=0x7f42031781b0, req=req at entry=0x7f4203178240,
uri=uri at entry=0x7f41bc45f731 "sip:filtered") at chan_sip.c:17280

        res = AUTH_SUCCESSFUL

        peer = <optimized out>
        tmp =
"<sip:filtered\000\000B\177\000\000\000\000\000\000\000\000\000\000ؓJ\275A\177\000\000\020\201\027\003B\177\000\000@\212\346\274A\177\000\000\200\210\346\274A\177\000\000
\000\000\274A\177\000\000\300\001\000\000\000\000\000\000P\002\000\000\000\000\000\000\320\373#\274A\177\000\000\362\360v_B\177\000\000\260u\027\003B\177\000\000\360t\027\003B\177",
'\000' <repeats 18 times>,
"@v\027\003B\177\000\000Pv\027\003B\177\000\000\001\000\000\000\000\000\000\000\220\210\346\274A\177\000\000
\000\000\274A\177\000\000G\002\000\000\000\000\000\000\200\210\346\274A\177\000\000P"...


        c = <optimized out>

        name = 0x7f4203177475 "filtered"
        unused_password = 0x7f4207cdde03 "filtered"

        domain = 0x7f420317747d "filtered"

        uri2 = <optimized out>
        send_mwi = <optimized out>
        __PRETTY_FUNCTION__ = "register_verify"

#10 0x00007f4207cce07e in handle_request_register (e=0x7f41bc45f731
"sip:filtered", addr=0x7f42031781b0, req=0x7f4203178240,
p=0x7f41bd4a93d8) at chan_sip.c:28358

        res = <optimized out>
#11 handle_incoming (p=p at entry=0x7f41bd4a93d8,
req=req at entry=0x7f4203178240, addr=addr at entry=0x7f42031781b0,
recount=recount at entry=0x7f4203178140,
nounlock=nounlock at entry=0x7f4203178150) at chan_sip.c:28666

        cmd = 0x7f41bc45f728 "REGISTER"

        cseq = <optimized out>
        useragent = <optimized out>

        via = <optimized out>

        callid = <optimized out>
        via_pos = 2

        seqno = 139

        len = 3
        respid = 0

        res = 0

        e = 0x7f41bc45f731 "sip:filtered"
        error = 0

        oldmethod = 2

        acked = 0
        __PRETTY_FUNCTION__ = "handle_incoming"

#12 0x00007f4207cd01c7 in handle_request_do
(req=req at entry=0x7f4203178240, addr=addr at entry=0x7f42031781b0) at
chan_sip.c:28834

        p = 0x7f41bd4a93d8
        owner_chan_ref = 0x0

        recount = 0

        nounlock = 0
        __PRETTY_FUNCTION__ = "handle_request_do"

#13 0x00007f4207cd1a6d in sipsock_read (id=<optimized out>,
fd=<optimized out>, events=<optimized out>, ignore=<optimized out>) at
chan_sip.c:28765

        req = {rlpart1 = 0, rlpart2 = 9, headers = 12, method = 2, lines
= 0, sdp_start = 0, sdp_count = 0, debug = 0 '\000', has_to_tag = 0
'\000', ignore = 0 '\000', authenticated = 0 '\000', header = {0, 35,
123, 141, 181, 223, 242,
            317, 334, 347, 379, 399, 558, 0 <repeats 51 times>}, line =
{558, 0 <repeats 255 times>}, data = 0x7f41bc45f710, content = 0x0,
socket = {type = SIP_TRANSPORT_UDP, fd = -1, port = 50195,
tcptls_session = 0x0,

            ws_session = 0x0}, next = {next = 0x0}, reqsipoptions = 0}

        addr = {ss = {ss_family = 2,
            __ss_padding =
"\023\330\n.JK\000\000\000\000\000\000\000\000\364\000\000\000\000\000\000\000\062\001\000\000\000\000\000\000E\001",
'\000' <repeats 14 times>,
"\214\001\000\000\000\000\000\000\260}\027\003B\177\000\000\377\377\377\377\000\000\000\000\020\002\000\001
\000\000\000@\204\027\003B\177\000\000\200\360\252_B\177", '\000'
<repeats 25 times>, __ss_align = 0}, len = 16}

        res = <optimized out>

        readbuf = "REGISTER sip:filtered SIP/2.0\r\nVia: SIP/2.0/UDP
filtered:5080;rport;branch=z9hG4bK-1576026621656-MyPhone-filtered-70\r\nContent-Length:
0\r\nContact: <sip:filtered at filtered:5080>\r\nCall-ID: 157602"...
        __PRETTY_FUNCTION__ = "sipsock_read"

#14 0x00000000004fbbea in ast_io_wait (ioc=0x2630220, howlong=<optimized
out>) at io.c:292

        x = <optimized out>
        origcnt = <optimized out>

#15 0x00007f4207cade9f in do_monitor (data=data at entry=0x0) at
chan_sip.c:29364

        res = <optimized out>
        t = 1576026621

        reloading = <optimized out>

        __PRETTY_FUNCTION__ = "do_monitor"
#16 0x000000000058e23a in dummy_start (data=<optimized out>) at
utils.c:1223

        __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf =
{41022448, -2003115773786864544, 0, 512000, 0, 139921496446720,
1925155791345796192, -2003116255834028960}, __mask_was_saved = 0}},
__pad = {0x7f4203178e30, 0x0, 0x0, 0x0}}

        __cancel_arg = 0x7f4203179700
        __not_first_call = <optimized out>

        ret = <optimized out>

        a = {start_routine = 0x7f4207cadbb0 <do_monitor>, data = 0x0,
name = 0x271f3f0 "do_monitor", ' ' <repeats 11 times>, "started at
[29397] chan_sip.c restart_monitor()"}
#17 0x00007f425e0c6e65 in start_thread () from /lib64/libpthread.so.0


Kind Regards,
Jaco

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20191212/79661b61/attachment-0001.html>


More information about the asterisk-dev mailing list