[asterisk-dev] Deadlock in SIP processing in Asterisk 16.

Kevin Harwell kharwell at sangoma.com
Tue Nov 23 15:49:11 CST 2021


Response inline.

On Tue, Nov 23, 2021 at 2:06 PM Steve Sether <ssether at usinternet.com> wrote:

> We've had a couple instances of a deadlocks recently.  This happened while
> we were trying to move phones from one Asterisk server to another.  It's
> too much to describe here what goes in in this process, but from an
> Asterisk perspective it's a lot of reloading the dialplan regenerating
> pjsip.conf, reloading pjsip, and phones re-registering to the new Asterisk
> server through a proxy.
>
> The sip freeze has happened twice now when we move upwards of 80 phones at
> a time.
>
> The second time it happened we were able to get a core dump, and run it
> through the scripts built in to Asterisk.  The output is...  voluminous.  I
> tried running the core file through a scripting tool that for gdb that
> claims to find deadlocks (and worked before), but it crashes when it's
> expecting (I think?) a certain kind of locking that it's not seeing.
>
> Mainly, I need some kind of guidance on how to find out what's locking up
> Asterisk.  I can't seem to find anything that references the locks being
> held.  i.e. I don't really know what threads are locking the others, and
> other guides in using gdb don't seem to be much help.  I'm not a C
> programmer, so I don't have much experience in semaphores, low level locks,
> etc.
>
> Can anyone provide some kind of assistance, and look through the following
> for where to look?  Something is certainly locking up our Asterisk server,
> but at the moment I'm pretty stuck.  (I have the log ending in "-full", if
> needed but posting it here seems like more of a distraction than anything
> else).
>
> I've tried to edit this file down to more unique entries, and removed
> anything that's doesn't look like it's locked so it's at least possibly
> semi-readable. There's quite a few threads that are largely duplicates
>
>
> Any help is appreciated.  Thanks.
>
>
> Thread 113 (Thread 0x7faf4deac700 (LWP 17772)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007fb093276c1b in pbx_load_module () at pbx_config.c:2091
> #4  0x00007fb093276e1e in handle_cli_dialplan_reload (e=<value optimized
> out>, cmd=<value optimized out>, a=<value optimized out>) at
> pbx_config.c:1597
> #5  0x00000000004ba668 in ast_cli_command_full (uid=-1, gid=-1, fd=112,
> s=0x7fafe800d839 "dialplan reload") at cli.c:2995
> #6  0x000000000060213b in action_command (s=0x7faf4deabcd0,
> m=0x7faf4deab3e0) at manager.c:5318
> #7  0x0000000000606af0 in process_message (s=0x7faf4deabcd0,
> m=0x7faf4deab3e0) at manager.c:6689
> #8  0x0000000000609dcb in do_message (data=0x7fb034002930) at
> manager.c:6902
> #9  session_do (data=0x7fb034002930) at manager.c:7007
> #10 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
>
This thread is attempting to grab the 'reload_lock' lock, but is waiting on
...

I'm going to call the 'unknown thread id'. Note, this 'unknown thread id'
is missing some of the stack trace, but will still try to make an educated
guess.

> --
> #7  0x00000000005137b0 in execute_state_callback (cb=0x5fa500
> <manager_state_cb>, context=0x7faf703f3f88 "blf-danielBarnes126266",
> exten=0x7faf703f4118 "danielBarnes126266-user1", data=0x0, reason=<value
> optimized out>, hint=0x7faf
> 703f4600, device_state_info=0x0) at pbx.c:3284
> #8  0x00000000005169d6 in ast_add_hint (e=0x7faf703f4080) at pbx.c:4057
> #9  0x0000000000521c53 in ast_add_extension2_lockopt (con=0x7faf703f3e90,
> replace=0, extension=<value optimized out>, priority=-1, label=<value
> optimized out>, callerid=<value optimized out>, application=<value
> optimized out>, data
> =0x7faf703f4060, datad=0x45bdd0 <ast_free_ptr>, registrar=0x7fb09327b9b8
> "pbx_config", registrar_file=0x7faf70091221
> "exec.1636994706829225.140397385758464", registrar_line=1218,
> lock_context=1) at pbx.c:7572
> #10 0x00000000005224da in ast_add_extension2 (con=<value optimized out>,
> replace=<value optimized out>, extension=<value optimized out>,
> priority=<value optimized out>, label=<value optimized out>,
> callerid=<value optimized out>, a
> pplication=0x7faf703f3fee
> "PJSIP/xxxxxxx&Custom:unavailable&Custom:call_state_xxxxxxx",
> data=0x7faf703f4060, datad=0x45bdd0 <ast_free_ptr>,
> registrar=0x7fb09327b9b8 "pbx_config", registrar_f
> ile=0x7faf70091221 "exec.1636994706829225.140397385758464",
> registrar_line=1218) at pbx.c:7304
> #11 0x00007fb09327641b in pbx_load_config (config_file=0x7fb09327b9e0
> "extensions.conf") at pbx_config.c:1890
> #12 0x00007fb093276c2e in pbx_load_module () at pbx_config.c:2101
> #13 0x00007fb093276e1e in handle_cli_dialplan_reload (e=<value optimized
> out>, cmd=<value optimized out>, a=<value optimized out>) at
> pbx_config.c:1597
> #14 0x00000000004ba668 in ast_cli_command_full (uid=-1, gid=-1, fd=106,
> s=0x7faf70000f39 "dialplan reload") at cli.c:2995
> #15 0x000000000060213b in action_command (s=0x7fb0d04e2cd0,
> m=0x7fb0d04e23e0) at manager.c:5318
> #16 0x0000000000606af0 in process_message (s=0x7fb0d04e2cd0,
> m=0x7fb0d04e23e0) at manager.c:6689
> #17 0x0000000000609dcb in do_message (data=0x7fb034005820) at
> manager.c:6902
> #18 session_do (data=0x7fb034005820) at manager.c:7007
> #19 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> --
>
This thread is holding the 'reload_lock', which is blocking Thread 113
above. It then grabs the hints container lock, in ast_add_hint. While
holding the hints lock, ast_add_hint then calls the manager_state_cb
function. Looking there you can see the manager_state_cb function attempts
to get ast_rdlock_contexts.

So, this 'unknown thread id' holds the hints container lock, and is blocked
getting the ast_rdlock_contexts, i.e. 'conlock'.

Now let's jump to Thread 61 below.

We have many of these threads running dialplan reload.  I suspect something
> else is locking them?
>
> Thread 110 (Thread 0x7fb0d0467700 (LWP 20543)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007fb093276c1b in pbx_load_module () at pbx_config.c:2091
> #4  0x00007fb093276e1e in handle_cli_dialplan_reload (e=<value optimized
> out>, cmd=<value optimized out>, a=<value optimized out>) at
> pbx_config.c:1597
> #5  0x00000000004ba668 in ast_cli_command_full (uid=-1, gid=-1, fd=118,
> s=0x7fafbc002009 "dialplan reload") at cli.c:2995
> #6  0x000000000060213b in action_command (s=0x7fb0d0466cd0,
> m=0x7fb0d04663e0) at manager.c:5318
> #7  0x0000000000606af0 in process_message (s=0x7fb0d0466cd0,
> m=0x7fb0d04663e0) at manager.c:6689
> #8  0x0000000000609dcb in do_message (data=0x7fb03400b860) at
> manager.c:6902
> #9  session_do (data=0x7fb03400b860) at manager.c:7007
> #10 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
>
>
> Thread 120 (Thread 0x7faf4d860700 (LWP 11287)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051cbb5 in ast_rdlock_contexts (c=0x7fb0bc238720, con=0x0,
> context=0x7fb0bc2390e0 "call_to_user_core", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7faff4e2ff60 "6121111111", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:8509
> #4  pbx_extension_helper (c=0x7fb0bc238720, con=0x0,
> context=0x7fb0bc2390e0 "call_to_user_core", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7faff4e2ff60 "6126166579", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:2886
> #5  0x0000000000527228 in ast_exists_extension (c=0x7fb0bc238720,
> args=0x0) at pbx.c:4181
> #6  __ast_pbx_run (c=0x7fb0bc238720, args=0x0) at pbx.c:4595
> #7  0x000000000052752b in pbx_thread (data=<value optimized out>) at
> pbx.c:4704
> #8  0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #9  0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
>
>
> There's also many of these threads locked on this line in
> taskprocessor.c:202
>
> Thread 119 (Thread 0x7fb0443b8700 (LWP 11099)):
> #0  0x00007fb0d292ca00 in sem_wait () from /lib64/libpthread.so.0
> #1  0x000000000058eab0 in ast_sem_wait (data=0x4ce9c60) at
> /usr/src/debug/asterisk-16.20.0/include/asterisk/sem.h:61
> #2  default_tps_processing_function (data=0x4ce9c60) at taskprocessor.c:202
> #3  0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #4  0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #5  0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
> Thread 118 (Thread 0x7fb0507ca700 (LWP 27278)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051cbb5 in ast_rdlock_contexts (c=0x0, con=0x0,
> context=0x7fb02a3f405e "from-sip-external", exten=0x7fb0016480e0
> "16121111111", priority=1, label=0x0, callerid=0x0, action=E_MATCH,
> found=0x0, combined_find_spawn=0) at pbx.c:8509
> #4  pbx_extension_helper (c=0x0, con=0x0, context=0x7fb02a3f405e
> "from-sip-external", exten=0x7fb0016480e0 "16121111111", priority=1,
> label=0x0, callerid=0x0, action=E_MATCH, found=0x0, combined_find_spawn=0)
> at pbx.c:2886
> #5  0x000000000051d504 in ast_exists_extension (c=<value optimized out>,
> context=<value optimized out>, exten=<value optimized out>, priority=<value
> optimized out>, callerid=<value optimized out>) at pbx.c:4181
> #6  0x00007fb05ea8766e in get_destination (session=0x7fb0016480e0,
> rdata=<value optimized out>) at res_pjsip_session.c:3703
> #7  0x00007fb05ea9a1ec in new_invite (rdata=0x7faffc086e78) at
> res_pjsip_session.c:3939
> #8  handle_new_invite_request (rdata=0x7faffc086e78) at
> res_pjsip_session.c:4114
> #9  session_on_rx_request (rdata=0x7faffc086e78) at
> res_pjsip_session.c:4189
> #10 0x00007fb0d471184e in pjsip_endpt_process_rx_data (endpt=<value
> optimized out>, rdata=0x7faffc086e78, p=0x7fb0905fc500,
> p_handled=0x7fb0507c9c7c) at ../src/pjsip/sip_endpoint.c:930
> #11 0x00007fb0903d1dcd in distribute (data=0x7faffc086e78) at
> res_pjsip/pjsip_distributor.c:955
> #12 0x000000000058e046 in ast_taskprocessor_execute (tps=0x4388e10) at
> taskprocessor.c:1237
> #13 0x00000000005963e0 in execute_tasks (data=0x4388e10) at
> threadpool.c:1354
> #14 0x000000000058e046 in ast_taskprocessor_execute (tps=0x1274b70) at
> taskprocessor.c:1237
> #15 0x00000000005980a1 in threadpool_execute (arg=0x7fb01c001af0) at
> threadpool.c:367
> #16 worker_active (arg=0x7fb01c001af0) at threadpool.c:1137
> #17 worker_start (arg=0x7fb01c001af0) at threadpool.c:1056
> #18 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #19 0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
>
> Thread 107 (Thread 0x7faf4dd38700 (LWP 22760)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051cbb5 in ast_rdlock_contexts (c=0x7fb008118cb0, con=0x0,
> context=0x7fb008119670 "call_to_user_core", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7fafd866b440 "612xxxxxxx", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:8509
> #4  pbx_extension_helper (c=0x7fb008118cb0, con=0x0,
> context=0x7fb008119670 "call_to_user_core", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7fafd866b440 "612xxxxxxx", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:2886
> #5  0x0000000000527228 in ast_exists_extension (c=0x7fb008118cb0,
> args=0x0) at pbx.c:4181
> #6  __ast_pbx_run (c=0x7fb008118cb0, args=0x0) at pbx.c:4595
> #7  0x000000000052752b in pbx_thread (data=<value optimized out>) at
> pbx.c:4704
> #8  0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #9  0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
> Thread 105 (Thread 0x7faf4e09c700 (LWP 663)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051cbb5 in ast_rdlock_contexts (c=0x7fb0bfeadc50, con=0x0,
> context=0x7fb0bfeae610 "proxy-dial", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7faff82d65d0 "612xxxxxxx", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:8509
> #4  pbx_extension_helper (c=0x7fb0bfeadc50, con=0x0,
> context=0x7fb0bfeae610 "proxy-dial", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7faff82d65d0 "612xxxxxxx", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:2886
> #5  0x0000000000527228 in ast_exists_extension (c=0x7fb0bfeadc50,
> args=0x0) at pbx.c:4181
> #6  __ast_pbx_run (c=0x7fb0bfeadc50, args=0x0) at pbx.c:4595
> #7  0x000000000052752b in pbx_thread (data=<value optimized out>) at
> pbx.c:4704
> #8  0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #9  0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
> Thread 103 (Thread 0x7faf4e020700 (LWP 25155)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051cbb5 in ast_rdlock_contexts (c=0x0, con=0x0,
> context=0x7faf0743c47e "from-internal-phone", exten=0x7fb01800c8c0
> "8744405", priority=1, label=0x0, callerid=0x0, action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:8509
> #4  pbx_extension_helper (c=0x0, con=0x0, context=0x7faf0743c47e
> "from-internal-phone", exten=0x7fb01800c8c0 "8744405", priority=1,
> label=0x0, callerid=0x0, action=E_MATCH, found=0x0, combined_find_spawn=0)
> at pbx.c:2886
> #5  0x000000000051d504 in ast_exists_extension (c=<value optimized out>,
> context=<value optimized out>, exten=<value optimized out>, priority=<value
> optimized out>, callerid=<value optimized out>) at pbx.c:4181
> #6  0x00007fb05ea8766e in get_destination (session=0x7fb01800c8c0,
> rdata=<value optimized out>) at res_pjsip_session.c:3703
> #7  0x00007fb05ea9a1ec in new_invite (rdata=0x7faffc135f28) at
> res_pjsip_session.c:3939
> #8  handle_new_invite_request (rdata=0x7faffc135f28) at
> res_pjsip_session.c:4114
> #9  session_on_rx_request (rdata=0x7faffc135f28) at
> res_pjsip_session.c:4189
> #10 0x00007fb0d471184e in pjsip_endpt_process_rx_data (endpt=<value
> optimized out>, rdata=0x7faffc135f28, p=0x7fb0905fc500,
> p_handled=0x7faf4e01fc7c) at ../src/pjsip/sip_endpoint.c:930
> #11 0x00007fb0903d1dcd in distribute (data=0x7faffc135f28) at
> res_pjsip/pjsip_distributor.c:955
> #12 0x000000000058e046 in ast_taskprocessor_execute (tps=0x37b1530) at
> taskprocessor.c:1237
> #13 0x00000000005963e0 in execute_tasks (data=0x37b1530) at
> threadpool.c:1354
> #14 0x000000000058e046 in ast_taskprocessor_execute (tps=0x1274b70) at
> taskprocessor.c:1237
> #15 0x00000000005980a1 in threadpool_execute (arg=0x7fb01c050b60) at
> threadpool.c:367
> #16 worker_active (arg=0x7fb01c050b60) at threadpool.c:1137
> #17 worker_start (arg=0x7fb01c050b60) at threadpool.c:1056
> #18 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #19 0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
> Thread 101 (Thread 0x7fb05074e700 (LWP 26565)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051cbb5 in ast_rdlock_contexts (c=0x7fb00813ece0, con=0x0,
> context=0x7fb00813f6a0 "call_to_user_core", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7fb0ac16dd10 "952xxxxxxx", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:8509
> #4  pbx_extension_helper (c=0x7fb00813ece0, con=0x0,
> context=0x7fb00813f6a0 "call_to_user_core", exten=0x652011 "h", priority=1,
> label=0x0, callerid=0x7fb0ac16dd10 "952xxxxxxx", action=E_MATCH, found=0x0,
> combined_find_spawn=0) at pbx.c:2886
> #5  0x0000000000527228 in ast_exists_extension (c=0x7fb00813ece0,
> args=0x0) at pbx.c:4181
> #6  __ast_pbx_run (c=0x7fb00813ece0, args=0x0) at pbx.c:4595
> #7  0x000000000052752b in pbx_thread (data=<value optimized out>) at
> pbx.c:4704
> #8  0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #9  0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
>
> Thread 95 (Thread 0x7faf4edb0700 (LWP 24067)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00000000005242bd in ast_rdlock_contexts (c=<value optimized out>,
> context=<value optimized out>, exten=<value optimized out>,
> device_state_info=0x0) at pbx.c:8509
> #4  ast_hint_extension (c=<value optimized out>, context=<value optimized
> out>, exten=<value optimized out>, device_state_info=0x0) at pbx.c:3000
> #5  internal_extension_state_extended (c=<value optimized out>,
> context=<value optimized out>, exten=<value optimized out>,
> device_state_info=0x0) at pbx.c:3146
> #6  0x000000000060486b in action_extensionstate (s=0x7faf4edafcd0,
> m=0x7faf4edaf3e0) at manager.c:5988
> #7  0x0000000000606af0 in process_message (s=0x7faf4edafcd0,
> m=0x7faf4edaf3e0) at manager.c:6689
> #8  0x0000000000609dcb in do_message (data=0x7fb034011680) at
> manager.c:6902
> #9  session_do (data=0x7fb034011680) at manager.c:7007
> #10 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #11 0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
> Thread 90 (Thread 0x7fb0d0657700 (LWP 27240)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051cbb5 in ast_rdlock_contexts (c=0x0, con=0x0,
> context=0x7fb02a3f153e "from-sip-external", exten=0x7fb00814e570
> "1612xxxxxxx", priority=1, label=0x0, callerid=0x0, action=E_MATCH,
> found=0x0, combined_find_spawn=0) at pbx.c:8509
> #4  pbx_extension_helper (c=0x0, con=0x0, context=0x7fb02a3f153e
> "from-sip-external", exten=0x7fb00814e570 "1612xxxxxxx", priority=1,
> label=0x0, callerid=0x0, action=E_MATCH, found=0x0, combined_find_spawn=0)
> at pbx.c:2886
> #5  0x000000000051d504 in ast_exists_extension (c=<value optimized out>,
> context=<value optimized out>, exten=<value optimized out>, priority=<value
> optimized out>, callerid=<value optimized out>) at pbx.c:4181
> #6  0x00007fb05ea8766e in get_destination (session=0x7fb00814e570,
> rdata=<value optimized out>) at res_pjsip_session.c:3703
> #7  0x00007fb05ea9a1ec in new_invite (rdata=0x7faffc023458) at
> res_pjsip_session.c:3939
> #8  handle_new_invite_request (rdata=0x7faffc023458) at
> res_pjsip_session.c:4114
> #9  session_on_rx_request (rdata=0x7faffc023458) at
> res_pjsip_session.c:4189
> #10 0x00007fb0d471184e in pjsip_endpt_process_rx_data (endpt=<value
> optimized out>, rdata=0x7faffc023458, p=0x7fb0905fc500,
> p_handled=0x7fb0d0656c7c) at ../src/pjsip/sip_endpoint.c:930
> #11 0x00007fb0903d1dcd in distribute (data=0x7faffc023458) at
> res_pjsip/pjsip_distributor.c:955
> #12 0x000000000058e046 in ast_taskprocessor_execute (tps=0x5203c50) at
> taskprocessor.c:1237
> #13 0x00000000005963e0 in execute_tasks (data=0x5203c50) at
> threadpool.c:1354
> #14 0x000000000058e046 in ast_taskprocessor_execute (tps=0x1274b70) at
> taskprocessor.c:1237
> #15 0x00000000005980a1 in threadpool_execute (arg=0x7fb01c0073c0) at
> threadpool.c:367
> #16 worker_active (arg=0x7fb01c0073c0) at threadpool.c:1137
> #17 worker_start (arg=0x7fb01c0073c0) at threadpool.c:1056
> #18 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #19 0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
>
> Thread 61 (Thread 0x7faf4dcbc700 (LWP 13778)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000045cb16 in __ao2_lock (user_data=<value optimized out>,
> lock_how=<value optimized out>, file=0x647703 "pbx.c", func=0x64b031
> "ast_add_hint", line=4026, var=<value optimized out>) at astobj2.c:241
> #4  0x00000000005167b9 in ast_add_hint (e=0x7faf6401dff0) at pbx.c:4026
> #5  0x0000000000521c53 in ast_add_extension2_lockopt (con=0x7faf6401dee0,
> replace=1, extension=<value optimized out>, priority=-1, label=<value
> optimized out>, callerid=<value optimized out>, application=<value
> optimized out>, data=0x7faf640017b0, datad=0x45bdd0 <ast_free_ptr>,
> registrar=0x7fb09327b9b8 "pbx_config", registrar_file=0x0,
> registrar_line=0, lock_context=1) at pbx.c:7572
> #6  0x00000000005235e5 in ast_add_extension2 (context=<value optimized
> out>, replace=1, extension=0x7faf4dcbab20 "zzzzzzz-user1", priority=-1,
> label=0x0, callerid=0x0, application=0x7faf4dcbab36
> "PJSIP/zzzzzzz&Custom:unavailable", data=0x7faf640017b0, datad=0x45bdd0
> <ast_free_ptr>, registrar=0x7fb09327b9b8 "pbx_config") at pbx.c:7304
> #7  ast_add_extension (context=<value optimized out>, replace=1,
> extension=0x7faf4dcbab20 "zzzzzzz-user1", priority=-1, label=0x0,
> callerid=0x0, application=0x7faf4dcbab36
> "PJSIP/zzzzzzz&Custom:unavailable", data=0x7faf640017b0, datad=0x45bdd0
> <ast_free_ptr>, registrar=0x7fb09327b9b8 "pbx_config") at pbx.c:6979
> #8  0x00007fb093277204 in handle_cli_dialplan_add_extension (e=<value
> optimized out>, cmd=<value optimized out>, a=0x7faf4dcbae90) at
> pbx_config.c:1156
> #9  0x00000000004ba668 in ast_cli_command_full (uid=-1, gid=-1, fd=107,
> s=0x7faf6401dca9 "dialplan add extension
> zzzzzzz-user1,hint,PJSIP/zzzzzzz&Custom:unavailable into blf-zzzzzzz
> replace") at cli.c:2995
> #10 0x000000000060213b in action_command (s=0x7faf4dcbbcd0,
> m=0x7faf4dcbb3e0) at manager.c:5318
> #11 0x0000000000606af0 in process_message (s=0x7faf4dcbbcd0,
> m=0x7faf4dcbb3e0) at manager.c:6689
> #12 0x0000000000609dcb in do_message (data=0x7fb034007790) at
> manager.c:6902
> #13 session_do (data=0x7fb034007790) at manager.c:7007
> #14 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #15 0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #16 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
> Here, ast_add_extension calls find_context_locked, which locks 'conlock'
(in ast_rdlock_contexts). ast_add_extension then calls ast_add_extension2,
which calls ast_add_extension2_lockopt, which calls ast_add_hint.
ast_add_hint then tries to lock the hints container.

So thread 61 holds a read lock on the contexts list, i.e. 'conlock', which
is blocking most of the other threads (any requesting ast_rdlock_context).
It also is requesting a lock on the hints container, so it too is being
blocked by the 'unknown thread id'.

More below at the end of the reply ...


> Thread 12 (Thread 0x7fb093fff700 (LWP 7836)):
> #0  0x00007fb0d292d334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fb0d29285f3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007fb0d29284d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x000000000051dd53 in ast_rdlock_contexts (hint=0x7fb093ffe7b0 "",
> hintsize=512, name=0x0, namesize=0, c=0x0, context=0x7fb093ffeb30 "blf",
> exten=0x7fb093ffeae0 "yyyyyyy-user1") at pbx.c:8509
> #4  ast_hint_extension (hint=0x7fb093ffe7b0 "", hintsize=512, name=0x0,
> namesize=0, c=0x0, context=0x7fb093ffeb30 "blf", exten=0x7fb093ffeae0
> "yyyyyyy-user1") at pbx.c:3000
> #5  ast_get_hint (hint=0x7fb093ffe7b0 "", hintsize=512, name=0x0,
> namesize=0, c=0x0, context=0x7fb093ffeb30 "blf", exten=0x7fb093ffeae0
> "yyyyyyy-user1") at pbx.c:4143
> #6  0x00000000005fa566 in manager_state_cb (context=0x7fb093ffeb30 "blf",
> exten=0x7fb093ffeae0 "yyyyyyy-user1", info=0x7fb093ffea10, data=<value
> optimized out>) at manager.c:7299
> #7  0x00000000005137b0 in execute_state_callback (cb=0x5fa500
> <manager_state_cb>, context=0x7fb093ffeb30 "blf", exten=0x7fb093ffeae0
> "yyyyyyy-user1", data=0x0, reason=<value optimized out>,
> hint=0x7faf301e6670, device_state_info=0x0) at pbx.c:3284
> #8  0x0000000000515834 in device_state_notify_callbacks
> (hint=0x7faf301e6670, hint_app=<value optimized out>) at pbx.c:3427
> #9  0x0000000000523801 in device_state_cb (unused=<value optimized out>,
> sub=<value optimized out>, msg=<value optimized out>) at pbx.c:3659
> #10 0x0000000000573250 in dispatch_exec_async (local=<value optimized
> out>) at stasis.c:1266
> #11 0x000000000058e0ee in ast_taskprocessor_execute (tps=0x1776c10) at
> taskprocessor.c:1235
> #12 0x000000000058eac8 in default_tps_processing_function (data=0x1757ce0)
> at taskprocessor.c:209
> #13 0x00000000005a24b7 in dummy_start (data=<value optimized out>) at
> utils.c:1428
> #14 0x00007fb0d2926aa1 in start_thread () from /lib64/libpthread.so.0
> #15 0x00007fb0d1d02c4d in clone () from /lib64/libc.so.6
>
This appears to be a "classic" deadlock. The 'unknown thread id' is holding
the hints lock and is being blocked by the contexts lock ('conlock').
Thread 61 is holding the contexts lock ('conlock') and is being blocked by
the hints lock.

You did not explicitly state what version of Asterisk 16, but it appears
you are running Asterisk 16.20. While not the latest release I have tracked
this in the latest code, so believe it is still an existing problem. Please
file an issue for this on the issue tracker: https://issues.asterisk.org/.
Please include your description here, and attach any files you have from
running the ast_coredumper script.

Thanks!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20211123/a1121b8f/attachment-0001.html>


More information about the asterisk-dev mailing list