[asterisk-dev] Asterisk 16 deadlock (maybe in AMI) stops responding to SIP

Steve Sether ssether at usinternet.com
Mon Oct 4 09:12:27 CDT 2021


We've been running Asterisk 16 in a production environment for several 
months.  During this time we've had several times where Asterisk locks 
up, and stops responding to SIP messages and won't recover until 
Asterisk is restarted.  RTP passes through, so existing conversations 
aren't affected, but no new calls can be made.


We finally got a dump of this, and did some analysis.  We found two 
threads in a deadlock. It appears the threads are involved in reporting 
output from an AMI command.

Details below.

We got GDB to detect deadlocks, and found the following:

********************************************************************************
Displaying blocking threads using 'blocked' command
Thread: 13348 waits for thread: 24853
Thread: 28752 waits for thread: 24853
Thread: 30322 waits for thread: 28005
Thread: 18038 waits for thread: 24853
Thread: 25449 waits for thread: 28005
Thread: 32629 waits for thread: 28005
Thread: 2934 waits for thread: 28005
Thread: 2935 waits for thread: 28005
Thread: 2765 waits for thread: 28005
Thread: 24853 waits for thread: 28005 AND DEADLOCKED
Thread: 28005 waits for thread: 24853 AND DEADLOCKED
Thread: 27509 waits for thread: 28005
Thread: 32701 waits for thread: 28005
Thread: 17857 waits for thread: 24853
Thread: 31231 waits for thread: 28005
********************************************************************************


Relevant threads:

Thread 28005:

  Thread 102 (Thread 0x7f80a0591700 (LWP 28005)):
#0  __lll_lock_wait () at 
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00000039c04095f3 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x00000039c04094d7 in __pthread_mutex_lock (mutex=0x5ff1a70) at 
pthread_mutex_lock.c:82
#3  0x000000000045b7a6 in __ao2_lock (user_data=<value optimized out>, 
lock_how=<value optimized out>, file=0x7f80d0d02280 "app_queue.c", 
func=0x7f80d0d08b59 "__queues_show", line=9687, var=<value optimized 
out>) at astobj2.c:241
#4  0x00007f80d0cf67e0 in __queues_show (s=<value optimized out>, 
fd=126, argc=3, argv=0x7f80a058fc08) at app_queue.c:9687
#5  0x00000000004b9048 in ast_cli_command_full (uid=-1, gid=-1, fd=126, 
s=0x7f803c1ac789 "queue show usi-Q-t1_support") at cli.c:2946
#6  0x00000000005fb1cb in action_command (s=0x7f80a0590cd0, 
m=0x7f80a05903e0) at manager.c:5315
#7  0x00000000005ffb80 in process_message (s=0x7f80a0590cd0, 
m=0x7f80a05903e0) at manager.c:6686
#8  0x0000000000602e5b in do_message (data=0x7f8084037ac0) at manager.c:6899
#9  session_do (data=0x7f8084037ac0) at manager.c:7004
#10 0x000000000059bc77 in dummy_start (data=<value optimized out>) at 
utils.c:1249
#11 0x00000039c0407aa1 in start_thread (arg=0x7f80a0591700) at 
pthread_create.c:301
#12 0x00000039c00e8c4d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115


Thread 24853:

  Thread 19 (Thread 0x7f811ad6c700 (LWP 24853)):
#0  __lll_lock_wait () at 
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00000039c04095f3 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x00000039c04094d7 in __pthread_mutex_lock (mutex=0x68c5800) at 
pthread_mutex_lock.c:82
#3  0x000000000045b7a6 in __ao2_lock (user_data=<value optimized out>, 
lock_how=<value optimized out>, file=0x61da46 "astobj2_container.c", 
func=0x61ddb0 "__ao2_iterator_next", line=589, var=<value optimized 
out>) at astobj2.c:241
#4  0x000000000045e110 in __ao2_iterator_next (iter=0x7f811ad6b810, 
tag=0x7f80d0d027e7 "Iterate through queues", file=0x7f80d0d02280 
"app_queue.c", line=5743, func=0x7f80d0d08533 "update_queue") at 
astobj2_container.c:589
#5  0x00007f80d0cf8f67 in update_queue (q=0x5ff1ab0, member=0x5054840, 
callcompletedinsl=1, starttime=<value optimized out>) at app_queue.c:5743
#6  0x00007f80d0cfa3e3 in update_status (context=0x7f811ad6bb40 "blf", 
exten=0x7f811ad6baf0 "usi-sraines", info=<value optimized out>, 
data=<value optimized out>) at app_queue.c:2453
#7  extension_state_cb (context=0x7f811ad6bb40 "blf", 
exten=0x7f811ad6baf0 "usi-sraines", info=<value optimized out>, 
data=<value optimized out>) at app_queue.c:2649
#8  0x000000000050f810 in execute_state_callback (cb=0x7f80d0cf9d20 
<extension_state_cb>, context=0x7f811ad6bb40 "blf", exten=0x7f811ad6baf0 
"usi-sraines", data=0x0, reason=<value optimized out>, hint=0x6489b80, 
device_state_info=0x0) at pbx.c:3277
#9  0x00000000005117a5 in device_state_notify_callbacks (hint=0x6489b80, 
hint_app=<value optimized out>) at pbx.c:3443
#10 0x0000000000520cf1 in device_state_cb (unused=<value optimized out>, 
sub=<value optimized out>, msg=<value optimized out>) at pbx.c:3652
#11 0x000000000056e500 in dispatch_exec_async (local=<value optimized 
out>) at stasis.c:1262
#12 0x0000000000588c8e in ast_taskprocessor_execute (tps=0x1cbe870) at 
taskprocessor.c:1235
#13 0x0000000000589668 in default_tps_processing_function 
(data=0x1cc1bd0) at taskprocessor.c:209
#14 0x000000000059bc77 in dummy_start (data=<value optimized out>) at 
utils.c:1249
#15 0x00000039c0407aa1 in start_thread (arg=0x7f811ad6c700) at 
pthread_create.c:301
#16 0x00000039c00e8c4d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115


Can anyone more familiar with Asterisk shed any light on this?

-- 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20211004/4112be31/attachment.html>


More information about the asterisk-dev mailing list