<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>We've been running Asterisk 16 in a production environment for
several months. During this time we've had several times where
Asterisk locks up, and stops responding to SIP messages and won't
recover until Asterisk is restarted. RTP passes through, so
existing conversations aren't affected, but no new calls can be
made.</p>
<p><br>
</p>
<p>We finally got a dump of this, and did some analysis. We found
two threads in a deadlock. It appears the threads are involved in
reporting output from an AMI command.<br>
</p>
<p>Details below.</p>
<p>We got GDB to detect deadlocks, and found the following:</p>
<p>********************************************************************************<br>
Displaying blocking threads using 'blocked' command<br>
Thread: 13348 waits for thread: 24853 <br>
Thread: 28752 waits for thread: 24853 <br>
Thread: 30322 waits for thread: 28005 <br>
Thread: 18038 waits for thread: 24853 <br>
Thread: 25449 waits for thread: 28005 <br>
Thread: 32629 waits for thread: 28005 <br>
Thread: 2934 waits for thread: 28005 <br>
Thread: 2935 waits for thread: 28005 <br>
Thread: 2765 waits for thread: 28005 <br>
Thread: 24853 waits for thread: 28005 AND DEADLOCKED<br>
Thread: 28005 waits for thread: 24853 AND DEADLOCKED<br>
Thread: 27509 waits for thread: 28005 <br>
Thread: 32701 waits for thread: 28005 <br>
Thread: 17857 waits for thread: 24853 <br>
Thread: 31231 waits for thread: 28005 <br>
********************************************************************************</p>
<p><br>
</p>
<p>Relevant threads:</p>
<p>Thread 28005:<br>
</p>
<p> Thread 102 (Thread 0x7f80a0591700 (LWP 28005)):<br>
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
#1 0x00000039c04095f3 in _L_lock_892 () from
/lib64/libpthread.so.0<br>
#2 0x00000039c04094d7 in __pthread_mutex_lock (mutex=0x5ff1a70)
at pthread_mutex_lock.c:82<br>
#3 0x000000000045b7a6 in __ao2_lock (user_data=<value
optimized out>, lock_how=<value optimized out>,
file=0x7f80d0d02280 "app_queue.c", func=0x7f80d0d08b59
"__queues_show", line=9687, var=<value optimized out>) at
astobj2.c:241<br>
#4 0x00007f80d0cf67e0 in __queues_show (s=<value optimized
out>, fd=126, argc=3, argv=0x7f80a058fc08) at app_queue.c:9687<br>
#5 0x00000000004b9048 in ast_cli_command_full (uid=-1, gid=-1,
fd=126, s=0x7f803c1ac789 "queue show usi-Q-t1_support") at
cli.c:2946<br>
#6 0x00000000005fb1cb in action_command (s=0x7f80a0590cd0,
m=0x7f80a05903e0) at manager.c:5315<br>
#7 0x00000000005ffb80 in process_message (s=0x7f80a0590cd0,
m=0x7f80a05903e0) at manager.c:6686<br>
#8 0x0000000000602e5b in do_message (data=0x7f8084037ac0) at
manager.c:6899<br>
#9 session_do (data=0x7f8084037ac0) at manager.c:7004<br>
#10 0x000000000059bc77 in dummy_start (data=<value optimized
out>) at utils.c:1249<br>
#11 0x00000039c0407aa1 in start_thread (arg=0x7f80a0591700) at
pthread_create.c:301<br>
#12 0x00000039c00e8c4d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115<br>
</p>
<p><br>
</p>
<p>Thread 24853:</p>
<p> Thread 19 (Thread 0x7f811ad6c700 (LWP 24853)):<br>
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
#1 0x00000039c04095f3 in _L_lock_892 () from
/lib64/libpthread.so.0<br>
#2 0x00000039c04094d7 in __pthread_mutex_lock (mutex=0x68c5800)
at pthread_mutex_lock.c:82<br>
#3 0x000000000045b7a6 in __ao2_lock (user_data=<value
optimized out>, lock_how=<value optimized out>,
file=0x61da46 "astobj2_container.c", func=0x61ddb0
"__ao2_iterator_next", line=589, var=<value optimized out>)
at astobj2.c:241<br>
#4 0x000000000045e110 in __ao2_iterator_next
(iter=0x7f811ad6b810, tag=0x7f80d0d027e7 "Iterate through queues",
file=0x7f80d0d02280 "app_queue.c", line=5743, func=0x7f80d0d08533
"update_queue") at astobj2_container.c:589<br>
#5 0x00007f80d0cf8f67 in update_queue (q=0x5ff1ab0,
member=0x5054840, callcompletedinsl=1, starttime=<value
optimized out>) at app_queue.c:5743<br>
#6 0x00007f80d0cfa3e3 in update_status (context=0x7f811ad6bb40
"blf", exten=0x7f811ad6baf0 "usi-sraines", info=<value
optimized out>, data=<value optimized out>) at
app_queue.c:2453<br>
#7 extension_state_cb (context=0x7f811ad6bb40 "blf",
exten=0x7f811ad6baf0 "usi-sraines", info=<value optimized
out>, data=<value optimized out>) at app_queue.c:2649<br>
#8 0x000000000050f810 in execute_state_callback
(cb=0x7f80d0cf9d20 <extension_state_cb>,
context=0x7f811ad6bb40 "blf", exten=0x7f811ad6baf0 "usi-sraines",
data=0x0, reason=<value optimized out>, hint=0x6489b80,
device_state_info=0x0) at pbx.c:3277<br>
#9 0x00000000005117a5 in device_state_notify_callbacks
(hint=0x6489b80, hint_app=<value optimized out>) at
pbx.c:3443<br>
#10 0x0000000000520cf1 in device_state_cb (unused=<value
optimized out>, sub=<value optimized out>, msg=<value
optimized out>) at pbx.c:3652<br>
#11 0x000000000056e500 in dispatch_exec_async (local=<value
optimized out>) at stasis.c:1262<br>
#12 0x0000000000588c8e in ast_taskprocessor_execute
(tps=0x1cbe870) at taskprocessor.c:1235<br>
#13 0x0000000000589668 in default_tps_processing_function
(data=0x1cc1bd0) at taskprocessor.c:209<br>
#14 0x000000000059bc77 in dummy_start (data=<value optimized
out>) at utils.c:1249<br>
#15 0x00000039c0407aa1 in start_thread (arg=0x7f811ad6c700) at
pthread_create.c:301<br>
#16 0x00000039c00e8c4d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115<br>
</p>
<p><br>
</p>
<p>Can anyone more familiar with Asterisk shed any light on this? <br>
</p>
<div class="moz-signature">-- <br>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>