[asterisk-bugs] [JIRA] (ASTERISK-21162) Deadlock in cdr.c: cdr_batch_lock vs cdr_pending_lock
Chase Venters (JIRA)
noreply at issues.asterisk.org
Mon Feb 25 12:21:18 CST 2013
Chase Venters created ASTERISK-21162:
----------------------------------------
Summary: Deadlock in cdr.c: cdr_batch_lock vs cdr_pending_lock
Key: ASTERISK-21162
URL: https://issues.asterisk.org/jira/browse/ASTERISK-21162
Project: Asterisk
Issue Type: Bug
Security Level: None
Components: CDR/General
Affects Versions: 1.8.19.0
Environment: CentOS release 6.3 (Final)
Linux ivr01.XXXXXXXX 2.6.32-279.22.1.el6.i686 #1 SMP Wed Feb 6 00:31:03 UTC 2013 i686 i686 i386 GNU/Linux
Asterisk 1.8.19.0 from EPEL
Reporter: Chase Venters
Severity: Critical
I've run into a deadlock in Asterisk cdr.c. I have hundreds of threads blocked like this:
Thread 3 (Thread 0xab1f9b70 (LWP 8232)):
#0 0x0027d416 in __kernel_vsyscall ()
#1 0x0023f019 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x0023a43e in _L_lock_731 () from /lib/libpthread.so.0
#3 0x0023a34a in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x00370436 in pthread_mutex_lock () from /lib/libc.so.6
#5 0x08096841 in ast_cdr_detach (cdr=0xb5718bb0) at cdr.c:1354
#6 0x08124b5c in ast_pbx_outgoing_cdr_failed () at pbx.c:8797
#7 0x08134552 in ast_pbx_outgoing_exten (type=0x973ec64 "Local", format=64, data=0x973ec6c, timeout=180000, context=0x973ec82 "pams-out-agi",
exten=0x973ec92 "s", priority=1, reason=0xab1f9278, synchronous=1, cid_num=0x0, cid_name=0x0, vars=0x973eb58, account=0x8214506 "",
channel=0xab1f9274) at pbx.c:8877
#8 0x08119143 in fast_originate (data=0x973ebd8) at manager.c:3757
#9 0x08174c97 in dummy_start (data=0x967c770) at utils.c:1012
#10 0x00238a49 in start_thread () from /lib/libpthread.so.0
#11 0x0036063e in clone () from /lib/libc.so.6
ast_cdr_detach() is blocked trying to lock cdr_batch_lock. If it can get cdr_batch_lock, it calls submit_unscheduled_batch(). That function adds
a 1ms scheduler timeout to submit_scheduled_batch(), then locks cdr_pending_lock and fires a pthread condition.
Separately, the do_cdr() thread is locking cdr_pending_lock and when the condition fires, it calls ast_sched_runq() to start dispatching the
calls to submit_scheduled_batch(). The problem is that submit_scheduled_batch() calls ast_cdr_submit_batch() with cdr_pending_lock held, and it
then ast_cdr_submit_batch() tries to lock cdr_batch_lock. A classic lock ordering conflict ensues.
You can see that do_cdr() is holding cdr_pending_lock and is blocked trying to grab cdr_batch_lock on this thread:
Thread 771 (Thread 0xb744bb70 (LWP 2640)):
#0 0x0027d416 in __kernel_vsyscall ()
#1 0x0023f019 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x0023a43e in _L_lock_731 () from /lib/libpthread.so.0
#3 0x0023a34a in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x00370436 in pthread_mutex_lock () from /lib/libc.so.6
#5 0x08096a63 in ast_cdr_submit_batch (do_shutdown=0) at cdr.c:1279
#6 0x08096c43 in submit_scheduled_batch (data=0x0) at cdr.c:1301
#7 0x0815d5be in ast_sched_runq (con=0x95be6b8) at sched.c:640
#8 0x080954dc in do_cdr (data=0x0) at cdr.c:1394
#9 0x08174c97 in dummy_start (data=0x9380038) at utils.c:1012
#10 0x00238a49 in start_thread () from /lib/libpthread.so.0
#11 0x0036063e in clone () from /lib/libc.so.6
Last but not least, here's a thread that is holding cdr_batch_lock but wants cdr_pending_lock:
Thread 759 (Thread 0xb5d9ab70 (LWP 6632)):
#0 0x0027d416 in __kernel_vsyscall ()
#1 0x0023f019 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x0023a43e in _L_lock_731 () from /lib/libpthread.so.0
#3 0x0023a34a in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x00370436 in pthread_mutex_lock () from /lib/libc.so.6
#5 0x0809529b in submit_unscheduled_batch () at cdr.c:1315
#6 0x0809698d in ast_cdr_detach (cdr=0xb5031d08) at cdr.c:1370
#7 0x080ae257 in ast_hangup (chan=0xb503ada8) at channel.c:2910
#8 0x080b294a in __ast_request_and_dial (type=0x9689e2e "Local", format=64, requestor=0x0, data=0x9689e36, timeout=<value optimized out>,
outstate=0xb5d9a278, cid_num=<value optimized out>, cid_name=<value optimized out>, oh=0xb5d9a1b8) at channel.c:5569
#9 0x08134486 in ast_pbx_outgoing_exten (type=0x9689e2e "Local", format=64, data=0x9689e36, timeout=180000, context=0x9689e4c "pams-out-agi",
exten=0x9689e5c "s", priority=1, reason=0xb5d9a278, synchronous=1, cid_num=0x0, cid_name=0x0, vars=0x9689438, account=0x8214506 "",
channel=0xb5d9a274) at pbx.c:8821
#10 0x08119143 in fast_originate (data=0x9679fe0) at manager.c:3757
#11 0x08174c97 in dummy_start (data=0x9690610) at utils.c:1012
#12 0x00238a49 in start_thread () from /lib/libpthread.so.0
#13 0x0036063e in clone () from /lib/libc.so.6
I think from looking at the 1.8.21.0-rc1 code that this issue still exists there.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the asterisk-bugs
mailing list