[asterisk-bugs] [JIRA] (ASTERISK-21162) Deadlock in cdr.c: cdr_batch_lock vs cdr_pending_lock

Chase Venters (JIRA) noreply at issues.asterisk.org
Mon Feb 25 12:32:18 CST 2013


     [ https://issues.asterisk.org/jira/browse/ASTERISK-21162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chase Venters updated ASTERISK-21162:
-------------------------------------

    Attachment: gdb-asterisk-deadlock.zip

A full GDB "thread all apply bt" for my deadlocked Asterisk instance, for completeness.

Please note that to respect our users' privacy I have masked all phone numbers as **********.
                
> Deadlock in cdr.c: cdr_batch_lock vs cdr_pending_lock
> -----------------------------------------------------
>
>                 Key: ASTERISK-21162
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21162
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: CDR/General
>    Affects Versions: 1.8.19.0
>         Environment: CentOS release 6.3 (Final)
> Linux ivr01.XXXXXXXX 2.6.32-279.22.1.el6.i686 #1 SMP Wed Feb 6 00:31:03 UTC 2013 i686 i686 i386 GNU/Linux
> Asterisk 1.8.19.0 from EPEL
>            Reporter: Chase Venters
>            Severity: Critical
>         Attachments: gdb-asterisk-deadlock.zip
>
>
> I've run into a deadlock in Asterisk cdr.c. I have hundreds of threads blocked like this:
> Thread 3 (Thread 0xab1f9b70 (LWP 8232)):
> #0  0x0027d416 in __kernel_vsyscall ()
> #1  0x0023f019 in __lll_lock_wait () from /lib/libpthread.so.0
> #2  0x0023a43e in _L_lock_731 () from /lib/libpthread.so.0
> #3  0x0023a34a in pthread_mutex_lock () from /lib/libpthread.so.0
> #4  0x00370436 in pthread_mutex_lock () from /lib/libc.so.6
> #5  0x08096841 in ast_cdr_detach (cdr=0xb5718bb0) at cdr.c:1354
> #6  0x08124b5c in ast_pbx_outgoing_cdr_failed () at pbx.c:8797
> #7  0x08134552 in ast_pbx_outgoing_exten (type=0x973ec64 "Local", format=64, data=0x973ec6c, timeout=180000, context=0x973ec82 "pams-out-agi", 
> exten=0x973ec92 "s", priority=1, reason=0xab1f9278, synchronous=1, cid_num=0x0, cid_name=0x0, vars=0x973eb58, account=0x8214506 "", 
> channel=0xab1f9274) at pbx.c:8877
> #8  0x08119143 in fast_originate (data=0x973ebd8) at manager.c:3757
> #9  0x08174c97 in dummy_start (data=0x967c770) at utils.c:1012
> #10 0x00238a49 in start_thread () from /lib/libpthread.so.0
> #11 0x0036063e in clone () from /lib/libc.so.6
> ast_cdr_detach() is blocked trying to lock cdr_batch_lock. If it can get cdr_batch_lock, it calls submit_unscheduled_batch(). That function adds 
> a 1ms scheduler timeout to submit_scheduled_batch(), then locks cdr_pending_lock and fires a pthread condition.
> Separately, the do_cdr() thread is locking cdr_pending_lock and when the condition fires, it calls ast_sched_runq() to start dispatching the 
> calls to submit_scheduled_batch(). The problem is that submit_scheduled_batch() calls ast_cdr_submit_batch() with cdr_pending_lock held, and it 
> then ast_cdr_submit_batch() tries to lock cdr_batch_lock. A classic lock ordering conflict ensues. 
> You can see that do_cdr() is holding cdr_pending_lock and is blocked trying to grab cdr_batch_lock on this thread:
> Thread 771 (Thread 0xb744bb70 (LWP 2640)):
> #0  0x0027d416 in __kernel_vsyscall ()
> #1  0x0023f019 in __lll_lock_wait () from /lib/libpthread.so.0
> #2  0x0023a43e in _L_lock_731 () from /lib/libpthread.so.0
> #3  0x0023a34a in pthread_mutex_lock () from /lib/libpthread.so.0
> #4  0x00370436 in pthread_mutex_lock () from /lib/libc.so.6
> #5  0x08096a63 in ast_cdr_submit_batch (do_shutdown=0) at cdr.c:1279
> #6  0x08096c43 in submit_scheduled_batch (data=0x0) at cdr.c:1301
> #7  0x0815d5be in ast_sched_runq (con=0x95be6b8) at sched.c:640
> #8  0x080954dc in do_cdr (data=0x0) at cdr.c:1394
> #9  0x08174c97 in dummy_start (data=0x9380038) at utils.c:1012
> #10 0x00238a49 in start_thread () from /lib/libpthread.so.0
> #11 0x0036063e in clone () from /lib/libc.so.6
> Last but not least, here's a thread that is holding cdr_batch_lock but wants cdr_pending_lock:
> Thread 759 (Thread 0xb5d9ab70 (LWP 6632)):
> #0  0x0027d416 in __kernel_vsyscall ()
> #1  0x0023f019 in __lll_lock_wait () from /lib/libpthread.so.0
> #2  0x0023a43e in _L_lock_731 () from /lib/libpthread.so.0
> #3  0x0023a34a in pthread_mutex_lock () from /lib/libpthread.so.0
> #4  0x00370436 in pthread_mutex_lock () from /lib/libc.so.6
> #5  0x0809529b in submit_unscheduled_batch () at cdr.c:1315
> #6  0x0809698d in ast_cdr_detach (cdr=0xb5031d08) at cdr.c:1370
> #7  0x080ae257 in ast_hangup (chan=0xb503ada8) at channel.c:2910
> #8  0x080b294a in __ast_request_and_dial (type=0x9689e2e "Local", format=64, requestor=0x0, data=0x9689e36, timeout=<value optimized out>, 
> outstate=0xb5d9a278, cid_num=<value optimized out>, cid_name=<value optimized out>, oh=0xb5d9a1b8) at channel.c:5569
> #9  0x08134486 in ast_pbx_outgoing_exten (type=0x9689e2e "Local", format=64, data=0x9689e36, timeout=180000, context=0x9689e4c "pams-out-agi", 
> exten=0x9689e5c "s", priority=1, reason=0xb5d9a278, synchronous=1, cid_num=0x0, cid_name=0x0, vars=0x9689438, account=0x8214506 "", 
> channel=0xb5d9a274) at pbx.c:8821
> #10 0x08119143 in fast_originate (data=0x9679fe0) at manager.c:3757
> #11 0x08174c97 in dummy_start (data=0x9690610) at utils.c:1012
> #12 0x00238a49 in start_thread () from /lib/libpthread.so.0
> #13 0x0036063e in clone () from /lib/libc.so.6
> I think from looking at the 1.8.21.0-rc1 code that this issue still exists there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira



More information about the asterisk-bugs mailing list