[asterisk-bugs] [JIRA] (ASTERISK-25023) Deadlock in chan_sip in update_provisional_keepalive
Rusty Newton (JIRA)
noreply at issues.asterisk.org
Thu Apr 30 19:00:33 CDT 2015
[ https://issues.asterisk.org/jira/browse/ASTERISK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226032#comment-226032 ]
Rusty Newton edited comment on ASTERISK-25023 at 4/30/15 6:58 PM:
------------------------------------------------------------------
The information below is from a productive system without debugging enabled. If have extracted the following information so far:
[edit by Rusty - removing debug trace from comment as per the guidelines. Always better to attach this sort of thing as a .txt file for readability and accessibility.]
was (Author: arnd):
The information below is from a productive system without debugging enabled. If have extracted the following information so far:
Related Threads:
(gdb) thread 23
[Switching to thread 23 (Thread 0x7f39210ee700 (LWP 364))]#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 2: movl %edx, %eax
(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x0000003974a09523 in _L_lock_892 () from /lib64/libpthread-2.12.so
#2 0x0000003974a09407 in __pthread_mutex_lock (mutex=0x7f397004fb00) at pthread_mutex_lock.c:82
#3 0x00007f3925f8d92f in sip_pvt_lock_full (*pvt=0x7f397004fb60*) at chan_sip.c:9063
#4 0x00007f3925ff543f in send_provisional_keepalive_full (pvt=0x7f397004fb60, with_sdp=0) at chan_sip.c:4466
#5 0x000000000058ac9e in ast_sched_runq (*con=0x1108b10*) at sched.c:643
#6 0x00007f3925fd33d7 in do_monitor (data=<value optimized out>) at chan_sip.c:28858
#7 0x00000000005bcd2b in dummy_start (data=<value optimized out>) at utils.c:1232
#8 0x0000003974a079d1 in start_thread (arg=0x7f39210ee700) at pthread_create.c:301
#9 0x00000039746e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) frame 2
#2 0x0000003974a09407 in __pthread_mutex_lock (mutex=0x7f397004fb00) at pthread_mutex_lock.c:82
82 LLL_MUTEX_LOCK (mutex);
(gdb) p *mutex
$6 = {
__data = {
__lock = 2,
__count = 1,
__owner = *14513*,
__nusers = 1,
__kind = 1,
__spins = 0,
__list = {
__prev = 0x0,
__next = 0x0
}
},
__size = "\002\000\000\000\001\000\000\000\261\070\000\000\001\000\000\000\001", '\000' <repeats 22 times>,
__align = 4294967298
}
*Mutex Owner is 14513*
(gdb) info threads
76 Thread 0x7f397fa7f700 (LWP 316) sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86
...
48 Thread 0x7f3920646700 (LWP 14175) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
47 Thread 0x7f391fffa700 (LWP *14513*) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
46 Thread 0x7f391ff02700 (LWP 14549) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
...
24 Thread 0x7f3920ff6700 (LWP 366) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
* 23 Thread 0x7f39210ee700 (LWP 364) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
22 Thread 0x7f39211e6700 (LWP 362) 0x00000039746df0d3 in __poll (fds=<value optimized out>, nfds=<value optimized out>,
timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
...
(gdb) thread 47
[Switching to thread 47 (Thread 0x7f391fffa700 (LWP 14513))]#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
183 62: movl (%rsp), %edi
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
#1 0x000000000058a9a1 in ast_sched_del (*con=0x1108b10*, id=6096) at sched.c:500
#2 0x00007f3925ff6b74 in update_provisional_keepalive (p=0x7f397004fb60, msg=<value optimized out>, req=<value optimized out>, with_sdp=1)
at chan_sip.c:4525
#3 transmit_provisional_response (*p=0x7f397004fb60*, msg=<value optimized out>, req=<value optimized out>, with_sdp=1) at chan_sip.c:12375
#4 0x00007f392600d2e4 in sip_indicate (ast=0x7f397009dde0, condition=<value optimized out>, data=0x0, datalen=0) at chan_sip.c:7708
#5 0x00000000004ae43e in ast_indicate_data (chan=0x7f397009dde0, _condition=14, data=0x0, datalen=0) at channel.c:4555
#6 0x00007f39253555bd in wait_for_answer (in=0x7f397009dde0, out_chans=0x7f391fff74e0, to=0x7f391fff751c, peerflags=0x7f391fff7560,
opt_args=0x7f391fff7170, pa=0x7f391fff69a0, num_in=0x7f391fff7460, result=0x7f391fff7514, dtmf_progress=0x0, ignore_cc=1,
forced_clid=0x7f391fff73e0, stored_clid=0x7f391fff7390) at app_dial.c:1374
#7 0x00007f392535a134 in dial_exec_full (chan=<value optimized out>, data=<value optimized out>, peerflags=0x7f391fff7560, continue_exec=0x0)
at app_dial.c:2680
#8 0x00007f392535bff6 in dial_exec (chan=<value optimized out>, data=<value optimized out>) at app_dial.c:3133
#9 0x0000000000551f15 in pbx_exec (c=0x7f397009dde0, app=0x115e9b0, data=0x7f391fff76c0 "DAHDI/g1/00963932263976") at pbx.c:1662
#10 0x000000000055beb2 in pbx_extension_helper (c=0x7f397009dde0, con=0x0, context=0x7f397009e798 "moonstar",
exten=0x7f397009e7e8 "+963932263976", priority=7, label=0x0, callerid=0x7f396402f9a0 "905388446334", action=E_SPAWN,
found=0x7f391fff9c9c, combined_find_spawn=1) at pbx.c:4934
#11 0x0000000000562655 in ast_spawn_extension (c=0x7f397009dde0, args=0x0) at pbx.c:5967
#12 __ast_pbx_run (c=0x7f397009dde0, args=0x0) at pbx.c:6381
#13 0x0000000000563d1b in pbx_thread (data=<value optimized out>) at pbx.c:6701
#14 0x00000000005bcd2b in dummy_start (data=<value optimized out>) at utils.c:1232
#15 0x0000003974a079d1 in start_thread (arg=0x7f391fffa700) at pthread_create.c:301
#16 0x00000039746e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) frame 1
#1 0x000000000058a9a1 in ast_sched_del (*con=0x1108b10*, id=6096) at sched.c:500
500 ast_cond_wait(&s->cond, &con->lock);
(gdb) p con
$1 = (struct ast_sched_context *) 0x1108b10
(gdb) l
495 s = con->currently_executing;
496 s->deleted = 1;
497 /* Wait for executing task to complete so that caller of ast_sched_del() does not
498 * free memory out from under the task.
499 */
500 ast_cond_wait(&s->cond, &con->lock);
501 /* Do not sched_release() here because ast_sched_runq() will do it */
502 }
503
504 #ifdef DUMP_SCHEDULER
> Deadlock in chan_sip in update_provisional_keepalive
> ----------------------------------------------------
>
> Key: ASTERISK-25023
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-25023
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Channels/chan_sip/General
> Affects Versions: 13.3.2
> Environment: centos 6 / 64Bit
> Reporter: Arnd Schmitter
> Attachments: trace.txt
>
>
> There is a race condition / deadlock when update_provisional_keepalive is called.
> If there is already a scheduler run plant for calling send_provisional_keepalive_full.
> the func update_provisional_keepalive gets called with a locked sip_pvt struct and the first thing it does is delete the plant scheduler.
> If the plant scheduler is started after the sip_pvt lock and before the AST_SCHED_DEL_UNREF call in update_provisional_keepalive is executed, the scheduler job is blocked in send_provisional_keepalive_full, waiting to get a lock on the sip_pvt struct.
> The call to AST_SCHED_DEL_UNREF is waiting for a condistion signal, that the running scheduler finish.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list