[asterisk-bugs] [JIRA] (ASTERISK-25023) Deadlock in chan_sip in update_provisional_keepalive

Rusty Newton (JIRA) noreply at issues.asterisk.org
Thu Apr 30 19:00:33 CDT 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226032#comment-226032 ] 

Rusty Newton edited comment on ASTERISK-25023 at 4/30/15 6:58 PM:
------------------------------------------------------------------

The information below is from a productive system without debugging enabled. If have extracted the following information so far:

[edit by Rusty - removing debug trace from comment as per the guidelines. Always better to attach this sort of thing as a .txt file for readability and accessibility.]


was (Author: arnd):
The information below is from a productive system without debugging enabled. If have extracted the following information so far:

Related Threads:
(gdb) thread 23
[Switching to thread 23 (Thread 0x7f39210ee700 (LWP 364))]#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136     2:      movl    %edx, %eax
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x0000003974a09523 in _L_lock_892 () from /lib64/libpthread-2.12.so
#2  0x0000003974a09407 in __pthread_mutex_lock (mutex=0x7f397004fb00) at pthread_mutex_lock.c:82
#3  0x00007f3925f8d92f in sip_pvt_lock_full (*pvt=0x7f397004fb60*) at chan_sip.c:9063
#4  0x00007f3925ff543f in send_provisional_keepalive_full (pvt=0x7f397004fb60, with_sdp=0) at chan_sip.c:4466
#5  0x000000000058ac9e in ast_sched_runq (*con=0x1108b10*) at sched.c:643
#6  0x00007f3925fd33d7 in do_monitor (data=<value optimized out>) at chan_sip.c:28858
#7  0x00000000005bcd2b in dummy_start (data=<value optimized out>) at utils.c:1232
#8  0x0000003974a079d1 in start_thread (arg=0x7f39210ee700) at pthread_create.c:301
#9  0x00000039746e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) frame 2
#2  0x0000003974a09407 in __pthread_mutex_lock (mutex=0x7f397004fb00) at pthread_mutex_lock.c:82
82            LLL_MUTEX_LOCK (mutex);
(gdb) p *mutex
$6 = {
  __data = {
    __lock = 2, 
    __count = 1, 
    __owner = *14513*, 
    __nusers = 1, 
    __kind = 1, 
    __spins = 0, 
    __list = {
      __prev = 0x0, 
      __next = 0x0
    }
  }, 
  __size = "\002\000\000\000\001\000\000\000\261\070\000\000\001\000\000\000\001", '\000' <repeats 22 times>, 
  __align = 4294967298
}

*Mutex Owner is 14513*

(gdb) info threads
  76 Thread 0x7f397fa7f700 (LWP 316)  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86
...
  48 Thread 0x7f3920646700 (LWP 14175)  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  47 Thread 0x7f391fffa700 (LWP *14513*)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
  46 Thread 0x7f391ff02700 (LWP 14549)  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
...
  24 Thread 0x7f3920ff6700 (LWP 366)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
* 23 Thread 0x7f39210ee700 (LWP 364)  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
  22 Thread 0x7f39211e6700 (LWP 362)  0x00000039746df0d3 in __poll (fds=<value optimized out>, nfds=<value optimized out>, 
    timeout=<value optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
...

(gdb) thread 47
[Switching to thread 47 (Thread 0x7f391fffa700 (LWP 14513))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
183     62:     movl    (%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
#1  0x000000000058a9a1 in ast_sched_del (*con=0x1108b10*, id=6096) at sched.c:500
#2  0x00007f3925ff6b74 in update_provisional_keepalive (p=0x7f397004fb60, msg=<value optimized out>, req=<value optimized out>, with_sdp=1)
    at chan_sip.c:4525
#3  transmit_provisional_response (*p=0x7f397004fb60*, msg=<value optimized out>, req=<value optimized out>, with_sdp=1) at chan_sip.c:12375
#4  0x00007f392600d2e4 in sip_indicate (ast=0x7f397009dde0, condition=<value optimized out>, data=0x0, datalen=0) at chan_sip.c:7708
#5  0x00000000004ae43e in ast_indicate_data (chan=0x7f397009dde0, _condition=14, data=0x0, datalen=0) at channel.c:4555
#6  0x00007f39253555bd in wait_for_answer (in=0x7f397009dde0, out_chans=0x7f391fff74e0, to=0x7f391fff751c, peerflags=0x7f391fff7560, 
    opt_args=0x7f391fff7170, pa=0x7f391fff69a0, num_in=0x7f391fff7460, result=0x7f391fff7514, dtmf_progress=0x0, ignore_cc=1, 
    forced_clid=0x7f391fff73e0, stored_clid=0x7f391fff7390) at app_dial.c:1374
#7  0x00007f392535a134 in dial_exec_full (chan=<value optimized out>, data=<value optimized out>, peerflags=0x7f391fff7560, continue_exec=0x0)
    at app_dial.c:2680
#8  0x00007f392535bff6 in dial_exec (chan=<value optimized out>, data=<value optimized out>) at app_dial.c:3133
#9  0x0000000000551f15 in pbx_exec (c=0x7f397009dde0, app=0x115e9b0, data=0x7f391fff76c0 "DAHDI/g1/00963932263976") at pbx.c:1662
#10 0x000000000055beb2 in pbx_extension_helper (c=0x7f397009dde0, con=0x0, context=0x7f397009e798 "moonstar", 
    exten=0x7f397009e7e8 "+963932263976", priority=7, label=0x0, callerid=0x7f396402f9a0 "905388446334", action=E_SPAWN, 
    found=0x7f391fff9c9c, combined_find_spawn=1) at pbx.c:4934
#11 0x0000000000562655 in ast_spawn_extension (c=0x7f397009dde0, args=0x0) at pbx.c:5967
#12 __ast_pbx_run (c=0x7f397009dde0, args=0x0) at pbx.c:6381
#13 0x0000000000563d1b in pbx_thread (data=<value optimized out>) at pbx.c:6701
#14 0x00000000005bcd2b in dummy_start (data=<value optimized out>) at utils.c:1232
#15 0x0000003974a079d1 in start_thread (arg=0x7f391fffa700) at pthread_create.c:301
#16 0x00000039746e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) frame 1
#1  0x000000000058a9a1 in ast_sched_del (*con=0x1108b10*, id=6096) at sched.c:500                                                                
500                     ast_cond_wait(&s->cond, &con->lock);
(gdb) p con
$1 = (struct ast_sched_context *) 0x1108b10
(gdb) l
495                     s = con->currently_executing;
496                     s->deleted = 1;
497                     /* Wait for executing task to complete so that caller of ast_sched_del() does not
498                      * free memory out from under the task.
499                      */
500                     ast_cond_wait(&s->cond, &con->lock);
501                     /* Do not sched_release() here because ast_sched_runq() will do it */
502             }
503
504     #ifdef DUMP_SCHEDULER


> Deadlock in chan_sip in update_provisional_keepalive
> ----------------------------------------------------
>
>                 Key: ASTERISK-25023
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25023
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/General
>    Affects Versions: 13.3.2
>         Environment: centos 6 / 64Bit
>            Reporter: Arnd Schmitter
>         Attachments: trace.txt
>
>
> There is a race condition / deadlock when update_provisional_keepalive is called.
> If there is already a scheduler run plant for calling send_provisional_keepalive_full.
> the func update_provisional_keepalive gets called with a locked sip_pvt struct and the first thing it does is delete the plant scheduler.
> If the plant scheduler is started after the sip_pvt lock and before the AST_SCHED_DEL_UNREF call in update_provisional_keepalive is executed, the scheduler job is blocked in send_provisional_keepalive_full, waiting to get a lock on the sip_pvt struct.
> The call to AST_SCHED_DEL_UNREF is waiting for a condistion signal, that the running scheduler finish.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list