[asterisk-dev] Deadlock between transaction and transport

Richard Mudgett rmudgett at digium.com
Fri Jul 13 17:13:03 CDT 2018


On Thu, Jul 12, 2018 at 5:01 PM, Salahuddin Ahmed <txrubel at gmail.com> wrote:

> Hello,
>
> Recently I have experienced a deadlock situation in Asterisk Version
> 13.20.0. According to our gdb analysis there have 3 suspicious thread(3, 11
> and 44). Thread 44 push a task to taskpool, thread 11 execute that task and
> the thread 3 is a timer event. In thread 11 during a response going to
> transmit a group lock has been acquired. and this thread was locked on this
> point. In Thread 3 while retransmission timer triggered same group lock
> again acquired and try to retransmit response. In this situation asterisk
> was blocked.
>
> Let me explain according to GDB log,
>
> Thread-03:
> #4  0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
> tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
>
> Thread-11:
> #6  0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx at entry=0x7eff8bdd4568,
> tdata=tdata at entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
> (gdb) p tsx->transport_flag
> $2 = 1(TSX_HAS_PENDING_TRANSPORT)
>
> Here Thread-03 is execute first and try to retransmit a message before
> acquire a tsx group lock in Thread-11, because in thread 3
> tsx->transport_flag |= TSX_HAS_PENDING_TRANSPORT has been set, Now if we
> read this flag from thread 11 in frame 04 then we have found this flag was
> set by someone before acquiring tsx group lock. That means during this
> acquiring lock in Thread3, Thread 11 trying to send data on transport. So
> this might be the case of deadlock, somehow timer thread 3 execute before
> thread-11.
>
> Here I enclose Thread-3 and Thread-11 gdb backtrace.
>
> Could anyone please help me on this?
>
>
> Thread-11:
> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/
> linux/x86_64/lowlevellock.S:135
> #1  0x00007f00f62bf494 in _L_lock_952 () from /lib/x86_64-linux-gnu/
> libpthread.so.0
> #2  0x00007f00f62bf2f6 in __GI___pthread_mutex_lock (mutex=0x7eff41197c58)
> at ../nptl/pthread_mutex_lock.c:114
> #3  0x00007f00b97e49a5 in pj_mutex_lock (mutex=0x7eff41197c58) at
> ../src/pj/os_core_unix.c:1265
> #4  0x00007f00b97eace1 in grp_lock_acquire (p=0x7eff41197b48) at
> ../src/pj/lock.c:290
> #5  0x00007f00b97ead22 in pj_grp_lock_acquire (grp_lock=<optimized out>)
> at ../src/pj/lock.c:478
> #6  0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx at entry=0x7eff8bdd4568,
> tdata=tdata at entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
> #7  0x00007f00ba6de952 in pjsip_dlg_send_response (dlg=<optimized out>,
> tsx=0x7eff8bdd4568, tdata=tdata at entry=0x7eff892ae668) at
> ../src/pjsip/sip_dialog.c:1531
> #8  0x00007f00bab1c09a in pjsip_inv_send_msg (inv=0x7eff88bdf9b8,
> tdata=tdata at entry=0x7eff892ae668) at ../src/pjsip-ua/sip_inv.c:3231
> #9  0x00007f0092874b4c in ast_sip_session_send_response
> (session=session at entry=0x7eff89367700, tdata=0x7eff892ae668) at
> res_pjsip_session.c:983
> #10 0x00007f002a431841 in answer (data=0x7eff89367700) at chan_pjsip.c:591
> #11 0x00007f0094b7a490 in sync_task (data=0x7efdbcf8b810) at
> res_pjsip.c:4268
> #12 0x00000000005d52ce in ast_taskprocessor_execute (tps=tps at entry=0x267f9b0)
> at taskprocessor.c:971
> #13 0x00000000005dcc80 in execute_tasks (data=0x267f9b0) at
> threadpool.c:1322
> #14 0x00000000005d52ce in ast_taskprocessor_execute (tps=0x23e0580) at
> taskprocessor.c:971
> #15 0x00000000005dd558 in threadpool_execute (pool=<optimized out>) at
> threadpool.c:351
> #16 worker_active (worker=<optimized out>) at threadpool.c:1105
> #17 worker_start (arg=0x7f0080001680) at threadpool.c:1024
> #18 0x00000000005e6dfa in dummy_start (data=<optimized out>) at
> utils.c:1238
> #19 0x00007f00f62bd064 in start_thread (arg=0x7eff67c9b700) at
> pthread_create.c:309
> #20 0x00007f00f55a562d in clone () at ../sysdeps/unix/sysv/linux/
> x86_64/clone.S:111
>
>
> Thread-3:
> #0  sanitize_tdata (tdata=<optimized out>) at res_pjsip/pjsip_message_
> filter.c:211
> #1  filter_on_tx_message (tdata=0x7eff892ae668) at res_pjsip/pjsip_message_
> filter.c:233
> #2  0x00007f00ba6c5b0e in endpt_on_tx_msg (endpt=<optimized out>,
> tdata=0x7eff892ae668) at ../src/pjsip/sip_endpoint.c:1088
> #3  0x00007f00ba6cc5c0 in pjsip_transport_send (tr=0x2677718,
> tdata=tdata at entry=0x7eff892ae668, addr=addr at entry=0x7eff8bdd4644,
> addr_len=16, token=token at entry=0x7eff8bdd4568,
>     cb=cb at entry=0x7f00ba6d91b0 <transport_callback>) at
> ../src/pjsip/sip_transport.c:833
> #4  0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
> tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
> #5  0x00007f00ba6da8da in tsx_on_state_proceeding_uas (tsx=0x7eff8bdd4568,
> event=<optimized out>) at ../src/pjsip/sip_transaction.c:2882
> #6  0x00007f00ba6d9b86 in tsx_timer_callback (theap=<optimized out>,
> entry=0x7eff8bdd46e8) at ../src/pjsip/sip_transaction.c:1189
> #7  0x00007f00b97f6757 in pj_timer_heap_poll (ht=0x23e53a0,
> next_delay=next_delay at entry=0x7f0090649e50) at ../src/pj/timer.c:644
> #8  0x00007f00ba6c65c4 in pjsip_endpt_handle_events2 (endpt=0x23e50b8,
> max_timeout=max_timeout at entry=0x7f0090649ea0, p_count=p_count at entry=0x0)
> at ../src/pjsip/sip_endpoint.c:715
> #9  0x00007f00ba6c66d7 in pjsip_endpt_handle_events (endpt=<optimized
> out>, max_timeout=max_timeout at entry=0x7f0090649ea0) at
> ../src/pjsip/sip_endpoint.c:776
> #10 0x00007f0094b7a118 in monitor_thread_exec (endpt=<optimized out>) at
> res_pjsip.c:4345
> #11 0x00007f00b97e4ca0 in thread_main (param=0x23e3c68) at
> ../src/pj/os_core_unix.c:541
> #12 0x00007f00f62bd064 in start_thread (arg=0x7f009064a700) at
> pthread_create.c:309
> #13 0x00007f00f55a562d in clone () at ../sysdeps/unix/sysv/linux/
> x86_64/clone.S:111
>
>
* The attached backtrace does not match the threads you are highlighting in
your email.  In fact
the attached backtrace was done with a non-bundled pjproject without any
symbols.

* Thread 3 above should have acquired the transaction group lock in
tsx_timer_callback() before
calling the state handler routine.  From there the pjproject line numbers
do not match the bundled
pjproject version for 13.20.0.  Also the thread is not blocked so I don't
see how it as participating
in a deadlock unless you think it is in an endless loop.

You *really* need to be using bundled pjproject [1] for the reasons listed
on that page.

Richard

[1] https://wiki.asterisk.org/wiki/display/AST/PJSIP-pjproject
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20180713/6a5f946c/attachment.html>


More information about the asterisk-dev mailing list