[asterisk-dev] Deadlock between transaction and transport

Salahuddin Ahmed txrubel at gmail.com
Thu Jul 12 17:01:29 CDT 2018


Hello,

Recently I have experienced a deadlock situation in Asterisk Version
13.20.0. According to our gdb analysis there have 3 suspicious thread(3, 11
and 44). Thread 44 push a task to taskpool, thread 11 execute that task and
the thread 3 is a timer event. In thread 11 during a response going to
transmit a group lock has been acquired. and this thread was locked on this
point. In Thread 3 while retransmission timer triggered same group lock
again acquired and try to retransmit response. In this situation asterisk
was blocked.

Let me explain according to GDB log,

Thread-03:
#4  0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122

Thread-11:
#6  0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx at entry=0x7eff8bdd4568,
tdata=tdata at entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
(gdb) p tsx->transport_flag
$2 = 1(TSX_HAS_PENDING_TRANSPORT)

Here Thread-03 is execute first and try to retransmit a message before
acquire a tsx group lock in Thread-11, because in thread 3
tsx->transport_flag |= TSX_HAS_PENDING_TRANSPORT has been set, Now if we
read this flag from thread 11 in frame 04 then we have found this flag was
set by someone before acquiring tsx group lock. That means during this
acquiring lock in Thread3, Thread 11 trying to send data on transport. So
this might be the case of deadlock, somehow timer thread 3 execute before
thread-11.

Here I enclose Thread-3 and Thread-11 gdb backtrace.

Could anyone please help me on this?


Thread-11:
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f00f62bf494 in _L_lock_952 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f00f62bf2f6 in __GI___pthread_mutex_lock (mutex=0x7eff41197c58)
at ../nptl/pthread_mutex_lock.c:114
#3  0x00007f00b97e49a5 in pj_mutex_lock (mutex=0x7eff41197c58) at
../src/pj/os_core_unix.c:1265
#4  0x00007f00b97eace1 in grp_lock_acquire (p=0x7eff41197b48) at
../src/pj/lock.c:290
#5  0x00007f00b97ead22 in pj_grp_lock_acquire (grp_lock=<optimized out>) at
../src/pj/lock.c:478
#6  0x00007f00ba6dc4ac in pjsip_tsx_send_msg (tsx=tsx at entry=0x7eff8bdd4568,
tdata=tdata at entry=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:1783
#7  0x00007f00ba6de952 in pjsip_dlg_send_response (dlg=<optimized out>,
tsx=0x7eff8bdd4568, tdata=tdata at entry=0x7eff892ae668) at
../src/pjsip/sip_dialog.c:1531
#8  0x00007f00bab1c09a in pjsip_inv_send_msg (inv=0x7eff88bdf9b8,
tdata=tdata at entry=0x7eff892ae668) at ../src/pjsip-ua/sip_inv.c:3231
#9  0x00007f0092874b4c in ast_sip_session_send_response
(session=session at entry=0x7eff89367700, tdata=0x7eff892ae668) at
res_pjsip_session.c:983
#10 0x00007f002a431841 in answer (data=0x7eff89367700) at chan_pjsip.c:591
#11 0x00007f0094b7a490 in sync_task (data=0x7efdbcf8b810) at
res_pjsip.c:4268
#12 0x00000000005d52ce in ast_taskprocessor_execute (tps=tps at entry=0x267f9b0)
at taskprocessor.c:971
#13 0x00000000005dcc80 in execute_tasks (data=0x267f9b0) at
threadpool.c:1322
#14 0x00000000005d52ce in ast_taskprocessor_execute (tps=0x23e0580) at
taskprocessor.c:971
#15 0x00000000005dd558 in threadpool_execute (pool=<optimized out>) at
threadpool.c:351
#16 worker_active (worker=<optimized out>) at threadpool.c:1105
#17 worker_start (arg=0x7f0080001680) at threadpool.c:1024
#18 0x00000000005e6dfa in dummy_start (data=<optimized out>) at utils.c:1238
#19 0x00007f00f62bd064 in start_thread (arg=0x7eff67c9b700) at
pthread_create.c:309
#20 0x00007f00f55a562d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111


Thread-3:
#0  sanitize_tdata (tdata=<optimized out>) at
res_pjsip/pjsip_message_filter.c:211
#1  filter_on_tx_message (tdata=0x7eff892ae668) at
res_pjsip/pjsip_message_filter.c:233
#2  0x00007f00ba6c5b0e in endpt_on_tx_msg (endpt=<optimized out>,
tdata=0x7eff892ae668) at ../src/pjsip/sip_endpoint.c:1088
#3  0x00007f00ba6cc5c0 in pjsip_transport_send (tr=0x2677718,
tdata=tdata at entry=0x7eff892ae668, addr=addr at entry=0x7eff8bdd4644,
addr_len=16, token=token at entry=0x7eff8bdd4568,
    cb=cb at entry=0x7f00ba6d91b0 <transport_callback>) at
../src/pjsip/sip_transport.c:833
#4  0x00007f00ba6d9db0 in tsx_send_msg (tsx=0x7eff8bdd4568,
tdata=0x7eff892ae668) at ../src/pjsip/sip_transaction.c:2122
#5  0x00007f00ba6da8da in tsx_on_state_proceeding_uas (tsx=0x7eff8bdd4568,
event=<optimized out>) at ../src/pjsip/sip_transaction.c:2882
#6  0x00007f00ba6d9b86 in tsx_timer_callback (theap=<optimized out>,
entry=0x7eff8bdd46e8) at ../src/pjsip/sip_transaction.c:1189
#7  0x00007f00b97f6757 in pj_timer_heap_poll (ht=0x23e53a0,
next_delay=next_delay at entry=0x7f0090649e50) at ../src/pj/timer.c:644
#8  0x00007f00ba6c65c4 in pjsip_endpt_handle_events2 (endpt=0x23e50b8,
max_timeout=max_timeout at entry=0x7f0090649ea0, p_count=p_count at entry=0x0) at
../src/pjsip/sip_endpoint.c:715
#9  0x00007f00ba6c66d7 in pjsip_endpt_handle_events (endpt=<optimized out>,
max_timeout=max_timeout at entry=0x7f0090649ea0) at
../src/pjsip/sip_endpoint.c:776
#10 0x00007f0094b7a118 in monitor_thread_exec (endpt=<optimized out>) at
res_pjsip.c:4345
#11 0x00007f00b97e4ca0 in thread_main (param=0x23e3c68) at
../src/pj/os_core_unix.c:541
#12 0x00007f00f62bd064 in start_thread (arg=0x7f009064a700) at
pthread_create.c:309
#13 0x00007f00f55a562d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111


Thanks,
Salah Ahmed
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20180712/478edc91/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: backtrace.tar.gz
Type: application/gzip
Size: 20888 bytes
Desc: not available
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20180712/478edc91/attachment-0001.gz>


More information about the asterisk-dev mailing list