[asterisk-bugs] [JIRA] (ASTERISK-27966) race condition in 183 re transmission can result in a deadlock

Asterisk Team (JIRA) noreply at issues.asterisk.org
Tue Jul 17 07:05:54 CDT 2018


    [ https://issues.asterisk.org/jira/browse/ASTERISK-27966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=244135#comment-244135 ] 

Asterisk Team commented on ASTERISK-27966:
------------------------------------------

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

> race condition in 183 re transmission can result in a deadlock
> --------------------------------------------------------------
>
>                 Key: ASTERISK-27966
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27966
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Third-Party/pjproject
>    Affects Versions: 13.20.0
>            Reporter: Torrey Searle
>              Labels: pjsip
>
> Deadlock between two threads.
> The first thread is stuck in an infinite loop on a header list size 1 containing just a contact header
> Thread 5 (Thread 0x7f3208352700 (LWP 29224)):
> #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x00007f354837c494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00007f354837c2f6 in _GI__pthread_mutex_lock (mutex=0x7f3219e74ef0) at ../nptl/pthread_mutex_lock.c:114
> #3 0x000000000045ae87 in __ao2_lock (user_data=0x7f3219e74f40, lock_how=(unknown: 128), lock_how at entry=AO2_LOCK_REQ_RDLOCK, file=0x7f3219e74ef0 "", file at entry=0x7f3219e74f40 "", func=0x0, func at entry=0x60af31 "astobj2_container.c", line=514, var=0x60af2c "self") at astobj2.c:164
> #4 0x000000000045dffd in internal_ao2_traverse (self=0x7f3219e74f40, flags=flags at entry=OBJ_SEARCH_KEY, cb_fn=0x7f35257f9dc0 <session_media_cmp>, arg=arg at entry=0x7f3484638360, tag=tag at entry=0x0, file=file at entry=0x0, line=0, func=0x0, type=AO2_CALLBACK_DEFAULT, data=0x0) at astobj2_container.c:336
> #5 0x000000000045e6f3 in __ao2_callback (arg=0x7f3484638360, cb_fn=<optimized out>, flags=OBJ_SEARCH_KEY, c=<optimized out>) at astobj2_container.c:455
> #6 __ao2_find (c=<optimized out>, arg=arg at entry=0x7f3484638360, flags=flags at entry=OBJ_SEARCH_KEY) at astobj2_container.c:496
> #7 0x00007f3484635aec in t38_framehook_read (session=0x7f3219791ec0, session=0x7f3219791ec0, f=0x0, chan=0x7f3218015810) at res_pjsip_t38.c:448
> #8 t38_framehook (chan=0x7f3218015810, f=0x0, event=<optimized out>, data=<optimized out>) at res_pjsip_t38.c:466
> #9 0x000000000051cc2b in framehook_list_push_event (framehooks=0x7f3218311b80, frame=0x0, event=AST_FRAMEHOOK_EVENT_READ) at framehook.c:118
> #10 0x00000000004bdd45 in __ast_read (chan=0x7f3218015810, dropaudio=0) at channel.c:3973
> #11 0x00000000004827c6 in bridge_handle_trip (bridge_channel=<optimized out>) at bridge_channel.c:2416
> #12 bridge_channel_wait (bridge_channel=<optimized out>) at bridge_channel.c:2586
> #13 bridge_channel_internal_join (bridge_channel=0x7f321a966260) at bridge_channel.c:2732
> #14 0x000000000046c600 in bridge_channel_ind_thread (data=data at entry=0x7f321a966260) at bridge.c:1782
> #15 0x00000000005e6dfa in dummy_start (data=<optimized out>) at utils.c:1238
> #16 0x00007f354837a064 in start_thread (arg=0x7f3208352700) at pthread_create.c:309
> #17 0x00007f354766262d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> The second thread is stuck trying to get the tsx group lock heald by the first thread
> Thread 5 (Thread 0x7f3208352700 (LWP 29224)):
> #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x00007f354837c494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00007f354837c2f6 in _GI__pthread_mutex_lock (mutex=0x7f3219e74ef0) at ../nptl/pthread_mutex_lock.c:114
> #3 0x000000000045ae87 in __ao2_lock (user_data=0x7f3219e74f40, lock_how=(unknown: 128), lock_how at entry=AO2_LOCK_REQ_RDLOCK, file=0x7f3219e74ef0 "", file at entry=0x7f3219e74f40 "", func=0x0, func at entry=0x60af31 "astobj2_container.c", line=514, var=0x60af2c "self") at astobj2.c:164
> #4 0x000000000045dffd in internal_ao2_traverse (self=0x7f3219e74f40, flags=flags at entry=OBJ_SEARCH_KEY, cb_fn=0x7f35257f9dc0 <session_media_cmp>, arg=arg at entry=0x7f3484638360, tag=tag at entry=0x0, file=file at entry=0x0, line=0, func=0x0, type=AO2_CALLBACK_DEFAULT, data=0x0) at astobj2_container.c:336
> #5 0x000000000045e6f3 in __ao2_callback (arg=0x7f3484638360, cb_fn=<optimized out>, flags=OBJ_SEARCH_KEY, c=<optimized out>) at astobj2_container.c:455
> #6 __ao2_find (c=<optimized out>, arg=arg at entry=0x7f3484638360, flags=flags at entry=OBJ_SEARCH_KEY) at astobj2_container.c:496
> #7 0x00007f3484635aec in t38_framehook_read (session=0x7f3219791ec0, session=0x7f3219791ec0, f=0x0, chan=0x7f3218015810) at res_pjsip_t38.c:448
> #8 t38_framehook (chan=0x7f3218015810, f=0x0, event=<optimized out>, data=<optimized out>) at res_pjsip_t38.c:466
> #9 0x000000000051cc2b in framehook_list_push_event (framehooks=0x7f3218311b80, frame=0x0, event=AST_FRAMEHOOK_EVENT_READ) at framehook.c:118
> #10 0x00000000004bdd45 in __ast_read (chan=0x7f3218015810, dropaudio=0) at channel.c:3973
> #11 0x00000000004827c6 in bridge_handle_trip (bridge_channel=<optimized out>) at bridge_channel.c:2416
> #12 bridge_channel_wait (bridge_channel=<optimized out>) at bridge_channel.c:2586
> #13 bridge_channel_internal_join (bridge_channel=0x7f321a966260) at bridge_channel.c:2732
> #14 0x000000000046c600 in bridge_channel_ind_thread (data=data at entry=0x7f321a966260) at bridge.c:1782
> #15 0x00000000005e6dfa in dummy_start (data=<optimized out>) at utils.c:1238
> #16 0x00007f354837a064 in start_thread (arg=0x7f3208352700) at pthread_create.c:309
> #17 0x00007f354766262d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> The first thread is an infinite loop because it's iterating a circular list of size one of type contact header
> Infinite loop in sanitize_tdata when iterating the headers. This is the root cause of the outage
> $85 = (pjsip_hdr *) 0x7f3205e5c218
> (gdb) print hdr->next
> $86 = (struct pjsip_hdr *) 0x7f3205e5c218
> From analysis, I've found that when asterisk is trying to forward a 183 it invokes indicate (as seen in the 2nd thread).  This thread has a reference to the tdata of the last 183 send (which is the same tdata as the internal heap timer for retransmission has).  Indicate will call the pjsip method pjsip_dlg_modify_response which intern removes the contact header from the tdata, causing a chance for the first thread to end up in the above infinite loop.  
> As since this tdata is being shared by both a transaction and a dialog, I feel the best way to avoid conflict is to have the  pjsip_dlg_modify_response to get the tsx group lock to avoid access by the by the heap timer while it's modifying a possibly shared object.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list