[asterisk-dev] [Code Review]: Read on idle timerfd will result in deadlocks

David Vossel reviewboard at asterisk.org
Thu Jul 28 09:56:21 CDT 2011



> On July 6, 2011, 10:20 a.m., David Vossel wrote:
> > branches/1.8/res/res_timing_timerfd.c, lines 174-178
> > <https://reviewboard.asterisk.org/r/1255/diff/2/?file=16826#file16826line174>
> >
> >     I had to revert the patch from 1.8 and trunk.  This statement right hear is causing some serious performance problems.  See jrose's comment.
> >     
> >     If we comment out this line, everything works fine for him.
> 
> David Vossel wrote:
>     This same issue was reported here. https://issues.asterisk.org/jira/browse/ASTERISK-17867
> 
> kobaz wrote:
>     So, the whole patch was reverted.  Is your vote to have the same code fix but minus the debugging?  I've also heard from several people that the fix works with the debugging removed.
> 
> David Vossel wrote:
>     We tried taking out the debug statement in this if block, but it still caused performance issues.  Things did not work well until we completely took out that if statement.
> 
> jrose wrote:
>     Actually, the if statement was fine as long as you didn't include the return.  It's running through the whole loop until the call is answered that causes the problem.
> 
> jrose wrote:
>     Scratch that, I want to investigate this a little more.  It seems like it's actually not getting into the while loop that causes the problem.  Should have looked over it more carefully.
> 
> kobaz wrote:
>     It seems to me that the higher level functions that use this need to be investigated.  The upper level calls to ack the timer should themselves be making sure they won't read an idle timer.  The upper level stuff should be aware of timers it needs to ack and only access ones that have events waiting.
> 
> jrose wrote:
>     This is actually more elaborate of a problem than I realized.  It seems that individual applications are relying on the timer_ack function in different ways to make connections.  The breakage from before was specifically from using an analog channel to call into a queue.  That would cause it to hit this function over and over again because it is looking for something to be set in the following do while loop.
>     
>     I've noticed that using other applications causes other weird behaviors.  MusicOnHold for instance will play the music on hold normally, but at that point performing a hanging up the analog phone will not cause the channel to start closing and the analog phone won't be able to start a new call.
>     
>     I think the various SAY and playback applications are fine, even if you don't use them in a mode that performs an answer.
>     
>     Here are a couple example backtraces where I forced a crash at the condition causing all these problems:
>     
>     queue:
>     
>     Program received signal SIGSEGV, Segmentation fault.
>     [Switching to Thread 0xb6954b70 (LWP 17873)]
>     0x02f12ec8 in timerfd_timer_ack (handle=46, quantity=1) at res_timing_timerfd.c:178
>     178			pointer2[1] = 'a';
>     (gdb) bt
>     #0  0x02f12ec8 in timerfd_timer_ack (handle=46, quantity=1) at res_timing_timerfd.c:178
>     #1  0x081a2699 in ast_timer_ack (handle=0x86a07a8, quantity=1) at timing.c:169
>     #2  0x080b64bd in __ast_read (chan=0x867aad0, dropaudio=0) at channel.c:3788
>     #3  0x080b8661 in ast_read (chan=0x867aad0) at channel.c:4233
>     #4  0x023fcfa9 in wait_for_answer (qe=0xb694e29c, outgoing=0x85ebd18, to=0xb694c694, digit=0xb694c6a3 "", prebusies=0, caller_disconnect=0, forwardsallowed=1, 
>         update_connectedline=1) at app_queue.c:3803
>     #5  0x023ffba8 in try_calling (qe=0xb694e29c, options=0x0, announceoverride=0x0, url=0x0, tries=0xb694e270, noption=0xb694e26c, agi=0x0, macro=0x0, gosub=0x0, 
>         ringing=0) at app_queue.c:4583
>     #6  0x02407c17 in queue_exec (chan=0x867aad0, data=0xb695175c "markq") at app_queue.c:5992
>     #7  0x08146838 in pbx_exec (c=0x867aad0, app=0x85e9850, data=0xb695175c "markq") at pbx.c:1406
>     #8  0x08150408 in pbx_extension_helper (c=0x867aad0, con=0x0, context=0x867ae3c "analog", exten=0x867ae8c "3700", priority=4, label=0x0, callerid=0x0, 
>         action=E_SPAWN, found=0xb6953a68, combined_find_spawn=1) at pbx.c:4102
>     #9  0x08151eb3 in ast_spawn_extension (c=0x867aad0, context=0x867ae3c "analog", exten=0x867ae8c "3700", priority=4, callerid=0x0, found=0xb6953a68, 
>         combined_find_spawn=1) at pbx.c:4625
>     #10 0x08152b34 in __ast_pbx_run (c=0x867aad0, args=0x0) at pbx.c:4723
>     #11 0x08155013 in ast_pbx_run_args (c=0x867aad0, args=0x0) at pbx.c:5096
>     #12 0x08155039 in ast_pbx_run (c=0x867aad0) at pbx.c:5105
>     #13 0x02be7e03 in __analog_ss_thread (data=0xb779a3f0) at sig_analog.c:2119
>     #14 0x081acaca in dummy_start (data=0x867b1b8) at utils.c:1004
>     #15 0x003fc96e in start_thread (arg=0xb6954b70) at pthread_create.c:300
>     #16 0x011bfa4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
>     
>     musiconhold:
>     
>     Program received signal SIGSEGV, Segmentation fault.
>     [Switching to Thread 0xb7355b70 (LWP 17945)]
>     0x02f12ec8 in timerfd_timer_ack (handle=46, quantity=1) at res_timing_timerfd.c:178
>     178			pointer2[1] = 'a';
>     (gdb) bt
>     #0  0x02f12ec8 in timerfd_timer_ack (handle=46, quantity=1) at res_timing_timerfd.c:178
>     #1  0x081a2699 in ast_timer_ack (handle=0x88f94b8, quantity=1) at timing.c:169
>     #2  0x080b64bd in __ast_read (chan=0x8910bc0, dropaudio=0) at channel.c:3788
>     #3  0x080b8661 in ast_read (chan=0x8910bc0) at channel.c:4233
>     #4  0x080b12b7 in ast_safe_sleep_conditional (chan=0x8910bc0, ms=9981, cond=0, data=0x0) at channel.c:1830
>     #5  0x080b1447 in ast_safe_sleep (chan=0x8910bc0, ms=10000) at channel.c:1874
>     #6  0x01beb71a in play_moh_exec (chan=0x8910bc0, data=0xb735275c "madplay") at res_musiconhold.c:751
>     #7  0x08146838 in pbx_exec (c=0x8910bc0, app=0x8573960, data=0xb735275c "madplay") at pbx.c:1406
>     #8  0x08150408 in pbx_extension_helper (c=0x8910bc0, con=0x0, context=0x8910f2c "analog", exten=0x8910f7c "3702", priority=3, label=0x0, callerid=0x0, 
>         action=E_SPAWN, found=0xb7354a68, combined_find_spawn=1) at pbx.c:4102
>     #9  0x08151eb3 in ast_spawn_extension (c=0x8910bc0, context=0x8910f2c "analog", exten=0x8910f7c "3702", priority=3, callerid=0x0, found=0xb7354a68, 
>         combined_find_spawn=1) at pbx.c:4625
>     #10 0x08152b34 in __ast_pbx_run (c=0x8910bc0, args=0x0) at pbx.c:4723
>     #11 0x08155013 in ast_pbx_run_args (c=0x8910bc0, args=0x0) at pbx.c:5096
>     #12 0x08155039 in ast_pbx_run (c=0x8910bc0) at pbx.c:5105
>     #13 0x02be7e03 in __analog_ss_thread (data=0xb7930b08) at sig_analog.c:2119
>     #14 0x081acaca in dummy_start (data=0x88f7c80) at utils.c:1004
>     #15 0x003fc96e in start_thread (arg=0xb7355b70) at pthread_create.c:300
>     #16 0x011bfa4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
>     
>     sayalpha (this one works just fine):
>     
>     #0  0x02f12ec8 in timerfd_timer_ack (handle=46, quantity=1) at res_timing_timerfd.c:178
>     #1  0x081a2699 in ast_timer_ack (handle=0x88e7db8, quantity=1) at timing.c:169
>     #2  0x080b64bd in __ast_read (chan=0x88f5a30, dropaudio=0) at channel.c:3788
>     #3  0x080b8661 in ast_read (chan=0x88f5a30) at channel.c:4233
>     #4  0x08110f45 in waitstream_core (c=0x88f5a30, breakon=0x81ff77b "", forward=0x81f47f6 "", reverse=0x81f47f6 "", skip_ms=0, audiofd=-1, cmdfd=-1, context=0x0)
>         at file.c:1240
>     #5  0x08111459 in ast_waitstream (c=0x88f5a30, breakon=0x81ff77b "") at file.c:1330
>     #6  0x0816f6f3 in say_character_str_full (chan=0x88f5a30, str=0xb6b4b75c "\"ABCDEFGHIJKLMNOPQRSTUVWXYZ\"", ints=0x81ff77b "", lang=0x88e17ec "en", audiofd=-1, 
>         ctrlfd=-1) at say.c:131
>     #7  0x080c678c in ast_say_character_str (chan=0x88f5a30, str=0xb6b4b75c "\"ABCDEFGHIJKLMNOPQRSTUVWXYZ\"", ints=0x81ff77b "", lang=0x88e17ec "en") at channel.c:7983
>     #8  0x081669f4 in pbx_builtin_saycharacters (chan=0x88f5a30, data=0xb6b4b75c "\"ABCDEFGHIJKLMNOPQRSTUVWXYZ\"") at pbx.c:9856
>     #9  0x08146838 in pbx_exec (c=0x88f5a30, app=0xb7d14d00, data=0xb6b4b75c "\"ABCDEFGHIJKLMNOPQRSTUVWXYZ\"") at pbx.c:1406
>     #10 0x08150408 in pbx_extension_helper (c=0x88f5a30, con=0x0, context=0x88f5d9c "analog", exten=0x88f5dec "3703", priority=2, label=0x0, callerid=0x0, 
>         action=E_SPAWN, found=0xb6b4da68, combined_find_spawn=1) at pbx.c:4102
>     #11 0x08151eb3 in ast_spawn_extension (c=0x88f5a30, context=0x88f5d9c "analog", exten=0x88f5dec "3703", priority=2, callerid=0x0, found=0xb6b4da68, 
>         combined_find_spawn=1) at pbx.c:4625
>     #12 0x08152b34 in __ast_pbx_run (c=0x88f5a30, args=0x0) at pbx.c:4723
>     #13 0x08155013 in ast_pbx_run_args (c=0x88f5a30, args=0x0) at pbx.c:5096
>     #14 0x08155039 in ast_pbx_run (c=0x88f5a30) at pbx.c:5105
>     #15 0x02be7e03 in __analog_ss_thread (data=0x8812550) at sig_analog.c:2119
>     #16 0x081acaca in dummy_start (data=0x889e3e8) at utils.c:1004
>     #17 0x003fc96e in start_thread (arg=0xb6b4eb70) at pthread_create.c:300
>     #18 0x011bfa4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
>     
>     
>     Still not sure if that will help at all.
> 
> n8ideas wrote:
>     Any progress made on this?  This is a rather major issue for many people that do not run DAHDI.

We haven't forgotten this issue.  We realize a large number of people are experiencing this problem and resources have been allocated to come up with a solution.  No progress has been made since this patch appeared to have problems that I am aware of, but progress will resume shortly.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1255/#review3813
-----------------------------------------------------------


On June 3, 2011, 4:11 p.m., kobaz wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1255/
> -----------------------------------------------------------
> 
> (Updated June 3, 2011, 4:11 p.m.)
> 
> 
> Review request for Asterisk Developers and russelb.
> 
> 
> Summary
> -------
> 
> Fix for issue https://issues.asterisk.org/view.php?id=18028
> And several others.
> 
> 
> Diffs
> -----
> 
>   branches/1.8/res/res_timing_timerfd.c 320171 
> 
> Diff: https://reviewboard.asterisk.org/r/1255/diff
> 
> 
> Testing
> -------
> 
> Make enough calls to trigger a read on an idle timer
> 
> Wait for message:
> Reading attempt on idle timerfd. This would have caused a deadlock.
> 
> Deadlock avoided.
> 
> 
> Thanks,
> 
> kobaz
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20110728/65da0e15/attachment-0001.htm>


More information about the asterisk-dev mailing list