[asterisk-dev] [Code Review] 2441: res_timing_pthread: Reduce probability of deadlocking on timer.
Matt Jordan
reviewboard at asterisk.org
Mon Apr 15 12:54:15 CDT 2013
> On April 15, 2013, 5:37 p.m., Matt Jordan wrote:
> > I'm not sure I can fully articulate what is happening here yet, so bear with me.
> >
> > Assume we are using res_timing_pthread and this patch. When a silence generator is on a channel and the channel timer is put into continuous mode (which occurs during play_and_record), there appears to be a race condition between the timer being serviced - where write_byte is called and pending_ticks is incremented - and the thread servicing the timer acknowledging the timer properly. At some point, timer->pending_ticks is incremented past 1, at which point all hell breaks loose.
> >
> > By "all hell breaking loose", the channel in question which has the audio streamed on it is no longer serviced; the dreaded "exceptionally long queue" message is displayed; the channel cannot be hung up; and the CPU pegs out. This so far can only be reproduced if the silence generator on record option is enabled in asterisk.conf and a Local channel is put into VoiceMail.
> >
> > A simple fix thus far has been to put the following at the beginning of write_byte:
> >
> > if (timer->pending_ticks) {
> > return;
> > }
> > timer->pending_ticks++;
> >
> > This means we can only be in overflow (which would be a very strange situation to get into, and would only happen if someone in a future patch incremented pending_ticks elsewhere) or enabled.
> >
> > Although this patch did prevent the situation I was seeing, I did notice that the silence generator will still sometimes prevent Asterisk from detecting silence on a channel and killing the recording. That's potentially a separate bug (and does not happen every time).
As a follow up: I was running an old version of the patch, and the most recent version of the patch on the issue doesn't exhibit any of the problems I was seeing.
- Matt
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2441/#review8254
-----------------------------------------------------------
On April 11, 2013, 10:16 p.m., Shaun Ruffell wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/2441/
> -----------------------------------------------------------
>
> (Updated April 11, 2013, 10:16 p.m.)
>
>
> Review request for Asterisk Developers and Matt Jordan.
>
>
> Bugs: ASTERISK-21389
> https://issues.asterisk.org/jira/browse/ASTERISK-21389
>
>
> Repository: Asterisk
>
>
> Description
> -------
>
> From the patch attached on ASTERISK-21389:
>
> res_timing_pthread: Reduce probability of deadlocking on timer.
>
> There were several reports of deadlock when using
> res_timing_pthread. Backtraces indicated that one thread was blocked
> waiting for the write to the pipe to complete and this thread held
> the container lock for the timers. Therefore any thread that wanted
> to create a new timer or read an existing timer would block waiting
> for either the timer lock or the container lock and deadlock ensued.
>
> This patch changes the way the pipe is used to eliminate this source
> of deadlocks:
>
> 1) The pipe is placed in non-blocking mode so that it would never
> block even if the following changes someone fail...
>
> 2) Instead of writing bytes into the pipe for each "tick" that's
> fired the pipe now has two states--signaled and unsignaled. If
> signaled, the pipe is hot and any pollers of the read side
> filedescriptor will be woken up. If unsigned the pipe is idle. This
> eliminates even the chance of filling up the pipe and reduces the
> potential overhead of calling unnecessary writes.
>
> 3) Since we're tracking the signaled / unsignaled state, we can
> eliminate the exta poll system call for every firing because we know
> that there is data to be read.
>
>
> Diffs
> -----
>
> /trunk/res/res_timing_pthread.c 385382
>
> Diff: https://reviewboard.asterisk.org/r/2441/diff/
>
>
> Testing
> -------
>
> ran timing test on the console and also put several channels in a confbridge.
>
>
> Thanks,
>
> Shaun Ruffell
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20130415/cb7efcc6/attachment-0001.htm>
More information about the asterisk-dev
mailing list