[asterisk-dev] [Code Review] [patch] Use of MASTER_CHANNEL causes a race condition ending in a deadlock among channels
Kirill Katsnelson
kkm at adaptiveai.com
Sat Mar 5 23:38:32 CST 2011
On 110303 1936, Russell Bryant wrote:
>
> I took a look at the backtrace on the issue. This is the exact same situation we were looking at.
> The pbx thread does not have the channel locked going into the waitforsilence application. It is locking it in ast_read(). It's stuck in res_timing_timerfd.
As a general architecture question, is the channel supposed to be locked
while in timerfd_timer_ack() by design, or is this rather a
manifestation of a problem? We may have 10+ channels simultaneously
calling WaitForSilence, with a 2.5 second limit. Would that mean we
are asking for long-lingering channel locks?
> If we had a test environment that reliably reproduced it, that would be a great start.
This was happening on a production server, and only under a high-ish
load condition. It might work for 2 weeks without freezing up, or it
might freeze twice a day. This is as reliable as it could get.
Now, this machine is for egress calls, placed by our AI guys. This means
that we generally can afford some (little) downtime; still, we are
losing the calls already in progress, so it should be used for testing
very sparingly.
Having said that, I do want to track down the problem. I am dying to
implement CEL, for one, and this is not in 1.6. Our call routing is
complex enough that CDR is messy.
> For you or anyone else that hits this same issue, the only workaround right now is to install DAHDI and use the res_timing_dahdi module instead of res_timing_timerfd.
An interesting observation is that 1.6.2, also configured with timerfd,
has never had a single freeze-up.
Do you think installing the latest kernel may affect the problem? Mine
is fairly old, and timerfd is a fairly recent development.
-kkm