[asterisk-dev] RTP streams suddenly stop

Mon Mar 29 12:01:34 CDT 2010

Shaun, many thanks for your reply - much appreciated!

In article <4BB0D58E.1010304 at digium.com>,
Shaun Ruffell <sruffell at digium.com> wrote:
> On 03/29/2010 11:03 AM, Tony Mountifield wrote:
> > In article <hm3o5a$mno$1 at softins.clara.co.uk>,
> > Tony Mountifield <tony at softins.clara.co.uk> wrote:
> >>
> >> My main question at the moment is "what mechanism could stall all RTP streams?",
> >> and a clue is the almost-exact five minutes for which it happens.
> > 
> > Well, FINALLY, I have managed to discover what is causing this problem,
> > although I haven't yet fixed it.
> > 
> > The problem is in ztdummy. It is the one from zaptel-1.2.27, and has been
> > compiled with USE_RTC. The OS is CentOS 4 with kernel 2.6.9-78.0.22.ELsmp.
> > The system has a pair of quad-core E5420 Xeons.
> > 
> > By monitoring /proc/interrupts every second, I have discovered that when
> > the problem occurs, the RTC interrupts stop counting up. After exactly
> > five minutes, they start up again. Obviously the lack of timing from
> > ztdummy is causing Meetme and file streaming to stall.
> > 
> > So, does anyone have any ideas why the RTC interrupt might stall for exactly
> > five minutes? I have only ever seen it on this one system. Nothing is logged
> > in any of the system logs at the time it occurs.
> > 
> > I would quite like to try the HPET mode of ztdummy, but it looks like this
> > requires a much newer kernel, 2.6.22, which is way newer than even CentOS 5.
> > Is there any other way to do this?
> 
> I noticed this as well when looking at issue 13930 [1]. Perhaps instead of
> using the HPET timer, you could backport the changes in dahdi_dummy that
> make the default kernel timer usable (assuming you can't update)?

I've had a look at the diff, and the changes look very straightforward,
so I will indeed backport them.

Have there been no reports of adverse effects from calling zt_receive() and
zt_transmit() in batches of four every 4ms, instead of evenly at 1ms intervals?
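To make the batching question concrete, here is a hypothetical sketch of the catch-up idea: a timer that fires roughly every 4ms (HZ=250) works out how many whole 1ms ticks have elapsed since the last one it processed, and runs that many back to back. The names on_timer() and process_tick() are made up for illustration; process_tick() stands in for the zt_receive()/zt_transmit() pair. This illustrates the technique only and is not the actual dahdi_dummy patch. Times are in integer microseconds to avoid rounding drift.

```python
def on_timer(now_us, state, process_tick, tick_us=1000):
    """Hypothetical timer callback: run process_tick() once for every whole
    tick that has elapsed since the last tick processed, and return the count.

    state["last_us"] holds the time of the last processed tick; advancing it
    by whole ticks (rather than setting it to now_us) carries any fractional
    remainder into the next firing, so no time is lost.
    """
    owed = (now_us - state["last_us"]) // tick_us
    for _ in range(owed):
        process_tick()  # stands in for zt_receive() followed by zt_transmit()
    state["last_us"] += owed * tick_us
    return owed


if __name__ == "__main__":
    state = {"last_us": 0}
    # A timer at HZ=250 fires about every 4000 us, with some jitter.
    for fired in (4000, 8300, 11900, 16000):
        print(fired, on_timer(fired, state, lambda: None))
```

With firings at 4.0, 8.3, 11.9 and 16.0ms it runs 4, 4, 3 and 5 ticks, 16 in total, so the average rate stays at one tick per millisecond even though the individual batches vary. The open question above is whether anything downstream minds receiving its ticks in bursts like this.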

> My best guess as to why it might stop, though, is that ntp or something is
> running on that system and it gets out of sync; then, 5 minutes later, when
> the kernel tries to sync the time back, it gets kicked off again. Don't
> quote me on that... just a guess.

No, it won't be to do with NTP; it's more likely hardware. It's the actual RTC
interrupt from the MC146818-lookalike in the chipset that is stopping (or else
being masked out by something else).
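For anyone wanting to catch the same symptom, the once-a-second check of /proc/interrupts described earlier can be sketched in Python 3 as follows. This is a minimal sketch, not the script used here. It assumes that the RTC row's last field is the label "rtc" (as on 2.6.9-era kernels; newer kernels may show "rtc0") and that every numeric column on that row is a per-CPU count. The function names rtc_count() and watch() are hypothetical.

```python
import sys
import time


def rtc_count(text, label="rtc"):
    """Sum the per-CPU counts on the /proc/interrupts line whose last field
    is `label`; return None if there is no such line.

    Assumed line shape: "  8:   1234      0   IO-APIC-edge  rtc"
    """
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU0/CPU1 header and anything that is not an IRQ row.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        if fields[-1] != label:
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # reached the controller type, e.g. "IO-APIC-edge"
            total += int(field)
        return total
    return None


def watch(path="/proc/interrupts", interval=1.0):
    """Poll every `interval` seconds and report when the RTC count stops
    increasing and when it starts again, with the length of the gap."""
    with open(path) as f:
        prev = rtc_count(f.read())
    if prev is None:
        sys.exit("no rtc line found in " + path)
    stalled_since = None
    while True:
        time.sleep(interval)
        with open(path) as f:
            now = rtc_count(f.read())
        if now is None:
            continue  # line vanished (driver unloaded?); keep polling
        if now == prev and stalled_since is None:
            stalled_since = time.time()
            print(time.ctime(), "RTC interrupts stalled at", now, flush=True)
        elif now != prev and stalled_since is not None:
            gap = time.time() - stalled_since
            print(time.ctime(), "RTC interrupts resumed after %.0f s" % gap,
                  flush=True)
            stalled_since = None
        prev = now


if __name__ == "__main__":
    watch()
```

The reported gap is measured from the first poll that saw no change, so it can be short by up to one polling interval; that is still precise enough to spot a stall of almost exactly five minutes.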

> [1] https://issues.asterisk.org/view.php?id=13930#99444

This is very useful indeed - thanks again!

Tony

> -- 
> Shaun Ruffell
> Digium, Inc. | Linux Kernel Developer
> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
> Check us out at: www.digium.com & www.asterisk.org
-- 
Tony Mountifield
Work: tony at softins.co.uk - http://www.softins.co.uk
Play: tony at mountifield.org - http://tony.mountifield.org