[Asterisk-Dev] chan_zap.c:4409 my_zt_write Major Probs

Kris Boutilier Kris.Boutilier at scrd.bc.ca
Thu May 12 11:44:37 MST 2005


> -----Original Message-----
> From: Andrew Kohlsmith [mailto:akohlsmith-asterisk at benshaw.com]
> Sent: Thursday, May 12, 2005 6:31 AM
> To: asterisk-dev at lists.digium.com
> Subject: Re: [Asterisk-Dev] chan_zap.c:4409 my_zt_write Major Probs
> 
> 
> On May 11, 2005 11:13 pm, Russell Bryant wrote:
> > Yes, exactly.  It IS a debug level message.  It was changed 
> > to an error by mistake.
> 
{clip}

Alright, I'll 'fess up here - I am the origin of that patch and, apparently, the source of much phantom pain for many people. Had there been even a one-line comment in the code explaining why it's appropriate to throw away the data, neither I nor the bug marshal who handled this patch would have made this mistaken modification. However, that is another topic...

> > The conditions that make this message come up are not a sign of problems
> > with your Zap stuff.  It can happen when there is jitter on your
> > network, and given that you were using a WiFi phone, I'm sure that is
> > the case.  Also, note that the "audio may have been lost" part was a
> > part of the change to be a warning, not the original debug message.
> 
> my_zt_write() attempts to send audio data to the zaptel  hardware to get it out 
> to the PSTN.  Now there are (by default) 4 buffers to which  pending data can 
> be written.  Increasing the number of buffers by adding 
> "jitterbuffers=8" to zaptel.conf seems to help (by allowing the driver to queue up 
> more audio,  which can cause increased latency), but the failure is still curious.
> 
> If there is network jitter or delays in getting data from the 
> network in a timely fashion, then that means that zaptel already played 
> blank audio out to the PSTN, and the data that came in late will have been 
> discarded.  I don't believe that "audio may have been lost" is incorrect at all.
> 

What seems to be implied here is that the core of * runs at full tilt, so a block of audio can arrive off the network hot on the heels of the previous block, and the core will attempt to convey it to the zaptel layer immediately, possibly while zaptel is still 'spooling out' the audio from the previous block. The zaptel jitterbuffers in turn provide a facility for _some_ pending writes to be buffered, essentially a queuing strategy, but should a large number of chunks arrive at nearly the same time the zaptel jitterbuffer may overflow and _data_will_be_discarded_, leading shortly thereafter to a buffer _underrun_ and no audio whatsoever being written out the zaptel interface. Hence, in my case, raising the number of zaptel jitterbuffers mitigated my symptoms. I suspect that if I were currently running the IAX jitterbuffer (which meters incoming IAX packets in a predictable fashion) the zaptel jitterbuffer starvation would not have occurred, as chan_zap would have been getting a nicely metered stream of audio to pass out its interfaces.
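
A toy model makes the overflow-then-underrun sequence concrete. This is only a sketch, not zaptel code: the names simulate_writes and sim_result are made up for illustration, and it assumes the hardware drains exactly one buffer per tick and that a chunk arriving at a full queue is simply refused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, NOT zaptel code: a queue of `nbufs` write buffers
 * that the hardware drains at one buffer per tick. arrivals[t] is the
 * number of audio chunks the core hands over during tick t. */
struct sim_result {
    int dropped;    /* chunks refused because every buffer was full */
    int underruns;  /* ticks in which the hardware had nothing to play */
};

struct sim_result simulate_writes(const int *arrivals, size_t nticks, int nbufs)
{
    struct sim_result r = {0, 0};
    int queued = 0;

    assert(arrivals != NULL && nbufs > 0);

    for (size_t t = 0; t < nticks; t++) {
        /* The core pushes every chunk immediately; a full queue refuses it. */
        for (int i = 0; i < arrivals[t]; i++) {
            if (queued < nbufs)
                queued++;
            else
                r.dropped++;
        }
        /* The hardware plays out one buffer per tick, if one is queued. */
        if (queued > 0)
            queued--;
        else
            r.underruns++;
    }
    return r;
}
```

In this model, with four buffers, a burst of eight chunks in a single tick followed by seven idle ticks gives 4 dropped chunks and 4 underrun ticks. With eight buffers, or with the same eight chunks arriving one per tick, both counts are zero, which matches the observation that more buffers or a metered stream hides the symptom.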

The deeper question is _why_ the blocks of audio are becoming all bunched up at the zaptel layer, and there are a few possibilities here:

1) perhaps the zaptel driver is having problems writing data (timing slips, interrupt misses, PCI bus congestion (e.g. non-DMA hard drive access)) which is causing a backlog even though everything is arriving evenly metered - possible symptom would be consistent write() failures on all channels on the affected interface.

2) perhaps the zaptel driver really does mean to throw away the data (as happens during DTMF channel muting) - possible symptom would be a stream of write() failures during DTMF conveyance and not at other times.

or, 3) perhaps the data arriving off the network is correctly arriving at the IAX code in 'squirts' - as might happen if there were an intervening switch that was using store-and-forward switching rather than cut-through switching, or the local or remote network card drivers were doing write coalescing, mmapped buffer transfers or some other exotic (and appropriate) optimization. Hell, this could even be a correct behaviour for some QoS strategies on absurdly fast interfaces (i.e. trunked IAX running over a Gigabit NIC) - possible symptom would be one channel within a particular zaptel interface throwing write() errors while all the others are behaving quite happily (case 2 excepted).

However, all this remains supposition unless someone intimately familiar with zaptel can throw out some pointers...
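
If per-channel counts of refused writes were available, the three symptom patterns listed above could be told apart mechanically. The sketch below is purely hypothetical: no such counters exist in zaptel or chan_zap as far as this thread establishes, and chan_stats, suspect and classify_span are invented names. It only encodes the symptoms as stated, so its verdict is a hint rather than a diagnosis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-channel counters; nothing like this is known to
 * exist in zaptel. */
struct chan_stats {
    unsigned refused;       /* write() refusals seen outside DTMF */
    unsigned refused_dtmf;  /* write() refusals seen during DTMF conveyance */
};

enum suspect {
    SUSPECT_NONE,        /* no refusals at all */
    SUSPECT_DRIVER,      /* case 1: every channel on the interface is failing */
    SUSPECT_DTMF_MUTE,   /* case 2: refusals only during DTMF */
    SUSPECT_BURSTY_NET   /* case 3: some channels failing, others fine */
};

enum suspect classify_span(const struct chan_stats *ch, size_t nchans)
{
    size_t failing = 0, dtmf_only = 0;

    assert(ch != NULL && nchans > 0);

    for (size_t i = 0; i < nchans; i++) {
        if (ch[i].refused > 0)
            failing++;          /* refusals outside DTMF */
        else if (ch[i].refused_dtmf > 0)
            dtmf_only++;        /* refusals during DTMF and at no other time */
    }

    if (failing == nchans)
        return SUSPECT_DRIVER;
    if (failing > 0)
        return SUSPECT_BURSTY_NET;
    if (dtmf_only > 0)
        return SUSPECT_DTMF_MUTE;
    return SUSPECT_NONE;
}
```

Note that with a single channel on the interface cases 1 and 3 cannot be distinguished, and the sketch reports case 1.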

> While I don't think that this message belongs in DEBUG land, 
> perhaps it would be a better compromise to have a proper reporting layer which 
> can help people see where problems may be.  IAX2 already has the new jitter 
> buffer and iax2 network statistics reporting, perhaps something similar for zaptel 
> (statistics) would be a good thing.
> 

Some mechanism, hell, any communications mechanism for the zaptel layer to explain what's going on would be a start. The tweaked warning message specifically read 'may have been lost' - however I obviously didn't emphasise the word 'may' strongly enough. This really was just intended as a heads-up to other neophytes like me, a positive indicator that something odd _might_ be going on and needs to be investigated further. However, it was based on the false assumption that the zaptel layer would be silently absorbing and discarding any intentionally unwanted data internally (i.e. during a DTMF mute) rather than just flatly refusing it.

So, enough ranting. The EAGAIN/'resource temporarily unavailable' issue has been firmly put in its place - it is a feature, not a bug. If you're having pops and clicks on a predictable incoming reference signal, such as a remote milliwatt source, give adjusting zaptel jitterbuffers a go. It might be a cure for your symptoms, it might not.
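
For anyone who, like me, did not recognise the pattern: treating EAGAIN as "no room right now, drop the chunk" rather than as a failure might look roughly like the following. This is a generic POSIX sketch and not the body of my_zt_write(); write_or_drop and set_nonblocking are illustrative names, and it assumes the descriptor has been put into non-blocking mode.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Illustrative only; NOT the code in chan_zap.c.
 * Returns the number of bytes written, 0 if the descriptor refused the
 * data for now (EAGAIN), or -1 on any other error. */
ssize_t write_or_drop(int fd, const void *buf, size_t len)
{
    ssize_t res;

    assert(buf != NULL);

    do {
        res = write(fd, buf, len);
    } while (res < 0 && errno == EINTR);  /* retry if a signal interrupted us */

    if (res < 0 && errno == EAGAIN) {
        /* Every buffer is full. Late audio is of no use to the listener,
         * so the caller discards this chunk instead of reporting an error. */
        return 0;
    }
    return res;
}
```

A caller would log the zero return at debug level at most, and reserve warnings for the -1 case.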

:-)

Kris Boutilier
Information Services Coordinator
Sunshine Coast Regional District


