[asterisk-dev] Zaptel Interrupt Issues

Tue Oct 2 01:30:50 CDT 2007

I have a theory about zaptel interrupt issues.

I've done some testing on a wide variety of systems. I've used Sangoma,
Digium, and Rhino T1/E1 cards. I've developed some data integrity tools.

What I have found is that systems under high load drop data bits, and on all
of these systems the drops go unnoticed. Unless you are using PRI, these
errors are undetectable. On systems with old-style DMA, I was able to reduce
the data errors to near zero by changing ZT_CHUNKSIZE to 16, 32, or 64. At
128, I can detect the delay, and at 512, a commercial EC is required on a
zap-zap bridged call. With ZT_CHUNKSIZE at 64 and above, I needed a load
average of 500 or so to cause an error, and that usually broke the system
eventually anyhow.

My zaptel patch multiplies all of the timer constants in zaptel.h by
(8/ZT_CHUNKSIZE). I think that scaling belongs in there anyhow. The modules
that require ZT_CHUNKSIZE == 8 will tell you at compile time.
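
A rough sketch of the scaling, for illustration only. It assumes the timer
constants count per-chunk ticks tuned for the stock chunk size of 8, which is
what the 8/ZT_CHUNKSIZE factor implies. The names ZT_STOCK_CHUNKSIZE,
zt_scale_ticks, and zt_scale_ticks_naive are hypothetical and are not taken
from zaptel.h. One thing worth watching if the factor is written with integer
arithmetic: 8/ZT_CHUNKSIZE truncates to 0 for any chunk size above 8, so the
sketch multiplies first and divides last.

```c
/* Assumption: the stock chunk size is 8 samples (1 ms at 8 kHz). */
#define ZT_STOCK_CHUNKSIZE 8   /* hypothetical name, for illustration */

/*
 * Hypothetical helper: rescale a timer constant that was tuned as a count
 * of 8-sample chunks so that it covers roughly the same wall-clock time at
 * another chunk size. The multiplication is done before the division so
 * that integer truncation only affects the final result.
 */
static int zt_scale_ticks(int ticks_at_8, int chunksize)
{
    return (ticks_at_8 * ZT_STOCK_CHUNKSIZE) / chunksize;
}

/*
 * The literal parenthesized form, shown only for comparison. In integer
 * arithmetic (8 / chunksize) is 0 whenever chunksize is greater than 8.
 */
static int zt_scale_ticks_naive(int ticks_at_8, int chunksize)
{
    return ticks_at_8 * (ZT_STOCK_CHUNKSIZE / chunksize);
}
```

With these definitions, a constant of 1000 ticks becomes 500 at a chunk size
of 16, while the naive form collapses to 0 for every chunk size above 8.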

My theory is that userland code is not being scheduled every 1 ms under
high load. The data drops are most prominent on systems with high network
traffic or RAIDs that can occupy the CPU for more than 1 ms without letting
go. The simple solution was to extend the response time allowed to the user
application to 2, 4, or 8 ms.
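
For reference, the relationship between chunk size and the time window can be
sketched as below. This assumes the standard 8000 samples per second of a
T1/E1 voice channel. SAMPLES_PER_SECOND and chunk_interval_us are hypothetical
names used only for this illustration.

```c
/* Assumption: 8000 samples per second per channel (standard T1/E1 rate). */
#define SAMPLES_PER_SECOND 8000L

/*
 * Hypothetical helper: microseconds of audio carried by one chunk, which is
 * also how often a chunk must be serviced at that chunk size.
 */
static long chunk_interval_us(long chunksize)
{
    return chunksize * 1000000L / SAMPLES_PER_SECOND;
}
```

Under that assumption, chunk sizes of 8, 16, 32, and 64 correspond to 1, 2, 4,
and 8 ms, while 128 and 512 correspond to 16 and 64 ms, which fits the
observation that delay becomes noticeable at the larger sizes.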

Feedback?

Should I submit this patch?