[asterisk-dev] possible zaptel problem with SMP and RAID1

François Delawarde fdelawarde at wirelessmundi.com
Fri Jul 13 10:50:43 CDT 2007


Hello,

I've done some further investigation into the zaptel interrupt issue on 
my systems and found the following:

- Disabling software RAID1 didn't help, so the RAID issue I originally 
brought up wasn't the real problem.
- I found out that it is related to network traffic and not to hard disk 
activity (the two are closely tied on my machine). I have an add-on 
network card with an 8169 chipset (Gigabit Ethernet) plugged into a 
100Mbps network, and the problems start whenever the card receives more 
than about 5Mbps. The driver has been around since 2002, and nothing 
similar seems to have been reported against it (I tried with and 
without NAPI).
- I made a small and dirty script that reads /proc/interrupts every 10 
seconds and counts the interrupts the analog TDM card (wctdm) received 
in that window (a rough sketch of it is included after the numbers 
below), and the results are strange:

10008
10011
10006
10007
10790 <-- starting scp or samba or ftp transfer at around 10Mbps
12953
11964
12232
12765
12381
11419 <-- ending transfer
10011
10006
10011
10012
10008

I know it's probably not the best way to check, but there is a clear 
trend here: no interrupts are being missed; on the contrary, the card 
gets on average about 2000 interrupts more per 10-second window (some 
20% more) than the roughly 10000 it should (the card normally 
interrupts 1000 times per second). That's probably why it generates a 
tone instead of cutting the voice.
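
In case it helps someone spot the problem, here is roughly what that 
counting looks like written out in C (my actual script was a quick 
shell hack; this sketch just assumes the wctdm line in /proc/interrupts 
can be matched by name and sums its per-CPU counters every 10 seconds):

/* countirq.c - count wctdm interrupts per 10-second window.
 * Rough sketch only; my real script was a quick shell hack around
 * /proc/interrupts. Compile with: gcc -o countirq countirq.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sum the per-CPU counters on the /proc/interrupts line that mentions
 * "wctdm"; returns -1 if no such line is found. */
static long read_wctdm_count(void)
{
    FILE *f = fopen("/proc/interrupts", "r");
    char line[512];
    long total = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, "wctdm")) {
            char *p = strchr(line, ':');
            total = 0;
            if (p) {
                p++;
                /* After "NN:" there is one counter per CPU, then text. */
                for (;;) {
                    char *end;
                    long v = strtol(p, &end, 10);
                    if (end == p)
                        break;
                    total += v;
                    p = end;
                }
            }
            break;
        }
    }
    fclose(f);
    return total;
}

int main(void)
{
    long prev = read_wctdm_count();

    if (prev < 0) {
        fprintf(stderr, "no wctdm line in /proc/interrupts?\n");
        return 1;
    }
    for (;;) {
        long cur;

        sleep(10);
        cur = read_wctdm_count();
        if (cur < 0)
            break;
        printf("%ld\n", cur - prev);    /* interrupts in the last 10s */
        prev = cur;
    }
    return 0;
}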

I modified zttest a bit to also show the measured times in 
microseconds, and it shows the following (the arithmetic behind the 
figures is spelled out after the output):
#./zttest2 -v
8192 samples in 8192 sample intervals (expected=1024000us, time=1024099us, diff=99us) 99.990334%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024040us, diff=40us) 99.996094%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024065us, diff=65us) 99.993652%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024069us, diff=69us) 99.993263%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024073us, diff=73us) 99.992874%
8192 samples in 7693 sample intervals (expected=1024000us, time=961574us, diff=-62426us) 93.507935% <-- transfer starts
8192 samples in 5693 sample intervals (expected=1024000us, time=711544us, diff=-312456us) 56.087608%
8192 samples in 6320 sample intervals (expected=1024000us, time=790031us, diff=-233969us) 70.384834%
8192 samples in 4753 sample intervals (expected=1024000us, time=594063us, diff=-429937us) 27.627710%
8192 samples in 5873 sample intervals (expected=1024000us, time=734099us, diff=-289901us) 60.509277%
8192 samples in 5081 sample intervals (expected=1024000us, time=635225us, diff=-388775us) 38.797276%
8192 samples in 5911 sample intervals (expected=1024000us, time=738811us, diff=-285189us) 61.398922%
8192 samples in 6673 sample intervals (expected=1024000us, time=834055us, diff=-189945us) 77.226318%
8192 samples in 4605 sample intervals (expected=1024000us, time=575686us, diff=-448314us) 22.125256%
8192 samples in 6564 sample intervals (expected=1024000us, time=820409us, diff=-203591us) 75.184204%
8192 samples in 4305 sample intervals (expected=1024000us, time=538061us, diff=-485939us) 9.687006%
8192 samples in 6657 sample intervals (expected=1024000us, time=832029us, diff=-191971us) 76.927368% <--- transfer ends
8192 samples in 8192 sample intervals (expected=1024000us, time=1024067us, diff=67us) 99.993454%
8192 samples in 8199 sample intervals (expected=1024000us, time=1024923us, diff=923us) 99.909943%
8192 samples in 8185 sample intervals (expected=1024000us, time=1023218us, diff=-782us) 99.923576%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024076us, diff=76us) 99.992577%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024063us, diff=63us) 99.993851%
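
For clarity, the arithmetic behind those figures works out as follows: 
at 8000 samples per second each sample should take 125us, so 8192 
samples should take 1024000us; diff is the measured time minus that, 
and the percentage is 100 - 100*|diff|/measured time (e.g. 
100 - 100*62426/961574, which gives 93.51%). The big negative diffs 
during the transfer mean the 8192 samples are delivered much faster 
than real time, which matches the extra interrupts counted above. Below 
is only an illustration of that timing calculation, not the real zttest 
code; the device path and read size are assumptions:

/* timecheck.c - illustrates the microsecond timing arithmetic only;
 * this is NOT the real zttest source. The device path and read size
 * below are assumptions, adjust them for your setup.
 * Compile with: gcc -o timecheck timecheck.c -lm */
#include <fcntl.h>
#include <math.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define SAMPLES 8192           /* samples timed per report */
#define US_PER_SAMPLE 125.0    /* 8000 samples/s -> 125us per sample */

int main(void)
{
    unsigned char buf[SAMPLES];
    int fd = open("/dev/zap/pseudo", O_RDONLY);
    int i;

    if (fd < 0) {
        perror("open /dev/zap/pseudo");
        return 1;
    }
    for (i = 0; i < 20; i++) {
        struct timeval start, end;
        int got = 0;

        gettimeofday(&start, NULL);
        while (got < SAMPLES) {         /* read one block of samples */
            int n = read(fd, buf + got, SAMPLES - got);
            if (n <= 0) {
                perror("read");
                return 1;
            }
            got += n;
        }
        gettimeofday(&end, NULL);

        double elapsed = (end.tv_sec - start.tv_sec) * 1000000.0
                       + (end.tv_usec - start.tv_usec);
        double expected = SAMPLES * US_PER_SAMPLE;     /* 1024000us */
        double diff = elapsed - expected;
        double pct = 100.0 - 100.0 * fabs(diff) / elapsed;

        printf("%d samples (expected=%.0fus, time=%.0fus, diff=%.0fus) %f%%\n",
               SAMPLES, expected, elapsed, diff, pct);
    }
    close(fd);
    return 0;
}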



I really don't understand how the wctdm driver could generate MORE 
interrupts than it should in a given time... Could anyone suggest 
possible reasons, so I know where to continue investigating?

Thanks again,
François.



François Delawarde wrote:
> From what I understand, it appears to be a magic problem that just 
> happens (or not) depending on too many external parameters (card, 
> motherboard, drivers, RAID, ...) to be debugged efficiently. I guess I 
> just don't have luck with the different configurations I've tried.
>
> I will test more without RAID, or on different hardware, and report if I 
> ever find a real cause for what's happening (could it still be related to 
> a zaptel module?).
>
> One last question: is there a way to force the timing source to be 
> ztdummy (which seems to be more reliable on my systems) even if one or 
> more cards are installed, i.e. disable the timing duties in wctdm and 
> let ztdummy provide the timer?
>
> Thank you all for all those answers,
> François.
>
> Benny Amorsen wrote:
>   
>> SC> In my case we don't use software RAID _EVER_. I despise software
>> SC> RAID.
>>
>> In my experience, hardware RAID is way slower and more error-prone
>> than software RAID. Just don't do it -- and if you really can't do
>> without it, then go for a SAN or NAS solution instead.
>>
>> /Benny
>>


