[asterisk-dev] possible zaptel problem with SMP and RAID1
François Delawarde
fdelawarde at wirelessmundi.com
Fri Jul 13 10:50:43 CDT 2007
Hello,
I've done some further investigation into the zaptel interrupt issue on
my systems and found the following:
- Disabling software RAID1 didn't help, so the RAID1 issue I brought up
earlier wasn't the real problem.
- I found out that it is related to network traffic rather than hard disk
activity (the two are quite correlated on my machine). I have an add-on
network card with an RTL8169 chipset (Gigabit Ethernet) plugged into a
100Mbps network, and as soon as the card receives more than about 5Mbps
the problems start. The driver has been around since 2002, and nothing
similar seems to have been reported about it (I tried with and without NAPI).
- I made a small and dirty script that reads /proc/interrupts every 10
seconds and counts how many interrupts the analog TDM card (wctdm)
received in that window (a sketch of that kind of counter is shown after
the numbers below). The results are strange:
10008
10011
10006
10007
10790 <-- starting scp or samba or ftp transfer at around 10Mbps
12953
11964
12232
12765
12381
11419 <-- ending transfer
10011
10006
10011
10012
10008
I know it's probably not the best way to check, but there is a clear
trend: no interrupts are missed; on the contrary, we get on average about
2000 extra interrupts (20% more) than the expected ~10000 during each
10-second window while the transfer is running. That's probably why it
generates a tone instead of cutting the voice.
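For reference, here is a minimal sketch of the kind of counter I mean (this
is not my exact script; it assumes the card shows up as "wctdm" in
/proc/interrupts, sums the per-CPU columns since the box is SMP, and just
prints the delta every 10 seconds):

/* Minimal sketch of an interrupt-rate counter for the wctdm line
 * in /proc/interrupts.  About 10000 per 10 seconds is expected at
 * the card's 1kHz interrupt rate. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static long read_wctdm_irqs(void)
{
    FILE *f = fopen("/proc/interrupts", "r");
    char line[512];
    long total = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, "wctdm")) {
            char *p = strchr(line, ':');
            total = 0;
            if (p) {
                p++;
                /* Sum the per-CPU counter columns (more than one on SMP). */
                for (;;) {
                    char *end;
                    long v = strtol(p, &end, 10);
                    if (end == p)
                        break;
                    total += v;
                    p = end;
                }
            }
            break;
        }
    }
    fclose(f);
    return total;
}

int main(void)
{
    long prev = read_wctdm_irqs();

    for (;;) {
        long cur;

        sleep(10);
        cur = read_wctdm_irqs();
        if (prev >= 0 && cur >= 0)
            printf("%ld\n", cur - prev);  /* interrupts in the last 10s */
        prev = cur;
    }
    return 0;
}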
I also modified zttest a bit to show the times in microseconds (a sketch of
the timing calculation follows the output), and it shows:
#./zttest2 -v
8192 samples in 8192 sample intervals (expected=1024000us, time=1024099us, diff=99us) 99.990334%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024040us, diff=40us) 99.996094%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024065us, diff=65us) 99.993652%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024069us, diff=69us) 99.993263%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024073us, diff=73us) 99.992874%
8192 samples in 7693 sample intervals (expected=1024000us, time=961574us, diff=-62426us) 93.507935% <-- transfer starts
8192 samples in 5693 sample intervals (expected=1024000us, time=711544us, diff=-312456us) 56.087608%
8192 samples in 6320 sample intervals (expected=1024000us, time=790031us, diff=-233969us) 70.384834%
8192 samples in 4753 sample intervals (expected=1024000us, time=594063us, diff=-429937us) 27.627710%
8192 samples in 5873 sample intervals (expected=1024000us, time=734099us, diff=-289901us) 60.509277%
8192 samples in 5081 sample intervals (expected=1024000us, time=635225us, diff=-388775us) 38.797276%
8192 samples in 5911 sample intervals (expected=1024000us, time=738811us, diff=-285189us) 61.398922%
8192 samples in 6673 sample intervals (expected=1024000us, time=834055us, diff=-189945us) 77.226318%
8192 samples in 4605 sample intervals (expected=1024000us, time=575686us, diff=-448314us) 22.125256%
8192 samples in 6564 sample intervals (expected=1024000us, time=820409us, diff=-203591us) 75.184204%
8192 samples in 4305 sample intervals (expected=1024000us, time=538061us, diff=-485939us) 9.687006%
8192 samples in 6657 sample intervals (expected=1024000us, time=832029us, diff=-191971us) 76.927368% <-- transfer ends
8192 samples in 8192 sample intervals (expected=1024000us, time=1024067us, diff=67us) 99.993454%
8192 samples in 8199 sample intervals (expected=1024000us, time=1024923us, diff=923us) 99.909943%
8192 samples in 8185 sample intervals (expected=1024000us, time=1023218us, diff=-782us) 99.923576%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024076us, diff=76us) 99.992577%
8192 samples in 8192 sample intervals (expected=1024000us, time=1024063us, diff=63us) 99.993851%
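In case it helps to see what I'm measuring, this is a rough sketch of the
timing loop (not the real zttest source; the device path, block size and the
exact percentage formula are assumptions on my part, chosen only to
reproduce the shape of the output above):

/* Rough zttest-style timing loop: read 8192 samples from the Zaptel
 * pseudo channel, time the read with gettimeofday(), and compare it
 * with the nominal 1024000us that 8192 samples at 8000Hz should take. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

#define TOTAL_SAMPLES 8192
#define BLOCK         2048

int main(void)
{
    char buf[BLOCK];
    int fd = open("/dev/zap/pseudo", O_RDONLY);

    if (fd < 0) {
        perror("open /dev/zap/pseudo");
        return 1;
    }

    for (;;) {
        struct timeval start, end;
        long expected_us = TOTAL_SAMPLES * 125L;  /* 125us per sample at 8kHz */
        long got = 0, elapsed_us, diff_us, absdiff;

        gettimeofday(&start, NULL);
        while (got < TOTAL_SAMPLES) {
            int res = read(fd, buf, BLOCK);  /* one byte per sample (ulaw) */
            if (res < 0) {
                perror("read");
                return 1;
            }
            got += res;
        }
        gettimeofday(&end, NULL);

        elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L
                   + (end.tv_usec - start.tv_usec);
        diff_us = elapsed_us - expected_us;
        absdiff = diff_us < 0 ? -diff_us : diff_us;

        printf("%d samples in %ld sample intervals (expected=%ldus, "
               "time=%ldus, diff=%ldus) %f%%\n",
               TOTAL_SAMPLES, elapsed_us / 125, expected_us, elapsed_us,
               diff_us, 100.0 - 100.0 * absdiff / elapsed_us);
    }
}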
I really don't understand how the wctdm driver could generate MORE
interrupts than it should in a given time... Could anyone suggest possible
reasons, so I know where to continue investigating?
Thanks again,
François.
François Delawarde wrote:
> From what I understand, it appears to be a magic problem that just
> happens (or not) depending on too many external parameters (card,
> motherboard, drivers, RAID, ...) to be debugged efficiently. I guess I
> just haven't had any luck with the different configurations I tried.
>
> I will test more without RAID, or on different hardware, and report back
> if I ever find the real cause of what's happening (could it still be
> related to a zaptel module?).
>
> One last question: is there a way to force the timing source to be
> ztdummy (which seems to be more reliable on my systems) even if one or
> more cards are installed, i.e. disabling the timing in wctdm and letting
> ztdummy provide it?
>
> Thank you all for all those answers,
> François.
>
> Benny Amorsen wrote:
>
>> SC> In my case we don't use software RAID _EVER_. I despise software
>> SC> RAID.
>>
>> In my experience, hardware RAID is way slower and more error-prone
>> than software RAID. Just don't do it -- and if you really can't do
>> without it, then go for a SAN or NAS solution instead.
>>
>> /Benny
>>
>