[asterisk-users] TE121 - Idle system load at ~0.3 - Bad DAHDI 2.2.0.2 behaviour ?!
Ex Vito
ex.vitorino at gmail.com
Wed Nov 11 14:08:43 CST 2009
Hi Asterisk Users,
We've been having a tough time with a new Asterisk installation connected to
the PSTN via an ISDN PRI, using a Digium TE121 with the optional VPMADT032
echo cancellation module.
For now, I'll focus on something very specific, which is summarized in this
email's subject.
However, here are some general facts for context:
- System pbxfri went into production about a month ago.
- System pbxfrv is a HW+SW "copy+paste" of pbxfri and is not in production yet.
- We've had several incidents where the PSTN connection was not operational
(calls had bad quality/echo, or the PRI trunk could not be used for either
inbound or outbound calls).
- Most of the incidents (maybe all of them; we haven't verified thoroughly) are
associated with hundreds/thousands of "HDLC Abort" / "Bad FCS" messages in the
Asterisk log.
- DAHDI + Asterisk + libpri never seemed to recover from those conditions. We
had to manually stop Asterisk, unload and reload DAHDI, and start Asterisk
again.
- We've had at least one kernel panic on DAHDI load.
- We have logs and traces and are working with the telco to try to fully
diagnose what's going on here.
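To check whether the "HDLC Abort" / "Bad FCS" bursts line up in time with the incidents, it may help to count them per minute. The Python 3 sketch below does that under two assumptions that should be checked against your logger.conf: each Asterisk log line starts with a bracketed timestamp such as "[Nov 11 14:08:43]", and the error lines contain the literal strings "HDLC Abort" or "Bad FCS". The function name count_per_minute is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: every log line starts with a bracketed timestamp such as
# "[Nov 11 14:08:43]". The capture group truncates it to the minute.
TIMESTAMP = re.compile(r"^\[(\w{3}\s+\d{1,2} \d{2}:\d{2}):\d{2}\]")
# Assumption: the D-channel error lines contain these literal substrings.
PATTERNS = ("HDLC Abort", "Bad FCS")


def count_per_minute(lines):
    """Return a Counter keyed by (minute, pattern) for matching log lines."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # lines without the assumed timestamp are skipped
        minute = match.group(1)
        for pattern in PATTERNS:
            if pattern in line:
                counts[(minute, pattern)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 <script> /path/to/asterisk/log
    with open(sys.argv[1], errors="replace") as log:
        # Sorted by timestamp text, which is chronological within one month.
        for (minute, pattern), n in sorted(count_per_minute(log).items()):
            print(f"{minute}  {pattern:<10}  {n}")
```

Run it with the log file as its only argument. Minutes with sudden spikes can then be compared against the incident times and the telco's traces.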
For now we'd like to focus on the following (but if you think we should start
somewhere else, please, by all means, fire away!):
- Lots of info out there (Google) seems to associate "HDLC Abort" / "Bad FCS"
with a system hardware issue, whatever it is: interrupts, badly behaved NICs,
disk array controllers, etc.
Question #1:
What do these messages actually mean?
Can they be associated with a bad link or telco switch configuration?
- We've noticed that the system load at idle is about 0.3 when DAHDI is loaded.
If we unload DAHDI, system load at idle goes to approximately 0, as expected.
Question #2:
This looks like very odd behaviour. We've installed several other systems
(different HW/SW versions, however) without seeing such behaviour. Is this
expected, or could it be related to the "HDLC Aborts" / "Bad FCS" and the
general failures we've been experiencing?
System info (same for both):
HW: HP Proliant ML310 G5
TE121 + VPMADT032
AEX410 + 4x FXS + without DSP
OS: CentOS 5.3, kernel 2.6.18-164.el5
DAHDI: 2.2.0.2
libpri: 1.4.10.2
Asterisk: 1.4.26.2
Here is a session transcript for pbxfrv (not in production) showing the
odd DAHDI / system load behaviour. It starts with DAHDI unloaded:
# uname -a
Linux pbxfrv.replaced 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux
# cat /proc/cmdline
ro root=/dev/vg0/lv00 console=tty0 console=ttyS1,115200
# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:   25288985   25275219   25290489   25274409   IO-APIC-edge   timer
  1:          3          0          0          0   IO-APIC-edge   i8042
  3:      24819      20503      24395      19262   IO-APIC-edge   serial
  8:         14         16         13         11   IO-APIC-edge   rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 12:          3          0          1          0   IO-APIC-edge   i8042
 74:          0          0          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 82:         21         24         21         30   IO-APIC-level  uhci_hcd:usb6
 90:         17         16         14         16   IO-APIC-level  ata_piix, ata_piix
106:      77476          0          0          0   PCI-MSI        eth0
169:    1912615    1909646    1911302    1910566   IO-APIC-level  ioc0
NMI:          0          0          0          0
LOC:  101129266  101132004  101132444  101128234
ERR:          0
MIS:          0
# uptime
17:52:15 up 1 day, 4:07, 1 user, load average: 0.00, 0.07, 0.06
# dmesg
...
ACPI: PCI interrupt for device 0000:05:08.0 disabled
Freed a Wildcard
ACPI: PCI interrupt for device 0000:08:08.0 disabled
Freed a Wildcard TE12xP.
dahdi: Telephony Interface Unloaded
# /etc/init.d/dahdi start
Loading DAHDI hardware modules:
wcte12xp: [ OK ]
wctdm24xxp: [ OK ]
Running dahdi_cfg: [ OK ]
# dmesg
...
dahdi: Telephony Interface Registered on major 196
dahdi: Version: 2.2.0.2
PCI: Enabling device 0000:08:08.0 (0150 -> 0153)
ACPI: PCI Interrupt 0000:08:08.0[A] -> GSI 19 (level, low) -> IRQ 185
wcte12xp: VPM present and operational (Firmware version 117)
wcte12xp: Setting up global serial parameters for E1
wcte12xp: Found a Wildcard TE121
PCI: Enabling device 0000:05:08.0 (0150 -> 0153)
wcte12xp0: Missed interrupt. Increasing latency to 4 ms in order to compensate.
ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 18 (level, low) -> IRQ 177
Port 1: Installed -- AUTO FXS/DPO
Port 2: Installed -- AUTO FXS/DPO
Port 3: Installed -- AUTO FXS/DPO
Port 4: Installed -- AUTO FXS/DPO
VPM100: Not Present
Found a Wildcard TDM: Wildcard AEX410 (4 modules)
dahdi: Registered tone zone 0 (United States / North America)
dahdi_echocan_mg2: Registered echo canceler 'MG2'
wcte12xp0: Missed interrupt. Increasing latency to 5 ms in order to compensate.
wctdm24xxp0: Missed interrupt. Increasing latency to 4 ms in order to compensate.
dahdi: Registered tone zone 25 (Portugal)
wcte12xp: Span configured for CCS/HDB3/CRC4
# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:     143066     143547     142836     143602   IO-APIC-edge   timer
  1:          2          1          0          0   IO-APIC-edge   i8042
  3:        104        122        115        125   IO-APIC-edge   serial
  8:          0          1          1          1   IO-APIC-edge   rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 12:          2          1          0          1   IO-APIC-edge   i8042
 74:          0          0          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 82:         35         17         28         16   IO-APIC-level  uhci_hcd:usb6
 90:         17         15         13         18   IO-APIC-level  ata_piix, ata_piix
 98:       2820          0          0          0   PCI-MSI        eth0
169:      27848      28037      27938      27759   IO-APIC-level  ioc0
177:      21087      21083      21051      21149   IO-APIC-level  wctdm24xxp0
185:      39542      38625      39457      38761   IO-APIC-level  wcte12xp0
NMI:          0          0          0          0
LOC:     572349     572761     572610     572485
ERR:          0
MIS:          0
# sleep 60
# uptime
17:56:13 up 1 day, 4:11, 1 user, load average: 0.39, 0.19, 0.10
# /etc/init.d/dahdi stop
Unloading DAHDI hardware modules: done
# dmesg | tail
...
ACPI: PCI interrupt for device 0000:05:08.0 disabled
Freed a Wildcard
ACPI: PCI interrupt for device 0000:08:08.0 disabled
Freed a Wildcard TE12xP.
dahdi: Telephony Interface Unloaded
# sleep 60
# uptime
18:00:37 up 1 day, 4:15, 1 user, load average: 0.01, 0.10, 0.08
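The /proc/interrupts snapshots above are cumulative counters, so on their own they don't say how often the TE121 (IRQ 185) and the AEX410 (IRQ 177) are actually interrupting. One way to get rates is to read the file twice a known interval apart and divide the difference by the elapsed time. The Python 3 sketch below does that; the function names are hypothetical, and it assumes the layout seen in the transcript (a header row of CPU names followed by "label: per-CPU counts, type, device names" rows), summing each IRQ's per-CPU columns.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}.

    The first line names the CPU columns; each following row looks like
    '<label>: <one count per CPU> [<type> <device names>]'. ERR/MIS rows
    carry a single count; counting stops at the first non-numeric field.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")  # device names may contain ':'
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        result[label.strip()] = (sum(counts), description)
    return result


def interrupt_rates(before, after, seconds):
    """Interrupts per second for every label present in both snapshots."""
    return {
        label: (after[label][0] - before[label][0]) / seconds
        for label in before.keys() & after.keys()
    }


def sample_rates(seconds=10.0, path="/proc/interrupts"):
    """Read 'path' twice, about 'seconds' apart, and return per-IRQ rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    start = time.monotonic()
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    elapsed = time.monotonic() - start
    return interrupt_rates(before, after, elapsed)
```

On the live box, printing sorted(sample_rates(10).items()) with DAHDI loaded gives interrupts per second per IRQ, which can then be compared between pbxfri and pbxfrv, or against the same measurement with DAHDI unloaded.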
Extra information / question:
A quick peek at https://issues.asterisk.org/view.php?id=15498&nbn=18
also led me to test loading the wcte12xp driver with "vpmsupport=0".
The system load behaviour is exactly the same.
Do you think it could be related? How would you go about diagnosing
this behaviour?
Thanks in advance.
Kind regards,
--
exvito