[asterisk-users] TE121 - Idle system load at ~0.3 - Bad DAHDI 2.2.0.2 behaviour ?!

Ex Vito ex.vitorino at gmail.com
Wed Nov 11 14:08:43 CST 2009


 Hi Asterisk Users,


 We've been having a tough time with a new Asterisk installation
 connected to the PSTN via an ISDN PRI, using a Digium TE121 with the optional
 VPMADT032 echo cancellation module.

 For now, I'll focus on something very specific, which is summarized in this
 email's subject.

 However, here are some general facts for the context:

 - System pbxfri went into production about a month ago.
 - System pbxfrv is a HW+SW "copy+paste" of pbxfri, not in production yet.

 - Had several incidents where the PSTN connection was not operational
   (calls had bad quality/echo or PRI trunk could not be used for either
   inbound or outbound)
 - Most of the incidents (maybe all of them, we haven't verified thoroughly)
   are associated with hundreds/thousands of "HDLC Abort" / "Bad FCS" messages
   in the asterisk log.
 - DAHDI + Asterisk + libpri never seemed to recover from those conditions. We
   manually had to stop Asterisk, unload+load DAHDI, start Asterisk.
 - Had at least one kernel panic on DAHDI load.
 - We have logs + traces and are working with the telco so as to try to fully
   diagnose what's going on here.


 For now we'd like to focus on the following (but if you think we should start
 somewhere else, please, by all means, fire away!):

 - Lots of info out there (Google) seems to associate the "HDLC Abort" /
   "Bad FCS" with a system hardware issue - whatever it is: interrupts,
   badly behaved NICs, disk array controllers, etc.

   Question #1:

       What do these messages actually mean ?
       Can they be associated with a bad link/telco switch configuration ?

 - We've noticed that the system load at idle is about 0.3 when DAHDI is loaded.
   If we unload DAHDI, system load at idle goes to approximately 0, as expected.

   Question #2:

       This looks like very odd behaviour. We've installed several other
       systems (different HW/SW versions, however) without seeing such
       behaviour. Is this expected, or could this be related to the
       "HDLC Aborts" / "Bad FCS" and general failures we've been
       experiencing ?


 System info (same for both):

 HW: HP Proliant ML310 G5
     TE121 + VPMADT032
     AEX410 + 4x FXS + without DSP

 OS: CentOS 5.3, kernel 2.6.18-164.el5

 DAHDI:    2.2.0.2
 libpri:   1.4.10.2
 Asterisk: 1.4.26.2


 Here is a session transcript for pbxfrv (not in production) showing the
 odd DAHDI / system load behaviour. It starts with DAHDI unloaded:


 # uname -a
 Linux pbxfrv.replaced 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux

 # cat /proc/cmdline
 ro root=/dev/vg0/lv00 console=tty0 console=ttyS1,115200

 # cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:   25288985   25275219   25290489   25274409    IO-APIC-edge  timer
   1:          3          0          0          0    IO-APIC-edge  i8042
   3:      24819      20503      24395      19262    IO-APIC-edge  serial
   8:         14         16         13         11    IO-APIC-edge  rtc
   9:          0          0          0          0   IO-APIC-level  acpi
  12:          3          0          1          0    IO-APIC-edge  i8042
  74:          0          0          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
  82:         21         24         21         30   IO-APIC-level  uhci_hcd:usb6
  90:         17         16         14         16   IO-APIC-level  ata_piix, ata_piix
 106:      77476          0          0          0         PCI-MSI  eth0
 169:    1912615    1909646    1911302    1910566   IO-APIC-level  ioc0
 NMI:          0          0          0          0
 LOC:  101129266  101132004  101132444  101128234
 ERR:          0
 MIS:          0

 # uptime
  17:52:15 up 1 day,  4:07,  1 user,  load average: 0.00, 0.07, 0.06

 # dmesg
 ...
 ACPI: PCI interrupt for device 0000:05:08.0 disabled
 Freed a Wildcard
 ACPI: PCI interrupt for device 0000:08:08.0 disabled
 Freed a Wildcard TE12xP.
 dahdi: Telephony Interface Unloaded

 # /etc/init.d/dahdi start
 Loading DAHDI hardware modules:
   wcte12xp:                                                [  OK  ]
   wctdm24xxp:                                              [  OK  ]

 Running dahdi_cfg:                                         [  OK  ]

 # dmesg
 ...
 dahdi: Telephony Interface Registered on major 196
 dahdi: Version: 2.2.0.2
 PCI: Enabling device 0000:08:08.0 (0150 -> 0153)
 ACPI: PCI Interrupt 0000:08:08.0[A] -> GSI 19 (level, low) -> IRQ 185
 wcte12xp: VPM present and operational (Firmware version 117)
 wcte12xp: Setting up global serial parameters for E1
 wcte12xp: Found a Wildcard TE121
 PCI: Enabling device 0000:05:08.0 (0150 -> 0153)
 wcte12xp0: Missed interrupt. Increasing latency to 4 ms in order to compensate.
 ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 18 (level, low) -> IRQ 177
 Port 1: Installed -- AUTO FXS/DPO
 Port 2: Installed -- AUTO FXS/DPO
 Port 3: Installed -- AUTO FXS/DPO
 Port 4: Installed -- AUTO FXS/DPO
 VPM100: Not Present
 Found a Wildcard TDM: Wildcard AEX410 (4 modules)
 dahdi: Registered tone zone 0 (United States / North America)
 dahdi_echocan_mg2: Registered echo canceler 'MG2'
 wcte12xp0: Missed interrupt. Increasing latency to 5 ms in order to compensate.
 wctdm24xxp0: Missed interrupt. Increasing latency to 4 ms in order to compensate.
 dahdi: Registered tone zone 25 (Portugal)
 wcte12xp: Span configured for CCS/HDB3/CRC4

 # cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:     143066     143547     142836     143602    IO-APIC-edge  timer
   1:          2          1          0          0    IO-APIC-edge  i8042
   3:        104        122        115        125    IO-APIC-edge  serial
   8:          0          1          1          1    IO-APIC-edge  rtc
   9:          0          0          0          0   IO-APIC-level  acpi
  12:          2          1          0          1    IO-APIC-edge  i8042
  74:          0          0          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
  82:         35         17         28         16   IO-APIC-level  uhci_hcd:usb6
  90:         17         15         13         18   IO-APIC-level  ata_piix, ata_piix
  98:       2820          0          0          0         PCI-MSI  eth0
 169:      27848      28037      27938      27759   IO-APIC-level  ioc0
 177:      21087      21083      21051      21149   IO-APIC-level  wctdm24xxp0
 185:      39542      38625      39457      38761   IO-APIC-level  wcte12xp0
 NMI:          0          0          0          0
 LOC:     572349     572761     572610     572485
 ERR:          0
 MIS:          0

 # sleep 60

 # uptime
  17:56:13 up 1 day,  4:11,  1 user,  load average: 0.39, 0.19, 0.10

 # /etc/init.d/dahdi stop
 Unloading DAHDI hardware modules: done

 # dmesg | tail
 ...
 ACPI: PCI interrupt for device 0000:05:08.0 disabled
 Freed a Wildcard
 ACPI: PCI interrupt for device 0000:08:08.0 disabled
 Freed a Wildcard TE12xP.
 dahdi: Telephony Interface Unloaded

 # sleep 60

 # uptime
  18:00:37 up 1 day,  4:15,  1 user,  load average: 0.01, 0.10, 0.08
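
 To put numbers on the /proc/interrupts output above, a rough sketch like
 the following could compare two snapshots and report per-IRQ interrupt
 rates. The function names (parse_interrupts, irq_rates, sample) are made up
 for illustration, and it assumes the usual Linux layout: one header row of
 CPU names followed by one row per IRQ.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if not counts:
            continue  # no per-CPU counters on this row
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (count - before[irq][0]) / seconds
        for irq, (count, _desc) in after.items()
        if irq in before
    }


def sample(seconds=10.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and print rates, busiest first."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    rates = irq_rates(before, after, seconds)
    for irq, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{irq:>4} {rate:10.1f}/s  {after[irq][1]}")
```

 Calling sample(10) once with DAHDI loaded and once with it unloaded would
 show how many interrupts per second wcte12xp0 and wctdm24xxp0 add. DAHDI
 cards are usually said to interrupt roughly 1000 times per second; treat
 that figure as an assumption to check, not as a documented value for this
 hardware.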



 Extra information / question:

 A quick peek at https://issues.asterisk.org/view.php?id=15498&nbn=18
 also led me to test loading the wcte12xp driver with "vpmsupport=0".

 The system load behaviour is exactly the same.

 Do you think it could be related ? How would you go about diagnosing
 this behaviour ?
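
 One possible starting point for the diagnosis question above: on Linux the
 load average counts not only runnable tasks (state R) but also tasks in
 uninterruptible sleep (state D), so a load of ~0.3 on an otherwise idle box
 may come from a task that is often in one of those states rather than from
 actual CPU use. The sketch below lists such tasks from /proc/<pid>/stat.
 parse_stat and busy_tasks are hypothetical helper names, and the sketch
 only looks at processes, not at individual threads.

```python
import os


def parse_stat(stat_line):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split at the LAST closing parenthesis.
    head, tail = stat_line.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state


def busy_tasks(proc_root="/proc"):
    """List (pid, comm, state) for processes currently in state R or D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unexpected format
        if state in ("R", "D"):
            found.append((int(entry), comm, state))
    return found
```

 Calling busy_tasks() repeatedly while DAHDI is loaded and noting which
 names keep showing up (other than the script itself, which is always
 running) would hint at what contributes to the load.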


 Thanks in advance.
 Kind regards,
--
 exvito


