[asterisk-users] Asterisk Server : IDE HDD frequent crash

Matthew Rubenstein email at mattruby.com
Fri Oct 6 06:55:24 MST 2006


	I partitioned and formatted a new WD2500 with NTFS on a WinXP machine
and filled it with data (mostly 10MB FLAC and SHN soundfiles). Then I
transferred it to an AAH Asterisk server box with a Digium TDM400P
(1FXO/1FXS) and an Audigy2 soundcard. I installed it as hdb, booting off
hda (no other drives). I mounted that drive with ntfs-fuse, and then
remotely mounted it from another machine (Ubuntu) with sshfs. The fuse
NTFS support doesn't fully work: when I removed some files from the NTFS
volume, it failed to remove the last file specified for removal from
some directories (and therefore the directories themselves). I then
opened several of the existing remote files from my local workstation.

	After about 6 hours, I got a CentOS kernel panic from the AAH server
with the NTFS drive, indicating an IRQ conflict. When I rebooted, it
continued to kernel panic until I rebooted with the Audigy2 soundcard
removed, which forced CentOS to remove the driver. At that point I
deleted the AC97 module for the motherboard soundchip, just to be safe,
then shut down, reinserted the Audigy2, restarted, and let CentOS
automatically remove the AC97 configs, add the Audigy2 configs, and
continue normally. Except that the drive is now marked "dirty" and
requires "chkdsk", which doesn't run on Linux and has no Linux
equivalent. The NTFS tools that come with fuse, which fix only the most
basic state problems, had no effect. But if I force the mount, the drive
mounts and reads files fine (I don't write to it in its dirty state).

	Then I shut down, added another WD2500 to the IDE bus as hdc, booted,
and the kernel didn't find hdc when it probed the IDE, though it did see
that there was a device on IDE1. I shut down, moved both WD2500s to
IDE1, booted, and the kernel found neither hdc nor hdd. So I can't dd
the NTFS drive to an ext3 (etc) Linux drive. Even after I removed the
Audigy2, left the TDM400P in place, and restored the AC97 module, the
kernel still does not find the second IDE drive on probe, no matter
where I install it on the IDE buses.

	I can recover the drive with chkdsk on the WinXP machine that formatted
it, and then either copy the data across the LAN or possibly mount the
drive in a USB enclosure attached to the Ubuntu machine and copy across
USB to a locally mounted Linux drive.

	But it looks like an IRQ conflict, or maybe a DMA or other conflict at
that level, is interfering with the IDE. The conflict didn't happen with
Audigy2 + TDM400P + IDE0/hda, but it does happen when adding hdb/c/d to
the mix, unless I remove the soundcard. Maybe the Audigy2 conflicts with
the TDM400P in a way that interferes with the IDE. This problem seems
like it could destroy drives well before their MTBF, so I thought I'd
throw it out there.
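
	One quick way to look for shared interrupt lines is to scan
/proc/interrupts for IRQs that list more than one device. The Python
below is only a rough sketch under stated assumptions: the function name
shared_irqs is made up for illustration, and it assumes the older layout
where each row is the IRQ number, one count per CPU, a single controller
token (such as IO-APIC-level or XT-PIC), and then a comma-separated
device list. A shared line is not by itself proof of a conflict, only a
place to start looking.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for IRQ rows that list more than one device.

    `text` is the content of /proc/interrupts. Assumes the layout described
    above: IRQ number, one count per CPU, one controller token, device list.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip summary rows (NMI, LOC, ERR, MIS) and anything too short
        # to hold counts, a controller token, and a device name.
        if not sep or not irq.strip().isdigit() or len(fields) < ncpu + 2:
            continue
        # Everything after the counts and the controller token is devices.
        names = " ".join(fields[ncpu + 1:]).split(",")
        devices = [n.strip() for n in names if n.strip()]
        if len(devices) > 1:
            shared[int(irq)] = devices
    return shared
```

	On a Linux box it could be called as
shared_irqs(open("/proc/interrupts").read()); an empty result means no
IRQ row lists more than one device.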


On Fri, 2006-10-06 at 00:26 -0700,
asterisk-users-request at lists.digium.com wrote:
> Date: Thu, 5 Oct 2006 16:44:10 +0000 (UTC)
> From: Dushyanth <dushyanth at gmail.com>
> Subject: [asterisk-users] Asterisk Server : IDE HDD frequent crash 
> To: asterisk-users at lists.digium.com
> Message-ID: <loom.20061005T181309-926 at post.gmane.org>
> Content-Type: text/plain; charset=us-ascii
> 
> Hey guys,
> 
> I am having a peculiar problem with my Asterisk installation. The
> specs are:
> 
> [root at pbx ~]# asterisk -V
> Asterisk 1.2.7.1
> 
> Wildcard: Digium Wildcard TE110P T1/E1
> Wildcard TDM: Wildcard TDM400P REV I (4 modules) ( 2 FXO, 2 FXS)
> Wildcard TDM: Wildcard TDM400P REV I (4 modules) ( 1 FXO, 3 FXS)
> Wildcard TDM: Wildcard TDM2400P Prototype (24 modules) (12 FXO's - rest empty)
> 
> Total 15 FXO's, 5 FXS, out of which 5 to 6 FXO/FXS are being used. We
> have about 300 active SIP accounts.
> 
> Queues, SIP extensions, and Agents are in a MySQL database using
> Asterisk realtime static.
> 
> CPU : Intel(R) Xeon(TM) CPU 3.06GHz with Hyper threading
> RAM : 1G
> Mobo : Intel SE7501HG2
> 
> The system is stable; however, the IDE disk crashes every 3 to 4
> months. There are DMA timeout errors for the IDE disk before it fails
> completely. The same issue occurred with the past three disks, and I
> suspected the recommended hdparm setting:
> 
> 'hdparm -d 1 -X udma2 -c 3 /dev/IDE Device'
> 
> So I removed this setting after the last crash, and the system worked
> fine for another 3 months. Yesterday, the disk failed again with the
> same symptoms. All the disks were Seagate Barracuda IDE drives.
> 
> zttool doesn't show any IRQ misses even without the above hdparm
> setting, and there is no noticeable problem in Asterisk with the PRI
> line etc. Below are my /proc/interrupts and my /dev/hda settings.
> 
> [root at pbx ~]# cat /proc/interrupts
>            CPU0       CPU1
>   0:   24771857   24719039    IO-APIC-edge  timer
>   1:        102         62    IO-APIC-edge  i8042
>   8:          1          0    IO-APIC-edge  rtc
>   9:          0          0   IO-APIC-level  acpi
>  14:     134159     135915    IO-APIC-edge  ide0
> 185:   32988610   16463264   IO-APIC-level  wctdm
> 193:   22173177   27275710   IO-APIC-level  wctdm
> 201:   21737611   27711650   IO-APIC-level  wctdm24xxp
> 209:   22038077   27401613   IO-APIC-level  wcte11xp
> 225:   18992311          0   IO-APIC-level  eth1
> 233:        117    1166879   IO-APIC-level  eth0
> NMI:          0          0
> LOC:   49493157   49493156
> ERR:          0
> MIS:          0
> 
> [root at pbx ~]# hdparm -i /dev/hda
> 
> /dev/hda:
> 
>  Model=ST340014A, FwRev=3.06, SerialNo=5JX96VFV
>  Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
>  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
>  BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=16
>  CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=78165360
>  IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
>  PIO modes:  pio0 pio1 pio2 pio3 pio4
>  DMA modes:  mdma0 mdma1 mdma2
>  UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
>  AdvancedPM=no WriteCache=enabled
>  Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:
> 
>  * signifies the current active mode
> 
> I looked at the mailing lists and couldn't find any such issues reported.
> 
> Please advise. Should I be using SCSI disks on RAID 1 or something?
> Will that help?
> 
> Also, should I be looking at any mobo other than the Intel SE7501HG2?
> I am planning to put in another Asterisk server as a failover and would
> appreciate input about the kind of hardware I should be using for a
> system with the specs I mentioned.
> 
> Thanks
> Dushyanth
> 
> 
-- 

(C) Matthew Rubenstein


