[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?
Chris Samuel
csamuel at vpac.org
Mon Apr 6 05:46:29 PDT 2009
----- "Rahul Nabar" <rpnabar at gmail.com> wrote:
> I contacted Dell. Responses range from the clueless to the absurd. First,
> they convinced us it was Fedora. So I shifted to CentOS. They still
> claim CentOS is "unvalidated" but I refuse to spend a fortune to move
> over to RHEL like they want me to.
Not that this helps, but you have my sympathy as I've
been dealing with the same stuff from IBM over a storage
server they sold us.
Turns out I can make 7-12 drives in their external
enclosures fail in short order (seconds to minutes
between failures) by telling the software RAID to
do a check, thus:
    # Use the full sysfs path so the glob expands from any working directory
    for i in /sys/block/md[0123]; do
        echo check > "$i/md/sync_action"
    done
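Once a check has been started this way, its state can be read back from the same sysfs directory. What follows is only a minimal sketch, assuming the standard md attributes `sync_action` and `mismatch_cnt` are present. The function name `md_check_status` and its optional root-directory argument are hypothetical, added here for illustration and so the helper can be pointed at a test directory.

```shell
# Hypothetical helper: print the check state of every md array.
# The optional first argument (an assumption for illustration)
# overrides the sysfs root, which defaults to /sys/block.
md_check_status() {
    root="${1:-/sys/block}"
    for dev in "$root"/md*; do
        # Skip entries without a readable sync_action (no match, or not an md array).
        [ -r "$dev/md/sync_action" ] || continue
        action=$(cat "$dev/md/sync_action")
        # mismatch_cnt may be absent on some kernels; report n/a in that case.
        mismatches=$(cat "$dev/md/mismatch_cnt" 2>/dev/null || echo "n/a")
        echo "$(basename "$dev"): sync_action=$action mismatch_cnt=$mismatches"
    done
}
```

Calling `md_check_status` with no argument reads /sys/block. The overall progress of a running check is also visible in /proc/mdstat.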
Even though we could reproduce it on 64-bit Debian
and 32-bit CentOS they wouldn't escalate the issue
until we could reproduce it on RHEL5 - which we did
today.
Sigh..
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency