[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?
Chris Samuel csamuel at vpac.org
Mon Apr 6 18:34:29 PDT 2009
- Previous message: [Beowulf] OT: Windows tex editors/processors
- Next message: [Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
----- "Joe Landman" <landman at scalableinformatics.com> wrote:

> Chris Samuel wrote:
>
> > for i in md[0123]; do
> >     echo check > /sys/block/$i/md/sync_action
> > done
>
> Are these softirq cpu hangs?

Nope, these are SCSI read errors back from the drives.. I've now
been asked to update the IBM driver (they don't support the RHEL
one) and the firmware on the disks, both of which have been released
in the last few days with vaguely possibly applicable changelogs..

> could you tell me what
>
>     cat /sys/block/md[0123]/md/stripe_cache_size
>
> reports?

They're 256 on RHEL5.3 vanilla - same as on CentOS
(2.6.18-92.1.10.el5PAE) and Debian (2.6.28.9).

cheers!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
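
[Editor's sketch, not part of the original message.] The two sysfs queries discussed above (triggering a check through sync_action and reading stripe_cache_size) can be wrapped in one small helper that reports the state of each array. This is only an illustration under stated assumptions: the function name md_report and the SYSFS_ROOT override are hypothetical, introduced here so the function can be pointed at a test tree, and mismatch_cnt is an extra md attribute the thread itself does not mention. stripe_cache_size is only present for RAID4/5/6 arrays, so a missing attribute is reported as n/a.

```shell
#!/bin/sh
# Sketch: print scrub-related md attributes for the named arrays.
# SYSFS_ROOT is a hypothetical override (defaults to /sys).
md_report() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$@"; do
        md="$root/block/$dev/md"
        if [ ! -d "$md" ]; then
            # Not an md array, or the array is not assembled
            echo "$dev: no md directory at $md" >&2
            continue
        fi
        # stripe_cache_size exists only for RAID4/5/6 arrays
        for attr in sync_action mismatch_cnt stripe_cache_size; do
            if [ -r "$md/$attr" ]; then
                echo "$dev $attr=$(cat "$md/$attr")"
            else
                echo "$dev $attr=n/a"
            fi
        done
    done
}
```

After sourcing the function, running md_report md0 md1 md2 md3 would print one line per attribute per array. It only reads sysfs; it does not start a check itself.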
