[Beowulf] Re: mysterious slow disk

David Mathog mathog at caltech.edu
Wed Mar 5 11:44:59 PST 2008


I don't think I'm going to solve this one :-(.

Bruno Coutinho wrote:

>I noticed that some disks can't get full bandwidth at once.
<snip>
>A way to monitor disk throughput for a longer time is to use dd
>to copy a partition to /dev/null.

The disk is also slow for long dd operations by about the same factor.

SLOW: 
 % sync; \
   TIME=`accudate -t0` ; \
   dd if=/dev/zero count=4000000 of=/scratch/tmp/foo.dat ; \
   sync; \
   accudate -ds $TIME
4000000+0 records in
4000000+0 records out
2048000000 bytes (2.0 GB) copied, 72.4961 seconds, 28.2 MB/s
0000105.343

vs.

FAST:
  % sync; \
    TIME=`accudate -t0` ; \
    dd if=/dev/zero count=4000000 of=/scratch/tmp/foo.dat ; \
    sync; \
    accudate -ds $TIME
4000000+0 records in
4000000+0 records out
2048000000 bytes (2.0 GB) copied, 59.6326 seconds, 34.3 MB/s
0000075.422

(19.4 MB/s sustained on the slow system vs. 27.2 MB/s sustained on the
fast one, counting the final sync.)
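
The sustained figures come from dividing the byte count dd reports by the
wall-clock time printed by accudate, which also covers the final sync. A
minimal sketch of that arithmetic, assuming the accudate output is elapsed
seconds and that MB means 10^6 bytes, as in dd's own report (the function
name is only illustrative):

```python
# Sustained throughput = bytes written / elapsed wall-clock time (sync included).
# Assumptions: accudate prints elapsed seconds; MB = 10**6 bytes, as dd reports.
BYTES_WRITTEN = 4_000_000 * 512  # count=4000000 blocks at dd's default 512 bytes


def sustained_mb_per_s(total_bytes, elapsed_seconds):
    """Return throughput in MB/s for a transfer of total_bytes."""
    return total_bytes / elapsed_seconds / 1e6


slow = sustained_mb_per_s(BYTES_WRITTEN, 105.343)  # elapsed time of the SLOW run
fast = sustained_mb_per_s(BYTES_WRITTEN, 75.422)   # elapsed time of the FAST run
print(f"slow: {slow:.1f} MB/s, fast: {fast:.1f} MB/s, ratio: {slow / fast:.2f}")
```

This prints 19.4 MB/s for the slow system, 27.2 MB/s for the fast one, and
a slow/fast ratio of 0.72.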

Carsten Aulbert wrote:

>Looks good. Can you run hdparm -I /dev/hda?
>And there please have a look at the line with acoustic mgmt:

hdparm -I was identical on fast and slow systems (except for
serial numbers). 

  hdparm -I /dev/hda                         

/dev/hda:

ATA device, with non-removable media
        Model Number:       WDC WD400BB-00DEA0                      
        Serial Number:      WD-WMAD11736294
        Firmware Revision:  05.03E05
Standards:
        Supported: 5 4 3 
        Likely used: 6
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:   78165360
        device size with M = 1024*1024:       38166 MBytes
        device size with M = 1000*1000:       40020 MBytes (40 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        bytes avail on r/w long: 40
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4 
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    Device Configuration Overlay feature set
           *    SMART error logging
           *    SMART self-test
Security: 
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
        not     supported: enhanced erase
HW reset results:
        CBLID- above Vih
        Device num = 0 determined by CSEL
Checksum: correct

All three Acoustic management settings of 0, 128, 254 were tried.
128 made it slightly slower still, 0 or 254 were apparently
equivalent, and it was at 0 to start with.  (The only difference
between 0 and 254 is that the former unchecks the "Automatic Acoustic
Management feature set" line.)

ariel sabiguero yawelak wrote:
>I found a situation pretty similar to this a few years ago in a system
>with shared video memory.

The system has a separate graphics card.

As a final shot the case was opened again and all of the
following were checked.  None of these made any difference,
and none differed from the other systems:

1.  motherboard and disk jumpers
2.  IDE cable
3.  voltage on the power connector to the drive
4.  checked power supply with two testers (both showed PS in spec)
5.  cleared the BIOS with the CMOS jumper, loaded defaults,
    changed the few settings that were not at default to match
    the other systems.
6.  moved cable from the primary IDE socket to the secondary IDE socket
7.  Observed the exposed running disk.  The amount of vibration
    and temperature were typical, and there were no unusual noises.

Ran the Stream benchmark on both the slow and normal systems and
it scored the same.  The hdparm -T test also gives the same result on
the slow system as on the fast ones; only -t is slow.  So it seems
that something on the disk itself is slow, and that this isn't a CPU
speed, memory speed, or even IDE bus speed issue.
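
For anyone wanting to repeat the device-read side of this without hdparm,
here is a rough sketch of a sequential read timer.  It is not how hdparm
works internally, only an approximation of what -t measures; the function
name and defaults are made up for illustration.

```python
import time


def sequential_read(path, block_size=1024 * 1024, max_bytes=None):
    """Read path front to back; return (bytes_read, elapsed_seconds).

    Hypothetical helper, not part of hdparm. Stops at end of file or
    after max_bytes, whichever comes first.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level
        while max_bytes is None or total < max_bytes:
            want = block_size if max_bytes is None else min(block_size, max_bytes - total)
            chunk = f.read(want)
            if not chunk:
                break  # end of file or device
            total += len(chunk)
    return total, time.perf_counter() - start
```

Pointing this at /dev/hda needs root, and the result only resembles
hdparm -t if the data is not already in the kernel page cache.  Run twice
on the same file, the second pass mostly measures memory, which is closer
to what -T reports.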

The disk is apparently going, but it is an odd way to fail.  I almost
wonder if it has not stepped down from 7200 RPM to 5400 RPM.  That ratio
is 0.75, and the speed ratio in the dd test was 0.72, which is pretty
close.  It spins up with no problems though.
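
A quick numerical check of that comparison, as a sketch only.  It assumes
sequential throughput scales linearly with spindle speed, which is a
simplification, and it takes the two elapsed times from the dd runs
earlier in this message:

```python
# Compare the hypothesized RPM step-down with the measured slowdown.
# Assumption: sequential throughput scales linearly with spindle speed.
rpm_ratio = 5400 / 7200        # 0.75 if the drive dropped from 7200 to 5400 RPM
dd_ratio = 75.422 / 105.343    # fast elapsed / slow elapsed, same byte count
relative_gap = abs(rpm_ratio - dd_ratio) / rpm_ratio
print(f"rpm ratio {rpm_ratio:.2f}, dd ratio {dd_ratio:.2f}, gap {relative_gap:.1%}")
```

The two ratios differ by a little under 5%, so the RPM idea is consistent
with the measurements but not confirmed by them.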

Thanks all,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


