[Beowulf] Big storage
Bogdan Costescu
Bogdan.Costescu at iwr.uni-heidelberg.de
Fri Sep 14 08:50:20 PDT 2007
On Fri, 14 Sep 2007, Bruce Allen wrote:
>> I will try to get fsprobe deployed on as much of the Nordic LHC storage as
>> possible.
>
> I'll get fsprobe up and going on the new systems I am putting
> together in Hannover, and will also try and encourage the right
> people to get it running on some of the LIGO Scientific
> Collaboration's other storage systems.
I might be dense after the holiday, but I still don't get the reasons
for such an interest in running fsprobe. I can see it being used as a
burn-in test and as proof that a running system can write and then
read data correctly, but what does it say about the data that is
already written, or about the data that is in flight while fsprobe
runs? (Someone else asked this question earlier in the thread and
didn't get an answer either.) And how is fsprobe better as a burn-in
test than, say, badblocks?
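
To make clear what I mean by write-then-read checking, something along
these lines (a minimal Python sketch of how I understand fsprobe-class
tools to work, NOT fsprobe's actual code; the path and sizes are made
up):

    #!/usr/bin/env python
    # Minimal sketch of a write-then-verify pass, as I understand
    # fsprobe-class tools to work (NOT fsprobe's actual code). It
    # proves only that data written *now* comes back correctly; it
    # says nothing about data already on disk or in flight.
    import os, hashlib

    CHUNK = 1 << 20                    # 1 MiB per write
    NCHUNKS = 64                       # 64 MiB test file (made-up size)
    PATH = "/scratch/fsprobe-sketch"   # made-up test location

    def pattern(i):
        # deterministic, position-dependent pattern, so misplaced or
        # stale blocks do not compare equal by accident
        return hashlib.md5(str(i).encode()).digest() * (CHUNK // 16)

    f = open(PATH, "wb")
    for i in range(NCHUNKS):
        f.write(pattern(i))
    f.flush()
    os.fsync(f.fileno())               # force the data to disk
    f.close()

    f = open(PATH, "rb")
    for i in range(NCHUNKS):
        if f.read(CHUNK) != pattern(i):
            print("corruption in chunk %d" % i)
    f.close()
    os.unlink(PATH)

Note that without O_DIRECT the read-back is likely served from the
page cache, so a loop like this exercises the cache path as much as
the platters -- which is part of why I wonder what it really proves.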
I am genuinely interested in these answers because I wrote a somewhat
similar tool 5-6 years ago to test new storage, simply because I
didn't trust the vendors' burn-in test(s) enough. My scope was a bit
broader: apart from data correctness, I was also checking the
behaviour of FS quota accounting (by creating randomly sized files
with random ownership) and the behaviour of the disk+FS in the face
of fragmentation (by measuring "instantaneous" speed). But I never
saw potential use for other people, mainly because I could not find
answers to the above questions, so I never thought about making it
public... and now it's too late ;-)
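
From memory, the core of it looked something like this (a rough
Python reconstruction of the idea, NOT the original code; the
directory, UIDs and sizes are invented):

    #!/usr/bin/env python
    # Rough reconstruction from memory, NOT the original tool: create
    # randomly sized files under random ownership (to exercise quota
    # accounting) and time every write (a falling trend as the FS
    # fills hints at fragmentation). Needs root for os.chown().
    import os, random, time

    TARGET = "/mnt/teststore"      # hypothetical filesystem under test
    UIDS = [1001, 1002, 1003]      # hypothetical test users
    MAX_SIZE = 256 << 20           # up to 256 MiB per file
    buf = os.urandom(1 << 20)      # 1 MiB of random data, reused

    for n in range(1000):
        size = random.randint(1, MAX_SIZE)
        uid = random.choice(UIDS)
        path = os.path.join(TARGET, "f%06d" % n)
        t0 = time.time()
        f = open(path, "wb")
        left = size
        while left > 0:
            chunk = buf[:min(left, len(buf))]
            f.write(chunk)
            left -= len(chunk)
        os.fsync(f.fileno())
        f.close()
        os.chown(path, uid, uid)   # per-UID quota usage can then be
                                   # compared against the known totals
        dt = max(time.time() - t0, 1e-6)
        print("%s: %d bytes at %.1f MB/s" % (path, size, size / dt / 1e6))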
There is another issue that I could never find a good answer to: how
much testing should a storage device withstand before the testing
itself becomes dangerous or disruptive? The test tool consumes
resources: connections are shared, caches are polluted, heads have to
be moved. For example, for the 1.something GB/s figure mentioned
earlier in this thread, would you accept a halving of the speed while
the data integrity test is running? Or more generally, how much of
the overall performance of the storage system would you be willing to
give up for the benefit of knowing that data can still be written and
then read correctly? And sadly, one piece of data is missing from the
results that Google and others published recently: how much were the
disks seeking (moving heads) during operation? I imagine such data is
hard to get (it should probably come from the disk rather than the
kernel, since the firmware can still reorder requests), but IMHO it
is valuable for those designing multi-user storage systems where
disks move heads frequently to access files belonging to different
users (and therefore spread across the disk) that are used
"simultaneously".
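
One way to at least bound the cost is to give the tester a fixed
bandwidth budget; a toy sketch of what I mean (the 50 MB/s budget is
arbitrary):

    #!/usr/bin/env python
    # Toy illustration of bounding the tester's cost: cap a
    # verification pass at a fixed read bandwidth so the performance
    # taken away from real users is known in advance.
    import time

    BUDGET = 50e6      # bytes/s allowed for the integrity scan
    CHUNK = 1 << 20    # read size per step

    def throttled_scan(path):
        f = open(path, "rb")
        while True:
            t0 = time.time()
            data = f.read(CHUNK)
            if not data:
                break
            # ... verify 'data' against its expected checksum here ...
            min_time = len(data) / BUDGET
            elapsed = time.time() - t0
            if elapsed < min_time:
                time.sleep(min_time - elapsed)  # hand bandwidth back
        f.close()

Of course this only shifts the question: the budget you pick is
exactly the performance you have agreed to give up.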
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De