[Beowulf] Big storage

Loic Tortay tortay at cc.in2p3.fr
Wed Sep 12 03:14:58 PDT 2007

[Sorry, I can't seem to write a non-verbose answer :-(]

According to Michael Will / Sr. Cluster Engineer:
> Iozone is a very good tool to understand the performance of your storage
> for a variety of access patterns, including the impact of cache and RAM.
> It spits out the raw data and in addition allows writing it to a
> spreadsheet file that you can then graph with OpenOffice or Excel in
> order to look at the dimensions you are interested in.
> The trick is to read the documentation and to specify the correct
> parameters for the use case you are interested in ;-)

IOzone is a good general-purpose benchmark with some very interesting
features (AIO, multi-node, etc.).

I used it extensively in the past and have read the documentation
(a few times, thank you :-).

But as Joe points out, IOzone is very cache-friendly and that is not
necessarily useful.
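
One common way to keep a read test from being served out of RAM on a
POSIX system is to flush a freshly written file and then advise the
kernel to drop it from the page cache. The sketch below is only an
illustration of that idea (drop_file_cache() is a hypothetical name);
it is not what IOzone or the tool described later in this message does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Write back a file's dirty pages, then ask the kernel to evict the file
 * from the page cache, so that a following read test has to go to the
 * storage instead of RAM. Returns 0 on success, -1 on failure.
 * posix_fadvise() is only advice: the kernel is free to ignore it. */
int drop_file_cache(int fd)
{
    assert(fd >= 0);
    if (fsync(fd) != 0)        /* dirty pages cannot be evicted before writeback */
        return -1;
    /* offset 0 with length 0 means "up to the end of the file" */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return rc == 0 ? 0 : -1;   /* returns an error number, it does not set errno */
}
```

This only affects the page cache of the machine running the test; a
RAID controller or disk array cache is not touched by it.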

We stopped using it for RFPs (about 4 years ago) because we need an I/O
workload more useful than (for example) "test the global throughput of
N threads all doing sequential writes".

We are more interested in learning how some hardware copes with a
workload similar to our applications' actual I/O workloads than with
something synthetic.

IOzone has, in my opinion, several issues:
 . it's not possible to skip the "rewrite" (or "re-read") test (or at
   least it wasn't last time I checked);
 . it's not possible to (precisely) specify different concurrent I/O
   workloads for different threads (or a sequence of I/O workloads for
   a single thread);
 . last I checked, IOzone in throughput (threaded) mode operates in
   "first done" ("stone-walling") mode, i.e. the first thread that
   reaches the I/O target ("-s 1G" for instance) triggers the end of
   all threads;
 . the results are much less detailed than what we need;
 . it can only access files.

There is a "mixed" workload in IOzone, but (last time I checked) its
actual I/O workload is not precisely defined and it can't be specified.
For instance, we might want to test how some hardware behaves when 80%
of the threads are doing reads and 20% are doing writes, the reads being
random with small blocks and the writes sequential with large blocks
(followed by reads of the files just written).
In other words, we want to be able to specify the I/O workload
completely and precisely for each thread.
As far as I know, you can't do that with IOzone.
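
To make this kind of per-thread specification concrete, here is a
minimal sketch of a workload description in C for the 80%/20% example.
The structure names, the block sizes (4 KiB random reads, 1 MiB
sequential writes) and the 1 GiB per phase are assumptions chosen for
illustration; this is neither IOzone's format nor the format of the
tool mentioned at the end of this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of what one thread does, phase by phase. */
enum io_op { IO_READ, IO_WRITE };
enum io_pattern { IO_SEQUENTIAL, IO_RANDOM };

struct io_phase {
    enum io_op op;
    enum io_pattern pattern;
    size_t block_size;      /* bytes per request */
    size_t total_bytes;     /* bytes to transfer in this phase */
};

#define MAX_PHASES 4
struct thread_workload {
    int nphases;
    struct io_phase phases[MAX_PHASES];   /* executed in order by one thread */
};

/* Fill w[0..nthreads-1]: the first read_pct percent of the threads
 * (rounded down) do small random reads; the others write large
 * sequential blocks and then read back what they wrote.
 * Returns the number of reader threads. */
int build_mixed_workload(struct thread_workload *w, int nthreads, int read_pct)
{
    assert(nthreads > 0 && read_pct >= 0 && read_pct <= 100);
    int nreaders = nthreads * read_pct / 100;
    for (int i = 0; i < nthreads; i++) {
        if (i < nreaders) {
            w[i].nphases = 1;
            w[i].phases[0] = (struct io_phase){ IO_READ, IO_RANDOM, 4 << 10, 1 << 30 };
        } else {
            w[i].nphases = 2;
            w[i].phases[0] = (struct io_phase){ IO_WRITE, IO_SEQUENTIAL, 1 << 20, 1 << 30 };
            w[i].phases[1] = (struct io_phase){ IO_READ, IO_SEQUENTIAL, 1 << 20, 1 << 30 };
        }
    }
    return nreaders;
}
```

For 10 threads and read_pct = 80, threads 0 to 7 get the random-read
workload and threads 8 and 9 the write-then-read sequence.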

The "first done" operation mode is only marginally useful: if, for some
reason, a thread is privileged (for instance, the file it's accessing is
kept entirely in cache at the expense of the files accessed by other
threads), the results will be over-optimistic and almost certainly [...]
If I'm not mistaken, this operation mode can be disabled with IOzone's
'-x' option, but it's usually more useful to have a fixed-duration run,
with some or all threads "looping" to maintain a "constant" I/O load.
Again, as far as I know, this can't be done with IOzone.

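A fixed-duration run can be sketched with POSIX threads: each thread
loops over its own region of a file, and the run ends when the clock
says so rather than when the first thread reaches its target (as with
stone-walling). run_fixed_duration(), struct worker and the 64 KiB
BLOCK size are hypothetical, and the sketch only does writes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

#define BLOCK 65536                /* illustrative request size: 64 KiB */

static atomic_bool stop_flag;      /* set by the controlling thread */

struct worker {
    int fd;                        /* file (or device) this thread writes to */
    off_t region;                  /* the thread loops over [0, region) */
    unsigned long long bytes;      /* bytes written during the run */
    pthread_t tid;
};

static void *worker_main(void *arg)
{
    static const char buf[BLOCK];  /* zero-filled, only ever read */
    struct worker *w = arg;
    off_t off = 0;
    while (!atomic_load(&stop_flag)) {
        ssize_t n = pwrite(w->fd, buf, BLOCK, off);
        if (n <= 0)
            break;                 /* give up on error */
        w->bytes += (unsigned long long)n;
        off += n;
        if (off + BLOCK > w->region)
            off = 0;               /* wrap around to keep the load "constant" */
    }
    return NULL;
}

/* Run n writers for `seconds`, then stop all of them at once: the end of
 * the run is decided by the clock, not by the first thread to finish.
 * Returns the total number of bytes written, or 0 if a thread failed to start. */
unsigned long long run_fixed_duration(struct worker *w, int n, double seconds)
{
    assert(n > 0 && seconds > 0);
    atomic_store(&stop_flag, false);
    int started = 0;
    while (started < n) {
        w[started].bytes = 0;
        if (pthread_create(&w[started].tid, NULL, worker_main, &w[started]) != 0)
            break;
        started++;
    }
    if (started == n) {
        struct timespec ts = { (time_t)seconds,
                               (long)((seconds - (time_t)seconds) * 1e9) };
        while (nanosleep(&ts, &ts) != 0 && errno == EINTR)
            ;                      /* sleep the full duration despite signals */
    }
    atomic_store(&stop_flag, true);
    unsigned long long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(w[i].tid, NULL);
        total += w[i].bytes;
    }
    return started == n ? total : 0;
}
```

Because every thread runs for the same wall-clock time, a thread that
happens to be favoured by the cache simply does more I/O; it does not
cut the measurement short for the others.
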
Our HSM requires raw devices for its disk cache; if we want to test the
hardware in a somewhat useful way, we have to access raw devices, not
files.  As far as I know, you can't do that with IOzone.
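
On Linux, the old raw device bindings (/dev/raw/rawN) are deprecated in
favour of opening the block device itself with O_DIRECT, which bypasses
the page cache but requires the buffer, the file offset and the length
to be suitably aligned. The sketch below assumes 4096-byte alignment;
read_direct(), dio_aligned() and DIO_ALIGN are hypothetical names, and
the device path is left to the caller.

```c
#define _GNU_SOURCE            /* O_DIRECT is a Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment: 4096 covers both 512-byte and 4 KiB logical sector
 * devices. The real requirement can be queried per device (BLKSSZGET ioctl). */
#define DIO_ALIGN 4096

static int dio_aligned(off_t offset, size_t len)
{
    return len > 0 && offset % DIO_ALIGN == 0 && len % DIO_ALIGN == 0;
}

/* Read len bytes at offset from a block device, bypassing the page cache.
 * On success, returns the number of bytes read and stores a DIO_ALIGN-aligned
 * buffer in *out (to be released with free()); on error returns -1 with errno set. */
ssize_t read_direct(const char *dev, off_t offset, size_t len, void **out)
{
    *out = NULL;
    if (!dio_aligned(offset, len)) {
        errno = EINVAL;
        return -1;
    }
    void *buf;
    int rc = posix_memalign(&buf, DIO_ALIGN, len);  /* O_DIRECT needs an aligned buffer */
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        int saved = errno;
        free(buf);
        errno = saved;
        return -1;
    }
    ssize_t n = pread(fd, buf, len, offset);
    int saved = errno;
    close(fd);
    if (n < 0) {
        free(buf);
        errno = saved;
        return -1;
    }
    *out = buf;
    return n;
}
```

Used on a real device node this needs read permission on it (often root
or membership of a "disk" group); the sketch never writes to the device.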

We have limited resources, so we sometimes do a single disk procurement
to satisfy multiple needs: X% of the disk will be used for high-level
storage applications (in our context, for instance, Xrootd, dCache,
etc.) and Y% will be used as disk cache for our HSM.
We can't expect the vendors to install the applications we use to test
how they behave, and we can't afford to support vendors installing and
running the applications, or to send someone (or accept a loan) to do
the tests, since there are just too many vendors and not enough people
(nor time).

The reason we didn't just modify IOzone to meet our needs is that its
source code is quite horrible: there is a single ~500 kB C source file,
with incredible preprocessor spaghetti code and a load of other
problems.

IOzone has an interesting POSIX asynchronous I/O feature, but the
library that implements it has its own shortcomings.  If I'm not
mistaken, it does its own caching and performs async I/O with "busy
waits" rather than asynchronous notifications.
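
The difference between busy-waiting and asynchronous notification can
be illustrated with POSIX AIO: the request below asks for a SIGEV_THREAD
notification, the callback posts a semaphore, and the submitting thread
sleeps in sem_wait() instead of spinning on aio_error(). aio_read_file()
and aio_done() are hypothetical names; this only illustrates the
notification mechanism and is not a description of IOzone's library.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Completion callback, run by the AIO implementation in a separate thread. */
static void aio_done(union sigval sv)
{
    sem_post((sem_t *)sv.sival_ptr);          /* wake up the submitting thread */
}

/* Read len bytes at offset from path with aio_read(), then sleep until the
 * completion notification arrives (no polling loop).
 * Returns the number of bytes read, or -1 with errno set. */
ssize_t aio_read_file(const char *path, void *buf, size_t len, off_t offset)
{
    assert(buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    sem_t done;
    if (sem_init(&done, 0, 0) != 0) {
        close(fd);
        return -1;
    }

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_THREAD;      /* notify through a callback */
    cb.aio_sigevent.sigev_notify_function = aio_done;
    cb.aio_sigevent.sigev_value.sival_ptr = &done;

    ssize_t result = -1;
    int err;
    if (aio_read(&cb) != 0) {
        err = errno;
    } else {
        /* Other work could overlap with the I/O here. */
        while (sem_wait(&done) != 0 && errno == EINTR)
            ;                                         /* retry if a signal interrupted us */
        err = aio_error(&cb);                         /* 0 once completed successfully */
        result = aio_return(&cb);                     /* reap the request exactly once */
        if (err != 0)
            result = -1;
    }
    sem_destroy(&done);
    close(fd);
    if (result < 0)
        errno = err;
    return result;
}
```

With glibc versions older than 2.34, this needs to be linked with -lrt
and -pthread.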

Two of the open source benchmarks closest to our needs are filebench
from Sun and Intel's IOmeter.
Both are quite nice but tend to be far too heavyweight for us (pick
from: built-in graphics generation, graphical front-end, dependency on
large external libraries, compilation on something other than Linux or
Solaris, etc.).  Plus, they can't access raw devices.

We have something lightweight (like IOzone) and useful (like filebench
or IOmeter) that can be sent to vendors before they respond to our RFPs,
so they can test their solution according to *our* performance
criteria.  It is also used as one of the validation tools during the
acceptance process after the procurement has been "awarded".
This tool is far from perfect, but it does match our needs
(unfortunately, it's not open source yet).

| Loïc Tortay <tortay at cc.in2p3.fr> -     IN2P3 Computing Centre     |
