[Beowulf] posting bonnie++ stats from our cluster: any comments about my I/O performance stats?

Joe Landman landman at scalableinformatics.com
Thu Sep 24 19:06:33 PDT 2009


Rahul Nabar wrote:
> I have now run bonnie++ but have trouble figuring out whether my perf.
> stats are up to the mark or not. My original plan was only to estimate
> the IOPS capabilities of my existing storage setup. But then again I am quite

The best way to get IOPS numbers in a "standard" manner is to run a test 
that generates 8k random reads.
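
As a rough yardstick for turning the bandwidth that tools report at that 
block size into IOPS (my back-of-envelope numbers, not from any particular 
run):

	IOPS ~ throughput / block size
	e.g. 40 MB/s of 8k random reads ~ 40,000 KB/s / 8 KB per IO ~ 5,000 IOPS

	for scale, a single 10k RPM SAS drive is good for on the order of
	100-150 random IOPS (roughly 1 / (average seek + half a rotation),
	i.e. about 1 / 7 ms)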

I'd suggest not using bonnie++.  It is, honestly, not that good for HPC 
IO performance measurement.  I have accumulated a long list of caveats 
about it, having used it as a test for a while and looked at it ever more 
closely.

I've found fio (http://freshmeat.net/projects/fio/) to be an excellent 
testing tool for disk systems.  To use it, compile it (requires libaio), 
and then run it as

	fio input.fio
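
(A quick sketch of the build-and-run steps, assuming a source tarball and 
that the libaio development headers are already on the build host; the 
version number below is only a placeholder:)

	tar xzf fio-1.x.tar.gz    # unpack the source tarball
	cd fio-1.x
	make                      # needs the libaio headers to build
	./fio input.fio           # or copy the fio binary somewhere in PATH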

For a nice simple IOP test, try this:

[random]
rw=randread
size=4g
directory=/data
iodepth=32
blocksize=8k
numjobs=16
nrfiles=1
group_reporting
ioengine=sync
loops=1

This job file does 8k random reads (the standard block size for IOP 
measurements) against files in a directory named /data, using an IO depth 
of 32 and standard synchronous unix IO.  There are 16 simultaneous jobs, 
each working on its own 4 GB file, so the aggregate data set is about 
64 GB, which is more than the 16 GB of RAM on these machines can cache.  
fio aggregates the results from all of the jobs into a single report, and 
the whole run is done once.

We use this to model bonnie++ and other types of workloads.  It provides 
a great deal of useful information.
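
As one illustration of that kind of modelling (my own example, not one of 
Rahul's runs), a bonnie++-style large sequential write can be approximated 
with a job file along these lines:

[seqwrite]
rw=write
size=4g
directory=/data
blocksize=1m
numjobs=1
nrfiles=1
group_reporting
ioengine=sync
loops=1

Swap in rw=read for the sequential read phase, or rw=randwrite if you want 
to stress the RAID5 small-write path.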

> ignorant about the finer nuances. Hence I thought maybe I should post
> the stats here, and if anyone has comments I'd very much appreciate
> hearing them. In any case, maybe my stats will help someone else sometime!
> I/O stats on live HPC systems seem hard to find.

It looks like channel bonding isn't helping you much.  Is your server 
channel bonded?  Clients?  Both?
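
(If you are using the Linux bonding driver you can see the bonding mode, 
the slave interfaces and their link state directly; assuming the bond is 
named bond0:

	cat /proc/net/bonding/bond0

and the per-interface byte counters in /proc/net/dev will tell you whether 
traffic is actually being spread across the slaves.)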

> 
> Data posted below. Since this is an NFS store I ran bonnie++ from both
> an NFS client compute node and the server (head node).
> 
> Server side bonnie++
> http://dl.getdropbox.com/u/118481/io_benchmarks/bonnie_op.html
> 
> Client side bonnie++
> http://dl.getdropbox.com/u/118481/io_benchmarks/bonnie_op_node25.html
> 
> 
> Caveat: The cluster was in production so there is a chance of
> externalities affecting my data. (I am trying hard to explain why
> some stats seem better on the client run than on the server run.)
> 
> Subsidiary Goal: This setup had 23 NFS clients. In a new cluster that
> I am setting up we want to scale this up to about 250 clients. Hence I
> want to estimate what sort of performance I should be looking for in
> the storage. (I've found most conversations with vendors pretty
> unproductive, with them weaving vague terms and staying as far away
> from quantitative estimates as possible.)

Heh ... depends on the vendor.  We are pretty open and free with our 
numbers and our test cases (to our current and prospective customers). 
We will shortly be releasing the io-bm code so that people can test 
single and parallel IO, and we will publish our results as we obtain them.

> (Other specs: Gigabit ethernet. RAID5 array of 5 SAS 10k RPM disks in
> total. Total storage ~1.5 TB; both server and client have 16 GB RAM;
> Dell 6248 switches. Port bonding on client servers.)

What RAID adapter and drives?  I am assuming some sort of Dell unit. 
What is the connection from the server to the network ... a single gigabit 
link (a la Rocks clusters), 10 GbE, or channel-bonded gigabit?
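
For what it's worth, a rough back-of-envelope for that class of hardware 
(generic numbers, not a measurement of your array):

	one 10k RPM SAS drive        ~ 140 random IOPS
	random reads, 5-drive RAID5  ~ 5 x 140     ~ 700 IOPS
	random writes, 5-drive RAID5 ~ 5 x 140 / 4 ~ 175 IOPS
	  (each small RAID5 write costs ~4 disk operations: read old data,
	   read old parity, write new data, write new parity)

Which is part of why the RAID adapter and its cache matter for what the 
clients actually see.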


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


