[Beowulf] posting bonnie++ stats from our cluster: any comments about my I/O performance stats?
cousins at umit.maine.edu
Fri Sep 25 15:47:22 PDT 2009
I went through a fair amount of work with this sort of thing (specifying
performance and then getting the vendor to bring it up to expectations
when performance didn't come close) and I was happiest with Bonnie++ in
terms of simplicity of use and the range of stats you get. I haven't kept
up with benchmark tools over the last year though. They are benchmarks, so
as you often hear here: "it all depends". It depends on what sort of
applications you are running and whether you want to tune for IOPS or
throughput, sequential or random access, etc.
First, I'd concentrate on the local (server-side) performance and then,
once that is where you expect it, work on the NFS side.
One thing to try with bonnie++ is to run multiple instances at the same
time. For our tests, a single instance of bonnie++ showed 560 MB/sec
writes and 524 MB/sec reads. Going to 4 instances at the same time brought
it up to an aggregate of ~600 MB/sec writes and ~950 MB/sec reads.
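
A minimal sketch of starting several bonnie++ instances at once, assuming
bonnie++ is on the PATH and each scratch directory already exists. The
helper names, directory layout and labels are hypothetical; only the
-d, -s, -n, -m and -q options are taken from bonnie++ itself. This is not
the script we used, just one way to do it.

```python
import subprocess


def bonnie_commands(n, base_dir, size_mb, label="inst"):
    """Build one bonnie++ command line per instance, each in its own directory."""
    cmds = []
    for i in range(n):
        cmds.append([
            "bonnie++",
            "-d", f"{base_dir}/{label}{i}",  # separate scratch directory per instance (assumed to exist)
            "-s", str(size_mb),              # test file size in MiB
            "-n", "0",                       # skip the small-file creation tests
            "-m", f"{label}{i}",             # label that shows up in the report
            "-q",                            # quiet mode: CSV results on stdout
        ])
    return cmds


def run_concurrently(cmds):
    """Start every command first, then wait for all of them and collect stdout."""
    procs = [subprocess.Popen(c, stdout=subprocess.PIPE, text=True) for c in cmds]
    return [p.communicate()[0] for p in procs]
```

Calling run_concurrently(bonnie_commands(4, "/scratch", 32768)) would start
four instances together. The path and size there are placeholders; pick a
file size well above the machine's RAM so the page cache does not flatter
the numbers, and note that bonnie++ wants -u if started as root.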
One note about bonding/trunking: check it closely to see that it is
working the way you expect. We have a cluster with 14 racks of 20 nodes,
each rack with a 24-port switch at the top. Each of these switches has
four ports trunked together back to the core switch. All nodes have two
GbE ports but only eth0 was being used. It turns out that all eth0 MAC
addresses in this cluster are even. The hashing algorithm on these
switches (HP) only uses the last two bits of the MAC address for a total
of four paths. Since all MACs were even, it went from four choices to two,
so we were only getting half the bandwidth.
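
To make the arithmetic concrete, here is a simplified model of that kind of
hash. It is not the actual HP algorithm, only "take the low two bits of the
MAC and use them as the path number", and the MAC addresses are made up for
illustration.

```python
def trunk_path(mac, n_bits=2):
    """Pick a trunk member from the low n_bits of the MAC (simplified model)."""
    value = int(mac.replace(":", ""), 16)
    return value & ((1 << n_bits) - 1)


def paths_used(macs):
    """Return the sorted list of distinct trunk paths a set of MACs maps to."""
    return sorted({trunk_path(m) for m in macs})


# Hypothetical addresses: the first list is all even, the second is mixed.
even_macs = ["02:00:00:00:00:10", "02:00:00:00:00:12",
             "02:00:00:00:00:14", "02:00:00:00:00:16"]
mixed_macs = ["02:00:00:00:00:10", "02:00:00:00:00:11",
              "02:00:00:00:00:12", "02:00:00:00:00:13"]

print(paths_used(even_macs))   # [0, 2]: only two of the four links carry traffic
print(paths_used(mixed_macs))  # [0, 1, 2, 3]: all four links are used
```

An even MAC always has its lowest bit clear, so only paths 0 and 2 can ever
be chosen, which is the halving described above.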
Once the server has the performance you want, I'd use netcat from a number
of clients at the same time to see if your network is doing what you want.
Use netcat and bypass any disks (writing to /dev/null on the server and
reading from /dev/zero on the client and vice versa) in order to test that
bonding is working. You should be able to fill up the network pipes with
aggregate tests from multiple nodes using netcat.
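
If netcat is not handy, the same memory-to-memory idea can be sketched with
plain Python sockets. To be clear, this swaps netcat for Python and is only
an illustration: the function names are hypothetical, the receiver throws
data away (the /dev/null side) and the sender pushes a buffer of zeros (the
/dev/zero side), so no disk is involved.

```python
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB buffer of zeros, standing in for /dev/zero


def sink(server_sock, result):
    """Accept one connection and discard everything received (the /dev/null side)."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def send_zeros(host, port, total_bytes):
    """Send total_bytes of zeros to host:port and return the elapsed seconds."""
    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n
    return time.perf_counter() - start


def loopback_demo(total_bytes=8 * CHUNK):
    """Run sink and sender on 127.0.0.1 just to show the two halves working together."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    t = threading.Thread(target=sink, args=(server, result))
    t.start()
    elapsed = send_zeros("127.0.0.1", port, total_bytes)
    t.join()
    server.close()
    return result["bytes"], elapsed
```

The loopback run only proves the plumbing. For a real test the sink would
run on the server and send_zeros on several clients at once, and the
aggregate is the sum of bytes divided by the wall-clock time.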
Then, test out NFS. You can do this with netcat or with bonnie++ but again
I'd recommend running it on multiple nodes at the same time.
Good luck. It can be quite a process sorting through it all. I really just
meant to comment on your use of only one instance of Bonnie++ on the
server. Sorry to go beyond the scope of your question. You probably have
already done these other things in a different way.
Rahul Nabar wrote:
> I now ran bonnie++ but have trouble figuring out if my perf. stats are
> up to the mark or not. My original plan was to only estimate the IOPS
> capabilities of my existing storage setup. But then again I am quite
> ignorant about the finer nuances. Hence I thought maybe I should post
> the stats here and if anyone has comments I'd very much appreciate
> hearing them. In any case, maybe my stats help someone else sometime!
> I/O stats on live HPC systems seem hard to find.
> Data posted below. Since this is an NFS store I ran bonnie++ from both
> an NFS client compute node and the server (head node).
> Server side bonnie++
> Client side bonnie++
> Caveat: The cluster was in production so there is a chance of
> externalities affecting my data. (am trying hard to explain why
> some stats seem better on the client run than the server run)
> Subsidiary Goal: This setup had 23 clients for NFS. In a new cluster
> that I am setting up we want to scale this up to about 250 clients. Hence
> want to estimate what sort of performance I'll be looking for in the
> Storage. (I've found most conversations with vendors pretty
> non-productive with them weaving vague terms and staying as far away
> from quantitative estimates as is possible.)
> (Other specs: Gigabit ethernet. RAID5 array of 5 total SAS 10k RPM
> disks. Total storage ~ 1.5 Terabyte; both server and client have 16GB
> RAM; Dell 6248 switches. Port bonding on client servers)