[Beowulf] Computation on the head node
Joe Landman
landman at scalableinformatics.com
Mon May 19 07:59:19 PDT 2008
Jeffrey B. Layton wrote:
> Here comes the $64 question - how do you benchmark the IO portion of your
> code so you can understand whether you need a parallel file system, what
> kind of connection do you need from a client to the storage, etc. This is a
> difficult problem and one in which I have an interest.
Yeah, it is hard. My own view is that it takes a hard look at the
code if you can, and if you can't, we use dstat, atop, vmstat, and other
tools to see whether the IO channel is full.
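
As a rough illustration of what those tools report, here is a minimal
Python sketch that computes the iowait share of CPU time between two
snapshots of /proc/stat on Linux. The function names are hypothetical,
and iowait is only a hint that the IO channel may be saturated, not
proof of it.

```python
import time


def parse_cpu_line(stat_text):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat snapshots."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice]
    # Guest time is already included in user/nice, so sum only the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total


def sample_live(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return iowait_fraction(before, after)
```

Calling sample_live(5.0) while the job runs gives one number per
interval. A persistently high value is a reason to look closer with
dstat or atop, not a conclusion by itself.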
What we do here (if atop/dstat/... suggest that IO is an issue) is to
replicate the runs while providing ever larger pipes to IO, to see if this
ameliorates the problems.
We have found it does for some codes (specific CFD and others).
It is hard in general to do a good IO benchmark, which is why we have
bonnie++ and IOzone. They aren't great, but they do have their domains of
applicability.
I wrote something called IO-bm to help a customer evaluate multiple
streams (reading/writing) to file system(s). I still have to get it
working with MPI-IO (it is an MPI code), but it seems to reflect
specific threaded IO workloads reasonably well. It's good enough to use
for some tuning effort on the underlying system.
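
To make the multiple-stream idea concrete, here is a small Python sketch
that writes several files concurrently and reports aggregate throughput.
This is not IO-bm (which is an MPI code). It is a thread-based stand-in,
and every name and parameter in it is an assumption made for
illustration.

```python
import os
import threading
import time


def write_stream(path, total_bytes, block_size, results, index):
    """Write total_bytes to path in block_size chunks, then fsync."""
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # include the cost of reaching stable storage
    results[index] = (written, time.perf_counter() - start)


def run_streams(directory, n_streams, bytes_per_stream, block_size=1 << 20):
    """Run n_streams concurrent writers; return (bytes, seconds, bytes/sec)."""
    results = [None] * n_streams
    threads = []
    start = time.perf_counter()
    for i in range(n_streams):
        path = os.path.join(directory, f"stream_{i}.dat")
        t = threading.Thread(
            target=write_stream,
            args=(path, bytes_per_stream, block_size, results, i),
        )
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total = sum(r[0] for r in results)
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate
```

For a meaningful number, bytes_per_stream should be well above the
amount of RAM available for page cache, and the run should be repeated
with different stream counts and block sizes to see where throughput
stops scaling.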
> The best way I've found is to look at the IO pattern of your code(s). The
> best
Yup. For this you either need source or a way to profile the IOs ...
> I've found to do this is to run an strace against the code. I've written
> an strace
This does help.
Empirical data is better than none at all; even reconstructed data is
quite helpful.
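
As one way to reconstruct an IO pattern from an strace log, the sketch
below tallies read/write calls and the sizes they returned. It is not
the tool mentioned in the quoted text, just a minimal, hypothetical
example. It assumes the log came from something like
"strace -f -e trace=read,write,pread64,pwrite64 -o trace.txt ./app".

```python
import re
from collections import defaultdict

# Matches lines such as:
#   read(3, "abc"..., 4096) = 4096
#   [pid 42] write(4, "x"..., 65536) = 65536 <0.000120>
#   1234 pread64(3, "..."..., 784, 64) = 784
IO_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(read|write|pread64|pwrite64)\((\d+),.*\)\s*=\s*(-?\d+)"
)


def summarize_strace(lines):
    """Return per-syscall call counts, byte totals, and a size histogram."""
    summary = defaultdict(
        lambda: {"calls": 0, "bytes": 0, "sizes": defaultdict(int)}
    )
    for line in lines:
        m = IO_CALL.match(line.strip())
        if not m:
            continue  # not a traced IO call, or an unfinished/resumed line
        name, ret = m.group(1), int(m.group(3))
        if ret < 0:
            continue  # failed call (e.g. EAGAIN): no data moved
        entry = summary[name]
        entry["calls"] += 1
        entry["bytes"] += ret
        entry["sizes"][ret] += 1
    return summary


def print_summary(summary):
    """Print one line per syscall, plus the three most common sizes."""
    for name, entry in sorted(summary.items()):
        top = sorted(entry["sizes"].items(), key=lambda kv: -kv[1])[:3]
        print(f"{name}: {entry['calls']} calls, {entry['bytes']} bytes, "
              f"common sizes {top}")
```

Many small reads or writes point to a latency-bound pattern, while a
few large ones point to a bandwidth-bound pattern, which is the kind of
distinction that matters when sizing the storage connection.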
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615