[Beowulf] I/O bound simulation

Joe Landman landman at scalableinformatics.com
Fri Nov 30 15:33:40 PST 2007

Hi Mark:

Mark Kozikowski wrote:


> When I approach the higher fidelity levels, the simulation starts
> to choke on the quantity of data being processed.
> It appears that the system is failing on I/O. Transferring large
> amounts of time critical data between process elements.

Could you describe what you mean by "choke" and what you mean by 
"failing on I/O"?  This would help enormously.

Also, could you tell us what

	uname -a

reports, as well as what type of network you are using and, for laughs,
the NIC and switch types (Intel, Broadcom, SMC, ...)?
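Most of that can be collected in one go. Here is a minimal sketch, assuming a Linux node with /sys mounted and a POSIX shell. The report file name is hypothetical, and the switch model still has to be read off the switch itself:

```shell
#!/bin/sh
# Sketch: gather kernel and NIC details into one file that can be pasted
# into a reply. The report file name is a hypothetical choice.
REPORT="sysinfo-$(uname -n).txt"
{
    echo "== uname -a"
    uname -a
    echo "== network interfaces and drivers"
    for dev in /sys/class/net/*; do
        name=$(basename "$dev")
        # Physical NICs have a device/driver symlink; loopback and virtual ones do not.
        if [ -e "$dev/device/driver" ]; then
            driver=$(basename "$(readlink "$dev/device/driver")")
        else
            driver="(virtual)"
        fi
        echo "$name: driver=$driver"
    done
    # lspci is not installed everywhere, so only use it if present.
    if command -v lspci >/dev/null 2>&1; then
        echo "== Ethernet controllers (lspci)"
        lspci | grep -i ethernet
    fi
} > "$REPORT"
```

The driver name (e1000, tg3, and so on) usually identifies the NIC vendor well enough for a first look.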

> I am running on a mostly standard Red Hat distro, no special
> compiling or running architectures are in place.

Is this something you built from source?  Using MPI?

> Do any of you have suggestions as to how I might start
> getting control of this I/O problem?

First is problem identification, which you may have gotten a good start 
on.  It would help to know what I asked above.  Also, it might be 
worth grabbing copies of dstat 
(http://dag.wieers.com/rpm/packages/dstat/) and atop 
(http://dag.wieers.com/rpm/packages/atop/) and installing them.  Dstat is 
your friend: although it makes mistakes in aggregate I/O calculations, 
it is useful for figuring out other relevant information.  Atop is your 
friend on your file server node.

Run atop on the head node, and dstat on the compute nodes while running 
your job.  Try to capture some of this output; a simple cut and paste 
is fine.  If you can show a "choked" versus a "non-choked" run, that would 
help immensely with the diagnosis.
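One way to capture comparable samples is a small script run on every node while the job is going. This is a minimal sketch, assuming dstat and atop are already installed. TAG, ROLE, INTERVAL and COUNT are hypothetical knobs for this example, not options of either tool:

```shell
#!/bin/sh
# Sketch: sample node activity while the simulation runs, so that a "choked"
# run can later be compared with a "non-choked" one.
# TAG, ROLE, INTERVAL and COUNT are hypothetical knobs chosen for this example.
TAG=${TAG:-run}            # label for this run, e.g. TAG=choked or TAG=ok
ROLE=${ROLE:-compute}      # "head" on the head/file server node, else "compute"
INTERVAL=${INTERVAL:-5}    # seconds between samples
COUNT=${COUNT:-60}         # number of samples (60 x 5 s = 5 minutes)
HOST=$(uname -n)

if [ "$ROLE" = head ]; then
    TOOL=atop
    OUT="atop-$TAG-$HOST.raw"
else
    TOOL=dstat
    OUT="dstat-$TAG-$HOST.csv"
fi

if ! command -v "$TOOL" >/dev/null 2>&1; then
    echo "$TOOL not found on $HOST; install it first" >&2
elif [ "$TOOL" = atop ]; then
    # -w writes raw samples to a file; replay them later with: atop -r FILE
    atop -w "$OUT" "$INTERVAL" "$COUNT"
else
    # cpu, disk, net, paging and system (interrupts, context switches)
    # columns, shown on screen and also written to a CSV file.
    dstat -cdngy --output "$OUT" "$INTERVAL" "$COUNT"
fi
```

For example, with the script saved under the hypothetical name capture.sh, start `TAG=choked sh capture.sh` on each compute node and `ROLE=head TAG=choked sh capture.sh` on the head node just before launching the job, then repeat with TAG=ok at a fidelity level that runs cleanly. Keeping the same interval and sample count for both runs makes the files directly comparable.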

Once we are sure where the point of pain is, the next step would be 
planning how to remediate it.


Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
