[Beowulf] I/O workload of an application in distributed file system

Mark Hahn hahn at mcmaster.ca
Thu Nov 22 07:15:25 PST 2007

> system (eg. database, webserver and so on). But I want to find more
> info on distributed file systems (eg. checkpoint read/write).

our experience with filesystems is that you can model checkpoints 
as large, multithreaded, sequential IO.  but while that may be an 
important IO mode, it's not the dominant one - most of our IO
is almost certainly smallish, metadataish stuff.  users compiling,
doing 'ls' over and over waiting for their job to produce output, etc.

with that in mind, my opinion is that cluster IO testing should be
a combination of:
 	- parallel streaming IO to separate files - resembling a checkpoint,
 	or an IO-intensive app reading, or an app where the user forgot to
 	turn off debugging.
 	- smallish metadata-heavy traffic like time(tar zxf;make;make clean).
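
a rough sketch of the two workload types above, in Python. everything here is an assumption made for illustration: the function names (stream_write, parallel_streaming_test, metadata_test), the file names, and the sizes are hypothetical, and threads on one node only stand in for what would really be many clients hitting the shared filesystem at once. the metadata loop uses create/stat/unlink as a stand-in for the untar-and-build pattern rather than running tar and make.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def stream_write(path, total_bytes, block_size=1 << 20):
    """Write total_bytes sequentially to path in block_size chunks."""
    block = b"\0" * block_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data out of the page cache
    return written


def parallel_streaming_test(directory, writers=4, bytes_per_writer=8 << 20):
    """Checkpoint-like load: each writer streams to its own file.

    Returns (total_bytes_written, elapsed_seconds).
    """
    paths = [os.path.join(directory, f"ckpt.{i}") for i in range(writers)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=writers) as pool:
        total = sum(pool.map(lambda p: stream_write(p, bytes_per_writer), paths))
    elapsed = time.perf_counter() - start
    for p in paths:
        os.unlink(p)  # cleanup is not part of the timed region
    return total, elapsed


def metadata_test(directory, files=200):
    """Metadata-heavy load: create, stat and unlink many tiny files.

    Returns (operation_count, elapsed_seconds).
    """
    names = [os.path.join(directory, f"small.{i}") for i in range(files)]
    ops = 0
    start = time.perf_counter()
    for name in names:
        with open(name, "w") as f:
            f.write("x")
        ops += 1
    for name in names:
        os.stat(name)
        ops += 1
    for name in names:
        os.unlink(name)
        ops += 1
    return ops, time.perf_counter() - start


if __name__ == "__main__":
    # Point this at a directory on the filesystem under test;
    # a temporary directory is used here only so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        nbytes, secs = parallel_streaming_test(d)
        print(f"streaming: {nbytes / max(secs, 1e-9) / 1e6:.1f} MB/s")
        ops, secs = metadata_test(d)
        print(f"metadata: {ops / max(secs, 1e-9):.0f} ops/s")
```

to get numbers that mean anything, the directory should live on the distributed filesystem, the per-writer size should be well beyond client cache, and both tests should be launched from several nodes at the same time.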
