[Beowulf] I/O workload of an application in distributed file system
Robert Latham robl at mcs.anl.gov
Mon Nov 26 08:06:03 PST 2007
On Thu, Nov 22, 2007 at 10:15:25AM -0500, Mark Hahn wrote:
> with that in mind, my opinion is that cluster IO testing should be
> a combination of:
> - parallel streaming IO to separate files - resembling a checkpoint,
> or an IO-intensive app reading, or an app where the user forgot to
> turn off debugging.
> - smallish metadata-heavy traffic like time(tar zxf;make;make clean).

The word 'distributed' in the subject is telling... I like to make a distinction between 'distributed', 'cluster', and 'parallel' file systems.

Distributed: uncoordinated access among processes, possibly over the wide area. Total capacity is important, but performance is not.

Cluster: local access only. Possibly homedir-style accesses (lots of metadata operations and lots of small-file creation, reading, and writing: unpacking a tarball, compiling a kernel). Also has uncoordinated access among many processes.

Parallel: a high-performance file system for parallel applications doing large amounts of I/O. Coordinated access, likely via MPI-IO.

This is veering a bit off topic from the original question... I'd like to suggest that I/O to separate files, while certainly a popular I/O workload, should be considered a legacy workload, or at the very least not a high-performance one. Applications should be encouraged, if at all possible, to do their I/O to a single large file. Supercomputer applications, further, should do all their I/O through either MPI-IO or a high-level library built on top of MPI-IO (parallel-HDF5, parallel-NetCDF, etc.). Lots of files complicate the data management problem and eliminate several optimization opportunities for the I/O software stack.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
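The "single large file" pattern Rob recommends, as opposed to one file per process, can be illustrated with a minimal sketch. It assumes every process contributes one fixed-size block, and it simulates the "ranks" sequentially inside a single Python process using POSIX `os.pwrite` (Unix only). The function names `rank_offset`, `write_shared_checkpoint`, and `read_rank_block` are hypothetical. In a real MPI code, each process would open the one file with `MPI_File_open` and write its block with a collective call such as `MPI_File_write_at_all`, leaving the MPI-IO layer free to apply the optimizations that many separate files rule out.

```python
import os
import tempfile
from typing import Sequence


def rank_offset(rank: int, block_size: int) -> int:
    """Byte offset of a rank's contiguous block in the shared file (hypothetical layout)."""
    return rank * block_size


def write_shared_checkpoint(path: str, blocks: Sequence[bytes]) -> None:
    """Write every simulated rank's block into ONE file at a disjoint offset.

    Sketch only: ranks are simulated in one process; a real code would have
    each MPI process issue its own (ideally collective) write.
    """
    block_size = len(blocks[0])
    if any(len(b) != block_size for b in blocks):
        raise ValueError("this sketch assumes equal-sized blocks")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Write in reverse rank order to show that explicit offsets make the
        # final layout independent of which rank happens to write first.
        for rank in reversed(range(len(blocks))):
            data = blocks[rank]
            written = os.pwrite(fd, data, rank_offset(rank, block_size))
            if written != len(data):
                raise OSError(f"short write for rank {rank}")
    finally:
        os.close(fd)


def read_rank_block(path: str, rank: int, block_size: int) -> bytes:
    """Read back one rank's block from the shared file."""
    with open(path, "rb") as f:
        f.seek(rank_offset(rank, block_size))
        return f.read(block_size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.dat")
        blocks = [bytes([r]) * 4 for r in range(3)]  # 3 "ranks", 4 bytes each
        write_shared_checkpoint(path, blocks)
        print(read_rank_block(path, 1, 4))  # b'\x01\x01\x01\x01'
```

Because each block's location is just `rank * block_size`, a restart or analysis tool can find any process's data in the one file without walking a directory of per-process files, which is the data-management point made above.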
