[Beowulf] integrating node disks into a cluster filesystem?

Joshua Baker-LePain jlb17 at duke.edu
Fri Sep 25 15:32:36 PDT 2009


On Fri, 25 Sep 2009 at 6:09pm, Mark Hahn wrote

> but since 1U nodes are still the most common HPC building block, and most of 
> them support 4 LFF SATA disks with very little added cost (esp using the 
> chipset's integrated controller), is there a way to integrate them into a 
> whole-cluster filesystem?

This is something I've considered/toyed-with/lusted after for a long 
while.  I haven't pursued it as much as I could have because the clusters 
I've run to this point have generally run embarrassingly parallel jobs, 
and I train the users to cache data-in-progress to scratch space on the 
nodes.  But there's a definite draw to a single global scratch space that 
scales automatically with the cluster itself.

> - obviously want to minimize the interference of remote IO to a node's jobs.
>  for serial jobs, this is almost moot.  for loosely-coupled parallel jobs
>  (whether threaded or cross-node), this is probably non-critical.  even for
>  tight-coupled jobs, perhaps it would be enough to reserve a core for
>  admin/filesystem overhead.

I'd also strongly consider a separate network for filesystem I/O.
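
And for the reserve-a-core idea in your list above, the cheap way to try
it is to pin whatever storage daemon you end up running to a core of its
own.  A rough sketch of what I mean (assuming Linux, and that you control
how the daemon gets launched -- the wrapper and the glusterfsd placeholder
below are just for illustration, not a recommendation):

/* pin_io_daemon.c -- rough sketch: confine a storage/FS daemon (and its
 * children) to core 0 so the remaining cores stay free for compute jobs.
 * Assumes Linux and that you launch the daemon through this wrapper, e.g.
 *   ./pin_io_daemon /usr/sbin/glusterfsd ...args...
 * (glusterfsd is just a placeholder -- any server process would do.)
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <daemon> [args...]\n", argv[0]);
        return 1;
    }

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);               /* core 0 reserved for admin/FS overhead */

    /* affinity is inherited across exec(), so the daemon stays on core 0 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    execvp(argv[1], &argv[1]);
    perror("execvp");
    return 1;
}

In practice "taskset -c 0" in the daemon's init script gets you the same
thing; the point is just that the filesystem service never competes with
the job's ranks for their cores.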

> - distributed filesystem (ceph?  gluster?  please post any experience!)  I
>  know it's possible to run oss+ost services on a lustre client, but not
>  recommended because of the deadlock issue.

I played with PVFS1 a bit back in the day.  My impression at the time was 
that they were focused on MPI-IO, and the POSIX layer was a bit of an 
afterthought -- access with "regular" tools (tar, cp, etc.) was pretty 
slow.  I don't know what the situation is with PVFS2.  Anyone?
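
FWIW, the access pattern PVFS was happiest with looked roughly like the
sketch below -- everything through MPI_File_*, nothing through the kernel.
(The pvfs2: prefix is, if I remember right, the ROMIO convention for
forcing the driver; the path is made up.)

/* mpiio_write.c -- minimal sketch of the kind of access PVFS was built
 * for: every rank writes its own block of a shared file through MPI-IO,
 * never touching the POSIX layer.  Compile with mpicc, run with mpirun.
 */
#include <mpi.h>

#define NINTS 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[NINTS];
    for (int i = 0; i < NINTS; i++)
        buf[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/pvfs/scratch/testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* each rank writes a contiguous, rank-offset block -- a collective
     * call, so the MPI-IO layer can coalesce the I/O across ranks */
    MPI_Offset off = (MPI_Offset)rank * sizeof(buf);
    MPI_File_write_at_all(fh, off, buf, NINTS, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

tar and cp, of course, go through the kernel VFS and never see any of
that, which is presumably why they crawled.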

> - this is certainly related to more focused systems like google/mapreduce.
>  but I'm mainly looking for more general-purpose clusters - the space would
>  be used for normal files, and definitely mixed read/write with something
>  close to normal POSIX semantics...

It seems we're after the same thing.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


