[Beowulf] Striped file system with RAM disk.

Julien Leduc julien.leduc at lri.fr
Thu Nov 22 08:29:16 PST 2007


Alan Louis Scheinine wrote:
> 
>    There is a particular kind of application, single-client
> and serial process, for which a striped file system using
> RAM disk would be very useful.  Consider reading small
> blocks at random locations on a hard disk.  The latency
> of the HDD could be a few milliseconds.  Adding more HDDs
> does not solve the problem, unlike an application based on
> streaming.  Adding more disks and parallelizing the program
> could be a solution but sometimes there is no time
> to parallelize the program.
> 
>    A possible solution is RAM disk.  But if we put, for example,
> 64 GB of RAM on a single computer then that computer becomes
> specialized and expensive, whereas the need for a huge
> amount of RAM may be only temporary.  An alternative is to
> use a cluster of nodes, a typical Beowulf cluster.  For example,
> using a striped file system over 16 nodes where each node has 4 GB
> of RAM.  Each node would have a normal amount of RAM and yet
> could provide the aggregate storage of 64 GB when the need arises.
> While we have not yet created this configuration, I suppose
> that Gbit Ethernet could provide 100 microsecond latency and
> Infiniband or Myrinet could provide 10 microsecond latency.
> Much, much less than the seek time of a HDD.
> 
>    The idea is so simple that I imagine it has already been done.
> I would be interested in learning from other sites that have
> used this method with a file system such as Lustre, PVFS2 or
> another.

I have played with this kind of idea, just for fun, using a filesystem
backed by tmpfs (so, a block device held in RAM), exported via AoE
(ATA over Ethernet) and assembled into a RAID0 array on a single node.
The idea was to export some large blocks from the /tmp partitions via
AoE over gigabit Ethernet, aggregate them into a RAID0 array on another
node, and then export this large FS over NFS on top of 10G Myrinet.
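For the archives, the setup above can be sketched roughly as follows.
This is only an illustrative outline, not the exact commands I ran:
device names, sizes, shelf/slot numbers, and mount points are made up,
and it assumes vblade, aoe-tools, mdadm and an NFS server are installed.

```shell
#!/bin/sh
# --- On each of the 16 storage nodes (shelf 0, slot = node id) ---
# Carve a RAM-backed block out of tmpfs and export it over AoE on eth0.
mount -t tmpfs -o size=3500m tmpfs /mnt/ramdisk
dd if=/dev/zero of=/mnt/ramdisk/blk bs=1M count=3500
vbladed 0 "$NODE_ID" eth0 /mnt/ramdisk/blk   # AoE target e0.$NODE_ID

# --- On the aggregating node ---
# Discover the AoE targets and stripe them into one RAID0 device.
modprobe aoe
aoe-discover
mdadm --create /dev/md0 --level=0 --raid-devices=16 /dev/etherd/e0.*

# Put a filesystem on the stripe and re-export it over NFS
# (clients would mount it through the Myrinet interface).
mkfs.ext3 /dev/md0
mkdir -p /export/ramfs
mount /dev/md0 /export/ramfs
echo "/export/ramfs *(rw,async,no_root_squash)" >> /etc/exports
exportfs -ra
```

Note that with RAID0 over RAM disks the whole aggregate FS is lost if
any single node reboots, which is acceptable here since it is meant as
scratch space.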

The purpose of this set of experiments could look strange, but it was
designed to offer an easy-to-deploy, easy-to-use, large filesystem with
efficient throughput for a small set of NFS clients, rather than relying
on a single central FS, for specific jobs requiring shared storage
capacity that is independent of the load on the cluster and larger than
the local /tmp.

Lustre or PVFS2 would be easier to export but do not fit well with the
platform I use: the AoE approach can be interesting when aggregating
storage resources across a cluster.


Best regards,
-- Julien



