[Beowulf] Suggestions to what DFS to use
deadline at eadline.org
Mon Feb 13 18:00:17 PST 2017
> Hi guys,
> So, we're running a small (as in a small number of nodes (10), not
> storage (170TB)) Hadoop cluster here. Right now we're on IBM Spectrum
> Scale (GPFS), which works fine and has POSIX support. On top of GPFS we
> have a GPFS transparency connector so that HDFS uses GPFS.
> Now, if I'd like to replace GPFS with something else, what should I use?
> It needs to be a fault-tolerant DFS, with POSIX support (so that users
> can move data to and from it with standard tools).
HDFS does have an NFSv3 gateway, which lets users move
data around in a familiar fashion (without the "hdfs dfs -put" and "-get" commands).
If you need to use HDFS for big-block local streaming performance,
that feature can be useful. If you are doing Spark or MapReduce (MR), where
data locality is important, then HDFS is a low-cost alternative
to other file systems. Plus, if you use something like
Ambari/Hortonworks, the management is somewhat integrated
in the web GUI. (The Hortonworks distribution is open source and RPM-based.)
If you don't care about locality, then another file system
As an aside, having done a handful of Hadoop/Spark workshops
in the last year, I have found the single most difficult
aspect of Hadoop/HDFS and Spark on Hadoop/HDFS is understanding
the "remote" or non-local aspect of HDFS, i.e. the fact that
a copy of the data must be loaded into HDFS before it
can be used. The NFS gateway helps because files can
be seen in a user's local file system. But I digress ...
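
To make the contrast concrete, here is a minimal Python sketch of the two
ways to get a file into HDFS: the explicit "hdfs dfs -put" step, and an
ordinary file copy onto a mounted NFSv3 gateway. The host name, mount point,
and paths are hypothetical, and the mount options are the ones I recall from
the Apache Hadoop NFS gateway documentation, so check them against your
Hadoop version before relying on them.

```python
import shutil
from pathlib import Path


def hdfs_put_command(local_path, hdfs_path):
    """Argument list for the explicit copy-in step: hdfs dfs -put <local> <dst>."""
    return ["hdfs", "dfs", "-put", str(local_path), str(hdfs_path)]


def nfs_mount_command(gateway_host, mount_point):
    """Argument list for mounting the gateway's export (normally run as root).

    The option string is an assumption taken from the Hadoop NFS gateway docs.
    """
    return ["mount", "-t", "nfs", "-o", "vers=3,proto=tcp,nolock,noacl,sync",
            f"{gateway_host}:/", str(mount_point)]


def copy_via_mount(local_path, mount_point, hdfs_path):
    """Copy a file into HDFS through the mounted gateway using plain file I/O.

    mount_point is wherever the gateway export is mounted (hypothetical here).
    """
    target = Path(mount_point) / str(hdfs_path).lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(local_path, target)  # sequential write, no metadata copy
    return target
```

With the gateway mounted at, say, /mnt/hdfs, copy_via_mount("data.csv",
"/mnt/hdfs", "/user/alice/data.csv") does what the hdfs_put_command argument
list would do when passed to subprocess.run, but users can do the same thing
with cp, rsync, or a file manager. Either way the data still has to land in
HDFS before a Hadoop or Spark job can read it.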
> I've looked at MooseFS which seems to be able to do the trick, but are
> there any others that might do?
> Best regards,
> Tony Albers
> Systems administrator, IT-development
> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
> Tel: +45 2566 2383 / +45 8946 2316