[Beowulf] Suggestions to what DFS to use
Tony Brian Albers
tba at kb.dk
Tue Feb 14 05:14:08 PST 2017
On 2017-02-14 03:00, Douglas Eadline wrote:
>> Hi guys,
>> So, we're running a small (as in a small number of nodes (10), not
>> storage (170 TB)) Hadoop cluster here. Right now we're on IBM Spectrum
>> Scale (GPFS), which works fine and has POSIX support. On top of GPFS we
>> have a GPFS transparency connector so that HDFS uses GPFS.
>> Now, if I'd like to replace GPFS with something else, what should I use?
>> It needs to be a fault-tolerant DFS with POSIX support (so that users
>> can move data to and from it with standard tools).
> HDFS does have an NFSv3 gateway, which helps users move
> data around in a familiar fashion (without the -put and -get commands).
> If you need to use HDFS for big-block local streaming performance,
> that feature can be useful. If you are doing Spark or MR where data
> locality is important, then HDFS is a low-cost alternative
> to other file systems. Plus, if you use something like
> Ambari/Hortonworks, the management is somewhat integrated
> in the web GUI. (Hortonworks is open source and RPM-based.)
> If you don't care about locality, then another file system
> will work.
> As an aside, having done a handful of Hadoop/Spark workshops
> in the last year, I have found the single most difficult
> aspect of Hadoop/HDFS and Spark on Hadoop/HDFS is understanding
> the "remote" or non-local aspect of HDFS, i.e. the fact that
> a copy of the data must be loaded into HDFS before it
> can be used. The NFS gateway helps because files can
> be seen in a user's local file system. But I digress ...
>> I've looked at MooseFS which seems to be able to do the trick, but are
>> there any others that might do?
>> Best regards,
>> Tony Albers
>> Systems administrator, IT-development
>> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
>> Tel: +45 2566 2383 / +45 8946 2316
Some very good points there. No doubt the NFS gateway can be useful.
But the NFS gateway by itself is not enough for our purposes.
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316