[Beowulf] PetaBytes on a budget, take 2

Marian Marinov mm at yuhu.biz
Thu Jul 21 14:58:16 PDT 2011


On Thursday 21 July 2011 21:55:30 Ellis H. Wilson III wrote:
> On 07/21/11 14:29, Greg Lindahl wrote:
> > On Thu, Jul 21, 2011 at 12:28:00PM -0400, Ellis H. Wilson III wrote:
> >>  For traditional Beowulfers, spending a year or two developing custom
> >> 
> >> software just to manage big data is likely not worth it.
> > 
> > There are many open-souce packages for big data, HDFS being one
> > file-oriented example in the Hadoop family. While they generally don't
> > have the features you'd want for running with HPC programs, they do
> > have sufficient features to do things like backups.
> 
> I'm actually doing a bunch of work with Hadoop right now, so it's funny
> you mention it.  My experience with and understanding of Hadoop/HDFS is
> that it is really more geared towards actually doing something with the
> data once you have it on storage, which is why it's based off of google
> fs (and undoubtedly why you mention it, being in the search arena
> yourself).  As purely a backup solution it would be particularly clunky,
> especially in a setup like this one where there's a high HDD to CPU ratio.
> 
> My personal experience with getting large amounts of data from local
> storage to HDFS has been suboptimal compared to something more raw, but
> perhaps I'm doing something wrong.  Do you know of any distributed
> file-systems that are geared towards high-sequential-performance and
> resilient backup/restore?  I think even for HPC (checkpoints), there's a
> pretty good desire to be able to push massive data down and get it back
> over wide pipes.  Perhaps pNFS will fill this need?
> 

I think that GlusterFS would fit that role well. HDFS is actually a very 
poor choice for this kind of storage because its performance is not good.
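As a rough illustration of the GlusterFS approach (the hostnames `node1`..`node3`, the brick path `/data/brick1`, and the volume name `backupvol` are all made-up placeholders, not anything from the original setup), a simple distributed volume across a few storage nodes looks like this:

```shell
# Join the storage nodes into one trusted pool (run from node1)
gluster peer probe node2
gluster peer probe node3

# Create a distributed volume spanning one brick per node
gluster volume create backupvol \
    node1:/data/brick1 node2:/data/brick1 node3:/data/brick1
gluster volume start backupvol

# Mount it on a client through the FUSE client
mount -t glusterfs node1:/backupvol /mnt/backups
```

A plain distributed volume adds no redundancy of its own; for backups you would typically rely on the underlying RAID6 arrays, or use a `replica` volume instead.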

The article explains that they compared JFS, XFS and Ext4. When I was 
designing my backup solution I also compared those three, and GlusterFS on 
top of them.
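The original post doesn't describe the exact benchmark methodology, but a minimal sequential-throughput comparison of candidate filesystems could look like the sketch below (`/mnt/ext4` etc. are hypothetical mount points, one per filesystem under test):

```shell
# Rough sequential write/read test per filesystem (run as root)
for fs in ext4 xfs jfs; do
    echo "== $fs =="
    # Write 4 GiB; conv=fdatasync forces data to disk before dd reports speed
    dd if=/dev/zero of=/mnt/$fs/testfile bs=1M count=4096 conv=fdatasync
    # Drop the page cache so the read test hits the disks
    sync && echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/$fs/testfile of=/dev/null bs=1M
    rm /mnt/$fs/testfile
done
```

Sequential throughput is the metric that matters most for a backup target like this; a tool such as `fio` would give more control if mixed workloads also need testing.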

I also concluded that Ext4 was the way to go. Combined with LVM, or with 
software to handle the hardware failures, it actually proves to be quite 
suitable for backups. In our tests the performance of Ext4 was far better 
than that of JFS and XFS; we also tested Ext3 but abandoned it.
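A sketch of how LVM could sit between the RAID arrays and Ext4 in a setup like this, assuming the RAID6 arrays show up as `/dev/md0` and `/dev/md1` (the device names, volume names, and sizes are illustrative assumptions):

```shell
# Pool two RAID6 arrays into a single volume group
pvcreate /dev/md0 /dev/md1
vgcreate backup_vg /dev/md0 /dev/md1

# Carve out a logical volume and put Ext4 on it;
# leaving free extents in the VG makes later growth easier
lvcreate -L 30T -n backup_lv backup_vg
mkfs.ext4 /dev/backup_vg/backup_lv
mount /dev/backup_vg/backup_lv /backups
```

The point of the LVM layer is flexibility: a failed or replaced array can be swapped with `pvmove`/`vgreduce`, and the filesystem can be grown online with `lvextend` and `resize2fs`.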

However, I'm not sure that this kind of storage is very good for anything 
other than backups. I believe that heavier random I/O may kill both the 
performance and the hardware of such systems. If you are doing only backups 
on these drives and you keep hot spares on the controller, a triple failure 
is quite hard to achieve. And even in that situation you lose only a single 
RAID6 array, not the whole storage node.
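To make the failure math concrete, here is a hypothetical software-RAID equivalent of that layout (the devices `/dev/sd[b-n]` are placeholders): a 12-disk RAID6 array plus one hot spare, so the array survives any two simultaneous drive failures, and a third drive would have to die before the spare finishes rebuilding for data to be lost:

```shell
# 12-disk RAID6 with one hot spare (13 member devices total)
mdadm --create /dev/md0 --level=6 \
    --raid-devices=12 --spare-devices=1 /dev/sd[b-n]

# Check array health and rebuild progress
cat /proc/mdstat
mdadm --detail /dev/md0
```

The same principle applies to a hardware controller with designated hot spares, as described above; mdadm is just the easiest way to show the layout.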

My servers currently have 34 TB of capacity each, and what these guys have 
shown me is how I can rearrange my current hardware and double the capacity 
of the backups. So I'm extremely happy that they shared this with the world.



-- 
Best regards,
Marian Marinov
CEO of 1H Ltd.