[Beowulf] A petabyte of objects

Ellis H. Wilson III ellis at cse.psu.edu
Tue Nov 13 21:17:53 PST 2012

On 11/13/12 19:00, Bill Broadley wrote:
> If you need an object store and not a file system I'd consider hadoop.

Eeek -- .5MB to 10MB files are anathema to Hadoop.  As much as I 
love Hadoop, there's a tool for every job and I'm not sure this one 
quite fits for those file sizes.  If you had a decent chunk of larger 
files (i.e. > 64MB at the very least, ideally like 1GB files on 
average), Hadoop might work.
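
A rough sketch of why small files hurt, assuming the cause is HDFS NameNode
metadata overhead (one commonly cited reason, not the only one).  The
150-bytes-per-object figure is the often-quoted rule of thumb, not a
measurement, and the 64MB block size and function names are assumptions made
for illustration:

```python
# Back-of-envelope estimate of NameNode heap needed to track 1 PiB of files.
# All constants below are illustrative assumptions, not measured values.

PIB = 2**50
MIB = 2**20
GIB = 2**30

BLOCK_SIZE = 64 * MIB            # assumed HDFS block size
BYTES_PER_NAMENODE_OBJECT = 150  # often-quoted rule of thumb (assumption)


def file_count(total_bytes, avg_file_bytes):
    """Number of files of the given average size needed to hold total_bytes."""
    return total_bytes // avg_file_bytes


def objects_per_file(avg_file_bytes, block_size=BLOCK_SIZE):
    """One file object plus one object per block (ceiling division)."""
    blocks = -(-avg_file_bytes // block_size)
    return 1 + blocks


def namenode_heap_gib(n_files, objs_per_file):
    """Approximate NameNode heap in GiB for n_files."""
    return n_files * objs_per_file * BYTES_PER_NAMENODE_OBJECT / GIB


for label, size in [("0.5 MiB", MIB // 2), ("10 MiB", 10 * MIB), ("1 GiB", GIB)]:
    n = file_count(PIB, size)
    heap = namenode_heap_gib(n, objects_per_file(size))
    print(f"{label:>8} files: {n:>13,d} files, ~{heap:,.1f} GiB of NameNode heap")
```

Under these assumptions the small-file end of the range needs hundreds of GiB
of metadata in memory, while 1 GiB files need only a few.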

The specific use of the file system seems particularly relevant to this 
discussion, so if you can pin down more concretely how your storage will 
actually be used, we'll probably have a better idea of what to suggest.

IMHO, it's not the storage of that size of data annually that makes this 
a hard problem -- it's what you want to do with it (and how fast).  If 
you never want to look at it again, and you're receiving that 1PB over 
the duration of the year in a steady fashion, you'll note that this 
boils down to around 34MB/s.  Pretty easy for any parallel file system 
(or really, even a slow individual HDD, provided you just continued onto 
the next one once you filled the current one).  This becomes interesting 
if you need to handle big bursts of writes, big bursts of reads, reads 
of the whole (or large portions of the) data set, etc, etc.
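
The steady-state figure above can be checked with a few lines.  This is only a
sketch: it takes 1PB as 2^50 bytes and a 365-day year, and the 3TB drive size
is a hypothetical value chosen for illustration:

```python
# Sustained ingest rate for 1 PiB arriving evenly over one year,
# and how long a single drive lasts at that rate.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000
PIB = 2**50
MIB = 2**20


def sustained_rate_bytes_s(total_bytes, seconds=SECONDS_PER_YEAR):
    """Average bytes per second needed to absorb total_bytes in `seconds`."""
    return total_bytes / seconds


def days_to_fill_drive(drive_bytes, rate_bytes_s):
    """Days until a drive of drive_bytes is full at a steady write rate."""
    return drive_bytes / rate_bytes_s / 86400


rate = sustained_rate_bytes_s(PIB)
print(f"steady ingest: {rate / MIB:.1f} MiB/s")

drive = 3 * 10**12  # hypothetical 3TB drive
print(f"one 3TB drive fills in about {days_to_fill_drive(drive, rate):.2f} days")
```

So the steady case amounts to filling roughly one such drive per day; the
bursty read and write cases are where the real sizing work lies.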

Again, knowing what you need will help us a lot here.



