[Beowulf] Storage - the end of RAID?

Greg Lindahl lindahl at pbm.com
Fri Oct 29 12:48:29 PDT 2010


On Fri, Oct 29, 2010 at 03:02:45PM -0400, Ellis H. Wilson III wrote:

> I think it's making a pretty wild assumption to say search engines and  
> HPC have the same I/O needs (and thus can use the same I/O setups).

Well, I'm an HPC guy doing infrastructure for a search engine, so I'm
not assuming much. And I didn't say the setup would be the same --
just that Lustre/PVFS would probably be more reliable and higher
performance if they stored copies on multiple servers instead of using
local or SAN RAID. (Or did they implement this while I wasn't looking?)

> Also, I'd be blown away if Blekko wasn't doing its own
> striping/redundancy - even if they aren't using RAID 0 or 1 by the book,
> they probably are using the same concepts (though hand-spun for search
> engine needs).

We do the usual thing: store 3 copies on 3 different servers, locality
picked such that a single network or power failure won't take out more
than 1 copy. Since we are very concerned about transfer rates, it's
well worth buying more disks because the read speed increases.
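
As an illustration only, the placement rule above (3 copies, no two sharing a
network or power failure domain) could be sketched roughly as follows. The
Server fields, the place_replicas name, and the greedy strategy are all
assumptions made for this sketch, not a description of any real system. A
greedy pass can also miss a valid placement that a smarter search would find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    switch: str   # hypothetical network failure domain
    circuit: str  # hypothetical power failure domain


def place_replicas(servers, copies=3):
    """Greedily pick `copies` servers with no shared switch or power circuit."""
    chosen = []
    used_switches, used_circuits = set(), set()
    for server in servers:
        # Skip any server that shares a failure domain with one already chosen.
        if server.switch in used_switches or server.circuit in used_circuits:
            continue
        chosen.append(server)
        used_switches.add(server.switch)
        used_circuits.add(server.circuit)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough independent failure domains")


# Made-up inventory for the example.
servers = [
    Server("a", "sw1", "p1"),
    Server("b", "sw1", "p2"),
    Server("c", "sw2", "p1"),
    Server("d", "sw2", "p2"),
    Server("e", "sw3", "p3"),
]
picked = place_replicas(servers)
print([s.name for s in picked])
```

With this inventory, losing any one switch or any one power circuit removes at
most one of the three chosen servers.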

> I don't think the "whole internet" takes up 5 petabytes,  

The internet is infinite in size thanks to websites that generate data
(or crap). Our 3 billion page crawl (1/5 of the size we dream of) is
257 terabytes (compressed), and the corresponding index is 77 terabytes
(very compressed). (Yes, we have a lot of disk space empty at the moment.)
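
For a sense of scale, here is a back-of-the-envelope calculation from the
figures above. It assumes decimal terabytes, that both the crawl and the index
are kept in 3 copies, and that storage grows linearly with page count. None of
these assumptions is stated in this post, so treat the result as a rough
estimate only.

```python
TB = 10**12                 # assumption: decimal terabytes

pages = 3 * 10**9           # current crawl size, from the post
crawl_tb = 257              # compressed crawl, from the post
index_tb = 77               # compressed index, from the post
copies = 3                  # assumption: crawl and index both stored 3 times
growth = 5                  # current crawl is 1/5 of the target size

# Average compressed size of one page in the crawl and in the index.
crawl_bytes_per_page = crawl_tb * TB / pages
index_bytes_per_page = index_tb * TB / pages

# Raw disk needed today and at the target crawl size.
three_copies_tb = copies * (crawl_tb + index_tb)
target_tb = growth * three_copies_tb

print(f"crawl: about {crawl_bytes_per_page / 1000:.0f} KB per page")
print(f"index: about {index_bytes_per_page / 1000:.0f} KB per page")
print(f"today, {copies} copies: {three_copies_tb} TB")
print(f"at {growth}x pages: {target_tb} TB")
```

Under those assumptions the current data needs about 1 petabyte of raw disk,
and a crawl five times larger would need about 5 petabytes.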

-- greg





