[Beowulf] Linux powers low-cost petabyte-level storage

David Kewley kewley at gps.caltech.edu
Wed Jun 22 10:53:24 PDT 2005

I'm skeptical about a number of the important details.  Quoting from the
article:

"The IA's PetaBox installation comprises about 16 racks housing 600 
systems with 2,500 spinning drives, for a total capacity of roughly 1.5
[petabytes]."

Let's check their math.  The maximum capacity Capricorn offers right now 
is a rack of 40 1U nodes, each with four 400GB drives, for a total of 
64TB per rack.  600 systems means 2400 drives, not 2500 drives.  2400 
times 400GB is 0.96PB, not 1.5PB.  That is RAW capacity, before RAID, 
before hot spares, etc.
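
For anyone who wants to re-run the arithmetic, here is a rough Python
sketch.  The names are mine, and it assumes decimal units (1TB = 1000GB,
1PB = 1000TB), as drive vendors use.  It works out to 64TB per rack,
2400 drives, and 0.96PB raw.

```python
# Raw-capacity check using the figures quoted above.
# Assumption: decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).
NODES_PER_RACK = 40
DRIVES_PER_NODE = 4
DRIVE_GB = 400
SYSTEMS = 600


def raw_capacity_gb(systems, drives_per_node=DRIVES_PER_NODE, drive_gb=DRIVE_GB):
    """Raw capacity in GB, before RAID, hot spares, or filesystem overhead."""
    return systems * drives_per_node * drive_gb


rack_tb = raw_capacity_gb(NODES_PER_RACK) / 1000    # one full rack, in TB
drives = SYSTEMS * DRIVES_PER_NODE                  # drive count for 600 systems
total_pb = raw_capacity_gb(SYSTEMS) / 1_000_000     # whole installation, in PB

print(f"per rack:  {rack_tb:g} TB")
print(f"drives:    {drives}")
print(f"total raw: {total_pb:g} PB")
```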

How do they get 1.5PB?  Are they disregarding RAID?  Are they assuming 
(or achieving) a ~1.5x compression of the data, and then reporting 
"capacity" as the size of the *uncompressed* data rather than the 
on-disk data?
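
To put a number on the gap, this small Python sketch computes the ratio
of the claimed capacity to the raw capacity for both reported drive
counts.  The function name is hypothetical, and it again assumes 400GB
drives in decimal units.

```python
# Ratio of claimed capacity to raw capacity implied by the drive count.
# Assumption: 400 GB drives, decimal units (1 PB = 1,000,000 GB).
CLAIMED_PB = 1.5
DRIVE_GB = 400


def implied_ratio(claimed_pb, drive_count, drive_gb=DRIVE_GB):
    """How much larger the claimed capacity is than the raw disk capacity."""
    raw_pb = drive_count * drive_gb / 1_000_000
    return claimed_pb / raw_pb


for count in (2400, 2500):
    print(f"{count} drives: ratio {implied_ratio(CLAIMED_PB, count):.4g}")
```

With 2500 drives the ratio is exactly 1.5; with 2400 drives it is about
1.56.  Either way the claimed figure is well above raw capacity.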

"The IA systems boot Debian or Fedora Linux from a central PXE boot 
server, and are remotely monitored using nagios."

"... Saikley says Capricorn tried then backed away from RAID (redundant 
arrays of inexpensive disks), instead opting to recommend JBOD (just a 
bunch of disks) configurations to most of its clients. 'We had a 
painful experience with RAID 5, which does not scale well to 
petabyte-level storage,' Saikley notes."

There is no description in the article or the corporate website of how 
the raw disk capacity is integrated.  For all I know, the IA system is 
2400 (or 2500, depending on which of the reported numbers you believe) 
separate 400GB (raw) filesystems with no RAID and no hot spares.

The Capricorn products sound sexy, and they are in fact fairly sexy in 
several different ways, but it is far from obvious that Capricorn has 
solved the hard problems.  It looks like they may well have made good 
hardware choices, but the challenge is on the software side.


The article:


The Capricorn website:

