[Beowulf] Linux powers low-cost petabyte-level storage
kewley at gps.caltech.edu
Wed Jun 22 10:53:24 PDT 2005
I'm skeptical about a number of the important details. Quoting from the
article:
"The IA's PetaBox installation comprises about 16 racks housing 600
systems with 2,500 spinning drives, for a total capacity of roughly 1.5
petabytes."
Let's check their math. The maximum capacity Capricorn offers right now
is a rack of 40 1U nodes, each with four 400GB drives, for a total of
64TB per rack. 600 systems means 2400 drives, not 2500 drives. 2400
times 400GB is 0.96PB, not 1.5PB. That is RAW capacity, before RAID,
before hot spares, etc.
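For anyone who wants to rerun the arithmetic, here is a minimal Python
sketch. The figures are the ones quoted above; the variable and function
names are my own, and I am assuming decimal units (1TB = 1000GB,
1PB = 1000TB), as drive vendors use.

```python
# Figures quoted in the post; names are illustrative only.
DRIVE_GB = 400          # capacity of one drive, in decimal GB
DRIVES_PER_NODE = 4     # drives in each 1U node
NODES_PER_RACK = 40     # 1U nodes in a full rack
NODES = 600             # systems in the IA installation, per the article
CLAIMED_PB = 1.5        # total capacity claimed in the article


def raw_capacity_gb(nodes, drives_per_node=DRIVES_PER_NODE, drive_gb=DRIVE_GB):
    """Raw capacity in GB, before RAID, hot spares, or filesystem overhead."""
    return nodes * drives_per_node * drive_gb


rack_tb = raw_capacity_gb(NODES_PER_RACK) / 1000      # GB -> TB
drives = NODES * DRIVES_PER_NODE
total_pb = raw_capacity_gb(NODES) / 1_000_000         # GB -> PB
ratio = CLAIMED_PB / total_pb                         # claimed vs. raw

print(f"per rack: {rack_tb:.0f} TB")
print(f"drives:   {drives}")
print(f"raw:      {total_pb:.2f} PB")
print(f"claimed / raw: {ratio:.2f}")
```

The claimed figure comes out at roughly 1.56 times the raw capacity of
600 four-drive nodes.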
How do they get 1.5PB? Are they disregarding RAID? Are they assuming
(or achieving) a ~1.5x compression of the data, and then reporting
"capacity" as the size of the *uncompressed* data rather than the
"The IA systems boot Debian or Fedora Linux from a central PXE boot
server, and are remotely monitored using nagios."
"... Saikley says Capricorn tried then backed away from RAID (redundant
arrays of inexpensive disks), instead opting to recommend JBOD (just a
bunch of disks) configurations to most of its clients. 'We had a
painful experience with RAID 5, which does not scale well to
petabyte-level storage,' Saikley notes."
There is no description in the article or the corporate website of how
the raw disk capacity is integrated. For all I know, the IA system is
2400 (or 2500, depending on which of the reported numbers you believe)
separate 400GB (raw) filesystems with no RAID and no hot spares.
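To put a number on how much the layout matters, here is a rough Python
sketch comparing plain JBOD with a purely hypothetical alternative: one
RAID 5 set across the four drives in each node. Neither the article nor
the website says that either layout is what the IA actually runs, and
the function name and parameters are my own.

```python
def usable_gb(drives, drive_gb, parity_drives=0, hot_spares=0):
    """Usable GB in one node after setting aside parity and spare drives."""
    data_drives = drives - parity_drives - hot_spares
    if data_drives < 0:
        raise ValueError("more parity/spare drives than drives in the node")
    return data_drives * drive_gb


NODES = 600  # systems in the IA installation, per the article

# JBOD: every drive holds data, and nothing protects against a drive failure.
jbod_pb = NODES * usable_gb(4, 400) / 1_000_000

# Hypothetical: RAID 5 within each node costs one drive's worth of parity.
raid5_pb = NODES * usable_gb(4, 400, parity_drives=1) / 1_000_000

print(f"JBOD:            {jbod_pb:.2f} PB usable")
print(f"RAID 5 per node: {raid5_pb:.2f} PB usable")
```

Under that assumption the usable space drops to three quarters of raw,
which moves the total further from 1.5PB, not closer.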
The Capricorn products sound sexy, and they are in fact fairly sexy in
several different ways, but it is far from obvious that Capricorn has
solved the hard problems. It looks like they may well have made good
hardware choices, but the challenge is on the software side.
The Capricorn website: