[Beowulf] Linux powers low-cost petabyte-level storage

Eugen Leitl eugen at leitl.org
Wed Jun 22 02:07:09 PDT 2005

Also see threads on http://www.archive.org/web/petabox.php


Jun. 22, 2005

Capricorn Technologies says it has completed delivery of more than a petabyte
of storage to the Internet Archive, a non-profit organization based in San
Francisco that creates periodic snapshots of the Internet. Capricorn's
PetaBox products are based on Via mini-ITX boards running Debian or Fedora
Linux, and deliver the lowest cost-per-GB and cost-of-ownership available,
the company claims.


Capricorn started as a project within the Internet Archive (IA) to develop
inexpensive storage devices based on Linux and commodity PC components. The
project was spun out in June of 2004, resulting in the formation of Capricorn
Technologies. The company has since supplied its PetaBox products to a number
of universities, research centers, libraries, and national archives, both
within the US and overseas, according to CEO C.R. Saikley. The IA remains
Capricorn's largest customer, however, Saikley says.

The IA's PetaBox installation

The IA is an online digital library with very large collections of audio,
video, texts, web sites, and software. For example, it claims to host footage
of more than 20,000 live concerts, and snapshots of the Internet dating back
to 1996, accessible through the well-known Wayback Machine, which currently
hosts over 40 billion web pages.

The IA's PetaBox installation comprises about 16 racks housing 600 systems
with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes.
Despite its large size, the IA's PetaBox installation draws only about 50kW
of power, Saikley says, and is maintained by one full- and one half-time
person who spend a disproportionate amount of time working on older systems.
"We've improved reliability considerably," Saikley claims.

The IA systems boot Debian or Fedora Linux from a central PXE boot server,
and are remotely monitored using nagios. "The beauty of nagios is that it is
so readily extensible," says Saikley. "If the register exists on the board,
nagios can figure out how to read it. We typically provide hard disk
temperatures, cpu temperatures, ping response, capacity utilization, that
sort of thing."
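
As a rough illustration of the extensibility Saikley describes, a Nagios
check is just a program that prints one status line and exits with 0 (OK),
1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below is not
Capricorn's plugin: the function name and the 45/55 C thresholds are
assumptions made for illustration, and how the reading is obtained from the
drive or board is left out.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_temperature(temp_c, warn=45.0, crit=55.0):
    """Map a temperature reading to a Nagios status code and status line.

    The default thresholds are illustrative assumptions, not vendor values.
    """
    if temp_c is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading available"
    # Performance data after the pipe: label=value;warn;crit
    perfdata = f"temp={temp_c:.1f};{warn:.1f};{crit:.1f}"
    if temp_c >= crit:
        return CRITICAL, f"TEMP CRITICAL - {temp_c:.1f} C | {perfdata}"
    if temp_c >= warn:
        return WARNING, f"TEMP WARNING - {temp_c:.1f} C | {perfdata}"
    return OK, f"TEMP OK - {temp_c:.1f} C | {perfdata}"


code, message = check_temperature(41.0)
print(message)
```

A real plugin would print the message and pass the code to sys.exit(), so
that Nagios can read both the status line and the exit status.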

The PetaBox can also be managed by Linux cluster management software,
according to Saikley.

The PetaBox

Capricorn claims that its PetaBox storage devices provide the lowest
ownership cost and cost-per-GB available. The company offers 40- and
64-terabyte models consisting of racks with 40 1U systems. The 1U systems are
available in 1- and 1.6-terabyte models that are essentially the same but for
hard-drive capacity. Both systems run Debian or Fedora Linux on Via mini-ITX
boards.
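
The rack capacities follow directly from the per-system figures. The short
sketch below works through the arithmetic; it assumes decimal units (1 TB =
1000 GB) and four equal-sized drives per 1U system, and the function names
are illustrative only.

```python
SYSTEMS_PER_RACK = 40   # 1U systems per rack, as quoted
DRIVES_PER_SYSTEM = 4   # drives per 1U system, per the hardware description


def rack_capacity_tb(tb_per_system, systems=SYSTEMS_PER_RACK):
    """Total rack capacity in terabytes."""
    return tb_per_system * systems


def implied_drive_gb(tb_per_system, drives=DRIVES_PER_SYSTEM):
    """Per-drive capacity in GB, assuming 1 TB = 1000 GB and equal drives."""
    return tb_per_system * 1000 / drives


for tb in (1.0, 1.6):
    print(f"{tb} TB system: {rack_capacity_tb(tb):.0f} TB per rack, "
          f"{implied_drive_gb(tb):.0f} GB per drive")
```

Under these assumptions the two models work out to 40 and 64 terabytes per
rack, with implied drive sizes of 250 GB and 400 GB.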

The PetaBox is based on Via mini-ITX motherboards

Each 1U system includes a Via M-10000 mini-ITX board with a 1GHz Via C3
processor and 512MB of RAM, expandable to 1GB. Each includes four Hitachi ATA
hard drives with 8MB caches and a claimed 8.5ms of typical latency.

Saikley says Capricorn did extensive testing to qualify hard drives for
capacity, reliability, and cost, finally choosing Hitachi. "Although Hitachi
does not offer an 'enterprise' or '24x7' SATA drive, our testing found their
drives to be as reliable as anything out there, enterprise distinction or
not," Saikley said.

The 1U PetaBox units include all I/O
on the front panel, reducing the need to access the back panel while
maximizing its cooling capacity. Drives are housed in EZ-Latch bays and can
be easily changed once the 1U unit has been removed from the rack and its
cover taken off. "We experimented with hot-swap, but found it caused as many
problems as it solved. It actually induced failures, so we backed away. But
you still have to make it easy to replace disks," Saikley said.

Similarly, Saikley says Capricorn tried then backed away from RAID (redundant
arrays of inexpensive disks), instead opting to recommend JBOD (just a bunch
of disks) configurations to most of its clients. "We had a painful experience
with RAID 5, which does not scale well to petabyte-level storage," Saikley
said.

PetaBox options include a 16 x 2 LCD display and gigabit Ethernet (10/100 is
standard). The PetaBox is configured by default to boot from a USB key, then
from a PXE boot server, and finally from the local hard drive. However, boot
order can easily be changed in the BIOS.
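
The default boot order amounts to a simple fallback list. The sketch below
models that behaviour only as an illustration; it is not the BIOS logic
itself, and the device names and function are hypothetical.

```python
# Default order described above: USB key, then PXE server, then local disk.
DEFAULT_BOOT_ORDER = ["usb", "pxe", "hdd"]


def select_boot_device(available, order=DEFAULT_BOOT_ORDER):
    """Return the first device in `order` that is bootable, or None."""
    for device in order:
        if device in available:
            return device
    return None


# No USB key inserted, so the system falls through to the PXE server.
print(select_boot_device({"pxe", "hdd"}))
```

Passing a different `order` list corresponds to changing the boot order in
the BIOS.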

Each 1.6-terabyte 1U system draws 80 Watts of power (typical), or about 50
Watts per terabyte, according to Capricorn. Each measures 17.25 x 18 x 1.72
inches (43.8 x 45.7 x 4.4 cm), and weighs 18 lbs, 12 oz (8.5 kg).
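
The per-terabyte figure can be checked from the quoted numbers, and extended
to a full rack. This is a back-of-the-envelope sketch: it assumes every
system draws the typical 80 Watts and ignores network gear and cooling, and
the function names are illustrative.

```python
def watts_per_terabyte(watts, terabytes):
    """Power draw normalised by storage capacity."""
    return watts / terabytes


def rack_power_watts(watts_per_system, systems_per_rack=40):
    """Total draw of a rack, assuming identical systems at typical load."""
    return watts_per_system * systems_per_rack


print(f"{watts_per_terabyte(80, 1.6):.0f} W per terabyte")
print(f"{rack_power_watts(80) / 1000:.1f} kW per 40-system rack")
```

This reproduces the 50 Watts per terabyte quoted by Capricorn and suggests
roughly 3.2 kW for a fully populated 64-terabyte rack.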

According to Saikley, Capricorn is currently positioning itself for increased
production levels, following recent improvements to its manufacturing
process. "We have been constantly improving the efficiency and effectiveness
of our manufacturing processes. By positioning ourselves for increased
production levels, we are better able to pursue our relentless commitment to
driving the cost of storage down."


The PetaBox is available now, priced at approximately $2/GB, in 40- and
64-terabyte capacities. Further details are on the company's website.
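
For a sense of scale, the approximate rack prices implied by the $2/GB
figure can be estimated as follows. This is only a sketch: it assumes
decimal units (1 TB = 1000 GB) and a flat per-GB price, neither of which is
confirmed by Capricorn.

```python
PRICE_PER_GB = 2.0   # approximate price quoted, in US dollars
GB_PER_TB = 1000     # assumption: decimal terabytes


def rack_price_usd(capacity_tb, price_per_gb=PRICE_PER_GB):
    """Approximate price of a rack of the given capacity."""
    return capacity_tb * GB_PER_TB * price_per_gb


for capacity_tb in (40, 64):
    print(f"{capacity_tb} TB: about ${rack_price_usd(capacity_tb):,.0f}")
```

Under these assumptions the two models come to about $80,000 and $128,000.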

Eugen* Leitl <a href="http://leitl.org">leitl</a>
ICBM: 48.07100, 11.36820            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE