[Beowulf] PetaBytes on a budget, take 2

Eugen Leitl eugen at leitl.org
Fri Jul 22 00:05:11 PDT 2011

On Thu, Jul 21, 2011 at 11:55:59PM -0700, Greg Lindahl wrote:
> On Fri, Jul 22, 2011 at 01:44:56AM -0400, Mark Hahn wrote:
> > to be honest, I don't understand what applications lead to focus on IOPS
> > (rationally, not just aesthetic/ideologically).  it also seems like
> > battery-backed ram and logging to disks would deliver the same goods...
> In HPC, the metadata for your big parallel filesystem is a good example.
> SSD is much cheaper capacity at high IOPs than battery-backed RAM. (The
> RAM has higher IOPs than you need.)

Hybrid pools in zfs can make use of both SSD and real (battery-backed)
RAM disks ( http://www.amazon.com/ACARD-ANS-9010-Dynamic-Module-including/dp/B001NDX6FE
or http://www.ddrdrive.com/ ).


An additional advantage of zfs is that it can deal with the higher
error rate of consumer or nearline SATA disks (though it cannot make
up for enterprise disks' higher resistance to vibration), and, with
periodic scrubbing, also with silent bit rot (you can make Linux RAID
scrub, but you can't make it checksum).

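
To make the scrub-versus-checksum point concrete, here is a minimal toy
sketch in Python. It is not how zfs or Linux md is implemented; the
function names are hypothetical, a two-way mirror is assumed, and
SHA-256 merely stands in for whatever checksum a filesystem stores
alongside its blocks. It shows that a scrub which can only compare
copies notices a mismatch but cannot say which copy is good, while a
stored checksum lets the scrub pick the intact copy and repair the other.

```python
import hashlib


def digest(block: bytes) -> str:
    """Checksum kept separately from the data (SHA-256 is an assumption)."""
    return hashlib.sha256(block).hexdigest()


def plain_mirror_scrub(copy_a: bytes, copy_b: bytes) -> str:
    """Scrub without checksums: all it can do is compare the two copies."""
    if copy_a == copy_b:
        return "consistent"
    # The copies differ, but nothing indicates which one is correct.
    return "mismatch, cannot tell which copy is correct"


def checksummed_mirror_scrub(copy_a: bytes, copy_b: bytes, stored: str):
    """Scrub with a stored checksum: find a verified copy, repair from it."""
    good = [c for c in (copy_a, copy_b) if digest(c) == stored]
    if not good:
        return None  # both copies damaged; this mirror cannot recover
    return good[0], good[0]  # both sides rewritten from the verified copy


original = b"metadata block 42"
stored_checksum = digest(original)

# Simulate silent bit rot: flip one bit in the first mirror copy.
rotted = bytes([original[0] ^ 0x01]) + original[1:]

print(plain_mirror_scrub(rotted, original))
print(checksummed_mirror_scrub(rotted, original, stored_checksum))
```

The first call can only report a mismatch; the second returns two
intact copies because the stored checksum identifies the good side.
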
> For Big Data, there's often data that's hotter than the rest. An
> example from the blekko search engine is our index; when you type a
> query on our website, most often all of the 'disk' access is SSD.
> Big Data systems generally don't have a metadata problem like HPC
> does; instead of 200 million files, we have a couple of dozen tables
> in our petabyte database.

Eugen* Leitl http://leitl.org
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
