[Beowulf] Ethernet connected drives

Ellis H. Wilson III ellis at cse.psu.edu
Thu May 8 10:58:34 PDT 2014

On 05/08/2014 01:43 PM, Lockwood, Glenn wrote:
> On May 8, 2014, at 10:30 AM, Ellis H. Wilson III <ellis at cse.psu.edu>
> wrote:
>> On 05/08/2014 10:29 AM, John Hearns wrote:
>>> Forget building compute clusters - soon we will be building
>>> Beowulfs with disk drives!
>> Color me dubious.  I highly doubt there will be any entire clusters
>> of just HDDs anytime soon.  The cpu/ram you can fit on them will be
>> far lower than a full machine, even if you consider 16 of them or
>> so.
> This is the end goal of Hadoop clusters.  Not everything needs fast
> CPUs and a ton of RAM.

Maybe for some uses of Hadoop.  Not all, from my experience at least.
Simpler jobs will benefit enormously.  Shuffle-heavy MR applications
(e.g., sort) will kick and scream when they have several times as many
machines to swap data with, each of them now slower.  Note that in the
paper I shared, the researchers look at MR specifically for on-SSD
computation.
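
A back-of-the-envelope sketch of the shuffle concern, in Python.  It
assumes an idealized all-to-all shuffle with uniformly spread keys; the
function names are made up for illustration, and the 16 drives per
machine is just the illustrative number from earlier in the thread.

```python
def shuffle_flows(nodes: int) -> int:
    """Directed node-to-node transfers in an all-to-all shuffle."""
    return nodes * (nodes - 1)


def bytes_per_flow(total_bytes: int, nodes: int) -> float:
    """Bytes per transfer, assuming data and keys are spread uniformly.

    Each node holds total/nodes bytes and sends 1/nodes of that to
    every node, so one transfer carries total / nodes**2 bytes.
    """
    return total_bytes / (nodes * nodes)


if __name__ == "__main__":
    total = 1 << 40  # 1 TiB of map output, an arbitrary figure
    # 64 full machines versus 16 drives standing in for each machine.
    for nodes in (64, 64 * 16):
        print(nodes, shuffle_flows(nodes), bytes_per_flow(total, nodes))
```

Under these assumptions the number of transfers grows roughly with the
square of the node count while each transfer shrinks by the same factor,
so the same data moves as many more, much smaller exchanges between
weaker endpoints.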

But you're right -- for very straightforward, I/O-heavy and CPU-bereft
workloads, a cluster of these could exist.  I guess I continue to
believe that a traditional cluster with SATA/SCSI-attached versions of
these inside (which can still filter) would be more usable.  The key
here is being able to do simple queries/filters/etc. on-drive, and push
only the needed data off-drive.
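
To make the filtering point concrete, here is a toy Python sketch.
ToyKVDrive, scan, filter_scan and the record layout are all
hypothetical, an in-memory stand-in rather than any real drive
interface; it only counts how many bytes leave the "drive" when the
predicate runs on the host versus on the drive.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Record = Tuple[str, bytes]
Predicate = Callable[[str, bytes], bool]


class ToyKVDrive:
    """Hypothetical in-memory stand-in for a drive that can filter."""

    def __init__(self, records: Dict[str, bytes]) -> None:
        self.records = dict(records)
        self.bytes_sent = 0  # bytes that crossed the drive interface

    def _send(self, key: str, value: bytes) -> Record:
        self.bytes_sent += len(key.encode()) + len(value)
        return key, value

    def scan(self) -> Iterator[Record]:
        """Ship every record off-drive; the host filters afterwards."""
        for key, value in self.records.items():
            yield self._send(key, value)

    def filter_scan(self, predicate: Predicate) -> Iterator[Record]:
        """Evaluate the predicate on-drive; ship only the matches."""
        for key, value in self.records.items():
            if predicate(key, value):
                yield self._send(key, value)


def host_side_filter(drive: ToyKVDrive, predicate: Predicate) -> List[Record]:
    return [(k, v) for k, v in drive.scan() if predicate(k, v)]


def on_drive_filter(drive: ToyKVDrive, predicate: Predicate) -> List[Record]:
    return list(drive.filter_scan(predicate))


def make_records(count: int = 100, every: int = 20) -> Dict[str, bytes]:
    """100-byte values; every 20th one starts with b"ERROR"."""
    return {
        f"row{i:03d}": (b"ERROR" if i % every == 0 else b"").ljust(100, b"x")
        for i in range(count)
    }
```

Both paths return the same records; the difference shows up only in
bytes_sent, which is the whole argument for filtering before the data
leaves the device.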

> Is this still a beowulf cluster?  Probably not, but these sorts of
> devices have a lot of utility in HPC.  We make heavy use of iSCSI on
> our largest machine, and we'd free up a fair amount of resources if
> we didn't have to wrap our iSCSI targets in iSCSI servers.

While they would unarguably be great for Big Data, I am not convinced
by the general claim that they would be great for HPC (with the
exception of checkpointing).  These devices, for better or worse, make
the I/O stack more, not less, indirect.  You are just issuing queries
against a KV store instead of plain ol' blocks.  Those are not ideal
conditions for complex I/O patterns, or for enabling the richer locking
semantics available in many robust parallel file systems.
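
One way to picture the indirection, as a toy Python sketch.  It assumes
a KV interface that only offers whole-value get and put, which is my
assumption for illustration and not a statement about any particular
drive; ToyBlockDevice, ToyKVStore and update_range_kv are made-up names.

```python
BLOCK = 4096  # bytes per block in this toy model


class ToyBlockDevice:
    """Fixed-size blocks that can be overwritten individually."""

    def __init__(self, nblocks: int) -> None:
        self.blocks = [bytes(BLOCK) for _ in range(nblocks)]
        self.bytes_written = 0

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != BLOCK:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.bytes_written += BLOCK


class ToyKVStore:
    """Whole-value get/put only (an assumption of this sketch)."""

    def __init__(self) -> None:
        self.values = {}
        self.bytes_read = 0
        self.bytes_written = 0

    def get(self, key: str) -> bytes:
        value = self.values[key]
        self.bytes_read += len(value)
        return value

    def put(self, key: str, value: bytes) -> None:
        self.values[key] = value
        self.bytes_written += len(value)


def update_range_kv(store: ToyKVStore, key: str, offset: int, data: bytes) -> None:
    """Small in-place update becomes read-modify-write of the whole value."""
    old = store.get(key)
    new = old[:offset] + data + old[offset + len(data):]
    store.put(key, new)
```

In this model a 4 KiB update inside a 1 MiB value costs one block write
on the block device, but a full read and a full rewrite of the value
through the KV interface, and any finer-grained locking has to be
layered on top by whoever sits above the store.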



Ph.D. Candidate
Department of Computer Science and Engineering
The Pennsylvania State University
