[Beowulf] Ethernet connected drives
Ellis H. Wilson III
ellis at cse.psu.edu
Thu May 8 10:58:34 PDT 2014
On 05/08/2014 01:43 PM, Lockwood, Glenn wrote:
> On May 8, 2014, at 10:30 AM, Ellis H. Wilson III <ellis at cse.psu.edu> wrote:
>> On 05/08/2014 10:29 AM, John Hearns wrote:
>>> Forget building compute clusters - soon we will be building
>>> Beowulfs with disk drives!
>> Color me dubious. I highly doubt there will be any entire clusters
>> of just HDDs anytime soon. The cpu/ram you can fit on them will be
>> far lower than a full machine, even if you consider 16 of them or
> This is the end goal of Hadoop clusters. Not everything needs fast
> CPUs and a ton of RAM.
Maybe for some uses of Hadoop. Not all, from my experience at least.
Simpler things will benefit enormously. High-shuffling MR applications
(e.g., sort) will kick and scream when they have a factor or more
machines to swap data with that are now slower. Note that in the paper
I shared the researchers look at MR specifically for on-SSD computation.
But you're right -- for very straightforward, I/O-heavy and CPU-bereft
workloads, a cluster of these could exist. I guess I continue to
believe the usability of a traditional cluster with SATA/SCSI attached
versions of these (which can still filter) inside would be better. The
key here is being able to do simple queries/filters/etc on-drive, and
just push needed data off-drive.
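
To make the data-movement point concrete, here is a minimal sketch. The
SimulatedDrive class and its read_all/read_filtered methods are
hypothetical, made up for illustration; they are not the API of any
real drive. It only counts how many bytes cross the link when the
filter runs on the host versus on the drive.

```python
from typing import Callable, List, Tuple


class SimulatedDrive:
    """Hypothetical drive model for illustration; not a real device API."""

    def __init__(self, records: List[bytes]) -> None:
        self.records = list(records)

    def read_all(self) -> Tuple[List[bytes], int]:
        # Plain drive: every record crosses the link, host filters later.
        moved = sum(len(r) for r in self.records)
        return list(self.records), moved

    def read_filtered(
        self, predicate: Callable[[bytes], bool]
    ) -> Tuple[List[bytes], int]:
        # Filtering drive: predicate runs on-drive, only matches are sent.
        matches = [r for r in self.records if predicate(r)]
        moved = sum(len(r) for r in matches)
        return matches, moved


def wants(record: bytes) -> bool:
    # Example filter (an assumption): keep only error lines.
    return record.startswith(b"ERROR")


drive = SimulatedDrive(
    [b"ERROR disk 3", b"INFO ok", b"INFO ok", b"ERROR fan 1"]
)

# Host-side filtering: pull everything, then filter.
all_records, host_bytes = drive.read_all()
host_side = [r for r in all_records if wants(r)]

# On-drive filtering: only the matching records leave the drive.
on_drive, drive_bytes = drive.read_filtered(wants)
```

Both paths give the same result set; the difference is only in how much
data has to leave the drive, which grows with how selective the filter
is.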
> Is this still a beowulf cluster? Probably not, but these sorts of
> devices have a lot of utility in HPC. We make heavy use of iSCSI on
> our largest machine, and we'd free up a fair amount of resources if
> we didn't have to wrap our iSCSI targets in iSCSI servers.
While they would unarguably be great for Big Data, I am not convinced
they would be great for HPC in general (with the exception of
checkpointing). These devices, for better or worse, make
the I/O stack more, not less, indirect. You are just issuing queries
against a KV store instead of plain ol' blocks. Those are not ideal
conditions for complex I/O patterns or for enabling the richer locking
semantics available in many robust parallel file systems.
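
As a rough illustration of the indirection, consider a small in-place
update. The sketch below assumes a KV interface that only offers
whole-value get and put; that restriction, the class names, and the
tiny 4-byte block size are all assumptions for illustration, not a
description of any particular drive.

```python
BLOCK = 4  # deliberately tiny block size, for illustration only


class BlockDevice:
    """Hypothetical block device: fixed-size blocks addressed by index."""

    def __init__(self, nblocks: int) -> None:
        self.blocks = [bytes(BLOCK) for _ in range(nblocks)]

    def write_block(self, index: int, data: bytes) -> int:
        assert len(data) == BLOCK
        self.blocks[index] = data
        return BLOCK  # bytes moved over the link


class KVStore:
    """Hypothetical KV store with whole-value get/put only (assumed)."""

    def __init__(self) -> None:
        self.values = {}

    def get(self, key: str) -> bytes:
        return self.values[key]

    def put(self, key: str, value: bytes) -> None:
        self.values[key] = value


def kv_update_range(store: KVStore, key: str, offset: int, data: bytes) -> int:
    # With no partial update, the client reads the whole value, patches
    # it locally, and writes the whole value back.
    old = store.get(key)
    new = old[:offset] + data + old[offset + len(data):]
    store.put(key, new)
    return len(old) + len(new)  # bytes moved over the link


dev = BlockDevice(4)
block_moved = dev.write_block(2, b"abcd")

kv = KVStore()
kv.put("obj", bytes(16))
kv_moved = kv_update_range(kv, "obj", 8, b"abcd")
```

Under these assumptions the block write moves 4 bytes while the KV
update moves 32, and the read-modify-write on the whole value is also
the only unit a lock could reasonably cover.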
Department of Computer Science and Engineering
The Pennsylvania State University