[Beowulf] Hadoop's Uncomfortable Fit in HPC
Ellis H. Wilson III
ellis at cse.psu.edu
Mon May 19 22:20:27 PDT 2014
On 05/19/2014 11:26 PM, Lockwood, Glenn wrote:
> I appreciate your commentary, Ellis. I agree and disagree with you
> on your various points, and a lot of it comes from what (I suspect) is
> the difference in our perspectives:
Hi Glenn -- sorry to come off uncouth in certain instances. I was
writing in a strictly "reviewer"-like mode in my response and had
limited time, so I expect some of my comments were misunderstood.
Thoughts on the key issues:
>> It's too much wrapped into one thing, and only leads to ideological
>> conversations (best left for the suits).
> Maybe I'm guilty of being a suit, or maybe I work for suits. But
> it's what people understand, and it's not an imprecise term to use in
> many contexts.
Wasn't calling you a suit by any means. I stand by my dislike for the
Hadoop term though. It's as precise as your example of an atom.
Fortunately for physics, people aren't purchasing machines to run "Atom"
on it. So imprecision for the sake of education is less harmful there.
>> 2. You absolutely do not need to use all of the Hadoop sub-projects
>> to use MapReduce, which is the real purpose of using "Hadoop" in
>> HPC at all. There are already perfectly good, high-bandwidth,
>> low-latency, scalable, semantically-rich file systems in-place and
>> far more mature than HDFS. So why even bother with HOD (or
>> myhadoop) at all? Just use MapReduce on your existing files.
> This isn't a bad idea, and if it works for a specific problem, run
> with it. I don't advocate this for four reasons though: 1.
> Configuration and debugging can get very obscure very quickly as you
> try using other tools on top of mapreduce. 2. The performance still
> isn't good--the original implementation of MapReduce-on-Lustre goes
> into nice detail about this. It's still not as scalable as using a
I detail the main issues around the performance and scalability of
MapReduce on a distributed NAS in the paper I referred to.
Long story short, I was getting in excess of 5.5GB/s using 50 machines
with just 1Gbps Ethernet connections on them, all spinning rust in the
remote NAS. Limited fancy footwork, except that there was one local HDD
per machine for shuffling and I made sure the configurations were
reasonable. Nothing really genius going on, but I suppose non-standard.
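For the curious, the "non-standard" bits amount to a handful of settings. A rough sketch (property names are the Hadoop 2.x ones -- older releases use fs.default.name and mapred.local.dir -- and the paths are hypothetical):

```xml
<!-- core-site.xml: point the default filesystem at the existing
     POSIX mount instead of HDFS; file:/// means plain local I/O. -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>

<!-- mapred-site.xml: keep intermediate (shuffle) data on the one
     local HDD per node rather than round-tripping to the remote NAS. -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/local/hdd/mapred</value>
</property>
```

The shuffle-to-local-disk part is the one that matters for keeping the NAS link free for actual input/output bandwidth.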
>> which is what you have to do with these inane "on-demand"
> Be nice.
> These "inane" solutions were missing from the national
> cyberinfrastructure portfolio here in the US, they were requested by
> users, and they were requested by our federal government. Being able
> to run Hadoop on a conventional HPC platform is a capability, and it
> uniquely enables us to do a variety of things (teaching, exploratory
> work, etc) that cannot be done on any other system in the US CI.
But this is a band-aid. Not a real solution. That's the core of my
gripe. The wound, if you will, is a dearth of easy-to-program and
readily scalable frameworks like MR in the HPC space (yes, I always
refer to MR as Hadoop MR, because I'm a poor graduate student who
doesn't work for Google nor has funds for MapR nor patience for the
alternatives).
So, my suggestion continues to be to find ways to add MR as a hammer in
the toolbox beside the protractor that is MPI+C. No need to replace the
toolbox, much less the lumber. The inane part is forcing the
"on-demand" instance to come up with every aspect of Hadoop. You
wouldn't force a user to spin up a brand new VM just because they wanted
to switch from Perl to Python, would you? Similarly, I argue you
shouldn't force one on them just because they want to use MR.
> This is where my perspective from the operational HPC angle becomes
> apparent. Is this the best way of doing it? Absolutely not. Does
> it work, does it work reasonably well, does it provide a unique
> capability for the national research community, and does it satisfy
> our funding body? Yes, yes, yes, and yes.
I'm not advocating for some hyper-optimal solution. You'll find I'm far
more an engineer than a scientist (hence the leaving academia bit).
However, I won't hesitate to call somebody out when they are holding a
hammer backwards, but saying it works fine. Or, more applicably to this
situation, demanding that we can only use a hammer when we have
drunk-goggles on and are wearing 10-foot stilts, and then complaining
about the hammer's efficacy.
>> 3. Java really doesn't matter for MR or for YARN. For HDFS (which,
>> as mentioned, you shouldn't be using in HPC anyhow), yea, I'm not
>> happy with it being in Java, but for MR, if you are experiencing
>> issues relating to Java, you shouldn't be coding in MR. It's not a
>> MR-ready job. You're being lazy.
> I'm not clear on the point being made here. Should such a user not
> use Hadoop's MapReduce at all and defer to something else?
No, just use the tool for the right things. I'm saying people shouldn't
try to just learn MR and fit everything into that square hole, nor learn
MPI+C and try to fit everything through that circular one. Different
tools for different purposes. Same storage framework underneath.
>> MR should really only be used (in the HPC context) for pre- and
>> post-computation analytics.
> Says who? I don't disagree, but you're sounding like a computer
> scientist here. From the operational perspective, one never
Luckily, I am one :D. I am also somebody who values their time, and the
time on the system, which is why I stand by my perspective that core HPC
applications should not be written in MR unless it is overhauled into C.
As you state very cogently, HPC has already provided a fabulous
framework for hardcore computation. I'm just saying "yes, let's use
it!" And let's save MR for solely bandwidth-restricted reasonably
simple analytics, as it was intended in the original Google paper.
Climate applications and the like in MR make me want to jump of a bridge.
> proscribes the tools/languages/libraries a researcher should use to
> solve a problem. This was one of the most insulting things I was
> told by HPC staff when I was a researcher.
Fortunately, I take no offense when a physicist tells me I'm doing it
wrong when I'm trying to smash atoms with a sledgehammer instead of the
collider down the hall. I work day-in and day-out in a small company
with a number of economists, and we recognize what each of us is good
at. It's almost Unix'ian -- do one thing and do it well, right? I see
no reason why that same mentality shouldn't be applied everywhere.
Sorry, I really don't follow you here.
>> For all I care, it could be written in BASIC (or Visual BASIC, to
>> bring the point home). Steady-state bandwidth to disk in Java is
>> nearly equivalent to C. Ease of coding and scalability is what
>> makes MR great.
> Now is this generic MapReduce, or Hadoop's implementation of
> MapReduce (HIOMR)? Perhaps we should just call it "Hadoop." All
> jesting aside, this point isn't as clear to me. I will say, though,
> that I'm a big advocate of using non-Java languages to teach Hadoop
> to researchers. There are many cases where one language makes a lot
> more sense than another when mapreducing from a practical level.
Absolutely. I was simply saying the language doesn't matter, somewhat
contrary to your article. Write MR as fast as you can in whatever
language you want, because frankly it doesn't matter if you are using it
for the right types of applications. If you find yourself limited by
the performance of the language you are using, you shouldn't be writing
MR. Go back to C+MPI/etc. You have a CPU/memory/etc.-limited problem,
not an I/O-limited problem. Again, wrong tool.
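To make the "any language" point concrete, here is a minimal word-count mapper/reducer pair in Python, of the sort you would hand to Hadoop Streaming. The streaming invocation in the note below is an assumption (jar names vary by distribution); the MR contract itself -- tab-separated key/value records, reducer input delivered sorted by key -- is standard:

```python
from itertools import groupby

def mapper(lines):
    # Map phase: emit one "word<TAB>1" record per word. Hadoop Streaming
    # feeds raw input lines on stdin and collects these records on stdout.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Reduce phase: Hadoop delivers mapper output sorted by key, so
    # consecutive records with the same word can be summed with groupby.
    pairs = (line.split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

Wrapped in small stdin/stdout drivers, these run as something like `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input ... -output ...` -- no Java in sight, and for I/O-bound analytics the language genuinely doesn't matter.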
> This is not the entire point of HDFS. MapReduce does not work
> without a distributed file system. Thus, I would argue that giving
Sorry, but this is wrong. Even the very first example on installing
HDFS on the Hadoop site shows how to use it on a single box (and
therefore, a single HDD). If you have a single NAS box, it works fine.
Just point it there. The file:/// URI doesn't care whether it's NFS or
a local disk underneath.
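As a concrete illustration (mount points hypothetical, jar name varies by install), the stock example jar runs happily against any POSIX mount when the paths use the file:/// scheme -- no HDFS daemons anywhere:

```shell
# Input/output live on an NFS/Lustre/local mount; Hadoop treats
# file:/// paths as plain POSIX I/O, no distributed FS required.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount file:///mnt/nas/corpus file:///mnt/nas/wordcount-out
```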
Moreover, with regard to the "entire point" bit, I was overemphasizing
for sure, but per the GFS paper the very first assumption is: "The
system is built from many inexpensive commodity components that often
fail. It must constantly monitor itself and detect, tolerate, and
recover promptly from component failures on a routine basis." This is
the core assumption in GFS from my reading of the paper, and why Google
came up with it. HDFS is no more than a FOSS version of GFS.
>> really was designed to be put on crappy hardware and provide really
>> nice throughput. Responsiveness, POSIX semantics, etc, are all
>> really inappropriate remarks.
> Not from the user perspective. HDFS is a pain to use for our naive
> users. Again, from an operational perspective, 99% of our users want
> something that's easy to use. HDFS is not exactly there, so users
> get turned off by it. Users getting turned off means adoption is
> low. Adoption being low was the whole point of the post, so I
> maintain that it was not an inappropriate remark at all.
Which is why I maintain: don't use HDFS at all, and your users won't
complain. Their applications, which mostly use
MPI/C/NFS/Lustre/whatever, will still run fine, and the few times they
want to write and use MR it will just work.
>> Not the intention here and thus, I continue to believe it should
>> not be used in HPC environments when most of them demand the
>> Lamborghini for 90% of executions.
> This is a bit silly. That's like saying accelerators "should not be
> used in HPC" because 90% of people don't use them. This is the
> difference between capability and capacity. If you provide it, you
> enable new things that could never be done before for a small subset
> of users.
Enable the users to use MR, for certain! I'm not advocating against
that. I am advocating against the mindset that HDFS+MR is a
requirement. It's not, and never has been. I don't think any of your
users will miss the absurd semantics and slow response times of HDFS,
but they may very easily miss the benefits of quick analysis with MR
(and all of the applications that sit atop that).
Again, my apologies for seeming abrupt or mean. That was definitely not
my intent.
Department of Computer Science and Engineering
The Pennsylvania State University
More information about the Beowulf mailing list