[Beowulf] how Google warps your brain

Thu Oct 21 15:55:12 PDT 2010

> -----Original Message-----
> From: Mark Hahn [mailto:hahn at mcmaster.ca]
> Sent: Thursday, October 21, 2010 3:01 PM
> To: Lux, Jim (337C)
> Cc: Beowulf Mailing List
> Subject: RE: [Beowulf] how Google warps your brain
> 
> >> I'm pretty convinced that, ignoring granularity or political issues, shared
> >> resources save a lot in leadership, infrastructure, space, etc.
> >
> > OTOH, it's just those granularity and cost accounting issues that led to
> > Beowulfs being built in the first place.
> 
> I'm not really sure I understand what you mean.  by "granularity", I just
> meant that you can't really have fractional sysadmins, and a rack with 1 node
> consumes as much floor space as a full rack.  in some sense, smaller clusters
> have their costs "rounded down" - there's a size beneath which you tend to
> avoid paying for power, cooling, etc.  perhaps that's what you meant by cost-
> accounting.

That's exactly what I meant. In any organization, there's a level of detail below which they don't generally require reporting. Likewise, there are dollar thresholds that determine how long the signature chain is. For institutionally provided services on a chargeback basis (e.g. phone calls, CPU seconds on the mainframe, etc.) the expectation is that costs are tracked to the penny (or, as the federal rules have it, you have to make sure that they are allocable, accountable, and allowable). For things bought in chunks, the resolution requirement is typically at the "purchase" level (e.g. nobody makes me allocate a $1000 computer to 15 different cost accounts, but I would have to account for disk space on the institutional server at that level).

(Because it's a "diminishing returns" issue: it's cheap to define which cost accounts pay for which disk directories, but it's not cheap to split purchase orders between accounts.)

> 
> but do you think these were really important at the beginning?  to me,
> beowulf is "attack of the killer micro" applied to parallelism.  that is,
> mass-market computers that killed the traditional glass-house boxes:
> vector supers, minis, eventually anything non-x86.  the difference was
> fundamental (much cheaper cycles), rather than these secondary issues.

I think they were. If you needed horsepower, you could either fight the budget battle to buy CPU seconds on the big iron, or you could buy your own supercomputer and not have to spend significant administrative time setting up, reconciling, and reporting on those CPU seconds across all your projects. And because the glass-house box is very visible and high value, there is a lot of oversight to "make sure that we are effectively using the asset" and "that the operations cost is fairly allocated among the users".

Particularly in places where there is strict cost accounting on things but not so strict on labor (e.g. your salary is already paid from some generic bucket), this could be a big driver: you could spend your own time essentially for free.

> 
> > I suspect (nay, I know, but just can't cite the references) that this sort
> >of issue is not unique to HPC, or even computing and IT.  Consider
> >libraries, which allow better utilization of books, at the cost of someone
> >else deciding which books to have in stock.
> 
> well, HPC is unique in the scale of its bursts.  even if you go on a book binge,
> there's no way you can consume orders of magnitude more books than I can,
> or than your trailing-year average.  but that's the big win for HPC
> centers - if everyone had a constant demand, a center would deliver only
> small advantages, not even much better than a colo site.

Yes, that's why the library/book model isn't as good an analogy as it could be.

> 
> > And consider the qualitatively
> >different experience of "browsing in the stacks" vs "entering the call
> >number in the book retrieval system".. the former leads to serendipity as
> >you turn down the wrong aisle or find a mis-shelved volume; the latter is
> >faster and lower cost as an "information retrieval" function.
> 
> heh, OK.  I think that's a bit of a stretch, since your serendipity would
> not scale with the size of the library, but mainly with its messiness ;)
> 
> >get paid for. And this is because they've bought a certain amount of
> >computational resources for me, and leave it up to me to use or not, as I
> >see fit.
> 
> I find myself using my desktop more and more as a terminal - I hardly
> ever run anything but xterm and google chrome.  as such, I don't mind
> that it's a terrible old recycled xeon from a 2003 project.  it would seem
> like a waste of money to buy something modern, (and for me to work locally)
> since there are basically infinite resources 1ms away as the packet flies...

And as long as there's no direct cost to you (or your budget) for incremental use of those remote resources, what you say is entirely true. But if you were paying for traffic, you'd think differently.

When I was in Rome a year ago for a couple of weeks, I had one of those USB data modems. You pay by the kilobyte, so you do all your work offline, fire up the modem, transfer your stuff, and shut it down. Shades of dial-up and paying by the minute of connect time. All of a sudden you get really interested in how much bandwidth is consumed by all that JavaScript and fancy formatting flying back and forth to render a pretty webmail page. And the convenient "let's send packets to keep the VPN tunnel alive" feature is really unpleasant, because you can literally watch the money meter climb while you're sitting there thinking.

With a cluster of my own, the entire cost is essentially fixed whether I use it or not, so I can "fool around" and not worry about whether I'm being efficient. Which takes me back to CS classes in the '70s, where you had a limited number of runs (or CPU seconds) for the quarter, so great emphasis was put on "desk checking" as opposed to interactive development. Code quality may well have been higher back then, but overall productivity was lower, and I'd hate to give up Matlab (and I loved APL on an IBM 5100). But then, I'm in what is essentially a perpetual prototyping environment: a good part of the computational work I need goes into iterating on the algorithm implementation, more than into the outputs of that algorithm.

If I were like my wife, who does banking IT, or the folks doing science data processing from satellites in a production environment, I'd probably say the big data center is a MUCH better way to go. They need the efficiency, and they have enough volume to justify fine-grained accounting (1% of $100 million is $1 million, a lot bigger in absolute terms than 10% of $100k, which is only $10k, so on the big job you can afford to put a full-time person on it).

So my wife needs the HPC data center and a staff of minions. I want the personal supercomputer that makes my life incrementally easier, without having to spend time accounting for tens of dollars.