[Beowulf] El Reg: AMD reveals potent parallel processing breakthrough

Vincent Diepeveen diep at xs4all.nl
Sun May 12 03:59:16 PDT 2013

On May 11, 2013, at 6:39 PM, Lux, Jim (337C) wrote:

> On 5/11/13 1:56 AM, "Vincent Diepeveen" <diep at xs4all.nl> wrote:
>> On May 10, 2013, at 6:04 AM, Lux, Jim (337C) wrote:
>>> On 5/8/13 6:41 PM, "Prentice Bisbal" <prentice.bisbal at rutgers.edu>
>>> wrote:
>>>> On 05/08/2013 09:41 AM, Lux, Jim (337C) wrote:
>>>>> The game console business is a strange one, and I don't know that it
>>>>> has much to bring to the HPC world (whoa, that will provoke some
>>>>> comment).
>>>> Roadrunner's body isn't even cold yet, and everyone's already forgotten
>>>> about it. :(
>>>> http://en.wikipedia.org/wiki/IBM_Roadrunner
>>>> http://en.wikipedia.org/wiki/Cell_microprocessor
>>> I think Roadrunner is an example of a one-off stunt.
>>> In the long run, "easy programming" is probably a bigger cost driver.
>> The top500.org of today completely refutes your statement there.
> Top 500 is just that "top 500"..
> What fraction of total computational work done on clusters is being done
> by the top 10?   I suspect that when you get down that list a ways, you
> start seeing fairly pedestrian clusters with fairly conventional nodes,
> and that there are a LOT of them and they do a lot of work, day in and
> day out.

In reality, industry has been crunching on GPUs for a long time.
People in industry are more realistic than the professors on this list.
They realize that to stay ahead of the pack you have to calculate more
than the others. The most efficient way to do that is always with the
lowest-level code you can

Buying new hardware every few years simply means that the hardware is
more expensive than a good programmer. That was the case 30 years ago,
it is the case today, and it will be the case 30 years from now, simply
because doing effectively more work on the same hardware means that you
can see further into the future than your

That's how the majority in industry has been doing it for quite some time now.

A bunch of them also have a far more realistic energy price than the

Governments have a habit of putting their supercomputers near where
their scientists are.

Industry is more clever there. They ask the energy companies: "Where
can you deliver me the cheapest power?" That is, the companies that do
not already have coal plants of their own...

Right now on the East Coast of the USA, coal costs, last time I
checked, around $85 per metric ton.

That delivers roughly 10 megawatt-hours, i.e. 10,000 kilowatt-hours.

$85 / 10,000 kWh = 0.85 cents per kilowatt-hour.

Add in some losses and we're still at a price that no government
supercomputer, including the secret ones, gets its power for.
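
The arithmetic above can be redone in a few lines of Python. This is only a rough sketch: the $85 per ton and the 10 MWh per ton are the figures from this post, while the function name and the `loss_fraction` parameter (including the 50% used in the second call) are made up for illustration and are not measured values.

```python
# Rough fuel-cost arithmetic. The price and energy figures come from the
# post; loss_fraction is a hypothetical knob for "add in some losses".

def cost_cents_per_kwh(price_per_ton_usd, mwh_per_ton, loss_fraction=0.0):
    """Fuel cost in US cents per delivered kilowatt-hour."""
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    # 1 MWh = 1000 kWh; losses shrink the energy actually delivered.
    kwh_delivered = mwh_per_ton * 1000.0 * (1.0 - loss_fraction)
    return price_per_ton_usd / kwh_delivered * 100.0

base = cost_cents_per_kwh(85.0, 10.0)
with_losses = cost_cents_per_kwh(85.0, 10.0, loss_fraction=0.5)

print(f"no losses:  {base:.2f} cents/kWh")
print(f"50% losses: {with_losses:.2f} cents/kWh")
```

Even if half of the energy is assumed lost, the fuel cost only doubles, which is the point of the "add in some losses" remark.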

If an energy company builds a new power plant, it needs to sell
power and it is desperate to sell it.
You can get it dirt cheap then.

It seems that was the time Google built a datacenter in the northern
part of the Netherlands (Groningen area). The energy companies there
were busy building a bunch of new coal plants, some of them later
converted to gas plants, as they saw an opportunity to export power to
Denmark with its windmills. Germany also had some communists in various
local governments pushing windmills, and they had already put on paper
that they would somehow close a bunch of nuclear plants.

The energy companies had done that math 10 years ago. Fukushima was of
course a horrible accident, which just helps them out more.

There is always a limited budget for computational hardware. Building
a cluster of more than 3000 nodes is not what most in industry like
to do.

Yet most of them already had a few GPUs inside those thousands of
nodes, long before any box in the top500 had.
You can then calculate for yourself who has the fastest generic
computational power. That's obviously industry/finance, and it always
was. Finance is now slowly converting away from some companies doing
their calculations in a stupid manner; some of those who did it that
way have basically been forced by the market to do it more efficiently.

Yet the ones making the most profit are also the ones who invested the
most in low-level code.

If you can speed up your code more than your competition, you can
simply calculate further into the future.

That's usually the opposite of how a lot of governments run their
supercomputers.

If in industry you outsearch your opponent by 1 second over a total
timespan of 1 year, you already win.

People don't care whether your car is 100 miles an hour faster or  
0.0001 miles an hour faster. Faster is faster.

In fact most research centers simply prefer a small cluster of their
own over the total bureaucracy that centralized supercomputers bring.

Industry has to compete, in contrast to government researchers.

Because of the speedup and the increase in the number of datacenters in
the financial industry, and the fact that basically all trade has moved
to derivatives, we'll see a massive increase in low-level code there as
well in the coming years.

If you're not busy with low-level code at a massive scale there, you
now risk a disadvantage that's impossible to ever make up for.

Right now every microsecond matters there. So you really want
microsecond-based graphs and calculations for every microsecond,
preferably over a period of years.
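
To give a feeling for that scale, here is a small back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration only: one sample per microsecond around the clock for a calendar year, and 8 bytes per sample for a single data series. Real trading hours and record sizes will differ.

```python
# Back-of-the-envelope count of microsecond samples per year.
# All parameters are illustrative assumptions, not figures from any firm.

MICROSECONDS_PER_SECOND = 1_000_000

def samples_per_year(seconds_per_day=24 * 3600, days_per_year=365):
    """Number of one-microsecond samples in a year (exact integer)."""
    return seconds_per_day * days_per_year * MICROSECONDS_PER_SECOND

def storage_terabytes(samples, bytes_per_sample=8):
    """Raw storage in terabytes (10**12 bytes), ignoring compression."""
    return samples * bytes_per_sample / 1e12

n = samples_per_year()
print(f"samples per year:   {n:,}")
print(f"raw size per series: {storage_terabytes(n):.3f} TB")
```

Multiply that by the number of series you track and the number of years you want, and it is clear why nobody does this with slow code.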

Some were already doing that, as we can see from the investigations
into the flash crash, when at least 1 company had to open up about what
they were doing and how.

It's unlikely we'll ever know who figured out what there, and when.

> Achieving a record is a stunt, by its very nature.  It might be a proof
> of concept. It might be a national competition.
> And for such a thing you can recruit the very few people who know how to
> effectively use it.
> But ultimately, I'll bet there are a LOT more people doing HPC by using
> libraries and such that hide a lot of the cluster specific aspects: from
> their standpoint, it just inverts that 10,000x10,000 matrix faster. Or
> whatever. People run Matlab on clusters. Not "parallel Matlab" but
> separate instances of Matlab on each node. Sure, it's not the most
> efficient way to do things, but if it gets the job done, and the work in
> writing the code is short enough, then maybe that's an OK trade.
> Not every application of HPC needs to have the ultimate speed, nor the
> ultimate efficiency.  It just has to be "good enough".
> Bringing up an interesting question.  How many clusters are there in the
> world? Clearly thousands.  Tens or hundreds of thousands?
> What's the size distribution?
> 10-15 years ago, people were proud of their 16 or 32 node clusters.
> Today, we talk about toy clusters in that size range. Limulus is, what, 3
> boards with 4 core processors?
