[Beowulf] Vector coprocessors
Joe Landman
landman at scalableinformatics.com
Thu Mar 16 07:11:26 PST 2006
Jim Lux wrote:
> At 12:04 AM 3/16/2006, Daniel Pfenniger wrote:
>
>> The shipment of this accelerator card has been delayed many times.
>> The last time I asked was October 2005. Apparently the first shipment
>> has been made this month, for a Japanese supercomputer with 10^4
>> Opterons. The cost is not indicated, but something above $8000 per
>> card would put it outside commodity hardware. I wouldn't be astonished
>> if more performance could be obtained in most applications with
>> commodity clustering.
I think under $10k keeps it commodity (read: what most managers could
likely sign off on themselves without needing to walk the approval ladder).
> There are probably applications where a dedicated card can blow the
> doors off a collection of PCs. At some point, the interprocessor
> communication latency inherent in any sort of cabling between processors
> would start to dominate.
There are numerous such examples in life sciences, chemistry, and other
areas. Such cards are not universal; they cannot be viewed as
general-purpose processors. You have to view them as dedicated attached
processors.
The ClearSpeed cards carry two of their co-processors, each with 96 FP
units. I believe the architecture is a systolic array. To program them
at a high level there is a C variant you can use, or you can hand-code
assembly. The latter is hard.
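To make the "dedicated attached processor" view concrete, here is a
minimal host-side sketch of the offload pattern in plain C. The card_*
names are hypothetical placeholders (stubbed out so the sketch compiles
and runs on the host), not the real ClearSpeed tools; the point is only
the shape of the programming model: stage data onto the card, launch a
kernel across the array, pull the result back.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical attached-processor interface.  The card_* names are
   * placeholders, NOT a real SDK; the stubs emulate the card on the
   * host so this compiles.  The shape is the point: copy in over the
   * bus, run the kernel on the array, copy back out. */
  typedef struct { double *mem; } card_buf;

  static card_buf *card_alloc(size_t n) {
      card_buf *b = malloc(sizeof *b);
      b->mem = malloc(n * sizeof(double));
      return b;
  }
  static void card_free(card_buf *b) { free(b->mem); free(b); }
  static void card_copy_in(card_buf *b, const double *h, size_t n) {
      memcpy(b->mem, h, n * sizeof(double));   /* host -> card */
  }
  static void card_copy_out(double *h, const card_buf *b, size_t n) {
      memcpy(h, b->mem, n * sizeof(double));   /* card -> host */
  }
  /* Stands in for a kernel running across the card's PEs. */
  static void card_run_square(card_buf *in, card_buf *out, size_t n) {
      for (size_t i = 0; i < n; i++)
          out->mem[i] = in->mem[i] * in->mem[i];
  }

  int main(void) {
      enum { N = 8 };
      double x[N], y[N];
      for (int i = 0; i < N; i++) x[i] = i;

      card_buf *in = card_alloc(N), *out = card_alloc(N);
      card_copy_in(in, x, N);       /* pay the bus cost going in   */
      card_run_square(in, out, N);  /* compute on the coprocessor  */
      card_copy_out(y, out, N);     /* pay the bus cost coming out */
      card_free(in);
      card_free(out);

      for (int i = 0; i < N; i++) printf("%g ", y[i]);
      printf("\n");
      return 0;
  }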
The issue for these cards is memory bandwidth in and out of the
PCI-X based interface. There are tricks you can play in a well-designed
system, but you cannot escape the bandwidth ceiling of PCI-X.
For many algorithms of potential interest to this list, memory bandwidth
is as important as FP performance. Having effectively 100 processors on
the far side of a narrow pipe means you have to design algorithms with
that pipe width in mind.
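To put rough numbers on that, here is a back-of-the-envelope sketch in
C. It assumes 64-bit/133 MHz PCI-X at roughly 1 GB/s peak (sustained
rates are lower) and an assumed ~50 GFLOP/s on the card (an illustrative
figure, not vendor data), then asks how many flops you need per byte
moved before the bus stops being the bottleneck.

  #include <stdio.h>

  /* Back-of-the-envelope: when does the PCI-X pipe, not the card's FP
   * units, limit you?  Numbers are illustrative assumptions, not
   * measurements. */
  int main(void)
  {
      double bus_bytes_per_s  = 1.0e9;   /* 64-bit/133 MHz PCI-X, roughly */
      double card_flops_per_s = 50.0e9;  /* assumed sustained card rate   */

      /* Break-even arithmetic intensity: flops per byte moved across
       * the bus before compute time exceeds transfer time. */
      double breakeven = card_flops_per_s / bus_bytes_per_s;
      printf("break-even intensity: %.0f flops per byte moved\n", breakeven);

      /* Example: dense matrix multiply, C = A*B, N x N doubles.
       * Move 3*N*N*8 bytes, perform 2*N^3 flops. */
      for (int n = 256; n <= 4096; n *= 2) {
          double bytes = 3.0 * n * n * 8.0;
          double flops = 2.0 * (double)n * n * n;
          double t_bus  = bytes / bus_bytes_per_s;
          double t_card = flops / card_flops_per_s;
          printf("N=%4d  transfer %.4f s  compute %.4f s  %s-bound\n",
                 n, t_bus, t_card, t_bus > t_card ? "bus" : "compute");
      }
      return 0;
  }

The crossover is the usual arithmetic-intensity argument: low-reuse
kernels (dot products, sparse matrix-vector) stay bus-bound, while
O(N^3)-work-on-O(N^2)-data kernels like DGEMM can hide the pipe.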
>> If ClearSpeed would consider mass production at a cost like $100-$500
>> per card the market would be huge, because the card would be
>> competing with multi-core processors like the IBM-Sony Cell.
Kahan had some interesting things to say about the Cell, summarized
roughly as: with Cell you get to pick one of fast or accurate. He was
making this point in general but pointed out some specific issues. This
is from a talk on his web site. Caveat: I don't have a Cell to play with
(yes, Santa, I would like one or two hundred), so I can't run paranoia
or other fun tests.
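For anyone who does get one, even a few lines of C will show the sort of
thing Kahan is on about: whether single precision rounds to nearest and
keeps denormals. This is a crude probe, nowhere near the real paranoia
test.

  #include <stdio.h>
  #include <float.h>

  /* A crude probe in the spirit of (but much weaker than) paranoia:
   * check round-to-nearest and denormal support in single precision. */
  int main(void)
  {
      /* Machine epsilon by direct search. */
      volatile float eps = 1.0f;
      while ((float)(1.0f + eps / 2.0f) > 1.0f)
          eps /= 2.0f;
      printf("epsilon     : %g (FLT_EPSILON = %g)\n", eps, FLT_EPSILON);

      /* Round-to-nearest: 1 + eps/2 should round back down to 1,
       * 1 + 3*eps/4 should round up to 1 + eps. */
      volatile float a = 1.0f + eps / 2.0f;
      volatile float b = 1.0f + 3.0f * eps / 4.0f;
      printf("1 + eps/2   : %s\n",
             a == 1.0f ? "rounds to 1 (good)" : "does not round to 1");
      printf("1 + 3*eps/4 : %s\n",
             b == 1.0f + eps ? "rounds up (good)" : "truncated?");

      /* Denormals: the smallest normal divided by 2 should be a
       * nonzero denormal on IEEE-conforming hardware. */
      volatile float d = FLT_MIN / 2.0f;
      printf("FLT_MIN / 2 : %s\n",
             d > 0.0f ? "nonzero (denormals kept)" : "flushed to zero");
      return 0;
  }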
> You need "really big" volumes to get there. Retail pricing of $200
> implies a bill of materials cost down in the sub $20 range.
Yup. Volume drives lower pricing; economies of scale matter. This is
why FPGAs are priced where they are: they don't have large volumes.
If they did, pricing would be better.
> Considering
> that a run of the mill ASIC spin costs >$1M (for a small number of parts
> produced), your volume has to be several hundred thousand (or a million)
> before you even cover the cost of your development.
>
> The video card folks can do this because
> a) each successive generation of cards is derived from the past, so the
> NRE is lower.. most of the card (and IC) is the same
I believe they are in incremental improvement mode. This keeps redesign
costs way down.
> b) they have truly gargantuan volumes
This is the critical thing. Remember, these are highly pipelined
graphical supercomputers. The ClawHMMer project ran a hardware-accelerated
HMMer on an NVIDIA GeForce 6800 GT about 5x faster than the P4 hosting
the card.
> c) they have sales from existing products to provide cash to support the
> development of version N+1.
Cash is king.
> {I leave aside the possibility of magic elves, although with some
> consumer products, I have no idea how they can design, produce, and sell
> it at the price they do. Making use of relative currency values can
> also help, but that's in the non-technological magic elf category, as
> far as I'm concerned.}
Actually lots of stuff is done outside the US these days. Not magic
elves per se, but Indian and Chinese engineers and scientists who are
extremely good at what they do. This starts getting into a cost and
productivity discussion rather rapidly.
>> Possibly the most interesting niche for the ClearSpeed cards appears
>> to me to be accelerating proprietary applications like Matlab,
>> Mathematica and particularly Excel, which run on a single PC and can
>> hardly be reprogrammed by their users to run on a distributed cluster.
>
> I would say that there is more potential for a clever soul to reprogram
> the guts of Matlab, etc., to transparently share the work across
> multiple machines. I think that's in the back of the mind of MS, as
> they move toward a services environment and .NET
:)
So imagine, if you will, an LD_PRELOAD environment variable which
points a user's code over to the relevant libraries, which work their
magic behind the scenes. I would be hard-pressed to imagine using this
for Excel, but could see it for Matlab: programming at a high level with
high performance. Of course Kahan also rips into them over accuracy ...
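For the curious, the trick looks something like the sketch below: a
shared object that interposes on dgemm_ (the Fortran BLAS entry point
dense linear algebra eventually lands in), logs the call, and forwards
it to the real library underneath. A real accelerator shim would ship
the operands to the card, or out to a cluster, at the marked spot.
Assumes Linux, glibc, and an existing BLAS; build and preload lines are
in the comments.

  /* blas_shim.c -- a minimal LD_PRELOAD interposer sketch.
   * Build:  gcc -shared -fPIC -o libblas_shim.so blas_shim.c -ldl
   * Use  :  LD_PRELOAD=./libblas_shim.so ./your_blas_program
   * This only logs and forwards dgemm_; an accelerator shim would
   * instead ship the operands across to the card (or a cluster). */
  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <stdio.h>

  typedef void (*dgemm_fn)(const char *transa, const char *transb,
                           const int *m, const int *n, const int *k,
                           const double *alpha,
                           const double *a, const int *lda,
                           const double *b, const int *ldb,
                           const double *beta, double *c, const int *ldc);

  void dgemm_(const char *transa, const char *transb,
              const int *m, const int *n, const int *k,
              const double *alpha, const double *a, const int *lda,
              const double *b, const int *ldb,
              const double *beta, double *c, const int *ldc)
  {
      static dgemm_fn real_dgemm = NULL;
      if (!real_dgemm)   /* find the library underneath us */
          real_dgemm = (dgemm_fn)dlsym(RTLD_NEXT, "dgemm_");

      fprintf(stderr, "[shim] dgemm_ %d x %d x %d -- would offload here\n",
              *m, *n, *k);

      /* Work the "magic": for now just call the original. */
      real_dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta,
                 c, ldc);
  }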
>
> Jim
>
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615