[Beowulf] Re: vectors vs. loops

Wed Apr 27 16:26:10 PDT 2005

Ben Mayer wrote:
>>However, most code doesn't vectorize too well (even, as you say, with
>>directives), so people would end up getting 25 MFLOPs out of 300 MFLOPs
>>possible -- faster than a desktop, sure, but using a multimillion dollar
>>machine to get a factor of MAYBE 10 in speedup compared to (at the time)
>>$5-10K machines.
> 
> 
> The people who run these centers have told me that a
> supercomputer is worth the cost if you can get a speedup of 30x over
> serial. What do others think of this?

I thought a good new machine should be 4-10x the current speed of your 
old machine.  A "supercomputer" is a hard thing to define in general 
terms.  One way to look at it is "at least 2 orders of magnitude faster 
than what you can do today" without significant effort (e.g. without 
rewriting your 100k-line code from scratch) ...
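
To put rough numbers on the trade-off above, here is a minimal Python sketch. The 25 and 300 MFLOPs figures and the 10x and 30x speedups come from the thread; the machine prices ($2M for the "multimillion dollar" machine, $7.5K as the midpoint of "$5-10K") are illustrative assumptions, and the function names are hypothetical.

```python
# Figures quoted in the thread.
PEAK_MFLOPS = 300.0       # peak of the vector machine
SUSTAINED_MFLOPS = 25.0   # what poorly vectorized code reportedly reached

# Illustrative assumptions, NOT from the thread.
ASSUMED_SUPER_COST = 2_000_000.0   # stand-in for "multimillion dollar"
ASSUMED_DESKTOP_COST = 7_500.0     # midpoint of "$5-10K"


def fraction_of_peak(sustained, peak):
    """Share of peak throughput that the code actually achieves."""
    return sustained / peak


def price_performance_penalty(big_cost, small_cost, speedup):
    """How many times more you pay per unit of delivered performance.

    A value of 1.0 means the big machine is exactly as cost-effective
    as the small one; larger values mean it is less cost-effective.
    """
    return (big_cost / small_cost) / speedup


def break_even_speedup(big_cost, small_cost):
    """Speedup at which both machines cost the same per unit of performance."""
    return big_cost / small_cost
```

Under these assumed prices, `fraction_of_peak(25.0, 300.0)` is about 8% of peak, a 10x speedup costs roughly 27 times more per unit of performance than the desktop, and a 30x speedup still costs roughly 9 times more. Price/performance alone is of course not the whole argument: time to solution and problems that do not fit on the small machine are not captured here.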

[...]

> So what we should really be trying to do is matching code to the
> machine.

"Portable code is not fast, fast code is not portable"

There is a price for every decision.  How hard are you willing to work 
to make your code fast?  How much time (or money) are you willing to 
spend to do this?

[...]

> The manual for the X1 provides some information and examples. Are the
> Apple G{3,4,5} the only processors that have real vector units? I have
> not really looked at SSE(2), but remember that they were not full
> precision.

AltiVec units are just SIMD units, but with a saner instruction set design 
than SSEx.  Here is hoping that SSE4 will have a real maximum/minimum 
function. :(

Not sure what you mean by full precision.  SSE2 has a variety of 
formats, and the ISA design makes it hard to get data in and out of the 
SIMD registers.  Packing/unpacking are very expensive.
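
As a rough illustration of what packing and unpacking involve, here is a pure-Python sketch of the data shuffling only. It assumes four 32-bit lanes per 128-bit register and an interleaved (x, y, z) input layout; both are assumptions for illustration, the function names are hypothetical, and no real SIMD instructions are issued.

```python
LANES = 4  # assumed: four 32-bit floats per 128-bit SIMD register


def pack_points(points, lanes=LANES):
    """Regroup [(x, y, z), ...] into per-component groups of `lanes` values.

    Each inner list stands in for one SIMD register's worth of data.
    """
    if len(points) % lanes != 0:
        raise ValueError("point count must be a multiple of the lane count")
    packed = []
    for start in range(0, len(points), lanes):
        block = points[start:start + lanes]
        xs = [p[0] for p in block]  # one register of x values
        ys = [p[1] for p in block]  # one register of y values
        zs = [p[2] for p in block]  # one register of z values
        packed.append((xs, ys, zs))
    return packed


def unpack_points(packed):
    """Inverse of pack_points: return the interleaved (x, y, z) tuples."""
    points = []
    for xs, ys, zs in packed:
        points.extend(zip(xs, ys, zs))
    return points
```

Every element is touched once on the way in and once on the way out without any arithmetic being done on it. If the computation between the two steps is short, this shuffling can dominate the runtime, which is one reading of why packing and unpacking are described as expensive.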

>>For me, I just revel in the Computer Age.  A decade ago, people were
>>predicting all sorts of problems breaking the GHz barrier.  Today CPUs
>>are routinely clocked at 3+ GHz, reaching for 4 and beyond.  A decade
> 
> 
> I just picked up a Sempron 3000+, 1.5GB RAM, 120GB HD, CD-ROM, video,
> 10/100 + Intel PRO/1000 for $540 shipped. I was amazed.

Well, we are going to run into some thermodynamic (structural 
stability) limits pretty soon.  At some feature size (haven't done the 
calculations, guessing in the 10-30nm region) the defect formation 
energies will become comparable to the thermal energy.  When this 
happens, the devices do a pretty good job of destroying themselves, 
usually with threading dislocations (this happened in the early days of 
blue LEDs).  The usual tricks to stabilize structures get harder at 
smaller sizes, and the electronic structure effects of surface 
deformations underneath the wires lead to some interesting electronic 
responses.  I have doubts that we will ever see 1 atom wires.  Then 
again, things like carbon nanotubes and other self assembling bits are 
quite intriguing.  And they are small and quite rigid.
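
For a sense of scale on "comparable to the thermal energy", here is a small Python sketch that compares k_B*T with a defect formation energy through the textbook Boltzmann factor exp(-E_f / (k_B*T)). This is not the calculation the post says was left undone: the formation energies you pass in are hypothetical inputs, the function names are made up for illustration, and the factor is only a crude equilibrium estimate.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_energy_ev(temperature_k):
    """Thermal energy k_B * T in electron volts."""
    return K_B_EV_PER_K * temperature_k


def boltzmann_factor(formation_energy_ev, temperature_k):
    """exp(-E_f / (k_B * T)): crude estimate of the equilibrium defect fraction.

    formation_energy_ev is a hypothetical input, not a measured value.
    """
    return math.exp(-formation_energy_ev / thermal_energy_ev(temperature_k))
```

At 300 K the thermal energy is about 0.026 eV. A hypothetical formation energy of 1 eV gives a factor below 1e-16, so such defects are essentially absent in equilibrium, while 0.1 eV gives a factor on the order of a percent. The factor grows exponentially as the formation energy approaches k_B*T, which is the regime the paragraph above is worried about.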

Still, I am waiting for bosonic computation (photons).  Enough of these 
fermions (electrons/holes).  Massively parallel by design.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615