[Beowulf] Cell
Vincent Diepeveen
diep at xs4all.nl
Wed Apr 27 16:07:35 PDT 2005
At 06:46 PM 4/27/2005 -0400, Joe Landman wrote:
>
>
>Vincent Diepeveen wrote:
>> A raid5 array of 2 terabyte costs like $2000-$3000 and it can deliver
>> 400-600MB/s i/o hands down when attached to a single machine. So if you
>> make the 1 tflop processor, there is no need to worry!
>
>I need to find out where you are getting your raids...
I'm no major expert here, but I plan to buy a cheap 400 euro
RAID-5 controller from 3ware. The shop carries no other RAID cards, so perhaps
there are better brands out there, but even with real cheapo S-ATA disks it gets
400MB/s read speed on a slow 2.4GHz P4, according to their homepage.
If you look further you'll perhaps find something a tad faster.
Not sure about you, but for me only read speed matters; write speed is less
important. Use RAID 0 for that and back up the data now and then :)
So my viewpoint still hasn't changed.
Get a cheap single-CPU machine. Get a cheapo 400 euro S-ATA controller. Get
a bunch of cheap S-ATA disks.
Get 400MB/s+ of I/O speed from it, and all you then need is a few
memory banks of expensive memory and the virtual 1 THz processor.
Oh, and all I want it to output is 42 anyway :)
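
Just as a sanity check on that 400MB/s figure, here is some napkin math
in C. The per-disk speed and disk count are numbers I assumed for
illustration, not measurements of any particular array:

  #include <stdio.h>

  /* Napkin math only: per-disk speed and disk count are assumed numbers,
     not measurements. */
  int main(void)
  {
      double per_disk_mb_s = 60.0;  /* assumed sustained read of one cheap S-ATA disk */
      int    disks         = 8;     /* assumed number of disks in the array */

      /* RAID-5 rotates parity, so large sequential reads stripe across the
         members; roughly one disk's worth per stripe is parity and isn't data. */
      double aggregate = (disks - 1) * per_disk_mb_s;

      printf("ideal aggregate read: %.0f MB/s\n", aggregate);
      return 0;
  }

With those assumed numbers it lands around 420 MB/s, which is why I don't
worry much about the I/O side.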
>[...]
>
>> Anything that has to do with huge calculations is in the first place cpu
>> power limited. Not anything else.
>
>There is a statement I like to make when I see comments like this.
>
>"Gross generalizations tend to be incorrect".
>
>If you think about it long enough, you can see the recursive humor.
>
>There are many different factors that will affect the overall
>performance of a machine on a particular code/data set. To illustrate
>this, I often suggested the following gedankenexperiment.
>
>Imagine you have a CPU that is infinitely fast, coupled to resources
>that are not infinitely fast. This means that while operations take
>exactly 0 time on the CPU, we haven't done a thing to make the memory or
>IO faster. In this gedankenexperiment, how much of a speedup do you get
>from an infinitely fast CPU? Memory moves still take time. Data
>loading and storing still takes time. Data motion is quickly becoming
>one of the (if not the) most critical aspects of performance for a fair
>number of calculations. So unless all parts are infinitely fast, you
>still have to pay for the data motion time, the IO time, the memory->
>memory time, the memory->CPU time (and CPU to memory time).
>
>In short, an infinitely fast CPU would reduce (possibly significantly)
>the execution time of a class of applications that are purely CPU
>bound (say, operating out of internal cache only).
>
>It will do very little for a code which is IO or memory bandwidth or
>latency bound.
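
To put numbers on Joe's gedankenexperiment, here is a tiny sketch; the
70/30 split of runtime is something I made up purely to show the shape
of the bound:

  #include <stdio.h>

  /* Illustrative only: the 70/30 split below is an assumed workload,
     not a measurement. */
  int main(void)
  {
      double total      = 100.0;            /* seconds of runtime, assumed */
      double cpu_part   = 70.0;             /* vanishes with an infinitely fast CPU */
      double other_part = total - cpu_part; /* memory + IO time, unchanged */

      /* Best case: all CPU time drops to zero, everything else stays. */
      double speedup = total / other_part;

      printf("speedup bound with an infinitely fast CPU: %.2fx\n", speedup);
      return 0;
  }

So even with an infinitely fast CPU, that assumed job only gets about
3.3x faster; the memory and IO time is still all there.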
>
>> Big RAM is nice to have for most clever algorithms, but it is second most
>> important. CPU power is most important. If there is some bottleneck that
>> limits the RAM we have, do not worry!
>>
>> We will find a solution!
>>
>> The real bottleneck is in the end the number of instructions a cpu can
>> process a second.
>
>Not really. The bottleneck in performance is how full you can keep the
>multiple pipelines of the processor. Branch statements tend to force
>pipeline flushes. You can "handle" this with speculative execution.
>Real memory accesses can bottleneck the memory subsystem, so real
>processors allow specific mixtures of instructions in flight at once to
>reduce resource contention. If you overflow any of the fixed CPU
>resources, you can stall a pipeline while waiting for the contention to
>be eliminated, or you can stall the entire CPU while flushing TLB and
>other shared resources. Basically you have multiple simultaneous zero
>sum games (fixed number of operations per unit time, specific mixtures
>of operations that maximize the performance of instructions in flight).
> Compilers are, as I indicated before, not particularly smart in most
>cases, and they generate code locally that might not make sense
>globally. Moreover, how instructions are ordered and presented to the
>CPU will fundamentally impact the overall performance. Code optimizers
>are, in a large sense, an attempt to better fit the emitted instructions
>to the processor architecture, by rewriting loops, mathematical
>constructs, and related. Optimizers are not perfect.
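
To illustrate the branch/pipeline point with a toy example of my own
(not Joe's), the two loops below compute the same sum; on random data
the branchy one is at the mercy of the predictor, while the branchless
one trades the jump for arithmetic the pipeline can keep in flight:

  #include <stdio.h>
  #include <stdlib.h>

  #define N 1000000

  /* Toy example, my own construction: the same reduction written two ways. */
  int main(void)
  {
      static int data[N];
      long branchy = 0, branchless = 0;
      int i;

      for (i = 0; i < N; i++)
          data[i] = rand() & 0xFF;      /* random bytes: worst case for the predictor */

      /* branchy: a conditional jump per element, mispredicted roughly half the time */
      for (i = 0; i < N; i++)
          if (data[i] >= 128)
              branchy += data[i];

      /* branchless: build a 0 / all-ones mask and AND it in, no jump to flush the pipe */
      for (i = 0; i < N; i++) {
          long mask = -(long)(data[i] >= 128);
          branchless += data[i] & mask;
      }

      printf("%ld %ld\n", branchy, branchless);
      return 0;
  }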
>
>Some architectures are pretty much impossible to write optimal code for
>(turns out to be NP-hard), and you have to accept a set of compromises
>at some point to avoid having your compilation take 24 hours (my MD
>codes used to take about 24 hours to build on a Trace Multiflow, VLIW
>architecture).
>
>The overall point of this is
>
>a) writing good code is hard
>b) writing fast code is harder
>c) CPUs don't automagically make things faster, compilers are implicated
>in this mess
>d) some optimizers are better left off :(
>
>--
>Joseph Landman, Ph.D
>Founder and CEO
>Scalable Informatics LLC,
>email: landman at scalableinformatics.com
>web : http://www.scalableinformatics.com
>phone: +1 734 786 8423
>fax : +1 734 786 8452
>cell : +1 734 612 4615
>
>
>