[Beowulf] Cell
Vincent Diepeveen
diep at xs4all.nl
Wed Apr 27 16:07:35 PDT 2005
At 06:46 PM 4/27/2005 -0400, Joe Landman wrote:
>
>
>Vincent Diepeveen wrote:
>> A raid5 array of 2 terabyte costs like $2000-$3000 and it can deliver
>> 400-600MB/s i/o hands down when attached to a single machine. So if you
>> make the 1 tflop processor, there is no need to worry!
>
>I need to find out where you are getting your raids...
I'm no major expert here, but I plan to buy a cheap 400 euro
RAID-5 controller from 3ware. The shop carries no other RAID cards, so perhaps
there are better brands out there, but even with real cheapo S-ATA disks it gets
400MB/s read speed on a slow 2.4GHz P4, according to their homepage.
If you look further you'll perhaps find something a tad faster.
Not sure about you, but for me only read speed matters; write speed is less
important. Use RAID 0 for that and back up the data now and then :)
So my viewpoint still hasn't changed.
Get a cheap single-CPU machine. Get a cheapo 400 euro S-ATA controller. Get
a bunch of cheap S-ATA disks.
Get 400MB/s+ of I/O speed from it, and all you then need is a few
memory banks of expensive memory and the virtual 1 THz processor.
Oh, and all I want it to output is 42 anyway :)
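
Just as a sanity check on that 400MB/s figure, here is some napkin math
in C. The per-disk speed and disk count are numbers I assumed for
illustration, not measurements of any particular array:

  #include <stdio.h>

  /* Napkin math only: per-disk speed and disk count are assumed numbers,
     not measurements. */
  int main(void)
  {
      double per_disk_mb_s = 60.0;  /* assumed sustained read of one cheap S-ATA disk */
      int    disks         = 8;     /* assumed number of disks in the array */

      /* RAID-5 rotates parity, so large sequential reads stripe across the
         members; roughly one disk's worth per stripe is parity and isn't data. */
      double aggregate = (disks - 1) * per_disk_mb_s;

      printf("ideal aggregate read: %.0f MB/s\n", aggregate);
      return 0;
  }

With those assumed numbers it lands around 420 MB/s, which is why I don't
worry much about the I/O side.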
>[...]
>
>> Anything that has to do with huge calculations is in the first place cpu
>> power limited. Not anything else.
>
>There is a statement I like to make when I see comments like this.
>
>"Gross generalizations tend to be incorrect".
>
>If you think about it long enough, you can see the recursive humor.
>
>There are many different factors that will affect the overall
>performance of a machine on a particular code/data set. To illustrate
>this, I often suggested the following gedankenexperiment.
>
>Imagine you have a CPU that is infinitely fast, coupled to resources
>that are not infinitely fast. This means that while operations take
>exactly 0 time on the CPU, we haven't done a thing to make the memory or
>IO faster. In this gedankenexperiment, how much of a speedup do you get
>from an infinitely fast CPU? Memory moves still take time. Data
>loading and storing still takes time. Data motion is quickly becoming
>one of the (if not the) most critical aspects of performance for a fair
>number of calculations. So unless all parts are infinitely fast, you
>still have to pay for the data motion time, the IO time, the memory->
>memory time, the memory->CPU time (and CPU to memory time).
>
>In short, an infinitely fast CPU would reduce (possibly significantly)
>the execution time of a class of applications that are purely CPU
>bound (say, operating out of internal cache only).
>
>It will do very little for a code which is IO or memory bandwidth or
>latency bound.
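
To put numbers on Joe's gedankenexperiment, here is a tiny sketch; the
70/30 split of runtime is something I made up purely to show the shape
of the bound:

  #include <stdio.h>

  /* Illustrative only: the 70/30 split below is an assumed workload,
     not a measurement. */
  int main(void)
  {
      double total      = 100.0;            /* seconds of runtime, assumed */
      double cpu_part   = 70.0;             /* vanishes with an infinitely fast CPU */
      double other_part = total - cpu_part; /* memory + IO time, unchanged */

      /* Best case: all CPU time drops to zero, everything else stays. */
      double speedup = total / other_part;

      printf("speedup bound with an infinitely fast CPU: %.2fx\n", speedup);
      return 0;
  }

So even with an infinitely fast CPU, that assumed job only gets about
3.3x faster; the memory and IO time is still all there.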
>
>> Big RAM is nice to have for most clever algorithms, but it is second most
>> important. CPU power is most important. If there is some bottleneck that
>> limits the RAM we have, do not worry!
>>
>> We will find a solution!
>>
>> The real bottleneck is in the end the number of instructions a cpu can
>> process a second.
>
>Not really. The bottleneck in performance is how full you can keep the
>multiple pipelines of the processor. Branch statements tend to force
>pipeline flushes. You can "handle" this with speculative execution.
>Real memory accesses can bottleneck the memory subsystem, so real
>processors allow specific mixtures of instructions in flight at once to
>reduce resource contention. If you overflow any of the fixed CPU
>resources, you can stall a pipeline while waiting for the contention to
>be eliminated, or you can stall the entire CPU while flushing TLB and
>other shared resources. Basically you have multiple simultaneous zero
>sum games (fixed number of operations per unit time, specific mixtures
>of operations that maximize the performance of instructions in flight).
> Compilers are, as I indicated before, not particularly smart in most
>cases, and they generate code locally that might not make sense
>globally. Moreover, how instructions are ordered and presented to the
>CPU will fundamentally impact the overall performance. Code optimizers
>are, in a large sense, an attempt to better fit the emitted instructions
>to the processor architecture, by rewriting loops, mathematical
>constructs, and related. Optimizers are not perfect.
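
To illustrate the branch/pipeline point with a toy example of my own
(not Joe's), the two loops below compute the same sum; on random data
the branchy one is at the mercy of the predictor, while the branchless
one trades the jump for arithmetic the pipeline can keep in flight:

  #include <stdio.h>
  #include <stdlib.h>

  #define N 1000000

  /* Toy example, my own construction: the same reduction written two ways. */
  int main(void)
  {
      static int data[N];
      long branchy = 0, branchless = 0;
      int i;

      for (i = 0; i < N; i++)
          data[i] = rand() & 0xFF;      /* random bytes: worst case for the predictor */

      /* branchy: a conditional jump per element, mispredicted roughly half the time */
      for (i = 0; i < N; i++)
          if (data[i] >= 128)
              branchy += data[i];

      /* branchless: build a 0 / all-ones mask and AND it in, no jump to flush the pipe */
      for (i = 0; i < N; i++) {
          long mask = -(long)(data[i] >= 128);
          branchless += data[i] & mask;
      }

      printf("%ld %ld\n", branchy, branchless);
      return 0;
  }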
>
>Some architectures are pretty much impossible to write optimal code for
>(turns out to be NP-hard), and you have to accept a set of compromises
>at some point to avoid having your compilation take 24 hours (my MD
>codes used to take about 24 hours to build on a Trace Multiflow, VLIW
>architecture).
>
>The overall point of this is
>
>a) writing good code is hard
>b) writing fast code is harder
>c) CPUs don't automagically make things faster, compilers are implicated
>in this mess
>d) some optimizers are better left off :(
>
>--
>Joseph Landman, Ph.D
>Founder and CEO
>Scalable Informatics LLC,
>email: landman at scalableinformatics.com
>web : http://www.scalableinformatics.com
>phone: +1 734 786 8423
>fax : +1 734 786 8452
>cell : +1 734 612 4615
>
>
>