[Beowulf] X5500

Thu Apr 2 18:13:05 PDT 2009

On Apr 1, 2009, at 11:08 AM, Jan Heichler wrote:

> Hello Mikhail,
>
>
>
> Tuesday, 31 March 2009, you wrote:
>
>
>
> MK> In message from Kilian CAVALOTTI <kilian.cavalotti.work at gmail.com>
>
> MK> (Tue, 31 Mar 2009 10:27:55 +0200):
>
> >> ...
>
> >>Any other numbers, people?
>
>
>
> MK> I believe there are also some other important numbers: prices for
>
> MK> Xeon 55XX and system boards
>
>
>
> Don't forget DDR3-ECC memory which must be registered.
>
>
>
> To my knowledge Intel is the only platform using that now. It is
> more expensive than DDR2-800 (for Opterons)... hopefully that will
> change (quickly).
>
>
>
I wouldn't bet on registered ECC DDR3 RAM becoming cheaper.
To be honest, I misjudged that for DDR reg-ECC RAM as well;
relatively speaking, it is still expensive.

DDR2 ECC RAM, on the other hand, is so dirt cheap
that you can build nodes with a lot of RAM very cheaply.

Both AMD and Intel have boards that easily take 64GB of RAM,
and some can even go to 128GB.

I don't see that getting very cheap for DDR3 ECC (and, where needed,
registered) RAM yet.

Consider the production side. How is such a factory going to sell all
that RAM?
So in reality they have hardly started up the production lines for it
right now.

But don't you want ECC RAM inside cluster nodes anyway? In my latency
measurements on today's RAM, using ECC hardly matters for either
latency or bandwidth: just 1 nanosecond at most,
on a total of, say, 160 ns or so.
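
To put those two figures in proportion, here is a minimal Python sketch
of the arithmetic. The function name is made up for illustration, and
the inputs are only the rough "1 ns" and "160 ns" quoted above, not
fresh measurements.

```python
def ecc_overhead_fraction(ecc_penalty_ns: float, total_latency_ns: float) -> float:
    """Share of the total access latency that the ECC penalty accounts for."""
    return ecc_penalty_ns / total_latency_ns


# Rough figures from the post: at most 1 ns extra on about 160 ns total.
overhead = ecc_overhead_fraction(1.0, 160.0)
print(f"ECC share of total latency: {overhead * 100:.3f}%")
```

That works out to well under one percent of the access time.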

The latency of DDR3 is not really good compared to DDR2. Of course
this depends on how you use the RAM.

The i7 can deliver the odd amount of 192 bytes or so at a time,
whereas dual-channel DDR2 can deliver 64 at a time.
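
One plausible way to arrive at the 192 and 64 byte figures is channels
times bus width times burst length. The sketch below assumes 64-bit
channels, a burst length of 8 for triple-channel DDR3 and a burst
length of 4 for dual-channel DDR2. Those burst lengths are assumptions
made for illustration, not something stated in this thread.

```python
def bytes_per_burst(channels: int, bus_width_bits: int, burst_length: int) -> int:
    """Bytes delivered when every channel completes one full burst."""
    bytes_per_beat = bus_width_bits // 8  # one beat moves one bus width of data
    return channels * bytes_per_beat * burst_length


# Assumed configurations (burst lengths are illustrative assumptions):
i7_ddr3 = bytes_per_burst(channels=3, bus_width_bits=64, burst_length=8)
dual_ddr2 = bytes_per_burst(channels=2, bus_width_bits=64, burst_length=4)

print("triple-channel DDR3:", i7_ddr3, "bytes")
print("dual-channel DDR2:  ", dual_ddr2, "bytes")
```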

Now that's a huge bandwidth difference of course; some i7s even get
to 18GB/s of bandwidth, versus 10GB/s for the DDR2 equivalent. Yet it
also means that the latency of the DDR2 quad-cores is simply better.
More bytes at a time simply comes at a big latency price.
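
A crude way to see that trade-off is to look only at how long one
chunk occupies the memory bus at the quoted peak bandwidth. This is a
simplified sketch: it uses the 192/64 byte and 18/10 GB/s figures from
above, takes 1 GB as 10**9 bytes, and ignores everything else that
contributes to real access latency (row activation, controller
queues, and so on).

```python
def transfer_time_ns(chunk_bytes: float, bandwidth_gb_per_s: float) -> float:
    """Time to move one chunk at the given bandwidth.

    With 1 GB taken as 10**9 bytes, 1 GB/s equals 1 byte per nanosecond,
    so the division gives nanoseconds directly.
    """
    return chunk_bytes / bandwidth_gb_per_s


i7_chunk_ns = transfer_time_ns(192, 18.0)   # 192 bytes at 18 GB/s
ddr2_chunk_ns = transfer_time_ns(64, 10.0)  # 64 bytes at 10 GB/s

print(f"i7 / DDR3: {i7_chunk_ns:.2f} ns per chunk")
print(f"DDR2:      {ddr2_chunk_ns:.2f} ns per chunk")
```

Under these assumptions the larger chunk takes longer to arrive even
though the bandwidth is higher, which matters when the code needs only
a few of those bytes.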

If your software just streams and leaves the CPUs with hardly anything
to calculate while it streams, then maybe consider rewriting the
algorithm into something more complex that needs to stream less and
can do more calculations. Doing massive calculations on just a few
gigabytes of RAM is what GPUs are genius at and will scale perfectly
for, maybe even outpacing Moore's law for a short period of time in
the total number of instructions per cycle you can push through on a
single GPU node.
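
The "stream less, calculate more" advice can be illustrated with the
roofline model, brought in here as a standard back-of-the-envelope
tool rather than anything used in this thread. Only the 18 GB/s comes
from the figures above; the 40 GFLOP/s peak and the two flops-per-byte
values are hypothetical numbers chosen for illustration.

```python
def attainable_gflops(peak_gflops: float,
                      bandwidth_gb_per_s: float,
                      flops_per_byte: float) -> float:
    """Roofline estimate: limited by compute peak or by memory traffic."""
    memory_bound_gflops = bandwidth_gb_per_s * flops_per_byte
    return min(peak_gflops, memory_bound_gflops)


PEAK_GFLOPS = 40.0   # hypothetical compute peak, for illustration only
BANDWIDTH_GBS = 18.0  # bandwidth figure quoted earlier in the post

# A pure streaming kernel (hypothetical 0.1 flop per byte moved) ...
streaming = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 0.1)
# ... versus a reworked algorithm doing 4 flops per byte moved.
compute_heavy = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 4.0)

print(f"streaming kernel:     {streaming:.1f} GFLOP/s")
print(f"compute-heavy kernel: {compute_heavy:.1f} GFLOP/s")
```

In this model the streaming kernel is capped by memory traffic, while
the compute-heavy version is limited only by the processor's peak.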

Vincent

> Jan
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf