[Beowulf] Selection from processor choices; Requesting Guidance

Sat Jun 17 12:57:21 PDT 2006

Geoff Jacobs wrote:

> Well, each Opteron core would have to split its local memory pool with
> its sister, so pure bandwidth would be similar. The memory controller
> on the Opteron would give a latency bonus, but the registered DIMMs
> would incur a penalty. The Socket A motherboards are using an SIS
> chipset which might be a little more tuned.
> 
> If the application largely factored out the interconnect, I could accept
> the results being this close. But you're right. HT is so much better for
> inter-process communication, and GROMACS should derive a big advantage
> from it.

Hmmm... as with most codes, the details of the calculation, as well as 
the quality of the code base, the compiler used, etc., factor into this as 
much as, if not more than, the underlying interconnect on systems with 
small core counts.

If we take an overly simple calculation, it might scale one way, and yet 
when we do a different calculation, it will scale in a rather different 
manner, because you are exercising different code paths in different 
proportions. This is why it is (extraordinarily) dangerous to use 
*standard* benchmarks (HPL, etc.) as an indication of anything other than 
how much entropy you can generate (both in the physical waste heat view of 
entropy, and in the information-theoretic destruction of bits view).
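
To make the scaling point concrete, here is a toy sketch using Amdahl's 
law. This is a deliberately crude model that I am substituting for 
illustration, not anything measured on GROMACS or HPL, and the workload 
names and parallel fractions below are made-up assumptions. It only shows 
how two calculations run on the same hardware can scale quite differently 
when they spend different fractions of their time in parallelizable code.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Hypothetical workloads: the parallel fractions are illustrative only.
workloads = {"mostly_parallel": 0.99, "mixed": 0.80}

for name, fraction in workloads.items():
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (1, 2, 4, 8)]
    print(name, speedups)
```

A real code adds communication cost, memory bandwidth limits, and cache 
effects on top of this, which is exactly why one benchmark's scaling curve 
says little about another calculation's.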

Without knowing the details of Doug's calc (yeah, I might look in short 
order, I am beating my head against a PPTP problem right now... ), it 
would be rather hard to assess why the calculation performs as it does.

-- 

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615