[Beowulf] Benchmark reality check please

Tue Dec 21 18:40:19 PST 2004

G'day all

I'm not looking to start another fight over benchmarking... really I'm not! ;) 

I've got my beastie to the point where I'd like to establish a performance *baseline* for the current rig. I've read a lot about the various benchmark suites and approaches, and became horribly confused.

Now that the mental brainstorm has settled somewhat, I'd appreciate comments from the collective minds. NASA calls this process a "focus review"; everybody else calls it a "reality check" ;)

The goal is to come up with some numbers (and maybe some cute, simple graphs) that show how my 4-node, 8-CPU cluster performs. I don't want to spend weeks setting up the benchmark suite if I can avoid it. The 96-point flashing neon number will be the wall-clock time on a sample N-body run. That's why this Beowulf was built :) However, I'd also like some numbers that break down the components underlying that number. Here are some ideas of the granularity I'd like, and possible comparison points.
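
For the headline wall-clock number, the main thing is to time the same problem several times and report the best and median runs rather than a single shot. A minimal sketch of that, assuming a toy direct-summation N-body kernel as a stand-in for the real code (the kernel, G = 1, the softening length `eps`, and the problem sizes are all hypothetical, for illustration only):

```python
import random
import statistics
import time


def nbody_step(pos, vel, mass, dt, eps=1e-3):
    """Advance a toy 3-D system one step by direct O(N^2) summation (G = 1).

    eps is a hypothetical softening length that keeps close pairs finite.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps ** 2
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    # Semi-implicit Euler: update velocities first, then positions.
    for i in range(n):
        for k in range(3):
            vel[i][k] += acc[i][k] * dt
            pos[i][k] += vel[i][k] * dt


def make_run(n=64, steps=10, seed=1):
    """Return a callable that runs the same toy problem from scratch each time."""
    def run():
        rng = random.Random(seed)  # identical initial conditions on every call
        pos = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
        vel = [[0.0, 0.0, 0.0] for _ in range(n)]
        mass = [1.0 / n] * n
        for _ in range(steps):
            nbody_step(pos, vel, mass, dt=1e-3)
    return run


def time_run(run, repeats=5):
    """Wall-clock seconds for each of `repeats` calls to run()."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    times = time_run(make_run(), repeats=5)
    print(f"min {min(times):.3f} s  median {statistics.median(times):.3f} s")
```

On the real cluster the same idea applies with the stopwatch wrapped around the whole job (including MPI start-up), so the number matches what a user actually waits for.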

Things like raw CPU grunt (a 500MHz P3 vs. a 3GHz P4), cache size (a 600MHz P3 with 128KB vs. a 500MHz chip with 512KB of L2), system RAM (100MHz vs. 133MHz, ECC vs. non-ECC, and 128MB vs. 256MB per node), and a Fast Ethernet vs. Gigabit Ethernet interconnect.
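
Before measuring, it helps to know the theoretical ceilings, so the RAM and network comparisons have something to be judged against. A back-of-envelope sketch, assuming ordinary single-data-rate PC100/PC133 SDRAM on a 64-bit bus (typical of P3-era boards) and raw Ethernet line rates with all protocol overhead ignored; the function names are just illustrative:

```python
def sdram_peak_mb_s(bus_mhz, bus_width_bits=64):
    """Theoretical peak of single-data-rate SDRAM (one transfer per clock).

    MHz * bytes-per-transfer gives MB/s in decimal units (10**6 bytes/s).
    """
    return bus_mhz * bus_width_bits / 8


def ethernet_peak_mb_s(link_mbit_s):
    """Raw line rate in MB/s, ignoring framing, headers and inter-frame gaps."""
    return link_mbit_s / 8


if __name__ == "__main__":
    for mhz in (100, 133):
        print(f"PC{mhz} SDRAM: {sdram_peak_mb_s(mhz):.0f} MB/s peak")
    for name, mbit in (("Fast Ethernet", 100), ("Gigabit Ethernet", 1000)):
        print(f"{name}: {ethernet_peak_mb_s(mbit):.1f} MB/s peak")
```

On paper that's 800 vs. 1064 MB/s for the RAM (133MHz is really 133.3MHz, so closer to 1066) and 12.5 vs. 125 MB/s for the wire. So the RAM swap can buy at most about a third more peak bandwidth, while the network upgrade is a factor of ten; measured numbers will land below these ceilings.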

My proposed benchmark suite looks like this: 

That RGB fellow's BenchMaster suite would seem to give the CPU/RAM side of things a good workout, so I'll give that a burl. Until recently lmbench seemed the go, but BenchMaster seems to be a step up (more flexible)? (An opinion from someone *other* than RGB would be nice =P ) And maybe the NetPIPE suite for all sorts of juicy network numbers?
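
Network ping-pong results are often boiled down to two numbers with the simple latency/bandwidth (Hockney) model, t(n) = t0 + n/B, where t0 is the zero-byte latency and B the asymptotic bandwidth. A hedged sketch of fitting that model by least squares to your own (message size, one-way time) pairs, assuming you halve ping-pong round-trip times first; the function name and the synthetic numbers below are made up for illustration and are not NetPIPE output:

```python
def fit_latency_bandwidth(sizes_bytes, times_s):
    """Least-squares fit of t = t0 + n / B to one-way message times.

    Returns (t0 in seconds, B in bytes per second).
    """
    m = len(sizes_bytes)
    if m < 2 or m != len(times_s):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_n = sum(sizes_bytes) / m
    mean_t = sum(times_s) / m
    sxx = sum((n - mean_n) ** 2 for n in sizes_bytes)
    if sxx == 0:
        raise ValueError("need at least two different message sizes")
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes_bytes, times_s))
    slope = sxy / sxx              # seconds per byte, i.e. 1 / B
    t0 = mean_t - slope * mean_n   # intercept: zero-byte latency
    return t0, 1.0 / slope


if __name__ == "__main__":
    # Synthetic, made-up data generated from t0 = 60 us, B = 11e6 bytes/s.
    sizes = [2 ** k for k in range(17)]
    times = [60e-6 + n / 11e6 for n in sizes]
    t0, bw = fit_latency_bandwidth(sizes, times)
    print(f"latency {t0 * 1e6:.1f} us, bandwidth {bw / 1e6:.2f} MB/s")
```

Real measurements rarely sit exactly on a straight line, so treat the fitted t0 and B as rough summary numbers for before/after comparisons, not as a replacement for the full throughput curve.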

I'm not looking to kick heads on compilers. Ye olde "g" compilers will be the starting point. Maybe a review of the Intel flavours (if they're still free for non-profit/personal/educational type usage) later on. If I have to pay for more than download bandwidth, then it's out. I'm eating tomato sauce sandwiches after the hardware purchases as it is! :)

I was also thinking about the various MPI options and how to put them through the wringer. I suspect the wall-clock time on the N-body runs will be the easiest number to use there.

What will I do with the results? As I mentioned, this is a baseline, i.e. what do the numbers look like now? ...then it's off to the sandpit for tweaking, tweaking and more tweaking, running the same benchmark suite as I go. Was the tweak a net gain? You know... the sort of thing that **real** benchmarks are used for! =)
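
Answering "was the tweak a net gain?" honestly means comparing several runs before and after, not one run each, because run-to-run noise can easily swamp a small tweak. A minimal sketch, assuming you have a handful of wall-clock times (seconds) for each configuration; `compare_runs` is a hypothetical helper, and its verdict is a crude rule of thumb rather than a proper statistical test:

```python
import statistics


def compare_runs(baseline_s, tweaked_s):
    """Compare repeated wall-clock times (seconds) before and after a tweak.

    Returns (speedup, verdict). speedup = median(baseline) / median(tweaked),
    so > 1 means the tweak made things faster. The verdict only calls a
    "gain" or "loss" when the two sets of runs do not overlap at all.
    """
    speedup = statistics.median(baseline_s) / statistics.median(tweaked_s)
    if max(tweaked_s) < min(baseline_s):
        verdict = "gain"
    elif min(tweaked_s) > max(baseline_s):
        verdict = "loss"
    else:
        verdict = "within noise"
    return speedup, verdict
```

Using medians keeps one unlucky run (a cron job, a busy node) from dominating the comparison, and "within noise" is the cue to run more repeats before deciding whether a tweak stays.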

Thoughts, comments appreciated as always.

Cheers
Stevo
