[Beowulf] Not all cores are created equal
niftyompi at niftyegg.com
Tue Dec 30 20:41:22 PST 2008
On Mon, Dec 29, 2008 at 4:27 PM, Chris Samuel <csamuel at vpac.org> wrote:
> ----- "Nifty Tom Mitchell" <niftyompi at niftyegg.com> wrote:
>> On Wed, Dec 24, 2008 at 09:03:38PM +1100, Chris Samuel wrote:
>> > I contemplated doing this on our Barcelona cluster, but
>> > sacrificing 1 core in 8 was a bit too much of a high price
>> > to pay. But people with higher core counts per node might
>> > find it attractive.
>> This seems like it would be a benchmarking decision based on
>> application load and 'implied IO+OS' loading, as well as the ability
>> to localize the IO+OS activity to the sacrificed CPU core.
> I'll leave that to sites that have a benchmarkable and
> characterisable workload. :-) We've got over 600 random
> users running random code (some very random indeed)
> that covers all categories from self-written, through open
> source to commercial apps.
>  - including a commercial code that segfaults in one
> particular program in libmsxml.so - yes, that appears to
> be a 3rd party implementation of the M$ XML library on Linux.
> When reported they claimed it was because we were running
> CentOS5 not RHEL4. Can't reproduce on RHEL4 because it
> crashes *before* that point on that distro. Gah.
> Christopher Samuel - (03) 9925 4751 - Systems Manager
Benchmarking with a long list of random applications is problematic.
One additional hard-to-benchmark aspect of a big cluster is "between-job"
legacy I/O: after a process exits, pending data I/O can slow the startup
of the next process.
It may be simpler to sample the system state, watching for iowait and
any other activity measure you can track with a light hand. Statistical
analysis and charting tools are available.... Sample-oriented benchmarks
on cluster workloads are not common, but I suspect they can tell us a lot.
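As a rough illustration of the sampling idea (my sketch, not something from
the original thread), a small Python script can read the aggregate iowait
counter from Linux's /proc/stat at intervals with minimal overhead; the
field layout assumed here is the standard Linux one.

```python
#!/usr/bin/env python3
# Light-handed sampler sketch: measure the fraction of CPU time spent
# in iowait over a short interval, using Linux's /proc/stat.
import time

def read_cpu_times():
    """Return the aggregate 'cpu' line of /proc/stat as a list of ints."""
    with open("/proc/stat") as f:
        fields = f.readline().split()
    # fields: ['cpu', user, nice, system, idle, iowait, irq, softirq, ...]
    return [int(x) for x in fields[1:]]

def iowait_fraction(interval=1.0):
    """Fraction of total CPU jiffies spent in iowait over one interval."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    return delta[4] / total if total else 0.0  # index 4 is iowait

if __name__ == "__main__":
    # Take a handful of samples; feed the output to any charting tool.
    for _ in range(5):
        print(f"iowait: {iowait_fraction(1.0):.1%}")
```

Run periodically (e.g. from cron or a lightweight daemon) and logged per
node, samples like this can be correlated with job start and end times to
spot the between-job I/O effect described above.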
T o m M i t c h e l l