[Beowulf] recommendations for cluster upgrades

Tue May 19 20:26:47 PDT 2009

On Tue, May 19, 2009 at 01:23:57PM -0500, Rahul Nabar wrote:
> Subject: Re: [Beowulf] recommendations for cluster upgrades
> On Tue, May 19, 2009 at 1:10 PM, Nifty Tom Mitchell
> <niftyompi at niftyegg.com> wrote:
> > Other compiler vendors have their tools and tricks too....  By working
> > with multiple compilers and taking advantage of the strengths of each
> > compiler important portability and correctness code improvements are
> > possible.
> 
> Thanks Tom. Those are important leads for sure. I'll look into those
> flags and pathopt.
> 
> >compiler flag appears to result in a bad (different) answer.
> 
> That has always stumped me. Compilers can lead to slow or fast code
> but how can they compile to "wrong" code. To me this always seemed
> like a compiler design flaw.

Goofy stuff comes to mind:

	x * (y/1024 + 1024)
	-------------------
	         x

Is the answer (y/1024+1024)?
Sure, unless x=0.
Sometimes x is a complex function, so this may not be obvious...
And is 1024+1024 a constant?  Whoops, MyDearAuntSally: the division
binds first, so the term is (y/1024)+1024.

Lots of code depends on dividing by zero not throwing an error
or throwing it in the expected way at the expected time.
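
A minimal Python sketch of the same point, not taken from any real code
base. The function names original() and simplified() are made up for
illustration. Note that Python raises ZeroDivisionError for 0.0/0.0,
where compiled C or Fortran would more typically hand back a NaN, but
either way the "simplified" form changes what the program does.

```python
def original(x, y):
    # The expression as written: multiply by x, then divide by x.
    return x * (y / 1024 + 1024) / x


def simplified(x, y):
    # What algebra (or an over-eager transformation) reduces it to.
    return y / 1024 + 1024


y = 2048.0

# For an ordinary x the two forms agree.
ok = original(3.0, y) == simplified(3.0, y)

# For x == 0 the original divides zero by zero; the simplified form does not.
try:
    original(0.0, y)
    zero_result = "no error"
except ZeroDivisionError:
    zero_result = "ZeroDivisionError"

# For a huge x the product overflows to infinity before the division.
huge = original(1e308, y)

print(ok, zero_result, huge, simplified(1e308, y))
```

The overflow case is the quieter one: nothing is raised, the two forms
just disagree.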

Then there are issues related to magnitude.
Consider the sum of a long list of numbers, some very small, some
very large.   If you sort the list from small to large and
add, you get a different answer than if you add from large to small.
Algebra tells us that these are equivalent but with floating point
numbers the result can be very different.
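
A small Python sketch of the ordering effect, with made-up values chosen
so the difference is easy to see. naive_sum() is a hypothetical helper
that adds strictly left to right (the built-in sum() is avoided on
purpose, since newer Python versions compensate for rounding inside it).

```python
import math


def naive_sum(values):
    # Plain left-to-right accumulation in 64-bit floating point.
    total = 0.0
    for v in values:
        total += v
    return total


# One very large value and ten small ones.
values = [1e16] + [1.0] * 10

small_to_large = naive_sum(sorted(values))
large_to_small = naive_sum(sorted(values, reverse=True))
exact = math.fsum(values)  # correctly rounded reference sum

print(small_to_large, large_to_small, exact)
```

Adding the small values first lets them accumulate to 10.0 before they
meet the large one. Adding them one at a time to 1e16 loses each of
them, because neighbouring doubles near 1e16 are 2.0 apart.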

Since this is a Beowulf list consider the impact of slicing and dicing
an array of mixed small and large numbers, sorted and unsorted.  If the
list is sorted from large to small and added by one processor there is
one result.   If the same list is split between core A and core B the
second core will see a list that contains values of smaller magnitudes
than the first core and depending on the distribution of values and
numbers of cores interesting deltas in the output are possible.
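
The splitting effect can be imitated on one machine. This is only a
sketch under assumptions: chunked_sum() is a hypothetical stand-in for
"each core sums a contiguous slice, then the partial sums are combined
in order", which is one of many ways a real parallel reduction might be
arranged.

```python
def naive_sum(values):
    # Plain left-to-right accumulation in 64-bit floating point.
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    # Give each pretend core a contiguous slice, sum the slices
    # independently, then add the partial sums in order.
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Already sorted from large to small.
values = [1e16] + [1.0] * 7

one_core = chunked_sum(values, 1)
two_cores = chunked_sum(values, 2)
four_cores = chunked_sum(values, 4)

print(one_core, two_cores, four_cores)
```

Same data, same arithmetic, three different answers depending only on
how many pieces the list was cut into.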

Compilers do make simple to difficult symbolic transformations as
optimizations, transparently.  Some constant expressions are also
evaluated at compile time.
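
One transformation of this kind is regrouping a sum. The Python sketch
below does the regrouping by hand; whether a given compiler actually
does it depends on the compiler and its flags (relaxed floating-point
modes such as GCC's -ffast-math permit it), so treat this as an
illustration of the arithmetic, not of any particular compiler.

```python
a, b, c = 0.1, 0.2, 0.3

# Evaluate as written, left to right.
left = (a + b) + c

# The algebraically identical regrouping.
right = a + (b + c)

print(left, right, left == right)
```

The two groupings differ in the last bit, which is all it takes for a
"same" answer to compare unequal.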

Another possible numeric issue is FPU rounding rules.   When Intel came up
with an FPU with 80-bit internal registers, the act of moving floating point
numbers to or from a 64-bit memory location could result in rounding
differences.  This can happen in some cases with loop unrolling or inlining
of functions (vs. a function call).   Any time a function needs more
floating point registers than the processor has, the selection of which
value stays in a register and which lives on the call stack can prove
interesting.
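
Python has no portable 80-bit type, so the sketch below is an analogy
only: it uses 64-bit values as the "wide register" and a round trip
through 32-bit storage as the "spill to memory". The helper
spill_to_narrow() is made up for this illustration. The mechanism is the
same as the 80-bit versus 64-bit case, just one size down.

```python
import struct


def spill_to_narrow(x):
    # Store a 64-bit float into 32-bit storage and load it back,
    # rounding it to the narrower format on the way.
    return struct.unpack("f", struct.pack("f", x))[0]


a, b = 1.0, 1e-10

# Intermediate result stays in the wide format the whole time.
kept_wide = (a + b) - a

# Same computation, but the intermediate is spilled to narrow storage.
spilled = spill_to_narrow(a + b) - a

print(kept_wide, spilled)
```

Whether the intermediate was kept wide or spilled decides whether b
survives at all, and the source code looks identical in both cases.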

Play with these little scripts... we know that 0C is exactly 32F and
100C is exactly 212F.  The simple sixth grade form of the relationship
results in almost the exact answer.   Play with scale...

--- f2c -----
#!/bin/bash
# f2c convert Fahrenheit to centigrade.
bc << KELVIN
#scale=3
scale=20
(5 *(${1}-32 ))/ 9
KELVIN

--- c2f -----
#!/bin/bash
# c2f convert centigrade to Fahrenheit.
bc << KELVIN
#scale=3
scale=20
((9/5) * $1)+32
KELVIN
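
For anyone without bc handy, here is a rough Python imitation of what
changing scale does. It is a sketch under assumptions: bc_div() is a
hypothetical helper that mimics only bc's habit of truncating a division
result to scale decimal digits, and it ignores bc's other scale rules.

```python
from fractions import Fraction


def bc_div(a, b, scale):
    # Divide exactly, then truncate toward zero to `scale` decimal digits.
    q = Fraction(a) / Fraction(b)
    return Fraction(int(q * 10**scale), 10**scale)


def c2f(c, scale):
    # Same shape as the c2f script: ((9/5) * c) + 32
    return bc_div(9, 5, scale) * c + 32


def f2c(f, scale):
    # Same shape as the f2c script: (5 * (f - 32)) / 9
    return bc_div(5 * (f - 32), 9, scale)


for scale in (0, 3, 20):
    print(scale, float(c2f(100, scale)), float(f2c(212, scale)))
```

At scale 0 the 9/5 in c2f truncates to 1 before the multiply, so 100C
comes out as 132F. f2c multiplies first and divides last, so 212F still
comes out as exactly 100C.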

Later,
mitch

PS: The above are "GPL"... give credit if you keep them.

-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?