[Beowulf] Re: Why Do Clusters Suck?
Andrew D. Fant
fant at pobox.com
Tue Mar 22 15:04:14 PST 2005
Joe Landman wrote:
>
>
> Craig Tierney wrote:
>
>> Our biggest problem is the immaturity of development
>> tools. Another way to put that is "my compiler doesn't reproduce
>> the bugs in the other compilers my users are accustomed to using"
>> or "Fortran isn't a standard, it is a suggestion". It is a rare creature
>> that writes clean, portable code. It is all too common to hear
>> developers tell me things like "does it work if you turn off bounds
>> checking?". I spend way too much time with new users trying to explain
>> to them the difference between 'code porting' and 'bug fixing'.
>
>
> <commiserate />
>
> me: "how do you know it works"
> them:"it compiles with no errors"
> me: "no... how do you know it works, functions correctly?"
> them:(puzzled look) "it compiles with no errors ..."
>
Amen to that, Joe.
My personal complaint is that there aren't enough good standard
test/validation suites out there for cluster building. Some libraries
like ATLAS include their own, but those are tied to that specific
package. It would be really great if, as a community, we could do
something like the Linux Test Project oriented towards cluster-building
and scientific computing. Something that I can run when my boss wants
"proof" that upgrading a library didn't completely rejigger the
numerical stability of the results. I know that the stock answer here
is that we ought to generate our own regression tests based on our own
particular application set, but I think it would be a boon if a more
generic framework and solution evolved. If nothing else, it would
offer a basis for heterogeneous systems in a grid environment to trust
each other's results without necessarily requiring full application
cross-validation. It might be a pipe dream, but I like it 8-)
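
To make the "proof" idea a bit more concrete, here is a rough sketch in
Python of the kind of check I have in mind. It is not a real framework,
just an illustration: record a few named numerical results before a
library upgrade, rerun afterwards, and flag anything that moves outside
a tolerance. The function name compare_results, the result names, and
the tolerance values are all made up for the example; sensible
tolerances depend entirely on the application.

```python
import math


def compare_results(baseline, current, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two dicts of named floating-point results.

    Returns a list of (name, baseline_value, current_value) tuples for
    every result that is missing from `current` or differs from the
    baseline by more than the given tolerances.
    """
    mismatches = []
    for name, expected in baseline.items():
        if name not in current:
            # A result that disappeared is also a regression.
            mismatches.append((name, expected, None))
            continue
        actual = current[name]
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((name, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Hypothetical numbers: results saved before the upgrade ...
    baseline = {"dot_product": 32.0, "norm": 3.7416573867739413}
    # ... and the same quantities recomputed after it.
    current = {"dot_product": 32.0, "norm": 3.7417}

    for name, expected, actual in compare_results(baseline, current):
        print(f"MISMATCH {name}: baseline={expected!r} current={actual!r}")
```

A generic suite would need a lot more than this (agreed-upon kernels,
reference values, per-test tolerances), but even something this small
gives a yes/no answer after an upgrade.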
Andy