Scyld and Red Hat 7
Martin Siegert
siegert at sfu.ca
Thu Feb 1 14:01:03 PST 2001
On Thu, Feb 01, 2001 at 07:35:09AM -0500, Robert G. Brown wrote:
> On Thu, 1 Feb 2001, stig wrote:
>
> > As long as the system includes the main libs, a kernel and the popular
> > package managers (well RPM) does it really matter what distribution it is
> > based on?
With respect to applications, it matters which version of glibc the distro
is based on.
> > Would there be this discussion if they 'based' it on their own compilation
> > of binaries instead of those of RedHats.
>
> The reasons to periodically upgrade an operating system distribution
> (theirs or anybody else's), and not just the kernel, are many and valid.
> By the numbers:
>
> a) Improved compilers and support libraries. This is probably the
> number one reason to upgrade a whole distribution rather than just the
> kernel. Sure, you can just upgrade compilers alone, and kernels alone,
> and libraries alone, but at some point (especially for major e.g. libc
> revisions) you find that you have to rebuild everything anyway and the
> whole point of distributions and kickstart and yellow dog's "yup" tool
> is to make it easy to get from tested configuration to tested
> configuration. I've done systems management piecemeal and it is no fun
> at all.
This is also the #1 reason for me not to upgrade: if a new distribution
comes with a glibc that is not downward compatible with the commercial
compilers and scientific libraries that I purchased, I simply cannot
use it without spending lots of $$. There must be very good reasons
for that. Right now I doubt there are any. E.g., the Portland compilers
aren't even available for glibc-2.2; no NAG library either.
> This is currently a highly nontrivial reason in my mind. I'm in the
> middle of fixing an extremely serious bug in the cpu-rate tool I've been
> using to measure floating point performance on nodes and have uncovered
> a rat's nest of weirdness somewhere in the gcc/linux interaction on 6.2
> systems. As in I can run the same benchmark code with the same
> parameters and get two completely different timings, depending literally
> on whether I set a parameter by a fallthrough default or "override" the
> parameter to the exact same value on the command line. Or change the
> order of initialization statements. Different by a factor of two -- not
> a small difference. This SEEMS to be fixed in RH 7.0 although I'm still
> testing.
I admire you - benchmarking is an art in itself. Just look at the stream
benchmark: the comments in the code (stream_d.f) tell you that you can
use either static or f90-type allocatable arrays. They don't tell you
that the results will be dramatically different (you see the same difference
with stream_d.c when you malloc the arrays). So which way should you
do it? Probably the slow way if you want a meaningful result for your
application - I, at least, malloc almost everything at run time. However,
stream results are never quoted that way.
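A minimal sketch of the comparison described above, assuming a triad
kernel of the kind STREAM uses. The names N, triad, time_static and
time_malloc are made up for illustration and are not taken from
stream_d.c; clock() is used only to keep the example short.

```c
/* Sketch only: a STREAM-like triad timed on static vs. malloc'd arrays.
 * N, triad(), time_static() and time_malloc() are hypothetical names,
 * not taken from stream_d.c. */
#include <stdlib.h>
#include <time.h>

#define N 1000000                    /* elements per array (assumed size) */

static double sa[N], sb[N], sc[N];   /* static storage: addresses fixed at link time */
static volatile double sink;         /* read-back so the loop is not optimized away */

/* Triad kernel: a[i] = b[i] + q * c[i] */
static void triad(double *a, const double *b, const double *c,
                  double q, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Initialize the arrays, run one triad pass, return CPU seconds used. */
static double time_triad(double *a, double *b, double *c, size_t n)
{
    size_t i;
    clock_t t0, t1;

    for (i = 0; i < n; i++) {
        a[i] = 0.0;
        b[i] = 1.0;
        c[i] = 2.0;
    }
    t0 = clock();
    triad(a, b, c, 3.0, n);
    t1 = clock();
    sink = a[n / 2];                 /* 1.0 + 3.0 * 2.0 */
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Variant 1: arrays with static storage. */
double time_static(void)
{
    return time_triad(sa, sb, sc, N);
}

/* Variant 2: arrays obtained from malloc at run time.
 * Returns -1.0 if an allocation fails. */
double time_malloc(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    double t = -1.0;

    if (a && b && c)
        t = time_triad(a, b, c, N);
    free(a);
    free(b);
    free(c);
    return t;
}
```

Calling time_static() and time_malloc() from a small main() and printing
both values gives a rough comparison. How large the gap is will depend on
compiler, flags and machine; for anything serious one would repeat the pass
several times and take the best time, as STREAM itself does.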
> b) Improved kernel. For example, NFS is basically and maddeningly
> broken in pre-2.18 kernels (but MAY be fixed in 2.18) -- I've actually
> survived a server crash without having to reboot all my NFS clients
> since upgrading my (non-scyld) cluster. Yes, one can rebuild the kernel
> by hand, but some of the scyld advantages (and other useful beowulf
> stuff) interface directly with the kernel. These days one sometimes has
> to upgrade the base compiler to upgrade the kernel. This is less
> important to a scyld beowulf than to a more general purpose cluster
> node, but scyld cannot remain stagnant at a given kernel revision
> forever.
That's one of the reasons why I want to go to the 2.4 kernel: NFS-v3.
As long as I can do it without going to glibc-2.2, I'll probably
upgrade. Right now it doesn't look as if RH will be releasing a 2.4 kernel
rpm for 6.2 (although I can't see a reason why they couldn't).
[side remark: is there LFS (large file support > 2GB) in the 2.4 kernel?]
With respect to Scyld (and RH and whoever) this means: I would welcome
upgrades as long as the distribution remains downward compatible.
The showstopper here is glibc, not the kernel.
Sure, there are limits to that, but the reasons for giving up downward
compatibility must be very good: so good that the $$ reasons given above
don't count anymore.
> c) Improved everything else. This isn't too important to scyld but
> again, even e.g. MPI marches along. Bugs are fixed, optimizations are
> tuned. Scyld may not have to remain sync'd to RH's development cycle,
> but it has to re-release its OWN distribution package periodically to
> keep everything up to date and/or users will have to periodically
> upgrade node or server packages piecemeal.
>
> RH 7 has definitely got some problems, but 7.1beta comes out what,
> today? and reportedly fixes a lot of those problems (as do the many
> updates already released). Since RH 7 has an incompatible RPM relative
> to 6.2, the 6.2->7 upgrade requires a pretty serious commitment and lots
> of folks are holding off until its problems diminish.
>
> I therefore don't think that the issue is whether scyld should rebuild
> on the 7.x distribution -- it is rather a question of when. This is
> thus a reasonable question to ask, although there is (as noted) less
> pressure for them to do it immediately. There is also the question of
> how difficult it is to do the rebuild -- if the distribution is RPM
> packaged, rebuilding really shouldn't take long at all; it is the
> testing and stabilizing that takes the time.
... and when they decide to rebuild based on 7.x, I hope they will consider
keeping a branch based on glibc-2.1.
Martin
========================================================================
Martin Siegert
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6
========================================================================