[Beowulf] cernlib

Robert G. Brown rgb at phy.duke.edu
Fri Oct 14 06:22:57 PDT 2005

On Fri, 14 Oct 2005, Leif Nixon wrote:

> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>>    a) Is cernlib prebuilt in rpm's in SL?
> Which version? Jakob Nielsen used to repackage ATLAS-related stuff
> into RPMs. His old src.rpm for cernlib 2002 is here:
>  ftp://ftp.nordugrid.org/applications/hep/cernlib/2002/src/

(Hoping that the following isn't too off-topic, given that cernlib is
very much a cluster tool...)

Thanks, that's very useful.

Well, with this I now have source rpms for both 2002 and 2005.  Both
build perfectly through what looks like about the first third; both fail
initially at exactly the same point -- in the %install phase they
attempt to install paw (and, if you push past that, a couple of other
tools) with install -c -s.  Unfortunately, these tools are bash scripts,
and install barfs when asked to strip a bash script, e.g.:

rgb at lilith|B:1154>install -c -s build-rpm /home/rgb/bin
strip: /home/rgb/bin/build-rpm: File format not recognized
install: strip failed

The install flags also include -c, which is obsolete in GNU install (it
is accepted and silently ignored) but at least harmless.

If one plows past this bug (by hand-hacking the relevant build Makefile
to remove the offending and irrelevant flags) then the build proceeds
apace until it hits the usual crap -- missing files, explicit build
references outside of the %buildroot, no %buildroot used or defined in
parts of the spec file.
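
For what it's worth, the hand-hack can be done mechanically.  A sketch
only -- it assumes GNU sed and that the offending lines literally read
"install -c -s", which may not match cernlib's actual Makefiles:

```shell
# Strip the -s flag from every "install -c -s" invocation in every
# Makefile under the current tree, in place (GNU sed's -i assumed).
find . -name Makefile -print0 |
    xargs -0 sed -i 's/install -c -s/install -c/g'
```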

This is all really annoying.  It isn't that hard to build something into
a rebuildable, portable, robust rpm.  The exercise of making this work
is also invaluable as it invariably means cleaning up the code and build
process.  This code needs to be cleaned with a subtle tool.  I'd suggest
a front loader and a dumptruck, for starters...

I mean (reluctantly taking a peek at actual source at a point of
failure), this code is still #ifdef'd for the Apollo!  For VMS!  For
(and is there darker Evil in the world?) MVS!  I don't even recognize
the ACE, whatever the hell that was.  Simply trying to thread one's way
through the #ifdef jungle leads one into Hell.  Somebody once upon a
time cared enough to instrument the hell out of this code for every
big/major piece of hardware the planet ever produced, but nobody seems
to be able to spare the time or the money to GET RID OF CRUFT and hack
all this shit OUT.  And nobody seems to be able to spare the time NOW to
get the code to build and run transparently and easily on a platform
that comprises several hundred times (several thousand?  several
million?  several billion?) the total aggregate compute power
represented by all the supercomputers in those #ifdef's put together.

Don't get me started on the fact that it uses imake.

This is very definitely a program that is its own hell.  The worst of it
is that it would take a single programmer who follows the Tao --

  (see http://www.phy.duke.edu/~rgb/General/tao/tao.html#book4, 4.4)

--no more than six months to hack it all OUT, get RID of imake, go IN to
all the sources (with e.g. sed, not by hand) and permanently alter them
to make them SUS compliant where they aren't already, to get this #@(&%
program to "compile without an error message and run like a gentle
wind".  And PACKAGED in a real package management system that is
actually universally supported and that has more than one person in the
Universe committed to it.

The benefits of doing an even more sweeping revision of the whole thing
-- updating the code itself, using EXISTING numerical libraries that ARE
being properly cared for and that DO have widespread participation from
a user community and that CAN be safely and incrementally improved with
the latest in modern algorithms (e.g. the GSL) -- are very likely to be
surprisingly great in terms of performance, as well.  This might take a
team of no more than 3 programmers a year to do, where the only reason
that I think the project would benefit from more than one is that the
library code is highly modular so that they can probably sanely split up
the work and only have to work "together" to hammer out details at the
interfaces.

So I'm left in the same place I was in the last time I tried this.  How
much of this do I attempt to actually fix -- which involves unpacking
and reconstructing a source copy of the entire tree and at LEAST going
into the Makefiles and Imakefiles and figuring out all of the Evil
therein being wrought, much of which is as useless and occasionally
dangerous as fossil DNA (at best unexpressed, at worst it mutates into
cancer, and this is not really a metaphor) and making the changes
required to make this program build PROPERLY into an RPM, or seeing if I
can hack the rpm %buildroot directly to where it will build, make myself
a "functional" binary rpm, and plan to have to go through it all again
the next time that I need to rebuild.

If somebody would actually pay me for a year or two, actually, I'd
cheerily undertake the former and revise the damn thing from the ground
up myself.  However, I don't even USE cernlib -- I'm doing this for
other users here.

Of course, if >>I<< were to do this, I would proceed by a) freezing the
2005 library as the LAST library with promiscuously and centrally
supported side architectures.  After this if you want to run on win32 or
IBM or Apollos, you either develop patches and instrument the base code
a posteriori for your own private build or you change over to a
supported architecture.  I would then absolutely gut the build process
and replace it with the simplest of Makefiles at each recursive level
that would get the job done -- an absolutely clean build/install on a
more or less standard linux box -- and then would CONTEMPLATE using
gnu's automake as a mechanism for managing modest heterogeneity.
(Ordinarily I eschew it as being pretty Evil in its own right, but for
code of this level of complexity it is probably appropriate, and at
least it isn't imake.)

This would involve stripping out every last #ifdef and just plain
starting the library code over in a linux/SUS/Posix compliant form.
This really should be fine -- if you count the number of linux/gnu boxes
this library needs to install and run cleanly on compared to everything
else put together it's only what -- ten to one?  A hundred to one?

Who knows, once this were done it might be determined that it is
POSSIBLE to instrument for one, or even two, additional
not-quite-identical architectures that autoconf can't handle
transparently by default.  With the old 2005 code base frozen, they
could always import "just" the win32 #ifdefs.

> There are people picking up where Jakob left off, so I might be able to
> lay my hands on newer versions as well.

If they are really committed to making proper rpms out of this, that
would be lovely.  I've already fixed several obvious problems in the
spec file (no, you can't assume that a user will build as root, nor can
you assume that the RPM topdir is /usr/src/redhat) but the spec file in
general is SO nasty that it can fail in SO many places and each failure
requires a full rebuild to test.  Even on a GHz class system, it takes
minutes to get to the next failure or to test a fix.

I'd strongly, strongly suggest that they offload the %build and %install
phases (that are currently a whole bunch of really nasty shell code for
doing recursive Makefile hacks with sed followed by recursive builds
followed by recursive installs) into a toplevel build/install script
invoked by the simplest of Makefiles or into a toplevel Makefile (make
already having various mechanisms for recursive builds).  There also
HAVE to be better ways to proceed than to dynamically patch the
Makefiles with sed.  Like patch the makefiles ONCE, permanently, and
then save the patched version as the original, and make the users of the
Apollos still in operation out there undo the patches...;-)
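
The one-time patch could be captured in the standard way -- a sketch,
with file names purely illustrative:

```shell
# Keep a pristine copy, fix the Makefiles once, and record the result as
# a conventional patch that the spec file can apply with %patch instead
# of re-running sed on every build.  (Names are illustrative.)
cp -a src src.orig
sed -i 's/install -c -s/install -c/g' src/Makefile
diff -ruN src.orig src > cernlib-makefiles.patch || true  # diff exits 1 when files differ
```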

That way the spec file becomes very simple -- a one-phase %build, a
simple %install and a simple list of %files.  Of course the automation
of the toplevel build would benefit everybody, not just rpm builders.
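
Something like the following, that is -- a sketch only, with macros and
file lists that are illustrative rather than cernlib's actual contents:

```spec
%build
make %{?_smp_mflags}

%install
rm -rf %{buildroot}
make install DESTDIR=%{buildroot}

%files
%{_bindir}/paw
%{_libdir}/libpacklib.a
```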

> (Or you could just use Pacman...)

(rgb runs from the room screaming...:-)

(and unfortunately, runs back to delving into the mysteries of:

makedepend: warning:
(reading /usr/include/string.h, line 33): cannot find include file stddef.h
         not in /var/tmp/cern-2005/usr/cernlib/2005/src/include/stddef.h
         not in /usr/local/lib/gcc-include/stddef.h
         not in /usr/include/stddef.h
         not in /usr/lib/gcc/i386-redhat-linux/4.0.0/include/stddef.h
gmake[2]: Leaving directory
gmake[2]: Entering directory
rm -f archive/tcpaw.o
gcc -c -O1 -fomit-frame-pointer \
    -I/var/tmp/cern-2005/usr/cernlib/2005/src/include -DFUNCPROTO=15 \
    /var/tmp/cern-2005/usr/cernlib/2005/src/packlib/cspack/tcpaw/tcpaw.c -o
/var/tmp/cern-2005/usr/cernlib/2005/src/packlib/cspack/tcpaw/tcpaw.c: In
function 'isetup':

which is apparently caused by the fact that I failed to link
/usr/src/linux/include in the spec file (which REQUIRES that the build
be run as root) and instead hacked the spec file to

if [ ! -d %buildroot/usr/src/linux ]; then
    mkdir -p %buildroot/usr/src/linux
fi
if [ ! -d %buildroot/usr/src/linux/include ]; then
    ln -sf %buildroot/usr/include %buildroot/usr/src/linux/include
fi

which actually does what the logic it replaces would not correctly do,
but which also doesn't work, so that the build process loses stddef.h.
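
If the only thing the build wants from /usr/src/linux/include is
ordinary system headers, one guess at a root-free workaround (an
assumption on my part, not a tested fix) is to point the link at the
real /usr/include, which does exist at build time, rather than at a
path under the as-yet-unpopulated build root:

```shell
# Sketch: link to the real system headers instead of into the build root.
# $buildroot stands in for rpm's %buildroot; the fallback path is illustrative.
buildroot=${RPM_BUILD_ROOT:-/var/tmp/cern-2005}
mkdir -p "$buildroot/usr/src/linux"
ln -sfn /usr/include "$buildroot/usr/src/linux/include"
```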


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

More information about the Beowulf mailing list