[Beowulf] Opteron/Athlon Clustering

Tue Jun 8 17:19:25 PDT 2004

Robert G. Brown wrote:

>On Tue, 8 Jun 2004, Joe Landman wrote:
>

[...]

>>I disagree.  Not everyone is doing one offs of it.  Some distros include 
>>this, and will support it.  You simply make a choice as to whether or 
>>not you want to pay them money for the work they did to do this.
>
><rant> Ah, but you see, this is a religious issue for me for reasons of
>long-term scalability and maintainability, so I don't even think of this
>alternative.  Or if you like, I think it costs money in the short run,
>and costs even more money in the long run, compared to participating in
>Fedora or CaOSity and doing it once THERE, where everybody can share it.
>But you knew that...;-)

Hmm.  I think we might be confusing long-term maintainability with 
paying for a distribution, which is what I was suggesting without 
saying so explicitly.  Moreover, you are bringing a number of other 
issues to bear, which are valid and important in and of themselves but, 
for the purposes of solving this particular problem (32-bit binaries on 
64-bit Linux), are tangential.

If you want to solve the problem of 32-bit binaries on 64-bit Linux 
today, this is doable.  It simply requires that you buy the right 
distribution.  No, not every distribution is there yet, and in fact some 
of the ones I am quite interested in (cAos) are a bit behind.  That's OK; 
I expect them to mature.  If we get enough time, you may see us 
contribute to them.

OTOH, if I have to deploy a cluster today where this is a requirement, 
I have to balance the time commitment to do this work against the cost 
of doing it with another distribution.  This is annoying, but these are 
some of the hard choices that need to be made.
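When mixing 32-bit and 64-bit binaries on an Opteron node, the first practical question is which binaries are which.  A minimal sketch in Python, assuming ordinary ELF files: byte 4 of the ELF header (`e_ident[EI_CLASS]`) is 1 for 32-bit objects and 2 for 64-bit ones.  The function names `elf_class` and `elf_class_of_file` are hypothetical; on a real system `file(1)` reports the same thing.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values defined by the ELF specification
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Classify an ELF object from its first header bytes.

    Returns "32-bit", "64-bit", "unknown" (ELF but odd class byte),
    or "not ELF".
    """
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not ELF"
    # e_ident[EI_CLASS] is the byte right after the 4-byte magic number
    return ELF_CLASSES.get(header[4], "unknown")


def elf_class_of_file(path: str) -> str:
    """Read only the first five bytes of a file and classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


if __name__ == "__main__":
    # Usage (hypothetical script name): python elfclass.py /usr/bin/nedit /bin/ls
    for path in sys.argv[1:]:
        print(f"{path}: {elf_class_of_file(path)}")
```

On an x86-64 node, the binaries reported as 32-bit are the ones that depend on the distribution shipping working 32-bit compatibility libraries.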

>You see, to me "scalable" means "transparently scalable to the entire
>campus".  It also means "independently/locally auditable for security"

I refer to what you call "transparently scalable to the entire campus" 
as a question of deployability, not scalability.  This is an 
(incredibly) important issue.  I wasn't trying to solve that one with my 
suggestion of a different distribution.

>(according to our campus security officer, who curiously enough is an
>ex-cluster jock:-). 
>

heh... I seem to find lots of "ex-physics" types around, though I 
haven't seen too many "ex-cluster" folks.  Lots of old HPC folks 
doing other things...

> Maintainable means "installed so that all users
>automagically update nightly from a locally maintained campus
>repository, built from source rpm's that can be built/rebuilt at a whim
>for any of the supported distributions.

Ah... utopia ... sort of ...

This is in part why we are building out our bioinformatics software RPM 
base, as source RPMs whenever possible.  We just added HMMer.

Of course RPM is not perfect, and we run into lots of bugs dealing with 
RPM supersets/subsets and version changes.  This is really annoying.  We 
can usually build from a tarball, but when someone screws up an RPM, it 
can be (and has been) a nightmare to fix. 
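Part of what makes RPM version changes bite is that RPM does not compare version strings lexically.  A simplified Python sketch of the classic `rpmvercmp` segment rules (no epoch, release, or `~` handling; `rpm_vercmp` is our own illustrative name, not RPM's code): separators are skipped, strings are split into runs of digits or letters, numeric runs compare as integers and beat alphabetic runs, and if all compared runs match, whichever string still has characters left is considered newer.

```python
def _isdigit(c: str) -> bool:
    return "0" <= c <= "9"


def _isalpha(c: str) -> bool:
    return ("a" <= c <= "z") or ("A" <= c <= "Z")


def _isalnum(c: str) -> bool:
    return _isdigit(c) or _isalpha(c)


def rpm_vercmp(a: str, b: str) -> int:
    """Return 1 if a is newer, -1 if b is newer, 0 if equal (simplified)."""
    if a == b:
        return 0
    i = j = 0
    while i < len(a) and j < len(b):
        # Skip separators such as '.', '-', '_'
        while i < len(a) and not _isalnum(a[i]):
            i += 1
        while j < len(b) and not _isalnum(b[j]):
            j += 1
        if i == len(a) or j == len(b):
            break
        # The first string decides whether this segment is numeric or alphabetic
        numeric = _isdigit(a[i])
        take = _isdigit if numeric else _isalpha
        si, sj = i, j
        while i < len(a) and take(a[i]):
            i += 1
        while j < len(b) and take(b[j]):
            j += 1
        seg_a, seg_b = a[si:i], b[sj:j]
        if not seg_b:
            # Segment types differ: a numeric segment beats an alphabetic one
            return 1 if numeric else -1
        if numeric:
            # Integer comparison, so leading zeros are ignored and 10 > 9
            na, nb = int(seg_a), int(seg_b)
            if na != nb:
                return 1 if na > nb else -1
        elif seg_a != seg_b:
            return 1 if seg_a > seg_b else -1
    # Every compared segment matched
    if i == len(a) and j == len(b):
        return 0
    # Whichever string still has characters left over wins
    return -1 if i == len(a) else 1
```

For example, `rpm_vercmp("1.10", "1.9")` returns 1, while a plain string comparison puts `"1.10"` before `"1.9"`; and `"1.0a"` counts as newer than `"1.0"`, which is easy to trip over when repackaging.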

Tarballs are useful for quite a number of things.  My big problem occurs 
when people decide that they have a better tool than make. 

>A setup like this means that one person, or a small team, can maintain a
>single repository so that every sysadmin, or individual, or research
>group, or cluster on campus can PXE/kickstart install specific
>preconfigured images, or interactively install from preconfigured
>templates plus mods, or install a barebones base and refine with e.g.
>yum install by hand or with a %post script.  It means that individuals,
>or individual groups, or individual departments WITHIN the global
>organization can set up local repositories containing their own RPMs
>that layer on top of the public repository (and which might well be
>proprietary or limited in their distribution scope).  It means that
>nobody has to do work twice, and that everybody understands exactly how
>everything works and hence can make it meet quite exotic needs within
>the established framework, simply.  It means that if any tool IS a
>security problem, dropping an updated rpm in place in the master
>repository means that by the following morning every system on campus is
>patched and no longer vulnerable to an exploit. 
>

Yes, I know. 

See 
http://www.engr.utk.edu/eccsw/SGIautoinst/node4.html#SECTION00310000000000000000  
and if you want 
http://www.engr.utk.edu/eccsw/SGIautoinst/node42.html#SECTION00730000000000000000 

> It means not having to
>call a vendor for support when they DON'T run your local environment and
>have no idea how or why your application fails and whose idea of a
>solution is "pay Red Hat a small fortune per node so that your
>environment and ours are the same, then we'll help you".

Heh...  My customers don't have the luxury of running such environments, 
they have real work to do.  That means problems have to be solved, which 
means that they have to be found.  This of course means that you have to 
get the person with the clue on ASAP.

I have not yet found a larger company that understands that the customer 
doesn't want to answer the script from the Time-Life operator in order 
to be connected to the person who has some level of clue.  Wastes their 
time.  Costs them money.

Support (good support that is, not what most people typically pay for) 
is hard, and it is expensive.   It is also worth it (and yes, I am 
biased).  If you can support yourself, go for it.  Some companies get 
put off by that.  We like working with smart folks.  Makes our lives 
easier, lowers our costs, and lowers the cost to the customer.

>One of the biggest problems facing the cluster community, in my humble
>opinion, is that the very people who should know BEST how important
>scalability and transparency is to users of the software they develop
>and maintain (just for example and not to pick on them as it is a common
>problem, say, SGE) choose to distribute their software source in tarball form
>with arcane build instructions that install into /usr/local one system
>at a time!

heh.... if I didn't have two proposals due tomorrow I might join you in 
this rant.  Suffice for now to say that we are doing what we can to make 
informatics tools easier to install.  We want it to be seamless/painless.

The problem is (oh... there I go) that each distro vendor has its own 
particular delusions about the right way to package and configure its 
packages.  If you have to support multiple distros, this can (and does) 
drive you batty real fast.  I especially like RH using Apache 2.0.  Our 
user interface is based upon HTML::Mason, which sits atop mod_perl 1.x 
(it doesn't like mod_perl 2.0-delta all that much; lots of things are 
broken), and Apache 2.0 doesn't run so well with mod_perl 1.x, so we have 
to install Apache 1.3.x alongside Apache 2.0.  Of course, since the good 
folks at RH assume you would never, ever, ever want to run these two at 
the same time, this is very hard(tm).  The only way right now is with 
source tarballs.

There's more, but I would be ranting if I went on... no wait...

>I'm sorry, but this is just plain insane.  The whole reason that RPMs
>and DEBs exist is because this is an utterly bankrupt model for scalable
>systems management, and it is particularly sad that the tools that NEVER
>seem to be properly packaged these days are largely cluster tools, tools
>that you NEVER plan to install just once and that are damn difficult
>for a non-developer to install.  It is as if the developers assume that
>everybody that will ever use their tool is "like them" -- a brilliant
>and experienced network and systems and software engineer who can read
>and follow detailed instructions just to get the damn thing built, let
>alone learn to run it. 

RPM is not perfect.  It does fix some things.  One of the things that 
is broken in RPM, IMO, is config management.  Just have a look at the 
Red Hat kernel spec file as an example.  Scripting within RPM is 
broken (it's just basic preprocessor stuff).

It is better than a tarball, but it is not great.  I'd argue that it 
would be hard to call it "good".  I'm not arguing for .deb (I don't use 
it), or portage, etc.

>It is also a bit odd, given that many/most major systems now support rpm
>(or can be made to, given that it IS an open source architecture
>independent toolset).  I visit the Globus website, and they have
>everything neatly packaged -- as tarballs.  I visit Condor, and after
>clicking my way through their really annoying "registration" window I
>see -- tarballs, or BINARY rpms (which of course have to match
>distribution, which limits the applicability of the rpms considerably).
>I visit SGE, I see tarballs (or binary RPMs).  We don't WANT binary
>rpms, we want source rpms, ones that require one whole line of
>installation instructions: rpm --rebuild, followed by rpm -Uvh or
>yum-arch and yum install.

Back at SGI years ago, we asked ISVs why their software never came in 
inst images.  The inst tool made config management easy (well nearly 
easy, it was pretty good).  Their response had to do with lowest common 
denominator and ease of their support model.  One distribution medium, 
one set of tests for packing/unpacking ....  it kept their costs down.

Don't shoot the messenger here.

Config management is good.  Very good.  Especially if you are trying to 
solve nasty problems.

>Does this represent a business opportunity?  Sure.  One that shouldn't

Doubt that.  Anyone here willing to pay for config management tools?  
RPM and yum are free (as in beer).  

>exist, but sure.  It also represents an even bigger opportunity for the
>community to come to its senses and adopt a proper packaging, one that

I'd say let's fix the config issues in RPM and make it really scriptable, so 
we can have *portable* RPMs at the *source* level

< insert incoherent sputtering here over this issue >

with easy-to-tweak configs, not hunting after every < insert incoherent 
sputtering here over this issue > configure script command line which is 
ifdef'ed out....

>is indeed portable across "all" the rpm-supporting operating systems and
>that puts software so built into the right places on the right systems
>without anything more than (e.g.)
>
>  yum install sge-client
>
>or an equivalent kickstart line on any given node.

Heh... we are in violent agreement.  Figures :)

>So I appreciate that you're in business doing this work, and adding
>additional value, and selling the work back to many people who DON'T
>want to build all this themselves given that it IS absurdly difficult to
>build and install and configure the high end tools.  I still lament the

That isn't our business (the danger is always having others articulate 
what you are doing... :( ).   Without spamming here, we are trying to 
service the market for high performance computing from an end user and 
application centric point of view, not focusing upon the computer 
first.  We want to make easy things trivial, and hard things possible 
(even easy if possible).  I could go on, but I don't like UCE anymore 
than the next person.

>fact that it is necessary at all, as building, installing, and even part
>of configuring should be totally automagical and prepacked ready to go.

To some extent I agree with you, though I note that it is not possible to 
completely encapsulate all possible configurations that an end user 
might need.  That is, a machine (machine == cluster here) which is 
really well designed for a particular bit of computing may not be well 
designed for another (a low-latency Beowulf machine vs. a compute farm 
for various applications).  Many vendors sell what is in their price 
book.  This means that, to them, one size fits all.  You see this when 
you spend an extra $400-800 USD for unneeded SCSI drives on compute 
nodes.  But I digress.

The same is true for cluster distributions, and for Linux RPMs more 
generally.  I lament the (usually horrible) way folks compile Perl and 
the Perl modules.  I complain bitterly about badly broken software 
(older versions of anaconda were a massive pain).  I complain about 
distro vendors' choices of file systems (still waiting for XFS from RH, 
and yes, Hell, Michigan, 40ish miles from me, did in fact freeze over 
last year, so it is due any time now).

It "should be" automagical, but to a large extent, it cannot be due to 
choices others have made when they built the RPMs.

>Note well that tools that ARE maintained in rpm format, e.g. PVM, lam
>mpi, tend to be universally available >>because<< they are there ready
>to directly install in all the major rpm-based distros.  Tools like SGE
>that might well BE universally useful on clusters are NOT in anything
>like universal use BECAUSE they are a major PITA to build and install
>and maintaining them in an installation doesn't scale worth a damn.

heh ....

on the head node

    ./install_execd -auto -fast

<oh no> ctrl-c ...

<too late> grrrrr

:(

>This is not the Sufi way....
>
></rant>

I think we might disagree on minor points, but sounds like we are 
arguing the same sides to a large extent.

Of course, you can still solve the original problem in the way I 
indicated.  It doesn't do squat for the rest of the stuff, though I'd 
argue (ignoring the vagaries of the SuSE distribution) that they are 
somewhat ahead in terms of making things coexist.  RH could learn a 
thing or three from them.

>   rgb
>
>>I am using SuSE 9.0 in this mode without problems.   I use nedit 
>>binaries (too lazy to compile myself), and a number of other tools that 
>>are 32 bit only on the dual Opteron. 
>>
>>That said, it makes a great deal of sense to recompile computationally 
>>intensive apps for the 64 bit mode of the Opteron.  Not that it will 
>>make Jove faster, but it does quite nicely for BLAST, HMMer, and others, 
>>including my old molecular dynamics stuff.  The 64 bit versions are 
>>faster by a bit.
>>
>>Joe