[Beowulf] hpl size problems

Joe Landman landman at scalableinformatics.com
Tue Oct 4 06:01:32 PDT 2005


Last I checked it was really hard to get mpich to switch easily between 
ch_p4, ch_gm, ch_shmem, ch_p4.iwarp, ...  In fact you have to 
build a new one for each interface (this is *bad* IMO, but the MPICH 
folks on the list must have had a really good reason for doing this, or 
at least I hope it was a good reason).  This leads *precisely* to the 
scenario that Chris indicates.

There are some mpich versions out there that let you dynamically link at 
run time (this is *good*) such as Scali.  There is mvapich.  There 
is LAM.  Some handle this a little better than others.

But you are still stuck with an explosion of mpich/... .  And this is, in 
the truest sense of the word, a nightmare.

It must be nice to have a small single group of mandated codes, 
libraries, and ABIs.  "Thou shalt not use anything but what we provide, 
and the heck with your requests".   Only works until you get that one 
user who will use lots of your cycles, but you know, they really need 
xyz-2.0.1, and you have xyz-1.2.3.  So how do you handle this? 
(rhetorical question).  Short answer is you need discipline in your 
flexibility.  You can be draconian and say "if it ain't an RPM then we 
aren't gonna do it", in which case they supply you an RPM which breaks 
other people's codes. ... Hello.  Is this thing on?  That doesn't work, 
wrong discipline IMO.  You can err the other way, "Sure we will install 
it wherever you like", and whammo, some poor supercomputing user who 
has been using default paths for everything just happens to be on the 
business end of your nice shiny new (and incompatible) xyz-2.0.1, as 
theirs is linked against 1.2.3.  Can't happen you say?  Happens all the 
time, with RPMs for that matter as well as with tarballs.
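
To make the default-path failure concrete, here is a minimal Python sketch 
of a first-match-wins library search.  It is a simplified model, not how the 
real dynamic linker resolves things, and the directory names, the "libxyz" 
name, and the resolve() helper are all hypothetical.

```python
def resolve(name, search_dirs, installed):
    """Return (directory, version) for the first directory providing `name`.

    `installed` maps a directory to {library name: version}.  This mimics a
    first-match-wins search order and nothing more.
    """
    for directory in search_dirs:
        libs = installed.get(directory, {})
        if name in libs:
            return directory, libs[name]
    return None


# Hypothetical default search order for a user who never sets any paths.
default_dirs = ["/usr/local/lib", "/usr/lib"]

# Before the new install: only xyz-1.2.3 exists, in /usr/lib.
installed = {"/usr/local/lib": {}, "/usr/lib": {"libxyz": "1.2.3"}}
before = resolve("libxyz", default_dirs, installed)

# The admin drops xyz-2.0.1 into a directory that is searched earlier.
installed["/usr/local/lib"]["libxyz"] = "2.0.1"
after = resolve("libxyz", default_dirs, installed)

print(before)  # the user's code was linked against this one
print(after)   # ... and this is what the default paths now hand them
```

The user changed nothing, yet what they get from the default paths changed 
underneath them.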

It's all about setting up a discipline for change, educating users, 
understanding that it is inevitable, and that you need to adapt to it if 
you have more than 1 (group of) user(s).  If you need to have 14 
different mpich, then you need to make sure your administrative and 
installation processes can handle it.   This turns out to be what 
modules is exceptionally good at helping with.  You can also change the 
default install paths (remember the thing you buzzed me for 
previously?), and select paths based upon a well-defined algorithm 
rather than a "database" lookup (modules).  Lots of folks use this happily.
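
As a rough illustration of the algorithmic approach, here is a minimal 
Python sketch that computes an install prefix from the attributes of a 
build, so nothing needs to be looked up.  The /opt/mpich root, the order of 
the path components, and the MpiBuild and install_prefix names are all 
assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpiBuild:
    """Attributes that tell one MPICH build from another (hypothetical set)."""
    version: str   # e.g. "1.2.7"
    device: str    # e.g. "ch_p4", "ch_gm", "ch_shmem"
    compiler: str  # e.g. "gcc-3.4", "xlc"
    bits: int      # 32 or 64


def install_prefix(build, root="/opt/mpich"):
    """Derive the install prefix purely from the build attributes."""
    if build.bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    parts = [root, build.version, build.device, build.compiler, f"{build.bits}bit"]
    return "/".join(parts)


# Two of the many possible combinations; each gets its own, predictable home.
gm64 = MpiBuild(version="1.2.7", device="ch_gm", compiler="gcc-3.4", bits=64)
p432 = MpiBuild(version="1.2.7", device="ch_p4", compiler="xlc", bits=32)

print(install_prefix(gm64))
print(install_prefix(p432))
```

Because the path is a pure function of the build attributes, two different 
builds cannot land on top of each other, and anyone who knows the rule can 
find (or script against) any of the 14 installs without asking.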

In short this is a real problem for large shared computing resource 
facilities with lots of users whose code requirements vary, and are often 
beyond the initial scope of deployment and system build.  If you 
don't have a good method of handling such cases, you can either deny 
they exist and insist upon LARTs (bad), or come up with a method to 
adapt to the need (can be bad or good, depending upon how hard you work 
at setting up a sensible system).  I know it doesn't jibe with "The One 
True Way"(tm).


Robert G. Brown wrote:
> Chris Samuel writes:
>> On Fri, 30 Sep 2005 08:15 pm, Leif Nixon wrote:
>>> So what about us poor souls who need to have several versions of the
>>> same package installed in parallel?
>> Our IBM Power5 Linux cluster has *14* different MPICHs, combinations 
>> of 32/64-bits, myrinet/p4 and various mixes of gcc (different 
>> versions) with xlc and xlf.
> You, my friend, need a sucker rod badly.  [Ref:  man syslogd, /Sucker].
> Or perhaps in Australia you have a different instrument?
>   rgb
>> Chris
>> -- 
>>  Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
>>  Victorian Partnership for Advanced Computing http://www.vpac.org/
>>  Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
> ------------------------------------------------------------------------
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
