[Beowulf] Building new cluster - estimate

Eric Thibodeau kyron at neuralbs.com
Thu Aug 7 07:48:53 PDT 2008


Joe Landman wrote:
> Eric Thibodeau wrote:
>
>>> Advantage of modules is you can upgrade them without upgrading the 
>>> kernel.  Go ahead, build in that e1000 driver.  I dare yah... :(
>> Ok...I didn't put enough emphasis on "main" stuff....as in, _all you 
>> need to get the system booted_, which essentially means HDD chipset 
>> drivers; the rest I do build as modules (NIC, video and such).
>>>
>>> More to the point it does give some good flexibility for end users 
>>> with a need to keep the core "separate" from the drivers for 
>>> maintenance.
>>>
>>> Initrd is subtle and quick to anger.  One must use burnt offerings 
>>> to placate the spirits of initrd.
>> LOL!
>
>  ... now I don't mean hardware burnt offerings ... smoke rising from 
> your motherboard may not placate the spirits of initrd, they 
> definitely may impede further operations ...
Oh...you mean something like this: 
http://wiki.neuralbs.com/~kyron/WrongSpecs/dsc00883.jpg
>
>>>
>>> Well, it would be a heck of a lot nicer if the tools were a little 
>>> more forgiving ... Oh you don't have this driver in your initrd ... 
>>> ok ... PANIC (mwahahahaha)
>> Pahahahahah... Case in point, I am building a CD-only cluster system 
>> (based on Gentoo) and I am currently _NOT_ using initrd because all 
>> that really needs to be built in is NFSroot support and all NICs I 
>> care to put in. Obviously this is a deprecated approach but it's 
>> proven to be the most effective and easiest to maintain in my case.
>
> We build an integrated NFSroot and e1000 and a few other things for a 
> customer.  Fixed hardware for their cluster.  From bare-metal-off to 
> operational infiniband compute node in ~45-60 seconds (I say 45, but a 
> few things took a little longer to start, like SGE).
Hey, weren't you the one complaining about e1000 "Go ahead, build in 
that e1000 driver.  I dare yah"? I haven't seen "moving hardware"...oh, 
wait, yes I have, our cluster is on wheels (dig a little and you'll see 
it)! How many nodes?
> [...snip...]
>>>   Most of the arguments I have heard are "oh but its compiled with 
>>> -O3" or whatever. Any decent HPC code person will tell you that that 
>>> is most definitely not a guaranteed way to a faster system ...
>> Hey...as I stated above, one would have to be quite silly to claim 
>> -O3 as the all well and all good optimization solution. At least you 
>> can rest assured your solutions will add up correctly with GCC. To get a 
> Well, sometimes.  You still need to be careful with it.
>
> This said, I am not sure icc/pgi/... are uniformly better than gcc.  I 
> did an admittedly tiny study of this http://scalability.org/?p=470 
> some time ago.  What I found was that gcc really held its own.  It did 
> a very good job on a very simple test case.
This is worth a new thread ;)
>
> Then again, the fortran version was simply faster than the C version, 
> but that can be explained ... by ... er ... ah ... something.
>
>> "faster" system, you really have to look at your app, use strace, 
>> ltrace and gprof, then you can play with that. What I _am_ saying 
>> though is that Gentoo _does_ empower the administrator by giving him 
>> the ability to customize the OS if a bottleneck is to be identified.
>
> Yup.  There is nothing like a profile of an app running the code, to 
> see where it is spending its time to decide between code shifts and 
> algorithmic shifts.
>
>>>
>>>> (gcc-4.3.1, or ICC ;) ) _integrated_ into your system (not a hackish 
>>>
>>> Er... We often use several different compilers in several different 
>>> trees.  Several gccs, pgi, icc, eieio ... you name it.  All are 
>>> integrated.
>> Are you currently able to run GCC-4.3.x versions on your current setup, 
>
> Currently running 4.2.3-2ubuntu7 on my laptop.  Another machine 
> (development box) has something like 4 different gccs there.  I 
> haven't tried 4.3.x yet ... had planned to, but work gets in the way.
Tell me when you get it going, it's for 4.3.x that I had to upgrade 
glibc. As a ref: http://bugs.gentoo.org/show_bug.cgi?id=218603
>
>> I'm actually eager to know. I'm still living under the ASSumption of 
>> binary distributions not coping too well with multi-library 
>> environments. Case in point, one of my colleagues _really_ wanted 
>
> No, our systems (Ubuntu, SuSE, Centos) seem to have no real problems 
> apart from the occasional broken hard wired /usr/lib with the wrong 
> ABI in a configure/make file.  Usually easy to fix.
Ok, those are the general problems I would hit and I had switched to 
Gentoo before starting to use SRPMs.
>
>> firefox 3 on his ubuntu system. The installer trickled down to having 
>> to uninstall glibc...and he forced it to YES (and this is just a 
>> browser, not something that is used to _make_ code and would be tied 
>> to glibc)
>
> Hmmm... I have firefox 3 on this system (64 bit) and I run icecat for 
> 32 bit access (java and other things).  No glibc changes (apart from 
> security patches).  He must have done something horribly wrong.  We 
> have multiple mixed ABI ubuntu/centos/suse systems, and haven't had 
> issues.
Curious...maybe he has an _old_ Ubuntu install...something like 6.0 series.
>
>>>> afterthought of an RPM that pulls in a new glibc that breaks the 
>>>> install 
>>>
>>> Er ... not the slightest clue as to what you are talking about.  I 
>>> haven't seen gcc, icc, pgi, ... touch our glibc.
>>>
>>> Maybe I am missing the fun.  Which ICC version is this?  Which gcc 
>>> is this, which glibc is this?
>>>
>> Sorry about that, I might have been misleading: GCC is generally the 
>> one most sensitive to glibc, not the other ones, although the latest 
>> ICC (10.1.x series) does claim compatibility with the GNU environment 
>> so it might get a little more dependency there.
>
> We have installed the 10.1.015 on customer machines from Centos 5.2 
> through SuSE 10.x through Ubuntu with nary a problem.  Very different 
> glibc's.  No issues with code generation.
I am sorry, I mixed up glibc with GCC whilst talking about ICC's 
compatibility. This one is specific to having gcc and icc on the same 
system and the (re)definition of atomic functions, which ICC couldn't follow:

http://bugs.gentoo.org/show_bug.cgi?id=201596

Never hit that?
>
> Binary distributions aren't evil.  They do work, quite well in most 
> cases.
I switched to Gentoo in 2004 and never looked back, though I probably 
should, because 4 years is a long time in the distribution world. I did 
switch my laptop users to Kubuntu but I still find the distribution annoys me.
>
>>
>> Cheers!
>>
>> Eric



