From Hakon.Bugge at scali.com  Fri Feb  1 04:52:15 2008
From: Hakon.Bugge at scali.com (=?iso-8859-1?Q?H=E5kon?= Bugge)
Date: Fri, 01 Feb 2008 13:52:15 +0100
Subject: [Beowulf] Cheap SDR IB
In-Reply-To: <Pine.LNX.4.64.0801310854300.18856@coffee.psychology.mcmaster.ca>
References: <200801302001.m0UK0UCS015867@bluewest.scyld.com>
	<20080131095052.EB94635B03D@mail.scali.no>
	<Pine.LNX.4.64.0801310854300.18856@coffee.psychology.mcmaster.ca>
Message-ID: <20080201125216.100F235B13F@mail.scali.no>

Mark,

At 15:09 31.01.2008, Mark Hahn wrote:
>I did not claim the opposite - I said that for small, cost-sensitive
>clusters, it would be unusual to need IB's advantages (high bandwidth
>and latency comparable to other non-Gb interconnects.)
>
>in particular, I'm curious about the conventional wisdom about weather codes
>and bandwidth.

k

>I was curious about this: you only used one DDR port; was that because
>of lack of switch ports, or because WRF uses bandwidth <= DDR?

The system is a general-purpose benchmarking system, not particularly 
crafted for running WRF. Based on a slightly apples-to-oranges 
comparison, you will see that QLogic's SPEC MPI2007 submission 
contains a WRF number (374s) which is _very_ similar to what I 
reported. This is an indication that WRF on this system / dataset is 
not restricted by SDR bandwidth (also, for the record, this is a 
slight mix of compilers, Pathscale 3.0 and Intel 9.1, but 
they both do a decent job on WRF).

>sure, and these are very fat nodes for which a fat interconnect is
>appropriate for almost any workload that's not embarrassing.  but really
>I wasn't suggesting that plain old Gb (bandwidth in particular) was
>adequate for all possible clusters.  I was questioning whether IB 
>was a panacea for small, cost-sensitive ones...

I do not agree that dual-socket, dual-core Woodcrest nodes these 
days are "very fat". A quad-socket, quad-core node is. A quad-socket, 
dual-core or a dual-socket, quad-core node might be considered semi-fat...

Hakon


From vanallsburg at hope.edu  Fri Feb  1 09:06:02 2008
From: vanallsburg at hope.edu (Paul Van Allsburg)
Date: Fri, 01 Feb 2008 12:06:02 -0500
Subject: [Beowulf] weather modeling cluster
Message-ID: <47A3517A.5050100@hope.edu>

All,
I'm interested in setting up an open source weather modeling cluster in 
an educational environment. My existing clusters run chemistry, math and 
bio applications and I don't know what weather app would be a good 
choice for a first time effort.  Thanks for any input that may help me 
get my feet wet...

Paul


-- 
Paul Van Allsburg       
Computational Science & Modeling Facilitator
Natural Sciences Division,  Hope College
35 East 12th Street
Holland, Michigan 49423
616-395-7292 
http://www.hope.edu/academic/csm/


From john.leidel at gmail.com  Fri Feb  1 09:29:24 2008
From: john.leidel at gmail.com (John Leidel)
Date: Fri, 01 Feb 2008 11:29:24 -0600
Subject: [Beowulf] weather modeling cluster
In-Reply-To: <47A3517A.5050100@hope.edu>
References: <47A3517A.5050100@hope.edu>
Message-ID: <1201886964.17107.9.camel@e521.site>

Check out WRF: Weather Research and Forecasting Model

http://www.wrf-model.org/index.php


On Fri, 2008-02-01 at 12:06 -0500, Paul Van Allsburg wrote:
> All,
> I'm interested in setting up an open source weather modeling cluster in 
> an educational environment. My existing clusters run chemistry, math and 
> bio applications and I don't know what weather app would be a good 
> choice for a first time effort.  Thanks for any input that may help me 
> get my feet wet...
> 
> Paul
> 
> 


From gerry.creager at tamu.edu  Fri Feb  1 10:04:25 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Fri, 01 Feb 2008 12:04:25 -0600
Subject: [Beowulf] weather modeling cluster
In-Reply-To: <1201886964.17107.9.camel@e521.site>
References: <47A3517A.5050100@hope.edu> <1201886964.17107.9.camel@e521.site>
Message-ID: <47A35F29.7080609@tamu.edu>

I'll second WRF for a good starting point for weather codes.  Feel free 
to drop me a line if you need some suggestions with it.  Also: Plan to 
send someone to the summer WRF tutorial in Boulder, where they'll get 
good info to bring things up right.

gerry

John Leidel wrote:
> Check out WRF: Weather Research and Forecasting Model
> 
> http://www.wrf-model.org/index.php
> 
> 
> 
> On Fri, 2008-02-01 at 12:06 -0500, Paul Van Allsburg wrote:
>> All,
>> I'm interested in setting up an open source weather modeling cluster in 
>> an educational environment. My existing clusters run chemistry, math and 
>> bio applications and I don't know what weather app would be a good 
>> choice for a first time effort.  Thanks for any input that may help me 
>> get my feet wet...
>>
>> Paul
>>
>>
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


From ionsourcerer at mac.com  Fri Feb  1 08:45:35 2008
From: ionsourcerer at mac.com (Rick Becker)
Date: Fri, 1 Feb 2008 11:45:35 -0500
Subject: [Beowulf] unsubscribe
Message-ID: <4FE37EFA-17EF-4B3E-BACB-96436E358A62@mac.com>


Rick Becker
Cluster Sciences
Borolene Metamaterials
39 Topsfield Rd.
Ipswich, MA  01938 US
978-337-9009
ionsourcerer at mac.com

If you do not know where you are going, call it "exploration".
If you do not know what you are doing, call it "research".



From landman at scalableinformatics.com  Fri Feb  1 11:33:12 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 01 Feb 2008 14:33:12 -0500
Subject: [Beowulf] Cheap SDR IB
In-Reply-To: <47A228F2.1070309@physics.isu.edu>
References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com>
	<47A228F2.1070309@physics.isu.edu>
Message-ID: <47A373F8.9050303@scalableinformatics.com>

Brian Oborn wrote:

> A quick side question. Is it possible to use IB as a cross-over with no 
> switch? If I had just 2 fat nodes could I connect the HCAs directly to 

Yes.

> each other and avoid the switch costs? Could this be extended to ring or 
> hypercube topologies?

Yeah ... but a switch rapidly makes sense.  One link going down would 
"quench" your ring ala FDDI.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From Daniel.Pfenniger at obs.unige.ch  Fri Feb  1 11:36:47 2008
From: Daniel.Pfenniger at obs.unige.ch (Daniel Pfenniger)
Date: Fri, 01 Feb 2008 20:36:47 +0100
Subject: [Beowulf] Cheap SDR IB
In-Reply-To: <47A228F2.1070309@physics.isu.edu>
References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com>
	<47A228F2.1070309@physics.isu.edu>
Message-ID: <47A374CF.9030901@obs.unige.ch>


Brian Oborn wrote:
> 
....
> A quick side question. Is it possible to use IB as a cross-over with no 
> switch? 

Yes, and the cables are the same.

> If I had just 2 fat nodes could I connect the HCAs directly to 
> each other and avoid the switch costs? 

Yes.  With 3 nodes it might be cheaper having 2 HCA per node, 6 cables,
than the switch solution.


	Dan


From hahn at mcmaster.ca  Fri Feb  1 12:10:38 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 1 Feb 2008 15:10:38 -0500 (EST)
Subject: [Beowulf] Cheap SDR IB
In-Reply-To: <47A374CF.9030901@obs.unige.ch>
References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com>
	<47A228F2.1070309@physics.isu.edu> <47A374CF.9030901@obs.unige.ch>
Message-ID: <Pine.LNX.4.64.0802011457030.32049@coffee.psychology.mcmaster.ca>

>> If I had just 2 fat nodes could I connect the HCAs directly to each other 
>> and avoid the switch costs? 
>
> Yes.  With 3 nodes it might be cheaper having 2 HCA per node, 6 cables,
> than the switch solution.

with 3 nodes, each with two ports,
wouldn't you need just 3 cables?

how is routing controlled in switchless configs? 
does IB have node-level forwarding?


From daniel.pfenniger at obs.unige.ch  Fri Feb  1 14:56:35 2008
From: daniel.pfenniger at obs.unige.ch (Pfenniger Daniel)
Date: Fri, 01 Feb 2008 23:56:35 +0100
Subject: [Beowulf] Cheap SDR IB
In-Reply-To: <Pine.LNX.4.64.0802011457030.32049@coffee.psychology.mcmaster.ca>
References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com>
	<47A228F2.1070309@physics.isu.edu> <47A374CF.9030901@obs.unige.ch>
	<Pine.LNX.4.64.0802011457030.32049@coffee.psychology.mcmaster.ca>
Message-ID: <47A3A3A3.8070203@obs.unige.ch>

Mark Hahn wrote:
>>> If I had just 2 fat nodes could I connect the HCAs directly to each 
>>> other and avoid the switch costs? 
>>
>> Yes.  With 3 nodes it might be cheaper having 2 HCA per node, 6 cables,
>> than the switch solution.
> 
> with 3 nodes, each with two ports,
> wouldn't you need just 3 cables?
> 
Yes, I forgot to divide by 2!
Some HCAs have 2 ports, so they would be well suited to a 3-node 
switchless cluster.
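
The cable arithmetic settled on here (one cable per pair of nodes, one port per peer) can be checked with a small sketch. The function names are illustrative only and are not taken from any IB tool.

```python
from itertools import combinations


def direct_links(nodes):
    """Return every unordered pair of nodes; each pair needs one cable."""
    return list(combinations(nodes, 2))


def ports_per_node(node_count):
    """In a switchless full mesh each node needs one port per peer."""
    return node_count - 1


links = direct_links(["a", "b", "c"])
print(len(links), "cables,", ports_per_node(3), "ports per node")
```

The cable count grows as n(n-1)/2, so a fourth node already needs 6 cables and 3 ports per node, which is where a switch starts to look attractive.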

Dan


From kilian at stanford.edu  Fri Feb  1 17:44:54 2008
From: kilian at stanford.edu (Kilian CAVALOTTI)
Date: Fri, 1 Feb 2008 17:44:54 -0800
Subject: [Beowulf] Cheap SDR IB
In-Reply-To: <Pine.LNX.4.64.0802011457030.32049@coffee.psychology.mcmaster.ca>
References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com>
	<47A374CF.9030901@obs.unige.ch>
	<Pine.LNX.4.64.0802011457030.32049@coffee.psychology.mcmaster.ca>
Message-ID: <200802011744.55007.kilian@stanford.edu>

Hi Mark,

On Friday 01 February 2008 12:10:38 pm Mark Hahn wrote:
> how is routing controlled in switchless configs?

It's not. :)

> does IB have node-level forwarding?

No, you can't forward traffic between non-directly connected nodes in 
such a ring topology (without any switch). You would need intra-node 
routing mechanisms which are not present in OFED. I don't know about 
other implementations, though.

Besides, for each cross-over pair, you'll be creating a separate subnet, 
and each subnet requires its own subnet manager. 

However, in a 3-node ring, each node can directly connect to all the 
other ones, and strictly speaking, you only have 2 subnets. So I guess 
node-level forwarding is not an issue, and that's probably a viable 
solution.
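
A rough way to see why the 3-node ring is the special case: without forwarding, a node can only talk to its two ring neighbours. The sketch below is a plain model of that statement, not anything OFED provides, and the function names are made up for illustration.

```python
def ring_neighbours(node, node_count):
    """Nodes directly cabled to `node` in a ring of `node_count` nodes."""
    if node_count < 2:
        return set()
    return {(node - 1) % node_count, (node + 1) % node_count} - {node}


def unreachable(node, node_count):
    """Peers that `node` cannot reach when no node forwards traffic."""
    everyone_else = set(range(node_count)) - {node}
    return everyone_else - ring_neighbours(node, node_count)


for size in (3, 4, 5):
    print(size, "nodes: node 0 cannot reach", sorted(unreachable(0, size)))
```

With 3 nodes nothing is unreachable; from 4 nodes on, at least one peer is cut off unless something forwards.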

Cheers,
-- 
Kilian


From 3lucid at gmail.com  Sun Feb  3 10:35:02 2008
From: 3lucid at gmail.com (Kyle Spaans)
Date: Sun, 3 Feb 2008 13:35:02 -0500
Subject: [Beowulf] TIPC in a Beowulf?
Message-ID: <5a1205b30802031035wf416840s115b6a5278c20812@mail.gmail.com>

Has anyone heard of or seen TIPC <tipc.sourceforge.net> used in a
Beowulf Cluster?
Some folks from Wind River (creators of the protocol I think) came and
gave a talk about it at my school. They said it can be used over IP,
or even on its own through ethernet, and would even work with myrinet
or infiniband with proper drivers.

I'm still not very familiar with programming a Beowulf, but Inter
Process Communication is an equally viable paradigm just like Message
Passing, right?


From hahn at mcmaster.ca  Sun Feb  3 14:42:56 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sun, 3 Feb 2008 17:42:56 -0500 (EST)
Subject: [Beowulf] TIPC in a Beowulf?
In-Reply-To: <5a1205b30802031035wf416840s115b6a5278c20812@mail.gmail.com>
References: <5a1205b30802031035wf416840s115b6a5278c20812@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>

> Has anyone heard of or seen TIPC <tipc.sourceforge.net> used in a
> Beowulf Cluster?

I haven't.  I sat in on tipc meetings at OLS a few times, and have the 
impression that TIPC people are much more into telecom/footprint issues
rather than HPC.  (and yes, I believe these are very different focuses - 
for HPC, the main issue is latency (since bandwidth is not that hard.))

I _think_ I'm not confusing TIPC with SCTP (which also seems to be rather
telecom-oriented.)

here are some kind of shocking performance measures:
http://www.strlen.de/tipc/

no mention of latency there.

> Some folks from Wind River (creators of the protocol I think) came and
> gave a talk about it at my school. They said it can be used over IP,
> or even on its own through ethernet, and would even work with myrinet
> or infiniband with proper drivers.

well, TIPC is trying to do a lot that TCP isn't.  for instance, I think 
it's trying to do fairly full group membership as well as topology-aware
routing.  I'm not sure these are as critical to HPC-type clustering 
as they would be for HA-type clustering.

I'm also a bit skeptical of a protocol that aims to put everything into
one kernel-resident layer...

> I'm still not very familiar with programming a Beowulf, but Inter
> Process Communication is an equally viable paradigm just like Message
> Passing, right?

TIPC is a form of MP.  don't confuse MP with MPI!  MPI is important and 
widespread, but I don't think many people would say that it's perfect.
MPI over TCP in particular is kind of a shame, since TCP is really a 
protocol designed for flaky, overloaded, heterogeneous WANs, 
not the kind of dedicated, homogeneous, flat network you find in an HPC cluster.
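
To make the MP-versus-MPI distinction concrete, here is a minimal sketch of message passing built directly on a stream socket. It is neither TIPC nor MPI; the length-prefix framing and the function names are assumptions chosen for illustration. The point it shows is that a byte stream such as TCP has no message boundaries, so any message layer on top has to add them.

```python
import socket
import struct


def send_message(sock, payload):
    """Send one message as a 4-byte big-endian length followed by the body."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one framed message and return its body."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# A connected pair of stream sockets stands in for two cluster nodes.
left, right = socket.socketpair()
send_message(left, b"halo row 17")
send_message(left, b"halo row 18")
print(recv_message(right), recv_message(right))
left.close()
right.close()
```

MPI adds a great deal on top of this (ranks, tags, collectives, datatypes), which is why the two terms should not be used interchangeably.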

I'm looking forward to OpenMX - it's a message-passing layer amenable to 
ethernet, but well-suited for MPI.  any OpenMX people care to comment?

regards, mark hahn.


From steve_heaton at exemail.com.au  Fri Feb  1 13:56:48 2008
From: steve_heaton at exemail.com.au (Particle Boy)
Date: Sat, 02 Feb 2008 08:56:48 +1100
Subject: [Beowulf] Re: weather modeling cluster
In-Reply-To: <200802011935.m11JYqvd026938@bluewest.scyld.com>
References: <200802011935.m11JYqvd026938@bluewest.scyld.com>
Message-ID: <47A395A0.3080809@exemail.com.au>


I'll also recommend the WRF. The WRF EMS kit from STRC at UCAR is a Good 
Thing:
http://strc.comet.ucar.edu/wrf/index.htm

I got it going quickly and relatively easily without any prior 
experience of large models. Bob R has an entertaining  sense of humour 
(you'll see from the scripts) and was also kind enough to quickly send 
me a starter kit all the way down here to Oz. Great service :)

If your students are looking to tinker with code, I had a lot of fun 
with PUMA:
http://puma.dkrz.de/puma

Cheers
Stevo

> Date: Fri, 01 Feb 2008 12:06:02 -0500
> From: Paul Van Allsburg <vanallsburg at hope.edu>
> Subject: [Beowulf] weather modeling cluster
> To: Beowulf Mailing list <beowulf at beowulf.org>
> Message-ID: <47A3517A.5050100 at hope.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> All,
> I'm interested in setting up an open source weather modeling cluster in 
> an educational environment. My existing clusters run chemistry, math and 
> bio applications and I don't know what weather app would be a good 
> choice for a first time effort.  Thanks for any input that may help me 
> get my feet wet...
> 
> Paul
> 
> 


From wrankin at ee.duke.edu  Mon Feb  4 10:35:00 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Mon, 4 Feb 2008 13:35:00 -0500
Subject: [Beowulf] TIPC in a Beowulf?
In-Reply-To: <Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>
References: <5a1205b30802031035wf416840s115b6a5278c20812@mail.gmail.com>
	<Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>
Message-ID: <E2B70835-15BD-406F-9B26-8D658EDD6795@ee.duke.edu>

Hey Mark,

> I'm looking forward to OpenMX - it's a message-passing layer  
> amenable to ethernet, but well-suited for MPI.  any OpenMX people  
> care to comment?

Do you have any links to the current status of this effort?  All my  
Googling leads to links on a package (also called OpenMX) for nano- 
material simulations.

Thanks for the info.

-bill


From hahn at mcmaster.ca  Mon Feb  4 11:09:59 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 4 Feb 2008 14:09:59 -0500 (EST)
Subject: [Beowulf] TIPC in a Beowulf?
In-Reply-To: <E2B70835-15BD-406F-9B26-8D658EDD6795@ee.duke.edu>
References: <5a1205b30802031035wf416840s115b6a5278c20812@mail.gmail.com>
	<Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>
	<E2B70835-15BD-406F-9B26-8D658EDD6795@ee.duke.edu>
Message-ID: <Pine.LNX.4.64.0802041346020.27267@coffee.psychology.mcmaster.ca>

>> I'm looking forward to OpenMX - it's a message-passing layer amenable to 
>> ethernet, but well-suited for MPI.  any OpenMX people care to comment?
>
> Do you have any links to the current status of this effort?  All my Googling 
> leads to links on a package (also called OpenMX) for nano-material 
> simulations.

unfortunately no.  all I've had is cruel teasing messages from 
myricom-related people.  "code-tease" ;)

to me, it seems like this would be a fairly high priority for 
myricom, since it emphasizes the value of ethernet interop,
whether 1Gb or 10Gb.


From Brice.Goglin at inria.fr  Mon Feb  4 11:17:23 2008
From: Brice.Goglin at inria.fr (Brice Goglin)
Date: Mon, 04 Feb 2008 20:17:23 +0100
Subject: [Beowulf] TIPC in a Beowulf?
In-Reply-To: <Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>
References: <5a1205b30802031035wf416840s115b6a5278c20812@mail.gmail.com>
	<Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>
Message-ID: <47A764C3.30301@inria.fr>

Mark Hahn wrote:
> I'm looking forward to OpenMX - it's a message-passing layer amenable
> to ethernet, but well-suited for MPI.  any OpenMX people care to comment?

Hi,

http://open-mx.org gives a pretty good summary of the current status of
Open-MX. The stack is plugged on top of the Ethernet layer in the Linux
kernel to send/receive MX messages. The MX firmware is basically
emulated in a kernel module without requiring any specific feature in
the hardware.

Release 0.3 is young but I am confident that it's not too bad. The
performance still needs improvement but the stack is already reasonably
stable. At least MPICH-MX and Open MPI build on top of it and complete
IMB. I encourage people to test it and send some feedback.

If you need more information, feel free to ask.

Brice


From werstiuk at platform.com  Mon Feb  4 11:24:25 2008
From: werstiuk at platform.com (Nick Werstiuk)
Date: Mon, 4 Feb 2008 14:24:25 -0500
Subject: [Beowulf] TIPC in a Beowulf?
Message-ID: <531893A968B34D40B36C7A6445BC828A0127177D@catoexm06.noam.corp.platform.com>

I came across this site that has some information on the project
including access to the current version of the code, and a paper
describing the approach.

http://open-mx.gforge.inria.fr/

Regards,
Nick


-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
On Behalf Of Mark Hahn
Sent: Monday, February 04, 2008 2:10 PM
To: Bill Rankin
Cc: Beowulf List
Subject: Re: [Beowulf] TIPC in a Beowulf?

>> I'm looking forward to OpenMX - it's a message-passing layer amenable
>> to ethernet, but well-suited for MPI.  any OpenMX people care to
>> comment?
>
> Do you have any links to the current status of this effort?  All my 
> Googling leads to links on a package (also called OpenMX) for 
> nano-material simulations.

unfortunately no.  all I've had is cruel teasing messages from
myricom-related people.  "code-tease" ;)

to me, it seems like this would be a fairly high priority for myricom,
since it emphasizes the value of ethernet interop, whether 1Gb or 10Gb.


From ebiederm at xmission.com  Tue Feb  5 05:59:15 2008
From: ebiederm at xmission.com (Eric W. Biederman)
Date: Tue, 05 Feb 2008 06:59:15 -0700
Subject: [Beowulf] Re: Cheap SDR IB
In-Reply-To: <E1JKK7N-0001Vz-Kl@mendel.bio.caltech.edu> (David Mathog's
	message of "Wed, 30 Jan 2008 13:05:13 -0800")
References: <E1JKK7N-0001Vz-Kl@mendel.bio.caltech.edu>
Message-ID: <m18x1zec7g.fsf@ebiederm.dsl.xmission.com>

"David Mathog" <mathog at caltech.edu> writes:

> Joe Landman <landman at scalableinformatics.com> wrote:
>> Gilad Shainer wrote:
>> 
>> >> IB for gaming?  I have one ratio: 1e-1/3e-6.  that's human 
>> >> reaction time versus IB latency.
>> >>
>> > 
>> > Oh yes... I guess you did not play for a long time. Did you? Talk
>> > with someone who suffers from lag and you will get the story, even
>> > when he has a great video card. It's the network and the CPU overhead
>> > that are the cause of this issue 
>> 
>> Er... ah ... yeah.  Milliseconds is typical in FPS games.  hundreds of 
>> ms are bad.  Hundreds of microseconds aren't ... ok, depends upon your 
>> FPS, I am sure the military folks have *really* fun ones which require 
>> that sort of latency.
>
> Many FPS games are still keyboard driven, and the scan rate on the
> keyboard is likely only on the order of 10Hz.  Gaming mice scan position
> a lot faster though, last I looked they were closing in on 10000 data
> points per second. Even so, human reaction time is now, and probably
> will be forever, at the .1 second level, so even if that gaming mouse
> could record 1000 button presses a second, no gamer is ever going to be
> able to push that button at anywhere near that rate.
>
> IB would be massive overkill for gaming, 100 (or even 10) baseT should
> work just fine unless the network is hideously congested, in which case
> the game is probably going to become unplayable due to dropped UDP packets.
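
For scale, the figures quoted above can be worked through directly. This is only arithmetic on the numbers in the thread (0.1 s reaction time, 3 microseconds IB latency, a 10 Hz keyboard scan); the function and variable names are illustrative.

```python
def latency_ratio(slow_seconds, fast_seconds):
    """How many times longer the slow interval is than the fast one."""
    return slow_seconds / fast_seconds


human_reaction_s = 1e-1   # roughly 0.1 s, as quoted in the thread
ib_latency_s = 3e-6       # roughly 3 microseconds, as quoted in the thread
keyboard_scan_hz = 10     # the scan rate suggested above

ratio = latency_ratio(human_reaction_s, ib_latency_s)
scan_period_s = 1 / keyboard_scan_hz

print(f"reaction time is about {ratio:,.0f} times the IB latency")
print(f"a {keyboard_scan_hz} Hz keyboard scan adds up to {scan_period_s} s on its own")
```

The ratio comes out at a little over thirty thousand, which is the gap the one-line "1e-1/3e-6" remark was pointing at.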

Spin it the other way.  Scale your online gaming server cluster using IB,
and you probably have something.

Eric


From rgb at phy.duke.edu  Tue Feb  5 06:53:10 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue, 5 Feb 2008 09:53:10 -0500 (EST)
Subject: [Beowulf] Re: Cheap SDR IB
In-Reply-To: <m18x1zec7g.fsf@ebiederm.dsl.xmission.com>
References: <E1JKK7N-0001Vz-Kl@mendel.bio.caltech.edu>
	<m18x1zec7g.fsf@ebiederm.dsl.xmission.com>
Message-ID: <Pine.LNX.4.64.0802050942180.4055@cain.rgb.private.net>

On Tue, 5 Feb 2008, Eric W. Biederman wrote:

> Spin it the other way.  Scale your online gaming server cluster using IB,
> and you probably have something.

And they may well do this.  There are a lot of problems in provisioning
online MMRPGs with "Universes" that are shared with HPC clusters and
with HA clusters.  Most of the sane ones spin off the actual rendering
onto the clients, but they are still responsible for managing a huge
inventory of objects as well as all the NPCs, in realtime interaction
with PCs, in a large distributed "space".  In some cases e.g. WoW the
space has some fairly obvious boundaries -- different continents are
plausibly on different servers in a realm cluster, ditto instances,
where there are clear "cuts" when your character is "moved" from one
server to another.  They may even partition continents, but to do that
(and manage a smooth passage across "country" boundaries) they need
bottlenecks to limit traffic and a region of real-time overlap where
characters are maintained (as it were) on both servers.  Here IB or gigE
would be very useful.  It also might let them increase the fineness or
granularity of boundaries, increase the server capacity for handling
large numbers of simultaneous gamers by adding more physical servers
to handle the large numbers of players that can occur in any given
continent or country, and so on.
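
The "region of real-time overlap" idea can be sketched in one dimension. This is a guess at the general shape of such a scheme, not a description of how any actual game does it; the strip layout, the parameter names and the overlap width are all assumptions.

```python
def owners(x, region_width, overlap, region_count):
    """Return the indices of the servers that should track position x.

    The world is a strip of `region_count` equal regions, one per server.
    A character within `overlap` of a boundary is also tracked by the
    neighbouring server, so the hand-off across the boundary is smooth.
    Assumes 0 <= x < region_width * region_count.
    """
    home = min(int(x // region_width), region_count - 1)
    tracked = [home]
    offset = x - home * region_width
    if offset < overlap and home > 0:
        tracked.append(home - 1)
    if region_width - offset <= overlap and home < region_count - 1:
        tracked.append(home + 1)
    return sorted(tracked)


for position in (50, 95, 103, 250):
    print(position, "->", owners(position, region_width=100, overlap=10, region_count=3))
```

Traffic between servers is then limited to characters in the overlap bands, which is where a fast interconnect would matter.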

Actually, MMRPGs are fun both to play and to think about as a cluster
problem.  But the big companies tend to be a bit chary of revealing
their technology, although I have read a few articles on the subject.
It is likely that many of the details of their implementations remain
hidden.

    rgb

>
> Eric
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From eugen at leitl.org  Tue Feb  5 08:13:28 2008
From: eugen at leitl.org (Eugen Leitl)
Date: Tue, 5 Feb 2008 17:13:28 +0100
Subject: [Beowulf] Re: Cheap SDR IB
In-Reply-To: <Pine.LNX.4.64.0802050942180.4055@cain.rgb.private.net>
References: <E1JKK7N-0001Vz-Kl@mendel.bio.caltech.edu>
	<m18x1zec7g.fsf@ebiederm.dsl.xmission.com>
	<Pine.LNX.4.64.0802050942180.4055@cain.rgb.private.net>
Message-ID: <20080205161328.GJ10128@leitl.org>

On Tue, Feb 05, 2008 at 09:53:10AM -0500, Robert G. Brown wrote:

> And they may well do this.  There are a lot of problems in provisioning
> online MMRPGs with "Universes" that are shared with HPC clusters and
> with HA clusters.  Most of the sane ones spin off the actual rendering
> onto the clients, but they are still responsible for managing a huge
> inventory of objects as well as all the NPCs, in realtime interaction
> with PCs, in a large distributed "space".  In some cases e.g. WoW the

Second Life does the physics server-side. With the given technology,
a region (one virtual server) will become sluggish (and soon thereafter
crash) once some 60-70 avatars frolic in the area.

There's definitely potential for better interconnects and game
clusters (deja vu, we must have discussed this some 5-8 years ago).

> space has some fairly obvious boundaries -- different continents are
> plausibly on different servers in a realm cluster, ditto instances,

SL islands are rectangular boxes (the client used to crash spectacularly
when altitude exceeded a signed short int). The world tessellates trivially
on a 2d or 3d grid/torus.
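
As an aside on the signed short limit: a 16-bit signed integer tops out at 32767, and one step beyond wraps to a large negative value. The snippet only demonstrates that wrap-around; how the SL client actually stored altitude, and in what units, is not stated here.

```python
import ctypes


def as_signed_short(value):
    """Reinterpret an integer as a 16-bit signed value (no overflow check)."""
    return ctypes.c_int16(value).value


for altitude in (32766, 32767, 32768, 40000):
    print(altitude, "->", as_signed_short(altitude))
```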

> where there are clear "cuts" when your character is "moved" from one
> server to another.  They may even partition continents, but to do that
> (and manage a smooth passage across "country" boundaries) they need
> bottlenecks to limit traffic and a region of real-time overlap where
> characters are maintained (as it were) on both servers.  Here IB or gigE
> would be very useful.  It also might let them increase the fineness or
> granularity of boundaries, increase the server capacity for handling
> large numbers of simultaneous gamers by adding more physical servers
> to handle the large numbers of players that can occur in any given
> continent or country, and so on.

To start with, writing distributed game servers with MPI would be a nice
touch. I'm not aware of any effort which does it. 
 
> Actually, MMRPGs are fun both to play and to think about as a cluster
> problem.  But the big companies tend to be a bit chary of revealing
> their technology, although I have read a few articles on the subject.
> It is likely that many of the details of their implementations remain
> hidden.

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


From rgb at phy.duke.edu  Tue Feb  5 08:40:24 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue, 5 Feb 2008 11:40:24 -0500 (EST)
Subject: [Beowulf] Re: Cheap SDR IB
In-Reply-To: <20080205161328.GJ10128@leitl.org>
References: <E1JKK7N-0001Vz-Kl@mendel.bio.caltech.edu>
	<m18x1zec7g.fsf@ebiederm.dsl.xmission.com>
	<Pine.LNX.4.64.0802050942180.4055@cain.rgb.private.net>
	<20080205161328.GJ10128@leitl.org>
Message-ID: <Pine.LNX.4.64.0802051132570.4055@cain.rgb.private.net>

On Tue, 5 Feb 2008, Eugen Leitl wrote:

> On Tue, Feb 05, 2008 at 09:53:10AM -0500, Robert G. Brown wrote:
>
>> And they may well do this.  There are a lot of problems in provisioning
>> online MMRPGs with "Universes" that are shared with HPC clusters and
>> with HA clusters.  Most of the sane ones spin off the actual rendering
>> onto the clients, but they are still responsible for managing a huge
>> inventory of objects as well as all the NPCs, in realtime interaction
>> with PCs, in a large distributed "space".  In some cases e.g. WoW the
>
> Second Life does the physics server-side. With the given technology,
> a region (one virtual server) will become sluggish (and soon thereafter
> crash) once some 60-70 avatars frolic in the area.
>
> There's definitely potential for better interconnects and game
> clusters (deja vu, we must have discussed this some 5-8 years ago).

Yeah, and my experiences with 2ndL are highly negatory as a consequence.
It is a bad cluster design.  It does not scale.

>> space has some fairly obvious boundaries -- different continents are
>> plausibly on different servers in a realm cluster, ditto instances,
>
> SL islands are rectangular boxes (the client used to crash spectacularly
> when altitude exceeded a signed short int). The world tessellates trivially
> on a 2d or 3d grid/torus.

SL needs to adopt some of the technologies of other MMRPGs -- the ones
that work.  It makes the result more complex on the client side -- one
has to update WoW every six months or so with new textures, maps, and
display side bugfixes -- but it scales much, much better on the server
side and is much less bottlenecked at the client side network (which may
be "only" DSL).  SL is a resource hog all around.

I note that people are most impressed with it if they've never hung out
in one of the well-designed, scalable worlds of WoW.  Any player of WoW
would laugh and cry to see how primitive, slow, clumsy it is.  It has a
good idea -- the ability of users to create objects and add them to the
environment -- but it needs a much better algorithm for managing the
construction process and an object-oriented look-ahead synchronization
process that reduces the bottlenecks to something endurable.

   rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From wrankin at ee.duke.edu  Tue Feb  5 09:21:53 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Tue, 5 Feb 2008 12:21:53 -0500
Subject: [Beowulf] Re: Cheap SDR IB
References: <D1E92FEF-581E-4563-AA93-0921D536E418@ee.duke.edu>
Message-ID: <4295A7EF-1D80-437D-AF11-4AF57958437D@ee.duke.edu>


>>
>> There's definitely potential for better interconnects and game
>> clusters (deja vu, we must have discussed this some 5-8 years ago).
>
> Yeah, and my experiences with 2ndL are highly negatory as a  
> consequence.
> It is a bad cluster design.  It does not scale.

I have not puttered around in SL for a while, but IIRC one of the  
"problems" is that SL allows the user to create their own fairly  
complex physical models and devices, which is computationally  
expensive when modeled on the server side and also bandwidth-limited  
when pushing the models out to the client.

WoW, OTOH, heavily restricts user customization, which saves both  
server cycles and bandwidth.  This does heavily restrict the user  
experience (which is one of the strengths of SL) but pays back in  
responsiveness.

> SL needs to adopt some of the technologies of other MMRPGs -- the ones
> that work.  It makes the result more complex on the client side -- one
> has to update WoW every six months or so with new textures, maps, and
> display side bugfixes

Actually, SL was going through a frequent update period for quite a  
while and I don't think it was any better than WoW in that respect.


-b


From rgb at phy.duke.edu  Wed Feb  6 07:40:30 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 6 Feb 2008 10:40:30 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
Message-ID: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>

Anybody on list have any idea why PVM fails to add hosts over a wireless
link?  I've now tried this over multiple distro versions and at least one
PVM update, and it just doesn't work.  Works fine over a wire, fails on
wireless, and as far as I know wire and wireless are both "identical"
at the kernel interface layer so that any e.g. socket one might open is
absolutely ecumenical about what the underlying hardware is (good old
ISO/OSI layering, right?).

And yes, I'm well aware that from a latency/bw point of view this
arrangement isn't going to be a speed demon or scale terribly well, but
for testing PVM from a laptop or writing code from a laptop or just
playing with PVM itself for fun or profit from a laptop it would
certainly be lovely if it WORKED, however poorly as far as IPCs are
concerned.

Yup, tried it one last time.  Locks it right up it does, have to kill
pvm[d] by hand and hand-remove the lockfiles, just like I did two or
three years ago...

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From peter.st.john at gmail.com  Wed Feb  6 08:34:14 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Wed, 6 Feb 2008 11:34:14 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
Message-ID: <e4d4fd070802060834s1bf62b54i906945ba66184cd9@mail.gmail.com>

RGB,
Are you using 3.4.5 re "improved use on Beowulf..."? I was thinking along
the lines of the script invoking PVM doing something to reboot or refresh
the network, and saw "New features in PVM 3.4.x include communication
contexts...". I'd be happy to read a perl or Cish thing if an extra pair of
eyes might notice something, but I don't know where to start.
YMHS Peter

On Feb 6, 2008 10:40 AM, Robert G. Brown <rgb at phy.duke.edu> wrote:

> Anybody on list have any idea why PVM fails to add hosts over a wireless
> link?  I've now tried this over multiple distro version and at least one
> PVM update, and it just doesn't work.  Works fine over a wire, fails on
> wireless, and as far as I know wire and wireless are both "identical"
> at the kernel interface layer so that any e.g. socket one might open is
> absolutely ecumenical about what the underlying hardware is (good old
> ISO/OSI layering, right?).
>
> And yes, I'm well aware that from a latency/bw point of view this
> arrangement isn't going to be a speed demon or scale terribly well, but
> for testing PVM from a laptop or writing code from a laptop or just
> playing with PVM itself for fun or profit from a laptop it would
> certainly be lovely if it WORKED, however poorly as far as IPCs are
> concerned.
>
> Yup, tried it one last time.  Locks it right up it does, have to kill
> pvm[d] by hand and hand-remove the lockfiles, just like I did two or
> three years ago...
>
>    rgb
>
> --
> Robert G. Brown                            Phone(cell): 1-919-280-8443
> Duke University Physics Dept, Box 90305
> Durham, N.C. 27708-0305
> Web: http://www.phy.duke.edu/~rgb
> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

From wrankin at ee.duke.edu  Wed Feb  6 09:33:11 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Wed, 6 Feb 2008 12:33:11 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
Message-ID: <02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu>

Hey Rob,

Could it be a node naming issue where the wireless IP does not  
resolve to the same address as that used in the machinefile?  I seem  
to recall a similar issue back when we ran PVM on machines with multiple  
network connections.
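
A quick way to test the naming theory is to compare what each name
actually resolves to against the address you expect for it. The sketch
below is only an illustration: the function name, the host names and
the addresses are all made up for the example, and it says nothing
about how PVM itself resolves names.

```python
import socket


def check_host_addresses(expected, resolve=socket.gethostbyname):
    """Return {name: (expected_ip, resolved_ip_or_None)} for each mismatch.

    `expected` maps a host name (as written in the host list) to the IP
    address it is supposed to have. `resolve` is injectable so the check
    can be exercised without touching real DNS or /etc/hosts.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError:
            got = None  # the name did not resolve at all
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    # Hypothetical entries; substitute the names from your own host list.
    print(check_host_addresses({"localhost": "127.0.0.1"}))
```

An empty result means every name resolved to the address you expected
on the machine where the check ran; it is worth running on both ends.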

Just a thought,

-bill


On Feb 6, 2008, at 10:40 AM, Robert G. Brown wrote:

> Anybody on list have any idea why PVM fails to add hosts over a  
> wireless
> link?  I've now tried this over multiple distro version and at  
> least one
> PVM update, and it just doesn't work.  Works fine over a wire,  
> fails on
> wireless, and as far as I know wire and wireless are both "identical"
> at the kernel interface layer so that any e.g. socket one might  
> open is
> absolutely ecumenical about what the underlying hardware is (good old
> ISO/OSI layering, right?).
>


From reuti at staff.uni-marburg.de  Wed Feb  6 09:33:47 2008
From: reuti at staff.uni-marburg.de (Reuti)
Date: Wed, 6 Feb 2008 18:33:47 +0100
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
Message-ID: <ADCA3651-21A3-437E-9855-4170258D2B37@staff.uni-marburg.de>

Hi,

Am 06.02.2008 um 16:40 schrieb Robert G. Brown:

> Anybody on list have any idea why PVM fails to add hosts over a  
> wireless
> link?  I've now tried this over multiple distro version and at  
> least one
> PVM update, and it just doesn't work.  Works fine over a wire,  
> fails on
> wireless, and as far as I know wire and wireless are both "identical"
> at the kernel interface layer so that any e.g. socket one might  
> open is
> absolutely ecumenical about what the underlying hardware is (good old
> ISO/OSI layering, right?).
>
> And yes, I'm well aware that from a latency/bw point of view this
> arrangement isn't going to be a speed demon or scale terribly well,  
> but
> for testing PVM from a laptop or writing code from a laptop or just
> playing with PVM itself for fun or profit from a laptop it would
> certainly be lovely if it WORKED, however poorly as far as IPCs are
> concerned.
>
> Yup, tried it one last time.  Locks it right up it does, have to kill
> pvm[d] by hand and hand-remove the lockfiles, just like I did two or
> three years ago...

Is the wireless one the primary interface?  Maybe there is a mismatch  
in which hostname is used for which interface?  Using wireless could  
be similar to using a secondary interface.

-- Reuti


>    rgb
>
> -- 
> Robert G. Brown                            Phone(cell): 1-919-280-8443
> Duke University Physics Dept, Box 90305
> Durham, N.C. 27708-0305
> Web: http://www.phy.duke.edu/~rgb
> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From rgb at phy.duke.edu  Wed Feb  6 10:21:55 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 6 Feb 2008 13:21:55 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
	<02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu>
Message-ID: <Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>

On Wed, 6 Feb 2008, Bill Rankin wrote:

> Hey Rob,
>
> Could it be a node naming issue where the wireless IP does not resolve to the 
> same address as that used in the machinefile?  I seem to recall a similar 
> issue back when we PVM on machines with multiple network connections.

pvmd is actually starting up on the target machine -- it works that far.
The master node IP number is correct, as is the slave IP number (both
visible as arguments to pvmd).  The name I'm using is the one associated
with the wireless interface in question, both machines ping in all four
directions by name with the correct internet address.  All my machines
are configured more or less identically, use the same environment
variables, support transparent ssh command execution (which obviously
works even in PVM as the daemon is being spawned on the correct target).

The wireless interfaces have the right MTU and look exactly like the
ethernet devices they in fact are to the kernel AFAIK.  In every other
aspect I've ever tested, including my own homemade socket code, response
to both tcp and udp daemons, ability to mount NFS, support ssh, and so
on and so forth, they behave like TCP/IP sockets over ethernet devices
as far as system calls go -- they use the same interface, and the whole
point of OSI/ISO is that code should not depend on the hardware layer
and in general on even a roughly posix compliant machine using standard
devices and e.g. the socket API it doesn't.

Last time I encountered this, I actually cranked up the -d0x0 stuff and
"watched" as the system went through to where it hung in the middle of
doing some part of the post-spawn handshaking.

I suspect a race condition, probably caused by using raw UDP with some
assumption of latency during the handshake.  The one way I can think of
that the two connections differ is in their latency -- even the
bandwidth of wireless is every bit as great as 10Base2 networks I've run
PVM on in years past (on proportionally slower CPUs, of course).  If the
master or slave sends out an acknowledgement packet either before the
window where the other can receive it or after it has grown bored and
stopped listening, it might fail to properly bind or something.  It
seems like it would be a bug, not a feature, but if I were feeling
infinitely masochistic and were to wander down into Other People's
Source (ouch!) to try to debug this, that's what I'd look for first.
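
The kind of failure suspected here can be shown with a toy timing
model. This is a sketch of the hypothesis only, not of PVM's real
handshake: the function names, the 5 ms listen window and the latency
figures are all invented for illustration.

```python
def ack_arrives_in_window(sent_at, one_way_latency, window_open, window_close):
    """True if a packet sent at `sent_at` lands while the peer is listening."""
    arrival = sent_at + one_way_latency
    return window_open <= arrival <= window_close


def handshake_ok(one_way_latency, reply_delay=0.0, listen_timeout=0.005):
    """Master sends a request at t=0 and listens until `listen_timeout`.

    The slave acknowledges `reply_delay` seconds after the request
    arrives. All times are in seconds and are assumed values.
    """
    request_arrival = one_way_latency
    ack_sent = request_arrival + reply_delay
    return ack_arrives_in_window(ack_sent, one_way_latency, 0.0, listen_timeout)


if __name__ == "__main__":
    print("wired-like (0.2 ms each way):", handshake_ok(0.0002))
    print("wireless-like (4 ms each way):", handshake_ok(0.004))
```

With a fixed listen window, the same code path succeeds at the lower
latency and silently misses the acknowledgement at the higher one,
which is the pattern to look for in the daemon's debug output.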

Any PVM developers still on list?  Any comments from them?

    rgb

>
> Just a thought,
>
> -bill
>
>
> On Feb 6, 2008, at 10:40 AM, Robert G. Brown wrote:
>
>> Anybody on list have any idea why PVM fails to add hosts over a wireless
>> link?  I've now tried this over multiple distro version and at least one
>> PVM update, and it just doesn't work.  Works fine over a wire, fails on
>> wireless, and as far as I know wire and wireless are both "identical"
>> at the kernel interface layer so that any e.g. socket one might open is
>> absolutely ecumenical about what the underlying hardware is (good old
>> ISO/OSI layering, right?).
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From dnlombar at ichips.intel.com  Wed Feb  6 11:06:08 2008
From: dnlombar at ichips.intel.com (Lombard, David N)
Date: Wed, 6 Feb 2008 11:06:08 -0800
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
Message-ID: <20080206190608.GA6306@nlxdcldnl2.cl.intel.com>

On Wed, Feb 06, 2008 at 10:40:30AM -0500, Robert G. Brown wrote:
> Anybody on list have any idea why PVM fails to add hosts over a wireless
> link?  I've now tried this over multiple distro version and at least one
> PVM update, and it just doesn't work.  Works fine over a wire, fails on
> wireless, and as far as I know wire and wireless are both "identical"
> at the kernel interface layer so that any e.g. socket one might open is
> absolutely ecumenical about what the underlying hardware is (good old
> ISO/OSI layering, right?).

What is the device name?  Perhaps PVM doesn't like the name?
Are you running multiple devices?
Does the system set its node name or is some odd name provided by DHCP?
Other name resolution problems?

> And yes, I'm well aware that from a latency/bw point of view this
> arrangement isn't going to be a speed demon or scale terribly well, but
> for testing PVM from a laptop or writing code from a laptop or just
> playing with PVM itself for fun or profit from a laptop it would
> certainly be lovely if it WORKED, however poorly as far as IPCs are
> concerned.

I've run multiple VMware instances on my *Linux* laptop back in the day
when I did OSCAR development and Rocks evals.

-- 
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.


From James.P.Lux at jpl.nasa.gov  Wed Feb  6 11:26:35 2008
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Wed, 06 Feb 2008 11:26:35 -0800
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
Message-ID: <6.2.3.4.2.20080206110810.02e03408@mail.jpl.nasa.gov>

At 07:40 AM 2/6/2008, Robert G. Brown wrote:
>Anybody on list have any idea why PVM fails to add hosts over a wireless
>link?  I've now tried this over multiple distro version and at least one
>PVM update, and it just doesn't work.  Works fine over a wire, fails on
>wireless, and as far as I know wire and wireless are both "identical"
>at the kernel interface layer so that any e.g. socket one might open is
>absolutely ecumenical about what the underlying hardware is (good old
>ISO/OSI layering, right?).
>
>And yes, I'm well aware that from a latency/bw point of view this
>arrangement isn't going to be a speed demon or scale terribly well, but
>for testing PVM from a laptop or writing code from a laptop or just
>playing with PVM itself for fun or profit from a laptop it would
>certainly be lovely if it WORKED, however poorly as far as IPCs are
>concerned.


You brave man.. trying to do what is trivial in a wired network with 
wireless stuff.

I would look for timing assumptions that aren't met in the wireless 
environment. There's a channel capacity issue, of course, but there 
are also some constraints on round trip messages, particularly if you've 
got an "infrastructure" network as opposed to "ad-hoc".  A packet from 
A to B has to go from A to Access Point( AP), which takes some back 
and forth handshaking and protocol overhead.  Then, it gets sent from 
AP to B, with more back and forth.  Don't expect 1 ms ping times...
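
To put a number on the extra delay, one can time TCP connection setup
to a port that is known to be open on the other machine, once over
the wire and once over wireless. This is a rough sketch: the function
name is made up, connection setup time is only an approximation of one
round trip, and the host and port in the usage line are placeholders.

```python
import socket
import time


def connect_rtts(host, port, samples=5, timeout=2.0):
    """Time TCP connection setup to host:port; returns a list of seconds."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the three-way handshake is done,
        # so the elapsed time is roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            rtts.append(time.perf_counter() - start)
    return rtts


if __name__ == "__main__":
    # Placeholder target: any host with a listening TCP port, e.g. sshd.
    for rtt in connect_rtts("localhost", 22):
        print(f"{rtt * 1000:.3f} ms")
```

Comparing the spread of the samples, not just the average, is useful
here, since occasional long delays matter more to a fixed timeout than
the typical case does.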

I spent quite a while getting NTP (which I thought would be trivial.. 
it explicitly handles long delays and intermittent connections) to 
work in an 802.11a network, complicated by the fact that I was using 
Access Points (in a "point to multipoint" configuration) as the 
interfaces, so the computers actually had a wired ethernet connection 
through a dumb 5 port switch, to the wireless AP. Getting PXE and 
DHCP to work was trivial by comparison.

Lots of weird things happen in these systems because there are hidden 
assumptions about timing and whether a path exists between two points.

Jim


From reuti at staff.uni-marburg.de  Wed Feb  6 11:52:09 2008
From: reuti at staff.uni-marburg.de (Reuti)
Date: Wed, 6 Feb 2008 20:52:09 +0100
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
	<02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu>
	<Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
Message-ID: <220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>

Am 06.02.2008 um 19:21 schrieb Robert G. Brown:

> On Wed, 6 Feb 2008, Bill Rankin wrote:
>
>> Hey Rob,
>>
>> Could it be a node naming issue where the wireless IP does not  
>> resolve to the same address as that used in the machinefile?  I  
>> seem to recall a similar issue back when we PVM on machines with  
>> multiple network connections.
>
> pvmd is actually starting up on the target machine -- it works that  
> far.
> The master node IP number is correct, as is the slave IP number (both
> visible as arguments to pvmd).  The name I'm using is the one  
> associated
> with the wireless interface in question, both machines ping in all  
> four
> directions by name with the correct internet address.  All my machines
> are configured more or less identically, use the same environment
> variables, support transparent ssh command execution (which obviously
> works even in PVM as the daemon is being spawned on the correct  
> target).
>
> The wireless interfaces have the right MTU and look exactly like the
> ethernet devices they in fact are to the kernel AFAIK.  In every other
> aspect I've ever tested, including my own homemade socket code,  
> response
> to both tcp and udp daemons, ability to mount NFS, support ssh, and so
> on and so forth, they behave like TCP/IP sockets over ethernet devices
> as far as systems calls go -- they use the same interface, and the  
> whole
> point of OSI/ISO is that code should not depend on the hardware layer
> and in general on even a roughly posix compliant machine using  
> standard
> devices and e.g. the socket API it doesn't.
>
> Last time I encountered this, I actually cranked up the -d0x0 stuff  
> and
> "watched" as the system went through to where it hung in the middle of
> doing some part of the post-spawn handshaking.

Just an idea to check: PVM can also be started without rsh/ssh  
between the machines. You have to copy and paste some things from  
here to there and back and can start up all daemons this way by hand  
(page 30 in the PVM book). Maybe this works - just to narrow down the  
cause.

-- Reuti


> I suspect a race condition, probably caused by using raw UDP with some
> assumption of latency during the handshake.  The one way I can  
> think of
> that the two connections differ is in their latency -- even the
> bandwidth of wireless is every bit as great as 10B2 networks I've run
> PVM on in years past (on proportionally slower CPUs, of course).   
> If the
> master or slave send out an acknowledgement packet either before the
> window where the other can receive it or after it has grown bored and
> stopped listening, it might fail to properly bind or something.  It
> seems like it would be a bug, not a feature, but if I were feeling
> infinitely masochistic and were to wander down into Other People's
> Source (ouch!) to try to debug this, that's what I'd look for first.
>
> Any PVM developers still on list?  Any comments from them?
>
>    rgb
>
>>
>> Just a thought,
>>
>> -bill
>>
>>
>> On Feb 6, 2008, at 10:40 AM, Robert G. Brown wrote:
>>
>>> Anybody on list have any idea why PVM fails to add hosts over a  
>>> wireless
>>> link?  I've now tried this over multiple distro version and at  
>>> least one
>>> PVM update, and it just doesn't work.  Works fine over a wire,  
>>> fails on
>>> wireless, and as far as I know wire and wireless are both  
>>> "identical"
>>> at the kernel interface layer so that any e.g. socket one might  
>>> open is
>>> absolutely ecumenical about what the underlying hardware is (good  
>>> old
>>> ISO/OSI layering, right?).
>>
>
> -- 
> Robert G. Brown                            Phone(cell): 1-919-280-8443
> Duke University Physics Dept, Box 90305
> Durham, N.C. 27708-0305
> Web: http://www.phy.duke.edu/~rgb
> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From mathog at caltech.edu  Wed Feb  6 13:01:22 2008
From: mathog at caltech.edu (David Mathog)
Date: Wed, 06 Feb 2008 13:01:22 -0800
Subject: [Beowulf] Re: PVM on wireless...
Message-ID: <E1JMrOU-00062O-A3@mendel.bio.caltech.edu>


> Anybody on list have any idea why PVM fails to add hosts over a wireless
> link?  I've now tried this over multiple distro version and at least one
> PVM update, and it just doesn't work.  Works fine over a wire, fails on
> wireless, and as far as I know wire and wireless are both "identical"
> at the kernel interface layer so that any e.g. socket one might open is
> absolutely ecumenical about what the underlying hardware is (good old
> ISO/OSI layering, right?).

Sounds like multiple network hell, with some type of name mismatch
causing the problems.  Start up pvmd directly on one of the wireless
machines and then use pvm to see what it calls itself.  If that
differs in any way from the entries in your host list then that is
probably the problem.  If they come up the same then run -d settings on
pvmd to find out more information.

It is also possible the firewall settings are different, and the wired
interface allows pvm connections in some way that the wireless does not.

Did you try starting pvmd on a pure wireless machine and see if it can
connect to other pure wireless machines?  It would be good to get the
wired interfaces completely out of the equation.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From rgb at phy.duke.edu  Wed Feb  6 13:24:06 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 6 Feb 2008 16:24:06 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <6.2.3.4.2.20080206110810.02e03408@mail.jpl.nasa.gov>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
	<6.2.3.4.2.20080206110810.02e03408@mail.jpl.nasa.gov>
Message-ID: <Pine.LNX.4.64.0802061619550.22237@cain.rgb.private.net>

On Wed, 6 Feb 2008, Jim Lux wrote:

> You brave man.. trying to do what is trivial in a wired network with wireless 
> stuff.
>
> I would look for timing assumptions that aren't met in the wireless 
> environment. There's a channel capacity issue, of course, but there's also 
> some constraints on round trip messages, particularly if you've got a 
> "infrastructure" network as opposed to "ad-hoc".  A packet from A to B has to 
> go from A to Access Point( AP), which takes some back and forth handshaking 
> and protocol overhead.  Then, it gets sent from AP to B, with more back and 
> forth.  Don't expect 1 ms ping times...
>
> I spent quite a while getting NTP (which I thought would be trivial.. it 
> explicitly handles long delays and intermittent connections) to work in a 
> 802.11a network, complicated by the fact that I was using Access Points (in a 
> "point to multipoint" configuration) as the interfaces, so the computers 
> actually had a wired ethernet connection through a dumb 5 port switch, to the 
> wireless AP. Getting PXE and DHCP to work was trivial by comparison
>
> Lots of weird things happen in these systems because there are hidden 
> assumptions about timing and whether a path exists between two points.

This is what I think that it probably is -- a race condition of some
sort caused by a timing assumption, almost certainly of UDP packets as
TCP should be robust.  I should look at whether PVM can be built on top
of just TCP these days.  It used to be UDP "for efficiency" but that
always means that you have to code your own reliability, packet
reordering and so on into the connection, usually leaving some things
OUT or you'll end up re-implementing TCP anyway, probably badly.  I
could be bitten by something left out that is causing certain packet
sequences to arrive out of (presumed) order and have the master waiting
forever for a packet that already came and was dropped.
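
The failure mode described above can be made concrete with two toy
receivers for sequence-numbered datagrams. Neither is taken from the
PVM source; both functions are hypothetical and exist only to contrast
"drop anything unexpected" with "hold early arrivals until the gap
fills".

```python
def naive_receive(packets, first_seq=0):
    """Accept only the next expected sequence number; drop everything else."""
    expected = first_seq
    delivered = []
    for seq, payload in packets:
        if seq == expected:
            delivered.append(payload)
            expected += 1
        # An early packet is silently dropped here, so the receiver ends
        # up waiting for a packet that has already come and gone.
    return delivered


def deliver_in_order(packets, first_seq=0):
    """Hold early arrivals and release them once the gap is filled."""
    expected = first_seq
    held = {}
    delivered = []
    for seq, payload in packets:
        if seq < expected or seq in held:
            continue  # duplicate of something already seen
        held[seq] = payload
        while expected in held:
            delivered.append(held.pop(expected))
            expected += 1
    return delivered


if __name__ == "__main__":
    arrivals = [(1, "b"), (0, "a"), (2, "c")]  # packet 1 overtakes packet 0
    print(naive_receive(arrivals))
    print(deliver_in_order(arrivals))
```

The second receiver still needs retransmission to recover from real
loss, which is the part that tends to grow into a partial
reimplementation of TCP.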

But only a look at the raw code (or setting a tcp-only flag at build
time, if there is such a thing) will tell me for sure.

    rgb

>
> Jim
>
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From rgb at phy.duke.edu  Wed Feb  6 13:28:24 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 6 Feb 2008 16:28:24 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
	<02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu>
	<Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
	<220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>
Message-ID: <Pine.LNX.4.64.0802061624510.22237@cain.rgb.private.net>

On Wed, 6 Feb 2008, Reuti wrote:

> Just an idea to check: PVM can also be started without rsh/ssh between the 
> machines. You have to copy and paste some things from here to there and back 
> and can startup all daemons this way by hand (page 30 in the PVM book). Maybe 
> this works - just to narrow the cause.

I'll look into this, thanks, although the daemon IS started -- the block
is somewhere after that.  But it is well worth trying anyway.

I also wonder about ports and WAP interactions.  I've got my WAP
configured (AFAICT) as an internal switch, not really as a router.  That
is, my laptop gets DHCP service from my linux server, not the WAP, which
is flat to broadcasts, has no port filtering on the internal network,
etc.

I even ran tcpdump on the problem last time it happened -- maybe I
should try it again.

    rgb

>
> -- Reuti
>
>
>> I suspect a race condition, probably caused by using raw UDP with some
>> assumption of latency during the handshake.  The one way I can think of
>> that the two connections differ is in their latency -- even the
>> bandwidth of wireless is every bit as great as 10B2 networks I've run
>> PVM on in years past (on proportionally slower CPUs, of course).  If the
>> master or slave send out an acknowledgement packet either before the
>> window where the other can receive it or after it has grown bored and
>> stopped listening, it might fail to properly bind or something.  It
>> seems like it would be a bug, not a feature, but if I were feeling
>> infinitely masochistic and were to wander down into Other People's
>> Source (ouch!) to try to debug this, that's what I'd look for first.
>> 
>> Any PVM developers still on list?  Any comments from them?
>>
>>   rgb
>> 
>>> 
>>> Just a thought,
>>> 
>>> -bill
>>> 
>>> 
>>> On Feb 6, 2008, at 10:40 AM, Robert G. Brown wrote:
>>> 
>>>> Anybody on list have any idea why PVM fails to add hosts over a wireless
>>>> link?  I've now tried this over multiple distro version and at least one
>>>> PVM update, and it just doesn't work.  Works fine over a wire, fails on
>>>> wireless, and as far as I know wire and wireless are both "identical"
>>>> at the kernel interface layer so that any e.g. socket one might open is
>>>> absolutely ecumenical about what the underlying hardware is (good old
>>>> ISO/OSI layering, right?).
>>> 
>> 
>> -- 
>> Robert G. Brown                            Phone(cell): 1-919-280-8443
>> Duke University Physics Dept, Box 90305
>> Durham, N.C. 27708-0305
>> Web: http://www.phy.duke.edu/~rgb
>> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
>> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From rgb at phy.duke.edu  Wed Feb  6 13:42:04 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 6 Feb 2008 16:42:04 -0500 (EST)
Subject: [Beowulf] Re: PVM on wireless...
In-Reply-To: <E1JMrOU-00062O-A3@mendel.bio.caltech.edu>
References: <E1JMrOU-00062O-A3@mendel.bio.caltech.edu>
Message-ID: <Pine.LNX.4.64.0802061629170.22237@cain.rgb.private.net>

On Wed, 6 Feb 2008, David Mathog wrote:

>
>> Anybody on list have any idea why PVM fails to add hosts over a wireless
>> link?  I've now tried this over multiple distro version and at least one
>> PVM update, and it just doesn't work.  Works fine over a wire, fails on
>> wireless, and as far as I know wire and wireless are both "identical"
>> at the kernel interface layer so that any e.g. socket one might open is
>> absolutely ecumenical about what the underlying hardware is (good old
>> ISO/OSI layering, right?).
>
> Sounds like multiple network hell, with some type of name mismatch
> causing the problems.  Start up pvmd directly on one of the wireless
> machines and then use pvm to see what it calls itself.  If that
> differs in any way from the entries in your host list then that is
> probably the problem.  If they come up the same then run -d settings on
> pvmd to find out more information.
>
> It is also possible the firewall settings are different, and the wired
> interface allows pvm connections in some way that the wireless does not.
>
> Did you try starting pvmd on a pure wireless machine and see if it can
> connect to other pure wireless machines?  It would be good to get the
> wired interfaces completely out of the equation.

Any connection with wireless on at least one end fails.  Or if you like,
only wire-to-wire succeeds.

And I HAVE been doing TCP/IP sysadmin for about twenty-one years now,
pro-grade linux for twelve-plus.  I really don't think that there is
much of a chance left that there is any trivial networking error
underlying this, as of course I've checked this pretty carefully, in two
completely different instances with significant changes to my home
network: different primary server, different WAP, different wireless
cards, different laptops, NM managing wireless now where then I did it
by hand, and 2.4 vs 2.6 kernels.  As I said, the mapping between IP
number and slave pvmd is exactly correct, as are all host table entries;
ping works by name or IP to the same IP(s), ssh works by name or IP,
http works ditto, wulfware works ditto (and shows both interfaces).  Yet
the symptoms are exactly the same.  It works to a point just half-way
through the handshaking and then, AFTER the remote daemon is
successfully spawned with the right lockfiles and IP numbers visible to
ps with ww, it freezes until something times out, then it fails while
claiming that it succeeded in adding the remote host.

I can literally snap the same box onto a wire, wait for it to get an IP
number on the wire, and rerun the experiment on the same hardware and it
works perfectly (with a different but identically entered name, of
course).  And it is the wireless name that corresponds with the
`hostname` (in /etc/sysconfig/network), not that this should matter (and
it doesn't on the wire).

That's not to say that I can't make a mistake -- only that I've checked
all the really obvious ones and EVERYTHING ELSE works perfectly and
universally independent of wire vs wireless.  I snap in a wire in my
office, snap it off the wire and onto wireless, and back again, back and
forth home to office many times per boot.  After about ten days of this
NM will sometimes destabilize as maybe the wireless card fails to hold
state perfectly, but in the meantime every network-using tool BUT pvm
just works, exactly as one expects.

    rgb

>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From James.P.Lux at jpl.nasa.gov  Wed Feb  6 14:16:13 2008
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Wed, 06 Feb 2008 14:16:13 -0800
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061624510.22237@cain.rgb.private.net>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
	<02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu>
	<Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
	<220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>
	<Pine.LNX.4.64.0802061624510.22237@cain.rgb.private.net>
Message-ID: <6.2.3.4.2.20080206140052.032df040@mail.jpl.nasa.gov>

At 01:28 PM 2/6/2008, Robert G. Brown wrote:
>On Wed, 6 Feb 2008, Reuti wrote:
>
>>Just an idea to check: PVM can also be started without rsh/ssh
>>between the machines. You have to copy and paste some things from
>>here to there and back and can start up all daemons this way by hand
>>(page 30 in the PVM book). Maybe this works - just to narrow the cause.
>
>I'll look into this, thanks, although the daemon IS started -- the block
>is somewhere after that.  But it is well worth trying anyway.
>
>I also wonder about ports and WAP interactions.  I've got my WAP
>configured (AFAICT) as an internal switch, not really as a router.  As
>in my laptop gets DHCP service from my linux server, not the WAP, which
>is flat to broadcasts, has no port filtering on the internal network
>etc.

Ahh.. but there is a "routing" process of sorts inside your AP.  It 
has to bridge from the 802.3 wired world to the 802.11 wireless 
world, and that usually involves some store and forward type 
processing.  Some of these are implemented as a store and forward 
router (e.g. home firewall) with one of the logical ports connected 
to the wireless modem.  Very, very few access points (if any) are 
actually a dumb packet oriented bridge that just unwraps the payload 
from one frame type and shoots it out rewrapped for the other.  The 
AP has to do things like send out broadcast frames with the SSID, 
send and receive the link setup/teardown kinds of frames (i.e. the 
link between your PC's wireless interface and the AP), as well as 
bridging/routing traffic from the wired network to the wireless network.

Who's to say what kind of logic they have inside there to deal with 
all the issues (the wireless MAC and the wired MAC are different, if 
nothing else.)

Jim Lux


From kohlja at ornl.gov  Wed Feb  6 15:13:28 2008
From: kohlja at ornl.gov (kohlja at ornl.gov)
Date: Wed, 6 Feb 2008 18:13:28 -0500
Subject: [Beowulf] PVM on wireless...
Message-ID: <20080206231328.GA1249@neo.csm.ornl.gov>

Hey Gang!

Sounds like you're having some "fun" with PVM over wireless...?  :-)

(A buddy (Wael Elwasif) forwarded your discussion to me;
please always feel free to copy "pvm at msr.csm.ornl.gov"
with PVM inquiries when you get stuck.  I try to be
pretty responsive, though this is all unfunded work now... :)

So, the master's network interface/IP selection was my first
guess, too, after reading about your situation, but this
email below would seem to eliminate that possibility...

Just to be sure though, I assume you're starting PVM
on the master host with the "-nfoo" host name argument,
to choose the desired network interface/IP address,
and that the /tmp/pvml.<uid> log file on the master
reflects/verifies this IP...?  :)

Are there any error messages in the PVM log files
on either the master or the slave machines...?

(Btw, which $PVM_ARCH are we talking about here,
"LINUX" or "BEOLIN"...? :)

There are a few weird scenarios under which PVM will
quietly drop or ignore packets coming from the slave
daemons, when the IP doesn't appear to match what
the master expects... ("to serve you better" and
protect against external intrusions, ha ha ha... :)

As far as timing out/latency, which was another line
of your discussion I read through, I don't _think_ PVM
cares about the fine-grained latency that you're talking
about, between wireless and wired...

The daemons are on a nice long timeout, like 3 _minutes_
before they assume something died...

And for startup, the master doesn't strictly "wait" for
the slaves to connect, it merely provides them with the
proper socket address for where to connect themselves up...
(hence the option you've mentioned about manually starting
a slave daemon, and having it just connect up to the master)

So what about firewalls or blocked ports...?

Does the wireless network leave the PVM ports open?
(The port number is chosen at random by the system,
unless the "$PVMNETSOCKPORT" environment variable
is set with a starting port number for the desired
range...)

Anything in the master's regular system logs
(or the slave's PVM log file) about "Connection
Refused"...?

Just an idear.  Please lemme know if this is all
still a dead end.

(And send along any error messages from the PVM logs...! :-)

Good Luck & "Long Live PVM"...!  :)

	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
	(a.k.a. Jim Kohl, kohlja at ornl.gov :)

  > From: "Robert G. Brown" <rgb at phy.duke.edu>
  > Date: Wed, 6 Feb 2008 13:21:55 -0500 (EST)
  > Subject: Re: [Beowulf] PVM on wireless...
  > To: Bill Rankin <wrankin at ee.duke.edu>
  > Cc: Beowulf Mailing List <beowulf at beowulf.org>
  > Message-ID: <Pine.LNX.4.64.0802061312001.20835 at cain.rgb.private.net>
  > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
  > 
  > On Wed, 6 Feb 2008, Bill Rankin wrote:
  > 
  > > Hey Rob,
  > >
  > > Could it be a node naming issue where the wireless IP does not resolve
  > > to the same address as that used in the machinefile?  I seem to recall
  > > a similar issue back when we ran PVM on machines with multiple network
  > > connections.
  > 
  > pvmd is actually starting up on the target machine -- it works that far.
  > The master node IP number is correct, as is the slave IP number (both
  > visible as arguments to pvmd).  The name I'm using is the one associated
  > with the wireless interface in question, both machines ping in all four
  > directions by name with the correct internet address.  All my machines
  > are configured more or less identically, use the same environment
  > variables, support transparent ssh command execution (which obviously
  > works even in PVM as the daemon is being spawned on the correct target).
  > 
  > The wireless interfaces have the right MTU and look exactly like the
  > ethernet devices they in fact are to the kernel AFAIK.  In every other
  > aspect I've ever tested, including my own homemade socket code, response
  > to both tcp and udp daemons, ability to mount NFS, support ssh, and so
  > on and so forth, they behave like TCP/IP sockets over ethernet devices
  > as far as system calls go -- they use the same interface, and the whole
  > point of OSI/ISO is that code should not depend on the hardware layer
  > and in general on even a roughly posix compliant machine using standard
  > devices and e.g. the socket API it doesn't.
  > 
  > Last time I encountered this, I actually cranked up the -d0x0 stuff and
  > "watched" as the system went through to where it hung in the middle of
  > doing some part of the post-spawn handshaking.
  > 
  > I suspect a race condition, probably caused by using raw UDP with some
  > assumption of latency during the handshake.  The one way I can think of
  > that the two connections differ is in their latency -- even the
  > bandwidth of wireless is every bit as great as 10B2 networks I've run
  > PVM on in years past (on proportionally slower CPUs, of course).  If the
  > master or slave send out an acknowledgement packet either before the
  > window where the other can receive it or after it has grown bored and
  > stopped listening, it might fail to properly bind or something.  It
  > seems like it would be a bug, not a feature, but if I were feeling
  > infinitely masochistic and were to wander down into Other People's
  > Source (ouch!) to try to debug this, that's what I'd look for first.
  > 
  > Any PVM developers still on list?  Any comments from them?
  > 
  >     rgb

(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:

   James Arthur "Jeeembo" Kohl, Ph.D.     "Da Blooos Brathas?!  They
   Oak Ridge National Laboratory              still owe you money, Fool!"
   kohlja at ornl.gov
   http://www.csm.ornl.gov/~kohl/          Long Live Curtis Blues!!!

:):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)


From wrankin at ee.duke.edu  Wed Feb  6 15:18:27 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Wed, 6 Feb 2008 18:18:27 -0500
Subject: [Beowulf] Re: PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802061629170.22237@cain.rgb.private.net>
References: <E1JMrOU-00062O-A3@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802061629170.22237@cain.rgb.private.net>
Message-ID: <3622EF04-4040-4EBD-AF07-0BF4D1CAB8AD@ee.duke.edu>

I have a home setup similar to yours - a WAP acting as a firewall,  
dhcp from a linux server.  I have a spare laptop running CentOS, so  
I'll give it a check tonight to see if mine runs.

Q1: do you have DHCP giving a static address to your laptop based
upon its MAC?

Q2: have you tried this with the PVM 3.4.4 RPMs (I think you  
mentioned you were running 3.4.5)?

-b

>
> I can literally snap the same box onto a wire, wait for it to get an IP
> number on the wire, and rerun the experiment on the same hardware and it
> works perfectly (with a different but identically entered name, of
> course).  And it is the wireless name that corresponds with the
> `hostname` (in /etc/sysconfig/network), not that this should matter (and
> it doesn't on the wire).
>


From reuti at staff.uni-marburg.de  Wed Feb  6 16:39:58 2008
From: reuti at staff.uni-marburg.de (Reuti)
Date: Thu, 7 Feb 2008 01:39:58 +0100
Subject: [Beowulf] Re: PVM on wireless...
In-Reply-To: <3622EF04-4040-4EBD-AF07-0BF4D1CAB8AD@ee.duke.edu>
References: <E1JMrOU-00062O-A3@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802061629170.22237@cain.rgb.private.net>
	<3622EF04-4040-4EBD-AF07-0BF4D1CAB8AD@ee.duke.edu>
Message-ID: <42BC3720-4E9C-43D9-9F9D-2D82F0590D9D@staff.uni-marburg.de>

On 07.02.2008 at 00:18, Bill Rankin wrote:

> I have a home setup similar to yours - a WAP acting as a firewall,
> dhcp from a linux server.  I have a spare laptop running CentOS, so
> I'll give it a check tonight to see if mine runs.
>
> Q1: do you have DHCP giving a static address to your laptop based
> upon its MAC?
>
> Q2: have you tried this with the PVM 3.4.4 RPMs (I think you  
> mentioned you were running 3.4.5)?

There are even newer patches:

http://www.csm.ornl.gov/~kohl/PVM/pvm3.4.5+9.tar.Z

-- Reuti


From rgb at phy.duke.edu  Thu Feb  7 08:08:46 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 7 Feb 2008 11:08:46 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <6.2.3.4.2.20080206140052.032df040@mail.jpl.nasa.gov>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net>
	<02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu>
	<Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
	<220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>
	<Pine.LNX.4.64.0802061624510.22237@cain.rgb.private.net>
	<6.2.3.4.2.20080206140052.032df040@mail.jpl.nasa.gov>
Message-ID: <Pine.LNX.4.64.0802071025450.22976@cain.rgb.private.net>

On Wed, 6 Feb 2008, Jim Lux wrote:

> Ahh.. but there is a "routing" process of sorts inside your AP.  It has to 
> bridge from the 802.3 wired world to the 802.11 wireless world, and that 
> usually involves some store and forward type processing.  Some of these are 
> implemented as a store and forward router (e.g. home firewall) with one of 
> the logical ports connected to the wireless modem.  Very, very few access 
> points (if any) are actually a dumb packet oriented bridge that just unwraps 
> the payload from one frame type and shoots it out rewrapped for the other. 
> The AP has to do things like send out broadcast frames with the SSID, send 
> and receive the link setup/teardown kinds of frames (i.e. the link between 
> your PC's wireless interface and the AP), as well as bridging/routing traffic 
> from the wired network to the wireless network.
>
> Who's to say what kind of logic they have inside there to deal with all the 
> issues (the wireless MAC and the wired MAC are different, if nothing else.)

No arguments, but...

As far as the programmer API is concerned, IP is IP is IP, TCP is even
more removed.  The whole point of TCP, in fact, is that one is NOT
supposed to need to know or care if the packet one is wrapping up for
some destination is about to go out on a wire or wireless link, travel
over copper or fiber, pass through hubs, bridges, routers, switches.  A
properly formed packet that isn't in a channel controlled by e.g.
firewalls or port blockers is "guaranteed" to reach its destination, if
its destination be reachable and correctly bidirectionally routed, and
even to be resequenced and/or retransmitted if need be until the entire
message is at least "reasonably" accurately received by the receiver.

UDP is somewhat different.  It is a connectionless protocol, for one
thing.  However, the most important difference is that it is not a
"reliable" protocol -- it is close to what one might call "raw" IP.
Form a packet, drop it on the wire, pray that it is received.  If it is
part of a sequence of packets, pray that they are received in the
correct order, as WAN connections may well switch routes or delay
individual packets en route as the circumstances of traffic dictate or
lose a packet altogether.

Services built on UDP (PVM and at one time NFS) have to basically
replicate TCP's e.g. packet sequencing and reliable delivery checks in
order to become reliable.  Ordinarily UDP based services are
non-critical, and they are usually offered only "on the same wire" -- on
a network without a lot of routing hops in between, although switched
connections or single-hop bridges don't usually constitute a problem --
unless UDP is so augmented to make it reliable, and even then it is RARE
to run a UDP-based service over a WAN AFAIK.
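
As a rough illustration of what layering reliable delivery on top of UDP
involves, here is a minimal stop-and-wait sketch in Python.  It is not
PVM's actual wire protocol: the function names, the 4-byte sequence
header and the ACK format are all invented for the example, and a real
implementation would also have to cope with duplicates and reordering.

```python
import socket
import threading


def reliable_send(sock, payload, dest, seq, timeout=0.2, retries=5):
    """Send one datagram and retransmit until a matching ACK arrives.

    Returns the number of attempts that were needed.
    """
    header = seq.to_bytes(4, "big")          # hypothetical 4-byte sequence number
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(header + payload, dest)
        try:
            reply, _ = sock.recvfrom(64)
        except socket.timeout:
            continue                          # no ACK in time: send it again
        if reply == b"ACK" + header:
            return attempt
    raise TimeoutError("no ACK after %d attempts" % retries)


def lossy_receiver(sock, drop_first, delivered):
    """Ignore the first `drop_first` datagrams to mimic loss, then ACK one."""
    dropped = 0
    while True:
        data, addr = sock.recvfrom(2048)
        if dropped < drop_first:
            dropped += 1                      # pretend the packet never arrived
            continue
        delivered.append(data[4:])            # strip the sequence header
        sock.sendto(b"ACK" + data[:4], addr)
        return


def demo():
    """Run sender and lossy receiver against each other on loopback."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))                 # let the OS pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    delivered = []
    worker = threading.Thread(target=lossy_receiver, args=(rx, 2, delivered))
    worker.daemon = True
    worker.start()
    attempts = reliable_send(tx, b"hello", rx.getsockname(), seq=1)
    worker.join(1.0)
    tx.close()
    rx.close()
    return attempts, delivered
```

Calling demo() drops the first two copies on purpose, so the sender only
gets its ACK on the third transmission.  The timeout and retry count are
exactly the kind of tuning knobs that a latency-sensitive handshake would
depend on.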

I still don't seriously suspect the WAP per se, because every other
service in the Universe, TCP or UDP or ICMP, that I've used over
wireless works perfectly, always.  Oh, the connection itself isn't
horribly reliable -- turn on the microwave oven, drop the link, load the
device heavily, links get a bit flaky -- but EVERYthing works when the
link is up and solid.  To the best of my ability to test it (which isn't
terribly shabby, given nmap after all), it is transparent to IP from
broadcasts on down to individual ports on the local bridged 192.168.1.x
network, in both directions.

What is different on a WAP is timing (e.g. latency).  As you say,
there's a fair bit of out-of-band traffic associated with wireless
links.  My MIMO router up to the very latest firmware upgrade would
generate all sorts of spurious traffic that I suspect was associated
with link optimization and so on, but of course it was difficult to know
for sure as it was largely out of band.  Even so, however, it is really,
really odd that PVM has a segment that is so sensitive (or so unusual in
terms of its socket code) that it fails while everything else works.

Anyway, it sounds like the general answer is that nobody on list has
really encountered this or knows what is causing it, so I guess my
choices are to grab the PVM sources, do a build, do a -d0x0 run to
isolate once again the precise point where the process of adding a
wireless host fails, instrument the code to the point (possibly on both
ends -- it could be the target PVMD as easily as the master) where I can
actually see what is or isn't getting through, and then either figure
out why and "properly" fix it or muck around with the code to where the
problem goes away even though I don't know why (by e.g. inserting
"arbitrary" delays here or there to give a wireless network time to
catch up and avoid a race, which sucks I agree but which often works
anyway...;-) OR to just blow it off again and live with it, like I did
last time.

Or I suppose I could always file a bugzilla report and hope that it
filters back to the developers who actually know the code and can
properly fix it.  Hmmm, time time time.  Who has the time.

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From wrankin at ee.duke.edu  Thu Feb  7 09:15:34 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Thu, 7 Feb 2008 12:15:34 -0500
Subject: [Beowulf] Re: PVM on wireless...
In-Reply-To: <3622EF04-4040-4EBD-AF07-0BF4D1CAB8AD@ee.duke.edu>
References: <E1JMrOU-00062O-A3@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802061629170.22237@cain.rgb.private.net>
	<3622EF04-4040-4EBD-AF07-0BF4D1CAB8AD@ee.duke.edu>
Message-ID: <6DAAD27B-C644-4C8F-AFD9-D7A2DEE65C35@ee.duke.edu>

Update.

I got PVM running on my laptop and successfully added one of my  
servers to the hostlist using the command line at the pvm prompt.   
This was over wireless.

The laptop was running pvm 3.4.5-7 rpm under CentOS 5.  The other  
machine had pvm 3.4.4 under CentOS 4.

The main bits seemed to be:

Getting PVM_ROOT=/usr/share/pvm3 and PVM_RSH=ssh set on both sides  
(added to .bashrc).  Checked by doing an 'ssh <remotemachine> export'  
and verified contents.

Rob: do you do host-based authentication under ssh?  I don't, so I  
had to type in my passwords at the 'pvm>' prompt.

Sorry I can't offer anything more.

-bill



From rgb at phy.duke.edu  Thu Feb  7 09:55:31 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 7 Feb 2008 12:55:31 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <20080206231328.GA1249@neo.csm.ornl.gov>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
Message-ID: <Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>

On Wed, 6 Feb 2008, kohlja at ornl.gov wrote:

> Hey Gang!
>
> Sounds like you're having some "fun" with PVM over wireless...?  :-)
>
> (A buddy (Wael Elwasif) forwarded your discussion to me;
> please always feel free to copy "pvm at msr.csm.ornl.gov"
> with PVM inquiries when you get stuck.  I try to be
> pretty responsive, though this is all unfunded work now... :)

Bless you.

However, I've just managed to figure the problem out on my own.  It is,
after all, a firewall issue.  There are apparently different/new
defaults in Fedora 7 and 8 than I expected.  If I >>completely disable<<
the firewall it works.  This isn't really desirable, so I'll go back
and see if I can figure out how to open the minimal set of ports to make
it work.  I wasn't seeing it in my earlier tests because I was verifying
that it worked FROM a newly installed wired Fedora 8 host to my older
hosts, which happened to be wired, or to a Fedora 7 or Fedora 8 laptop
that wouldn't work even with the appropriate interfaces set to trusted.
When just to be thorough I tried to configure the F8 wired system from
an older F6 wired system, it failed too, which led me to try disabling
the firewall altogether.

I apologize to all those who wasted time trying to help me with
something I should have figured out on my own.  I was fooled by the
accidental appearance of order, with both my extant laptops running the
same dysfunctional firewall, and by testing connections only FROM my one
wired host running F8.  I should have just kept plugging until I tested
to and from every pair.

While I've got the One True PVM Human(s) on the line, though -- a
suggestion for PVM to help others avoid this problem in the future on
networks wired and wireless:

It would really, really help if man pvm (or man pvmd or man pvm_intro)
documented a suitable firewall setting that will let PVM function
without just turning off the firewall altogether.  There is no pvm setup
in /etc/services, for example, no pvm checkbox in the panels managed by
system-config-firewall in the latest Fedoras, no suggestion as to what
protected port(s) or ranges one has to enable explicitly.  In fact
for once even google is failing me -- I'm not finding a lot of
documentation or remarks by ANYONE on what ports pvm needs open (besides
ssh, which obviously is open and works).  Usually as long as the
spawning of a network application itself works using an enabled
protected port (in this case, I would have expected ssh), the secondary
ports opened in unprotected space just work.  Am I wrong in this?  Do I
need to explicitly open more ports somewhere?

To find out, this leaves me with running e.g. tcpdump and watching as
pvm attempts to connect, opening port ranges one at a time and doing a
binary search, or something similarly painful.  Or just asking you.  So
what (minimal set of) ports do I need to leave open besides ssh, which
is always open on my systems anyway?

An additional suggestion would be to (if possible) have the RPM install
"fix" the port situation so that pvm shows up on system-config-firewall
and/or finish with a message to the installer that a particular firewall
setting must be installed or enabled and/or add something to the
debugging info provided by pvm so that on a timeout (in particular) it
prints something like "Unable to connect due to timeout.  Verify that
pvm is correctly installed and that port range xxxx-xxxx is open on the
target."

I actually help a lot of people get started with PVM (they write me
offline because I have a template PVM tarball up on my personal website)
and the more I know, the better I can help them...;-)

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From wrankin at ee.duke.edu  Thu Feb  7 10:34:37 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Thu, 7 Feb 2008 13:34:37 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
Message-ID: <99265B4F-305C-4E99-AA7F-93CAABE571B8@ee.duke.edu>

I think that I managed to replicate your problem, Rob.

Laptop running CentOS 5, pvm 3.4.5-7 (rpm), wireless ethernet.
Server running FC6, pvm 3.4.5-7(rpm)

Ssh working fine in both directions, PVM_ROOT and PVM_RSH set  
accordingly.

Running "pvm" from the shell on the server and doing an "add  
<laptop>" at the prompt.
Prompted for password.
PVM then hangs waiting to add remote host.
On the remote host, we see the pvmd running with a "ps".

If I do nothing: the remote pvmd eventually dies and after that the  
command prompt on the server returns with a "1 successful" message,  
but a "conf" command shows that no hosts were added.

Here is the weird part: if after I issue the "add <laptop>" command,  
I then go over to the laptop and run "pvm" from a shell, the  
connection is made and the hosts are successfully added.

So you may want to try this and see if you get similar behavior.


Last datapoint: if from my laptop I attempt to add a host that has  
PVM 3.4.4 (CentOS4 rpm) installed, it starts up fine.  So I think  
that it's a bug in 3.4.5-7.  I haven't tried it over a wired  
connection yet.

So you may want to try dropping back to version 3.4.4 on all machines  
and see if that helps.


Jim Kohl at ORNL seems to have several patches to 3.4.5, and I'm  
wondering if this issue has already been addressed.


-bill


From rgb at phy.duke.edu  Thu Feb  7 10:41:57 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 7 Feb 2008 13:41:57 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <99265B4F-305C-4E99-AA7F-93CAABE571B8@ee.duke.edu>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<99265B4F-305C-4E99-AA7F-93CAABE571B8@ee.duke.edu>
Message-ID: <Pine.LNX.4.64.0802071340310.23523@cain.rgb.private.net>

On Thu, 7 Feb 2008, Bill Rankin wrote:

> I think that I managed to replicate your problem, Rob.
>
> Laptop running CentOS 5, pvm 3.4.5-7 (rpm), wireless ethernet.
> Server running FC6, pvm 3.4.5-7(rpm)
>
> Ssh working fine in both directions, PVM_ROOT and PVM_RSH set accordingly.

Try it with the firewalls completely down and I'll bet it works.

However, it is REALLY strange that it works with them UP for some
combinations.  Or not so strange -- that's what was fooling me, after
all.  Perhaps the port ranges being used are varying with version or
chance.

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From kohlja at ornl.gov  Thu Feb  7 10:53:04 2008
From: kohlja at ornl.gov (kohlja at ornl.gov)
Date: Thu, 7 Feb 2008 13:53:04 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
Message-ID: <20080207185304.GA11286@neo.csm.ornl.gov>

Hi Robert/Rob/RGB!  :-)

  On Thu, Feb 07, 2008 at 12:55:31PM -0500, Robert G. Brown wrote:
  > On Wed, 6 Feb 2008, kohlja at ornl.gov wrote:
  >> Hey Gang!
  >> Sounds like you're having some "fun" with PVM over wireless...?  :-)
  >> (A buddy (Wael Elwasif) forwarded your discussion to me;
  >> please always feel free to copy "pvm at msr.csm.ornl.gov"
  >> with PVM inquiries when you get stuck.  I try to be
  >> pretty responsive, though this is all unfunded work now... :)

  > Bless you.

De nada, you're welcome.  :-)

  > However, I've just managed to figure the problem out on my own.  It is,
  > after all, a firewall issue... <snip/>

Ah, Good!  Glad that's all it was, not that it wasn't a hassle to identify! :)

Sorry it was so non-obvious from the PVM side of things...  :-b

  > While I've got the One True PVM Human(s) on the line, though...

Mwuahahahahahaaaa...  :-)

  > -- a suggestion for PVM to help others avoid this problem in the future
  > on networks wired and wireless:

  > It would really, really help if man pvm (or man pvmd or man pvm_intro)
  > documented a suitable firewall setting that will let PVM function
  > without just turning off the firewall altogether.  There is no pvm setup
  > in /etc/services, for example, no pvm checkbox in the panels managed by
  > system-config-firewall in the latest Fedoras, no suggestion as to what
  > protected port(s) or ranges one has to enable explicitly.  In fact
  > for once even google is failing me -- I'm not finding a lot of
  > documentation or remarks by ANYONE on what ports pvm needs open (besides
  > ssh, which obviously is open and works).  Usually as long as the
  > spawning of a network application itself works using an enabled
  > protected port (in this case, I would have expected ssh), the secondary
  > ports opened in unprotected space just work.  Am I wrong in this?  Do I
  > need to explicitly open more ports somewhere?

Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use as
many ports as you have machines in your cluster, or could use just 1.  :-}

Normally, the master pvmd creates/accepts connections over a small
set of ports, possibly 1, but if PvmRouteDirect is enabled in a PVM
application, then a myriad of direct-connection socket links are
created, to link whichever machines the local PVM application tasks
communicate with, on a demand-driven basis...

So it's not generally possible to specify an explicit "range" of ports.
However, it _is_ possible to set the "starting" port for this collection,
using the aforementioned "$PVMNETSOCKPORT" environment variable.

This sets the first port that PVM will try to use, and all subsequent
ports will usually be consecutive positive increments of that starting
port (i.e. PVMNETSOCKPORT++... :-).

So in most cases, you could probably plan on opening up 100 or 1000
ports _somewhere_ in your firewall, depending on your needs, and then
just tell PVM where to start, using $PVMNETSOCKPORT...
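
The arithmetic behind that advice can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the
helper names are invented, the starting port 5000 is an arbitrary
example value, and the generated iptables lines assume a plain INPUT
chain and that both UDP and TCP need opening, which may not match a
given distribution's ruleset or a given PVM configuration.

```python
def pvm_port_range(start, count):
    """Return (first, last) for `count` consecutive ports beginning at `start`."""
    if not 1 <= start <= 65535:
        raise ValueError("starting port out of range: %r" % start)
    if count < 1:
        raise ValueError("need at least one port")
    last = start + count - 1
    if last > 65535:
        raise ValueError("range runs past port 65535")
    return start, last


def firewall_rules(start, count, protocols=("udp", "tcp")):
    """Format one hypothetical iptables rule per protocol for the range."""
    first, last = pvm_port_range(start, count)
    return [
        "iptables -I INPUT -p %s --dport %d:%d -j ACCEPT" % (proto, first, last)
        for proto in protocols
    ]


def environment_line(start):
    """Shell line that tells PVM where to start, per the variable named above."""
    return "export PVMNETSOCKPORT=%d" % start
```

For example, firewall_rules(5000, 100) yields one rule per protocol
covering ports 5000 through 5099, and environment_line(5000) gives the
matching setting to place in the shell startup file on every host.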

I've always considered this solution a bit of a kludge, which is why
it doesn't show up in the man pages, but if it works well enough to
save people lots of hassle, then I can add some commentary on it...?

  > To find out, this leaves me with running e.g. tcpdump and watching as
  > pvm attempts to connect, opening port ranges one at a time and doing a
  > binary search, or something similarly painful.  Or just asking you.  So
  > what (minimal set of) ports do I need to leave open besides ssh, which
  > is always open on my systems anyway?

  > An additional suggestion would be to (if possible) have the RPM install
  > "fix" the port situation so that pvm shows up on system-config-firewall
  > and/or finish with a message to the installer that a particular firewall
  > setting must be installed or enabled and/or add something to the
  > debugging info provided by pvm so that on a timeout (in particular) it
  > prints something like "Unable to connect due to timeout.  Verify that
  > pvm is correctly installed and that port range xxxx-xxxx is open on the
  > target."

You _should_ be getting some sort of timeout message in the slave
pvmd's log file (/tmp/pvml.<uid> on the slave machine), when the
connection request to the master pvmd doesn't get a reply...?
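
One quick way to look for that message is to read the tail of the log
on the slave.  This is only a minimal sketch, assuming the
/tmp/pvml.<uid> location given above; the helper name pvm_log_path is
made up for illustration, and the exact wording of any timeout message
is not assumed.

```shell
#!/bin/sh
# Build the pvmd log path for a given uid (defaults to the current user).
# pvm_log_path is a hypothetical helper, not something shipped with PVM.
pvm_log_path() {
    echo "/tmp/pvml.${1:-$(id -u)}"
}

# Show the last few lines of the log if there is one to read.
log=$(pvm_log_path)
if [ -r "$log" ]; then
    tail -n 20 "$log"
else
    echo "no readable pvmd log at $log"
fi
```

Run it on the slave as the user who owns the pvmd, shortly after the
"add" on the master has hung or timed out.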

It may depend on the firewall settings, but a nice "Connection
Refused" would usually go a long way toward diagnosing things,
whereas the more secure firewall alternative of simply
"no response" would only result in a "timed out" PVM message...

I'm open to suggestions on ways to identify or diagnose the problem...!

Thanks Much for your interest and feedback!

All the Best,

	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)

  > I actually help a lot of people get started with PVM (they write me
  > offline because I have a template PVM tarball up on my personal website)
  > and the more I know, the better I can help them...;-)

  >    rgb

  > -- 
  > Robert G. Brown                            Phone(cell): 1-919-280-8443
  > Duke University Physics Dept, Box 90305
  > Durham, N.C. 27708-0305
  > Web: http://www.phy.duke.edu/~rgb
  > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:

   James Arthur "Jeeembo" Kohl, Ph.D.     "Da Blooos Brathas?!  They
   Oak Ridge National Laboratory              still owe you money, Fool!"
   kohlja at ornl.gov
   http://www.csm.ornl.gov/~kohl/          Long Live Curtis Blues!!!

:):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)


From wrankin at ee.duke.edu  Thu Feb  7 11:23:13 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Thu, 7 Feb 2008 14:23:13 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802071340310.23523@cain.rgb.private.net>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<99265B4F-305C-4E99-AA7F-93CAABE571B8@ee.duke.edu>
	<Pine.LNX.4.64.0802071340310.23523@cain.rgb.private.net>
Message-ID: <5D962A53-7593-4370-98FC-F0D74702208A@ee.duke.edu>


On Feb 7, 2008, at 1:41 PM, Robert G. Brown wrote:

> On Thu, 7 Feb 2008, Bill Rankin wrote:
>
>> I think that I managed to replicate your problem, Rob.
>>
>> Laptop running CentOS 5, pvm 3.4.5-7(rpm), wireless ethernet.
>> Server running FC6, pvm 3.4.5-7(rpm)
>>
>> Ssh working fine in both directions, PVM_ROOT and PVM_RSH set  
>> accordingly.
>
> Try it with the firewalls completely down and I'll bet it works.

Well, duh.

Yeah, that was it.  Although disabling the firewall on a wireless  
connection does not give me the warm fuzzies.


> However, it is REALLY strange that it works with them UP for some
> combinations.  Or not so strange -- that's what was fooling me, after
> all.  Perhaps the port ranges being used are varying with version or
> chance.

I suspect that's the issue here.

-b


From gerry.creager at tamu.edu  Thu Feb  7 11:26:47 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Thu, 07 Feb 2008 13:26:47 -0600
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802071340310.23523@cain.rgb.private.net>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<99265B4F-305C-4E99-AA7F-93CAABE571B8@ee.duke.edu>
	<Pine.LNX.4.64.0802071340310.23523@cain.rgb.private.net>
Message-ID: <47AB5B77.9050802@tamu.edu>

FWIW, we saw this with ROCKS and MPICH a couple of years ago.  It took a
lot of firewall tweaking to get things working, and it's been too many
beers since then to recall the details.  It is odd.

gerry

Robert G. Brown wrote:
> On Thu, 7 Feb 2008, Bill Rankin wrote:
> 
>> I think that I managed to replicate your problem, Rob.
>>
>> Laptop running CentOS 5, pvm 3.4.5-7(rpm), wireless ethernet.
>> Server running FC6, pvm 3.4.5-7(rpm)
>>
>> Ssh working fine in both directions, PVM_ROOT and PVM_RSH set 
>> accordingly.
> 
> Try it with the firewalls completely down and I'll bet it works.
> 
> However, it is REALLY strange that it works with them UP for some
> combinations.  Or not so strange -- that's what was fooling me, after
> all.  Perhaps the port ranges being used are varying with version or
> chance.
> 
>    rgb
> 
>>
>> Running "pvm" from the shell on the server and doing an "add <laptop>" 
>> at the prompt.
>> Prompted for password.
>> PVM then hangs waiting to add remote host.
>> On the remote host, we see the pvmd running with a "ps".
>>
>> If I do nothing: the remote pvmd eventually dies and after that the 
>> command prompt on the server returns with a "1 successful" message, 
>> but a "conf" command shows that no hosts were added.
>>
>> Here is the weird part: if after I issue the "add <laptop>" command, I 
>> then go over to the laptop and run "pvm" from a shell, the connection 
>> is made and the hosts are successfully added.
>>
>> So you may want to try this and see if you get similar behavior.
>>
>>
>> Last datapoint: if from my laptop I attempt to add a host that has PVM 
>> 3.4.4 (CentOS4 rpm) installed, it starts up fine.  So I think that 
>> it's a bug in 3.4.5-7.  I haven't tried it over a wired connection yet.
>>
>> So you may want to try dropping back to version 3.4.4 on all machines 
>> and see if that helps.
>>
>>
>> Jim Kohl at ORNL seems to have several patches to 3.4.5, and I'm 
>> wondering if this issue has already been addressed.
>>
>>
>> -bill
> 

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


From mathog at caltech.edu  Thu Feb  7 12:33:08 2008
From: mathog at caltech.edu (David Mathog)
Date: Thu, 07 Feb 2008 12:33:08 -0800
Subject: [Beowulf] Re: PVM on wireless...
Message-ID: <E1JNDQi-0006l8-Oi@mendel.bio.caltech.edu>


> From: "Robert G. Brown" <rgb at phy.duke.edu>

> However, I've just managed to figure the problem out on my own.  It is,
> after all, a firewall issue. 

Good that you sorted that out.  

A word of warning though, just yesterday I ran into a case where the
command to "turn the firewall off", didn't.  What it did instead was
wall off the machine.  This was on a vanilla Mandriva 2007.1 machine, after:

 /etc/rc.d/init.d/shorewall stop

iptables showed that the INPUT and FORWARD chains were set to DROP.
Of course the only way I could find this out was on the console of that
machine, which was luckily only about 5 feet away.  This may be what
is desired in some instances, but it wasn't what I wanted here.  (Plus
it would suck big time if that happened on a remotely administered
machine.)  To really get rid of the firewall

  /etc/rc.d/init.d/iptables stop

was also needed.  After that iptables --list showed the expected
ACCEPT on all 3 chains and the packets that needed to get through
for the test finally did.
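
A quick way to catch this before testing is to look at each chain's
default policy instead of trusting the stop script.  The following is
only a sketch: drop_policy_chains is a hypothetical helper name, and the
canned input merely imitates the situation described above rather than
reproducing real output from that machine.

```shell
#!/bin/sh
# Read `iptables -L -n` output on stdin and print every built-in chain
# whose default policy is anything other than ACCEPT.
# drop_policy_chains is a hypothetical helper name.
drop_policy_chains() {
    awk '$1 == "Chain" && $3 == "(policy" {
             policy = $4
             sub(/\)$/, "", policy)      # strip the closing parenthesis
             if (policy != "ACCEPT") print $2, policy
         }'
}

# Demonstration on illustrative, hand-written chain headers.
drop_policy_chains <<'EOF'
Chain INPUT (policy DROP)
Chain FORWARD (policy DROP)
Chain OUTPUT (policy ACCEPT)
EOF
```

On a live machine the same check would be run as root with
"iptables -L -n | drop_policy_chains"; no output means every built-in
chain defaults to ACCEPT.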

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From rgb at phy.duke.edu  Thu Feb  7 13:42:21 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 7 Feb 2008 16:42:21 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <20080207185304.GA11286@neo.csm.ornl.gov>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
Message-ID: <Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>

On Thu, 7 Feb 2008, kohlja at ornl.gov wrote:

>  > It would really, really help if man pvm (or man pvmd or man pvm_intro)
>  > documented a suitable firewall setting that will let PVM function
>  > without just turning off the firewall altogether.  There is no pvm setup
>  > in /etc/services, for example, no pvm checkbox in the panels managed by
>  > system-config-firewall in the latest Fedoras, no suggestion as to what
>  > protected port(s) or ranges one has to enable explicitly.  In fact
>  > for once even google is failing me -- I'm not finding a lot of
>  > documentation or remarks by ANYONE on what ports pvm needs open (besides
>  > ssh, which obviously is open and works).  Usually as long as the
>  > spawning of a network application itself works using an enabled
>  > protected port (in this case, I would have expected ssh), the secondary
>  > ports opened in unprotected space just work.  Am I wrong in this?  Do I
>  > need to explicitly open more ports somewhere?
>
> Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use as
> many ports as you have machines in your cluster, or could use just 1.  :-}
>
> Normally, the master pvmd creates/accepts connections over a small
> set of ports, possibly 1, but if PvmRouteDirect is enabled in a PVM
> application, then a myriad of direct-connection socket links are
> created, to link whichever machines the local PVM application tasks
> communicate with, on a demand-driven basis...
>
> So it's not generally possible to specify an explicit "range" of ports.
> However, it _is_ possible to set the "starting" port for this collection,
> using the aforementioned "$PVMNETSOCKPORT" environment variable.

OK, I'm giving this a try.  Although I'd have to ask why pvmd doesn't do
the fork thing and clone a single open port on which it listens into a
dynamically allocated port that inherits from the open one.  In
principle one only needs a single port to be open to connect to pretty
much any network based application, or so I had thought.  At least, I do
that in xmlsysd and never have to punch more than one porthole through a
firewall.

Hmmm, it's working sort of -- looks like I need to open UDP ports,
right, not TCP?  Having trouble on one host where I've punched the hole
but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm trying again
with the local environment variable set.

Yup, that works.

So I'm guessing that pvmd reads it as it starts up wherever.  Why does
it need to do this on a client?  Can't the port(s) be passed from the
master when it starts up pvmd?

> This sets the first port that PVM will try to use, and all subsequent
> ports will usually be consecutive positive increments of that starting
> port (i.e. PVMNETSOCKPORT++... :-).
>
> So in most cases, you could probably plan on opening up 100 or 1000
> ports _somewhere_ in your firewall, depending on your needs, and then
> just tell PVM where to start, using $PVMNETSOCKPORT...
>
> I've always considered this solution a bit of a kludge, which is why
> it doesn't show up in the man pages, but if it works well enough to
> save people lots of hassle, then I can add some commentary on it...?

Kludge or not, how can you have an environment variable in an
application and not provide knowledge of it or instructions on its use
in the man page?  Something like:

  PVM requires open ports on target hosts to function.  Many hosts are
  installed with strong firewall rules by default.  If you install pvm on
  a slave and pvm appears to hang when you attempt to add it, eventually
  timing out without success, consider adding the following to your local
  personal or system environment (in, for example, ~/.bash_profile on all
  hosts):

    PVMNETSOCKPORT=10000
    export PVMNETSOCKPORT

  Then configure your firewall(s) to open a range of udp ports starting
  at this value, such as 10000-11024 (which need not be any larger than
  the largest number of machines you expect to have in your virtual
  machine).
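
For the firewall half of that recipe, here is a minimal sketch, assuming
an iptables-based firewall.  The helper name pvm_fw_rules is
hypothetical, the rules are only printed (nothing is applied), and
opening TCP alongside UDP is an assumption to verify on your own hosts,
since only UDP was seen to matter here.

```shell
#!/bin/sh
# Print (do not apply) iptables rules that open COUNT consecutive ports
# starting at START, mirroring the way PVM is said to count upward from
# $PVMNETSOCKPORT.  pvm_fw_rules is a hypothetical helper, not part of PVM.
pvm_fw_rules() {
    start=$1
    count=$2
    end=$((start + count - 1))
    # udp is what was found necessary above; tcp is included only as an
    # assumption, in case direct task-to-task routes need it.
    for proto in udp tcp; do
        echo "iptables -A INPUT -p $proto --dport $start:$end -j ACCEPT"
    done
}

# Example: 1025 ports starting at 10000, i.e. the range 10000-11024.
pvm_fw_rules "${PVMNETSOCKPORT:-10000}" 1025
```

Review the printed lines and then run them as root (or translate them
into whatever your distribution's firewall tool expects) on every host
in the virtual machine.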

However a better solution still is to have the daemon fork on a single
"permanent" port address > 1024, e.g. 10000, and get a negotiated
connection in the upper (non-protected) port space that way.

> It may depend on the firewall settings, but a nice "Connection
> Refused" would usually go a long way toward diagnosing things,
> whereas the more secure firewall alternative of simply
> "no response" would only result in a "timed out" PVM message...
>
> I'm open to suggestions on ways to identify or diagnose the problem...!

As I said, document EVERYTHING in the man page(s).  That is what they are for.
Lots of users do, in fact, RTFM but get frustrated and give up when they
try something and it just doesn't work and they can't see why.

On the same line, a perennial problem with PVM is getting it to work
with rsh and ssh.  In fact, half the problems I help people with, when
they randomly write me, come down to getting it to work with one or
the other.  The
internal diagnostics are certainly very helpful, at this point, but it
would also be worth adding a new man page like pvm_rsh that does nothing
but walk users through the ritual of setting PVM_RSH and establishing
appropriate e.g. ssh keys.
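
The ritual itself is short.  What follows is only a sketch of what such
a page might walk through: pvm_ssh_profile is a made-up helper name,
/usr/share/pvm3 is an assumed install location, slavehost is a
placeholder, and the key type is a matter of local preference.

```shell
#!/bin/sh
# Emit the environment lines one would add to a login profile so that
# PVM starts remote daemons over ssh.  pvm_ssh_profile is a hypothetical
# helper; the PVM_ROOT default below is an assumption.
pvm_ssh_profile() {
    ssh_path=$(command -v ssh || echo /usr/bin/ssh)
    echo "export PVM_ROOT=${PVM_ROOT:-/usr/share/pvm3}"
    echo "export PVM_RSH=$ssh_path"
}

pvm_ssh_profile

# Key setup, run once by hand (left commented so nothing is generated here):
#   ssh-keygen -t rsa        # accept the default file name
#   ssh-copy-id slavehost    # slavehost is a placeholder host name
#   ssh slavehost true       # should now return without a password prompt
```

If the last command still prompts for a password, PVM's "add" will
prompt or hang in the same way, so it is worth fixing that first.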

Just a thought or two.

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From kohlja at ornl.gov  Thu Feb  7 14:11:32 2008
From: kohlja at ornl.gov (kohlja at ornl.gov)
Date: Thu, 7 Feb 2008 17:11:32 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
Message-ID: <20080207221132.GA26027@neo.csm.ornl.gov>

Hey RGB!

Glad the env var worked for you, and sorry PVM is such a port hog.  :-]

It was all written long before firewalls were in such common usage
(heck, it was built around .rhosts for authentication! :).

Btw, if I'm not mistaken, I think the master pvmd connects _back_
to the slave pvmd, too, so both sides need proper holes in their
firewalls, and corresponding PVMNETSOCKPORT settings...?
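
If so, a mismatch between hosts is an easy thing to check for.  Here is
a rough sketch, assuming you can ssh to each host; check_same_port is a
hypothetical helper name and the host names are placeholders.  Note that
a command run over ssh may not read ~/.bash_profile, so what it reports
is what a remotely started process would see.

```shell
#!/bin/sh
# Read "host port" pairs on stdin and succeed only if every host reports
# the same, non-empty PVMNETSOCKPORT.  check_same_port is a hypothetical
# helper name.
check_same_port() {
    awk 'NF == 0   { next }
         NF < 2    { print $1 ": PVMNETSOCKPORT unset"; bad = 1; next }
         !seen     { ref = $2; seen = 1 }
         $2 != ref { print $1 ": " $2 " (expected " ref ")"; bad = 1 }
         END       { exit bad + 0 }'
}

# Demonstration with hand-written values for two placeholder hosts.
printf '%s\n' 'master 10000' 'laptop 10000' | check_same_port && echo "consistent"

# Gathering real values (commented out because it needs working ssh):
#   for h in master laptop; do
#       printf '%s %s\n' "$h" "$(ssh "$h" 'echo $PVMNETSOCKPORT')"
#   done | check_same_port
```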

I understand your basic premise on documenting "all" features
in man pages; my resistance for certain features is based on
past experiences from users "poking around" and shooting
themselves in the foot by trying every little tweak mentioned
in the man page, whether they needed it or not...!  :-}

I guess way back when we learned to hide some features to avoid
confusion with novice users, in the hope that more advanced users
would be smart enough to stumble onto them (or ask us howto :).

I admit this may be an antiquated cynical mentality, and I
further concur that PVMNETSOCKPORT is an obvious omission
in the basic documentation/faq...

Thanks for your suggested text!  (And the suggestion to
enhance our coverage of rsh/ssh usage... :-)

All the Best,

	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)



From rgb at phy.duke.edu  Fri Feb  8 02:35:31 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 8 Feb 2008 05:35:31 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <20080207221132.GA26027@neo.csm.ornl.gov>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
Message-ID: <Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>

On Thu, 7 Feb 2008, kohlja at ornl.gov wrote:

> I admit this may be an antiquated cynical mentality, and I
> further concur that PVMNETSOCKPORT is an obvious omission
> in the basic documentation/faq...

As they say, you can't RTFM if there ain't no FM... (or if the solution
exists but isn't there).

One is reminded of Dr. Strangelove, where the president (Peter Sellers)
has just learned that if the maverick B52 piloted by Slim Pickens gets
through, a doomsday device that is supposed to deter first nuclear
strikes will go off that will destroy the world.  Unfortunately, the
Soviet Union didn't actually tell us that it was built.  Dr.
Strangelove (Peter Sellers), after musing for a moment on the brilliance
of the concept, turns and says in an increasingly shrill voice:

   But...the whole point of the Doomsday Machine...is lost...if you keep
   it a SECRET. Why didn't you tell the world, eh?

Hmmm...;-)

    rgb

> Thanks for your suggested text!  (And the suggestion to
> enhance our coverage of rsh/ssh usage... :-)

Ya, well.  Just now finished telling the umpteenth would-be PVM user how
to go about it in an email message, augmenting further online docs such
as this one:

   http://www.uow.edu.au/~suresh/web/cfamily/pvm.html

which is actually pretty decent, although I generally use the ssh
default dsa instead of rsa since on linux boxes it invariably works.
But rather than forcing each user to employ google to snarf out
solutions to each problem they encounter, how much better to write a
really nice "Getting Started with PVM" or perhaps better still, a "PVM
HOWTO" on tldp.org.  Publish there, and be sure to include a copy in
plain sight in /usr/share/pvm3/PVM_HOWTO.

Truthfully, good documentation, especially a walkthrough tutorial on
getting started (including sample code or links to sample code) that
takes a would-be user from "yum install pvm\*" to executing a Real
Parallel Program (however trivial) on a two node cluster would really
encourage the use of the library.  Adding a bit more (such as a PVM
program development template) would be only icing on the cake, so to
speak.

If I had the time I'd write it myself.  I've already got a project_pvm
program template up on the web, but it is sadly underdocumented when it
comes to the setup of PVM itself.

    rgb

>
> All the Best,
>
> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>
>  On Thu, Feb 07, 2008 at 04:42:21PM -0500, Robert G. Brown wrote:
>  >>  > It would really, really help if man pvm (or man pvmd or man pvm_intro)
>  >>  > documented a suitable firewall setting that will let PVM function
>  >>  > without just turning off the firewall altogether.  There is no pvm
>  >> setup
>  >>  > in /etc/services, for example, no pvm checkbox in the panels managed by
>  >>  > system-config-firewall in the latest Fedoras, no suggestion as to what
>  >>  > what protected port(s) or ranges one has to enable explicitly.  In fact
>  >>  > for once even google is failing me -- I'm not finding a lot of
>  >>  > documentation or remarks by ANYONE on what ports pvm needs open
>  >> (besides
>  >>  > ssh, which obviously is open and works).  Usually as long as the
>  >>  > spawning of a network application itself works using an enabled
>  >>  > protected port (in this case, I would have expected ssh), the secondary
>  >>  > ports opened in unprotected space just work.  Am I wrong in this?  Do I
>  >>  > need to explicitly open more ports somewhere?
>  >>
>  >> Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use as
>  >> many ports as you have machines in your cluster, or could use just 1.  :-}
>  >>
>  >> Normally, the master pvmd creates/accepts connections over a small
>  >> set of ports, possibly 1, but if PvmRouteDirect is enabled in a PVM
>  >> application, then a myriad of direct-connection socket links are
>  >> created, to link whichever machines the local PVM application tasks
>  >> communicate with, on a demand-driven basis...
>  >>
>  >> So it's not generally possible to specify an explicit "range" of ports.
>  >> However, it _is_ possible to set the "starting" port for this collection,
>  >> using the aforementioned "$PVMNETSOCKPORT" environment variable.
>
>  > OK, I'm giving this a try.  Although I'd have to ask why pvmd doesn't do
>  > the fork thing and clone a single open port on which it listens into a
>  > dynamically allocated port that inherits from the open one.  In
>  > principle one only needs a single port to be open to connect to pretty
>  > much any network based application, or so I had thought.  At least, I do
>  > that in xmlsysd and never have to punch more than one porthole through a
>  > firewall.
>
>  > Hmmm, it's working sort of -- looks like I need to open UPD ports,
>  > right, not TCP?  Having trouble on one host where I've punched the hole
>  > but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm trying again
>  > with the local environment variable set.
>
>  > Yup, that works.
>
>  > So I'm guessing that pvmd reads it as it starts up wherever.  Why does
>  > it need to do this on a client?  Can't the port(s) be passed from the
>  > master when it starts up pvmd?
>
>  >> This sets the first port that PVM will try to use, and all subsequent
>  >> ports will usually be consecutive positive increments of that starting
>  >> port (i.e. PVMNETSOCKPORT++... :-).
>  >>
>  >> So in most cases, you could probably plan on opening up a 100 or 1000
>  >> ports _somewhere_ in your firewall, depending on your needs, and then
>  >> just tell PVM where to start, using $PVMNETSOCKPORT...
>  >>
>  >> I've always considered this solution a bit of a kludge, which is why
>  >> it doesn't show up in the man pages, but if it works well enough to
>  >> save people lots of hassle, then I can add some commentary on it...?
>
>  > Kludge or not, how can you have an environment variable in an
>  > application and not provide knowledge of it or instructions on its use
>  > in the man page?  Something like:
>
>  >  PVM requires open ports on target hosts to function.  Many hosts are
>  >  installed with strong firewall rules by default.  If you install pvm on
>  >  a slave and pvm appears to hang when you attempt to add it, eventually
>  >  timing out without success, consider adding the following to your local
>  >  personal or system environment (in, for example, ~/.bash_profile on all
>  >  hosts):
>
>  >    PVMNETSOCKPORT=10000
>  >    export PVMNETSOCKPORT
>
>  >  Then configure your firewall(s) to open a range of udp ports starting
>  >  at this value, such as 10000-11024 (which need be any larger than the
>  >  largest number of machines you expect to have in your virtual machine).
>
>  > However a better solution still is to have the daemon fork on a single
>  > "permanent" port address > 1024, e.g. 10000, and get a negotiated
>  > connection in the upper (non-protected) port space that way.
>
>  >> It may depend on the firewall settings, but a nice "Connection
>  >> Refused" would usually go a long way toward diagnosing things,
>  >> whereas the more secure firewall alternative of simply
>  >> "no response" would only result in a "timed out" PVM message...
>  >>
>  >> I'm open to suggestions on ways to identify or diagnose the problem...!
>
>  > As I said, document EVERYTHING in the man page(s).  It is what it is for.
>  > Lots of users do, in fact, RTFM but get frustrated and give up when they
>  > try something and it just doesn't work and they can't see why.
>
>  > On the same line, a perennial problem with PVM is getting it to work
>  > with rsh and ssh.  In fact, half the problems I help people with who
>  > randomly write me is getting it to work with one or the other.  The
>  > internal diagnostics are certainly very helpful, at this point, but it
>  > would also be worth adding a new man page like pvm_rsh that does nothing
>  > but walk users through the ritual of setting PVM_RSH and establishing
>  > appropriate e.g. ssh keys.
>
>  > Just a thought or two.
>
>  >    rgb
>
>  >>
>  >> Thanks Much for your interest and feedback!
>  >>
>  >> All the Best,
>  >>
>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>  >>
>  >>  > I actually help a lot of people get started with PVM (they write me
>  >>  > offline because I have a template PVM tarball up on my personal
>  >> website)
>  >>  > and the more I know, the better I can help them...;-)
>  >>
>  >>  >    rgb
>  >>
>  >>
>  >> (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:
>  >>
>  >>   James Arthur "Jeeembo" Kohl, Ph.D.     "Da Blooos Brathas?!  They
>  >>   Oak Ridge National Laboratory              still owe you money, Fool!"
>  >>   kohlja at ornl.gov
>  >>   http://www.csm.ornl.gov/~kohl/          Long Live Curtis Blues!!!
>  >>
>  >> :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)
>  >>
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From bernard at vanhpc.org  Mon Feb  4 12:43:13 2008
From: bernard at vanhpc.org (Bernard Li)
Date: Mon, 4 Feb 2008 12:43:13 -0800
Subject: [Beowulf] TIPC in a Beowulf?
In-Reply-To: <Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>
References: <5a1205b30802031035wf416840s115b6a5278c20812@mail.gmail.com>
	<Pine.LNX.4.64.0802031720120.2756@coffee.psychology.mcmaster.ca>
Message-ID: <d4c731da0802041243g1936cf97tbea0a13750a32ffe@mail.gmail.com>

Hi Mark:

On 2/3/08, Mark Hahn <hahn at mcmaster.ca> wrote:

> I _think_ I'm not confusing TIPC with SCTP (which also seems to be rather
> telecom-oriented.)

Speaking of SCTP, FYI both MPICH and Open MPI support it.  For more
information, please see:

http://www.cs.ubc.ca/labs/dsg/mpi-sctp/

Cheers,

Bernard


From gmichal at uow.edu.au  Tue Feb  5 04:33:32 2008
From: gmichal at uow.edu.au (Guillaume Michal)
Date: Tue, 05 Feb 2008 23:33:32 +1100
Subject: [Beowulf] Rocks 4.3 and user accounts
Message-ID: <1202214812.6258.25.camel@earth>

Hi all (sorry for the duplicate mail; the previous one was sent before I
had finished it),

I'm in the process of deploying an HPC cluster based on the Rocks
distribution. After some trouble installing the system, I finally
succeeded. The nodes are up and everything seems fine when I look at the
Ganglia reports. However, I have some trouble with user accounts. I
googled for information without success, so I'm asking here in case some
of you are more used to Rocks than I am.

Here is the problem, I decided to launch a simple test using mpirun from
mpich as described in the user guide (chapter 2.1 Launching Interactive
jobs). As specified, I created a user and synced the nodes from the
frontend:
# useradd someuser 
# rocks sync users

Then created the "machines" file and put it in the user home directory 
( /export/home/someuser ):
compute-0-0
compute-0-1

I then followed the procedure to run it, after copying the "HPL.dat" I found
in /var/www/blahblahblah into the user's home directory:
	$ ssh-agent $SHELL
	$ ssh-add
	$ /opt/mpich/gnu/bin/mpirun -nolocal -np 2 -machinefile machines \
	      /opt/hpl/mpich-hpl/bin/xhpl

At the moment the problem seems to be the RSA key and ssh. The $HOME
variable is set to /home/someuser, whereas the real home is
/export/home/someuser on the frontend. This leads to errors when
creating the RSA key, as it looks in /home/someuser to create the
files.
I modified the $HOME variable, but while that is fine on the frontend, it is
not on the nodes, as they expect something in /home/someuser.
I thought the problem was due to the useradd command, so I decided to
create the user under Gnome (there must be a way to do it in the console;
I'd like to know it, btw...), specifying the real path to the user's home
(/export/home/someuser).
It was then possible to create the key properly. However, as soon as I
sync the users on the cluster ( rocks sync users ), the $HOME variable
is set back to /home/someuser -> back to the beginning.

To me it appears to be a problem with the $HOME variable, which should
be /export/home/someuser on the frontend and /home/someuser on the
nodes, but this should be defined automatically by Rocks, shouldn't it?

So basically, how can I create user accounts with RSA keys, sync the
cluster and still keep the right $HOME variables? I'm kind of lost here,
or I'm clearly missing a point somewhere.

BTW, I tweaked "/etc/profile" to modify the $HOME variable and
declare it properly. As I did a standard Rocks installation, I do not
believe one should have to do this, apart from the fact that it's a dirty
method :-(

Thank you for your advice

Guillaume  


From joerg.bashir at gmail.com  Wed Feb  6 21:26:51 2008
From: joerg.bashir at gmail.com (Joerg Bashir)
Date: Wed, 6 Feb 2008 21:26:51 -0800
Subject: [Beowulf] For Sale: Quadrics TG108 with spare cards
Message-ID: <a86437870802062126j79c4417di4b14b13575e37735@mail.gmail.com>

I have a Quadrics TG108 for sale.  It is loaded with 2 mgmt cards and twelve
8-port line cards, for 96 10Gb ports.  It also comes with 1 or 2 spare line
cards and 1 spare mgmt card.

http://www.quadrics.com/Quadrics/QuadricsHome.nsf/DisplayPages/0836A75A25AAC0B3802570AB003CDF91

Make an offer?  I thought this list would be an appropriate forum, if not,
please accept my apology.

-Joerg

From gmichal at uow.edu.au  Thu Feb  7 02:18:18 2008
From: gmichal at uow.edu.au (Guillaume Michal)
Date: Thu, 07 Feb 2008 21:18:18 +1100
Subject: [Beowulf] Linpack and peak performance
Message-ID: <op.t55f8sk2kq1em0@earth>

Hi all,
We set up our first cluster in our faculty this week. As we are new to
cluster computing, there is a lot to learn. We ran some Linpack tests
using the OpenMPI benchmark available in the Rocks 4.3 distribution. The
system is as follows:
  - Gigabit Ethernet with an HP ProCurve 2800 series switch
  - 1 master node: 500GB SATA HDD, two Intel quad-core E5410 at 2.33GHz, 2GB mem
  - 4 nodes, each having: 80GB SATA HDD, two Intel quad-core E5410 at 2.33GHz, 8GB mem

First, I'm a bit confused by the parameters P and Q in HPL.dat and how to
use them properly. I noticed that a 4P 2Q test is not equivalent to a 2P 4Q
one; generally speaking, they do not commute. Why? What exactly are P and Q:
P for the number of processors per node and Q for the number of nodes?
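
For reference, in HPL.dat P and Q are the row and column counts of the two-dimensional grid of MPI processes over which the matrix is distributed, so P x Q must equal the number of processes launched; they are not per-node and node counts. Communication along rows and along columns of the grid differs, which is why swapping P and Q can change the result. A minimal sketch that lists the candidate grids for a given process count follows; the helper name `process_grids` is hypothetical, and the P <= Q preference is the rule of thumb from the HPL tuning notes rather than anything stated in this thread.

```python
import math


def process_grids(n_procs):
    """Return all (P, Q) with P * Q == n_procs and P <= Q.

    The list runs from the flattest grid (1 x n_procs) to the most
    nearly square one, which is usually the first one worth trying.
    """
    return [
        (p, n_procs // p)
        for p in range(1, math.isqrt(n_procs) + 1)
        if n_procs % p == 0
    ]


if __name__ == "__main__":
    for p, q in process_grids(32):
        print(f"P={p:2d}  Q={q:2d}")
```

For 32 processes this gives 1x32, 2x16 and 4x8; the P=8 Q=4 run is the transpose of the last of these.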

Secondly, what is the definition of a processor for a quad-core
architecture? I suppose a quad core should be counted as 4 processors.

I launched Linpack using Ns=10000 and various configurations for P and Q.
At the moment I get a maximum of 78 Gflops using P=8 Q=4 -> 32 processors.

If I'm right, the peak performance should be Rpeak = 4 cores x 4 floating
point ops per cycle x 2.33 GHz x 8 quad cores = 298 Gflops,
which would mean the test is running at ~26% of Rpeak.

This is very low and I see 3 causes for the problem:
	- I miscalculated Rpeak
	- P and Q are not set properly
	- there is a serious bottleneck
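
The Rpeak arithmetic above can be redone in a few lines. This is a minimal sketch using only the figures quoted in this message (32 cores, 4 floating point operations per cycle, 2.33 GHz, 78 Gflops measured); the function names are hypothetical. It also computes the memory taken by the N=10000 matrix, assuming 8-byte double-precision entries, since a problem that fills only a small fraction of memory is a commonly cited reason for low HPL efficiency.

```python
def rpeak_gflops(cores, flops_per_cycle, clock_ghz):
    """Theoretical peak in Gflops: cores x flops per cycle x clock (GHz)."""
    return cores * flops_per_cycle * clock_ghz


def hpl_matrix_bytes(n):
    """Memory for an n x n matrix of 8-byte doubles."""
    return 8 * n * n


cores = 4 * 8                      # 4 cores per chip x 8 quad-core chips
rpeak = rpeak_gflops(cores, 4, 2.33)
efficiency = 78.0 / rpeak          # measured Gflops / theoretical peak

if __name__ == "__main__":
    print(f"Rpeak      = {rpeak:.1f} Gflops")
    print(f"efficiency = {efficiency:.1%}")
    print(f"matrix     = {hpl_matrix_bytes(10000) / 1e9:.1f} GB for N=10000")
```

The matrix for N=10000 occupies 0.8 GB, a small part of the 32 GB available across the four compute nodes, so a larger N may be worth trying before concluding that the network is the bottleneck.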

Thanks for your advice

Guillaume




From bjtstarks at gmail.com  Thu Feb  7 07:45:16 2008
From: bjtstarks at gmail.com (Berkley Starks)
Date: Thu, 7 Feb 2008 08:45:16 -0700
Subject: [Beowulf] Setting up a new Beowulf cluster
Message-ID: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>

Hello all,

I've been a computer user for the past several years, working in different
areas of the IT world.  I've recently been commissioned by my university to
set up the first operating Beowulf cluster.

I am moderately familiar with the Linux OS, having run it for the past
several years using the Debian, Ubuntu, Fedora Core, and Mandriva
distros.

With setting up this new cluster I would like any advice possible on what OS
to use, how to set it up, and any other pertinent information that I might
need.

Thanks,

Berkley Starks

Oh, and the cluster will be used for computational physics.  I am a physics
major making it for the physics department here.  It will need to be able to
use C++ and Fortran at a bare minimum.

Thanks again

From eagles051387 at gmail.com  Wed Feb  6 08:02:07 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Wed, 6 Feb 2008 17:02:07 +0100
Subject: [Beowulf] getting kubuntu to perform as a cluster os
Message-ID: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>

What would be necessary to get a normal desktop OS such as Kubuntu to run as
a cluster OS?

-- 
Jonathan Aquilina

From peter.st.john at gmail.com  Fri Feb  8 08:09:53 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 8 Feb 2008 11:09:53 -0500
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>
Message-ID: <e4d4fd070802080809w352a2d7etedbcc30bf265e59d@mail.gmail.com>

Berkley,
I'm looking forward to the forthcoming advice, but as a dilettante in the
field, I'll take the opportunity to presage two things:

1. RGB will say Fedora. You said Physics. Think of RGB as your Guardian
Angel.

2. Everyone will say, "How many nodes? What kind of budget are we talking
about? Which applications are you targeting first, or do you project to
need?" I think the most helpful thing would be to project the number of
nodes you have in mind, e.g. 8, like me, or 8096? That makes a huge
difference in the network topologies possible. Is the budget more like
"using a dozen or so available outmoded workstations" or "they are building
a two acre site to house it now"?

Peter

On Feb 7, 2008 10:45 AM, Berkley Starks <bjtstarks at gmail.com> wrote:

> Hello all,
>
> I've been a computer user for the past several years working in different
> areas of the IT world.  I've recently been commissioned by my university to
> set up the first operating Beowulf Cluster.
>
> I'm am moderately familiar with the Linux OS, having ran it for the past
> several years using the distro's of Debian, Ubuntu, Fedora Core, and
> Mandriva.
>
> With setting up this new cluster I would like any advice possible on what
> OS to use, how to set it up, and any other pertinent information that I
> might need.
>
> Thanks,
>
> Berkley Starks
>
> Oh, and the cluster will be used for computational physics.  I am a
> physics major making it for the physics department here.  It will need to be
> able to use C++ and Fortran at a bare minimum.
>
> Thanks again
>

From coutinho at dcc.ufmg.br  Fri Feb  8 08:11:03 2008
From: coutinho at dcc.ufmg.br (Bruno Coutinho)
Date: Fri, 8 Feb 2008 14:11:03 -0200
Subject: [Beowulf] getting kubuntu to perform as a cluster os
In-Reply-To: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>
References: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>
Message-ID: <a8d96dec0802080811k4d19202hf6f1e93c713bb63d@mail.gmail.com>

Install PVM or MPI.
There are several MPI implementations, such as mpich2, openmpi or lam/mpi.
Note:
Never install more than one MPI implementation in the standard paths of
your cluster, or you can get weird errors running MPI applications.
One safe way to install several MPI implementations is to put them in
nonstandard directories and control which one is accessible via the modules
program: http://modules.sourceforge.net/

Change the kernel from *-generic (desktop) to *-server.

It's good to have a single, globally shared home directory by putting it
on an NFS server (package nfs-kernel-server).

To get more productivity from your cluster, you can install a cluster
scheduler like Sun Grid Engine, PBS, Torque, etc.

You have to select an authentication method for your cluster:
Local files replicated on every node (for small clusters)
NIS (insecure)
LDAP


2008/2/6, Jon Aquilina <eagles051387 at gmail.com>:
>
> what would be necessary to get a normal desktop os such as kubuntu to run
> as a clusterable os
>
> --
> Jonathan Aquilina

From rgb at phy.duke.edu  Fri Feb  8 08:11:31 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 8 Feb 2008 11:11:31 -0500 (EST)
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>

On Thu, 7 Feb 2008, Berkley Starks wrote:

> Hello all,
>
> I've been a computer user for the past several years working in different
> areas of the IT world.  I've recently been commissioned by my university to
> set up the first operating Beowulf Cluster.
>
> I'm am moderately familiar with the Linux OS, having ran it for the past
> several years using the distro's of Debian, Ubuntu, Fedora Core, and
> Mandriva.
>
> With setting up this new cluster I would like any advice possible on what OS
> to use, how to set it up, and any other pertinent information that I might
> need.

This question has been answered on-list in detail a few zillion times.
I'd suggest consulting (in rough order):

   a) The list archives (now that you're a member you can get to them,
although they are digested and googleable for the most part anyway).

   b) Google.  For example, there is a lovely howto here:

     http://www.linux.org/docs/ldp/howto/Parallel-Processing-HOWTO.html

that is remarkably current and a good quick place to start.

   c) Feel free to browse my free online book here:

     http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php

I'm working on making it paper-printable via lulu, but I need time I
don't have and so that project languishes a bit.  You "can" get a paper
copy there if you want, but it is pretty much what is on the free
website including the holes.

> Oh, and the cluster will be used for computational physics.  I am a physics
> major making it for the physics department here.  It will need to be able to
> use C++ and Fortran at a bare minimum.

C, C++ and Fortran are all no problem.  The more important questions
are:

   a) How coupled are the parallel tasks?  That is, do you want a cluster
that can run N independent jobs on N independent nodes (where the jobs
don't communicate with each other at all), or do you want a cluster
where the N nodes all do work on a common task as part of one massive
parallel program?  If the former, you're in luck and cluster design is
easy and the cluster purchase will be cheap.

   b) If they are coupled, are the tasks "tightly coupled" so each
subtask can only advance a little bit before communications are required
in order to take the next step?  "Synchronous" so all steps have to be
completed on all nodes before any can advance?  Are the messages really
big (bandwidth limited) or tiny and frequent (latency limited)?

If any of these latter answers are "yes", post a detailed description of
the tasks (as best you can) to get some advice on choosing a network, as
that's the design parameter that is largely controlled by the answers.
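
The latency-versus-bandwidth question above can be made concrete with the usual first-order model, time = latency + size / bandwidth. This is a minimal sketch: the function names are hypothetical, the 50 microsecond latency is purely an illustrative assumption, and 125e6 bytes/s is simply 1 Gbit/s expressed in bytes.

```python
def transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of one message: fixed latency plus time on the wire."""
    return latency_s + msg_bytes / bandwidth_bytes_per_s


def limiting_factor(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Say which term dominates the cost of a message of this size."""
    wire_time = msg_bytes / bandwidth_bytes_per_s
    return "latency" if latency_s > wire_time else "bandwidth"


# Illustrative numbers only: 1 Gbit/s link, assumed 50 us latency.
LATENCY_S = 50e-6
BANDWIDTH = 125e6

if __name__ == "__main__":
    for size in (64, 4096, 1_000_000):
        t = transfer_time(size, LATENCY_S, BANDWIDTH)
        print(f"{size:>9} bytes: {t * 1e6:9.1f} us, "
              f"{limiting_factor(size, LATENCY_S, BANDWIDTH)} limited")
```

With these assumed numbers, tiny frequent messages are dominated by latency and megabyte-sized ones by bandwidth, which is the distinction that drives the choice of network.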

    rgb

>
> Thanks again
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From kohlja at ornl.gov  Fri Feb  8 08:15:26 2008
From: kohlja at ornl.gov (kohlja at ornl.gov)
Date: Fri, 8 Feb 2008 11:15:26 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
Message-ID: <20080208161526.GA25597@neo.csm.ornl.gov>

Awesome Strangelove Reference...!  :-D

"I Have A Plan...!"  :-)

Yep, I am now getting inundated with people having rsh/ssh problems
with PVM, so a higher power clearly wants me to better document this.

Thanks Much, Will Do...  :)

	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)

  On Fri, Feb 08, 2008 at 05:35:31AM -0500, Robert G. Brown wrote:
  > On Thu, 7 Feb 2008, kohlja at ornl.gov wrote:

  >> I admit this may be an antiquated cynical mentality, and I
  >> further concur that PVMNETSOCKPORT is an obvious omission
  >> in the basic documentation/faq...

  > As they say, you can't RTFM if there ain't no FM... (or if the solution
  > exists but isn't there).

  > One is reminded of Dr. Strangelove, where the president (Peter Sellers)
  > has just learned that if the maverick B52 piloted by Slim Pickens gets
  > through, a doomsday device that is supposed to deter first nuclear
  > strikes will go off that will destroy the world.  Unfortunately, the
  > Soviet Union didn't actually tell us that it was built.  Dr.
  > Strangelove (Peter Sellers), after musing for a moment on the brilliance
  > of the concept, turns and says in an increasingly shrill voice:

  >   But...the whole point of the Doomsday Machine...is lost...if you keep
  >   it a SECRET. Why didn't you tell the world, eh?

  > Hmmm...;-)

  >    rgb

  >> Thanks for your suggested text!  (And the suggestion to
  >> enhance our coverage of rsh/ssh usage... :-)

  > Ya, well.  Just now finished telling the umptieth would-be PVM user how
  > to go about it in an email message, augmenting further online docs such
  > as this one:

  >   http://www.uow.edu.au/~suresh/web/cfamily/pvm.html

  > which is actually pretty decent, although I generally use the ssh
  > default dsa instead of rsa since on linux boxes it invariably works.
  > But better than forcing each user to employ google to snarf out
  > solutions to each problem they encounter, how much better to write a
  > really nice "Getting Started with PVM" or perhaps better still, a "PVM
  > HOWTO" on tldp.org.  Publish there, and be sure to include a copy in
  > plain sight in /usr/share/pvm3/PVM_HOWTO.

  > Truthfully, good documentation, especially a walkthrough tutorial on
  > getting started (including sample code or links to sample code) that
  > takes a would-be user from "yum install pvm\*" to executing a Real
  > Parallel Program (however trivial) on a two node cluster would really
  > encourage the use of the library.  Adding a bit more (such as a PVM
  > program development template) would be only icing on the cake, so to
  > speak.

  > If I had the time I'd write it myself.  I've already got a project_pvm
  > program template up on the web, but it is sadly underdocumented through
  > the setup of PVM itself.

  >    rgb

  >>
  >> All the Best,
  >>
  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
  >>
  >>  On Thu, Feb 07, 2008 at 04:42:21PM -0500, Robert G. Brown wrote:
  >>  >>  > It would really, really help if man pvm (or man pvmd or man
  >>  >>  > pvm_intro) documented a suitable firewall setting that will let
  >>  >>  > PVM function without just turning off the firewall altogether.
  >>  >>  > There is no pvm setup in /etc/services, for example, no pvm
  >>  >>  > checkbox in the panels managed by system-config-firewall in the
  >>  >>  > latest Fedoras, no suggestion as to what protected port(s) or
  >>  >>  > ranges one has to enable explicitly.  In fact for once even
  >>  >>  > google is failing me -- I'm not finding a lot of documentation
  >>  >>  > or remarks by ANYONE on what ports pvm needs open (besides ssh,
  >>  >>  > which obviously is open and works).  Usually as long as the
  >>  >>  > spawning of a network application itself works using an enabled
  >>  >>  > protected port (in this case, I would have expected ssh), the
  >>  >>  > secondary ports opened in unprotected space just work.  Am I
  >>  >>  > wrong in this?  Do I need to explicitly open more ports
  >>  >>  > somewhere?
  >>  >>
  >>  >> Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use as
  >>  >> many ports as you have machines in your cluster, or could use just 1.
  >>  >> :-}
  >>  >>
  >>  >> Normally, the master pvmd creates/accepts connections over a small
  >>  >> set of ports, possibly 1, but if PvmRouteDirect is enabled in a PVM
  >>  >> application, then a myriad of direct-connection socket links are
  >>  >> created, to link whichever machines the local PVM application tasks
  >>  >> communicate with, on a demand-driven basis...
  >>  >>
  >>  >> So it's not generally possible to specify an explicit "range" of
  >>  >> ports.  However, it _is_ possible to set the "starting" port for this
  >>  >> collection, using the aforementioned "$PVMNETSOCKPORT" environment
  >>  >> variable.
  >>
  >>  > OK, I'm giving this a try.  Although I'd have to ask why pvmd doesn't
  >>  > do the fork thing and clone a single open port on which it listens
  >>  > into a dynamically allocated port that inherits from the open one.  In
  >>  > principle one only needs a single port to be open to connect to pretty
  >>  > much any network based application, or so I had thought.  At least, I
  >>  > do that in xmlsysd and never have to punch more than one porthole
  >>  > through a firewall.
  >>
  >>  > Hmmm, it's working sort of -- looks like I need to open UDP ports,
  >>  > right, not TCP?  Having trouble on one host where I've punched the hole
  >>  > but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm trying again
  >>  > with the local environment variable set.
  >>
  >>  > Yup, that works.
  >>
  >>  > So I'm guessing that pvmd reads it as it starts up wherever.  Why does
  >>  > it need to do this on a client?  Can't the port(s) be passed from the
  >>  > master when it starts up pvmd?
  >>
  >>  >> This sets the first port that PVM will try to use, and all subsequent
  >>  >> ports will usually be consecutive positive increments of that starting
  >>  >> port (i.e. PVMNETSOCKPORT++... :-).
  >>  >>
  >>  >> So in most cases, you could probably plan on opening up a 100 or 1000
  >>  >> ports _somewhere_ in your firewall, depending on your needs, and then
  >>  >> just tell PVM where to start, using $PVMNETSOCKPORT...
  >>  >>
  >>  >> I've always considered this solution a bit of a kludge, which is why
  >>  >> it doesn't show up in the man pages, but if it works well enough to
  >>  >> save people lots of hassle, then I can add some commentary on it...?
  >>
  >>  > Kludge or not, how can you have an environment variable in an
  >>  > application and not provide knowledge of it or instructions on its use
  >>  > in the man page?  Something like:
  >>
  >>  >  PVM requires open ports on target hosts to function.  Many hosts are
  >>  >  installed with strong firewall rules by default.  If you install pvm
  >>  >  on a slave and pvm appears to hang when you attempt to add it,
  >>  >  eventually timing out without success, consider adding the following
  >>  >  to your local personal or system environment (in, for example,
  >>  >  ~/.bash_profile on all hosts):
  >>
  >>  >    PVMNETSOCKPORT=10000
  >>  >    export PVMNETSOCKPORT
  >>
  >>  >  Then configure your firewall(s) to open a range of UDP ports starting
  >>  >  at this value, such as 10000-11024 (which need not be any larger than
  >>  >  the largest number of machines you expect to have in your virtual
  >>  >  machine).
  >>
  >>  > However a better solution still is to have the daemon fork on a single
  >>  > "permanent" port address > 1024, e.g. 10000, and get a negotiated
  >>  > connection in the upper (non-protected) port space that way.
  >>
  >>  >> It may depend on the firewall settings, but a nice "Connection
  >>  >> Refused" would usually go a long way toward diagnosing things,
  >>  >> whereas the more secure firewall alternative of simply
  >>  >> "no response" would only result in a "timed out" PVM message...
  >>  >>
  >>  >> I'm open to suggestions on ways to identify or diagnose the
  >>  >> problem...!
  >>
  >>  > As I said, document EVERYTHING in the man page(s).  It is what it is
  >>  > for.  Lots of users do, in fact, RTFM but get frustrated and give up
  >>  > when they try something and it just doesn't work and they can't see
  >>  > why.
  >>
  >>  > On the same line, a perennial problem with PVM is getting it to work
  >>  > with rsh and ssh.  In fact, half the problems I help people with who
  >>  > randomly write me is getting it to work with one or the other.  The
  >>  > internal diagnostics are certainly very helpful, at this point, but it
  >>  > would also be worth adding a new man page like pvm_rsh that does
  >>  > nothing but walk users through the ritual of setting PVM_RSH and
  >>  > establishing appropriate e.g. ssh keys.
  >>
  >>  > Just a thought or two.
  >>
  >>  >    rgb
  >>
  >>  >>
  >>  >> Thanks Much for your interest and feedback!
  >>  >>
  >>  >> All the Best,
  >>  >>
  >>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
  >>  >>
  >>  >>  > I actually help a lot of people get started with PVM (they write me
  >>  >>  > offline because I have a template PVM tarball up on my personal
  >>  >> website)
  >>  >>  > and the more I know, the better I can help them...;-)
  >>  >>
  >>  >>  >    rgb
  >>  >>


From ascheinine at tuffmail.us  Fri Feb  8 08:19:29 2008
From: ascheinine at tuffmail.us (Alan Louis Scheinine)
Date: Fri, 08 Feb 2008 17:19:29 +0100
Subject: [Beowulf] Rocks 4.3 and user accounts
In-Reply-To: <1202214812.6258.25.camel@earth>
References: <1202214812.6258.25.camel@earth>
Message-ID: <47AC8111.2020300@tuffmail.us>

I have not used Rocks, nevertheless a solution that
comes to mind is to have a soft (symbolic) link on the
frontend from /home to /export/home .  Such a solution
would only work if there is no directory /home on the
frontend with users not in /export/home , such as a user
associated with a database only on the front end.
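
The symbolic-link idea can be tried out safely in a scratch directory before touching a real frontend. This is a minimal sketch, not a Rocks procedure: the helper name `link_home_to_export` is hypothetical, and `root` stands in for "/" so that nothing on the actual system is changed. It refuses to proceed when a "home" entry already exists, which mirrors the caveat above.

```python
import os
import tempfile


def link_home_to_export(root):
    """Create <root>/home as a symlink to <root>/export/home.

    Refuses to touch an existing <root>/home, since users living there
    (and not under export/home) would otherwise be hidden.
    """
    export_home = os.path.join(root, "export", "home")
    home = os.path.join(root, "home")
    os.makedirs(export_home, exist_ok=True)
    if os.path.lexists(home):
        raise FileExistsError(f"{home} already exists; not replacing it")
    os.symlink(export_home, home)
    return home


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "export", "home", "someuser"))
        home = link_home_to_export(root)
        # Both paths now resolve to the same directory.
        print(os.path.realpath(os.path.join(home, "someuser")))
```

With the link in place, a $HOME of /home/someuser and the real directory /export/home/someuser name the same location on the frontend.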
Best regards,
Alan
-- 

  Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
  Center for Advanced Studies, Research, and Development in Sardinia

  Postal Address:               |  Physical Address for FedEx, UPS, DHL:
  ---------------               |  -------------------------------------
  Alan Scheinine                |  Alan Scheinine
  c/o CRS4                      |  c/o CRS4
  C.P. n. 25                    |  Loc. Pixina Manna Edificio 1
  09010 Pula (Cagliari), Italy  |  09010 Pula (Cagliari), Italy

  Email: scheinin at crs4.it

  Phone: 070 9250 238  [+39 070 9250 238]
  Fax:   070 9250 216 or 220  [+39 070 9250 216 or +39 070 9250 220]
  Operator at reception: 070 9250 1  [+39 070 9250 1]
  Mobile phone: 347 7990472  [+39 347 7990472]


From tjrc at sanger.ac.uk  Fri Feb  8 08:29:53 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Fri, 8 Feb 2008 16:29:53 +0000
Subject: [Beowulf] getting kubuntu to perform as a cluster os
In-Reply-To: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>
References: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>
Message-ID: <F12F4134-AC75-4B26-976F-05D2FFEE3255@sanger.ac.uk>


On 6 Feb 2008, at 4:02 pm, Jon Aquilina wrote:

> what would be necessary to get a normal desktop os such as kubuntu  
> to run as
> a clusterable os

Not much.  It's just Linux at the end of the day.  We use Debian as  
our cluster OS, and all kubuntu is is a slightly tarted up version of  
Debian with better graphics hardware support.  Debian actually lends  
itself to cluster use just fine, and includes a lot of useful cluster  
software (especially in the upcoming "lenny" release, due in  
September, which I note will have a Lustre 1.6.x client package as  
part of the standard distribution)

Tim


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From daniel at cttc.upc.edu  Fri Feb  8 08:30:14 2008
From: daniel at cttc.upc.edu (Daniel Fernandez)
Date: Fri, 08 Feb 2008 17:30:14 +0100
Subject: [Beowulf] Infiniband and multi-cpu configuration
Message-ID: <1202488214.10138.289.camel@qeldroma.cttc.org>

Hi beowulf users,

We're moving our GigE fabric to Infiniband 4X DDR (prices have dropped
quite a bit), and we'll build on AMD Opterons with either 4 or 8 cores
per node.

In case of 8 cores:

	A 4-socket dual-core solution *must* scale better than a 2-socket
quad-core one as far as memory bandwidth is concerned (nearly double).
On the other hand, the HyperTransport links on the Opteron 2000/8000
series are theoretically rated at 8 GB/s per link, so that would be
about equal to 4X SDR Infiniband...
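
A rough unit check on the link rates mentioned above may help here. This is only a sketch: it assumes the usual 2.5 Gbit/s (SDR) and 5 Gbit/s (DDR) per-lane signalling rates and 8b/10b encoding for InfiniBand, and it takes the 8 GB/s HyperTransport figure exactly as quoted rather than deriving it. The function and constant names are made up for illustration. Note that InfiniBand rates are normally quoted in gigabits, while the HyperTransport figure is in gigabytes.

```python
# Rough unit check for the link rates quoted above (one direction of a link).
# Assumption: SDR/DDR InfiniBand use 8b/10b encoding, so only 8 of every
# 10 signalled bits carry data. The HyperTransport number is not derived
# here; it is simply the figure quoted in the message.

IB_LANE_GBIT = {"SDR": 2.5, "DDR": 5.0}  # per-lane signalling rate, Gbit/s

HT_QUOTED_GBYTES = 8.0  # "8 GB/s per link", as quoted in the message


def ib_data_rate_gbytes(width: int, speed: str) -> float:
    """Usable data rate in GB/s for one direction of an InfiniBand link."""
    signalling_gbit = width * IB_LANE_GBIT[speed]  # e.g. 4 lanes * 2.5
    data_gbit = signalling_gbit * 8 / 10           # remove 8b/10b overhead
    return data_gbit / 8                           # bits -> bytes


if __name__ == "__main__":
    for speed in ("SDR", "DDR"):
        rate = ib_data_rate_gbytes(4, speed)
        ratio = HT_QUOTED_GBYTES / rate
        print(f"IB 4X {speed}: {rate:.1f} GB/s "
              f"(quoted HT figure is {ratio:.0f}x higher)")
```

Under these assumptions the two interconnects are not on the same scale, so the comparison is worth redoing with measured numbers for the actual boards and HCAs.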

	So a configuration like:

		 2 PCs, each with 2 sockets of dual-core Opterons, linked together
with Infiniband 4X DDR ( 8 cores )

	should perform like:

		 1 PC with 4 sockets of dual-core Opterons ( 8 cores ),

	which would save the cost of the Infiniband hardware.

	When maximizing cores per node, reducing network connections and
network protocol overhead, and considering the Opteron memory
architecture... is 8 ( 4 sockets * 2 cores ) an adequate number, or is
4 ( 2 sockets * 2 cores ) better?

Also, Infiniband HCAs with onboard memory must perform better than
memory-less ones... but by how much? Are there any real numbers out
there?

Thanks in advance.
	
---
Daniel Fernandez <daniel at cttc.upc.edu>
Heat and Mass Transfer Center - CTTC
www.cttc.upc.edu
UPC Campus Industrials , ETSEIAT , TR4 Building

	


From rgb at phy.duke.edu  Fri Feb  8 08:35:03 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 8 Feb 2008 11:35:03 -0500 (EST)
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>
	<Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
Message-ID: <Pine.LNX.4.64.0802081131280.23523@cain.rgb.private.net>

Oops.  I can't make Peter out to be a liar, so:

   "Fedora"

There. I said it and I'm glad.

I don't know what it all means, of course -- you can build a perfectly
peachy cluster on Debian, SuSE, FreeBSD, Solaris, Windows in a pinch
(although you'll hate yourself LONG before you're done if you try), any
other linux distribution you might find, and ... yes ... Fedora.

Oh, and did I mention Scyld and the "professional" turnkey type
clusters?

But let's get into all that in round two, when you have a bit more of an
idea what goes into cluster design and when we all have a bit more of an
idea of what KIND of cluster we need to be designing...;-)

Then I might even say "Fedora" again!

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From rgb at phy.duke.edu  Fri Feb  8 08:40:50 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 8 Feb 2008 11:40:50 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <20080208161526.GA25597@neo.csm.ornl.gov>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
	<20080208161526.GA25597@neo.csm.ornl.gov>
Message-ID: <Pine.LNX.4.64.0802081135250.23523@cain.rgb.private.net>

On Fri, 8 Feb 2008, kohlja at ornl.gov wrote:

> Awesome Strangelove Reference...!  :-D
>
> "I Have A Plan...!"  :-)
>
> Yep, I am now getting inundated with people having rsh/ssh problems
> with PVM, so a higher power clearly wants me to better document this.
>
> Thanks Much, Will Do...  :)

Excellentamundo!  At some point at your convenience in the future when
you have all kinds of time to metaphorically sit down and REALLY work
over PVM, I have about 800 specific suggestions for bringing it up to
current and modern and everything.  Just a wee list.  You know:

   * Purge aimk for all time, die die die
   * Actually use the FSH so e.g. apropos pvm works.
   * Document the hell out of everything
   * Rewrite the network back end in a way that openly encourages high
     end network vendors to contribute reusable non-IP native drivers
   * Add a (possibly macro-driven) middle layer that makes PVM into MPI
     as well -- one set of actual message-passing functions, two
     conformally mapped call interfaces.
   * Make Ctrl-C work so one can break out of the annoying timeout on
     add hosts when things don't work.
   * Make the console capable of cleaning up after a crash or
     interruption.

that kind of thing...;-)

    rgb

>
> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>
>  On Fri, Feb 08, 2008 at 05:35:31AM -0500, Robert G. Brown wrote:
>  > On Thu, 7 Feb 2008, kohlja at ornl.gov wrote:
>
>  >> I admit this may be an antiquated cynical mentality, and I
>  >> further concur that PVMNETSOCKPORT is an obvious omission
>  >> in the basic documentation/faq...
>
>  > As they say, you can't RTFM if there ain't no FM... (or if the solution
>  > exists but isn't there).
>
>  > One is reminded of Dr. Strangelove, where the president (Peter Sellers)
>  > has just learned that if the maverick B52 piloted by Slim Pickens gets
>  > through, a doomsday device that is supposed to deter first nuclear
>  > strikes will go off that will destroy the world.  Unfortunately, the
>  > Soviet Union didn't actually tell us that it was built.  Dr.
>  > Strangelove (Peter Sellers), after musing for a moment on the brilliance
>  > of the concept, turns and says in an increasingly shrill voice:
>
>  >   But...the whole point of the Doomsday Machine...is lost...if you keep
>  >   it a SECRET. Why didn't you tell the world, eh?
>
>  > Hmmm...;-)
>
>  >    rgb
>
>  >> Thanks for your suggested text!  (And the suggestion to
>  >> enhance our coverage of rsh/ssh usage... :-)
>
>  > Ya, well.  Just now finished telling the umptieth would-be PVM user how
>  > to go about it in an email message, augmenting further online docs such
>  > as this one:
>
>  >   http://www.uow.edu.au/~suresh/web/cfamily/pvm.html
>
>  > which is actually pretty decent, although I generally use the ssh
>  > default dsa instead of rsa since on linux boxes it invariably works.
>  > But better than forcing each user to employ google to snarf out
>  > solutions to each problem they encounter, how much better to write a
>  > really nice "Getting Started with PVM" or perhaps better still, a "PVM
>  > HOWTO" on tldp.org.  Publish there, and be sure to include a copy in
>  > plain sight in /usr/share/pvm3/PVM_HOWTO.
>
>  > Truthfully, good documentation, especially a walkthrough tutorial on
>  > getting started (including sample code or links to sample code) that
>  > takes a would-be user from "yum install pvm\*" to executing a Real
>  > Parallel Program (however trivial) on a two node cluster would really
>  > encourage the use of the library.  Adding a bit more (such as a PVM
>  > program development template) would be only icing on the cake, so to
>  > speak.
>
>  > If I had the time I'd write it myself.  I've already got a project_pvm
>  > program template up on the web, but it is sadly underdocumented through
>  > the setup of PVM itself.
>
>  >    rgb
>
>  >>
>  >> All the Best,
>  >>
>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>  >>
>  >>  On Thu, Feb 07, 2008 at 04:42:21PM -0500, Robert G. Brown wrote:
>  >>  >>  > It would really, really help if man pvm (or man pvmd or man
>  >>  >>  > pvm_intro) documented a suitable firewall setting that will let
>  >>  >>  > PVM function without just turning off the firewall altogether.
>  >>  >>  > There is no pvm setup in /etc/services, for example, no pvm
>  >>  >>  > checkbox in the panels managed by system-config-firewall in the
>  >>  >>  > latest Fedoras, no suggestion as to what protected port(s) or
>  >>  >>  > ranges one has to enable explicitly.  In fact for once even
>  >>  >>  > google is failing me -- I'm not finding a lot of documentation
>  >>  >>  > or remarks by ANYONE on what ports pvm needs open (besides ssh,
>  >>  >>  > which obviously is open and works).  Usually as long as the
>  >>  >>  > spawning of a network application itself works using an enabled
>  >>  >>  > protected port (in this case, I would have expected ssh), the
>  >>  >>  > secondary ports opened in unprotected space just work.  Am I
>  >>  >>  > wrong in this?  Do I need to explicitly open more ports
>  >>  >>  > somewhere?
>  >>  >>
>  >>  >> Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use as
>  >>  >> many ports as you have machines in your cluster, or could use just
>  >>  >> 1. :-}
>  >>  >>
>  >>  >> Normally, the master pvmd creates/accepts connections over a small
>  >>  >> set of ports, possibly 1, but if PvmRouteDirect is enabled in a PVM
>  >>  >> application, then a myriad of direct-connection socket links are
>  >>  >> created, to link whichever machines the local PVM application tasks
>  >>  >> communicate with, on a demand-driven basis...
>  >>  >>
>  >>  >> So it's not generally possible to specify an explicit "range" of
>  >>  >> ports.  However, it _is_ possible to set the "starting" port for
>  >>  >> this collection, using the aforementioned "$PVMNETSOCKPORT"
>  >>  >> environment variable.
>  >>
>  >>  > OK, I'm giving this a try.  Although I'd have to ask why pvmd
>  >>  > doesn't do the fork thing and clone a single open port on which it
>  >>  > listens into a dynamically allocated port that inherits from the
>  >>  > open one.  In principle one only needs a single port to be open to
>  >>  > connect to pretty much any network based application, or so I had
>  >>  > thought.  At least, I do that in xmlsysd and never have to punch
>  >>  > more than one porthole through a firewall.
>  >>
>  >>  > Hmmm, it's working sort of -- looks like I need to open UDP ports,
>  >>  > right, not TCP?  Having trouble on one host where I've punched the hole
>  >>  > but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm trying again
>  >>  > with the local environment variable set.
>  >>
>  >>  > Yup, that works.
>  >>
>  >>  > So I'm guessing that pvmd reads it as it starts up wherever.  Why does
>  >>  > it need to do this on a client?  Can't the port(s) be passed from the
>  >>  > master when it starts up pvmd?
>  >>
>  >>  >> This sets the first port that PVM will try to use, and all subsequent
>  >>  >> ports will usually be consecutive positive increments of that starting
>  >>  >> port (i.e. PVMNETSOCKPORT++... :-).
>  >>  >>
>  >>  >> So in most cases, you could probably plan on opening up a 100 or 1000
>  >>  >> ports _somewhere_ in your firewall, depending on your needs, and then
>  >>  >> just tell PVM where to start, using $PVMNETSOCKPORT...
>  >>  >>
>  >>  >> I've always considered this solution a bit of a kludge, which is why
>  >>  >> it doesn't show up in the man pages, but if it works well enough to
>  >>  >> save people lots of hassle, then I can add some commentary on it...?
>  >>
>  >>  > Kludge or not, how can you have an environment variable in an
>  >>  > application and not provide knowledge of it or instructions on its use
>  >>  > in the man page?  Something like:
>  >>
>  >>  >  PVM requires open ports on target hosts to function.  Many hosts
>  >>  >  are installed with strong firewall rules by default.  If you
>  >>  >  install pvm on a slave and pvm appears to hang when you attempt to
>  >>  >  add it, eventually timing out without success, consider adding the
>  >>  >  following to your local personal or system environment (in, for
>  >>  >  example, ~/.bash_profile on all hosts):
>  >>
>  >>  >    PVMNETSOCKPORT=10000
>  >>  >    export PVMNETSOCKPORT
>  >>
>  >>  >  Then configure your firewall(s) to open a range of udp ports
>  >>  >  starting at this value, such as 10000-11024 (which need not be any
>  >>  >  larger than the largest number of machines you expect to have in
>  >>  >  your virtual machine).
>  >>
>  >>  > However a better solution still is to have the daemon fork on a single
>  >>  > "permanent" port address > 1024, e.g. 10000, and get a negotiated
>  >>  > connection in the upper (non-protected) port space that way.
>  >>
>  >>  >> It may depend on the firewall settings, but a nice "Connection
>  >>  >> Refused" would usually go a long way toward diagnosing things,
>  >>  >> whereas the more secure firewall alternative of simply
>  >>  >> "no response" would only result in a "timed out" PVM message...
>  >>  >>
>  >>  >> I'm open to suggestions on ways to identify or diagnose the
>  >>  >> problem...!
>  >>
>  >>  > As I said, document EVERYTHING in the man page(s).  It is what it
>  >>  > is for.  Lots of users do, in fact, RTFM but get frustrated and
>  >>  > give up when they try something and it just doesn't work and they
>  >>  > can't see why.
>  >>
>  >>  > On the same line, a perennial problem with PVM is getting it to work
>  >>  > with rsh and ssh.  In fact, half the problems I help people with who
>  >>  > randomly write me are about getting it to work with one or the
>  >>  > other.  The internal diagnostics are certainly very helpful, at this
>  >>  > point, but it would also be worth adding a new man page like pvm_rsh
>  >>  > that does nothing but walk users through the ritual of setting
>  >>  > PVM_RSH and establishing appropriate e.g. ssh keys.
>  >>
>  >>  > Just a thought or two.
>  >>
>  >>  >    rgb
>  >>
>  >>  >>
>  >>  >> Thanks Much for your interest and feedback!
>  >>  >>
>  >>  >> All the Best,
>  >>  >>
>  >>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>  >>  >>
>  >>  >>  > I actually help a lot of people get started with PVM (they write
>  >>  >>  > me offline because I have a template PVM tarball up on my
>  >>  >>  > personal website) and the more I know, the better I can help
>  >>  >>  > them...;-)
>  >>  >>
>  >>  >>  >    rgb
>  >>  >>
>  >>
>
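
The port arithmetic in the quoted man-page text above can be sketched as follows. This is only an illustration: it assumes PVM really does take consecutive ports upward from PVMNETSOCKPORT, one per host, which the thread says is "usually" the case, and the helper names pvm_port_range and iptables_rule are made up here. The protocol is left as a parameter because the udp-versus-tcp question above was not settled.

```python
import os
from typing import Tuple


def pvm_port_range(n_hosts: int, default_start: int = 10000) -> Tuple[int, int]:
    """Return (first, last) port to open for n_hosts machines.

    Assumption: ports are consecutive increments of PVMNETSOCKPORT,
    one per host, as described in the thread (not taken from PVM docs).
    """
    if n_hosts < 1:
        raise ValueError("need at least one host")
    first = int(os.environ.get("PVMNETSOCKPORT", default_start))
    last = first + n_hosts - 1
    # Stay in the unprivileged port space and within the 16-bit limit.
    if first <= 1024 or last > 65535:
        raise ValueError(f"port range {first}-{last} is not usable")
    return first, last


def iptables_rule(first: int, last: int, proto: str = "udp") -> str:
    """Format one iptables command that accepts the whole port range."""
    return f"iptables -A INPUT -p {proto} --dport {first}:{last} -j ACCEPT"


if __name__ == "__main__":
    # 1025 hosts reproduces the 10000-11024 example when the start is 10000.
    first, last = pvm_port_range(1025)
    print(iptables_rule(first, last))
```

The rule is only printed, not applied, so it can be reviewed and pasted into whatever firewall configuration the hosts actually use.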

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From jmdavis1 at vcu.edu  Fri Feb  8 09:01:16 2008
From: jmdavis1 at vcu.edu (Mike Davis)
Date: Fri, 08 Feb 2008 12:01:16 -0500
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <Pine.LNX.4.64.0802081131280.23523@cain.rgb.private.net>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>	<Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
	<Pine.LNX.4.64.0802081131280.23523@cain.rgb.private.net>
Message-ID: <47AC8ADC.90701@vcu.edu>

Let us not forget CentOS for those of us who require a longer lifespan
for our operating systems.

I have one cluster that uses CentOS 3 and CentOS 4 and controls its
storage via Solaris 10.  The next major upgrade will retire the CentOS 3
machines (dual single-core 2.4 GHz Opterons) and use CentOS 5 on dual
dual-core or dual quad-core Opterons.


Mike Davis




From kohlja at ornl.gov  Fri Feb  8 09:22:02 2008
From: kohlja at ornl.gov (kohlja at ornl.gov)
Date: Fri, 8 Feb 2008 12:22:02 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802081135250.23523@cain.rgb.private.net>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
	<20080208161526.GA25597@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802081135250.23523@cain.rgb.private.net>
Message-ID: <20080208172202.GA23503@neo.csm.ornl.gov>

  On Fri, Feb 08, 2008 at 11:40:50AM -0500, Robert G. Brown wrote:
  > On Fri, 8 Feb 2008, kohlja at ornl.gov wrote:
  >> Awesome Strangelove Reference...!  :-D
  >>
  >> "I Have A Plan...!"  :-)
  >>
  >> Yep, I am now getting inundated with people having rsh/ssh problems
  >> with PVM, so a higher power clearly wants me to better document this.
  >>
  >> Thanks Much, Will Do...  :)

  > Excellentamundo!

I'm already getting lots of practice explaining how to get this stuff
to work for 3 separate PVM users...  :)

  > At some point at your convenience in the future when
  > you have all kinds of time to metaphorically sit down and REALLY work
  > over PVM...

Ahhh...  Lemme picture that moment...  :-D

  > I have about 800 specific suggestions for bringing it up to
  > current and modern and everything.  Just a wee list.  You know:

  >   * Purge aimk for all time, die die die

Ha ha ha...  You don't like "aimk"...?  :-)

Yeah, PVM was originally pre-autoconf...  Too bad, eh...?  :)

  >   * Actually use the FSH so e.g. apropos pvm works.

I'm assuming you don't mean FSH="Follicle Stimulating Hormone";
did you mean "SSH", or am I clueless...?

Sorry, I guess I'm not "up" on all the latest \/32/\/4[vL/\r...  :-}

  >   * Document the hell out of everything

Yes!  :D

  >   * Rewrite the network back end in a way that openly encourages high
  > end network vendors to contribute reusable non-IP native drivers

Ha ha ha...  Tried to cater to vendors many times.  See all those funny
arch subdirs in pvm3/src...?  Yeah, been there, done that...

(Though I agree that building on top of some generic "standardized"
networking layer would be "nice" - there are so many to choose from... :)

  >   * Add a (possibly macro-driven) middle layer that makes PVM into MPI
  > as well -- one set of actual message-passing functions, two conformally
  > mapped call interfaces.

You mean like "PVMPI"...?

  http://www.netlib.org/utk/papers/pvmpi/paper.html

Or its offspring "MPI-Glue"...?

  http://www.scientific-computing.de/people/rabenseifner/projects/mpi_glue.html

Or do you mean something completely different...?  :)

  >   * Make Ctrl-C work so one can break out of the annoying timeout on add
  > hosts when things don't work.

Yeah, bummer eh?  :)  Where did Bob Manchek go to anyway...?

(He's the real culprit behind the majority of PVM code, btw,
I merely "inherited" the maintenance job... :)

  >   * Make the console capable of cleaning up after a crash or
  > interruption.

We talked about things we could do there, e.g. to clean up old
leftover /tmp/pvmd.* files, etc, but it was always easier to
just remove the files by hand...!  ;)
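
For what it's worth, the by-hand cleanup described above could be sketched like this. It is a minimal illustration under stated assumptions: the leftovers are plain files matching /tmp/pvmd.* owned by the current user, and no pvmd is running at the time (a live daemon still needs its file). The function names are hypothetical and nothing here is part of PVM itself.

```python
import glob
import os


def stale_pvmd_files(tmpdir="/tmp", uid=None):
    """List pvmd.* files in tmpdir that belong to uid (default: current user)."""
    uid = os.getuid() if uid is None else uid
    matches = []
    for path in glob.glob(os.path.join(tmpdir, "pvmd.*")):
        try:
            # Only touch regular files we own; skip anything else.
            if os.path.isfile(path) and os.stat(path).st_uid == uid:
                matches.append(path)
        except OSError:
            continue  # file vanished or is unreadable; ignore it
    return sorted(matches)


def remove_stale_pvmd_files(tmpdir="/tmp", dry_run=True):
    """Remove leftover pvmd.* files; with dry_run=True only report them.

    Assumption: the caller has checked that no pvmd is running.
    """
    handled = []
    for path in stale_pvmd_files(tmpdir):
        if not dry_run:
            os.remove(path)
        handled.append(path)
    return handled


if __name__ == "__main__":
    # Default is a dry run, so this only lists candidates.
    for path in remove_stale_pvmd_files():
        print("would remove", path)
```

Defaulting to a dry run keeps the sketch from deleting a file that a running daemon still depends on.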

Good suggestions, though.  I'll add them to my "to do" list,
along with any others that may come up...?  :-)

Thanks, Man!

	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)

  >>

  > -- 
  > Robert G. Brown                            Phone(cell): 1-919-280-8443
  > Duke University Physics Dept, Box 90305
  > Durham, N.C. 27708-0305
  > Web: http://www.phy.duke.edu/~rgb
  > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From Shainer at mellanox.com  Fri Feb  8 09:21:26 2008
From: Shainer at mellanox.com (Gilad Shainer)
Date: Fri, 8 Feb 2008 09:21:26 -0800
Subject: [Beowulf] Infiniband and multi-cpu configuration
In-Reply-To: <1202488214.10138.289.camel@qeldroma.cttc.org>
Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784FEEF013@mtiexch01.mti.com>

Hi Daniel,

> 
> We'll move our GigE structure to an InfiniBand 4X DDR one ( 
> prices have dropped quite a bit ). Also we'll build on AMD 
> Opteron up to 4 or 8 cores. 
> 
> In case of 8 cores:
> 
> 	A 4 socket dual-core solution *must* scale better than 
> a 2 socket quad-core one, that is talking about memory 
> bandwith ( nearly double ).
> On the other hand, the Hypertransport links on Opteron 
> 2000/8000 series theorically rated at a 8 GB/s per link, so 
> that would be as equal as 4X SDR Infiniband...
> 
> 	A configuration like:
> 
> 		 2 PCs with 2 socket and 2 dual-core Opterons 
> linked together with Infiniband 4X DDR ( 8 cores )
> 
> 	Should perform as:
> 
> 		 1 PC with 4 socket ( dual-core ) Opteron based.
> 
> 	Saving cost on Infiniband hardware.
> 


As always, it depends on the code. I have seen cases where it was better
to have more servers with fewer CPUs per server, and cases where the
opposite was true.


> 	When maximizing cores per node, reducing network 
> connections and network protocol overhead and considering 
> Opteron memory architecture...
> is 8 ( 4 sockets * 2 cores ) an adequate number or a 4 ( 2 
> sockets * 2 cores ) is better?
> 
> Also onboard memory InfiniBand HCAs must perform better than 
> memory-less ones, that is... but how much? any real numbers out there?
> 


No, the mem-free HCAs provide the same and in some cases even better
performance than the onboard memory HCAs. Moreover, the mem-free HCA
architecture is more advanced and provides extra goodies. There is a
white paper on the Mellanox web site that covers the mem-free architecture
and a performance comparison between the mem-free and onboard memory HCAs.
If you are not able to find it, let me know and I will send you a
link.


Gilad.


From peter.st.john at gmail.com  Fri Feb  8 10:09:35 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 8 Feb 2008 13:09:35 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <20080208172202.GA23503@neo.csm.ornl.gov>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
	<20080208161526.GA25597@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802081135250.23523@cain.rgb.private.net>
	<20080208172202.GA23503@neo.csm.ornl.gov>
Message-ID: <e4d4fd070802081009m3fcd1858geb82eeb466dbcbbf@mail.gmail.com>

>  >   * Actually use the FSH so e.g. apropos pvm works.
>
> I'm assuming you don't mean FSH="Follicle Stimulating Hormone";
> did you mean "SSH", or am I clueless...?


I think he meant File System Hierarchy, a little further down on that same
list, http://en.wikipedia.org/wiki/FSH ?

>
>
>  >   * Make Ctrl-C work so one can break out of the annoying timeout on
>  > add hosts when things don't work.

Would Ctrl-D work? But in that context I keep an idle sh window dedicated
to su, ps, and kill; makes me feel safe :-) but of course you are looking
for convenience.

I'm thinking that if I had RGB for QA I'd become a bartender :-) Bug reports
bigger than the Legacy Code.

Peter

From rgb at phy.duke.edu  Fri Feb  8 12:15:19 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 8 Feb 2008 15:15:19 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <20080208172202.GA23503@neo.csm.ornl.gov>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
	<20080208161526.GA25597@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802081135250.23523@cain.rgb.private.net>
	<20080208172202.GA23503@neo.csm.ornl.gov>
Message-ID: <Pine.LNX.4.64.0802081420510.28245@ganesh>

On Fri, 8 Feb 2008, kohlja at ornl.gov wrote:

> I'm already getting lots of practice explaining how to get this stuff
> to work for 3 separate PVM users...  :)

Just record the results as you do so for reworking into the HOWTO...;-)

>  > At some point at your convenience in the future when
>  > you have all kinds of time to metaphorically sit down and REALLY work
>  > over PVM...
>
> Ahhh...  Lemme picture that moment...  :-D

:-)

>  > I have about 800 specific suggestions for bringing it up to
>  > current and modern and everything.  Just a wee list.  You know:
>
>  >   * Purge aimk for all time, die die die
>
> Ha ha ha...  You don't like "aimk"...?  :-)
>
> Yeah, PVM was originally pre-autoconf...  Too bad, eh...?  :)

Who, me?  I love aimk.  Well, I loved aimk.  Back in oh, 1994 or
thereabouts.  I was at that time managing a unix network with creeping
inhomogeneity and a mix of SysV and BSD related Unices.  I took aimk,
cut out its system-identifying heart, and incorporated it into the most
complex set of .files for my shell that you ever saw, that would
automagically look for things on any system I happened to have a copy of
my home directory copied to when I logged in, determine the
architecture, set all paths, and alias the hell out of everything so
that Unix worked -- for me -- pretty much the same on AIX, NeXT, SGI,
Sun, Ultrix...

However, the world has changed.  It has shrunk to pretty much Linux,
FreeBSD, and Solaris as the surviving Unices, plus Windoze, with MacOSX
as a newbie Unix.  All the Unices share a large fraction of their code base
at this point.  People have invented POSIX and (gasp) standards.  And
frankly, the end result of all of this is that most of the complexity in
aimk is totally, totally obsolete and merely obstructs the ease of
working on or building the application.

One of SGE's worst features is that it is built on top of #!*@ aimk.
PVM has an excuse (it preceded the GBT, the Gnu Build Tools).  What is SGE's?

aimk die.  Standard Unix build (patched/ifdef'd as needed for WinXX, the
only remaining real maverick) live.

>  >   * Actually use the FSH so e.g. apropos pvm works.
>
> I'm assuming you don't mean FSH="Follicle Stimulating Hormone";
> did you mean "SSH", or am I clueless...?

Sigh.  Sorry, I'm the brainless one, transposed the H and S:

   http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

There are basically a couple of ways one can think about installing pvm
in an FHS-compliant way.  One is just like it is now, but in /opt, as
that is what it is for.  The other (better) solution is to install the
binaries, include files, libraries and man pages natively on their correct
paths, which would be /usr/bin, /usr/include, /usr/lib, and /usr/share/man
respectively IIRC.  Documentation in /usr/share/doc/pvmwhatever.  Any
shared DATA (e.g. shared configuration data) into either /etc/pvm* if it
is likely to vary per system or /usr/share/pvm if it is really
crossmountable in an architecture independent way.  Putting architecture
specific libraries and binaries in /usr/share is just wrong.
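
The layout described above can be sketched as a small lookup table.  This
is only an illustration, not an actual PVM install script: the names
FHS_LAYOUT and install_path are hypothetical, and /usr/share/doc/pvm and
/etc/pvm are assumed stand-ins for the "pvmwhatever" and "/etc/pvm*"
locations mentioned in the text.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from the kind of file to its FHS destination,
# following the paths named in the paragraph above.
FHS_LAYOUT = {
    "binary": PurePosixPath("/usr/bin"),
    "header": PurePosixPath("/usr/include"),
    "library": PurePosixPath("/usr/lib"),
    "manpage": PurePosixPath("/usr/share/man"),
    "doc": PurePosixPath("/usr/share/doc/pvm"),      # assumed directory name
    "config": PurePosixPath("/etc/pvm"),             # per-system settings
    "shared_data": PurePosixPath("/usr/share/pvm"),  # arch-independent data
}


def install_path(kind, filename):
    """Return the destination path for one file of the given kind."""
    base = FHS_LAYOUT[kind]
    if kind == "manpage":
        # Man pages live in man<section>, taken from the file suffix,
        # e.g. pvm_intro.1 goes under man1.
        section = filename.rsplit(".", 1)[-1]
        base = base / ("man" + section)
    return base / filename


if __name__ == "__main__":
    print(install_path("binary", "pvmd3"))
    print(install_path("manpage", "pvm_intro.1"))
```

Note that nothing architecture specific ("binary", "library") maps into
/usr/share in this table, which is the point made above.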

The Gnu Build Tools will manage all of this mostly for you if you make
the effort to port the build process over to them (which I freely admit
is likely to be a daunting chore and might require rethinking aspects of
PVM itself and how to run it in a heterogeneous environment).

>  >   * Document the hell out of everything
>
> Yes!  :D

<smile>

>  >   * Rewrite the network back end in a way that openly encourages high
>  > end network vendors to contribute reusable non-IP native drivers
>
> Ha ha ha...  Tried to cater to vendors many times.  See all those funny
> arch subdirs in pvm3/src...?  Yeah, been there, done that...
>
> (Though I agree that building on top of some generic "standardized"
> networking layer would be "nice" - there are so many to choose from... :)

Right, but note that MPI (at least some MPIs) support Real Cluster
Networks.  PVM supports them only using IP.  That statement means that
perhaps 1/4 to 1/3 of all possible parallel programmers reject PVM right
out of the box.  Another 1/3 are gone because embarrassingly parallel
programs usually don't quite justify even the complexity of PVM (even
though it is "perfect" for master-slave and EP projects in a lot of
ways).  The 1/3 that are left MIGHT use PVM, but if they think that
their code EVER might run on a high-end cluster, or if they want to pad
their resume with experience that might one day let them code something
new on a high-end cluster, they have to think hard about the PVM vs MPI
choice.  The result is that I'm guessing that there are 5 MPI
programmers for every 1 PVM programmer (and a lot of the latter, like
me, learned PVM back in the early 90's when it was the only game in
town).

So one possible solution is to build PVM in such a way that it can share
drivers with e.g. MPICH right out of the box, so to speak.  But this, of
course, also means altering other things, and agreeing on something of
an ABI for each hardware network device you have available.

On the good side, it would let you add a native ethernet channel that
didn't use (even) UDP/IP, good only on completely non-routed flat
switched networks.

>  >   * Add a (possibly macro-driven) middle layer that makes PVM into MPI
>  > as well -- one set of actual message-passing functions, two conformally
>  > mapped call interfaces.
>
> You mean like "PVMPI"...?
>
>  http://www.netlib.org/utk/papers/pvmpi/paper.html
>
> Or its offspring "MPI-Glue"...?
>
>  http://www.scientific-computing.de/people/rabenseifner/projects/mpi_glue.html
>
> Or do you mean something completely different...?  :)

No, like that, sure -- or the even older papers on the PVM website
(unless these are they).  But actually done and distributed in PVM, so
one doesn't actually need PVM -- and -- LAM on a system, especially
given that LAM is a lot like PVM except where it's not.  Possibly not as
good, I'd even say, but I'm not enough of an MPI user to be able to
fairly judge.

>  >   * Make Ctrl-C work so one can break out of the annoying timeout on add
>  > hosts when things don't work.
>
> Yeah, bummer eh?  :)  Where did Bob Manchek go to anyway...?
>
> (He's the real culprit behind the majority of PVM code, btw,
> I merely "inherited" the maintenance job... :)

I know how that goes.  And it is always a tradeoff, too.  For just ME,
it only wastes time in three or four minute chunkies, every now and
then.  It would take days, weeks, to recover the time required to fix
it.  But then you multiply "me" by an actual user base, and you come to
realize that stuff like this costs a huge amount of distributed
productivity and it's insane not to fix it.  Except that (naturally) you
aren't getting PAID to fix it so it's hours of YOUR time for minutes of
benefit to save person-weeks of everybody ELSE'S time.

Still, it is harmless to suggest it so that you MIGHT add it to that
eternally optimistic opportunity cost labor queue against the day you
finish a three month project you're being paid for in three days and
need to pretend to be busy for 87 days...;-)

>  >   * Make the console capable of cleaning up after a crash or
>  > interruption.
>
> We talked about things we could do there, e.g. to clean up old
> leftover /tmp/pvmd.* files, etc, but it was always easier to
> just remove the files by hand...!  ;)

Well, or not.  It depends on how often you have to do it.  Same
computation as above -- for any single person yeah, the hassle of coding
a robust solution isn't worth it, but distribute that hassle over a
user base of even a hundred people and suddenly it is a lot of aggregate
time, especially for novice users and support.

Remember, NO NOVICE USER is going to understand that the reason that PVM
isn't working is because they somehow exited or killed or rebooted the
master host/process and left behind zombie pvmd's (or worse, just
the lockfiles) on all the nodes.  I at one time wrote scripts I could
run to clean up just because if there are more than a very few nodes,
this can get really painful!  If the nodes are widely distributed on an
enterprise LAN (one thing PVM is very good for) doubly so.

So again, you lose some fraction of the novices because they get
frustrated and (correctly) view such behavior as "broken", and you at
least annoy even the tried and true PVM programmer because nobody LIKES
having to go kill a whole bunch of processes and remove all those
lockfiles by hand, only to learn that they missed one.  It isn't fun
work, and it could be automated SO easily.
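
As a rough idea of how little code the per-node cleanup needs, here is a
minimal sketch.  It assumes the leftovers match /tmp/pvmd.* as mentioned
above; the function names and the is_pvmd_running callback are
hypothetical (a real version would check the process table), and it
defaults to a dry run so nothing is deleted by accident.

```python
import glob
import os


def find_stale_pvmd_files(tmpdir="/tmp", is_pvmd_running=lambda: False):
    """List leftover pvmd.* files, but only if no pvmd is running."""
    if is_pvmd_running():
        # A live daemon owns its files; leave them alone.
        return []
    return sorted(glob.glob(os.path.join(tmpdir, "pvmd.*")))


def remove_stale_pvmd_files(tmpdir="/tmp", is_pvmd_running=lambda: False,
                            dry_run=True):
    """Remove the stale files (or just report them when dry_run is True)."""
    stale = find_stale_pvmd_files(tmpdir, is_pvmd_running)
    if not dry_run:
        for path in stale:
            os.remove(path)
    return stale


if __name__ == "__main__":
    for path in remove_stale_pvmd_files():
        print("would remove", path)
```

Run on every node (over ssh, say), something along these lines would
replace the by-hand hunt for lockfiles.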

If I were going to write the PVM console over myself from scratch, I
would actually parallelize it to really facilitate stateful control.  By
that I mean I would separate out the interpreter loop as an absolutely
trivial, impossible to block object, and fork off one or more slave
tasks to do the actual things you are trying to do, OR I'd make all
tasks rigorously interruptible with minimal loss of state information
(or really, both).  That way you can always get to the console, and if
you can get to the console you can always execute a reset for whatever
VM you've defined.  Right now the only way to SIMULATE this behavior is
by breaking out of a hang to the originating shell with Ctrl-Z and then
performing all sorts of violence by hand without access even to the list
of currently configured hosts.  Ug-ly...
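
The "impossible to block" interpreter loop could look roughly like the
sketch below.  This is an assumption about one way to do it, using
threads in place of forked slave tasks; the Console class and its
methods are hypothetical and have nothing to do with the real PVM
console source.

```python
import queue
import threading


class Console:
    """Command loop core that hands slow work to background workers."""

    def __init__(self):
        self.results = queue.Queue()
        self.outstanding = set()

    def submit(self, name, func, *args):
        """Start func in a worker thread and return immediately."""
        def run():
            self.results.put((name, func(*args)))

        self.outstanding.add(name)
        threading.Thread(target=run, daemon=True).start()

    def poll(self, timeout=0.1):
        """Collect one finished result, or None if nothing is ready yet."""
        try:
            name, value = self.results.get(timeout=timeout)
        except queue.Empty:
            return None
        self.outstanding.discard(name)
        return name, value

    def pending(self):
        """Names of commands whose results have not been collected."""
        return sorted(self.outstanding)


if __name__ == "__main__":
    hang = threading.Event()
    console = Console()
    console.submit("add slowhost", hang.wait)   # simulates a hung add
    console.submit("conf", lambda: "1 host")    # still answered at once
    print(console.poll(timeout=2.0))
    print("still pending:", console.pending())
    hang.set()
```

Because the loop itself only ever waits on the result queue with a
timeout, a hung "add" cannot keep the user from typing "reset".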

I'd probably also leave systems in the VM (and conf display) even if
they actually failed to add or added and then died, and just mark them
down.  Add a command to restart the downed ones (or even a way of
polling and doing it automatically), along with suitable signals
returnable to a master process.
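
A minimal sketch of that bookkeeping, assuming a hypothetical HostTable
class and a caller-supplied start function that returns True when a pvmd
comes up on the host (neither is part of PVM):

```python
class HostTable:
    """Keeps every host ever added, marking failures "down"."""

    def __init__(self):
        self.status = {}

    def add(self, host, start):
        """Try to start a host; record it as up or down either way."""
        try:
            ok = bool(start(host))
        except OSError:
            ok = False
        self.status[host] = "up" if ok else "down"
        return ok

    def mark_down(self, host):
        """Record that a previously running host has died."""
        self.status[host] = "down"

    def restart_down(self, start):
        """Retry every host currently marked down; return those revived."""
        down = sorted(h for h, s in self.status.items() if s == "down")
        return [host for host in down if self.add(host, start)]


if __name__ == "__main__":
    flaky = {"node2"}

    def start(host):
        # Pretend node2 fails on the first attempt only.
        if host in flaky:
            flaky.discard(host)
            return False
        return True

    table = HostTable()
    for name in ("node1", "node2"):
        table.add(name, start)
    print(table.status)
    print("revived:", table.restart_down(start))
```

The conf display would then list node2 as down after the first pass,
and a later restart (manual or polled) brings it back.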

There are a zillion things one could do with such a console and
signalling system.  Gather statistics from real-time console calls (e.g.
total number of messages, total number of bytes sent, per communications
pair).  Reset an entire cluster.  Take over a running cluster and
computation from a different master so that one can reboot the master
safely.  "Stop" the computation and migrate a node task ditto.

If the console were really NICELY written, with most of the console
functions actually tied up in a library, you'd make it (relatively)
trivial to write gpvm, the ultimate gnome PVM console.

The console is one of the nicest things about PVM, and it and the
ancient but still lovely xpvm sort-of-GUI are one thing that keeps it
alive as a teaching tool if nothing else.  It is just fabulous to be
able to watch a PVM computation develop as lots of little lines and
icons and so on.  But it could be a lot better, especially more robust
and easier on novices.  And with network support that could once again
compete with MPI on the high end, I think it would experience a bit of a
resurgence because it IS a good match for many kinds of tasks.

    rgb

>
> Good suggestions, though.  I'll add them to my "to do" list,
> along with any others that may come up...?  :-)


>
> Thanks, Man!
>
> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>
>  > that kind of thing...;-)
>
>  >    rgb
>
>  >>
>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>  >>
>  >>  On Fri, Feb 08, 2008 at 05:35:31AM -0500, Robert G. Brown wrote:
>  >>  > On Thu, 7 Feb 2008, kohlja at ornl.gov wrote:
>  >>
>  >>  >> I admit this may be an antiquated cynical mentality, and I
>  >>  >> further concur that PVMNETSOCKPORT is an obvious omission
>  >>  >> in the basic documentation/faq...
>  >>
>  >>  > As they say, you can't RTFM if there ain't no FM... (or if the solution
>  >>  > exists but isn't there).
>  >>
>  >>  > One is reminded of Dr. Strangelove, where the president (Peter Sellers)
>  >>  > has just learned that if the maverick B52 piloted by Slim Pickens gets
>  >>  > through, a doomsday device that is supposed to deter first nuclear
>  >>  > strikes will go off that will destroy the world.  Unfortunately, the
>  >>  > Soviet Union didn't actually tell us that it was built.  Dr.
>  >>  > Strangelove (Peter Sellers), after musing for a moment on the
>  >> brilliance
>  >>  > of the concept, turns and says in an increasingly shrill voice:
>  >>
>  >>  >   But...the whole point of the Doomsday Machine...is lost...if you keep
>  >>  >   it a SECRET. Why didn't you tell the world, eh?
>  >>
>  >>  > Hmmm...;-)
>  >>
>  >>  >    rgb
>  >>
>  >>  >> Thanks for your suggested text!  (And the suggestion to
>  >>  >> enhance our coverage of rsh/ssh usage... :-)
>  >>
>  >>  > Ya, well.  Just now finished telling the umptieth would-be PVM user how
>  >>  > to go about it in an email message, augmenting further online docs such
>  >>  > as this one:
>  >>
>  >>  >   http://www.uow.edu.au/~suresh/web/cfamily/pvm.html
>  >>
>  >>  > which is actually pretty decent, although I generally use the ssh
>  >>  > default dsa instead of rsa since on linux boxes it invariably works.
>  >>  > But better than forcing each user to employ google to snarf out
>  >>  > solutions to each problem they encounter, how much better to write a
>  >>  > really nice "Getting Started with PVM" or perhaps better still, a "PVM
>  >>  > HOWTO" on tldp.org.  Publish there, and be sure to include a copy in
>  >>  > plain sight in /usr/share/pvm3/PVM_HOWTO.
>  >>
>  >>  > Truthfully, good documentation, especially a walkthrough tutorial on
>  >>  > getting started (including sample code or links to sample code) that
>  >>  > takes a would-be user from "yum install pvm\*" to executing a Real
>  >>  > Parallel Program (however trivial) on a two node cluster would really
>  >>  > encourage the use of the library.  Adding a bit more (such as a PVM
>  >>  > program development template) would be only icing on the cake, so to
>  >>  > speak.
>  >>
>  >>  > If I had the time I'd write it myself.  I've already got a project_pvm
>  >>  > program template up on the web, but it is sadly underdocumented through
>  >>  > the setup of PVM itself.
>  >>
>  >>  >    rgb
>  >>
>  >>  >>
>  >>  >> All the Best,
>  >>  >>
>  >>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>  >>  >>
>  >>  >>  On Thu, Feb 07, 2008 at 04:42:21PM -0500, Robert G. Brown wrote:
>  >>  >>  >>  > It would really, really help if man pvm (or man pvmd or man
>  >>  >> pvm_intro)
>  >>  >>  >>  > documented a suitable firewall setting that will let PVM
>  >> function
>  >>  >>  >>  > without just turning off the firewall altogether.  There is no
>  >> pvm
>  >>  >>  >> setup
>  >>  >>  >>  > in /etc/services, for example, no pvm checkbox in the panels
>  >>  >> managed by
>  >>  >>  >>  > system-config-firewall in the latest Fedoras, no suggestion as
>  >> to
>  >>  >> what
>  >>  >>  >>  > what protected port(s) or ranges one has to enable explicitly.
>  >> In
>  >>  >> fact
>  >>  >>  >>  > for once even google is failing me -- I'm not finding a lot of
>  >>  >>  >>  > documentation or remarks by ANYONE on what ports pvm needs open
>  >>  >>  >> (besides
>  >>  >>  >>  > ssh, which obviously is open and works).  Usually as long as
>  >> the
>  >>  >>  >>  > spawning of a network application itself works using an enabled
>  >>  >>  >>  > protected port (in this case, I would have expected ssh), the
>  >>  >> secondary
>  >>  >>  >>  > ports opened in unprotected space just work.  Am I wrong in
>  >> this?
>  >>  >> Do I
>  >>  >>  >>  > need to explicitly open more ports somewhere?
>  >>  >>  >>
>  >>  >>  >> Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use
>  >> as
>  >>  >>  >> many ports as you have machines in your cluster, or could use just
>  >> 1.
>  >>  >> :-}
>  >>  >>  >>
>  >>  >>  >> Normally, the master pvmd creates/accepts connections over a small
>  >>  >>  >> set of ports, possibly 1, but if PvmRouteDirect is enabled in a
>  >> PVM
>  >>  >>  >> application, then a myriad of direct-connection socket links are
>  >>  >>  >> created, to link whichever machines the local PVM application
>  >> tasks
>  >>  >>  >> communicate with, on a demand-driven basis...
>  >>  >>  >>
>  >>  >>  >> So it's not generally possible to specify an explicit "range" of
>  >>  >> ports.
>  >>  >>  >> However, it _is_ possible to set the "starting" port for this
>  >>  >> collection,
>  >>  >>  >> using the aforementioned "$PVMNETSOCKPORT" environment variable.
>  >>  >>
>  >>  >>  > OK, I'm giving this a try.  Although I'd have to ask why pvmd
>  >> doesn't
>  >>  >> do
>  >>  >>  > the fork thing and clone a single open port on which it listens
>  >> into a
>  >>  >>  > dynamically allocated port that inherits from the open one.  In
>  >>  >>  > principle one only needs a single port to be open to connect to
>  >> pretty
>  >>  >>  > much any network based application, or so I had thought.  At least,
>  >> I
>  >>  >> do
>  >>  >>  > that in xmlsysd and never have to punch more than one porthole
>  >> through
>  >>  >> a
>  >>  >>  > firewall.
>  >>  >>
>  >>  >>  > Hmmm, it's working sort of -- looks like I need to open UPD ports,
>  >>  >>  > right, not TCP?  Having trouble on one host where I've punched the
>  >> hole
>  >>  >>  > but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm trying
>  >> again
>  >>  >>  > with the local environment variable set.
>  >>  >>
>  >>  >>  > Yup, that works.
>  >>  >>
>  >>  >>  > So I'm guessing that pvmd reads it as it starts up wherever.  Why
>  >> does
>  >>  >>  > it need to do this on a client?  Can't the port(s) be passed from
>  >> the
>  >>  >>  > master when it starts up pvmd?
>  >>  >>
>  >>  >>  >> This sets the first port that PVM will try to use, and all
>  >> subsequent
>  >>  >>  >> ports will usually be consecutive positive increments of that
>  >> starting
>  >>  >>  >> port (i.e. PVMNETSOCKPORT++... :-).
>  >>  >>  >>
>  >>  >>  >> So in most cases, you could probably plan on opening up a 100 or
>  >> 1000
>  >>  >>  >> ports _somewhere_ in your firewall, depending on your needs, and
>  >> then
>  >>  >>  >> just tell PVM where to start, using $PVMNETSOCKPORT...
>  >>  >>  >>
>  >>  >>  >> I've always considered this solution a bit of a kludge, which is
>  >> why
>  >>  >>  >> it doesn't show up in the man pages, but if it works well enough
>  >> to
>  >>  >>  >> save people lots of hassle, then I can add some commentary on
>  >> it...?
>  >>  >>
>  >>  >>  > Kludge or not, how can you have an environment variable in an
>  >>  >>  > application and not provide knowledge of it or instructions on its
>  >> use
>  >>  >>  > in the man page?  Something like:
>  >>  >>
>  >>  >>  >  PVM requires open ports on target hosts to function.  Many hosts
>  >> are
>  >>  >>  >  installed with strong firewall rules by default.  If you install
>  >> pvm
>  >>  >> on
>  >>  >>  >  a slave and pvm appears to hang when you attempt to add it,
>  >> eventually
>  >>  >>  >  timing out without success, consider adding the following to your
>  >>  >> local
>  >>  >>  >  personal or system environment (in, for example, ~/.bash_profile
>  >> on
>  >>  >> all
>  >>  >>  >  hosts):
>  >>  >>
>  >>  >>  >    PVMNETSOCKPORT=10000
>  >>  >>  >    export PVMNETSOCKPORT
>  >>  >>
>  >>  >>  >  Then configure your firewall(s) to open a range of udp ports
>  >> starting
>  >>  >>  >  at this value, such as 10000-11024 (which need be any larger than
>  >> the
>  >>  >>  >  largest number of machines you expect to have in your virtual
>  >>  >> machine).
>  >>  >>
>  >>  >>  > However a better solution still is to have the daemon fork on a
>  >> single
>  >>  >>  > "permanent" port address > 1024, e.g. 10000, and get a negotiated
>  >>  >>  > connection in the upper (non-protected) port space that way.
>  >>  >>
>  >>  >>  >> It may depend on the firewall settings, but a nice "Connection
>  >>  >>  >> Refused" would usually go a long way toward diagnosing things,
>  >>  >>  >> whereas the more secure firewall alternative of simply
>  >>  >>  >> "no response" would only result in a "timed out" PVM message...
>  >>  >>  >>
>  >>  >>  >> I'm open to suggestions on ways to identify or diagnose the
>  >>  >> problem...!
>  >>  >>
>  >>  >>  > As I said, document EVERYTHING in the man page(s).  It is what it
>  >> is
>  >>  >> for.
>  >>  >>  > Lots of users do, in fact, RTFM but get frustrated and give up when
>  >>  >> they
>  >>  >>  > try something and it just doesn't work and they can't see why.
>  >>  >>
>  >>  >>  > On the same line, a perennial problem with PVM is getting it to
>  >> work
>  >>  >>  > with rsh and ssh.  In fact, half the problems I help people with
>  >> who
>  >>  >>  > randomly write me is getting it to work with one or the other.  The
>  >>  >>  > internal diagnostics are certainly very helpful, at this point, but
>  >> it
>  >>  >>  > would also be worth adding a new man page like pvm_rsh that does
>  >>  >> nothing
>  >>  >>  > but walk users through the ritual of setting PVM_RSH and
>  >> establishing
>  >>  >>  > appropriate e.g. ssh keys.
>  >>  >>
>  >>  >>  > Just a thought or two.
>  >>  >>
>  >>  >>  >    rgb
>  >>  >>
>  >>  >>  >>
>  >>  >>  >> Thanks Much for your interest and feedback!
>  >>  >>  >>
>  >>  >>  >> All the Best,
>  >>  >>  >>
>  >>  >>  >> 	Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>  >>  >>  >>
>  >>  >>  >>  > I actually help a lot of people get started with PVM (they
>  >> write me
>  >>  >>  >>  > offline because I have a template PVM tarball up on my personal
>  >>  >>  >> website)
>  >>  >>  >>  > and the more I know, the better I can help them...;-)
>  >>  >>  >>
>  >>  >>  >>  >    rgb
>  >>  >>  >>
>  >>  >>  >>  > --
>  >>  >>  >>  > Robert G. Brown                            Phone(cell):
>  >>  >> 1-919-280-8443
>  >>  >>  >>  > Duke University Physics Dept, Box 90305
>  >>  >>  >>  > Durham, N.C. 27708-0305
>  >>  >>  >>  > Web: http://www.phy.duke.edu/~rgb
>  >>  >>  >>  > Book of Lilith Website:
>  >>  >> http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
>  >>  >>  >>  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
>  >>  >>  >>
>  >>  >>  >>
>  >>  >>
>  >> (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:
>  >>  >>  >>
>  >>  >>  >>   James Arthur "Jeeembo" Kohl, Ph.D.     "Da Blooos Brathas?!
>  >> They
>  >>  >>  >>   Oak Ridge National Laboratory              still owe you money,
>  >>  >> Fool!"
>  >>  >>  >>   kohlja at ornl.gov
>  >>  >>  >>   http://www.csm.ornl.gov/~kohl/          Long Live Curtis
>  >> Blues!!!
>  >>  >>  >>
>  >>  >>  >>
>  >>  >>
>  >> :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)
>  >>  >>  >>
>  >>  >>
>  >>  >>  > --
>  >>  >>  > Robert G. Brown                            Phone(cell):
>  >> 1-919-280-8443
>  >>  >>  > Duke University Physics Dept, Box 90305
>  >>  >>  > Durham, N.C. 27708-0305
>  >>  >>  > Web: http://www.phy.duke.edu/~rgb
>  >>  >>  > Book of Lilith Website:
>  >> http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
>  >>  >>  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
>  >>  >>
>  >>
>  >>  > --
>  >>  > Robert G. Brown                            Phone(cell): 1-919-280-8443
>  >>  > Duke University Physics Dept, Box 90305
>  >>  > Durham, N.C. 27708-0305
>  >>  > Web: http://www.phy.duke.edu/~rgb
>  >>  > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
>  >>  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
>  >>
>  >> _______________________________________________
>  >> Beowulf mailing list, Beowulf at beowulf.org
>  >> To change your subscription (digest mode or unsubscribe) visit
>  >> http://www.beowulf.org/mailman/listinfo/beowulf
>  >>
>
>  > --
>  > Robert G. Brown                            Phone(cell): 1-919-280-8443
>  > Duke University Physics Dept, Box 90305
>  > Durham, N.C. 27708-0305
>  > Web: http://www.phy.duke.edu/~rgb
>  > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
>  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From rgb at phy.duke.edu  Fri Feb  8 12:17:45 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 8 Feb 2008 15:17:45 -0500 (EST)
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <e4d4fd070802081009m3fcd1858geb82eeb466dbcbbf@mail.gmail.com>
References: <20080206231328.GA1249@neo.csm.ornl.gov> 
	<Pine.LNX.4.64.0802071115470.23523@cain.rgb.private.net> 
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
	<20080208161526.GA25597@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802081135250.23523@cain.rgb.private.net>
	<20080208172202.GA23503@neo.csm.ornl.gov>
	<e4d4fd070802081009m3fcd1858geb82eeb466dbcbbf@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802081516410.28245@ganesh>

On Fri, 8 Feb 2008, Peter St. John wrote:

> I'm thinking that if I had RGB for QA I'd become a bartender :-) Bug reports
> bigger than the Legacy Code.

If I had you for my bug-fixer, I'd probably be your best customer... at
the bar.

   ;-)

(Sorry, couldn't resist that one:-)

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From peter.st.john at gmail.com  Fri Feb  8 12:38:00 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 8 Feb 2008 15:38:00 -0500
Subject: [Beowulf] PVM on wireless...
In-Reply-To: <Pine.LNX.4.64.0802081516410.28245@ganesh>
References: <20080206231328.GA1249@neo.csm.ornl.gov>
	<20080207185304.GA11286@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802071537450.23523@cain.rgb.private.net>
	<20080207221132.GA26027@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802080504480.23523@cain.rgb.private.net>
	<20080208161526.GA25597@neo.csm.ornl.gov>
	<Pine.LNX.4.64.0802081135250.23523@cain.rgb.private.net>
	<20080208172202.GA23503@neo.csm.ornl.gov>
	<e4d4fd070802081009m3fcd1858geb82eeb466dbcbbf@mail.gmail.com>
	<Pine.LNX.4.64.0802081516410.28245@ganesh>
Message-ID: <e4d4fd070802081238m7122c934q59d8d4e35004b929@mail.gmail.com>

RGB,
:-) no slight. In fact Euphistopheles, at LambdaMOO, called himself
SourceError Supreme (after Dr Strange comic), for some incomprehensible
reason.
peter

On Feb 8, 2008 3:17 PM, Robert G. Brown <rgb at phy.duke.edu> wrote:

> On Fri, 8 Feb 2008, Peter St. John wrote:
>
> > I'm thinking that if I had RGB for QA I'd become a bartender :-) Bug reports
> > bigger than the Legacy Code.
>
> If I had you for my bug-fixer, I'd probably be your best customer... at
> the bar.
>
>   ;-)
>
> (Sorry, couldn't resist that one:-)
>
>    rgb

From bjtstarks at gmail.com  Fri Feb  8 13:33:54 2008
From: bjtstarks at gmail.com (Berkley Starks)
Date: Fri, 8 Feb 2008 14:33:54 -0700
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>
	<Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
Message-ID: <5721d9d70802081333l765cac34of24ec269d5a9e01@mail.gmail.com>

Thank you all so much for the advice so far.  This has helped me see a few
more of the things that I did not realize at first.

For a little info on the project, I developed this project as a tool to work
on my Senior Thesis in a year or so.  Doing computational nuclear physics
requires such resources.  It will also be used heavily for Monte Carlo
Simulations and just about any other form of computational physics.  The two
named are definite projects that are already on the line up for when I do
get the cluster up and functional.

I want to be able to make the cluster easily expandable, in that I will be
starting with only a few machines (about 2-8), but will be acquiring more as
time goes on.  The university that I am attending surpluses out "old"
machines every 4 years, and we have set up a program where we can get a
percentage of the surplus machines for our cluster.

So, as for size.  Initially it will be a smaller cluster, but will grow as
time goes on.

Being new to the Beowulf world, I am just mainly looking for some advice as
to what distro to use (I would never dream of setting up a cluster on
Windows) and whether there are any little tricks that aren't mentioned in
the setup how-to guides.

Oh, and I would also like to know if there is a way to set up task
priorities so that if I had only one application running it would use all
the processors on the cluster, but if I had two tasks sent to the cluster
then it would split the load between them and run both simultaneously,
while still using as many processors as are available.

Thanks again so much,

Berkley

On Feb 8, 2008 9:11 AM, Robert G. Brown <rgb at phy.duke.edu> wrote:

> On Thu, 7 Feb 2008, Berkley Starks wrote:
>
> > Hello all,
> >
> > I've been a computer user for the past several years working in different
> > areas of the IT world.  I've recently been commissioned by my university to
> > set up the first operating Beowulf Cluster.
> >
> > I am moderately familiar with the Linux OS, having run it for the past
> > several years using the distros of Debian, Ubuntu, Fedora Core, and
> > Mandriva.
> >
> > With setting up this new cluster I would like any advice possible on what OS
> > to use, how to set it up, and any other pertinent information that I might
> > need.
>
> This question has been answered on-list in detail a few zillion times.
> I'd suggest consulting (in rough order):
>
>   a) The list archives (now that you're a member you can get to them,
> although they are digested and googleable for the most part anyway).
>
>   b) Google.  For example, there is a lovely howto here:
>
>     http://www.linux.org/docs/ldp/howto/Parallel-Processing-HOWTO.html
>
> that is remarkably current and a good quick place to start.
>
>   c) Feel free to browse my free online book here:
>
>     http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php
>
> I'm working on making it paper-printable via lulu, but I need time I
> don't have and so that project languishes a bit.  You "can" get a paper
> copy there if you want, but it is pretty much what is on the free
> website including the holes.
>
> > Oh, and the cluster will be used for computational physics.  I am a physics
> > major making it for the physics department here.  It will need to be able to
> > use C++ and Fortran at a bare minimum.
>
> C, C++ and Fortran are all no problem.  The more important questions
> are:
>
>   a) How coupled are the parallel tasks?  That is, do you want a cluster
> that can run N independent jobs on N independent nodes (where the jobs
> don't communicate with each other at all), or do you want a cluster
> where the N nodes all do work on a common task as part of one massive
> parallel program?  If the former, you're in luck and cluster design is
> easy and the cluster purchase will be cheap.
>
>   b) If they are coupled, are the tasks "tightly coupled" so each
> subtask can only advance a little bit before communications are required
> in order to take the next step?  "Synchronous" so all steps have to be
> completed on all nodes before any can advance?  Are the messages really
> big (bandwidth limited) or tiny and frequent (latency limited)?
>
> If any of these latter answers are "yes", post a detailed description of
> the tasks (as best you can) to get some advice on choosing a network, as
> that's the design parameter that is largely controlled by the answers.
>
>    rgb
>
> >
> > Thanks again
> >

From gdjacobs at gmail.com  Sat Feb  9 17:19:25 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Sat, 09 Feb 2008 19:19:25 -0600
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <5721d9d70802081333l765cac34of24ec269d5a9e01@mail.gmail.com>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>	<Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
	<5721d9d70802081333l765cac34of24ec269d5a9e01@mail.gmail.com>
Message-ID: <47AE511D.8000007@gmail.com>

1) Go straight to a gigE switch. There's really no reason in pricing to
not go with gigE, unless you're getting something for free.
2) Surplus hardware will definitely allow you to work through some of
the kinks. I'm guessing the computers will be p4s? Likely with fast
ethernet. You can try your codes on the built in nics and see how things
scale. You might need to upgrade your network.
3) As far as distro is concerned, what are people familiar with at your
site? Do you project any ISV requirements?

-- 
Geoffrey D. Jacobs

To have no errors
  would be life without meaning
  No struggle, no joy


From gdjacobs at gmail.com  Sat Feb  9 17:22:24 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Sat, 09 Feb 2008 19:22:24 -0600
Subject: [Beowulf] getting kubuntu to perform as a cluster os
In-Reply-To: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>
References: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>
Message-ID: <47AE51D0.4040008@gmail.com>

Jon Aquilina wrote:
> what would be necessary to get a normal desktop os such as kubuntu to
> run as a clusterable os
> 
> -- 
> Jonathan Aquilina
An ssh server. MPI. Compilers. Not sure if Kubuntu's kernel supports
serving NFS with a kernel module, but user space NFS should be doable.

Question is, why not use Ubuntu server?

-- 
Geoffrey D. Jacobs

To have no errors
  would be life without meaning
  No struggle, no joy


From tjrc at sanger.ac.uk  Sun Feb 10 03:01:08 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Sun, 10 Feb 2008 11:01:08 +0000
Subject: [Beowulf] getting kubuntu to perform as a cluster os
In-Reply-To: <47AE51D0.4040008@gmail.com>
References: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>
	<47AE51D0.4040008@gmail.com>
Message-ID: <FB46D05F-46E8-4D24-8EE7-83B142A2F537@sanger.ac.uk>


On 10 Feb 2008, at 1:22 am, Geoff Jacobs wrote:

> Jon Aquilina wrote:
>> what would be necessary to get a normal desktop os such as kubuntu to
>> run as a clusterable os
>>
>> -- 
>> Jonathan Aquilina
> An ssh server. MPI. Compilers. Not sure if Kubuntu's kernel supports
> serving NFS with a kernel module,

It should do.

> Question is, why not use Ubuntu server?

A good question...

Tim


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From coutinho at dcc.ufmg.br  Sun Feb 10 12:35:17 2008
From: coutinho at dcc.ufmg.br (Bruno Coutinho)
Date: Sun, 10 Feb 2008 18:35:17 -0200
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <47AE511D.8000007@gmail.com>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com>
	<Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
	<5721d9d70802081333l765cac34of24ec269d5a9e01@mail.gmail.com>
	<47AE511D.8000007@gmail.com>
Message-ID: <a8d96dec0802101235m3183b5bbxb1181999ef7126f0@mail.gmail.com>

2008/2/9, Geoff Jacobs <gdjacobs at gmail.com>:
>
> 1) Go straight to a gigE switch. There's really no reason in pricing to
> not go with gigE, unless you're getting something for free.


I agree with that.
Unmanaged or "web managed" gigE switches are really cheap today.
Several motherboards come with gigE onboard, so you may get some gigE NICs
already in the surplus hardware you receive.

> 2) Surplus hardware will definitely allow you to work through some of
> the kinks. I'm guessing the computers will be p4s? Likely with fast
> ethernet. You can try your codes on the built in nics and see how things
> scale. You might need to upgrade your network.
>
> 3) As far as distro is concerned, what are people familiar with at your
> site? Do you project any ISV requirements?

From gdjacobs at gmail.com  Sun Feb 10 13:31:16 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Sun, 10 Feb 2008 15:31:16 -0600
Subject: [Beowulf] getting kubuntu to perform as a cluster os
In-Reply-To: <a31cd3860802100043k3695dc28j19cdee81090fc02a@mail.gmail.com>
References: <a31cd3860802060802n1720061aha28f73e7ab8adf82@mail.gmail.com>	
	<47AE51D0.4040008@gmail.com>
	<a31cd3860802100043k3695dc28j19cdee81090fc02a@mail.gmail.com>
Message-ID: <47AF6D24.7010106@gmail.com>

Jon Aquilina wrote:
> i would use the server but im not extremely versed with command line
> commands except the simple sudo apt-get install update upgrade
> dist-upgrade auto clean, etc

Hmm... I would suspect some command line is going to be required no
matter what. I would recommend taking some time to beef up your skills
in that area.

http://www.linuxcommand.org/learning_the_shell.php

However, it should be possible to run (for example) any *Ubuntu with a
server kernel if that increases your comfort.

BTW, feel free to contact me off list if you need any command line pointers.

-- 
Geoffrey D. Jacobs

To have no errors
  would be life without meaning
  No struggle, no joy


From Craig.Tierney at noaa.gov  Mon Feb 11 07:10:36 2008
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Mon, 11 Feb 2008 08:10:36 -0700
Subject: [Beowulf] Infiniband and multi-cpu configuration
In-Reply-To: <1202488214.10138.289.camel@qeldroma.cttc.org>
References: <1202488214.10138.289.camel@qeldroma.cttc.org>
Message-ID: <47B0656C.8080802@noaa.gov>

Guillaume Michal wrote:
 > Hi all,
 > We set up our first cluster in our faculty this week. As we are new to
 > cluster computing, there is a lot to learn. We performed some Linpack tests
 > using the OpenMPI benchmark available in the Rocks 4.3 distribution. The
 > system is as follows:
 >  - Gigabit ethernet with an HP ProCurve 2800 series switch
 >  - 1 master node: 500GB SATA HDD, two Intel quad-core E5410 at 2.33GHz, 2GB mem
 >  - 4 nodes each having: 80GB SATA HDD, two Intel quad-core E5410 at 2.33GHz, 8GB mem
 >
 > First, I'm a bit confused by the parameters P and Q in HPL.dat and how to
 > use them properly. I noticed that a 4P 2Q test is not equivalent to a 2P 4Q
 > one; generally speaking, they do not commute. Why? What exactly are P and Q,
 > then: P for the number of processors per node and Q for the number of nodes?
 >

Visualize the problem as a big 2D matrix.  P and Q are the rows and columns
of the process grid that the matrix is divided over, so P*Q is the number of
MPI processes.  In general, performance is best when the grid is as close to
square as possible.  If your core count isn't a perfect square (n^2), then P
and Q have to be different.  From experience, P should always be less than Q.
There may be a computational reason for that (e.g., longer strides in
memory), but I am not sure.
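
As a rough sketch of that rule of thumb (Python; the function name
grid_candidates is made up for illustration and is not part of HPL), one can
list every P x Q factorization of the process count with P <= Q and pick the
most nearly square one:

```python
def grid_candidates(nprocs):
    """Return all (P, Q) pairs with P * Q == nprocs and P <= Q,
    ordered from the most elongated grid to the most nearly square."""
    pairs = []
    p = 1
    while p * p <= nprocs:
        if nprocs % p == 0:
            pairs.append((p, nprocs // p))
        p += 1
    return pairs


# 32 MPI processes, one per core, as in the run quoted above.
candidates = grid_candidates(32)
print(candidates)       # [(1, 32), (2, 16), (4, 8)]
print(candidates[-1])   # (4, 8): the most nearly square grid with P <= Q
```

By this reasoning the P=8 Q=4 run is probably worth repeating as P=4 Q=8.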


 > Secondly, what is the definition of processor for a quad core architecture? I suppose a quad core should be counted as 4 processors.

Yes, unless you are using a multithreaded BLAS library.  If you are,
you should have each node be 1 process.

 >
 > I launched Linpack using Ns=10000 and various configurations for P and Q.
 > At the moment I get a maximum of 78 Gflops using P=8 Q=4 -> 32 processors.

You want to use as much available memory as possible.  I use N=10000 on a
single processor, single core run with 1GB.   You can figure out a good
value of N by the following formula:

Ns=sqrt(<Memory in Bytes per core>*<Number of cores>/8)

The 8 represents the size of a double.  For <Memory in Bytes per core>, I try
to use the largest number possible, typically about 90% of max.  You never
want to go into swap during these calculations (or, have it crash because
you have diskless nodes).

Ex: If you have 2GB per core on 32 cores and budget 1900MB of each, you should use Ns as:

Ns=sqrt(1900*1024*1024*32/8)
Ns=89270

Honestly, this may be overkill.  At some point, the working memory set will
be large enough so that FP performance will be the bottleneck.  I would
start with smaller numbers (say half) and work your way up to understand
what is going on.  In any case, using Ns=10000 is way too small.
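
The Ns formula can be checked with a few lines of Python.  This is only a
sketch of the arithmetic; the function name hpl_problem_size is an
illustrative choice, and the 1900MB-per-core figure is the one from the
example, not a measured value:

```python
import math

BYTES_PER_DOUBLE = 8


def hpl_problem_size(mem_bytes_per_core, cores):
    """Largest N such that an N x N matrix of doubles fits in the given memory."""
    total_doubles = (mem_bytes_per_core * cores) // BYTES_PER_DOUBLE
    return math.isqrt(total_doubles)


# The worked example: 1900 MB usable per core on 32 cores.
full = hpl_problem_size(1900 * 1024 * 1024, 32)
print(full)        # 89270
print(full // 2)   # 44635, the "start with half" value
```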

 >
 > If I'm right the peak performance should be Rpeak = 4 cores x 4 floating
 > point ops per cycle x 2.33 GHz x 8 quad cores = 298 Gflops,
 > which would mean the test is running at ~25% of Rpeak.
 >
 > This is very low and I see 3 possible causes for the problem:
 >     - I miscalculated Rpeak
 >     - P and Q are not set properly
 >     - there is a serious bottleneck
 >

I think your Rpeak calculation is correct (not sure how many FPs the latest
Intel chips can do).
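
For the record, here is the same arithmetic as a small Python sketch.  It
takes the 4 floating point ops per cycle from the quoted message as given (I
have not verified it for this chip) and counts only the 4 compute nodes,
since the run used 32 processes:

```python
def rpeak_gflops(cores, flops_per_cycle, clock_ghz):
    """Theoretical peak in Gflops: cores x flops per cycle x clock in GHz."""
    return cores * flops_per_cycle * clock_ghz


cores = 4 * 2 * 4   # 4 compute nodes x 2 sockets x 4 cores per socket
rpeak = rpeak_gflops(cores, 4, 2.33)
efficiency = 78.0 / rpeak   # 78 Gflops measured in the quoted run

print(f"Rpeak = {rpeak:.1f} Gflops")      # Rpeak = 298.2 Gflops
print(f"efficiency = {efficiency:.0%}")   # efficiency = 26%
```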

If increasing Ns doesn't help, run smaller cases on a per-node basis (using
all available memory for each node).  If you don't get the same performance
on every node (or at least within 2%), you have a problem.  Figure out
what is wrong with the slow nodes.  Also, run the test multiple times
on the same node and verify consistent performance.
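
A minimal sketch of that per-node check in Python follows.  The node names
and Gflops figures are invented purely for illustration, and slow_nodes is a
hypothetical helper, not part of any benchmark suite:

```python
def slow_nodes(gflops_by_node, tolerance=0.02):
    """Return the nodes whose result is more than `tolerance` below the best node."""
    best = max(gflops_by_node.values())
    return sorted(
        node for node, gflops in gflops_by_node.items()
        if gflops < best * (1.0 - tolerance)
    )


# Hypothetical single-node HPL results in Gflops (made-up numbers and names).
results = {
    "compute-0-0": 60.1,
    "compute-0-1": 59.8,
    "compute-0-2": 52.3,
    "compute-0-3": 60.0,
}
print(slow_nodes(results))   # ['compute-0-2']
```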

Craig


 > Thanks for your advice
 >
 > Guillaume


-- 
Craig Tierney (craig.tierney at noaa.gov)


From raysonlogin at gmail.com  Tue Feb 12 10:23:15 2008
From: raysonlogin at gmail.com (Rayson Ho)
Date: Tue, 12 Feb 2008 13:23:15 -0500
Subject: [Beowulf] Lustre is now more opensource than before Sun's
	acquisition
Message-ID: <73a01bf20802121023w70f73178j418dba9efa286598@mail.gmail.com>

Besides making the cvs accessible on the Internet, Sun also released
the design documents.

See the forwarded mail below...

Rayson


---------- Forwarded message ----------
From: Peter Bojanic <Peter.Bojanic at sun.com>
Date: Feb 6, 2008 5:01 PM
Subject: [Lustre-announce] Lustre developer documentation now available
To: lustre-announce at lists.lustre.org, lustre-discuss at lists.lustre.org,
lustre-devel at lists.lustre.org

Dear Lustre Community,

The Lustre Group at Sun is pleased to announce the availability of
detailed technical documentation for Lustre.

This material has existed for some time but was closely guarded
intellectual property by the former CFS. In the spirit of making
Lustre continually more open, we have been encouraged by Sun execs to
make this valuable information available to the Lustre community.

Lustre High Level Designs (HLD):
http://arch.lustre.org/index.php?title=LustreHLDs

Lustre Detailed Level Designs (DLD):
http://arch.lustre.org/index.php?title=LustreDLDs

Lustre Internals: http://wiki.lustre.org/index.php?title=Lustre_Internals

Lustre HLDs and DLDs are available for download from the CVS
repository in their original Lyx format. The HLD and DLD wiki pages
include CVS download instructions. Additionally, each wiki page has
links for the most current design documents available in PDF format.
For each PDF, the author and date are listed, along with a synopsis.
Over the coming days, we'll continue to make more of the 2008, 2007
and 2006 HLDs and DLDs available as PDFs on the wiki pages.

Note that some of this documentation may be out of date, incomplete,
or not yet implemented in Lustre. But it still provides useful
reference material for understanding Lustre.

The Lustre Internals course was taught a few times in the past year by
Peter Braam. We've decided that it represents more of a useful
knowledge resource than course material.

It is our hope that this documentation will contribute to nurturing a
stronger and more vibrant development community for Lustre.

Cheers,
Bojanic
_______________________________________________
Lustre-announce mailing list
Lustre-announce at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-announce


From rgb at phy.duke.edu  Wed Feb 13 07:45:59 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 13 Feb 2008 10:45:59 -0500 (EST)
Subject: [Beowulf] Setting up a new Beowulf cluster
In-Reply-To: <5721d9d70802081333l765cac34of24ec269d5a9e01@mail.gmail.com>
References: <5721d9d70802070745v1820b64ey671039f48b3dd5a@mail.gmail.com> 
	<Pine.LNX.4.64.0802081101580.23523@cain.rgb.private.net>
	<5721d9d70802081333l765cac34of24ec269d5a9e01@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802130750270.24493@cain.rgb.private.net>

On Fri, 8 Feb 2008, Berkley Starks wrote:

> Thank you all so much for the advice so far.  This has helped me see a few
> more of the things that I did not realize at first.
>
> For a little info on the project, I developed this project as a tool to work
> on my Senior Thesis in a year or so.  Doing computational nuclear physics
> requires such resources.  It will also be used heavily for Monte Carlo
> Simulations and just about any other form of computational physics.  The two
> named are definite projects that are already on the line up for when I do
> get the cluster up and functional.

(Sorry about the delay, I'm busy busy busy:-)

OK, this (and the stuff below) makes your job relatively easy.  I'm
going to guess that your application mix will almost certainly be
"embarrassingly parallel" at least at first -- lots of compute nodes
running MC simulations in nuclear physics (a situation we also have here
at Duke) plus people running random applications of one sort or another
in a sort of "compute farm" way.  After you've had it for a few years,
you'll probably start to develop at least a few "real parallel"
applications, so we'll use a design that can segue into that, but to do
that "right" you'll have to deliberately engineer the cluster to fit the
task and will need an actual budget.

You'll need an actual budget to get started here, too, especially if you
want to build a cluster that is actually "useful".  Here's the math.
According to Moore's Law (a scaling law for computing performance at
constant cost that has functioned at least approximately well for close
to maybe 45 years) compute power at constant cost has doubled roughly
every 18 months.  That means that four year old machines, by the time
you get them, will be roughly 2^3 = 8 times slower (call four years three
doublings) than a brand new machine that costs just as much as they
cost when they were new.  Since machines --
amazingly powerful machines, like dual processor dual core 64 bit CPU
machines -- can be purchased for (say) $2000 give or take a bit
depending on what precisely you get on them and might be MORE than 8x
faster than an old 32-bit P6 machine, you're going to have the paradox
that some faculty desktops will be faster than your entire cluster.
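
The Moore's Law arithmetic above, as a quick Python sketch (the function name
moore_slowdown is just for illustration, and the 18 month doubling time is
the rule of thumb quoted above, not a measured constant).  Four years works
out to a bit under three doublings, so 8x is the rounded-up figure:

```python
def moore_slowdown(age_years, doubling_months=18.0):
    """Factor by which a machine of the given age trails a new one at equal cost."""
    doublings = age_years * 12.0 / doubling_months
    return 2.0 ** doublings


print(round(moore_slowdown(4.0), 1))   # 6.3
print(moore_slowdown(4.5))             # 8.0, i.e. three full doublings
```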

To put it another way, while using old machines is fine for making a
learning cluster, it's going to suck in production, with a lot of work
and investment required to get to where you could go far more easily by
buying a single new desktop at modest cost.

The design I'm going to suggest for you (Geeze, I feel like Clinton on
What Not to Wear) is a tasteful cluster, one that is initially
surprisingly affordable as it gives you the opportunity to learn about
clustering and provide your nuclear group with "a" place to run jobs,
yet it can grow and change as your needs (and budget!) grow and change.

Let's budget it out.

Your cluster will need a home, and there are good homes and not so good
homes depending on its scale.  Close to networking is good.  In a rack
is great, although you can certainly get started on heavy duty steel
shelving.  On a floor that is rated to support the weight of your
growing stack of hardware is key -- a fully loaded rack can be quite
heavy, and nothing ruins your day like having a rackful of expensive
hardware crash through the floor to land on the head of somebody one
floor down (or worse, break all the way through in a cascade effect down
to the basement).  Hard on head, and likely to break all that expensive
equipment.  Oh, and the building.  Did I mention the lawsuits?

The three critical components required by your cluster in its physical
home are power and cooling and a network or network access.  A "box" --
a cluster node containing one or more processors -- typically draws
anywhere from 100W minimum to around 250W, depending on how many processors
it has, how much memory, and whether or not it has disk(s) or other
peripherals.  This is rule of thumb, YMMV.  At one point I would have
estimated 100W per CPU, but nowadays I think it is probably down to more
like 50-60W per CPU core (anybody have current numbers on actual
hardware to contribute?).  If we assume that you'll get started with a
humble 8 contributed ancient P4's at 125W each, that's a kilowatt right
there.  Add networking, add disks, add monitor(s), add a separate
server, and you're right up at the limit of a standard 20 ampere
circuit.  This means that you will need AT LEAST one dedicated 20 Amp
120VAC circuit to run your cluster, and will need ADDITIONAL circuits as
your cluster grows.  They don't all have to be handy when you get
started, but if you try to put the cluster onto an existing, already
half-loaded circuit it's going to trip breakers when you first power it
on and that's embarrassing so think ahead.
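
As a back-of-the-envelope check of the power budget, here is a Python sketch.
The 125W per node and the 20 Amp, 120VAC circuit come from the text above;
the 600W allowance for switch, disks, monitor and server is an assumed
placeholder, not a measurement:

```python
def circuit_capacity_watts(amps=20.0, volts=120.0):
    """Nominal capacity of a circuit in watts."""
    return amps * volts


def cluster_load_watts(nodes, watts_per_node, extras_watts=0.0):
    """Total draw of the nodes plus any extra equipment."""
    return nodes * watts_per_node + extras_watts


capacity = circuit_capacity_watts()          # 2400.0
nodes_only = cluster_load_watts(8, 125.0)    # 1000.0
# Assumed 600 W for switch, disks, monitor and server (placeholder value).
with_extras = cluster_load_watts(8, 125.0, extras_watts=600.0)

print(capacity, nodes_only, with_extras)                # 2400.0 1000.0 1600.0
print(f"{with_extras / capacity:.0%} of the circuit")   # 67% of the circuit
```

Doubling the node count to 16 under the same assumptions would already need
2600W, which is why additional circuits have to be planned for growth.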

As a physics student, you will recall thermodynamics.  All the power
consumed by cluster nodes appears shortly thereafter in their immediate
environment as heat.  If you remove the heat as fast as it is generated,
the environment (and nodes) remain at a constant temperature.  If not,
it gets hotter until thermal diffusion through e.g. the walls of the
space balance it out.  Computers HATE to be hot.  They express their
irritation by breaking, burning out early, actually malfunctioning and
throwing bit errors that ruin a computation.  We want our cluster to be
cool and happy and last a long time and run reliably, so we want our
cluster space to be anywhere from cool to COLD.  The rule of thumb is that
computer components lose a year of expected lifetime for every 10 degrees
Fahrenheit above an ambient air temperature of 68F (20C), which is a
"cool" temperature for an office.  60F is better still -- most server
rooms are maintained with ambient air temperature as cool as 50F (10C),
more likely ballpark 60F under load.  Air conditioning capacity is
measured in "tons", where a ton of AC is a unit capable of removing the
heat required to transform a ton-sized block of ice from ice into water
at 32F over 24 hours (latent heat of fusion, work it out), which just
happens to be ~3500 watts.  You want to be able to stay AHEAD of the heat
and also remove heat infiltrating from outside, so you'll need more (25-35%
more) AC capacity in watts than you have power capacity in watts.  You also
need to worry about air circulation, especially if you're building the
cluster in e.g.  a closet (NOT recommended).  A big open room gets a bit
of convective help and is better than a small closed space.  The air
should be and remain dry.
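
To turn the cooling rule of thumb into numbers, here is a small Python
sketch.  It uses 3517W per ton (12,000 BTU/h, the ~3500 watts mentioned
above) and a 30% headroom picked from the middle of the 25-35% range; the
function name ac_tons_needed is only illustrative:

```python
WATTS_PER_TON = 3517.0   # 12,000 BTU/h; rounded to ~3500 W in the text


def ac_tons_needed(load_watts, headroom=0.30):
    """Tons of AC needed to remove load_watts of heat plus fractional headroom."""
    return load_watts * (1.0 + headroom) / WATTS_PER_TON


# One fully loaded 20 Amp, 120VAC circuit is 2400 W of heat to remove.
print(round(ac_tons_needed(2400.0), 2))   # 0.89
```

So each fully loaded 20 Amp circuit wants a bit under one ton of AC to itself.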

Then the space needs networking.  There are two aspects of this to
consider, and they're not separate for the cluster design I'm going to
suggest.  One is the network required by the actual cluster nodes, which
communicate with each other (if needed) and the "master" node and other
workstations in the department (certainly) via network interconnects.
The other is the network connection to the rest of the department -- how
are people going to use the cluster?  It is by far the easiest if they
can just start jobs up from their desktops, which means everything needs
to be on the same network.  Minimally, then, the cluster space needs
>>a<< network wire running into it from the building networking closet
and connected to its presumed switch.  Beyond that, there are several
ways to proceed, depending on local politics, who provides what, who
"owns and runs" what, and practical considerations.

For example, one scenario is that you upgrade the existing building
networking closet by adding a 48 port professional-grade gigabit
ethernet switch that is uplinked into the existing possibly slower
department switch.  A nice fat bundle of cable is run from the
punchblocks in this closet back to your cluster space, and punched into
a panel of RJ45 ports in a rack in your cluster space.  As you add
nodes, you simply cable them into this rack and add cables in the wiring
closet from the punch port back to the switch.  This has many advantages
-- one being that you can hook (selected) faculty or office DESKTOPS
into the gigabit switch so they are on the same flat network,
effectively INCLUDING THEM IN THE CLUSTER.  Since some of your faculty
-- the ones doing the MC computations, for example -- will have power
desktops that might equal or exceed the power of your initial cluster,
this gives EVERYBODY potential access to all of that power if you
establish a resource-sharing policy, and can make your initial cluster
3-4x as powerful as it might otherwise be quite easily, especially if
you have spare cycles you can salvage on e.g. student clusters that are
idle all night.

Another scenario is that you get a single smaller gigabit switch for
your cluster, mount it in the rack or on the shelf of that cluster, and
have a single gigabit link back to the department network.  This gives
you a bottleneck between the faculty desktops and the cluster, but for
embarrassingly parallel code it won't matter. I'm guessing this is the
way you will go initially, and you can always change over later, but
SOMETIMES if you dicker things like the former out now, you can get
other people to pay for them and end up with something really nice and
scalable for the future, or at least grease the way for later when you
need to go back and say you've outgrown the first effort and need to
reconsider.  IF you ever get to where a higher end network is necessary
-- a "real" dedicated cluster network -- you'll probably need to use the
local switch architecture anyway, although you might well have both that
network and whatever switched gigE/TCP/IP network you started with at
the same time.

Anyway, enough on infrastructure.  Let's talk about the cluster and what
you'll need to acquire or budget.  I truly think that you're going to
need a budget of a few thousand dollars even to get started, although if
you can't get even that little an amount, well, we'll do what we can.

               Cheapest Possible 2-8 Box Learning Cluster

Ingredients:  Heavy duty steel shelving ($50 at Home Depot).  8-10 port
gigabit switch ($50 from numerous makers and vendors).  16 6' to 14'
patch cables, cable ties, 2 surge protector power strips, small work
table/bench, work chair -- scrounged if possible, $250 would buy it all
and a nice little pocket toolkit as well.

You will need a monitor and keyboard (and possibly a mouse) on the
workbench and connectable to the backs of each node on demand.
Scrounging is OK, you can get a nice flat panel that draws less power
(and makes less heat) and is a lot easier to move around for around
$200-250, a whole KVM setup for easily less than $300.  You may want to
consider getting a small KVM switch to make it "easy" to switch between
consoles on nodes but this is a luxury item and really belongs in the
next description instead.

For nodes you take what you can scrounge and augment them by buying what
you can afford.  You should be prepared to repair nodes, buy gigabit
ethernet cards for nodes, and add memory or a disk to nodes, at cost or
from a "boneyard" of scavenged parts from systems that are DOA but have
usable memory chips or CPUs or power supplies that still work.  Still,
I'm guessing you'll need a few hundred dollars absolute minimum in a
budget to get started.  Your "free" nodes will only rarely turn out to
really be free; more often you'll have to drop maybe $50 into them to
add memory and networking (again, this cost and the differential cost of
power alone favors BUYING brand new nodes over fixing up old nodes --
THERE IS NO PRICE-PERFORMANCE WIN in going cheap, for all that it is
very informative and a great learning experience).

One node you will almost certainly want to buy, or build out of the best
of what you can scrounge.  This is your cluster's "head", or "server"
node.  I'm going to suggest a flat cluster design, so the latter is a
more reasonable description.  This is a machine you fix up or purchase
with:

   * lots of memory, 1-2 GB if possible.
   * multiple CPUs or CPU cores.  2-4 if possible.
   * a "good" e.g. Intel gigabit ethernet interface, or even two.
   * 3-4 largish disks, configured in an md raid level 5.
   * a "good" graphics adapter -- one capable of running a graphical
display efficiently and at a decent resolution (which should of course
match up decently with the capability of your monitor, which I suggest
be capable of at least 1280x1024 and at least 17" diagonal).

This machine is the one that you set up with a full linux desktop and an
NFS exportable filesystem for /home and/or workspace on all of the
nodes.  It MAY end up being a DHCP/PXE server (which may require that it
be on a private network in order not to fight with departmental servers
which in turn may require that it have that second ethernet interface),
a web server (to facilitate HTTP-driven PXE installs), a diskless node
server (if you go with a diskless node design to save money and power at
the expense of a somewhat steeper initial learning curve).  In
master-slave computations it will likely be the master.  In computations
run in "batches" it will be the place those jobs are submitted, and the
place users will visit to retrieve results.  It will be the node you
"name" for the cluster (usually) where the nodes will usually have
abbreviated hostnames like b01, b02, b03...
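
A naming scheme like b01, b02, b03 is easy to generate rather than
type.  The sketch below builds the names and matching hosts-file
lines.  The 192.168.1.x private subnet, the starting address of 101,
and the function names are assumptions for illustration only; use
whatever addressing your department allows.

```python
# Sketch: generate short node names and hosts-file style lines.
def node_names(count, prefix="b", width=2):
    """Return names such as b01, b02, ... for `count` nodes."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


def hosts_lines(count, subnet="192.168.1", first_host=101):
    """Pair each node name with an address on an assumed private subnet."""
    names = node_names(count)
    return [f"{subnet}.{first_host + i}\t{name}" for i, name in enumerate(names)]


if __name__ == "__main__":
    for line in hosts_lines(8):
        print(line)
```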

I would budget a MINIMUM of $1500 for this node, purchased new, $2000
would be better.  If you rebuild out of parts, you'll need to scrounge
an old system with a big enough tower to be able to hold 3-4 disks
(usually a mid-size tower will be a bit tight) with as fast a CPU as you
can manage and as much memory as you can afford to add and with 1-2 gigE
interfaces.  I am not including backup devices in this cluster design --
too expensive.

This gives you (tallying things up) the need for at absolute minimum a
budget of $1000-$1500, which presumes that you scrounge nearly
everything but still need to buy disks, memory, spare parts, network
switch, with a bit leftover to handle server crashes and make life
comfortable.  You'd do far better with a budget of $3500, buying
yourself a nice server/head node, setting up a nice working environment
and a much larger network switch from the beginning and still having
$1000 or so to fix up scrounged nodes.

NOTE WELL!  As noted above, if your cluster is "flat" with the
department (linux) network, you can easily enough make your scheduler
distribute jobs out to individual (linux) desktops and include them "in"
your cluster using e.g. Condor as a resource manager.  In fact, you can
make a "cluster" out of your existing linux LAN at no investment but
time and software configuration IF your department policy and so on
permit it. It often depends on who "owns" those desktops and what they
get out of it -- linux is perfectly capable of running a desktop
interactive session with somebody AND a background numerical task with
essentially no impact of the latter on the former -- desktop computing
rarely uses as much as 1% of a system's total compute capacity.

On to what I think of as a "better idea"

               Inexpensive Starter Cluster with a Future

The good thing about the cluster above is that it is cheap.  Oh, you can
go even cheaper.  Take two systems, slap linux on them, pop them on any
old network and it is "a cluster" in that you can run computations on
both at the same time and add more nodes when you find them.  Or just
look at your department linux lan, enable logins for all users on all
desktops, establish a policy for use or install a policy tool like
condor and go "poof!  you're a cluster!".  That's a description of my
own home cluster -- a flat switched network with lots of linux boxes
that are "a cluster" when I want them to be and desktops the rest of the
time, where I don't even bother with Condor (ownership being clear,
policy unneeded).

However, the BAD thing about it is that cheap as it is, anything built
with 4 year old hardware is a loser right out of the box.  Seriously.
The differential cost of POWER ALONE over a single year will generally
buy you a single modern system that is as fast or faster than the entire
cluster.  That's the bitchin' thing about Moore's Law -- there is no
sane afterlife for systems because it gets to where the cost of
operation alone exceeds the cost of replacement, and then we can do all
sorts of TCO computations and assessments of the cost of maintenance and
conclude that it is really really dumb to do this UNLESS other people
will pay for power no matter how much you use but not give you the money
to buy nodes.  Which happens so often that it isn't funny, but it is
stupid nevertheless.  Or for student/learning clusters, where you do
what you can and have NO budget but what you can raise at a bake sale.
I advise 2-3 students a year in that category, so I'm pretty sympathetic
to it, but I advise them to come up with a few thou a year budget
nevertheless.

So here's a "better" design.  It costs more initially, but it will scale
nicely out to racks and racks of systems, and the systems you get will
always be boxes that your nuclear faculty will drool over and WANT to
run their jobs on -- so much so that initially they'll fight to get
time, and be properly motivated to write grants with an equipment budget
that contributes a few nodes a year or more to your collection.

Start by buying a nice, 43U, four post, open equipment rack.  IIRC you
can get one for around $400 that will work just fine (don't get $1000+
ones with glass doors and whatever -- you're not made of money after
all).  Get a nice 48 port professional-grade rackmount gigabit ethernet
switch for maybe $800.  Get a few packets of ethernet cables, different
colors, in lengths from 6' to 14', velcro cable ties, maybe a rackmount
power distribution system (not necessarily a "UPS", mind you), cable
holders -- enough stuff to outfit your rack so that it can be kept
"pretty" -- and easy to maintain.  This might cost another $500.

Into this put a nice rackmount raid system "head node" with maybe a TB
of storage capacity and a BACKUP system -- a tape library.  Initially
you can "get by" with what amounts to an enhanced node with four disks
and no backup, but you'll have to warn your users that there is no
backup and that they are responsible for securing, copying, mirroring
their own valuable data elsewhere.  Backup is expensive (which sucks)
but for a professional operation it is obviously essential.  I'd budget
a MINIMUM of $2500 for the disk server alone, $4000-5000 for disk server
plus backup.  These numbers are starting to get really soggy -- you'd
best get real quotes for exactly what you want to START with, then go
find the money and not the other way around lest you end up short!

Nodes are then added to the limit of your budget, ideally in a standard
form.  These days I'd recommend dual processor quad core nodes for
CPU-bound Monte Carlo computations, dual-duals for codes that do a lot
of vector algebra, and possibly plain old dual processor nodes if you
have jobs that are REALLY memory bound to where even dual cores start to
collide (YMMV very much here, be warned).  dual-quads will get you
optimum raw compute capacity per dollar, though, I think, and sound
ideal for your expected initial task mix.  Outfit the nodes with at
LEAST 1 GB per core, 2 GB is better.  Any nodes you buy in this way will
have 2 gigE interfaces integrated on the motherboard, which is fine.

Try to get 3 to 4 year onsite service contracts on all "critical"
electronic hardware you buy, from the switch on down.  As noted above, 3
years = "infinity": at the end of this 3 year warranty you'll need to be
looking for replacement hardware in any event, as the cost of powering
any 8 nodes for a year will get really close to the cost of BUYING a
single node that will do the work of the 8 with the power cost of only
one.
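
The replacement argument is easy to put into numbers.  This sketch
uses the rough $1 per watt per year estimate for power plus AC that
this post relies on; the 250 W and 300 W node draws and the $2000
node price are illustrative assumptions, not measurements.

```python
# Sketch: yearly power cost of old nodes versus buying one new node.
POWER_COST_PER_WATT_YEAR = 1.0   # rough estimate for power plus AC, in dollars


def yearly_power_cost(nodes, watts_per_node):
    """Dollars per year to power and cool `nodes` machines."""
    return nodes * watts_per_node * POWER_COST_PER_WATT_YEAR


def payback_years(old_nodes, old_watts, new_price, new_watts):
    """Years until power savings cover the price of one replacement node."""
    savings = yearly_power_cost(old_nodes, old_watts) - yearly_power_cost(1, new_watts)
    if savings <= 0:
        return float("inf")   # the replacement never pays for itself
    return new_price / savings


if __name__ == "__main__":
    print(yearly_power_cost(8, 250.0))             # cost of keeping 8 old nodes on
    print(payback_years(8, 250.0, 2000.0, 300.0))  # years to recover the purchase
```

With those assumed figures, eight old nodes cost about as much to run
for a year as one new node costs to buy.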

Node prices, including warranty, will then range from as low as a bit
under $2000 to $4000 depending on memory, number of cores and so on.
Avoid bleeding edge processor clocks for YOUR starter cluster -- look
for the sweet spot in CPU clock (aggregate cycles) per dollar spent,
usually the second or third cheapest available CPU in any given
configuration (bearing in mind the TOTAL SYSTEM price, not just raw CPU
price in your cost-benefit estimates).
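
One way to find that sweet spot is to rank quotes by aggregate clock
per TOTAL system dollar.  The price list below is entirely
hypothetical (made-up clocks and prices for an 8-core node); plug in
real quotes before drawing any conclusions.

```python
# Sketch: pick the configuration with the most aggregate GHz per dollar.
# Each option is (label, cores, clock_ghz, total_system_price_dollars).
HYPOTHETICAL_QUOTES = [
    ("2.0 GHz", 8, 2.00, 2000.0),
    ("2.33 GHz", 8, 2.33, 2200.0),
    ("2.66 GHz", 8, 2.66, 2600.0),
    ("3.0 GHz", 8, 3.00, 3400.0),
]


def ghz_per_dollar(option):
    """Aggregate cycles (cores times clock) per dollar of total system price."""
    _label, cores, clock_ghz, price = option
    return cores * clock_ghz / price


def best_value(options):
    """Return the option with the highest aggregate GHz per dollar."""
    return max(options, key=ghz_per_dollar)


if __name__ == "__main__":
    for opt in HYPOTHETICAL_QUOTES:
        print(opt[0], round(ghz_per_dollar(opt) * 1000.0, 2), "MHz per dollar")
    print("best:", best_value(HYPOTHETICAL_QUOTES)[0])
```

With these made-up numbers the second cheapest part wins, but real
price lists will move around.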

Going this route, $1000 for rack plus accessories, $2500 for a head
node, $2000 for a single worker node, $500 for error in my seat of the
pants estimates and miscellaneous stuff -- you can "get started" with at
least 4, maybe 8 >>modern<< (64 bit, uberfast) CPU cores for around
$5000, get started with backup for around $7000, get started nicely with
as many as 24 CPU cores for maybe $12,000.  Which is still, believe it
or not, chickenfeed in the research business.

This design scales beautifully.  Go to your nuclear groups, pass the
hat.  Offer them free room in the rack, access to server and switch and
backup (all paid for by the department, the university, a startup grant,
whatever) if they pony up $2000-4000 for N-core nodes that are selected
from the following list, with mandatory onsite service contracts.
They'll jump at the chance -- they'd have to spend twice as much to get
the same capacity, since THEY'D have to provide access to a server, AC,
power, infrastructure, management.  Point out that with lots of
participants, they can share resources -- everybody individually will
have down time when they're writing papers, are out of town, on vacation
-- and they can trade access to their nodes when they're not using them
to others in return for the same favor another day.  So if they buy a
node with 8 cores in the rack and so do three other groups, there might
come a day when they can use all 32 processors in a pinch to finish off
a paper before a deadline.

It is also easy to write proposals for.  Any of your groups can write or
add to a proposal a budget for N nodes that fit in the existing rack.
University cost-sharing is manifest, resources are well-leveraged,
funding is likely.  With a full-height rack, you can add as many as 40
1U nodes to 3-5U of switches and servers, on a floor that can hold 1 ton
per square meter, in a space that can provide 8-10 KW and 4 tons of AC
(per filled rack).
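
As a sanity check on the per-rack numbers, the sketch below adds up a
full rack.  The 225 W per node, 500 W for switches and servers, 30% AC
headroom, and 3517 W per ton of AC are all assumptions for
illustration.

```python
# Sketch: power and AC needed by one filled rack.
WATTS_PER_TON_AC = 3517.0   # assumed conversion, roughly one ton of AC


def rack_requirements(nodes, watts_per_node, overhead_watts=500.0, ac_headroom=0.30):
    """Return total power in KW and AC in tons for one rack."""
    power_watts = nodes * watts_per_node + overhead_watts
    ac_tons = power_watts * (1.0 + ac_headroom) / WATTS_PER_TON_AC
    return {"power_kw": power_watts / 1000.0, "ac_tons": ac_tons}


if __name__ == "__main__":
    print(rack_requirements(40, 225.0))
```

Forty such nodes come to 9.5 KW and about 3.5 tons of AC, which sits
inside the 8-10 KW and 4 ton figures above.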

THIS sort of design can scale right on out of your department.
Chemistry may want to play.  So might engineering.  Even economics does
large scale computations nowadays.  You might find yourself setting up
and filling a cluster room with multiple racks, wall-sized Liebert ACs,
and so on.

Or anyway, you can at least dream...;-)

Obviously I favor this approach if you can finagle the minimum $5K
buy-in, STRONGLY favor it if you can scare up $7K or more.  I also tend
to recommend that you look at e.g.  www.penguincomputing.com for
possible nodes, because they are linux-passionate and their AMD opteron
nodes are excellent performers and simply (in my own experience) do not
break.  They'll likely cut you a break of a few percent on a collective
"getting-started" price as well.

When trying to "sell" this approach, point out to the powers that be
that $5K, $10K, $15K is not the real cost of the cluster.  The $1 per
watt per year for power and AC (estimated) is not the real cost either.
The real cost is the human time required to design it, set it up, and
manage it.
That cost is $50K and up per year!  If you're doing this as a project
"for free", they are already getting tens of thousands of dollars of
free resource, which should certainly factor into the leverage required
to pry the money loose to take proper advantage of you!

    rgb

> I want to be able to make the cluster easily expandable, in that I will be
> starting with only a few machines (about 2-8), but will be acquiring more as
> time goes on.  The university that I am attending surpluses out "old"
> machines every 4 years, and we have set up a program where we can get a
> percentage of the surplus machines for out cluster.
>
> So, as for size.  Initially it will be a smaller cluster, but will grow as
> time goes on.
>
> Being new to the Beowulf world, I am just mainly looking for some advice as
> to what distro to use (I would never dream of setting up a cluster on
> windows) and if there were any little tricks that weren't mentioned in the
> setup how to guides.
>
> Oh, and I would also like to know if there was a way to set up a task
> priority where if I had only only application running it would use all the
> processors on the cluster, but if I had two tasks sent to the cluster then
> it would split the load between them and run both simultaneously, but still
> using a maximum for the needed processors.
>
> Thanks again so much,
>
> Berkley
>
> On Feb 8, 2008 9:11 AM, Robert G. Brown <rgb at phy.duke.edu> wrote:
>
>> On Thu, 7 Feb 2008, Berkley Starks wrote:
>>
>>> Hello all,
>>>
>>> I've been a computer user for the past several years working in
>> different
>>> areas of the IT world.  I've recently been commissioned by my university
>> to
>>> set up the first operating Beowulf Cluster.
>>>
>>> I'm am moderately familiar with the Linux OS, having ran it for the past
>>> several years using the distro's of Debian, Ubuntu, Fedora Core, and
>>> Mandriva.
>>>
>>> With setting up this new cluster I would like any advice possible on
>> what OS
>>> to use, how to set it up, and any other pertinent information that I
>> might
>>> need.
>>
>> This question has been answered on-list in detail a few zillion times.
>> I'd suggest consulting (in rough order):
>>
>>   a) The list archives (now that you're a member you can get to them,
>> although they are digested and googleable for the most part anyway).
>>
>>   b) Google.  For example, there is a lovely howto here:
>>
>>     http://www.linux.org/docs/ldp/howto/Parallel-Processing-HOWTO.html
>>
>> that is remarkably current and a good quick place to start.
>>
>>   c) Feel free to browse my free online book here:
>>
>>     http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php<http://www.phy.duke.edu/%7Ergb/Beowulf/beowulf_book.php>
>>
>> I'm working on making it paper-printable via lulu, but I need time I
>> don't have and so that project languishes a bit.  You "can" get a paper
>> copy there if you want, but it is pretty much what is on the free
>> website including the holes.
>>
>>> Oh, and the cluster will be used for computational physics.  I am a
>> physics
>>> major making it for the physics department here.  It will need to be
>> able to
>>> use C++ and Fortran at a bare minimum.
>>
>> C, C++ and Fortran are all no problem.  The more important questions
>> are:
>>
>>   a) How coupled are the parallel tasks?  That is, do you want a cluster
>> that can run N independent jobs on N independent nodes (where the jobs
>> don't communicate with each other at all), or do you want a cluster
>> where the N nodes all do work on a common task as part of one massive
>> parallel program?  If the former, you're in luck and cluster design is
>> easy and the cluster purchase will be cheap.
>>
>>   b) If they are coupled, are the tasks "tightly coupled" so each
>> subtask can only advance a little bit before communications are required
>> in order to take the next step?  "Synchronous" so all steps have to be
>> completed on all nodes before any can advance?  Are the messages really
>> big (bandwidth limited) or tiny and frequent (latency limited)?
>>
>> If any of these latter answers are "yes", post a detailed description of
>> the tasks (as best you can) to get some advice on choosing a network, as
>> that's the design parameter that is largely controlled by the answers.
>>
>>    rgb
>>
>>>
>>> Thanks again
>>>
>>
>> --
>> Robert G. Brown                            Phone(cell): 1-919-280-8443
>> Duke University Physics Dept, Box 90305
>> Durham, N.C. 27708-0305
>> Web: http://www.phy.duke.edu/~rgb <http://www.phy.duke.edu/%7Ergb>
>> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php<http://www.phy.duke.edu/%7Ergb/Lilith/Lilith.php>
>> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
>>
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From jlforrest at berkeley.edu  Wed Feb 13 09:58:09 2008
From: jlforrest at berkeley.edu (Jon Forrest)
Date: Wed, 13 Feb 2008 09:58:09 -0800
Subject: [Beowulf] Opinions of  Hyper-threading?
Message-ID: <47B32FB1.60905@berkeley.edu>

I inherited a cluster containing a bunch
of Xeon-based compute nodes. The compute
nodes were configured with hyper-threading
turned on. I'm wondering what you HPC cluster
people think of hyper-threading. I haven't
heard much about it recently since most
modern processors are true multi-core.

The main thing I'd like to know is whether
hyper-threading can do any harm when cpu
bound jobs are run.

Cordially,
-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest at berkeley.edu


From bill at cse.ucdavis.edu  Wed Feb 13 11:00:15 2008
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Wed, 13 Feb 2008 11:00:15 -0800
Subject: [Beowulf] Opinions of  Hyper-threading?
In-Reply-To: <47B32FB1.60905@berkeley.edu>
References: <47B32FB1.60905@berkeley.edu>
Message-ID: <47B33E3F.5000504@cse.ucdavis.edu>


If you don't want handwaving, I'd just test it.  There are jobs that do work 
with HT, and those that don't.

 From the tests I've done it's not particularly reliable.  So the performance 
you get depends on what else the CPU is doing.  So if you have jobs A and B
on a single CPU with 2 HT threads the performance of A and B vary depending on 
the phase of the moon.  So if jobs A and B use 16 CPUs and have to make 
progress in lock step (common in parallel jobs) you get the worst case of
16 CPUs, which is VERY likely to be less than turning HT off.
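
The lock-step effect can be illustrated with a toy simulation: each
step takes as long as the slowest rank, so occasional slowdowns on any
one CPU hurt everybody.  The 10% chance of a slow step and the 1.5x
slowdown are made-up numbers, not measurements of HT.

```python
# Sketch: total run time of a lock-step job where any rank may run slow.
import random


def lockstep_time(ranks, steps, slow_prob, slowdown, seed=0):
    """Sum over steps of the slowest rank's time for that step."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        # Every rank must finish the step before any rank can continue.
        total += max(
            slowdown if rng.random() < slow_prob else 1.0
            for _ in range(ranks)
        )
    return total


if __name__ == "__main__":
    print("1 rank  :", lockstep_time(1, 1000, 0.1, 1.5))
    print("16 ranks:", lockstep_time(16, 1000, 0.1, 1.5))
```

With 16 ranks nearly every step has at least one slow rank, so the
whole job runs close to the worst-case speed.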

I have seen occasional improvements in throughput of 5-10% or so.

So without testing I'd vote turn it off.

The best benchmark is your code.


From mathog at caltech.edu  Wed Feb 13 11:00:46 2008
From: mathog at caltech.edu (David Mathog)
Date: Wed, 13 Feb 2008 11:00:46 -0800
Subject: [Beowulf] Re: Setting up a new Beowulf cluster
Message-ID: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>

"Robert G. Brown" <rgb at phy.duke.edu> wrote:

> Your cluster will need a home, and there are good homes and not
> so good homes depending on its scale.

A cluster in its home will be somebody's neighbor.  It may not be a very
nice neighbor.  In general you do not want to put a rack's worth of
computers in a room which is normally inhabited.  Too much noise.  WAY
too much noise.  Did I mention NOISE?  There is no inexpensive way to
quiet down a rack because to first order sound insulation == heat
insulation.  It can be quiet, or it can be cool, but it is really hard
to do both.  Your best bet is to place large numbers of computers
in a machine room of some sort.  A normal framed wall may be enough to
subdue most of the noise for a smallish machine room, so that the rooms
on either side can be used for normal tasks.  (The low frequencies may
still come through, but the resulting dull rumble is not that
obtrusive.)  The noise could in some cases travel through the 
ventilation to adjoining rooms.  It is easy to test for sound properties
during construction - place a radio in the machine room, turn it to
pure static, and crank the volume up until it is obnoxious.  Then see
what it sounds like outside and next door.

Also buy some hearing protection ear muffs for use in the machine
room.  They are much cheaper than hearing aids.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From doseyg at r-networks.net  Wed Feb 13 12:07:56 2008
From: doseyg at r-networks.net (Glen Dosey)
Date: Wed, 13 Feb 2008 15:07:56 -0500 (EST)
Subject: [Beowulf] Opinions of  Hyper-threading?
In-Reply-To: <47B32FB1.60905@berkeley.edu>
References: <47B32FB1.60905@berkeley.edu>
Message-ID: <34676.155.82.73.253.1202933276.squirrel@www.r-networks.net>

This is just based on my experience, and YMMV. I haven't had to deal with
this in quite a while (thank goodness).

In general, if you can guarantee in some fashion that jobs are limited and
bound to the number of real processors available and not HyperThreads,
then there is some (almost immeasurable?) benefit to leaving it on to help
handle the various OS stuff running in the background. If you have a
scheduler assigning jobs based on how many CPUs a system appears to
have, you will want to turn it off. Basically in my humble opinion, it's
safer and easier to turn off unless you can explicitly state when it will
prove useful and then somehow make it behave in that manner.
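
One possible way to limit a job to real processors on Linux is to pick
a single logical CPU per core from the kernel's sibling lists and set
the CPU affinity to that set.  This is a sketch only: it assumes a
Linux kernel that exposes thread_siblings_list under sysfs and a
Python with os.sched_setaffinity, and the helper names are made up
here.

```python
# Sketch: keep one logical CPU per physical core and bind a process to them.
import glob
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,8' or '0-1' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered hyperthread sibling of every core."""
    return sorted({parse_cpu_list(s)[0] for s in sibling_lists if s.strip()})


def read_sibling_lists():
    """Read each logical CPU's sibling list from sysfs (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return lists


def bind_to_physical_cores(pid=0):
    """Restrict `pid` (0 means this process) to one logical CPU per core."""
    wanted = set(one_cpu_per_core(read_sibling_lists()))
    allowed = wanted & os.sched_getaffinity(pid)
    if allowed:
        os.sched_setaffinity(pid, allowed)
    return sorted(allowed)
```

Calling bind_to_physical_cores() at the start of a job leaves the
sibling threads free for background OS work.  A batch scheduler still
has to be told the real core count separately.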

Here's a relevant link....
http://softwarecommunity.intel.com/articles/eng/2510.htm


> I inherited a cluster containing a bunch
> of Xeon-based compute nodes. The compute
> nodes were configured with hyper-threading
> turned on. I'm wondering what you HPC cluster
> people think of hyper-threading. I haven't
> heard much about it recently since most
> modern processors are true multi-core.
>
> The main thing I'd like to know is whether
> hyper-threading can do any harm when cpu
> bound jobs are run.
>
> Cordially,
> --
> Jon Forrest
> Research Computing Support
> College of Chemistry
> 173 Tan Hall
> University of California Berkeley
> Berkeley, CA
> 94720-1460
> 510-643-1032
> jlforrest at berkeley.edu
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>


From hahn at MCMASTER.CA  Wed Feb 13 13:38:04 2008
From: hahn at MCMASTER.CA (Mark Hahn)
Date: Wed, 13 Feb 2008 16:38:04 -0500 (EST)
Subject: [Beowulf] Opinions of  Hyper-threading?
In-Reply-To: <47B32FB1.60905@berkeley.edu>
References: <47B32FB1.60905@berkeley.edu>
Message-ID: <Pine.LNX.4.64.0802131632260.25631@coffee.psychology.mcmaster.ca>

> turned on. I'm wondering what you HPC cluster
> people think of hyper-threading. I haven't

Intel's P4 hyperthreading was often just hype.
after all, it's really just timeslicing the same CPU 
resources at a finer granularity than normal OS-based preemption.
the chips contain no additional functional units, etc.
when a thread hits a potentially long stall (say, a cache miss)
the scheduler switches to the other thread.

as far as it goes, it makes good sense.  but suppose you have 
two threads, each of whose working set occupies most of the cache.
without HT, each will run until preempted by the OS;
with HT, they'll constantly get reactivated, and both will see 
drastically bad cache hit rates.
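
that cache argument can be shown with a toy model: an LRU cache, two
threads that each sweep a working set of 75% of the cache, and a
switch between threads every slice_len accesses.  the LRU policy, the
strict alternation, and all the sizes are simplifying assumptions; real
hardware is messier.

```python
# Sketch: hit rate of a shared LRU cache under fine vs coarse thread switching.
from collections import OrderedDict


def hit_rate(cache_lines, working_set, slice_len, total_accesses):
    """Two threads sweep their own working sets, switching every slice_len."""
    cache = OrderedDict()
    position = [0, 0]          # next line each thread will touch
    hits = 0
    for n in range(total_accesses):
        thread = (n // slice_len) % 2
        line = (thread, position[thread])
        position[thread] = (position[thread] + 1) % working_set
        if line in cache:
            hits += 1
            cache.move_to_end(line)          # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)    # evict least recently used
    return hits / total_accesses


if __name__ == "__main__":
    print("fine-grained switching:", hit_rate(100, 75, 1, 20000))
    print("long OS time slices   :", hit_rate(100, 75, 7500, 60000))
```

in this model fine-grained switching never hits at all, while long
slices only pay a refill cost at each switch.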

> heard much about it recently since most
> modern processors are true multi-core.

Intel dropped it in the last netburst P4.  they make noises about
bringing some form of SMT back in later/future processors.

> The main thing I'd like to know is whether
> hyper-threading can do any harm when cpu
> bound jobs are run.

sure.  in a meaningful sense, HT will hurt to the extent that 
the job is well-tuned...
if it's not compute-bound, HT won't hurt.  if the job is something
like a compiler, often stalled on cache misses, system throughput 
could very well increase if you overcommit.


From hahn at mcmaster.ca  Wed Feb 13 14:16:55 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 13 Feb 2008 17:16:55 -0500 (EST)
Subject: [Beowulf] Re: Setting up a new Beowulf cluster
In-Reply-To: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
Message-ID: <Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>

>> Your cluster will need a home, and there are good homes and not
>> so good homes depending on its scale.

and design.  a noisy cluster is one that's trying hard to stay cool.
a non-noisy cluster is not trying hard - perhaps because it has 
low-power processors, or because its intake air is cold,
or perhaps it's turned off ;)

there are some cluster nodes that are just unnecessarily noisy - 
they run their fans at 100% all the time, or have too many fans,
or poor design.  in general, you want fewer big fans, preferably 
one large impeller-type fan rather than a dozen tiny muffin fans.
a taller chassis can make it easier to move the requisite air,
but remember that cooling requires moving the air in one side
and out the other.  intra-case circulation is no real help.

the one-side-to-other-side thing is what gives rise to front-to-back 
server chassis and hot/cold-aisle machineroom layouts.  but the 
best machineroom circulation is simple and well-controlled.
don't introduce h/c aisles if you can just have a single row 
with a single hot side and a single cold side.  if you can
arrange for both hot and cold plenums, do, since the goal is to 
control the airflow, not let it do what it wants.  for instance, 
building a physical partition - drapes or even a wall, between
hot and cold regions is a very good thing.  leaving space above
racks that permits hot air to travel to the rack-front is very bad.

> too much noise.  Did I mention NOISE?  There is no inexpensive way to
> quiet down a rack because to first order sound insulation == heat
> insulation.  It can be quiet, or it can be cool, but it is really hard
> to do both.

if you take it as given that nodes are noisy, I suppose.

but it's easy to imagine a large cluster where nodes are actually passively
cooled or have quite slow fans.  but the machineroom would need exceptionally
good partitioning of hot and cold sides, and (if nodes are fanless) would
need some hefty air-moving hardware somewhere.  in this imagined layout,
you'd really have a hot room and a cold room, with the fronts of the racks in
cold, and the backs in hot.

putting a dozen 15k rpm fans in each node is the noisiest possible way
to move the air.  a few very large fans could be quite quiet, and 
not involve much in the way of either sound or heat insulation.
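
a rough way to see why big slow fans win is the textbook fan affinity
laws for geometrically similar fans: airflow scales as speed times
diameter cubed, and sound power changes by about 50*log10(speed ratio)
plus 70*log10(diameter ratio) dB.  these idealized laws are brought in
here as an assumption; real chassis and real fans will differ.

```python
# Sketch: idealized fan-law comparison of a larger fan at equal airflow.
import math


def speed_ratio_for_same_airflow(diameter_ratio):
    """Relative rpm needed to move the same air with a scaled-up fan."""
    return diameter_ratio ** -3


def noise_change_db(diameter_ratio, speed_ratio):
    """Approximate change in sound power level, in dB, per the fan laws."""
    return 70.0 * math.log10(diameter_ratio) + 50.0 * math.log10(speed_ratio)


if __name__ == "__main__":
    d = 2.0                                   # fan twice the diameter
    n = speed_ratio_for_same_airflow(d)
    print("relative speed:", n)
    print("noise change (dB):", round(noise_change_db(d, n), 1))
```

under these assumptions a fan of twice the diameter moves the same air
at one eighth of the speed and is on the order of 20 dB quieter.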

there are some instances of using heatpipes to get heat out of the node
and into chilled water or refrigerant.  it's a little unclear to me why
this is not done more, especially among the blade-loving community.

> A normal framed wall may be enough to
> subdue most of the noise for a smallish machine room, so that the rooms
> on either side can be used for normal tasks.  (The low frequencies may

the wall between my office and our machineroom is duplex, and it 
works well.  the machineroom is pretty noisy, but most of the noise
leakage is from the door.  across the hall is a conference room,
which has the same kind of duplex wall shared with the machineroom, 
and it's quite nice.  we have those 1-sq-ft ceiling tiles on one wall to
deaden the room a little, but that's mainly to improve the teleconf 
acoustics.

I guess most of my comments were for fairly large clusters.  for small
clusters, I'd probably first try to avoid generating much noise.  it still
makes sense to aim for a simple, well-controlled circulation pattern,
but if you can afford the space to use desktop machines, for instance, 
the result might be quieter than 1U screamers.

then again, a quiet cluster is less impressive 
when it comes time to impress the visitors ;)


From rgb at phy.duke.edu  Wed Feb 13 14:43:57 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 13 Feb 2008 17:43:57 -0500 (EST)
Subject: [Beowulf] Re: Setting up a new Beowulf cluster
In-Reply-To: <Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
Message-ID: <Pine.LNX.4.64.0802131741390.4526@cain.rgb.private.net>

On Wed, 13 Feb 2008, Mark Hahn wrote:

> then again, a quiet cluster is less impressive when it comes time to impress 
> the visitors ;)

Although it is a lot easier to talk to those visitors while showing them
around.  Our server room is at infinity decibels all the time, not from
the cluster nodes but from the AC, which sounds like a 747 as it moves
huge amounts of air around.  It would have been lovely to put the unit
in a room of its own next door, but we couldn't take both spaces.  In
our server room, ear protectors aren't really that crazy.

    rgb

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From James.P.Lux at jpl.nasa.gov  Wed Feb 13 15:27:53 2008
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Wed, 13 Feb 2008 15:27:53 -0800
Subject: noise Re: [Beowulf] Re: Setting up a new Beowulf cluster
In-Reply-To: <Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
Message-ID: <6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>

At 02:16 PM 2/13/2008, Mark Hahn wrote:
>>>Your cluster will need a home, and there are good homes and not
>>>so good homes depending on its scale.
>
>
>>too much noise.  Did I mention NOISE?  There is no inexpensive way to
>>quiet down a rack because to first order sound insulation == heat
>>insulation. \

Actually, no.. good acoustic isolation is not good thermal 
isolation.  Sure, things like fiberglass batts provide thermal 
insulation and also (slightly) attenuate high frequencies.

What you want (for acoustics) is loss, mass, 
compliance/springiness.  That is, the sound wave hits something, and 
rather than being transmitted somewhere else, it just dies 
there.  The traditional approach is to build something like a 6" 
thick wall with alternating 4" wide studs (i.e. no direct mechanical 
connection between the panel on one side and the panel on the 
opposite side of the wall).  In between the gap, one hangs thin 
sheets of lead, or plastic loaded with metal filings, bituminous 
rubber (aka Bituthane, a synthetic rubber sheet loaded with asphalt 
for mass) etc.

There are these nifty things called "Z brackets" that are used to 
attach the wallboard to the studs in acoustic isolation walls.  They 
help a lot, because they isolate the vibration of the panel from the stud.
(Google found this: 
http://www.wilrep.com/WilrepDataSheets/noisecontrol/RSIC/Clips.htm)

So, you have acoustic wave on one side hits wall.  Wall is hopefully 
not a stiff diaphragm, but is something moderately compliant, and 
lossy (imagine hanging a carpet up), so its movement dissipates some 
energy.  The back side of the wall transmits an acoustic wave to the 
sheet of lead.  It doesn't displace very far, and is quite soft, so 
it just deforms and gets warm. (one would not, for instance, want a 
thin beryllium or titanium membrane.. things that make good speaker 
cones make terrible acoustic isolators).

And so on.  It's basically a cascade of lossy low pass filters 
(inductance = mass, resistance = mechanical loss, capacitance = 
springiness): you want big L and C (low cutoff frequency) and you 
want big R (lots of loss).
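
A rough way to put numbers on the filter analogy above is the textbook
resonance of a mass on a spring, which plays the role of the L-C cutoff.
The sketch below is only an illustration: the panel masses and the
stiffness are made-up values, not measurements of any real wall.

```python
import math

def cutoff_hz(mass_kg, stiffness_n_per_m):
    """Undamped natural frequency of a mass on a spring, in Hz.

    In the electrical analogy L = mass and C = 1/stiffness (compliance),
    so this is the familiar 1 / (2*pi*sqrt(L*C)).
    """
    compliance = 1.0 / stiffness_n_per_m
    return 1.0 / (2.0 * math.pi * math.sqrt(mass_kg * compliance))

# Hypothetical numbers, for illustration only.
light = cutoff_hz(mass_kg=5.0, stiffness_n_per_m=2.0e5)
heavy = cutoff_hz(mass_kg=20.0, stiffness_n_per_m=2.0e5)
print(f"light panel: {light:.1f} Hz, heavy panel: {heavy:.1f} Hz")
```

Quadrupling the mass halves the cutoff, which is the reason lead sheet
and mass-loaded rubber show up inside these walls.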


Your big sound transmission paths are going to be unavoidable 
mechanical connections (the floor, or sill plate of the wall) and 
unbroken air paths (door gaskets, a/c vents, etc.)

For windows, you use two panes, set at an angle relative to each 
other, of different sizes, in resilient mountings... this reduces the 
coupling from one pane of glass to the other.


>putting a dozen 15k rpm fans in each node is the noisiest possible way
>to move the air.  a few very large fans could be quite quiet, and 
>not involve much in the way of either sound or heat insulation.

Indeed.. the way to move lots of air is big fans turning slowly (and 
think how post-apocalyptic sci-fi movie it looks.. think of movies 
like "Total Recall").  A guideline in the HVAC business is to keep the 
velocity of the air below 1000 feet/min, preferably half that.
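
The velocity guideline turns directly into a minimum duct or opening
size, since area = airflow / velocity.  A minimal sketch, assuming
airflow in cubic feet per minute; the 4000 CFM figure is a hypothetical
machine-room load, not a number from this thread.

```python
def min_duct_area_sqft(airflow_cfm, max_velocity_fpm=1000.0):
    """Smallest cross-section (sq ft) that keeps air at or below the limit."""
    if airflow_cfm < 0 or max_velocity_fpm <= 0:
        raise ValueError("airflow must be >= 0 and velocity limit > 0")
    return airflow_cfm / max_velocity_fpm

airflow = 4000.0  # hypothetical airflow, CFM
for limit in (1000.0, 500.0):
    area = min_duct_area_sqft(airflow, limit)
    print(f"{airflow:.0f} CFM at <= {limit:.0f} ft/min needs {area:.1f} sq ft")
```

Halving the allowed velocity doubles the required area, which is why
quiet air handling tends to mean large openings and large fans.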

And, there are HUGE differences in the noise level for fans with 
identical air moving performance. The amount of mechanical energy 
going into acoustic power is tiny, even if the sound is quite loud.

A helicopter radiates, for instance, about 10 milliwatts of acoustic 
power, and moves on the order of a million cubic feet per minute of air.


>there are some instances of using heatpipes to get heat out of the node
>and into chilled water or refrigerant.  it's a little unclear to me why
>this is not done more, especially among the blade-loving community.


Maintenance hassles.  some of the larger IBM S/360s were liquid 
cooled, and a pain to deal with.  Anything with liquids and 
connectors will leak.


>then again, a quiet cluster is less impressive when it comes time to 
>impress the visitors ;)


That's what the big Tesla Coil or quarter shrinker is for.



From deadline at eadline.org  Wed Feb 13 17:27:07 2008
From: deadline at eadline.org (Douglas Eadline)
Date: Wed, 13 Feb 2008 20:27:07 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
Message-ID: <60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>

Just saw this mentioned on Slashdot.

High Performance SSH/SCP - HPN-SSH

http://www.psc.edu/networking/projects/hpn-ssh/

The internal flow control buffers in openssh are small
and static. The guys at Pittsburgh SC have created a
patch that can be applied to openssh that dynamically increases
the buffers which dramatically improves performance. They
also multi-threaded the crypto. So if you were wondering
about what to do with that extra core, now you have your
answer.


--
Doug


From rgb at phy.duke.edu  Thu Feb 14 03:04:33 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 14 Feb 2008 06:04:33 -0500 (EST)
Subject: noise Re: [Beowulf] Re: Setting up a new Beowulf cluster
In-Reply-To: <6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
Message-ID: <Pine.LNX.4.64.0802140529420.5266@cain.rgb.private.net>

On Wed, 13 Feb 2008, Jim Lux wrote:

> That's what the big Tesla Coil or quarter shrinker is for.

Mad science.

Oh, yeah.  Let's put a great big tesla coil right in with all those
computers!  Wait, I hear it now...

   Fzzzzssszzzssszzzssst.

(...that's the sound of all those itty bitty gaps on a circuit board or
NIC arcing at the same time...;-)

OK, funny story time, sort of.  Stop me if you've heard this one.

My kids in E&M get to do an extra credit project for a 1/3 of a letter
grade promotion at semester's end, and maybe a decade ago I had a
student who wanted to build a tesla coil for his project and I said,
sure, cool, go for it.  So off he went and with whatever web browsers
were around and pre-google alta vista found some howto sites for
building coils, and a few weeks later ran down a neon sign transformer,
built a saltwater-aquarium-wine-bottle capacitor array, assembled a
fan-quenched spark gap, and hand-wound the coils and added a toroid on
top.

We still had our "old" lab rooms for the intro courses -- no computers,
stained lab benches and tables, a big lead sink, and gas and air nozzles
in the central bench(es) up front.  Imagine old wooden (oak) chassis lab
equipment in glassed cabinets around the walls, a huge beam balance with
brass weights that was probably worth a kilobuck as an antique on top of
a tall cabinet in the back, that sort of thing.  So my student rolls his
creation on a big cart into this, and I and the class all gather to
watch.

Naturally, we turn off all the lights and darken the shades the better
to see the lightning.  Student hooks it all up, flicks the switch on the
neon sign transformer to power it all up, and bzzzzaaappppppp -- the
spark gap starts going off like a machine gun and footlong purple
lightning starts zapping off the top toroid, impressive as all hell.

And every fluorescent light in the room goes on.

And they were turned OFF, remember.  They were "on" being driven by the
radiated RF power coming off of the thing with no other source, just
like Tesla dreamed.

In addition, as I walked around the room, I noted that pretty much every
metal gap a millimeter or less was arcing.  Little arcs zapping across
the fixtures in the sink, the bolts on the tables, no doubt across the
wires holding up the drop ceiling.  I could imagine arcing occurring
across my teeth if I grinned just right.

After a few minutes of harmless fun and demos (which involved yours
truly taking a 100+ kV "hit" straight in through the >>glass<< of a
fluorescent tube that I inadvertently waved too near the toroid and
drawing down the fire to pass through me to ground through my rubber
soled shoes, which amused the heck out of the kids but which was NOT fun
for me) we powered it off and it went into class history as one of the
coolest projects ever.

Three years ago, a second round of students wanted to build one, and did
so using 1F caps that you can apparently now buy over the counter --
back when the first one was built I used to tell students that a 1F
capacitor would end up being the size of a bench or good sized filing
cabinet, but this is no longer true.  In the meantime, all the lab rooms
were gutted and rebuilt, and each workstation has its own computer.  The
entire building is now filled with computers.  The computers are now all
unshielded twisted pair networked, not thinwire ethernet.  If I were to
turn on a tesla coil inside the building ANYWHERE (unless it were inside
a faraday cage, of course), I'd probably blow $10,000 worth of
equipment, as a tesla coil is sort of a steady state EMP bomb or solar
flare on a table.

We demo'd this one OUTDOORS in the parking lot, figuring that the
building steel would act as enough of a cage to protect the interior,
with a surge protector inline to help keep the primary power cable from
carrying back too much of an RF harmonic onto the building wiring.  I
was a bit worried about the cars nearby -- if you drive an arc at e.g.
the cap on a gasoline supply it can be a bad thing -- but cars tend to
have metal on the outside and again cage off their guts.  No worries --
or at any rate none of the cars exploded or blew their starter coils.

But putting one in a server room, with all of those wires strung around
in loops and connected to electronics that really hates high voltage
even at very low current -- that's just plain funny...:-)

     rgb

(P.S. -- every few years I have to explain to one student or another
that no, they are NOT permitted to build an EMP bomb for their project.
They are driven by high explosive and -- however much fun it would be --
where and how would we test it?  Without, of course, bringing out the
mob with pitchforks and torches afterwards...)



From rgb at phy.duke.edu  Thu Feb 14 03:20:57 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 14 Feb 2008 06:20:57 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
Message-ID: <Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>

On Wed, 13 Feb 2008, Douglas Eadline wrote:

> Just saw this mentioned on Slashdot.
>
> High Performance SSH/SCP - HPN-SSH
>
> http://www.psc.edu/networking/projects/hpn-ssh/
>
> The internal flow control buffers in openssh are small
> and static. The guys at Pittsburgh SC have created a
> patch that can be applied to openssh that dynamically increases
> the buffers which dramatically improves performance. They
> also multi-threaded the crypto. So if you were wondering
> about what to do with that extra core, now you have your
> answer.

It would be nice if they would revive the old ssh flag for turning off
encryption altogether.  That alone would let us all finally and
permanently put a stake through the wicked heart of rsh.  It is fine to
have ssh do the hostauth/userauth handshake, because with rsh one might
as well just openly invite crackers into your system as a user (and with
the kernel bug that I'm hoping everybody has just finished updating,
promotion to root is/was appallingly easy from accounts on a system).
For most computational purposes, though, I couldn't care less if anyone
READS the intermediate traffic.

The openssh folks got a bit hypervigilant going from the old ssh to
openssh.  They also removed the symlink feature of ssh inherited from
rsh, and made it a PITA to background tasks and exit with the tasks
running.

Any idea if the openssh people are going to incorporate the patch
permanently?  Any hope of replacing the "Ciphers none" option (even with
a NoCiphers boolean control on the sshd config that allows a local
sysadmin to disable it for the masses in non-cluster environments) and
restoring control to the actual local sysadmin and user base instead of
preempting the choice?

    rgb




From nixon at nsc.liu.se  Thu Feb 14 04:42:51 2008
From: nixon at nsc.liu.se (Leif Nixon)
Date: Thu, 14 Feb 2008 13:42:51 +0100
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net> (Robert
	G. Brown's message of "Thu\,
	14 Feb 2008 06\:20\:57 -0500 \(EST\)")
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
Message-ID: <m3abm3zp2s.fsf@unna.nsc.liu.se>

"Robert G. Brown" <rgb at phy.duke.edu> writes:

> Any idea if the openssh people are going to incorporate the patch
> permanently?

The OpenSSH people don't like Chris' code and haven't had time to
rewrite it, AFAIU. Also they believe that the changes will *decrease*
performance in certain cases.

(We do use Chris' patches.)

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------


From hahn at mcmaster.ca  Thu Feb 14 07:32:36 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 14 Feb 2008 10:32:36 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
Message-ID: <Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>

> It would be nice if they would revive the old ssh flag for turning off
> encryption altogether.  That alone would let us all finally and

the PSC folk have a patch for that "HPN SSH".  it works fine.
it's actually smart enough to do the session setup (auth, etc)
using the usual set of ciphers, then switches to unencrypted 
("ssh -oNoneSwitch=yes -oNoneEnabled=yes").
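
For scripted use, the two options quoted above can be passed like any
other -o setting.  This is only a sketch: it assumes both the client and
the server are built with the HPN patch (a stock client is expected to
reject the options), and the host name "node01" is a placeholder.

```python
import shlex

def hpn_none_command(host, remote_command):
    """Build an ssh argv that asks an HPN-patched ssh to switch to the
    'none' cipher once the normal encrypted authentication is done."""
    return [
        "ssh",
        "-oNoneEnabled=yes",   # option names as quoted in this thread
        "-oNoneSwitch=yes",
        host,
        remote_command,
    ]

argv = hpn_none_command("node01", "hostname")  # "node01" is hypothetical
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run(argv) as-is; building
it as a list rather than a string avoids shell quoting surprises.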

> Any idea if the openssh people are going to incorporate the patch
> permanently?  Any hope of replacing the "Ciphers none" option (even with

I think they've permanently rejected it.

I find that arcfour is a lot faster than the other (non-none) defaults.


From mathog at caltech.edu  Thu Feb 14 08:39:31 2008
From: mathog at caltech.edu (David Mathog)
Date: Thu, 14 Feb 2008 08:39:31 -0800
Subject: [Beowulf] Re: Setting up a new Beowulf cluster
Message-ID: <E1JPh7T-0003ZQ-RJ@mendel.bio.caltech.edu>

Jim Lux <James.P.Lux at jpl.nasa.gov> wrote:

> >>quiet down a rack because to first order sound insulation == heat
> >>insulation. \
> 
> Actually, no.. good acoustic isolation is not good thermal 
> isolation.  Sure, things like fiberglass batts provide thermal 
> insulation and also (slightly) attenuate high frequencies.

I guess I should have used => or some other "implies".  Sound insulators
tend to be good heat insulators, heat insulators are generally not good
sound insulators.

I spent way too long trying to quiet down a rack when it had to live in
a classroom.  Mass loaded vinyl on all 4 sides worked fairly well
to stop the noise coming out that way, but then it just turned into a
big speaker enclosure and directed nearly as much sound out the fan
holes, where it bounced off the ceiling and floor.  And the rack exhaust
fans (2 very high capacity 120mm fans on the top) were not able to keep
it cool when it was fully sound insulated.  The rated capacity
of those two fans was more than the sum of all the little ones in the
nodes, but the air flow was too restricted, I think mostly by the narrow
space between the node's front panels and the front insulator panel.
Thankfully it finally moved to a machine room and the noise problem went
away.

Anyway, it is much easier to sound insulate a room than it is a single
noisy rack.

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


From rgb at phy.duke.edu  Thu Feb 14 09:33:57 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 14 Feb 2008 12:33:57 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
Message-ID: <Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>

On Thu, 14 Feb 2008, Mark Hahn wrote:

>> It would be nice if they would revive the old ssh flag for turning off
>> encryption altogether.  That alone would let us all finally and
>
> the PSC folk have a patch for that "HPN SSH".  it works fine.
> it's actually smart enough to do the session setup (auth, etc)
> using the usual set of ciphers, then switches to unencrypted ("ssh 
> -oNoneSwitch=yes -oNoneEnabled=yes").
>
>> Any idea if the openssh people are going to incorporate the patch
>> permanently?  Any hope of replacing the "Ciphers none" option (even with
>
> I think they've permanently rejected it.

Grrrr.  Do the PSC folks plan to put their code into e.g. F8 as a
fork/option?  It is so easy to yum install X, so big a pain to hand
build an rpm to install as an alternative and then have to hand maintain
thereafter as e.g. security fixes come out.

    rgb

>
> I find that arcfour is a lot faster than the other (non-none) defaults.
>



From ed at eh3.com  Thu Feb 14 10:01:16 2008
From: ed at eh3.com (Ed Hill)
Date: Thu, 14 Feb 2008 13:01:16 -0500
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
Message-ID: <20080214130116.0bff9b7d@localhost.localdomain>

On Thu, 14 Feb 2008 12:33:57 -0500 (EST) "Robert G. Brown" wrote:

> >
> >> Any idea if the openssh people are going to incorporate the patch
> >> permanently?  Any hope of replacing the "Ciphers none" option
> >> (even with
> >
> > I think they've permanently rejected it.
> 
> Grrrr.  Do the PSC folks plan to put their code into e.g. F8 as a
> fork/option?  It is so easy to yum install X, so big a pain to hand
> build an rpm to install as an alternative and then have to hand
> maintain thereafter as e.g. security fixes come out.


Fedora's policy is to closely follow upstream.  It's *very* unlikely
that Fedora will ever accept patches that the official OpenSSH
maintainers reject.

Hopefully the PSC folks (or other parties) can convince the OpenSSH
maintainers to allow some options (dynamic buffers, disabled encryption
of the non-handshake payload, etc.) that result in higher-bandwidth
transfers.

Ed

ps - For ssh transfers over long distances (high-bandwidth and high
     latency) one can break the data up into pieces and run multiple 
     ssh transfers in parallel.  Transmitting terabyte data sets from 
     the West Coast to the East Coast over Abilene I've seen >10X
     speedups (that is, >10X the overall throughput) when running 
     15--20 simultaneous ssh sessions.
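
     The splitting itself is simple; a minimal sketch follows.  The
     send_piece callback is a stand-in for "one ssh/scp session per
     piece" and is hypothetical, as are the function names; this is not
     the script actually used for those transfers.

```python
from concurrent.futures import ThreadPoolExecutor

def split_ranges(total_bytes, n_parts):
    """Return (offset, length) pairs covering total_bytes in near-equal pieces."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, extra = divmod(total_bytes, n_parts)
    ranges = []
    offset = 0
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges

def transfer_in_parallel(total_bytes, n_streams, send_piece):
    """Run send_piece(offset, length) for every piece, n_streams at a time."""
    pieces = split_ranges(total_bytes, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        # list() waits for all workers and re-raises any worker exception.
        return list(pool.map(lambda p: send_piece(*p), pieces))

def fake_send(offset, length):
    """Placeholder for a real per-piece ssh transfer; returns bytes 'sent'."""
    return length

sent = transfer_in_parallel(10, 3, fake_send)
print(sent, sum(sent))
```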


-- 
Edward H. Hill III, PhD  |  ed at eh3.com  |  http://eh3.com/

From lindahl at pbm.com  Thu Feb 14 10:16:56 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Thu, 14 Feb 2008 10:16:56 -0800
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
Message-ID: <20080214181655.GA2972@bx9.net>

On Thu, Feb 14, 2008 at 12:33:57PM -0500, Robert G. Brown wrote:

> Grrrr.  Do the PSC folks plan to put their code into e.g. F8 as a
> fork/option?  It is so easy to yum install X, so big a pain to hand
> build an rpm to install as an alternative and then have to hand maintain
> thereafter as e.g. security fixes come out.

Repos like rpmforge are what you're looking for -- no convincing
needed. But that's only for cipher none, the other changes may yet go
in.

-- greg


From hahn at mcmaster.ca  Thu Feb 14 11:07:36 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 14 Feb 2008 14:07:36 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <20080214130116.0bff9b7d@localhost.localdomain>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
Message-ID: <Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>

> Hopefully the PSC folks (or other parties) can convince the OpenSSH
> maintainers to allow some options (dynamic buffers, disabled encryption
> of the non-handshake payload, etc.) that result in higher-bandwidth
> transfers.

the switch-to-none feature is a good one.  I'm a bit skeptical about
the buffering changes, since you can accomplish the same thing with 
sysctls.  (has anyone ever experienced a real case where too-large 
sysctl net memory settings caused problems?  obviously, attempting 
to do long-fat-pipe transfers to a heavily used web server might 
be a problem, since the latter wants tight controls on sock mem use.)

the really cool thing would be if you could associate a default 
setting for socket buffers with a _route_.  heck, a route-port combo.
it seems crazy for apps to be messing with these issues.
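
for reference, the per-application knob being complained about looks
roughly like this.  a minimal sketch using the standard socket options;
the 1 MiB request is an arbitrary illustration, and what the kernel
actually grants depends on the system limits (on Linux the request is
capped by net.core.wmem_max / rmem_max and the reported value is
doubled).

```python
import socket

def request_buffers(sock, nbytes):
    """Ask the kernel for nbytes of send and receive buffer on one socket.

    Returns the (send, receive) sizes the kernel reports afterwards,
    which may differ from the request.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # Set before connect() so the sizes can influence window negotiation.
    granted = request_buffers(s, 1 << 20)
    print("send buffer:", granted[0], "receive buffer:", granted[1])
```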

> ps - For ssh transfers over long distances (high-bandwidth and high
>     latency) one can break the data up into pieces and run multiple
>     ssh transfers in parallel.  Transmitting terabyte data sets from
>     the West Coast to the East Coast over Abilene I've seen >10X
>     speedups (that is, >10X the overall throughput) when running
>     15--20 simultaneous ssh sessions.

this implies inappropriate buffer size settings, no?  what's the bw*delay
product for that path?
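
the arithmetic behind that question, as a sketch.  the link speed, round
trip time and window size below are assumptions picked for illustration
(1 Gb/s, 70 ms, 64 KiB); they are not measurements of the path described
above.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth*delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bits_per_s * rtt_s / 8.0

def window_limited_bits_per_s(window_bytes, rtt_s):
    """Best-case throughput of one stream limited to one window per round trip."""
    return window_bytes * 8.0 / rtt_s

link = 1e9           # assumed link speed, bits/s
rtt = 0.070          # assumed round-trip time, seconds
window = 64 * 1024   # assumed per-stream window, bytes

bdp = bdp_bytes(link, rtt)
per_stream = window_limited_bits_per_s(window, rtt)
print(f"bw*delay product: {bdp / 1e6:.2f} MB")
print(f"one stream with a {window} byte window: {per_stream / 1e6:.1f} Mb/s")
print(f"streams needed to fill the link: {bdp / window:.0f}")
```

with a window that far below the bw*delay product, N parallel sessions
carry roughly N times the throughput until the link fills, which is the
pattern a >10X gain from 15--20 sessions would suggest.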


From rgb at phy.duke.edu  Thu Feb 14 11:45:33 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 14 Feb 2008 14:45:33 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
Message-ID: <Pine.LNX.4.64.0802141427120.8063@cain.rgb.private.net>

On Thu, 14 Feb 2008, Mark Hahn wrote:

>> Hopefully the PSC folks (or other parties) can convince the OpenSSH
>> maintainers to allow some options (dynamic buffers, disabled encryption
>> of the non-handshake payload, etc.) that result in higher-bandwidth
>> transfers.
>
> the switch-to-none feature is a good one.  I'm a bit skeptical about
> the buffering changes, since you can accomplish the same thing with sysctls. 
> (has anyone ever experienced a real case where too-large sysctl net memory 
> settings caused problems?  obviously, attempting to do long-fat-pipe 
> transfers to a heavily used web server might be a problem, since the latter 
> wants tight controls on sock mem use.)

I think it is hard to improve over the TCP defaults for messages unless
you know something very specific about the message types -- it's pretty
easy to rob peter (latency) and pay paul (bandwidth) or vv.  So I'm not
really that excited about the buffering changes either, although if
somebody were to demonstrate that they are a uniform win for (say) 90%
of all traffic patterns and not too big a loss for the losers I could be
convinced.

The main reason I think no encryption is a sane option is the usual one
-- encryption is expensive, on both ends, and it can consume BOTH CPU
AND network in lots of cases -- encryption can easily expand a message
size and in either event requires an actual computation on both ends to
recover the message.  I think being able to turn it off, iff enabled in
/etc/ssh/sshd_config by the site sysadmin, is a very sane thing to want
to be able to do in problems where the cost-benefit of encryption is
poor (that is, unimportant messages that you don't care if the world
reads).  I'm assuming a sane sysadmin would only turn on the feature for
e.g. cluster nodes in a fairly well controlled network anyway -- I can't
see people turning it on on WAN-accessible clients too often.

Still, for many parallel applications it doesn't really matter.  The
"real" IPCs are managed by e.g. PVM or MPI or a socket you completely
control on both ends.  ssh is mostly used just to securely fork off
tasks or daemons across cluster nodes and then goes away and no longer
is used for communications.  With an exception for little demo shorty
parallel programs like my venerable "taskmaster" perl script intended to
just show one way of doing parallel work for EP tasks.

What the openssh people don't seem to "get" is that by FORCING people to
use encryption, they are actually keeping rsh alive and a potential
security risk for all sorts of people in the cluster business for whom
performance is more important than security given their networking
environment and goals.  Otherwise, who would ever install it?

They are not, in other words, increasing the net security of the linux
universe, because ssh with a secure handshake only is WAY more secure
and environmentally friendly (in the sense that one can pass an
environment) than rsh on its best day.  It's not so easy any more to
hijack an established TCP connection.  With rsh for "security", what can
one do?  No password, anybody can spoof in as you on the wire.  With
password, anyone can snoop the wire.  The only security possible is
keeping everybody off the wire, which sucks and isn't always possible.
So they'd make lots of people very happy if they would actually give
cluster people a WAY to make rsh finally go away.

    rgb



From hahn at mcmaster.ca  Thu Feb 14 12:34:26 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 14 Feb 2008 15:34:26 -0500 (EST)
Subject: [Beowulf] Re: High Performance SSH/SCP
In-Reply-To: <47B49F3D.3060307@psc.edu>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<47B49F3D.3060307@psc.edu>
Message-ID: <Pine.LNX.4.64.0802141532530.22547@coffee.psychology.mcmaster.ca>

> A colleague just pointed me to this thread, I'll try to keep an eye on it if 
> there are any questions, or feel free to contact hpn-ssh at psc.edu

thanks for the followup.  can you comment on the question of whether
the HPN changes will enter mainstream openssh?  (the 4.7 buffering 
change makes it sound like this is partly happening, but 'none' support
would also be very nice to have...)

thanks, mark hahn.


From tjrc at sanger.ac.uk  Thu Feb 14 15:54:24 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Thu, 14 Feb 2008 23:54:24 +0000
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802141427120.8063@cain.rgb.private.net>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141427120.8063@cain.rgb.private.net>
Message-ID: <87316AB4-25DB-4591-8545-7164617317AF@sanger.ac.uk>


On 14 Feb 2008, at 7:45 pm, Robert G. Brown wrote:

> What the openssh people don't seem to "get" is that by FORCING  
> people to
> use encryption, they are actually keeping rsh alive and a potential
> security risk for all sorts of people in the cluster business for whom
> performance is more important than security given their networking
> environment and goals.  Otherwise, who would ever install it?

Hear, hear.  The openssh folks aren't alone in this; it's a common  
ailment afflicting authors of "security" software.  They think they  
know better than the sysadmin.  It's for your own good, now take your  
medicine.  Personally, I'm with you - give the sysadmin the choice.   
I've had similar arguments in the past with the author of rssh, a  
restricted shell useful for cvs servers and the like.  He refused to  
add support for allowing the user to change their password, because  
his view was that password authentication is evil and all users should  
be forced to use key authentication at all times.  Oh great, so now I  
have users who ssh in using a private key for authentication over  
which I have no control - I have no idea whether it's held securely,  
whether it has a decent passphrase, or anything.  At least if they  
were using passwords I could periodically run a cracker on the passwd  
file and check their password is sane.  It's a similar scenario.  The  
authors' high and mighty principles don't actually necessarily make my  
systems any more secure at all, quite possibly the reverse.  Quite  
apart from the extra workload it puts on me.  The average scientist  
doesn't really want to have to learn about ssh-agent and all that stuff.

Tim


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From xclski at yahoo.com  Sun Feb 10 15:11:38 2008
From: xclski at yahoo.com (Ellis Wilson)
Date: Sun, 10 Feb 2008 15:11:38 -0800 (PST)
Subject: [Beowulf] getting kubuntu to perform as a cluster os
Message-ID: <67356.76593.qm@web37901.mail.mud.yahoo.com>

Geoff Jacobs wrote:
> Jon Aquilina wrote:
>
>> i would use the server but im not extremely versed with command line
>> commands except the simple sudo apt-get install update upgrade
>> dist-upgrade auto clean, etc
>>
>
> Hmm... I would suspect some command line is going to be required no
> matter what. I would recommend taking some time to beef up your skills
> in that area.
>
> http://www.linuxcommand.org/learning_the_shell.php
>
> However, it should be possible to run (for example) any *Ubuntu with a
> server kernel if that increases your comfort.
>
> BTW, feel free to contact me off list if you need any command line
> pointers.
>

While I certainly don't intend to start any form of a distro-war over
the subject, it seems to me that kubuntu is somewhat on the polar
opposite side from what someone would see as desirable for a cluster
distribution.  There are a number of more typical distributions (notably
not using KDE and all of its related resource hogging) that should work
fine.  I personally happen to be on the other side of the spectrum,
avoiding X even on the master node and installing everything either
using Gentoo or simply compiling everything up from scratch if the
opportunity allows itself (and is a reasonably static installation).  I
recognize this is not common and likely wouldn't even try to defend it
being noticeably more efficient (I'm just heinously specific about my
installations), but do argue that using not only X11 but also KDE on top
would be wasteful of resources for both RAM and CPU.  For small clusters
that are highly network bound perhaps this doesn't matter, but in that
case you probably should be looking to an eight or sixteen core system
instead of individual networked pc's.  But if you did that again it
would matter since your problem would become processor and ram bound
(unless your problem was so ridiculously parallel that it somehow
overwhelmed the buses inside your computer, but I would find that
surprising).

Ellis




From frederico_35 at hotmail.com  Mon Feb 11 06:05:36 2008
From: frederico_35 at hotmail.com (Frederico Aquino Carneiro)
Date: Mon, 11 Feb 2008 14:05:36 +0000
Subject: [Beowulf] vmware perfomance
Message-ID: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>


Hi! I am new to clustering and want to start with a Beowulf cluster, but
I have one question: will VMware benefit from the performance of the
cluster? I mean, will I get better performance from VMware by running it
on a Beowulf cluster? Will VMware work faster and better?
Thank You!!

From xclski at yahoo.com  Mon Feb 11 07:50:33 2008
From: xclski at yahoo.com (Ellis Wilson)
Date: Mon, 11 Feb 2008 07:50:33 -0800 (PST)
Subject: [Beowulf] Setting up a new Beowulf cluster
Message-ID: <852353.32953.qm@web37909.mail.mud.yahoo.com>

Bruno Coutinho wrote:
> > physics requires such resources.  It will also be used heavily for
> > Monte Carlo Simulations and just about any other form of
> > computational physics.  The two named are definite projects that are
> > already on the

Well, as previously stated, RGB is definitely your guide.  In fact in
his book I remember mention of his utilization of computer clusters for
just that, Monte Carlo Simulations.

> > Being new to the Beowulf world, I am just mainly looking for some
> > advice as to what distro to use (I would never dream of setting up a
> > cluster on windows) and if there were any little tricks that weren't
> > mentioned in the setup how to guides.

A few pointers in the right direction which may be helpful, me being
relatively new to parallel computing also and at a University in your
scaling situation.  These basic sources are super helpful and can be
found on Google:

Robert G. Brown's book: This in fact I stumbled upon and is the reason
why I got into the subject of Beowulfs and parallel computing.  This
will immerse you in a vat of the theory involved (which is super
important and far more impacting to the practical aspects than in most
fields) and gives a taste of the practical.

clustermonkey.net: This site also has some work by RGB as well as many
other extremely valuable members of the Beowulf/HPC community which help
you get going quickly.  RGB's articles on starting your own compute
cluster are rather practically and theoretically balanced, using really
neat sidebars to give you some "experiments" to carry out.

This very list:  I've read a couple thousand emails since I joined the
list and only recently decided anything I had to say was worthwhile.
Sitting back and reading as those more experienced debate (and Rant, if
you're lucky) has seriously improved my knowledge in this area and
provided exposure to the many tools you can utilize in your work.

Also, and this is simply my (less than two years experience)
perspective, but IMHO I would use the variety of Linux you are most
comfortable with in the beginning using a very basic desktop/windows
manager (such as xfce or fluxbox).  This will allow you to get up and
running quickly without killing all your resources by using KDE (or dare
I say it, BERYL).  You said you will expand every year, so you have some
time beforehand to learn about the toolset you'd like to use, which in
the end will guide your final decision on most efficient distribution
anyhow.

Ellis




From eagles051387 at gmail.com  Tue Feb 12 05:48:55 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Tue, 12 Feb 2008 14:48:55 +0100
Subject: [Beowulf] centos5 as cluster os
Message-ID: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>

what's everyone's take on centos as a cluster os?

-- 
Jonathan Aquilina

From prentice at ias.edu  Thu Feb 14 05:20:03 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu, 14 Feb 2008 08:20:03 -0500
Subject: noise Re: [Beowulf] Re: Setting up a new Beowulf cluster
In-Reply-To: <Pine.LNX.4.64.0802140529420.5266@cain.rgb.private.net>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<Pine.LNX.4.64.0802140529420.5266@cain.rgb.private.net>
Message-ID: <47B44003.6050800@ias.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This thread has gone horribly off topic. There's more noise about noise
than there is about the original question.

Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Robert G. Brown wrote:
> On Wed, 13 Feb 2008, Jim Lux wrote:
> 
>> That's what the big Tesla Coil or quarter shrinker is for.
> 
> Mad science.
> 
> Oh, yeah.  Let's put a great big tesla coil right in with all those
> computers!  Wait, I hear it now...
> 
>   Fzzzzssszzzssszzzssst.
> 
> (...that's the sound of all those itty bitty gaps on a circuit board or
> NIC arcing at the same time...;-)
> 
> OK, funny story time, sort of.  Stop me if you've heard this one.
> 
> My kids in E&M get to do an extra credit project for a 1/3 of a letter
> grade promotion at semester's end, and maybe a decade ago I had a
> student who wanted to build a tesla coil for his project and I said,
> sure, cool, go for it.  So off he went and with whatever web browsers
> were around and pre-google alta vista found some howto sites for
> building coils, and a few weeks later ran down a neon sign transformer,
> built a saltwater-aquarium-wine-bottle capacitor array, assembled a
> fan-quenched spark gap, and hand-wound the coils and added a toroid on
> top.
> 
> We still had our "old" lab rooms for the intro courses -- no computers,
> stained lab benches and tables, a big lead sink and gas and air nozzles
> in the central bench(es) up front.  Imagine old wooden (oak) chassis lab
> equipment in glassed cabinets around the walls, a huge beam balance with
> brass weights that was probably worth a kilobuck as an antique on top of
> a tall cabinet in the back, that sort of thing.  So my student rolls his
> creation on a big cart into this, and I and the class all gather to
> watch.
> 
> Naturally, we turn off all the lights and darken the shades the better
> to see the lightning.  Student hooks it all up, flicks the switch on the
> neon sign transformer to power it all up, and bzzzzaaappppppp -- the
> spark gap starts going off like a machine gun and footlong purple
> lightning starts zapping off the top toroid, impressive as all hell.
> 
> And every fluorescent light in the room goes on.
> 
> And they were turned OFF, remember.  They were "on" being driven by the
> radiated RF power coming off of the thing with no other source, just
> like Tesla dreamed.
> 
> In addition, as I walked around the room, I noted that pretty much every
> metal gap a millimeter or less was arcing.  Little arcs zapping across
> the fixtures in the sink, the bolts on the tables, no doubt across the
> wires holding up the drop ceiling.  I could imagine arcing occurring
> across my teeth if I grinned just right.
> 
> After a few minutes of harmless fun and demos (which involved yours
> truly taking a 100+ kV "hit" straight in through the >>glass<< of a
> fluorescent tube that I inadvertently waved too near the toroid and
> drawing down the fire to pass through me to ground through my rubber
> soled shoes, which amused the heck out of the kids but which was NOT fun
> for me) we powered it off and it went into class history as one of the
> coolest projects ever.
> 
> Three years ago, a second round of students wanted to build one, and did
> so using 1F caps that you can apparently now buy over the counter --
> back when the first one was built I used to tell students that a 1F
> capacitor would end up being the size of a bench or good sized filing
> cabinet, but this is no longer true.  In the meantime, all the lab rooms
> were gutted and rebuilt, and each workstation has its own computer.  The
> entire building is now filled with computers.  The computers are now all
> unshielded twisted pair networked, not thinwire ethernet.  If I were to
> turn on a tesla coil inside the building ANYWHERE (unless it were inside
> a faraday cage, of course), I'd probably blow $10,000 worth of
> equipment, as a tesla coil is sort of a steady state EMP bomb or solar
> flare on a table.
> 
> We demo'd this one OUTDOORS in the parking lot, figuring that the
> building steel would act as enough of a cage to protect the interior,
> with a surge protector inline to help keep the primary power cable from
> carrying back too much of an RF harmonic onto the building wiring.  I
> was a bit worried about the cars nearby -- if you drive an arc at e.g.
> the cap on a gasoline supply it can be a bad thing -- but cars tend to
> have metal on the outside and again cage off their guts.  No worries --
> or at any rate none of the cars exploded or blew their starter coils.
> 
> But putting one in a server room, with all of those wires strung around
> in loops and connected to electronics that really hates high voltage
> even at very low current -- that's just plain funny...:-)
> 
>     rgb
> 
> (P.S. -- every few years I have to explain to one student or another
> that no, they are NOT permitted to build an EMP bomb for their project.
> They are driven by high explosive and -- however much fun it would be --
> where and how would we test it?  Without, of course, bringing out the
> mob with pitchforks and torches afterwards...)
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHtEAD2n4m8G8ypgARAsNwAJkBNHmRv08KlOJ7hKgYR6PsIzKiwACfZ6+Q
glcfL56pIpueiZeUKB0voR4=
=220d
-----END PGP SIGNATURE-----


From ben at psc.edu  Thu Feb 14 12:06:21 2008
From: ben at psc.edu (Benjamin Bennett)
Date: Thu, 14 Feb 2008 15:06:21 -0500
Subject: [Beowulf] Re: High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
Message-ID: <47B49F3D.3060307@psc.edu>

Mark Hahn wrote:
[snip]
> the switch-to-none feature is a good one.  I'm a bit skeptical about
> the buffering changes, since you can accomplish the same thing with 
> sysctls.  (has anyone ever experienced a real case where too-large 
> sysctl net memory settings caused problems?  obviously, attempting to do 
> long-fat-pipe transfers to a heavily used web server might be a problem, 
> since the latter wants tight controls on sock mem use.)
> 
> the really cool thing would be if you could associate a default setting 
> for socket buffers with a _route_.  heck, a route-port combo.
> it seems crazy for apps to be messing with these issues.

The SSH protocol has its own, per channel, flow control on top of TCP to 
multiplex multiple SSH channels over a single TCP stream.

The stock OpenSSH code has static window sizes, IIRC 64KiB in <= 4.6, 
and 1-2MiB in 4.7.

HPN-SSH's dynamic windows do not adjust the TCP buffer size; they adjust 
these SSH channel windows so that they're not a limiting factor when you 
have an auto-tuning kernel, or have manually adjusted the TCP buffer size 
above these static values.

This is much better described in our paper [1] and the presentation of 
that paper [2].
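
As a rough illustration of why a static channel window can become the
limiting factor, the sketch below applies the usual bandwidth-delay
arithmetic: at most one window of data can be in flight per round trip,
so throughput is bounded by window / RTT.  The 64KiB figure is the one
recalled above; the 50 ms round-trip time, the 1 Gbit/s path, and the
function names are illustrative assumptions only, not measurements or
anything taken from the OpenSSH or HPN-SSH code.

```python
def window_limited_throughput(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput (bytes/s) when at most window_bytes
    may be in flight per round trip."""
    return window_bytes * 1000 / rtt_ms


def window_needed(link_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: the window needed to keep a
    link of the given speed full at the given round-trip time."""
    return link_bits_per_s * rtt_ms // 8000


if __name__ == "__main__":
    rtt_ms = 50                    # assumed WAN round-trip time
    static_window = 64 * 1024      # the 64KiB static window quoted above
    cap = window_limited_throughput(static_window, rtt_ms)
    print(f"64KiB window at {rtt_ms} ms: {cap * 8 / 1e6:.1f} Mbit/s at best")

    need = window_needed(1_000_000_000, rtt_ms)   # assumed 1 Gbit/s path
    print(f"window to fill 1 Gbit/s: {need / 2**20:.1f} MiB")
```

Under these assumptions the 64KiB window caps a single channel at roughly
10 Mbit/s however large the TCP buffers are, and filling the assumed
path needs a window of several MiB, more than the 1-2MiB static value.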

A colleague just pointed me to this thread, I'll try to keep an eye on 
it if there are any questions, or feel free to contact hpn-ssh at psc.edu

--ben


[1] http://www.psc.edu/networking/projects/hpn-ssh/papers/a14-rapier.pdf
[2] 
http://www.psc.edu/networking/projects/hpn-ssh/papers/MG08-rapier-bennett.ppt


From rgb at phy.duke.edu  Fri Feb 15 04:54:10 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 07:54:10 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <87316AB4-25DB-4591-8545-7164617317AF@sanger.ac.uk>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141427120.8063@cain.rgb.private.net>
	<87316AB4-25DB-4591-8545-7164617317AF@sanger.ac.uk>
Message-ID: <Pine.LNX.4.64.0802150736180.10077@cain.rgb.private.net>

On Thu, 14 Feb 2008, Tim Cutts wrote:

> their password is sane.  It's a similar scenario.  The authors' high and 
> mighty principles don't actually necessarily make my systems any more secure 
> at all, quite possibly the reverse.  Quite apart from the extra workload it 
> puts on me.  The average scientist doesn't really want to have to learn about 
> ssh-agent and all that stuff.

Amen to that (about ssh-agent), brother.

And all the rest.  Most of the sysadmins I know (and I know a LOT of
them) are really, really smart.  I'm talking rocket scientists gone bad,
so to speak, turned to the dark world and away from the light.  Just
kidding;-)

They have to solve complex problems in order to make the environment
they manage "work" with whatever mix of users, systems, and tasks that
constitute productivity at their place of employment.  In many cases
the solutions they implement are -- correctly -- solutions to
cost-benefit analyses that optimize productivity AT THE RISK of certain
security compromises.

For example, who actually shuts down their entire network when the word
comes in that e.g. the linux kernel has an exploit that allows any user
to root at will?  Only sites that have to maintain NSA-level security
and integrity of data, maybe banks and the like.  Everywhere else the
sysadmin crosses their mental fingers that they (being in touch with
various private channels that quietly get the word out) know about it
before their users, gets a patched kernel in all seemly haste, and then
waits for the next suitable moment to reboot each system after the next
update.  It spreads out the fix for a day, maybe even for a few days,
sure, but it also doesn't cost their organization days worth of work
times the number of employees who rely on the computers.  Which can
easily have a cash value in the tens of thousands of dollars.

Similarly, there are all sorts of reasons one might want to set up a
particular network differently from those based on the assumption "this
system is exposed to every evil cracker in the Universe and must be so
hardened that it can withstand any possible attack". Mind you, the
latter is a GREAT default configuration.  But one has to trust the
judgement of a professional sysadmin to trump the one-size-fits-all
mentality.  If the systems are all going to sit inside a locked room
such that one has to physically be inside the room and sitting at a
console to access them, WAN-level security is sort of moot and may be
counterproductive.  Or e.g. diskless cluster nodes inside a firewall --
there's nothing there to steal, a nasty bottleneck (at best) to get to
it, and if the bottleneck/firewall is itself compromised, nothing
including ssh is going to save the nodes anyway, as the master serves
their "disk(s)".

So I'm all for giving sysadmins powerful tools and choices.  Otherwise,
hey, they're rocket scientists.  They'll just work around the obstacles
anyway.  They'll have to work HARDER, and they'll be grumbly and bitter
as a consequence, but they'll find and install rsh, they'll hack the
source, they'll find an alternative implementation.  And then they'll go
back to their homes that night, pull a rocket out of a storage tube in
their basement, and target the idiot who stands between them and the
stress-free accomplishment of their work.

I warned you...;-)

   rgb

>
> Tim
>
>
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From robl at mcs.anl.gov  Fri Feb 15 05:49:21 2008
From: robl at mcs.anl.gov (Robert Latham)
Date: Fri, 15 Feb 2008 07:49:21 -0600
Subject: [Beowulf] sgi and linux networks
Message-ID: <20080215134920.GT26857@mcs.anl.gov>

http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/02-14-2008/0004756539&EDATE=

For SGI's sake I hope this works out better than the Cray purchase.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B


From jlb17 at duke.edu  Fri Feb 15 06:09:26 2008
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Fri, 15 Feb 2008 09:09:26 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
Message-ID: <alpine.LRH.1.00.0802150904560.15568@hogwarts.egr.duke.edu>

On Tue, 12 Feb 2008 at 2:48pm, Jon Aquilina wrote

> whats everyones take on centos as a cluster os.

It depends on your needs.  If you use a lot of commercial software, CentOS 
can be great as commercial software tends to like RHEL more than Fedora. 
If you're using in-house code and don't always need the latest and 
greatest libraries, it's also a fine choice.  And if you're low on admin 
time and/or have an odd infrastructure that makes cluster OS upgrades 
tough, then CentOS' long support lifetime is quite nice to have.

But if you have users who always want to use the latest and greatest tools 
(gcc, gsl, etc), then you may want to lean towards Fedora.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


From landman at scalableinformatics.com  Fri Feb 15 06:23:45 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Feb 2008 09:23:45 -0500
Subject: [Beowulf] sgi and linux networks
In-Reply-To: <20080215134920.GT26857@mcs.anl.gov>
References: <20080215134920.GT26857@mcs.anl.gov>
Message-ID: <47B5A071.4040600@scalableinformatics.com>


Robert Latham wrote:
> http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/02-14-2008/0004756539&EDATE=
> 
> 
> For SGI's sake I hope this works out better than the Cray purchase.

I remember driving in to work (at SGI) the morning it was announced.  My 
first reaction was "why"?

The Cray purchase drove the company to distraction.  This was a bad time 
to be driven to distraction.

LNXI is much smaller than Cray was at the time.  I don't think there are 
that many cultural differences ... a number of ex-SGIers were over at LNXI.

This is less of a "merger" and more of an acquisition of assets, and 
hiring of some people.

One less good HPC company in the world.

> 
> ==rob
> 

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From rgb at phy.duke.edu  Fri Feb 15 06:43:52 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 09:43:52 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802150932120.10077@cain.rgb.private.net>

On Tue, 12 Feb 2008, Jon Aquilina wrote:

> whats everyones take on centos as a cluster os.

It's fine, especially right after it is released.  It is supported for
long enough to last the lifetime of a cluster, which is good -- set up
your repo mirror, set up yum, and everything just auto-updates.  Very
low maintenance, that is.

The only negative I've encountered is its advantage.  BECAUSE it is
"frozen" and has a long lifetime, it can outlive the useful life of its
libraries as things advance.  My own experience of this involved the
GSL, which was in its "infancy" in one Centos release and rapidly
surpassed it in areas of importance to me (e.g. RNGs and ODE solvers).
Yet Centos was stuck at a much earlier version of it that literally
lacked some tools I was building into things I was working on, which
then failed to back-port or run there.

This sort of problem has a number of obvious solutions, and they're not
TOO obtrusive, but one should be aware of them as they do factor into
the maintenance.  You'll save work on automated maintenance and the need
to continually reinstall relative to e.g. Fedora, but you'll likely have
to backport some stuff FROM Fedora-current if your users discover
it and like/need it.

The one other issue that can come up is hardware related -- obviously if
Centos installs on your hardware it installs and you're done.  IF
however you add new nodes every year, and those nodes have different
motherboards or network devices or etc, it is possible that you'll hit a
motherboard that just doesn't work with the older kernel being
maintained in Centos.

This problem is most visible in the case of laptops, not so much in
clusters; if you can run Centos on a laptop you're just plain lucky.  I
use Fedora exclusively there because EVEN Fedora lags the latest
wireless devices and other laptop-oriented hardware and software tools,
but I can usually get it to work, and even if I have to struggle for a
release, the next one usually works flawlessly.  Its rapid release cycle
is actually beneficial to laptops, roughly neutral to desktops, and
probably detrimental to servers and cluster nodes.  But it can, as I
said, surface in heterogeneous clusters that rotate hardware in and out
every year.  In those clusters, to keep all the nodes running the same
thing, you'll probably want to upgrade no more slowly than Centos/RHEL
itself is released (ideally with a 3 month lag between bleeding edge
release and your update, to give the first worst bugs time to shake
out).

rgb

>
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From hahn at mcmaster.ca  Fri Feb 15 07:12:51 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 10:12:51 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>

> whats everyones take on centos as a cluster os.

works fine for me, but I also don't think distros are very important.
the critical things are:

 	- must have a decent package system.  yum is; I'm not familiar enough
 	with urpmi or apt to know them.  I think both provide appropriate
 	management of dependencies.

 	- basic organization has to be familiar, at least to your admins.
 	I'm comfortable with the sysvinit approach, and would rather not
 	figure out what ubuntu has done to init.  I suspect that an ubuntu
 	person can use it to pare down unused services as well.

this, of course, assumes that you will want to pare down and keep updated.
that's not necessarily the only way to run a cluster, though lack of updates
would be disastrous if you have any way for users to run random stuff.
(eg at least 2 local root exploits for linux since oct.)


From hahn at mcmaster.ca  Fri Feb 15 07:17:33 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 10:17:33 -0500 (EST)
Subject: [Beowulf] vmware perfomance
In-Reply-To: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
References: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
Message-ID: <Pine.LNX.4.64.0802151014490.31809@coffee.psychology.mcmaster.ca>

> Hi! I am new in clustering, and I want to begin with the Beowulf Cluster,
> but I have one doubt: vmware enjoy the performance of the cluster? I mean,
> using the Beowulf cluster i will have a better perfomance with the vmware??
> Will vmware work faster and better?

certainly not.  _nothing_ works better on a cluster - that's not the point.
the point of HPC clusters is to make it easy to use many (same-speed)
processes at the same time, often cooperating in the same job (MPI).
the basic assumption of HPC clusters is that all processes are, or should be,
compute-bound.

vmware is completely orthogonal to this, and is mainly targeted at 
non-compute-bound processes like webservers.  by virtualizing such 
"sparsely-executing" programs, they can be overlapped and share a smaller
number of resources.  the ideal host for vm hosting would be as big an SMP
as you can get your hands on.


From hahn at mcmaster.ca  Fri Feb 15 07:20:07 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 10:20:07 -0500 (EST)
Subject: [Beowulf] sgi and linux networks
In-Reply-To: <20080215134920.GT26857@mcs.anl.gov>
References: <20080215134920.GT26857@mcs.anl.gov>
Message-ID: <Pine.LNX.4.64.0802151018080.31809@coffee.psychology.mcmaster.ca>

> http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/02-14-2008/0004756539&EDATE=
>
> For SGI's sake I hope this works out better than the Cray purchase.

could this be driven entirely by Linux Networx holding some large contracts 
with gov labs?  I've never seen anything from LN that was drastically 
different from what's available through other means.  SGI's cluster offerings
are similarly commodity-based.


From nixon at nsc.liu.se  Fri Feb 15 07:25:03 2008
From: nixon at nsc.liu.se (Leif Nixon)
Date: Fri, 15 Feb 2008 16:25:03 +0100
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802150736180.10077@cain.rgb.private.net> (Robert
	G. Brown's message of "Fri\,
	15 Feb 2008 07\:54\:10 -0500 \(EST\)")
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141427120.8063@cain.rgb.private.net>
	<87316AB4-25DB-4591-8545-7164617317AF@sanger.ac.uk>
	<Pine.LNX.4.64.0802150736180.10077@cain.rgb.private.net>
Message-ID: <m34pcai6nk.fsf@unna.nsc.liu.se>

"Robert G. Brown" <rgb at phy.duke.edu> writes:

> For example, who actually shuts down their entire network when the word
> comes in that e.g. the linux kernel has an exploit that allows any user
> to root at will? 

We actually touched /etc/nologin on Monday morning.
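
For anyone who hasn't used it: as far as I know, with pam_nologin (or
sshd's own check) the mere existence of /etc/nologin refuses non-root
logins, and its contents are shown to the user being turned away.  A
minimal sketch follows.  The NOLOGIN variable and the two function names
are made up for illustration, and the variable exists only so the idea
can be tried on a scratch file instead of the real one.

```shell
#!/bin/sh
# Sketch only: NOLOGIN, close_logins and open_logins are illustrative names.
# Point NOLOGIN at a scratch file to experiment; the real path needs root.
NOLOGIN=${NOLOGIN:-/etc/nologin}

# Refuse new non-root logins; the message is what users see when turned away.
close_logins() {
    printf '%s\n' "$1" > "$NOLOGIN"
}

# Allow logins again once the nodes are patched.
open_logins() {
    rm -f "$NOLOGIN"
}
```

Sessions that are already open are not affected, so running jobs and
logged-in users still have to be dealt with separately.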

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------


From ascheinine at tuffmail.us  Fri Feb 15 07:35:46 2008
From: ascheinine at tuffmail.us (Alan Louis Scheinine)
Date: Fri, 15 Feb 2008 16:35:46 +0100
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <Pine.LNX.4.64.0802150932120.10077@cain.rgb.private.net>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802150932120.10077@cain.rgb.private.net>
Message-ID: <47B5B152.6030505@tuffmail.us>

I use Centos.  For years and years I have compiled
from source Gnu Scientific Library and many, many
other software packages.  I write notes while doing
the compiling from scratch so the next compilation
of the same package is rapid.  I also compile gcc (g++, etc.)
from scratch.  The result is that a user can stick with
the old gcc and old package until he or she feels ready
to move on.  Both the new and old packages are available.
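
A minimal sketch of how such side-by-side installs can be selected,
assuming each build went into its own prefix such as /opt/gsl/1.10.
The /opt layout, the SW_ROOT variable, the use_pkg name and the version
number are illustrative assumptions, not a description of any
particular site.

```shell
#!/bin/sh
# Sketch only: SW_ROOT and use_pkg are hypothetical names.
# Assumes each package was built with
#   ./configure --prefix=$SW_ROOT/<name>/<version>
SW_ROOT=${SW_ROOT:-/opt}

# Put one specific version of a package first in the search paths.
use_pkg() {
    prefix="$SW_ROOT/$1/$2"
    PATH="$prefix/bin:$PATH"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

A user who is not ready to move on keeps calling use_pkg with the old
version; nothing changes for them when a newer build appears next to it.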

Personally I cannot imagine using software applications
as found in any distribution.  For any user request, I
compile from scratch from the software WWW site. Doesn't
everybody do this?

best regards,
Alan Scheinine


From hahn at mcmaster.ca  Fri Feb 15 07:38:15 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 10:38:15 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <alpine.LRH.1.00.0802150904560.15568@hogwarts.egr.duke.edu>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<alpine.LRH.1.00.0802150904560.15568@hogwarts.egr.duke.edu>
Message-ID: <Pine.LNX.4.64.0802151020240.31809@coffee.psychology.mcmaster.ca>

> But if you have users who always want to use the latest and greatest tools 
> (gcc, gsl, etc), then you may want to lean towards Fedora.

or compile them yourself.  if you really care about them,
you probably want to anyway.  significant updates don't happen _that_
frequently, though this is obviously less viable for a one-man-cluster
than a center with several admins.

I think fedora could be made to work OK in a cluster, despite its update
rate.  it would do so more easily on a cluster which had lots of short jobs.
(our clusters frequently have multi-week jobs, so we tend to update system
stuff less frequently, and centos/rh and derived systems like xc work well.)

I think centos+site-updates is a good model.  if nothing else, users may not
want updates/disruptions as fast as fedora's come.


From rgb at phy.duke.edu  Fri Feb 15 07:36:49 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 10:36:49 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <m34pcai6nk.fsf@unna.nsc.liu.se>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141427120.8063@cain.rgb.private.net>
	<87316AB4-25DB-4591-8545-7164617317AF@sanger.ac.uk>
	<Pine.LNX.4.64.0802150736180.10077@cain.rgb.private.net>
	<m34pcai6nk.fsf@unna.nsc.liu.se>
Message-ID: <Pine.LNX.4.64.0802151033000.10077@cain.rgb.private.net>

On Fri, 15 Feb 2008, Leif Nixon wrote:

> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>
>> For example, who actually shuts down their entire network when the word
>> comes in that e.g. the linux kernel has an exploit that allows any user
>> to root at will?
>
> We actually touched /etc/nologin on Monday morning.

Sure, and that's a reasonable choice.  It's a cost benefit based choice,
and only you know the value of your data and probability of risk.  For
us, doing that would have been infinitely disruptive and expensive;
overnight was soon enough.

I didn't mean to imply that if one did this one was in any way foolish,
only that wouldn't it suck if LINUS could press a button somewhere and
touch /etc/nologin for ALL the linux boxes in the universe so that they
wouldn't work until they were patched?

None of us really want big brother making our security decisions or
"forcing" us to use some particular security tool or profile.  Choice is
good.  It would be simply lovely if ssh were a bit less fascist, or at
least could be configured to be non-fascist for environments where that
makes sense.  Fascist by default is just peachy.

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From landman at scalableinformatics.com  Fri Feb 15 07:52:34 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Feb 2008 10:52:34 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
Message-ID: <47B5B542.7000605@scalableinformatics.com>


Mark Hahn wrote:
>> whats everyones take on centos as a cluster os.
> 
> works fine for me, but I also don't think distros are very important.
> the critical things are:
> 
>     - must have a decent package system.  yum is; I'm not familiar enough
>     with urpmi or apt to know them.  I think both provide appropriate
>     management of dependencies.

Yum is good, so is apt.  I still have a problem with yum wanting to 
install i386 binaries as well as the x86_64 ones.  Haven't learned how 
to stop that yet (probably simple too).

There is much I do not like about rpm.  However it has a few nice 
features.  I can't live without

	rpm -qa
	rpm -ql package
	rpm -qf file

and am going through withdrawal as apt does not seem to provide these (or 
if they do, it isn't at all obvious how/where).

>     - basic organization has to be familiar, at least to your admins.
>     I'm comfortable with the sysvinit approach, and would rather not
>     figure out what ubuntu has done to init.  I suspect that an ubuntu
>     person can use it to pare down unused services as well.

Not trying to convert anyone here.  Still learning the Ubuntu version. 
Oddly enough it looks *saner* (as in better thought out) than sysvinit 
(which I have used since the early 90s with Irix and others).  I 
actually understand it.

> this, of course, assumes that you will want to pare down and keep updated.
> that's not necessarily the only way to run a cluster, though lack of updates
> would be disastrous if you have any way for users to run random stuff.
> (eg at least 2 local root exploits for linux since oct.)

I favor the minimalist approach.  I get chastised for it every now and 
then (whether the labor is worth the benefit is the most reasonable question).

As few things as possible installed.  Keep it simple.  Fewer things 
means less of an attack surface, a smaller management base, and 
hopefully smaller emergent complexity.

That said, tools like Rocks/Perceus/Scyld/... make cluster standup 
*easy* for particular cases (supported hardware, existing VNFS, ...) 
when you want to do something quick.  Rocks does not currently recommend 
you do yum or up2date updates*.  I imagine similar for Scyld.

You can always roll your own cluster ala Mark, RGB, and others.


* I have been bashed/castigated in 2 fora recently for daring to suggest 
that some technology may have alternatives that one might wish to 
consider, or there may be known issues, or whatever.  Shooting the 
messenger.  Not a wise move.  You don't have to believe me, though I do 
recommend that you make a backup of your Rocks system if you do choose 
to run yum.  You can run yum safely on it, though it takes some work. 
And the Rocks folks have recently formed a user group to help make sure 
it is safe going forward (kudos to the Rocks folks for doing this).


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From jlb17 at duke.edu  Fri Feb 15 08:05:36 2008
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Fri, 15 Feb 2008 11:05:36 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5B542.7000605@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
Message-ID: <alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>

On Fri, 15 Feb 2008 at 10:52am, Joe Landman wrote

> Mark Hahn wrote:
>>> whats everyones take on centos as a cluster os.
>> 
>> works fine for me, but I also don't think distros are very important.
>> the critical things are:
>>
>>     - must have a decent package system.  yum is; I'm not familiar enough
>>     with urpmi or apt to know them.  I think both provide appropriate
>>     management of dependencies.
>
> Yum is good, so is apt.  I still have a problem with yum wanting to install 
> i386 binaries as well as the x86_64 ones.  Haven't learned how to stop that 
> yet (probably simple too).

It is:
yum install gsl.x86_64

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


From landman at scalableinformatics.com  Fri Feb 15 08:07:21 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Feb 2008 11:07:21 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>
Message-ID: <47B5B8B9.70302@scalableinformatics.com>


Joshua Baker-LePain wrote:

>> Yum is good, so is apt.  I still have a problem with yum wanting to 
>> install i386 binaries as well as the x86_64 ones.  Haven't learned how 
>> to stop that yet (probably simple too).
> 
> It is:
> yum install gsl.x86_64

I meant automatically picking up updates, or on new installations.  I 
thought I saw something about arch-only settings.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From jlb17 at duke.edu  Fri Feb 15 08:07:41 2008
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Fri, 15 Feb 2008 11:07:41 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <Pine.LNX.4.64.0802150932120.10077@cain.rgb.private.net>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802150932120.10077@cain.rgb.private.net>
Message-ID: <alpine.LRH.1.00.0802151106150.20735@hogwarts.egr.duke.edu>

On Fri, 15 Feb 2008 at 9:43am, Robert G. Brown wrote

> The one other issue that can come up is hardware related -- obviously if
> Centos installs on your hardware it installs and you're done.  IF
> however you add new nodes every year, and those nodes have different
> motherboards or network devices or etc, it is possible that you'll hit a
> motherboard that just doesn't work with the older kernel being
> maintained in Centos.

Keep in mind, though, that RH backports new hardware support into that 
"older" kernel.  The 2.6.9 in RHEL/CentOS 4.6 these days, for example,
supports far more hardware than stock 2.6.9.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


From jlb17 at duke.edu  Fri Feb 15 08:12:10 2008
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Fri, 15 Feb 2008 11:12:10 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5B8B9.70302@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>
	<47B5B8B9.70302@scalableinformatics.com>
Message-ID: <alpine.LRH.1.00.0802151107540.20735@hogwarts.egr.duke.edu>

On Fri, 15 Feb 2008 at 11:07am, Joe Landman wrote

>
>
> Joshua Baker-LePain wrote:
>
>>> Yum is good, so is apt.  I still have a problem with yum wanting to 
>>> install i386 binaries as well as the x86_64 ones.  Haven't learned how to 
>>> stop that yet (probably simple too).
>> 
>> It is:
>> yum install gsl.x86_64
>
> I meant automatically picking up updates, or on new installations.  I thought 
> I saw something about arch-only settings.

You can also specify architecture of specific packages in the kickstart 
file.  However, yes, it's pretty impossible to kickstart a system without 
getting some 32bit packages.  What I do here is just put a
"yum -y remove \*.i[36]86" in the %post section of the kickstart.  Voila 
-- no more 32bit packages.  And if there are no 32bit packages installed, 
it "shouldn't" pull any in at update time.

There may well be a yum plugin floating about to enforce this more 
strictly, but I haven't looked, to be honest...
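
A minimal sketch of what that pattern catches, assuming the name.arch
form yum uses.  As far as I know yum's own globbing follows essentially
the same rules as a shell case pattern, which is what the sketch relies
on.  The is_32bit name and the package list are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: is_32bit is a hypothetical helper for checking the pattern.
# In the kickstart file the actual command sits in %post, roughly:
#   %post
#   yum -y remove \*.i[36]86
is_32bit() {
    case "$1" in
        *.i[36]86) return 0 ;;   # matches .i386 and .i686
        *)         return 1 ;;   # x86_64, noarch, and also i586
    esac
}

# Illustrative package names only.
for pkg in gsl.i386 glibc.i686 gsl.x86_64 tzdata.noarch; do
    if is_32bit "$pkg"; then
        echo "remove $pkg"
    else
        echo "keep   $pkg"
    fi
done
```

Anything packaged as i586 would slip through; widen the bracket
expression if that matters on your systems.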

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF


From hahn at mcmaster.ca  Fri Feb 15 08:30:19 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 11:30:19 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>
Message-ID: <Pine.LNX.4.64.0802151130030.16218@coffee.psychology.mcmaster.ca>

> It is:
> yum install gsl.x86_64

also, setting exactarch=1, right?


From tjrc at sanger.ac.uk  Fri Feb 15 09:33:18 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Fri, 15 Feb 2008 17:33:18 +0000
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5B542.7000605@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
Message-ID: <0853B432-4B14-40AA-B246-CC302B81FEFD@sanger.ac.uk>


On 15 Feb 2008, at 3:52 pm, Joe Landman wrote:

>
>
> Mark Hahn wrote:
>>> whats everyones take on centos as a cluster os.
>> works fine for me, but I also don't think distros are very important.
>> the critical things are:
>>    - must have a decent package system.  yum is; I'm not familiar enough
>>    with urpmi or apt to know them.  I think both provide appropriate
>>    management of dependencies.
>
> Yum is good, so is apt.  I still have a problem with yum wanting to  
> install i386 binaries as well as the x86_64 ones.  Haven't learned  
> how to stop that yet (probably simple too).
>
> There is much I do not like about rpm.  However it has a few nice  
> features.  I can't live without
>
> 	rpm -qa
> 	rpm -ql package
> 	rpm -qf file
>
> and am going through withdrawal as apt does not seem to provide these  
> (or if they do, it isn't at all obvious how/where).

You'll have to bear with me, since I don't know much about rpm.  In  
most cases there is equivalent functionality at both the dpkg level  
and apt level.

rpm -qa : lists all installed packages, right?

dpkg -l

works at the dpkg level; aptitude can do the same thing:

aptitude search '~i'

I tend to use aptitude these days, and don't often touch dpkg  
directly.  aptitude's search expressions are odd, but quite powerful,  
and allow you to do some useful things.  For example, following a  
sarge to etch upgrade, I wanted to remove all old sarge kernel  
packages (which are called kernel-image-*), regardless of which sarge  
package a machine was using, which was easily done with:

aptitude remove '~i~nkernel-image'

rpm -ql package : lists files installed by the package, right?

dpkg -L package

rpm -qf file : asks which package supplied a particular file?

dpkg -S file
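
Put together as a crib sheet, here is a minimal sketch of the three
mappings above.  The rpm_to_dpkg function name is made up for
illustration; it only prints the dpkg command and doesn't run anything.

```shell
#!/bin/sh
# Sketch only: rpm_to_dpkg is a hypothetical helper, not part of rpm or dpkg.
# It prints the dpkg counterpart of an rpm query and executes nothing.
rpm_to_dpkg() {
    case "$1" in
        -qa) echo "dpkg -l" ;;        # list all installed packages
        -ql) echo "dpkg -L $2" ;;     # list files installed by a package
        -qf) echo "dpkg -S $2" ;;     # which package supplied a file
        *)   echo "no mapping known for: rpm $*" >&2
             return 1 ;;
    esac
}
```

For example, "rpm_to_dpkg -qf /usr/bin/vi" prints "dpkg -S /usr/bin/vi".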

Other useful Debian/Ubuntu package management commands:

apt-cache : queries the apt cache, and can report things like  
dependency information.  For example,

apt-cache rdepends libfoo

tells you all the packages which depend on libfoo (installed or  
otherwise).

apt-file : not installed by default, but phenomenally useful - it's  
dpkg -S for stuff that isn't installed yet.  So if you want to ask  
"what package do I need to install to supply this obscure header  
file", apt-file can tell you.

Then of course there's aptitude, which is slowly replacing both
apt-get and dselect.

Tim


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From peter.st.john at gmail.com  Fri Feb 15 09:38:21 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 15 Feb 2008 12:38:21 -0500
Subject: noise Re: [Beowulf] Re: Setting up a new Beowulf cluster
In-Reply-To: <47B44003.6050800@ias.edu>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<Pine.LNX.4.64.0802140529420.5266@cain.rgb.private.net>
	<47B44003.6050800@ias.edu>
Message-ID: <e4d4fd070802150938v4581de3av997f00a60573b4df@mail.gmail.com>

Prentice,

Yes that strayed off topic (somewhat) but since it was some extraordinarily
good prose I have to defend it. Lux's exposition was like a short-course; I
can hardly drive a nail straight, myself, and I disbelieve in any object
smaller than a billiard ball: but having read that, I feel like I could
build a sound-stage in a pinch. Surely the original poster will not have so
much difficulty with acoustics over the short term, but it's certainly
relevant to the readership at large.

RGB certainly digressed, nobody really needs a tesla coil to impress
visitors. *But that was as good as good science fiction.* It took me a day
before I realized it had made me nostalgic for Doc Smith. Well worth one or
two pages in my life.

I don't believe that every page of every textbook need be as dry as
*Principia Mathematica* (which is virtually all hieroglyphic; it's worth glancing
through a copy, I'm sure they have one on campus there) and surely this
mail-list need not be. I found this recent forking refreshing, educational,
and fun.

Please give my warm regards to Prof. Matlock if you see him about town. Go
Blue Devils :-)

Peter

On Thu, Feb 14, 2008 at 8:20 AM, Prentice Bisbal <prentice at ias.edu> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> This thread has gone horribly off topic. There's more noise about noise
> than there is about the original question.
>
> Prentice Bisbal
> Linux Software Support Specialist/System Administrator
> School of Natural Sciences
> Institute for Advanced Study
> Princeton, NJ
>
>
>

From landman at scalableinformatics.com  Fri Feb 15 09:45:45 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Feb 2008 12:45:45 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <0853B432-4B14-40AA-B246-CC302B81FEFD@sanger.ac.uk>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<0853B432-4B14-40AA-B246-CC302B81FEFD@sanger.ac.uk>
Message-ID: <47B5CFC9.6060105@scalableinformatics.com>


Tim Cutts wrote:

> 
> rpm -qa : lists all installed packages, right?

yes

> 
> dpkg -l

oooooo  (nice!)


> 
> works at the dpkg level, aptitude can do the same thing:
> 
> aptitude search '~i'

ok (I don't like the aptitude "tui").  I wind up using synaptic over X 
which means installing lots more junk :(

> 
> I tend to use aptitude these days, and don't often touch dpkg directly.  
> aptitude's search expressions are odd, but quite powerful, and allow you 
> to do some useful things.  For example, following a sarge to etch 
> upgrade, I wanted to remove all old sarge kernel packages (which are 
> called kernel-image-*), regardless of which sarge package a machine was 
> using, which was easily done with:
> 
> aptitude remove '~i~nkernel-image'
> 
> rpm -ql package : lists files installed by the package, right?
> 
> dpkg -L package

Yahooo!!!! that was one of the important ones!

> 
> rpm -qf file : asks which package supplied a particular file?
> 
> dpkg -S file

Sweet ... but it takes quite a long time.  Weird.

> 
> Other useful Debian/Ubuntu package management commands:
> 
> apt-cache : queries the apt cache, and can report things like dependency 
> information.  For example,

I played with apt-cache.  Not something I really want to see (meta data).

> 
> apt-cache rdepends libfoo
> 
> tells you all the packages which depend on libfoo (installed or otherwise).

Cool

> 
> apt-file : not installed by default, but phenomenally useful - it's dpkg 
> -S for stuff that isn't installed yet.  So if you want to ask "what 
> package do I need to install to supply this obscure header file", 
> apt-file can tell you.

Will have to play with it.

> 
> Then of course there's aptitude, which is slowly replacing both apt-get 
> and dselect.

Ok, will have to learn aptitude.  I can't stand its tui.

> 
> Tim
> 
> 

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From tjrc at sanger.ac.uk  Fri Feb 15 09:57:22 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Fri, 15 Feb 2008 17:57:22 +0000
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5CFC9.6060105@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<0853B432-4B14-40AA-B246-CC302B81FEFD@sanger.ac.uk>
	<47B5CFC9.6060105@scalableinformatics.com>
Message-ID: <3F034AC3-445A-43D0-AA9B-056BDE3BC396@sanger.ac.uk>


On 15 Feb 2008, at 5:45 pm, Joe Landman wrote:

> Ok, will have to learn aptitude.  I can't stand its tui.

I didn't like it much to start with, I can sympathise.  There are  
still things about it I hate, but mainly only in its full screen  
guise.  The CLI functions are pretty much the same as apt-get, and  
fairly sane.  It should come with a *big* health warning in Sarge and  
earlier.  The Etch version is fine, though.  In Lenny, apt-get and  
aptitude are sort of merging with one another.

If you want a really good book to get to grips with how Debian and its  
derivatives work, look no further than "The Debian System" by Martin  
Krafft:

http://www.amazon.com/Debian-System-Concepts-Techniques/dp/1593270690

I've been a Debian Developer for more than 10 years, but I bought that  
book last year and it's still teaching me useful stuff.  Several of us  
in my group have bought it now, and we all swear by it.  Pretty much  
everything it says about Debian applies to Ubuntu as well.

Tim


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From bill at cse.ucdavis.edu  Fri Feb 15 10:50:16 2008
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Fri, 15 Feb 2008 10:50:16 -0800
Subject: [Beowulf] vmware perfomance
In-Reply-To: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
References: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
Message-ID: <47B5DEE8.2060606@cse.ucdavis.edu>

Frederico Aquino Carneiro wrote:
> Hi! I am new in clustering, and I want to begin with the Beowulf Cluster,
> but I have one doubt: vmware enjoy the performance of the cluster? I mean,
> using the Beowulf cluster i will have a better perfomance with the vmware??
> Will vmware work faster and better?

By its very nature a beowulf cluster is about running applications faster 
with a collection of commodity hardware than you can by buying the old school 
larger servers from SGI, Sun, HP, and friends.   Price/performance is key.

Part of that is using open source software where you can to keep prices down, 
not to mention the ability to share the design and implementation with other 
folks.  Because most of the hardware and software isn't targeted towards 
beowulf users (a very small minority of the market) often changes are required.

For that reason I'd consider other virtualization software like kvm or xen.

Given the nasty real world realities like barges that run through power lines 
and users/codes with long runtimes that do not use/have checkpointing I can 
see virtualization being rather handy even though it does slow down compute 
bound processes.

Has anyone measured the performance difference of a compute bound job with xen 
or vmware?  I'd imagine it's well under 10%, but I haven't measured it personally.
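
For anyone who wants to try, here is a minimal sketch of how the
comparison could be reduced to one number, assuming you time the same
compute bound job once on bare metal and once inside the guest.  The
overhead_pct name and the sample timings are made up for illustration;
they are not measurements.

```shell
#!/bin/sh
# Sketch only: overhead_pct is a hypothetical helper.
# Usage: overhead_pct NATIVE_SECONDS VIRTUAL_SECONDS
# Prints the slowdown of the virtualized run as a percentage of the native run.
overhead_pct() {
    awk -v n="$1" -v v="$2" 'BEGIN { printf "%.1f\n", (v - n) / n * 100 }'
}

# Illustrative numbers only (not measurements): 100 s native, 107 s in a guest.
overhead_pct 100 107
```

Run each case several times and compare the best or median wall-clock
time; a single run is likely too noisy to resolve a difference of a few
percent.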


From rgb at phy.duke.edu  Fri Feb 15 11:10:38 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 14:10:38 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <Pine.LNX.4.64.0802151020240.31809@coffee.psychology.mcmaster.ca>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<alpine.LRH.1.00.0802150904560.15568@hogwarts.egr.duke.edu>
	<Pine.LNX.4.64.0802151020240.31809@coffee.psychology.mcmaster.ca>
Message-ID: <Pine.LNX.4.64.0802151403080.13709@cain.rgb.private.net>

On Fri, 15 Feb 2008, Mark Hahn wrote:

>> But if you have users who always want to use the latest and greatest tools 
>> (gcc, gsl, etc), then you may want to lean towards Fedora.
>
> or compile them yourself.  if you really care about them,
> you probably want to anyway.  significant updates don't happen _that_
> frequently, though this is obviously less viable for a one-man-cluster
> than a center with several admins.
>
> I think fedora could be made to work OK in a cluster, despite its update
> rate.  it would do so more easily on a cluster which had lots of short jobs.
> (our clusters frequently have multi-week jobs, so we tend to update system
> stuff less frequently, and centos/rh and derived systems like xc work well.)
>
> I think centos+site-updates is a good model.  if nothing else, users may not
> want updates/disruptions as fast as fedora's come.

Either one works, and we've tried both.  As we've both said, the
tradeoffs are the amount of work required to build and maintain
"current" packages you may rely on versus frequency of update.  You can
easily run Fedora for a year, though -- even a year plus as there are
sites appearing that do Fedora afterlife support -- but a Fedora user
usually doesn't want to.  They're interested in the cool stuff that
constantly appears, the bug fixes, and so on.  But it's not like you
update/upgrade so quickly after a new release that you have to interrupt
jobs on even a month time scale.  You just have to pick a time over a
three to six month interval that is convenient for the upgrade, and
that's usually pretty easy.

There are exceptions.  For example, FC 6 users had damn well better all be
updating ASAP to F8 right now (if they didn't yesterday or the day
before) as I don't believe the kernel fix is being backported that far.
And I don't know what's going on with Debian -- I'm guessing that folks
on the apt-update stream are autofixed but there may be isolated folks
that need to hand-update their kernels and reboot.  We still had a few
machines running FC 6 as of Monday that we were going to do in a planned
way e.g. over spring break.  Oops.  Sorry guys, time to do them
overnight...

rgb

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From rgb at phy.duke.edu  Fri Feb 15 11:20:39 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 14:20:39 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5B152.6030505@tuffmail.us>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802150932120.10077@cain.rgb.private.net>
	<47B5B152.6030505@tuffmail.us>
Message-ID: <Pine.LNX.4.64.0802151411200.13709@cain.rgb.private.net>

On Fri, 15 Feb 2008, Alan Louis Scheinine wrote:

> Personally I cannot imagine using software applications
> as found in any distribution.  For any user request, I
> compile from scratch from the software WWW site. Doesn't
> everybody do this?

Definitely not.  I almost never compile from scratch.  The whole point
of using fedora is that precious ability to be able to enter:

  yum install \*pvm\*

and have it be so, and subsequently be autoupdated with fixes and
security patches, no further human effort required.

Of course, then I encounter things like xpvm being broken (at least for
me) in F8.  I may reluctantly have to grab source and debug, or more
likely I'll whine on this list and see if anybody else has done it for
me.

I won't say that I "hate" having to compile things from source (or to
fix things in general) but it is certainly true that I dislike it and
really prefer not to have to.

   rgb

>
> best regards,
> Alan Scheinine
>



From rgb at phy.duke.edu  Fri Feb 15 11:32:41 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 14:32:41 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5B542.7000605@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
Message-ID: <Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>

On Fri, 15 Feb 2008, Joe Landman wrote:

>
>
> Mark Hahn wrote:
>>> whats everyones take on centos as a cluster os.
>> 
>> works fine for me, but I also don't think distros are very important.
>> the critical things are:
>>
>>     - must have a decent package system.  yum is; I'm not familiar enough
>>     with urpmi or apt to know them.  I think both provide appropriate
>>     management of dependencies.
>
> Yum is good, so is apt.  I still have a problem with yum wanting to install 
> i386 binaries as well as the x86_64 ones.  Haven't learned how to stop that 
> yet (probably simple too).

Usually when it does this it is installing the i386 libraries, not the
binaries per se.  That's for backwards compatibility.

Space is cheap.  I just ignore it.  Besides, I do sometimes forget and
run an i386 binary -- easy to do from an NFS mount.

> There is much I do not like about rpm.  However it has a few nice features. 
> I can't live without
>
> 	rpm -qa
> 	rpm -ql package
> 	rpm -qf file
>
> and am going through withdrawal as apt does not seem to provide these (or if 
> they do, it isn't at all obvious how/where).

Yeah.  What you said.

> As few things as possible installed.  Keep it simple.  Fewer things means 
> less of an attack surface, a smaller management base, and hopefully smaller 
> emergent complexity.

Awwww, but then you don't have any fun!  And this last exploit merely
required the right binary built from source, not one on your system
anyway.  Minimalism is again a matter of cost benefit.  Different people
or organizations will have different comfort zones or goals.  Minimalism
on the desktop means giving up a lot of possibly useful stuff.
Minimalism in a cluster means having to spend more time putting stuff
back when it turns out that you need it after all.  Both of these are
costs; you have to balance them against the perceived risk benefit, which
in turn depends on your estimate of the risk of attack, the likely
window of opportunity for an attack, your degree of vigilance, and the
cost of putting things right again.

I personally prefer high vigilance (as it has historically ALWAYS been
the case for me that vigilance reveals cracking attempts or successes,
and there are ALWAYS going to be holes I don't get closed, at least not
right away or maybe in time) coupled with a robust and easily restored
backup and installation system.  If a host gets cracked, reinstall it
via kickstart/PXE and forget it.  No local data on a host.  Backup
everything.  Protect the servers with far greater vigilance than nodes
or clients.  Then don't worry so much about the periphery.

But there are places where cracking has a much higher up-front cost, or
a higher risk.  So I don't argue that this recipe is right for all.

> * I have been bashed/castigated in 2 fora recently for daring to suggest that 
> some technology may have alternatives that one might wish to consider, or 
> there may be known issues, or whatever.  Shooting the messenger.  Not a wise 
> move.  You don't have to believe me, though I do recommend that you make a 
> backup of your Rocks system if you do choose to run yum.  You can run yum 
> safely on it, though it takes some work. And the Rocks folks have recently 
> formed a user group to help make sure it is safe going forward (kudos to the 
> Rocks folks for doing this).

What?  You said technology has alternatives?

Well no WONDER you got bashed.  I'd have bashed you here if only I'd
known.   Look:

<bash>

There.  Now it's three out of three;-)

    rgb



From landman at scalableinformatics.com  Fri Feb 15 11:45:45 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Feb 2008 14:45:45 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>
Message-ID: <47B5EBE9.4050203@scalableinformatics.com>


Robert G. Brown wrote:

>> As few things as possible installed.  Keep it simple.  Fewer things 
>> means less of an attack surface, a smaller management base, and 
>> hopefully smaller emergent complexity.
> 
> Awwww, but then you don't have any fun!  And this last exploit merely

er ... ah ... we must have different definitions of what "fun" is.

I just spent yesterday discovering that Yahoo/ATT.net was the source of 
a DoS attack against us, and firewalling the offenders off made me 
happy.  Small attack surface.  Sort of like an electronic version of 
"300".  Channel em, and if you don't like em, well ...

> required the right binary built from source, not one on your system
> anyway.  Minimalism is again a matter of cost benefit.  Different people
> or organizations will have different comfort zones or goals.  Minimalism
> on the desktop means giving up a lot of possibly useful stuff.
> Minimalism in a cluster means having to spend more time putting stuff
> back when it turns out that you need it after all.  Both of these are

Anyone using their cluster for TeX?  I used it to write a thesis.  A 
distributed make environment (yeah, I built my thesis with a make file), 
would have helped ...

> costs; you have to balance them against the perceived risk benefit, which
> in turn depends on your estimate of the risk of attack, the likely
> window of opportunity for an attack, your degree of vigilance, the cost
> of putting things right again.
> 
> I personally prefer high vigilance (as it has historically ALWAYS been
> the case for me that vigilance reveals cracking attempts or successes,
> and there are ALWAYS going to be holes I don't get closed, at least not
> right away or maybe in time) coupled with a robust and easily restored
> backup and installation system.  If a host gets cracked, reinstall it

Oddly enough we really are saying similar things.

a) we won't get everything
b) check the logs
c) apply the patches
d) prepare for the worst ...


> via kickstart/PXE and forget it.  No local data on a host.  Backup

Heh...  I think I may have gone past you on this one.  No os on the 
host.  PXE boot it.  No more installs.  Put up a unionfs/aufs and let em 
write all over / and /etc and ...  and then see what happened.  :)


> everything.  Protect the servers with far greater vigilance than nodes
> or clients.  Then don't worry so much about the periphery.

Nodes are expendable/disposable.  It's the rest of it that is hard to 
replace.  So make it easier.

> 
> But there are places where cracking has a much higher up-front cost, or
> a higher risk.  So I don't argue that this recipe is right for all.
> 
>> * I have been bashed/castigated in 2 fora recently for daring to 
>> suggest that some technology may have alternatives that one might wish 
>> to consider, or there may be known issues, or whatever.  Shooting the 
>> messenger.  Not a wise move.  You don't have to believe me, though I 
>> do recommend that you make a backup of your Rocks system if you do 
>> choose to run yum.  You can run yum safely on it, though it takes some 
>> work. And the Rocks folks have recently formed a user group to help 
>> make sure it is safe going forward (kudos to the Rocks folks for doing 
>> this).
> 
> What?  You said technology has alternatives?
> 
> Well no WONDER you got bashed.  I'd have bashed you here if only I'd
> known.   Look:
> 
> <bash>

[thaWackaaaaa]

  <owie!>

> 
> There.  Now it's three out of three;-)

D'oh!


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From peter.st.john at gmail.com  Fri Feb 15 11:56:13 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 15 Feb 2008 14:56:13 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5EBE9.4050203@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>
	<47B5EBE9.4050203@scalableinformatics.com>
Message-ID: <e4d4fd070802151156r323c1e02h11ba70fdd87266ea@mail.gmail.com>

>
<bash>

There.  Now it's three out of three;-)
   rgb
<

I prefer tcsh.

Peter

From rgb at phy.duke.edu  Fri Feb 15 11:55:49 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 14:55:49 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5EBE9.4050203@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>
	<47B5EBE9.4050203@scalableinformatics.com>
Message-ID: <Pine.LNX.4.64.0802151446570.13709@cain.rgb.private.net>

On Fri, 15 Feb 2008, Joe Landman wrote:

> Anyone using their cluster for TeX?  I used it to write a thesis.  A 
> distributed make environment (yeah, I built my thesis with a make file), 
> would have helped ...

Ooo, Joe, your age is showing.  I compile my BOOKS in latex
interactively from inside jove in real time.  Hold on, let me time it.

The Book of Lilith, at 224 pages, on my LAPTOP (not the fastest machine
I own by any means) now takes 1.17 seconds to run latex on -- twice (the
second time to resolve the forward references in the TOC, of course).
Converting it to PDF takes a lot longer -- about 1.5 seconds to
postscript, then maybe 4 seconds to pdf from ps.  Don't know what dvi to
pdf would be as this book's makefile doesn't go that way.  A full build
from clean takes 6.1 seconds (that's dvi, ps and pdf).

Not a lot of benefit in parallelizing that, I'd say.
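
For anyone who wants to time the same chain on their own document, here
is a rough sketch of the latex -> dvips -> ps2pdf sequence described
above.  The function names, the default job name "book" and the DRY_RUN
switch are made up for illustration; this is not the book's actual
makefile:

```shell
#!/bin/sh
# Sketch of the dvi -> ps -> pdf build chain.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_book JOBNAME   (JOBNAME.tex must exist in the current directory)
build_book() {
    job=${1:-book}
    run latex "$job.tex" &&
    run latex "$job.tex" &&            # second pass resolves TOC forward references
    run dvips -o "$job.ps" "$job.dvi" &&
    run ps2pdf "$job.ps" "$job.pdf"
}
```

In bash, `time build_book book` reports the wall-clock total for the
whole chain, which is the number to compare against any idea of
distributing the build.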

> Oddly enough we really are saying similar things.
>
> a) we won't get everything
> b) check the logs
> c) apply the patches
> d) prepare for the worst ...

Oddly?  Ah, I see, you think I'm crazy...;-)

>> via kickstart/PXE and forget it.  No local data on a host.  Backup
>
> Heh...  I think I may have gone past you on this one.  No os on the host. 
> PXE boot it.  No more installs.  Put up a unionfs/aufs and let em write all 
> over / and /etc and ...  and then see what happened.  :)

Absolutely.  We still leave local data, but truthfully that just speeds
the boot some and lowers server load some.  A tradeoff.  We still buy
(or at least have) diskful nodes, so why not use them?

    rgb



From rgb at phy.duke.edu  Fri Feb 15 11:56:18 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 14:56:18 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <e4d4fd070802151156r323c1e02h11ba70fdd87266ea@mail.gmail.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com> 
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca> 
	<47B5B542.7000605@scalableinformatics.com>
	<Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>
	<47B5EBE9.4050203@scalableinformatics.com>
	<e4d4fd070802151156r323c1e02h11ba70fdd87266ea@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802151456080.13709@cain.rgb.private.net>

On Fri, 15 Feb 2008, Peter St. John wrote:

>>
> <bash>
>
> There.  Now it's three out of three;-)
>   rgb
> <
>
> I prefer tcsh.

<groan!>

rgb

>
> Peter
>



From mark.kosmowski at gmail.com  Fri Feb 15 04:38:21 2008
From: mark.kosmowski at gmail.com (Mark Kosmowski)
Date: Fri, 15 Feb 2008 07:38:21 -0500
Subject: [Beowulf] Re: Setting up a new Beowulf cluster
Message-ID: <c84311bb0802150438l1e43d393qc5bbca3e816de169@mail.gmail.com>

Regarding this thread going off-topic - I was amused by RGB's static
anecdote during a rather busy day at work, so am thankful it was shared.

To add my own on-topic contribution, I do solid-state chemistry calculations
and ran into the 32-bit memory limitations of Linux early on.  This led me
to attempt to learn MPI (not wholly successfully) about 5 years ago and then
to jump onto the Opteron architecture as budget permitted.  I do not know if
the OP's work will require more than 1 - 2 GB per process like mine does
(I'm doing my graduate research on my personal humble 3 node dual-Opteron
cluster - I added two nodes about a year ago and my earlier floundering with
MPI has paid off - everything works now).

Even if the OP's work does need 64-bit memory space, the fileserver / backup
machine could still be 32-bit and not used as a compute node.  Don't
overlook periodic DVD burning as an archiving possibility.

From vernard at venger.net  Fri Feb 15 06:04:51 2008
From: vernard at venger.net (Vernard Martin)
Date: Fri, 15 Feb 2008 09:04:51 -0500
Subject: [Beowulf] vmware perfomance
In-Reply-To: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
References: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
Message-ID: <47B59C03.6020900@venger.net>

Frederico Aquino Carneiro wrote:
> Hi! I am new in clustering, and I want to begin with the Beowulf 
> Cluster, but I have one doubt: vmware enjoy the performance of the 
> cluster? I mean, using the Beowulf cluster i will have a better 
> perfomance with the vmware?? Will vmware work faster and better?
To my knowledge, VMWare will not take advantage of Beowulf clustering 
technologies to run "faster". It might be useful for some sort of 
failover mechanism but in that case, you don't need Beowulf. You can 
just use the VMWare Virtual infrastructure product to give you that 
capability.


From prentice at ias.edu  Fri Feb 15 06:47:29 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Feb 2008 09:47:29 -0500
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <Pine.LNX.4.64.0802141504570.8063@cain.rgb.private.net>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>
	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141427120.8063@cain.rgb.private.net>
	<47B49F7E.5040404@ias.edu>
	<Pine.LNX.4.64.0802141504570.8063@cain.rgb.private.net>
Message-ID: <47B5A601.5070803@ias.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert G. Brown wrote:
>  Rsh "and" anything else is difficulty squared,
> and kerberos isn't the universally implemented tool it was a decade ago,
> largely superceded by ssh and/or ssl connections.  So finding experts to
> help you make it work if you're a newbie isn't going to be that easy.

I don't think there's anything difficult about setting up rsh, ssh or
kerberos for anyone who know how to read a manual. A newbie shouldn't be
setting up a cluster in the first place. That's advanced kung-fu best
left to the black belts. Letting a neophyte build and run an HPC cluster
is some kind of oxymoron.

Yes, I know that professors usually tell some green graduate student to
go build a cluster for the dept, but that's a completely different topic
outside the scope of this list...

- --
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHtaYB2n4m8G8ypgARAgOUAJ4yqIPsXEGu+dddYemrDN6JlGsQXgCeNtij
mGmDDUKh3xeOcp8yGOtlKa0=
=Upwj
-----END PGP SIGNATURE-----


From prentice at ias.edu  Fri Feb 15 06:53:34 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Feb 2008 09:53:34 -0500
Subject: [Beowulf] sgi and linux networks
In-Reply-To: <20080215134920.GT26857@mcs.anl.gov>
References: <20080215134920.GT26857@mcs.anl.gov>
Message-ID: <47B5A76E.8000806@ias.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Meant to send this to the list, not just Robert:

Robert Latham wrote:
> http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/02-14-2008/0004756539&EDATE=
> 
> For SGI's sake I hope this works out better than the Cray purchase.
> 
> ==rob
> 

When will SGI become RIP? That company has had one foot in the grave for
10 years now!

- --
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHtadu2n4m8G8ypgARArJtAKCSpTwJLxDKWqvrrawBod3DeQOxDgCgrC2i
xhFevbqVJLchA/KjDrKHtcE=
=zBeS
-----END PGP SIGNATURE-----


From tod at gust.sr.unh.edu  Fri Feb 15 12:00:44 2008
From: tod at gust.sr.unh.edu (Tod Hagan)
Date: Fri, 15 Feb 2008 15:00:44 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5B152.6030505@tuffmail.us>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802150932120.10077@cain.rgb.private.net>
	<47B5B152.6030505@tuffmail.us>
Message-ID: <1203105644.5872.30.camel@trop.sr.unh.edu>

On Fri, 2008-02-15 at 16:35 +0100, Alan Louis Scheinine wrote:
> I use Centos.  For years and years I have compiled
> from source Gnu Scientific Library and many, many
> other software packages.  I write notes while doing
> the compiling from scratch so the next compilation
> of the same package is rapid.

Turn your notes into code -- I write scripts to build the packages which
as a side effect document things such as the arguments to configure.
It's overkill for simple './configure ; make ; make install' packages,
but has served well for complex packages.

Scripting the build process paid off handsomely for me recently when I
needed to build multiple versions of libraries using different versions
of compilers in an attempt to track down a problem affected by compiler
version.
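
A minimal sketch of what such a script might look like, assuming an
autoconf-style package already unpacked in the current directory.  The
function names, the /opt prefix layout and the DRY_RUN switch are
illustrative assumptions, not a copy of any real site script:

```shell
#!/bin/sh
# Hypothetical build recipe: the configure arguments live in the script,
# so the script doubles as the build notes.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_pkg NAME VERSION COMPILER [extra configure args...]
# Installs into a prefix that records the package version and compiler.
build_pkg() {
    name=$1 version=$2 cc=$3
    shift 3
    prefix="${PREFIX_ROOT:-/opt}/$name-$version-$cc"
    run ./configure CC="$cc" --prefix="$prefix" "$@" &&
    run make &&
    run make install
}
```

Run from inside the source tree, a loop such as
`for cc in gcc icc; do build_pkg gsl 1.10 "$cc"; done` (with a
`make distclean` between passes) would leave one install per compiler,
each in its own prefix, which is the kind of side-by-side setup that
helps when chasing a compiler-dependent problem.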

> Personally I cannot imagine using software applications
> as found in any distribution.  For any user request, I
> compile from scratch from the software WWW site. Doesn't
> everybody do this?

Certainly with RHEL/Centos there isn't much choice, as even the epel
repo doesn't add enough packages to get anywhere near what Ubuntu
offers, for instance. And while I do want or need to build my own
versions of MPICH and certain other libraries and packages needed by the
users on the cluster, I have no interest in building programs for image
viewers such as qiv and ggv just because they're not included in the
distro, or worse, dropped in a new version (as was ggv from RHEL).

Tod

-- 
Tod Hagan
Information Technologist
AIRMAP/Climate Change Research Center
Institute for the Study of Earth, Oceans, and Space
University of New Hampshire
Durham, NH 03824
Phone: 603-862-3116


From gerry.creager at tamu.edu  Fri Feb 15 12:20:00 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Fri, 15 Feb 2008 14:20:00 -0600
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <e4d4fd070802151156r323c1e02h11ba70fdd87266ea@mail.gmail.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>	<47B5B542.7000605@scalableinformatics.com>	<Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>	<47B5EBE9.4050203@scalableinformatics.com>
	<e4d4fd070802151156r323c1e02h11ba70fdd87266ea@mail.gmail.com>
Message-ID: <47B5F3F0.4000708@tamu.edu>

Peter St. John wrote:
>  >
> <bash>
> 
> There.  Now it's three out of three;-)
> 
>    rgb
> <
>  
> I prefer tcsh.
>  
> Peter

I no longer have a preference.  I write the shell script in whatever I 
happened to invoke at the top.  EXCEPT on our IBM p575/AIX system where 
our system admins claim nothing but ksh can be used.  I refuse to use it 
on general principles, and show them scripts in anything else to prove 
they're wrong...

gerry
-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


From john.leidel at gmail.com  Fri Feb 15 12:25:16 2008
From: john.leidel at gmail.com (John Leidel)
Date: Fri, 15 Feb 2008 14:25:16 -0600
Subject: [Beowulf] sgi and linux networks
In-Reply-To: <Pine.LNX.4.64.0802151018080.31809@coffee.psychology.mcmaster.ca>
References: <20080215134920.GT26857@mcs.anl.gov>
	<Pine.LNX.4.64.0802151018080.31809@coffee.psychology.mcmaster.ca>
Message-ID: <27f776af0802151225ke0adca5ped48624ec623e112@mail.gmail.com>

SGI is not actually picking up the support contracts from LNXI's federal
contracts.  It would be up to the customer to purchase [extend] the
contracts to SGI.  I certainly agree with Joe.  This is most certainly an
acquisition of assets, not a merger.  SGI has traditionally been a house of
large compute [for one realm or another].  They're probably tired of being
underbid on machines from cluster manufacturers.  Their answer is, of
course, the ICE product.  Acquiring LNXI technology [specifically in
software] helps SGI to augment this.

On Fri, Feb 15, 2008 at 9:20 AM, Mark Hahn <hahn at mcmaster.ca> wrote:

> >
> http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/02-14-2008/0004756539&EDATE=
> >
> > For SGI's sake I hope this works out better than the Cray purchase.
>
> could this be driven entirely by Linux Networx holding some large
> contracts
> with gov labs?  I've never seen anything from LN that was drastically
> different from what's available through other means.  SGI's cluster
> offerings
> are similarly commodity-based.

From peter.st.john at gmail.com  Fri Feb 15 13:14:43 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 15 Feb 2008 16:14:43 -0500
Subject: [Beowulf] vmware perfomance
In-Reply-To: <47B59C03.6020900@venger.net>
References: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
	<47B59C03.6020900@venger.net>
Message-ID: <e4d4fd070802151314mf9a229cu3e9eceac70676545@mail.gmail.com>

Actually, I'd thought about this myself some: suppose I have a small
cluster, and a head node that I also use as a workstation (so I'd think of
the compute nodes as devices serving on my workstation, not of the headnode
as dedicated to fileserving as you'd have at the departmental level). So
then one might wonder, could VMWare running on the head make use of nodes
for some workstation applications; e.g. run an application on a node while
the head CPU does mostly GUI? I dunno, I've never used VMware (or kvm etc)
Peter

On Fri, Feb 15, 2008 at 9:04 AM, Vernard Martin <vernard at venger.net> wrote:

> Frederico Aquino Carneiro wrote:
> > Hi! I am new in clustering, and I want to begin with the Beowulf
> > Cluster, but I have one doubt: vmware enjoy the performance of the
> > cluster? I mean, using the Beowulf cluster i will have a better
> > perfomance with the vmware?? Will vmware work faster and better?
> to my knowledge, VMWare will not take advantage of Beowulf clustering
> technologies to run "faster". It might be useful for some sort of
> failover mechanism but in that case, you don't need Beowulf. You can
> just use the VMWare Virtual infrastructure product to give you that
> capability.

From hahn at mcmaster.ca  Fri Feb 15 13:20:58 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 16:20:58 -0500 (EST)
Subject: [Beowulf] sgi and linux networks
In-Reply-To: <27f776af0802151225ke0adca5ped48624ec623e112@mail.gmail.com>
References: <20080215134920.GT26857@mcs.anl.gov> 
	<Pine.LNX.4.64.0802151018080.31809@coffee.psychology.mcmaster.ca>
	<27f776af0802151225ke0adca5ped48624ec623e112@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802151618550.16218@coffee.psychology.mcmaster.ca>

> large compute [for one realm or another].  They're probably tired of being
> underbid on machines from cluster manufacturers.  Their answer is, of
> course, the ICE product.  Acquiring LNXI technology [specifically in
> software] helps SGI to augment this.

I don't know much about LNXI's technology, but had the impression
they just shipped versions of open-source stuff customized to their 
products.  maybe with some basic eyecandy for people allergic to CLI.

what do you think is the standout software provided by LNXI?
(again, I do not know, and am honestly asking...)

thanks, mark hahn.


From orion at cora.nwra.com  Fri Feb 15 13:23:09 2008
From: orion at cora.nwra.com (Orion Poplawski)
Date: Fri, 15 Feb 2008 14:23:09 -0700
Subject: [Beowulf] yum
In-Reply-To: <alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>	<47B5B542.7000605@scalableinformatics.com>
	<alpine.LRH.1.00.0802151104260.20735@hogwarts.egr.duke.edu>
Message-ID: <47B602BD.9060504@cora.nwra.com>

Joshua Baker-LePain wrote:
> On Fri, 15 Feb 2008 at 10:52am, Joe Landman wrote
> 
>> Mark Hahn wrote:
>>>> whats everyones take on centos as a cluster os.
>>>
>>> works fine for me, but I also don't think distros are very important.
>>> the critical things are:
>>>
>>>     - must have a decent package system.  yum is; I'm not familiar 
>>> enough
>>>     with urpmi or apt to know them.  I think both provide appropriate
>>>     management of dependencies.
>>
>> Yum is good, so is apt.  I still have a problem with yum wanting to 
>> install i386 binaries as well as the x86_64 ones.  Haven't learned how 
>> to stop that yet (probably simple too).
> 
> It is:
> yum install gsl.x86_64
> 

Also try:

yum install yum-basearchonly
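
If you'd rather not type the architecture by hand each time, a small
helper can append it.  This is only a sketch: native_pkgs and the ARCH
override are names invented here, and fftw is just an example package:

```shell
# Append the native architecture to each package name so that yum is
# asked for exactly that build (e.g. gsl.x86_64) and no i386 twin.
native_pkgs() {
    arch=${ARCH:-$(uname -m)}   # ARCH is a hypothetical override, handy for testing
    for pkg in "$@"; do
        printf '%s.%s\n' "$pkg" "$arch"
    done
}

# Example (needs root and a yum-based system):
#   yum install $(native_pkgs gsl fftw)
```

Note that noarch packages would still have to be named without the
suffix, since they carry no x86_64 build.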


-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com


From gdjacobs at gmail.com  Fri Feb 15 13:28:23 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Fri, 15 Feb 2008 15:28:23 -0600
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B5B542.7000605@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
Message-ID: <47B603F7.7000709@gmail.com>

Joe Landman wrote:
> 
> 
> Mark Hahn wrote:
>>> whats everyones take on centos as a cluster os.
>>
>> works fine for me, but I also don't think distros are very important.
>> the critical things are:
>>
>>     - must have a decent package system.  yum is; I'm not familiar enough
>>     with urpmi or apt to know them.  I think both provide appropriate
>>     management of dependencies.
> 
> Yum is good, so is apt.  I still have a problem with yum wanting to
> install i386 binaries as well as the x86_64 ones.  Haven't learned how
> to stop that yet (probably simple too).
> 
> There is much I do not like about rpm.  However it has a few nice
> features.  I can't live without
> 
>     rpm -qa
>     rpm -ql package
>     rpm -qf file
> 
> and am going through withdrawal as apt does not seem to provide these (or
> if they do, it isn't at all obvious how/where).
> 

I am here with your fix:
http://diablo.ucsc.edu/~wgscott/debian/apt-dpkg-ref.html

Of particular interest
rpm -qa equates to  dpkg -l
rpm -ql equates to dpkg -L <packagename>
rpm -qf equates to dpkg -S <filename>
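
For the script-minded, the same mapping as a small shell helper.  The
function name rpm2dpkg is made up for illustration, and it only prints
the equivalent command rather than running it:

```shell
# Print the dpkg counterpart of the three rpm queries listed above.
rpm2dpkg() {
    case "$1" in
        -qa) echo "dpkg -l" ;;        # list all installed packages
        -ql) echo "dpkg -L $2" ;;     # list the files owned by a package
        -qf) echo "dpkg -S $2" ;;     # find the package that owns a file
        *)   echo "no mapping for: $1" >&2; return 1 ;;
    esac
}

# Example: rpm2dpkg -qf /bin/ls    prints "dpkg -S /bin/ls"
```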

Trust me, Debian based distros are the next best thing to crack cocaine.

-- 
Geoffrey D. Jacobs


From hahn at mcmaster.ca  Fri Feb 15 13:33:13 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 16:33:13 -0500 (EST)
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>
	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<Pine.LNX.4.64.0802151421170.13709@cain.rgb.private.net>
Message-ID: <Pine.LNX.4.64.0802151627030.16218@coffee.psychology.mcmaster.ca>

> But there are places where cracking has a much higher up-front cost, or
> a higher risk.  So I don't argue that this recipe is right for all.

I'd argue that your approach is limited to fairly small sites.
that is, a large site (I'm mainly thinking of number and diversity
of users) needs to be hardened, since a crack _could_ do a lot of damage.
if the only cost is downtime, it's not really an issue - you can recover
quickly from a crack with either approach.

ironically, we had some uninvited visitors in december - almost certainly
got in via passwords sniffed at a nearby organization, and then probably
used the ia32-emulation local-root-elevation.  luckily, their main goal
seemed to be launching Brazilian spam (for which our network is not really
all that suitable.)  not a lot of their effort went towards staying 
inconspicuous, or in scoping out the extent of our resources.  and, happily,
no actual damage that we've found.

come to think of it, odd that security/cracking experiences have never 
been much talked about on this list...

regards, mark.


From gdjacobs at gmail.com  Fri Feb 15 13:41:58 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Fri, 15 Feb 2008 15:41:58 -0600
Subject: [Beowulf] sgi and linux networks
In-Reply-To: <Pine.LNX.4.64.0802151618550.16218@coffee.psychology.mcmaster.ca>
References: <20080215134920.GT26857@mcs.anl.gov>
	<Pine.LNX.4.64.0802151018080.31809@coffee.psychology.mcmaster.ca>	<27f776af0802151225ke0adca5ped48624ec623e112@mail.gmail.com>
	<Pine.LNX.4.64.0802151618550.16218@coffee.psychology.mcmaster.ca>
Message-ID: <47B60726.7090907@gmail.com>

Mark Hahn wrote:
>> large compute [for one realm or another].  They're probably tired of
>> being
>> underbid on machines from cluster manufacturers.  Their answer is, of
>> course, the ICE product.  Acquiring LNXI technology [specifically in
>> software] helps SGI to augment this.
> 
> I don't know much about LNXI's technology, but had the impression
> they just shipped versions of open-source stuff customized to their
> products.  maybe with some basic eyecandy for people allergic to CLI.

Unfortunately, not. MAC locked binaries all the way. Actually, it's
really annoying having to open a trouble ticket if you happen to have
one of those ARIMA boards die a horrible heat death, just because the
license needs readjusting.

Like most proprietary software, the LNXI stuff seems to live in its own
little universe.

-- 
Geoffrey D. Jacobs


From mark.kosmowski at gmail.com  Fri Feb 15 13:38:04 2008
From: mark.kosmowski at gmail.com (Mark Kosmowski)
Date: Fri, 15 Feb 2008 16:38:04 -0500
Subject: [Beowulf] High Performance SSH/SCP
Message-ID: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>

>
>
>
> Robert G. Brown wrote:
> >  Rsh "and" anything else is difficulty squared,
> > and kerberos isn't the universally implemented tool it was a decade ago,
> > largely superseded by ssh and/or ssl connections.  So finding experts to
> > help you make it work if you're a newbie isn't going to be that easy.
>
> I don't think there's anything difficult about setting up rsh, ssh or
> kerberos for anyone who knows how to read a manual. A newbie shouldn't be
> setting up a cluster in the first place. That's advanced kung-fu best
> left to the black belts. Letting a neophyte build and run an HPC cluster
> is some kind of oxymoron.
>
> Yes, I know that professors usually tell some green graduate student to
> go build a cluster for the dept, but that's a completely different topic
> outside the scope of this list...


I'm either not as much of a newbie / neophyte as I think I am or I missed
the memo about this list being for pro's only.

As far as the scope of this list, as clustering becomes more and more
prolific, there are going to be more and more newbies.  This list is, like
it or not, one of the "franchise" clustering information portals.  When I
was first building my first (and only) personal cluster I stopped by here
and at ClusterMonkey primarily to get my feet wet.  Most of the things were
(and, frankly, still are) quite above my head, but I was made aware of many
other resources and things to think about - this increased understanding is
helpful even if I don't implement many of the ideas discussed here.  The end
result is a functioning cluster - perhaps not nearly as elegant as many of
the clusters many of the others on the list maintain, but I get data
nonetheless.

I think it would be a disservice to the community to turn away cluster
newbies from this list.  At the very least encouragement and resource links
should be provided.  Appropriately experienced list members with a bit of
time are also free to take discussions off to private email if that is more
appropriate than the list in general.  After all, the world is replete with
examples of complete newbies coming up with ideas to revolutionize the
fields to which they are new.

Mark Kosmowski

From prentice at ias.edu  Fri Feb 15 13:54:41 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri, 15 Feb 2008 16:54:41 -0500
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>
References: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>
Message-ID: <47B60A21.1000008@ias.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Mark Kosmowski wrote:
> 
> 
>     Robert G. Brown wrote:
>     >  Rsh "and" anything else is difficulty squared,
>     > and kerberos isn't the universally implemented tool it was a
>     decade ago,
>     > largely superseded by ssh and/or ssl connections.  So finding
>     experts to
>     > help you make it work if you're a newbie isn't going to be that easy.
> 
>     I don't think there's anything difficult about setting up rsh, ssh or
>     kerberos for anyone who knows how to read a manual. A newbie shouldn't be
>     setting up a cluster in the first place. That's advanced kung-fu best
>     left to the black belts. Letting a neophyte build and run an HPC cluster
>     is some kind of oxymoron.
> 
>     Yes, I know that professors usually tell some green graduate student to
>     go build a cluster for the dept, but that's a completely different topic
>     outside the scope of this list...
> 
>  
>  
> I'm either not as much of a newbie / neophyte as I think I am or I
> missed the memo about this list being for pro's only.
>  
> As far as the scope of this list, as clustering becomes more and more
> prolific, there are going to be more and more newbies.  This list is,
> like it or not, one of the "franchise" clustering information portals. 
> When I was first building my first (and only) personal cluster I stopped
> by here and at ClusterMonkey primarily to get my feet wet.  Most of the
> things were (and, frankly, still are) quite above my head, but I was
> made aware of many other resources and things to think about - this
> increased understanding is helpful even if I don't implement many of the
> ideas discussed here.  The end result is a functioning cluster - perhaps
> not nearly as elegant as many of the clusters many of the others on the
> list maintain, but I get data nonetheless.
>  
> I think it would be a disservice to the community to turn away cluster
> newbies from this list.  At the very least encouragement and resource
> links should be provided.  Appropriately experienced list members with a
> bit of time are also free to take discussions off to private email if
> that is more appropriate than the list in general.  After all, the world
> is replete with examples of complete newbies coming up with ideas to
> revolutionize the fields to which they are new.
>  
> Mark Kosmowski

Let me rephrase my response to RGB's statement:

I think the difficulties of setting up rsh, ssh, and kerberos are greatly
exaggerated. SSH usually works out of the box, except for the
password-less login that requires generating keys. That part is
relatively simple, and is documented all over the web.
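
As a rough illustration of how little is involved, the sketch below
prints the usual three steps (generate a key pair, copy the public key to
the node, test the login). Assumptions: OpenSSH with ssh-keygen and
ssh-copy-id available, an RSA key with an empty passphrase (convenient on
a private cluster network, a security tradeoff anywhere else), and a
hypothetical user and host name. The function name is made up, and
nothing is executed.

```python
import os
import shlex


def passwordless_ssh_commands(user, host, key_path=None):
    """Return shell commands (as strings) for key-based SSH login.

    This is a dry run: the commands are only built, never executed.
    """
    if key_path is None:
        # Expand to an absolute path so shell quoting cannot break "~".
        key_path = os.path.expanduser("~/.ssh/id_rsa")
    target = f"{user}@{host}"
    steps = (
        # 1. Generate an RSA key pair with an empty passphrase.
        ["ssh-keygen", "-t", "rsa", "-N", "", "-f", key_path],
        # 2. Append the public key to authorized_keys on the remote side.
        ["ssh-copy-id", "-i", key_path + ".pub", target],
        # 3. Confirm that login now works without a password prompt.
        ["ssh", target, "hostname"],
    )
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in steps]


if __name__ == "__main__":
    # "alice" and "node01" are placeholder names.
    for command in passwordless_ssh_commands("alice", "node01"):
        print(command)
```

On a cluster with a shared home directory the copy step is often
unnecessary, since appending the public key to your own
~/.ssh/authorized_keys once covers every node.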

4 years ago, I set up kerberos for the very first time, without any
prior experience. I read through the relevant chapters of the O'Reilly
Kerberos book and had it up and running in only a couple of days. Most
of that time was spent reading. I disagree with RGB's equation

rsh + anything = (difficulty)^2

I found that once kerberos is set up, using kerberized rsh is
essentially invisible, therefore

rsh + kerberos = difficulty

which is a first-order relationship.

If anything, setting up rsh is the most difficult one. Why? Since rsh is
so insecure, the distro producers/vendors have created many hurdles you
must clear to get it working (correct file and owner permissions, etc.)
Ironic.

- --
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHtgoh2n4m8G8ypgARApKbAKDTM4dDBTbEsPGZ96ATimhh93akEACgqhfW
XKuset0dI1xR9rAq3OY38fM=
=KQGF
-----END PGP SIGNATURE-----


From john.hearns at streamline-computing.com  Fri Feb 15 14:03:08 2008
From: john.hearns at streamline-computing.com (John Hearns)
Date: Fri, 15 Feb 2008 22:03:08 +0000
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>
References: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>
Message-ID: <1203112998.6419.84.camel@Vigor13>

On Fri, 2008-02-15 at 16:38 -0500, Mark Kosmowski wrote:

>         
>         I don't think there's anything difficult about setting up rsh,
>         ssh or
>         kerberos for anyone who knows how to read a manual. A newbie
>         shouldn't be
>         setting up a cluster in the first place. That's advanced
>         kung-fu best
>         left to the black belts. Letting a neophyte build and run an
>         HPC cluster
>         is some kind of oxymoron.

>  
> I think it would be a disservice to the community to turn away cluster
> newbies from this list.  At the very least encouragement and resource
> links should be provided.  
> 

No, no, no Mark. You have it entirely wrong.

Building clusters must be the sole province of a respected and revered
class of worker. Our unique skills must be preserved from overwork, and
lest we become overheated or heaven forbid thirsty during our exacting
labours a bevy of handmaidens (or handsome hunks, depending on your
proclivity) must be on hand to constantly bring us cool beer.
If we are ever seen to appear to have eyes closed and feet up this is a
planning phase for the next masterful and elegant Beowulf configuration.


I myself never travel to site except in the company motorhome with at
least a personal chef and two waitresses. My rider specifies a day bed
next to the machine room, and a bowl of M&Ms.
(Dream on - Ed)


From rgb at phy.duke.edu  Fri Feb 15 14:06:22 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 17:06:22 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>
References: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802151650120.13709@cain.rgb.private.net>

On Fri, 15 Feb 2008, Mark Kosmowski wrote:

>>
>>
>>
>> Robert G. Brown wrote:
>>>  Rsh "and" anything else is difficulty squared,
>>> and kerberos isn't the universally implemented tool it was a decade ago,
>>> largely superseded by ssh and/or ssl connections.  So finding experts to
>>> help you make it work if you're a newbie isn't going to be that easy.
>>
>> I don't think there's anything difficult about setting up rsh, ssh or
>> kerberos for anyone who knows how to read a manual. A newbie shouldn't be
>> setting up a cluster in the first place. That's advanced kung-fu best
>> left to the black belts. Letting a neophyte build and run an HPC cluster
>> is some kind of oxymoron.
>>
>> Yes, I know that professors usually tell some green graduate student to
>> go build a cluster for the dept, but that's a completely different topic
>> outside the scope of this list...
>
>
>
> I'm either not as much of a newbie / neophyte as I think I am or I missed
> the memo about this list being for pro's only.

I'm not certain I agree with the principal premise anyway, so don't
worry about it.  I myself am a mere amateur by this standard, as almost
every cluster I ever heard of was built (originally) by a neophyte.
Very few places, almost none at Universities, hire a "cluster expert" to
come build a cluster.

That this could be said at all, though, is a testament to the immense
success of the COTS cluster design and this list.  After all, I'd guess
that 90% or so of all "cluster professionals" in the universe got their
start, as newbies, right here.  There are at this point a tiny handful
of schools that have cluster computing programs, but a whole lot of pros
are created through a sort of "dynamic interactive apprenticeship" on
this list.  Some of them jumped in with some sort of sysadmin
experience, many came in from the research side in various sciences,
driven by the universal Hunger for Cycles that drives us all...;-)

At any given time, I usually have anywhere between two and five
"students" around the world who are communicating with me offlist while
trying to set up their own first cluster, so I actually have some reason
for making these statements.  I have three right now who are high school
students, seriously.  Bright ones.  One west coast and two east coast.
As well as a couple in India where I'm not so sure what kind of program
they are in.  Some I communicate with just once or twice and they're off
and running, others I talk with over months, explaining this, helping
them solve that, getting them to where they can actually install,
manage, and run jobs in parallel on a tiny cluster (and then they're
usually on their own and flying free).

Obviously the high school students aren't professional sysadmins (or
professional anything).  Sometimes they are learning linux from scratch.
They learn fast.

> I think it would be a disservice to the community to turn away cluster
> newbies from this list.  At the very least encouragement and resource links
> should be provided.  Appropriately experienced list members with a bit of
> time are also free to take discussions off to private email if that is more
> appropriate than the list in general.  After all, the world is replete with
> examples of complete newbies coming up with ideas to revolutionize the
> fields to which they are new.

Don't sweat it.  Newbies are all welcome, and we (most of us) started
out as newbies once upon a time and were made welcome ourselves.  I
couldn't begin to count the number of times I've answered the "how do I
get started" question on this list.  And I'm hardly alone -- there are
at least thirty other people on this list who chime right in, and I'd
bet money that several of them help out random students offlist as well.

I'm just as happy to answer questions on or off list, and they don't
have to be about ubertech stuff.  You never know when you're going to
learn something useful.  Even the "useless" stuff on Tesla coils is
relevant to someone wondering why equipment racks sometimes come with
metal grids on all sides, and why those racks (that are typically a lot
more expensive) are still purchased.  Is it vanity?  Do we like our
cluster to look cool?  Well, maybe a little.  But a GOOD reason to do it
is if you're in an electrically noisy environment, which can happen in
both industry and really, really easily in a physics department...

I personally learn important EE things from Jim Lux every time he opens
his digital mouth, and I already know a LOT about E&M (and usually
"should" know the things he kindly corrects me on:-).

    rgb

>
> Mark Kosmowski
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From hahn at mcmaster.ca  Fri Feb 15 14:25:26 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 15 Feb 2008 17:25:26 -0500 (EST)
Subject: [Beowulf] vmware perfomance
In-Reply-To: <e4d4fd070802151314mf9a229cu3e9eceac70676545@mail.gmail.com>
References: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
	<47B59C03.6020900@venger.net>
	<e4d4fd070802151314mf9a229cu3e9eceac70676545@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802151635410.16218@coffee.psychology.mcmaster.ca>

> then one might wonder, could VMWare running on the head make use of nodes
> for some workstation applications; e.g. run an application on a node while
> the head CPU does mostly GUI? I dunno, I've never used VMware (or kvm etc)

you don't need vmware for that: run X on the head node and X clients.
or do it like ParaView, where the gui is an X client that communicates
via a socket to compute/render backend process(es).

vmware just virtualizes: allows you to multiplex multiple virtual machines
on one physical machine.  in general, the virtual machines will be slower
than if they were running on bare metal (to the degree that they involve
the hypervisor and its bounding OS.)  if the app in the VM is computebound
with a fairly static memory working set, the loss in performance will be 
minimal.

in a clustering environment, the appeal of VM's is that it's a complete
container for a job, so can be moved around.  I haven't heard of people 
doing a VM containing multiple MPI jobs, running across multiple physical
nodes, but there's no reason it couldn't happen.  if your userbase demands
specific OS images, VM's might be the ticket (my experience is that users
mostly don't care about the OS/distro as long as it works, and thankfully
MS Windows is a bit of a cognitive mismatch to HPC.)

I'm skeptical how much sense VM's in HPC make, though.  yes, it would be 
nice to have a container for MPI jobs: checkpoints for free, ability to do
migration.  both these factors depend on the scale of your jobs: if all your
jobs are 4k cpus and up, even a modest node failure rate is going to make
aggressive checkpointing necessary (versus jobs averaging 64p which are
almost never taken down by a node failure.)  similarly if your workload is
all serial jobs, there's probably no need at all for migration (versus a 
workload with high variance in job size, length, priority, etc).
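
To put rough numbers on the scale argument, here is a minimal Python
sketch. It assumes node failures are independent and exponentially
distributed, treats "64p" and "4k cpus" as counts of independently
failing units, and uses a made-up per-node MTBF of five years and a
one-day job. None of those figures come from real failure data.

```python
import math


def job_survival_probability(nodes, job_hours, node_mtbf_hours):
    """P(no node in the job fails), assuming independent exponential failures."""
    return math.exp(-nodes * job_hours / node_mtbf_hours)


if __name__ == "__main__":
    NODE_MTBF_HOURS = 5 * 365 * 24  # ASSUMED: one failure per node every 5 years
    JOB_HOURS = 24                  # ASSUMED: a one-day job
    for nodes in (64, 4096):
        p = job_survival_probability(nodes, JOB_HOURS, NODE_MTBF_HOURS)
        print(f"{nodes:5d} nodes: P(job finishes untouched) = {p:.3f}")
```

Under these assumptions the small job almost always finishes untouched
and the large one usually does not, which is the asymmetry described
above.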

regards, mark hahn.


From rgb at phy.duke.edu  Fri Feb 15 14:26:26 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri, 15 Feb 2008 17:26:26 -0500 (EST)
Subject: [Beowulf] High Performance SSH/SCP
In-Reply-To: <47B60A21.1000008@ias.edu>
References: <c84311bb0802151338p6ddcbde0rc881772d504e5c28@mail.gmail.com>
	<47B60A21.1000008@ias.edu>
Message-ID: <Pine.LNX.4.64.0802151708370.13709@cain.rgb.private.net>

On Fri, 15 Feb 2008, Prentice Bisbal wrote:

> 4 years ago, I set up kerberos for the very first time, without any
> prior experience. I read through the relevant chapters of the O'Reilly
> Kerberos book and had it up and running in only a couple of days. Most
> of that time was spent reading. I disagree with RGB's equation
>
> rsh + anything = (difficulty)^2
>
> I found that once kerberos is set up, using kerberized rsh is
> essentially invisible, therefore
>
> rsh + kerberos = difficulty
>
> which is a first-order relationship.
>
> If anything, setting up rsh is the most difficult one. Why? Since rsh is
> so insecure, the distro producers/vendors have created many hurdles you
> must clear to get it working (correct file and owner permissions, etc.)
> Ironic.

My equation was empirical, and it isn't really squared, and the
experience was with real newbies.  You have to understand, a lot of
people who get into cluster computing really have NO zero nada Unix
experience.  Some of them ask me how to set up a Windows cluster
initially, until I point out that I don't do Windows (and here's why).
I summarized that a bit in the last post, though, and won't
recapitulate.

Also, things just plain don't always work.  Not even for me on my own
systems, and I'm a 22 year Unix sysadmin (just kidding, last post, about
the amateur bit:-).  Just as a currently relevant example:

(Following yum install \*pvm\*, Fedora 8, same laptop on which pvm
WORKED when I was asking about it last week.)

...
Total download size: 2.6 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): pvm-3.4.5-7.fc6.1. 100% |=========================| 2.1 MB 00:02 
(2/2): pvm-gui-3.4.5-7.fc 100% |=========================| 549 kB 00:00 
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
   Installing: pvm ######################### [1/2]
   Installing: pvm-gui ######################### [2/2]

Installed: pvm-gui.i386 0:3.4.5-7.fc6.1
Dependency Installed: pvm.i386 0:3.4.5-7.fc6.1
Complete!
rgb at cain|B:1031#z

[3]+  Stopped                 su
rgb at cain|B:1038>xpvm
libpvm [pid14741] /tmp/pvmd.1337: No such file or directory
libpvm [pid14741]: Can't Start PVM: Can't start pvmd

OK, this is trying to start up xpvm (or pvm) on a brand, shiny,
so-new-it-hurts fresh yum install on Fedora 8 on a system where it
worked last week. It quit.  I haven't GOTTEN to where rsh or ssh matters
(and besides, I have had a working PVM environment for about fifteen
years now).  So what's wrong?

If this isn't bad enough, I installed PVM two days ago on nine F8 boxes
and IT worked perfectly, but xpvm failed (and I discovered that it
wouldn't work on my laptop where it was just working).  Why?  It will no
doubt be a long and painful process to figure it out.  To first order,
the PVM RPM installs the binaries in bin/LINUX under intel, but the
/usr/bin/xpvm script is forming LINUX$(ARCH) and it should be LINUXI386
-- this is an actual PVM bug -- but there is still more wrong.

Now imagine me helping a newbie get started with PVM.  I've used it for
years, works perfectly.  They tell me that they run pvm and get an error
message like this one.  EVEN if I've seen this before and figured it out
once, I've long since forgotten.  It has re-broken.  Or it's new.  Is it
PVM?  Is it paths?  Is it rsh or ssh?  Bug in the distro RPM?

When things work like they're supposed to, it is as you say extremely
simple and the online guides work fine.  When it doesn't, you have to
debug EVERY ERROR PATHWAY, and adding layers adds infinite pain.
Infinite pain "squared", if you're trying to do so via email with
somebody on the other end of the line that is using an xterm for the
first time and who might be working as root for all that you know.

So sure, as I said one can certainly do what you describe if you know
what you're doing, are self-starting, a manual reader, and a problem
solver.  YMMV painwise, even if you're an expert, depending on whether
things WORK the way they are supposed to and how hard it is to discover
where your setup deviates from the assumptions in the documentation.
But if you fail on any of these, the pain index rapidly escalates to
"quit and write novels instead", or "get new job as prison guard" or
even "perform self-appendectomy with a rusty spoon".  (You laugh, but
I'd bet even money that a show of hands among the real old hand cluster
people on list would turn up a dozen or more with some nasty scars on
their lower abdomen;-)

    rgb

>
> - --
> Prentice Bisbal
> Linux Software Support Specialist/System Administrator
> School of Natural Sciences
> Institute for Advanced Study
> Princeton, NJ

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


From james.p.lux at jpl.nasa.gov  Fri Feb 15 15:24:39 2008
From: james.p.lux at jpl.nasa.gov (Jim Lux)
Date: Fri, 15 Feb 2008 15:24:39 -0800
Subject: [Beowulf] vmware perfomance
In-Reply-To: <Pine.LNX.4.64.0802151635410.16218@coffee.psychology.mcmaster.ca>
References: <BLU116-W3897D21D0A761CBC8577AEF12A0@phx.gbl>
	<47B59C03.6020900@venger.net>
	<e4d4fd070802151314mf9a229cu3e9eceac70676545@mail.gmail.com>
	<Pine.LNX.4.64.0802151635410.16218@coffee.psychology.mcmaster.ca>
Message-ID: <20080215152439.pdrdpmrv1gggc4g0@webmail.jpl.nasa.gov>

Quoting Mark Hahn <hahn at mcmaster.ca>, on Fri 15 Feb 2008 02:25:26 PM PST:

>
> I'm skeptical how much sense VM's in HPC make, though.  yes, it would
> be nice to have a container for MPI jobs: checkpoints for free, ability
> to do
> migration.  both these factors depend on the scale of your jobs: if all your
> jobs are 4k cpus and up, even a modest node failure rate is going to make
> aggressive checkpointing necessary (versus jobs averaging 64p which are
> almost never taken down by a node failure.)  similarly if your workload
> is
> all serial jobs, there's probably no need at all for migration (versus
> a workload with high variance in job size, length, priority, etc).


Perhaps the added overhead of using VMs to do "user transparent  
checkpointing" is worth it in the same sense that most folks are  
willing to tolerate the overhead of using a compiler and linker  
instead of working in hex,octal, or binary machine code.  Rather than  
force a researcher to figure out how to do checkpointing, you buy a  
few dozen more nodes to make up for the extra work.

You spend more on hardware and less on bodies, and since the hardware  
is always getting cheaper (per quantum of "work") the trade gets more
attractive with time.

{Leaving aside interesting philosophical discussions having to do with  
incremental cost of labor, especially one's own, vs capital and
operating costs of the iron.  I've also noticed that even though we've  
gone through many many Moore's Law doublings, with, probably a 5000  
fold increase in computational horsepower on an engineer's desk every  
20 years, design and analysis methodologies change much slower.  In  
the RF world, state of the art in design tools in 1960 was a paper  
Smith chart and a slide rule, and a healthy dose of simplified  
analytical approximations.  State of the art in 1980 was simple  
computer tools that essentially automated the pencil and paper  
techniques, as well as some numerical analysis things (e.g. SPICE for  
circuit simulation, which solves matrix equations and does numerical  
integration, or early electromagnetics codes)  State of the art in  
2000 (and today, really) is integrated modeling tools with much larger  
matrices and tighter integration between FEM codes and circuit theory  
type analysis (that is, you might model the packaging with an EM code  
but you'd use a behavioral model for the semiconductor device, rather  
than using Maxwell's equations all the way down to the atomic level)

However, even with such nifty tools, a huge number of engineers still  
use paper and pencil style analysis.  Granted, they use Excel instead  
of their trusty HP45 and a quad pad.. but the style of analysis and  
design is the same. They even teach classes in "RF Design with Excel"  
(which I view as anathema)  Why isn't everyone using the new tools  
(which hugely improve productivity and quality of the resulting design)?

Capital investment is required (gotta invest in the iron, and the seat  
license)
Familiarity (if you learned to design 20 years ago, you're comfortable  
with the methodology, you're aware of the limits, and you are  
satisfied with the precision and accuracy of the results of that  
methodology...)
The latter is another aspect of capital investment.. it takes time to  
get used to a new way of doing things, time that the engineer may not  
have, in an environment that stresses getting the product out the door  
(or, in the case of where I work, getting to the launch pad in time  
for the every two year launch opportunity for Mars).

So, against this background, giving up even 80% of the computational  
horsepower, in exchange for allowing one to use a tool that might make  
you 10 times more productive is a good trade.  Sometimes, I think that  
folks developing automatic parallelizers and similar tools are working  
too hard to make it perfect.  If I can take a chunk of software that  
takes, say, 1 day (requiring periodic interactions, e.g., it's not a  
batch overnight thing) to run now, and get it to run in 10 minutes,  
that's a huge improvement.  Put it in numbers.  Say it costs me $3000  
for a computer to run it in a day.  If I can run it in 10 minutes  
(e.g. about 50 times faster), and I do one run a day, I don't care if  
it takes 100 processors to run 50x faster, as opposed to only 50.  The  
extra 50 processors costs me, say, $200K (extra overhead for  
connectivity, facilities, etc.), which is a small fraction of the value of
the time saved, because I've essentially replaced 50 engineers with 1. (putting
those 49 engineers out on the street, where they will inevitably cause  
problems..idle hands, playgrounds, and so forth)

In fact, you could have some hideously inefficient scheme that takes  
1000 processors to go 10 times faster, and it's probably still a good  
deal.
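
The arithmetic in the last two paragraphs can be sketched in a few lines
of Python. The processor counts, speedups, and the $200K figure are the
ones given above; the $150K per engineer-year is an assumed placeholder,
and the function names are made up for illustration.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def hardware_to_labor_ratio(extra_hardware_cost, engineers_freed, cost_per_engineer_year):
    """Extra hardware spend as a fraction of one year of the labor it replaces."""
    return extra_hardware_cost / (engineers_freed * cost_per_engineer_year)


if __name__ == "__main__":
    # Figures from the text: 100 processors for a 50x speedup, 1000 for 10x.
    print("efficiency, 50x on 100 procs:", parallel_efficiency(50, 100))
    print("efficiency, 10x on 1000 procs:", parallel_efficiency(10, 1000))
    # $200K of extra hardware is from the text; 49 engineers freed follows from
    # "replaced 50 engineers with 1"; $150K/year per engineer is an ASSUMED figure.
    ratio = hardware_to_labor_ratio(200_000, 49, 150_000)
    print(f"extra hardware / one year of labor saved: {ratio:.3f}")
```

Even at 1% parallel efficiency the hardware bill stays small next to the
labor it displaces, as long as the assumed labor cost is anywhere near
realistic.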

Jim Lux


From atp at piskorski.com  Fri Feb 15 15:57:49 2008
From: atp at piskorski.com (Andrew Piskorski)
Date: Fri, 15 Feb 2008 18:57:49 -0500
Subject: [Beowulf] wajig for Ubuntu/Debian package management
In-Reply-To: <47B5CFC9.6060105@scalableinformatics.com>
References: <47B5CFC9.6060105@scalableinformatics.com>
Message-ID: <20080215235749.GA60161@piskorski.com>

On Fri, Feb 15, 2008 at 12:45:45PM -0500, Joe Landman wrote:
> Subject: Re: [Beowulf] centos5 as cluster os
> Tim Cutts wrote:

> >apt-file : not installed by default, but phenomenally useful - it's dpkg 
> >-S for stuff that isn't installed yet.  So if you want to ask "what 
> >package do I need to install to supply this obscure header file", 
> >apt-file can tell you.
> 
> Will have to play with it.

Those are good hints...

Debian and Ubuntu have excellent package management functionality and
repositories (as good as or better than any other major Linux or Unix
distribution, AFAIK), but strangely, by default they have no
consistent API or command set for using it.  Therefore, I recommend
trying out wajig:

  sudo apt-get install wajig
  http://www.togaware.com/linux/survivor/Wajig_Overview.html

It's really just a wrapper (in Python) around all the same underlying
command-line tools, but in my limited use of it so far, it seems
noticeably more convenient than the traditional bizarre, non-orthogonal
mishmash of apt-get, dpkg, apt-cache, apt-file, etc.
(I don't know if/how it compares to aptitude, I've never used that.)

The Debian package management tools don't seem to have any sane
programming API, but so far I haven't really needed that (and wajig
manages to do without).

Their only other major flaw that I'm aware of, is that, just like all
the rpm based tools, you can only have one single version of a binary
package installed at a time (yuck!).  Perhaps one day, the sort of
tools the NixOS and DragonFly BSD folks are working on will fix that.

  http://nix.cs.uu.nl/index.html
  http://lambda-the-ultimate.org/node/2176
  http://www.dragonflybsd.org/docs/goals.shtml#packages

-- 
Andrew Piskorski <atp at piskorski.com>
http://www.piskorski.com/


From landman at scalableinformatics.com  Fri Feb 15 16:02:28 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri, 15 Feb 2008 19:02:28 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B603F7.7000709@gmail.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<47B603F7.7000709@gmail.com>
Message-ID: <47B62814.3070106@scalableinformatics.com>


Geoff Jacobs wrote:

> Of particular interest
> rpm -qa equates to  dpkg -l
> rpm -ql equates to -L <packagename>
> rpm -qf equates to dpkg -S <filename>

Ahhh... Rosetta stones ....

> Trust me, Debian based distros are the next best thing to crack cocaine.

I will take your word on that particular comparison ...

What sold me was when I needed to build a new kernel to add support for 
something.

On Centos:  A nightmare (not that there is anything wrong with Centos, 
it's just what Redhat does to the kernel build is enough to make a grown 
hacker cry), can't easily generate rpms for it.  Make rpm sort of does a 
generic RPM without really packaging up headers, sources, module links, 
... correctly.  There is no real kernel make package and trying to 
insert a modern up-to-date kernel into the .spec is an exercise in masochism.

On SuSE it is even worse.

For all intents and purposes, if you need to deviate far from the 
supplied kernel version, you are basically toast unless you do *lots* of 
things by hand.  This makes things like Fedora look nice as they build 
the modern kernels for you, albeit not necessarily with the options you 
want.

With Ubuntu (Debian for all intensive porpoises) you pull your kernel 
source, make changes, patch what you need/want, build your config (all 
of which you have to do on the others anyway to make sure it will build 
correctly) and

	CONCURRENCY=4 make-kpkg buildpackage

and whammo, a working, correctly built, linked, set up .deb .  You 
didn't even have to think hard.

It just works.

It is a shame that make-kpkg doesn't have an RPM target.  I guess I 
could use alien to convert it, but ... they just make life too darned easy.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From gdjacobs at gmail.com  Fri Feb 15 18:04:21 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Fri, 15 Feb 2008 20:04:21 -0600
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B62814.3070106@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<47B603F7.7000709@gmail.com>
	<47B62814.3070106@scalableinformatics.com>
Message-ID: <47B644A5.1040703@gmail.com>

Joe Landman wrote:
> 
> 
> Geoff Jacobs wrote:
> 
>> Of particular interest
>> rpm -qa equates to  dpkg -l
>> rpm -ql equates to -L <packagename>
>> rpm -qf equates to dpkg -S <filename>

A little edit... I believe I meant dpkg -L <packagename> above.
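
With that correction applied, the three equivalences can be collected into a small lookup table. This is only an illustrative sketch; the table and function names are invented here and cover nothing beyond the three commands listed above.

```python
# Hypothetical lookup table built only from the three equivalences listed above.
RPM_TO_DPKG = {
    "rpm -qa": "dpkg -l",                # list all installed packages
    "rpm -ql": "dpkg -L <packagename>",  # list the files owned by a package
    "rpm -qf": "dpkg -S <filename>",     # find the package that owns a file
}


def dpkg_equivalent(rpm_query: str) -> str:
    """Return the dpkg command matching an rpm query, or raise KeyError."""
    return RPM_TO_DPKG[rpm_query.strip()]


for rpm_cmd, dpkg_cmd in RPM_TO_DPKG.items():
    print(f"{rpm_cmd:<8} -> {dpkg_cmd}")
```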

> Ahhh... Rosetta stones ....
> 
>> Trust me, Debian based distros are the next best thing to crack cocaine.
> 
> I will take your word on that particular comparison ...

I don't really know why that metaphor is popular. It's not like many
people can attest to the addictiveness of crack.

Tetris, on the other hand...

> What sold me was when I needed to build a new kernel to add support for
> something.
> 
> On Centos:  A nightmare (not that there is anything wrong with Centos,
> its just what Redhat does to the kernel build is enough to make a grown
> hacker cry), can't easily generate rpms for it.  Make rpm sort of does a
> generic RPM without really packaging up headers, sources, module links,
> ... correctly.  There is no real kernel make package and trying to
> insert a modern up-to-date kernel into the .spec is an exercise in
> masochism
> 
> On SuSE it is even worse.
> 
> For all intents and purposes, if you need to deviate far from the
> supplied kernel version, you are basically toast unless you do *lots* of
> things by hand.  This makes things like Fedora look nice as they build
> the modern kernels for you, albeit not necessarily with the options you
> want.
> 
> With Ubuntu (Debian for all intensive porpoises) you pull your kernel
> source, make changes, patch what you need/want, build your config (all
> of which you have to do on the others anyway to make sure it will build
> correctly) and
> 
>     CONCURRENCY=4 make-kpkg buildpackage
> 
> and whammo, a working, correctly built, linked, set up .deb .  You
> didn't even have to think hard.
> 
> It just works.
> 
> It is a shame that make-kpkg doesn't have an RPM target.  I guess I
> could use alien to convert it, but ... they just make life to darned easy.

This sort of logic is prevalent throughout Debian. Files and libraries
tend to exist where developers and sysadmins want them to be.
Ultimately, this was the reason I abandoned RH and went with Debian ~7
years ago.

Another big advantage with Debian was the ability to install within 200
MB of hard disk. It was sometimes tricky bringing the size of RH down to
the same level. The installation of one useful utility tended to trigger
the installation of a complete X environment (RH would often bundle the
X and console versions together).

This space consideration was far more important when I was working with
cast-off p100 boards running some awful RH variant, as opposed to now
with universal fast ethernet and cheap hard disk space. It is
indicative, however, of the thoughtful work behind Debian and its brethren.

Oh, and Suse and Redhat are not masochistic, at least in comparison
(just overweight). No, Slackware is the distribution which could most be
described as masochistic.

-- 
Geoffrey D. Jacobs


From john.leidel at gmail.com  Fri Feb 15 21:23:17 2008
From: john.leidel at gmail.com (John Leidel)
Date: Fri, 15 Feb 2008 23:23:17 -0600
Subject: [Beowulf] sgi and linux networks
In-Reply-To: <47B60726.7090907@gmail.com>
References: <20080215134920.GT26857@mcs.anl.gov>
	<Pine.LNX.4.64.0802151018080.31809@coffee.psychology.mcmaster.ca>
	<27f776af0802151225ke0adca5ped48624ec623e112@mail.gmail.com>
	<Pine.LNX.4.64.0802151618550.16218@coffee.psychology.mcmaster.ca>
	<47B60726.7090907@gmail.com>
Message-ID: <1203139397.7281.24.camel@e521.site>

Indeed Mark, they have [had] their own cluster management stack called
Clusterworx.  It was completely their own.

On Fri, 2008-02-15 at 15:41 -0600, Geoff Jacobs wrote:
> Mark Hahn wrote:
> >> large compute [for one realm or another].  They're probably tired of
> >> being
> >> underbid on machines from cluster manufacturers.  Their answer is, of
> >> course, the ICE product.  Acquiring LNXI technology [specifically in
> >> software] helps SGI to augment this.
> > 
> > I don't know much about LNXI's technology, but had the impression
> > they just shipped versions of open-source stuff customized to their
> > products.  maybe with some basic eyecandy for people allergic to CLI.
> 
> Unfortunately, not. MAC locked binaries all the way. Actually, it's
> really annoying having to open a trouble ticket if you happen to have
> one of those ARIMA boards die a horrible heat death, just because the
> license needs readjusting.
> 
> Like most proprietary software, the LNXI stuff seems to live in its own
> little universe.
> 


From tjrc at sanger.ac.uk  Sat Feb 16 00:32:05 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Sat, 16 Feb 2008 08:32:05 +0000
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47B62814.3070106@scalableinformatics.com>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>
	<47B5B542.7000605@scalableinformatics.com>
	<47B603F7.7000709@gmail.com>
	<47B62814.3070106@scalableinformatics.com>
Message-ID: <E51CA0D8-6838-4A11-9A96-B784752C05D9@sanger.ac.uk>


On 16 Feb 2008, at 12:02 am, Joe Landman wrote:

> With Ubuntu (Debian for all intensive porpoises) you pull your  
> kernel source, make changes, patch what you need/want, build your  
> config (all of which you have to do on the others anyway to make  
> sure it will build correctly) and
>
> 	CONCURRENCY=4 make-kpkg buildpackage
>
> and whammo, a working, correctly built, linked, set up .deb .  You  
> didn't even have to think hard.
>
> It just works.

Yes, make-kpkg is a real winner.  Combine it with debarchiver running  
on a server somewhere to automatically build your own APT repository  
(as we do) and you can build a kernel package, upload it to your local  
repository with dput, and then apt-get install it on all your  
machines.  Lovely.

Tim


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From tjrc at sanger.ac.uk  Sat Feb 16 00:44:13 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Sat, 16 Feb 2008 08:44:13 +0000
Subject: [Beowulf] Re: wajig for Ubuntu/Debian package management
In-Reply-To: <20080215235749.GA60161@piskorski.com>
References: <47B5CFC9.6060105@scalableinformatics.com>
	<20080215235749.GA60161@piskorski.com>
Message-ID: <2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk>


On 15 Feb 2008, at 11:57 pm, Andrew Piskorski wrote:

> Those are good hints...
>
> Debian and Ubuntu have excellent package management functionality and
> and repositories (as good or better than any other major Linux or Unix
> distribution, AFAIK), but strangely, by default they have no
> consistent API or command set for using it.  Therefore, I recommend
> trying out wajig:
>
>  sudo apt-get install wajig
>  http://www.togaware.com/linux/survivor/Wajig_Overview.html
>
> It's really just a wrapper (in Python) around all the same underlying
> command-line tools, but in my limited use of it so far, it seems
> noticeably more convenient than the traditional bizarre, non- 
> orthogonal
> mishmash of apt-get, dpkg, apt-cache, apt-file, etc.
> (I don't know if/how it compares to aptitude, I've never used that.)

aptitude has pretty much the same CLI as apt-get, and all of the APT  
family are fairly consistent.  dpkg is different, of course, but then  
it's much older.  It was also written by Ian Jackson, who's a very  
nice guy (and a friend of mine) but who has a tendency to write user  
interfaces which don't correspond to the brains of most people.  :-)   
Just kidding, Ian, if you read this.  dpkg, actually, I find quite  
nice (nicer than rpm anyway, but that's probably just because I'm used  
to it).  dselect, Ian's wrapper around dpkg, really is a UI nightmare  
though, and should really be quietly taken out into the woods and  
shot.  I think there are plans to remove it from Debian, but it hasn't  
happened yet.  dselect's hideousness is a large part of why aptitude  
exists.

> The Debian package management tools don't seem to have any sane
> programming API, but so far I haven't really needed that (and wajig
> manages to do without).

There are some perl modules, I think, but they're not well  
documented.  I've never needed them either.

>
> Their only other major flaw that I'm aware of, is that, just like all
> the rpm based tools, you can only have one single version of a binary
> package installed at a time (yuck!).

That's not true.  Look at the gcc or emacs packages.  Or automake.  Or  
autoconf.  Many, many versions, all of which can coexist on your  
system at the same time, and you can use update-alternatives to choose  
which one is the default.  The trick is that the packages have to have  
different names, so in the case of gcc there are gcc-3.3, gcc-3.4,  
gcc-4.1 etc.  In the more obvious cases where people are going to want  
to have the choice, this has already been done.  Of course there are  
plenty of cases where it hasn't, and then yes, you're right, you can't  
have more than one at once.  But Debian does have the infrastructure  
to support it.


>  Perhaps one day, the sort of
> tools the NixOS and DragonFly BSD folks are working on will fix that.
>
>  http://nix.cs.uu.nl/index.html
>  http://lambda-the-ultimate.org/node/2176
>  http://www.dragonflybsd.org/docs/goals.shtml#packages
>
> -- 
> Andrew Piskorski <atp at piskorski.com>
> http://www.piskorski.com/
>
>




From nixon at nsc.liu.se  Mon Feb 18 03:10:36 2008
From: nixon at nsc.liu.se (Leif Nixon)
Date: Mon, 18 Feb 2008 12:10:36 +0100
Subject: [Beowulf] Re: wajig for Ubuntu/Debian package management
In-Reply-To: <2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk> (Tim Cutts's
	message of "Sat\, 16 Feb 2008 08\:44\:13 +0000")
References: <47B5CFC9.6060105@scalableinformatics.com>
	<20080215235749.GA60161@piskorski.com>
	<2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk>
Message-ID: <m33arqed03.fsf@unna.nsc.liu.se>

Tim Cutts <tjrc at sanger.ac.uk> writes:

> On 15 Feb 2008, at 11:57 pm, Andrew Piskorski wrote:
>
>> Their only other major flaw that I'm aware of, is that, just like all
>> the rpm based tools, you can only have one single version of a binary
>> package installed at a time (yuck!).
>
> That's not true.  Look at the gcc or emacs packages.  Or automake.  Or
> autoconf.  Many, many versions, all of which can coexist on your
> system at the same time, and you can use update-alternatives to choose
> which one is the default.  The trick is that the packages have to have
> different names, so in the case of gcc there are gcc-3.3, gcc-3.4,
> gcc-4.1 etc.

But that's not the same package, then. Hm, can you have a package
depend on "any version of gcc", without listing the various gcc
packages explicitly?

Andrew, which tools don't let you have more than one version of an rpm
installed?

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------


From tjrc at sanger.ac.uk  Mon Feb 18 04:26:29 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Mon, 18 Feb 2008 12:26:29 +0000
Subject: [Beowulf] Re: wajig for Ubuntu/Debian package management
In-Reply-To: <m33arqed03.fsf@unna.nsc.liu.se>
References: <47B5CFC9.6060105@scalableinformatics.com>
	<20080215235749.GA60161@piskorski.com>
	<2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk>
	<m33arqed03.fsf@unna.nsc.liu.se>
Message-ID: <7FCE2E1D-824D-47F4-A3B6-28240B9E27D2@sanger.ac.uk>


On 18 Feb 2008, at 11:10 am, Leif Nixon wrote:

> Tim Cutts <tjrc at sanger.ac.uk> writes:
>
>> On 15 Feb 2008, at 11:57 pm, Andrew Piskorski wrote:
>>
>>> Their only other major flaw that I'm aware of, is that, just like  
>>> all
>>> the rpm based tools, you can only have one single version of a  
>>> binary
>>> package installed at a time (yuck!).
>>
>> That's not true.  Look at the gcc or emacs packages.  Or automake.   
>> Or
>> autoconf.  Many, many versions, all of which can coexist on your
>> system at the same time, and you can use update-alternatives to  
>> choose
>> which one is the default.  The trick is that the packages have to  
>> have
>> different names, so in the case of gcc there are gcc-3.3, gcc-3.4,
>> gcc-4.1 etc.
>
> But that's not the same package, then. Hm, can you have a package
> depend on "any version of gcc", without listing the various gcc
> packages explicitly?

Yes.  That's what "Provides" does in a package's control file.  So,
for example, postfix, exim and sendmail all provide
"mail-transport-agent".  Anything that needs an MTA then depends on
"mail-transport-agent" rather than a specific MTA package.  The same
thing happens for C compilers.  If you do aptitude show for gcc-4.1,
for example:

Package: gcc-4.1
New: yes
State: installed
Automatically installed: yes
Version: 4.1.1-21
Priority: optional
Section: devel
Maintainer: Debian GCC Maintainers <debian-gcc at lists.debian.org>
Uncompressed Size: 1323k
Depends: gcc-4.1-base (= 4.1.1-21), cpp-4.1 (= 4.1.1-21), binutils (>= 2.16.1cvs20051214), libgcc1 (>= 1:4.1.1-21), libssp0, libc6 (>= 2.3.6-6)
Recommends: libc6-dev (>= 2.3.6-7), libmudflap0-dev (>= 4.1.1-21)
Suggests: gcc-4.1-doc (>= 4.1.1), gcc-4.1-locales (>= 4.1.1), libc6-dev-amd64, lib64gcc1 (>= 1:4.1.1-21), lib64ssp0
Conflicts: gcj-4.1 (< 4.1.1), libssp0-dev (< 4.1.1-6)
Replaces: gcj-4.1 (< 4.1.1), libssp0-dev (< 4.1.1-6)
Provides: c-compiler, libssp0-dev
Description: The GNU C compiler
  This is the GNU C compiler, a fairly portable optimizing compiler for C.

Tags: devel::compiler, devel::lang:c, implemented-in::c, interface::commandline, role::program, suite::gnu, works-with::software:source

You can see it provides a virtual package "c-compiler", and that's  
what something should depend on if it doesn't actually care which gcc  
package is installed on the machine, as long as there is one.
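
A toy model may make the mechanism clearer. The sketch below is not how APT or dpkg actually resolve dependencies; it only assumes that a dependency name is satisfied either by a real package of that name or by any installed package that lists the name under "Provides". The package data is copied from the examples in this message, and the function name is hypothetical.

```python
# Toy model of Debian's "Provides" field; package data is taken from the
# examples in this message, not from a real package database.
PROVIDES = {
    "gcc-4.1": {"c-compiler", "libssp0-dev"},
    "postfix": {"mail-transport-agent"},
    "exim": {"mail-transport-agent"},
    "sendmail": {"mail-transport-agent"},
}


def providers(name, installed):
    """Installed packages that satisfy a dependency on `name`.

    A dependency is satisfied either by a real package with that name
    or by any package that lists the name in its Provides field.
    """
    return sorted(
        pkg for pkg in installed
        if pkg == name or name in PROVIDES.get(pkg, set())
    )


installed = {"gcc-4.1", "postfix"}
print(providers("c-compiler", installed))            # ['gcc-4.1']
print(providers("mail-transport-agent", installed))  # ['postfix']
print(providers("gcc-4.1", installed))               # ['gcc-4.1']
```

An empty result would mean the dependency is unmet on that machine.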

Tim




From nixon at nsc.liu.se  Mon Feb 18 05:24:31 2008
From: nixon at nsc.liu.se (Leif Nixon)
Date: Mon, 18 Feb 2008 14:24:31 +0100
Subject: [Beowulf] Re: wajig for Ubuntu/Debian package management
In-Reply-To: <7FCE2E1D-824D-47F4-A3B6-28240B9E27D2@sanger.ac.uk> (Tim Cutts's
	message of "Mon\, 18 Feb 2008 12\:26\:29 +0000")
References: <47B5CFC9.6060105@scalableinformatics.com>
	<20080215235749.GA60161@piskorski.com>
	<2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk>
	<m33arqed03.fsf@unna.nsc.liu.se>
	<7FCE2E1D-824D-47F4-A3B6-28240B9E27D2@sanger.ac.uk>
Message-ID: <m3hcg6cs8g.fsf@unna.nsc.liu.se>

Tim Cutts <tjrc at sanger.ac.uk> writes:

> On 18 Feb 2008, at 11:10 am, Leif Nixon wrote:
>
>> But that's not the same package, then. Hm, can you have a package
>> depend on "any version of gcc", without listing the various gcc
>> packages explicitly?
>
> Yes.  That's what "Provides" does in a package's control file.  So,
> for example, postfix, exim and sendmail all provide "mail-transport- 
> agent".

Ah, of course. Same thing in rpm-land.

One final question: is it possible to verify the integrity of
installed files? (For example, I can run "rpm -V glibc" to check the
md5 sum of all files in the glibc package against the package
database.)

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------


From tod at gust.sr.unh.edu  Mon Feb 18 15:21:22 2008
From: tod at gust.sr.unh.edu (Tod Hagan)
Date: Mon, 18 Feb 2008 18:21:22 -0500
Subject: [Beowulf] Re: wajig for Ubuntu/Debian package management
In-Reply-To: <m3hcg6cs8g.fsf@unna.nsc.liu.se>
References: <47B5CFC9.6060105@scalableinformatics.com>
	<20080215235749.GA60161@piskorski.com>
	<2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk>
	<m33arqed03.fsf@unna.nsc.liu.se>
	<7FCE2E1D-824D-47F4-A3B6-28240B9E27D2@sanger.ac.uk>
	<m3hcg6cs8g.fsf@unna.nsc.liu.se>
Message-ID: <1203376882.30515.10.camel@trop.sr.unh.edu>

On Mon, 2008-02-18 at 14:24 +0100, Leif Nixon wrote:
> One final question: is it possible to verify the integrity of
> installed files? (For example, I can run "rpm -V glibc" to check the
> md5 sum of all files in the glibc package against the package
> database.)

Yes, with a utility called debsums that's a separate package.
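
For readers who want to see the idea rather than the tool, here is a minimal sketch of checksum verification against a manifest. It is not debsums itself; it assumes a manifest in md5sum-style format ("<md5hex>  <relative/path>" per line), and the function names are made up for illustration.

```python
import hashlib
import os


def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest_lines, root="/"):
    """Return paths that are missing or whose MD5 differs from the manifest.

    Each manifest line is assumed to look like "<md5hex>  <relative/path>".
    """
    bad = []
    for line in manifest_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        expected, rel_path = line.split(None, 1)
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or md5_of(full_path) != expected:
            bad.append(rel_path)
    return bad
```

Note that MD5 sums stored on the same machine only catch accidental corruption; they are not a defence against someone who can also rewrite the manifest.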

Tod

-- 
Tod Hagan
Information Technologist
AIRMAP/Climate Change Research Center
Institute for the Study of Earth, Oceans, and Space
University of New Hampshire
Durham, NH 03824
Phone: 603-862-3116


From xclski at yahoo.com  Sat Feb 16 17:00:32 2008
From: xclski at yahoo.com (Ellis Wilson)
Date: Sat, 16 Feb 2008 17:00:32 -0800 (PST)
Subject: [Beowulf] High Performance SSH/SCP
Message-ID: <813403.75808.qm@web37910.mail.mud.yahoo.com>

Prentice Bisbal wrote:
> Yes, I know that professors usually tell some green graduate student to
> go build a cluster for the dept, but that's a completely different topic
> outside the scope of this list...

I think you will find a number of persons on this list hold the
definition of the Beowulf to be quite different than the elitist concept
you propose.  While having extensive knowledge and experience will
necessarily produce a much more efficient and well designed cluster, the
acquisition of such is only possible through a starting point.  What
better starting point than with commodity hardware and free and open
software?  Such are fundamentals of the Beowulf.

Ellis




From cbergstrom at netsyncro.com  Sun Feb 17 02:00:43 2008
From: cbergstrom at netsyncro.com (C. =?ISO-8859-1?Q?Bergstr=F6m?=)
Date: Sun, 17 Feb 2008 11:00:43 +0100
Subject: [Beowulf] JVM Clustering
In-Reply-To: <200802011744.55007.kilian@stanford.edu>
References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com>
	<47A374CF.9030901@obs.unige.ch>
	<Pine.LNX.4.64.0802011457030.32049@coffee.psychology.mcmaster.ca>
	<200802011744.55007.kilian@stanford.edu>
Message-ID: <1203242443.17194.361.camel@chaos>

Hi all..

First post to the list so let's hope I get this right...

The recent discussion about the relatively low cost of SDR IB
made me wonder if I could solve these two very broad goals:

1) Distributed jvm to run very large reports/statistical analysis
2) load balancing/hpc type solution for the app container

Based on my initial searching online I've found jessica2 [1] which seems
to solve the most difficult parts, but I'm certainly open to
ideas/suggestions on better approaches.  I'm also somewhat familiar with
tc (terracotta) which will mostly do what I need. (When active-active
support comes out later this year.)  I do feel a lot more comfortable
merging the C code of jessica2 against the latest version of kaffe
than trying to debug java bytecode.  Something like jessica2 also
has long term application maintenance benefits over tc.

----
I'm also interested to hear what others are doing for inexpensive/power
efficient nodes that have pci-e, but this is probably a future
discussion.
----

I just want to send a very big and warm thanks to all those that make
this list so great!


/Christopher


[1] http://i.cs.hku.hk/~clwang/projects/JESSICA2.html


From rapier at psc.edu  Mon Feb 18 10:39:33 2008
From: rapier at psc.edu (Chris Rapier)
Date: Mon, 18 Feb 2008 13:39:33 -0500
Subject: [Beowulf] Re: High Performance SSH/SCP
In-Reply-To: <47B49F3D.3060307@psc.edu>
References: <E1JPMqc-00034j-Hf@mendel.bio.caltech.edu>	<Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<47B49F3D.3060307@psc.edu>
Message-ID: <47B9D0E5.5030302@psc.edu>

Hi,

I'm Chris Rapier, the PI and lead on the HPN-SSH project.

As Ben Bennett said,
> A colleague just pointed me to this thread, I'll try to keep an eye on 
> it if there are any questions, or feel free to contact hpn-ssh at psc.edu

I wanted to address an issue that has been brought up a couple of times. 
Many people have asked when the HPN patches, or a subset of them, will 
be making their way into OpenSSH. Honestly, I really can't tell you. I've 
been trying to work with the OpenSSH developers for some years now but, for 
the most part, they've declined to incorporate any of the patches we've 
offered. To be perfectly honest, I don't see them taking that step without 
some greater motivation than I can personally provide.

We've never had a desire to fork our code from them. We simply don't 
have the resources something like that would require. Instead we like to 
think of ourselves as gently spooning up against OpenSSH. We'll keep our 
patches available for people and let them make their own choices. In the 
near future we do hope to have some package distributions available for 
easy downloading.

Also, I want to point out that we wrote HPN-SSH for this community. If 
you have any comments, suggestions, or critiques *please* let us know.

Chris Rapier
PSC


From oborbria at physics.isu.edu  Mon Feb 18 12:13:16 2008
From: oborbria at physics.isu.edu (Brian Oborn)
Date: Mon, 18 Feb 2008 13:13:16 -0700
Subject: [Beowulf] Re: wajig for Ubuntu/Debian package management
In-Reply-To: <m3hcg6cs8g.fsf@unna.nsc.liu.se>
References: <47B5CFC9.6060105@scalableinformatics.com>	<20080215235749.GA60161@piskorski.com>	<2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk>	<m33arqed03.fsf@unna.nsc.liu.se>	<7FCE2E1D-824D-47F4-A3B6-28240B9E27D2@sanger.ac.uk>
	<m3hcg6cs8g.fsf@unna.nsc.liu.se>
Message-ID: <47B9E6DC.4000508@physics.isu.edu>

>
> One final question: is it possible to verify the integrity of
> installed files? (For example, I can run "rpm -V glibc" to check the
> md5 sum of all files in the glibc package against the package
> database.
The debsums package does this 
(http://packages.debian.org/stable/admin/debsums). I don't think it's as 
comprehensive as rpm -V, but if you're really concerned you should be 
using tripwire or the like.

Brian Oborn


From pascal.charest at gmail.com  Mon Feb 18 12:58:10 2008
From: pascal.charest at gmail.com (Pascal Charest)
Date: Mon, 18 Feb 2008 15:58:10 -0500
Subject: Fwd: [Beowulf] Re: wajig for Ubuntu/Debian package management
In-Reply-To: <ba4e02df0802181257j65ab13dfpdebdf8302735c203@mail.gmail.com>
References: <47B5CFC9.6060105@scalableinformatics.com>
	<20080215235749.GA60161@piskorski.com>
	<2C491310-D8EB-46BC-80A1-FBFF8A10FE80@sanger.ac.uk>
	<m33arqed03.fsf@unna.nsc.liu.se>
	<7FCE2E1D-824D-47F4-A3B6-28240B9E27D2@sanger.ac.uk>
	<m3hcg6cs8g.fsf@unna.nsc.liu.se>
	<ba4e02df0802181257j65ab13dfpdebdf8302735c203@mail.gmail.com>
Message-ID: <ba4e02df0802181258h1353070alddd9fbdd2ee5d69@mail.gmail.com>

Jeez, still forgot to reply to the mailing list. Sorry.


---------- Forwarded message ----------
From: Pascal Charest <pascal.charest at gmail.com>
Date: Feb 18, 2008 3:57 PM
Subject: Re: [Beowulf] Re: wajig for Ubuntu/Debian package management
To: Leif Nixon <nixon at nsc.liu.se>


> Ah, of course. Same thing in rpm-land.
>
> One final question: is it possible to verify the integrity of
> installed files? (For example, I can run "rpm -V glibc" to check the
> md5 sum of all files in the glibc package against the package
> database.)
>
>

Hi,

I guess this would be a good cheat sheet for going from apt to rpm (or
vice-versa): http://nakedape.cc/wiki/PackageManagerCheatsheet

Well... not so good since it doesn't answer your last question. I
guess that verifying the validity of installed files of a specific
installed .deb package is best done through the "debsums" command
(from the package of the same name).

Here is a link to the debian repository :
http://packages.debian.org/stable/admin/debsums

Pascal

--
Pascal Charest, Free software consultant (GNU/linux)
http://blog.pacharest.com


From bjtstarks at gmail.com  Tue Feb 19 07:00:02 2008
From: bjtstarks at gmail.com (Berkley Starks)
Date: Tue, 19 Feb 2008 08:00:02 -0700
Subject: [Beowulf] Re: Setting up a new Beowulf cluster
In-Reply-To: <E1JPh7T-0003ZQ-RJ@mendel.bio.caltech.edu>
References: <E1JPh7T-0003ZQ-RJ@mendel.bio.caltech.edu>
Message-ID: <5721d9d70802190700m2a356e5fo3d637308e2d9a34d@mail.gmail.com>

Thank you all for the help and support here.  With what has been presented
here, and sound considerations, we have decided on a home for our Beowulf
cluster.  The room is already sound proofed, and well air conditioned.  As
for people worrying about noise, it will be housed with our vacuum chamber,
so those going into the room and doing stuff are already used to a little
bit of noise.

The floor is rated to hold more than enough computers and the AC in there is
phenomenal.  I just finished meeting with campus physical facilities the
other day and have got the budget requisitioned and approved to allow us
independent AC control of the room.

Right now we are seeing how much money can be appropriated for the actual
construction of the cluster.

Thank you all so much for your input and support so far.  It has helped a
lot.

Berkley Starks

On Feb 14, 2008 9:39 AM, David Mathog <mathog at caltech.edu> wrote:

> Jim Lux <James.P.Lux at jpl.nasa.gov> wrote:
>
> > >>quiet down a rack because to first order sound insulation == heat
> > >>insulation. \
> >
> > Actually, no.. good acoustic isolation is not good thermal
> > isolation.  Sure, things like fiberglass batts provide thermal
> > insulation and also (slightly) attenuate high frequencies.
>
> I guess I should have used => or some other "implies".  Sound insulators
> tend to be good heat insulators, heat insulators are generally not good
> sound insulators.
>
> I spent way too long trying to quiet down a rack when it had to live in
> a classroom.  Mass loaded vinyl on all 4 sides worked fairly well
> to stop the noise coming out that way, but then it just turned into a
> big speaker enclosure and directed nearly as much sound out the fan
> holes, where it bounced off the ceiling and floor.  And the rack exhaust
> fans (2 very high capacity 120mm fans on the top) were not able to keep
> it cool when it was fully sound insulated.  The rated capacity
> of those two fans was more than the sum of all the little ones in the
> nodes, but the air flow was too restricted, I think mostly by the narrow
> space between the node's front panels and the front insulator panel.
> Thankfully it finally moved to a machine room and the noise problem went
> away.
>
> Anyway, it is a much easier to sound insulate a room than it is a single
> noisy rack.
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

From mkinet at ulb.ac.be  Wed Feb 20 02:26:06 2008
From: mkinet at ulb.ac.be (Maxime Kinet)
Date: Wed, 20 Feb 2008 11:26:06 +0100
Subject: [Beowulf] need for an advice on nfs and diskless clients
Message-ID: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>

Hi,

I'm setting up a cluster made of diskless workstations that boot
over NFS. For that purpose I created a copy of the node's
filesystem in a /tftpboot directory. That is, I have a full filesystem
for each single node: /tftpboot/192.168.1.1, /tftpboot/192.168.1.2,
/tftpboot/192.168.1.3, etc.

I know that there are several different ways to let several
clients mount their filesystems, and that this one might not be the
most efficient, but I chose it simply because I already knew
how to do it. Now everything works fine, except that I would like to
save some disk space on the master partition. Since each filesystem is
around 500 MB, I quickly reach more than 10 GB just for the /tftpboot
directory.

A quick solution I found is to mount the largest directories (which
are the same for each node) of / (namely /lib and /var) from a
separate common directory. Could this cause problems in the way
the nodes work? Is there any other easy/fast way to reduce the
amount of data contained in the /tftpboot/192.168.1.* directories?

I also heard about ClusterNFS, which would do exactly what I need, but
then I would have to start everything from the beginning and I want
everything to work as soon as possible.

Thanks a lot for any answers/advice,

------------------
Maxime Kinet
Université Libre de Bruxelles
Physique Statistique et Plasmas, CP 231
Campus Plaine - Boulevard du Triomphe,
1050 Bruxelles.



From balahindustani at gmail.com  Wed Feb 20 03:10:48 2008
From: balahindustani at gmail.com (bala)
Date: Wed, 20 Feb 2008 16:40:48 +0530
Subject: [Beowulf] Using LSF
Message-ID: <d02797620802200310w4a513492h171edb0e3843357c@mail.gmail.com>

Hi all,

I am stuck with a problem related to LSF.

I want to submit an MPI job with the criterion that the process with
rank 1 should start on node 2,
rank 2 should start on node 4,
rank 3 should start on node 1, and so on.
This order is pretty much random.

Accomplishing this was not a problem until we had a constraint to
submit the jobs through a job scheduler (LSF in this case). We used the
machinefile option with mpirun to order the nodes on which the processes have
to be started.

But I am not able to do this with the current setup, where LSF is used for
scheduling and SLURM for resource management.
I have tried a few options, such as using the -m option to bsub to
specify the preference, but without success.

Any help is much appreciated.

-- 
Best Regards,
Balamurugan. R

From lindahl at pbm.com  Wed Feb 20 13:14:30 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Wed, 20 Feb 2008 13:14:30 -0800
Subject: [Beowulf] Re: High Performance SSH/SCP
In-Reply-To: <47B9D0E5.5030302@psc.edu>
References: <Pine.LNX.4.64.0802131644450.25631@coffee.psychology.mcmaster.ca>
	<6.2.3.4.2.20080213150355.03131fe8@mail.jpl.nasa.gov>
	<60169.192.168.1.1.1202952427.squirrel@mail.eadline.org>
	<Pine.LNX.4.64.0802140606170.5266@cain.rgb.private.net>
	<Pine.LNX.4.64.0802141029320.19893@coffee.psychology.mcmaster.ca>
	<Pine.LNX.4.64.0802141232330.6565@cain.rgb.private.net>
	<20080214130116.0bff9b7d@localhost.localdomain>
	<Pine.LNX.4.64.0802141403080.22547@coffee.psychology.mcmaster.ca>
	<47B49F3D.3060307@psc.edu> <47B9D0E5.5030302@psc.edu>
Message-ID: <20080220211430.GA15039@bx9.net>

On Mon, Feb 18, 2008 at 01:39:33PM -0500, Chris Rapier wrote:

> In the near future we do hope to have some package distributions
> available for easy downloading.

I'm looking forward to that day -- while it is extra work, it will
increase usage quite a bit.

-- greg


From peter.st.john at gmail.com  Wed Feb 20 13:24:56 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Wed, 20 Feb 2008 16:24:56 -0500
Subject: [Beowulf] need for an advice on nfs and diskless clients
In-Reply-To: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
References: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
Message-ID: <e4d4fd070802201324u782e11a5s55292b1570e09acf@mail.gmail.com>

Maxime,
Would it be feasible to use "ln" to create symbolic links to populate
/tftpboot/192.168.1.N, for each node N, from say /tftpboot/generic?
Peter

On Wed, Feb 20, 2008 at 5:26 AM, Maxime Kinet <mkinet at ulb.ac.be> wrote:

> Hi,
> I'm setting up  a cluster, made of diskless workstations, booting through
> NFS. For that purpose I created a copy of the node's filesystem in a
> /tftpboot directory. That is, I have a full filesytem for each single node :
> /tftpboot/192.168.1.1, /tftpboot/192.168.1.2, /tftpboot/192.168.1.3, etc.
>
> I know that there are several different options to make several clients
> mounting their filesystems, and that this one might not be the most
> efficient one, but I choosed that one just because I knew already how to do
> it. Now everything works fine, except that I would like to spare some
> diskspace on the master partition. Since each filesystem is around 500 Mb, I
> quickly reach more than 10Gb just for the /tftpboot directory.
>
> A quick solution I founded is to mount the largests directory (which are
> the same for each node) of the / (namely /lib and /var) from a separate
> common directory. Could this cause some problem in the way the node's are
> working? Is there any other easy/fast way to reduce the amount of data
> contained in the /tftpboot/192.168.1.* directories?
>
> I also heard about ClusterNFS which would do exactly what I need, but then
> I would have to start everything from the beggining and I want everything to
> work as soon as possible.
>
> Thanks a lot for answers/advices,
>
>  ------------------
> Maxime Kinet
> Université Libre de Bruxelles
> Physique Statistique et Plasmas, CP 231
> Campus Plaine - Boulevard du Triomphe,
> 1050 Bruxelles.
>

From landman at scalableinformatics.com  Wed Feb 20 13:27:47 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 20 Feb 2008 16:27:47 -0500
Subject: [Beowulf] need for an advice on nfs and diskless clients
In-Reply-To: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
References: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
Message-ID: <47BC9B53.1030203@scalableinformatics.com>

Greetings Maxime:

Maxime Kinet wrote:

> I know that there are several different options to make several clients 
> mounting their filesystems, and that this one might not be the most 
> efficient one, but I choosed that one just because I knew already how to 
> do it. Now everything works fine, except that I would like to spare some 
> diskspace on the master partition. Since each filesystem is around 500 
> Mb, I quickly reach more than 10Gb just for the /tftpboot directory.

Have you looked at using UnionFS or AuFS to make a single shared file 
system for the nodes, with a writable/localized component?  I presume 
that most of the nodes will share most of the (static) files.  /var and 
a few others (/proc /sys /etc ...) should likely be localized, but this 
is fairly easy to do with UnionFS/AuFS.

Also, have you looked at Perceus?  They have a good system for running 
out of a RAM disk, which you can leverage here as well.  There are some 
diskless based cluster efforts that you could also explore (onesis).


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From john.hearns at streamline-computing.com  Wed Feb 20 15:04:46 2008
From: john.hearns at streamline-computing.com (John Hearns)
Date: Wed, 20 Feb 2008 23:04:46 +0000
Subject: [Beowulf] need for an advice on nfs and diskless clients
In-Reply-To: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
References: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
Message-ID: <1203548696.8805.2.camel@Vigor13>

On Wed, 2008-02-20 at 11:26 +0100, Maxime Kinet wrote:
> Hi,
> 
> 
> I'm setting up  a cluster, made of diskless workstations, booting
> through NFS. For that purpose I created a copy of the node's
> filesystem in a /tftpboot directory. That is, I have a full filesytem
> for each single
> node : /tftpboot/192.168.1.1, /tftpboot/192.168.1.2, /tftpboot/192.168.1.3, 

Why?
Why do you need a separate filesystem for every node?
The approach is to have the SAME filesystem for every node.


> 
> ------------------
> Maxime Kinet
> Université Libre de Bruxelles
> Physique Statistique et Plasmas, CP 231

Do you intend to be at the FOSDEM conference this weekend at ULB?
If anyone else on the Beowulf list is there, see you at the Delirium
Tremens on Friday night. And mine's a Leffe Brune.


From cbergstrom at netsyncro.com  Wed Feb 20 13:01:32 2008
From: cbergstrom at netsyncro.com (C. =?ISO-8859-1?Q?Bergstr=F6m?=)
Date: Wed, 20 Feb 2008 22:01:32 +0100
Subject: [Beowulf] JVM Clustering
In-Reply-To: <200802011744.55007.kilian@stanford.edu>
References: <9FA59C95FFCBB34EA5E42C1A8573784FE5A683@mtiexch01.mti.com>
	<47A374CF.9030901@obs.unige.ch>
	<Pine.LNX.4.64.0802011457030.32049@coffee.psychology.mcmaster.ca>
	<200802011744.55007.kilian@stanford.edu>
Message-ID: <1203541292.6611.180.camel@chaos>

Hi all..

First post to the list, so let's hope I get this right...

With the recent discussion about the relatively low cost of SDR IB, it
made me wonder if I could solve these two very broad goals:

1) Distributed JVM to run very large reports/statistical analysis
2) Load balancing/HPC-type solution for the app container

Based on my initial searching online I've found JESSICA2 [1], which seems
to solve the most difficult parts, but I'm certainly open to
ideas/suggestions on better approaches.  I'm also somewhat familiar with
tc (Terracotta), which will mostly do what I need (when active-active
support comes out later this year).  I do feel much more comfortable
merging the C code of JESSICA2 against the latest version of Kaffe
than trying to debug Java bytecode.  Something like JESSICA2 also
has long-term application maintenance benefits over tc.

----
I'm also interested to hear what others are doing for inexpensive/power
efficient nodes that have pci-e, but this is probably a future
discussion.
----

I just want to send a very big and warm thanks to all those that make
this list so great!


/Christopher


[1] http://i.cs.hku.hk/~clwang/projects/JESSICA2.html


From hahn at mcmaster.ca  Wed Feb 20 19:55:15 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 20 Feb 2008 22:55:15 -0500 (EST)
Subject: [Beowulf] Using LSF
In-Reply-To: <d02797620802200310w4a513492h171edb0e3843357c@mail.gmail.com>
References: <d02797620802200310w4a513492h171edb0e3843357c@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802202249001.23485@coffee.psychology.mcmaster.ca>

> submit the jobs through a job scheduler (LSF in this case). We used the
> machinefile option with mpirun to order the nodes on which the processes has
> to be started.
>
> But i am not able to do this with the current setup where LSF is used for
> scheduling and SLURM for resource management.
> I have tried a few of the options like using the -m options to bsub for
> specifying the preference and so on. But of no success.

this sounds like our HP-XC systems.  but I'm a bit mystified:
you can get the node assignment from LSF, and then use srun -m hostfile
to force slurm to set up the rank-node mappings as you like.
(note: not -m to LSF.)  did you try that?
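
The two steps described above (take the node assignment from LSF, then hand SLURM an ordered host list) could be sketched roughly as follows. This is only an illustration: it assumes the script runs inside an LSF job where the LSB_HOSTS environment variable holds the allocated hosts, one entry per slot, and the function names are hypothetical. The resulting lines would be written to the file named by SLURM_HOSTFILE.

```python
import os


def allocated_hosts(env=None):
    """Return the hosts LSF allocated to this job, one entry per slot.

    Assumes the space-separated LSB_HOSTS variable is set by LSF.
    """
    env = os.environ if env is None else env
    return env.get("LSB_HOSTS", "").split()


def hostfile_lines(allocated, rank_to_host):
    """Return hostfile lines so that line i names the host for rank i.

    Raises ValueError if a requested host has no free slot left in the
    allocation, since SLURM could not honour such a mapping.
    """
    remaining = list(allocated)
    lines = []
    for rank, host in enumerate(rank_to_host):
        if host not in remaining:
            raise ValueError(f"rank {rank}: no free slot on {host}")
        remaining.remove(host)  # consume one slot on that host
        lines.append(host)
    return lines


if __name__ == "__main__":
    # Hypothetical allocation and desired order, for illustration only.
    env = {"LSB_HOSTS": "n1 n2 n3 n4"}
    wanted = ["n2", "n1", "n4", "n3"]
    print("\n".join(hostfile_lines(allocated_hosts(env), wanted)))
```

Checking the requested order against the allocation first means a mismatch shows up as a clear error before srun is started.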


From balahindustani at gmail.com  Wed Feb 20 20:55:59 2008
From: balahindustani at gmail.com (bala)
Date: Thu, 21 Feb 2008 10:25:59 +0530
Subject: [Beowulf] Using LSF
In-Reply-To: <Pine.LNX.4.64.0802202249001.23485@coffee.psychology.mcmaster.ca>
References: <d02797620802200310w4a513492h171edb0e3843357c@mail.gmail.com>
	<Pine.LNX.4.64.0802202249001.23485@coffee.psychology.mcmaster.ca>
Message-ID: <d02797620802202055q78edf91fye3d2df3008fb1cd6@mail.gmail.com>

On 2/21/08, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> > submit the jobs through a job scheduler (LSF in this case). We used the
> > machinefile option with mpirun to order the nodes on which the processes has
> > to be started.
> >
> > But i am not able to do this with the current setup where LSF is used for
> > scheduling and SLURM for resource management.
> > I have tried a few of the options like using the -m options to bsub for
> > specifying the preference and so on. But of no success.
>
> this sounds like our HP-XC systems.  but I'm a bit mystified:
> you can get the node assignment from LSF, and then use srun -m hostfile
> to force slurm to set up the rank-node mappings as you like.
> (note: not -m to LSF.)  did you try that?
>

Yes, it is an HP-XC system, and I have tried using the -m option to srun as well.
This is what I tried, with a sample MPI program that prints each rank and its node:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];  /* large enough for any processor name */

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Get_processor_name(name, &len);     /* node this rank runs on */

    printf("This is %d out of %d: %s\n", rank, size, name);
    MPI_Finalize();

    return 0;
}

This was submitted to LSF using

bsub -n 4 -e errfile -ext "SLURM[nodelist=n2,n1,n4,n3]" \
    /opt/hpmpi/bin/mpirun -srun -m hostfile ./a.out

The environment variable SLURM_HOSTFILE was set to the hostfile listing the
nodes on which the binary had to run, in the order n2,n1,n4,n3.

I got the following error in my error file:

a.out: MPI_Init: node to rank map is not correct myrank :0 mynode:1
a.out: MPI_Init: node to rank map is not correct myrank :1 mynode:0
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
a.out: MPI_Init: node to rank map is not correct myrank :3 mynode:2
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
a.out: MPI_Init: Cannot set srun startup protocol
srun: error: n2: task0: Exited with exit code 1
a.out: MPI_Init: node to rank map is not correct myrank :2 mynode:3
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
srun: Terminating job


-- 
Best Regards,
Balamurugan. R

From mkinet at ulb.ac.be  Thu Feb 21 00:57:50 2008
From: mkinet at ulb.ac.be (Maxime Kinet)
Date: Thu, 21 Feb 2008 09:57:50 +0100
Subject: [Beowulf] need for an advice on nfs and diskless clients
In-Reply-To: <e4d4fd070802201324u782e11a5s55292b1570e09acf@mail.gmail.com>
References: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
	<e4d4fd070802201324u782e11a5s55292b1570e09acf@mail.gmail.com>
Message-ID: <1A8B4338-2DDF-4586-9619-BE172426B262@ulb.ac.be>

> Maxime,
> Would it be feasible to use "ln" to create symbolic links to
> populate /tftpboot/192.168.1.N, for each node N, from say
> /tftpboot/generic ?
I tried that trick but it didn't work. Apparently, symbolic links are not
enough to link files at boot time.

What I was wondering is: if the directories /tftpboot/192.168.1.N
contain sub-directories that are identical for every node, and that
are not supposed to be modified by the nodes, why not put them in a
directory /tftpboot/generic and make the nodes mount them from there?
The problem is that I have no idea which directories have to be local
and which could be shared...

> Peter


Thanks for the answer anyway.
------------------
Maxime Kinet
Université Libre de Bruxelles
Physique Statistique et Plasmas, CP 231
Campus Plaine - Boulevard du Triomphe,
1050 Bruxelles.


>
> On Wed, Feb 20, 2008 at 5:26 AM, Maxime Kinet <mkinet at ulb.ac.be>  
> wrote:
> Hi,
>
> I'm setting up  a cluster, made of diskless workstations, booting
> through NFS. For that purpose I created a copy of the node's
> filesystem in a /tftpboot directory. That is, I have a full
> filesytem for each single node : /tftpboot/192.168.1.1,
> /tftpboot/192.168.1.2, /tftpboot/192.168.1.3, etc.
>
> I know that there are several different options to make several  
> clients mounting their filesystems, and that this one might not be  
> the most efficient one, but I choosed that one just because I knew  
> already how to do it. Now everything works fine, except that I would  
> like to spare some diskspace on the master partition. Since each  
> filesystem is around 500 Mb, I quickly reach more than 10Gb just for  
> the /tftpboot directory.
>
> A quick solution I founded is to mount the largests directory (which  
> are the same for each node) of the / (namely /lib and /var) from a  
> separate common directory. Could this cause some problem in the way  
> the node's are working? Is there any other easy/fast way to reduce  
> the amount of data contained in the /tftpboot/192.168.1.* directories?
>
> I also heard about ClusterNFS which would do exactly what I need,  
> but then I would have to start everything from the beggining and I  
> want everything to work as soon as possible.
>
> Thanks a lot for answers/advices,
>
> ------------------
> Maxime Kinet
> Université Libre de Bruxelles
> Physique Statistique et Plasmas, CP 231
> Campus Plaine - Boulevard du Triomphe,
> 1050 Bruxelles.


From forum.san at gmail.com  Fri Feb 22 00:23:04 2008
From: forum.san at gmail.com (Sangamesh B)
Date: Fri, 22 Feb 2008 13:53:04 +0530
Subject: [Beowulf] python2.4 error when loose MPICH2 TI with Grid Engine
Message-ID: <cb60cbc40802220023nda70946u214967ce5c74a8cd@mail.gmail.com>

Dear Reuti & members of beowulf,

I need to execute a parallel job through Grid Engine.

MPICH2 is installed with the mpd process manager.

I added a parallel environment, MPICH2, to SGE:

$ qconf -sp MPICH2
pe_name           MPICH2
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /share/apps/MPICH2/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /share/apps/MPICH2/stopmpi.sh
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min


I added this PE to the default queue, all.q.

mpdboot has been run, and mpd daemons are running on two nodes.

The script for submitting this job through SGE is:

$ cat subsamplempi.sh
#!/bin/bash

#$ -S /bin/bash

#$ -cwd

#$ -N Samplejob

#$ -q all.q

#$ -pe MPICH2 4

#$ -e ERR_$JOB_NAME.$JOB_ID

#$ -o OUT_$JOB_NAME.$JOB_ID

date

hostname

/opt/MPI_LIBS/MPICH2-GNU/bin/mpirun -np $NSLOTS -machinefile \
    $TMP_DIR/machines ./samplempi

echo "Executed"

exit 0


The job gets submitted but does not execute. The error and output files
contain:

$ cat ERR_Samplejob.192
/usr/bin/env: python2.4: No such file or directory

$ cat OUT_Samplejob.192
-catch_rsh
/opt/gridengine/default/spool/compute-0-0/active_jobs/192.1/pe_hostfile
compute-0-0
compute-0-0
compute-0-0
compute-0-0
Fri Feb 22 12:57:18 IST 2008
compute-0-0.local
Executed

So the problem comes from python2.4.

$ which python2.4
/opt/rocks/bin/python2.4

I googled this error, then created a symbolic link:

# ln -sf /opt/rocks/bin/python2.4 /bin/python2.4

After this, the same error still occurs.

I guess the problem might be something else, i.e. Grid Engine might not be
getting the link to the running mpd.

The procedure I followed to configure the PE might also be wrong.

So I would appreciate it if you could clear up my doubts and help me resolve this error.

1. Is the PE configuration for MPICH2 + Grid Engine right?

2. Without tight integration, is there a way to run an MPICH2 (mpd) based job
using Grid Engine?

3. Of smpd daemon-based and daemonless MPICH2 tight integration, which one
is better?

4. Can we do MVAPICH2 tight integration with SGE? Are there any differences in
process managers with respect to MVAPICH2?

Thanks & Best Regards,
Sangamesh B

From jpkosky at sps.aero  Fri Feb 22 07:50:18 2008
From: jpkosky at sps.aero (John P. Kosky, PhD)
Date: Fri, 22 Feb 2008 10:50:18 -0500
Subject: [Beowulf] Three questions on a new Beowulf Cluster
Message-ID: <47BEEF3A.9020806@sps.aero>

My company is taking its first foray into the world of HPC with an 
expandable architecture, 16 processor (comprised of quad core Opterons), 
one header node cluster using Infiniband interconnects. OS has 
tentatively been selected as SUSE 64-bit Linux. The principal purpose of 
the cluster is as a tool for spacecraft and propulsion design support. 
The cluster will therefore be running the most recent versions of 
commercially available software - initially for FEA and CFD using COMSOL 
Multiphysics and associated packages, NASTRAN, MatLab modules, as well 
as an internally modified and expanded commercial code for materials 
properties prediction, with emphasis on polymer modeling (Accelrys 
Materials Studio). Since we will be repetitively running standard 
modeling codes on this system, we are trying to make the system as user 
friendly as possible... most of our scientists and engineers want to use 
this as a tool, and not have to become cluster experts. The company WILL 
be hiring an IT Sys Admin with good cluster experience to support the 
system, however...

Question 1:
Does anyone here know of any issues that have arisen running the
above-named commercial packages on clusters using InfiniBand?

Question 2:
As far as the MPI for the system is concerned, for the system and
application requirements described above, would OpenMPI or MVAPICH be
better for managing node usage?

ANY help or advice would be greatly appreciated.

Thanks in advance

John

John P. Kosky, PhD
Director of Technical Development
Space Propulsion Systems


From landman at scalableinformatics.com  Sat Feb 23 19:45:24 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Sat, 23 Feb 2008 22:45:24 -0500
Subject: [Beowulf] Three questions on a new Beowulf Cluster
In-Reply-To: <47BEEF3A.9020806@sps.aero>
References: <47BEEF3A.9020806@sps.aero>
Message-ID: <47C0E854.1040300@scalableinformatics.com>

John P. Kosky, PhD wrote:
> My company is taking it's first foray into the world of HPC with an 
> expandable architecture, 16 processor (comprised of quad core Opterons), 
> one header node cluster using Infiniband interconnects. OS has 
> tentatively been selected as SUSE 64-bit Linux. The principal purpose of 
> the cluster is as a tool for spacecraft and propulsion design support. 
> The cluster will therefore be running the most recent versions of 
> commercially available software - initially for FEA and CFD using COMSOL 
> Multiphysics and associated packages, NASTRAN, MatLab modules, as well 
> as an internally modified and expanded commercial code for materials 
> properties prediction,with emphasis on polymer modeling (Accelrys 
> Materials Studio). Since we will be repetitively running standard 
> modeling codes on this system, we are trying to make the system as user 
> friendly as possible... most of our scientists and engineers want to use 

Could you elaborate on this a little?  Do you want your users not to use 
command lines to submit jobs, but web interfaces instead?  Or are 
command lines ok?  This is usually cited as what people mean by "user 
friendly".

> this as a tool, and not have to become cluster experts. The company WILL 
> be hiring an IT Sys Admin with good cluster experience to support the 
> system, however...
> 
> Question 1:
> 1) Does anyone here know of any issues that have arisen running the 
> above named commercial packages on clusters using infiniband?

Not all of them use the exact same version of MPI stack.  We have 
customers running similar mixes (Dyna, NASTRAN, Accelrys, ...), and the 
stacks vary somewhat.  Try to use a similar MPI stack throughout (quite 
a few CAE codes will use HP MPI or Intel MPI).  This may save you some 
grief.

In both those cases, the MPI stack is pretty smart about linking to the 
Infiniband, though make sure that the MPI stack will talk to the correct 
library in your IB stack (DAPL or verbs or ...).

We normally use OFED for our customers' IB efforts.  Some of the MPI 
stacks are compiled against vendor specific versions of IB stacks.


> Question 2:
> 2) As far as the MPI for the system is concerned, for the system and 
> application requirements described above, would OpenMPI or MvApich be 
> better for managing node usage?

The applications dictate what they are compiled against for MPI library 
usage.  MPI stacks are not ABI compatible, you cannot run mvapich 
binaries with an OpenMPI stack.

Moreover, MPI does not aid in the management of nodes.  There are other 
packages for that.  Some of the better ones for system management are 
Perceus, Rocks, and a few others.  Since you are using SuSE, Rocks is 
out.  We have built SuSE based clusters for quite a few customers, 
though most of the major cluster packages really don't support it that 
well.  Currently working on a diskless SuSE system.  Have it working, 
though still a bit more work to do on other elements.  The diskful 
system works quite well, and is effectively automatic at this point.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From csamuel at vpac.org  Sat Feb 23 19:55:58 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Sun, 24 Feb 2008 14:55:58 +1100 (EST)
Subject: [Beowulf] VMC - Virtual Machine Console
In-Reply-To: <47906A09.2040908@aplpi.com>
Message-ID: <14877064.261203769909878.JavaMail.csamuel@ubuntu>


----- "stephen mulcahy" <smulcahy at aplpi.com> wrote:

> As a further aside, some MPI libraries (OpenMPI comes to mind) seem to
> make some efforts to keep processes on the same cores also (or can be
> instructed to via a run-time option).

At one point MVAPICH2 would enable this by default
on AMD Opteron - but always started from core 0 when
binding processes.

So if you had an 8 core box and 2 x 4 CPU jobs were
run on it they'd both end up fighting over cores 0-3
whilst cores 4-7 sat in a corner twiddling their
thumbs and wondering why they had no work to do. :-(
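
The effect described above can be reproduced with a toy model. The sketch below is not MVAPICH2 code; it simply compares a binding policy that always starts at core 0 with a hypothetical one that skips cores already in use, and counts how many processes end up on each core of an 8-core box.

```python
def bind_from_zero(nprocs):
    """Cores chosen when every job starts binding at core 0."""
    return list(range(nprocs))


def bind_skipping_busy(nprocs, busy, ncores):
    """Cores chosen when cores already used by other jobs are skipped."""
    free = [c for c in range(ncores) if c not in busy]
    if len(free) < nprocs:
        raise RuntimeError("not enough free cores")
    return free[:nprocs]


def load_per_core(jobs, ncores):
    """Number of bound processes on each core, given each job's core list."""
    load = [0] * ncores
    for cores in jobs:
        for c in cores:
            load[c] += 1
    return load


if __name__ == "__main__":
    ncores = 8
    # Two 4-process jobs, both bound starting at core 0.
    naive = [bind_from_zero(4), bind_from_zero(4)]
    print("start at 0:", load_per_core(naive, ncores))

    # Same two jobs, second one skips the cores the first one took.
    first = bind_skipping_busy(4, set(), ncores)
    second = bind_skipping_busy(4, set(first), ncores)
    print("skip busy: ", load_per_core([first, second], ncores))
```

With four 2-process jobs the naive policy stacks four processes on each of cores 0 and 1, which is the "even worse" case mentioned below.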

It got even worse if you launched 4 x 2 CPU jobs..

-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


From csamuel at vpac.org  Sat Feb 23 19:56:01 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Sun, 24 Feb 2008 14:56:01 +1100 (EST)
Subject: [Beowulf] VMC - Virtual Machine Console
In-Reply-To: <73a01bf20801201920q3ac7b647q335e1cd13bec4cdb@mail.gmail.com>
Message-ID: <7186353.241203769702934.JavaMail.csamuel@ubuntu>


----- "Rayson Ho" <raysonlogin at gmail.com> wrote:

> I am working on adding processor affinity support for serial and
> parallel jobs for Grid Engine, and I am working with the OpenMPI
> developers to define an interface.

FWIW the Torque approach (currently in trunk in SVN) is to
not use cpu affinity but instead use the cpuset support
in most modern Linux kernels.

So once you've got /dev/cpuset created and have mounted the
VFS with "mount -t cpuset - /dev/cpuset" the new pbs_mom
will automatically create (if it doesn't already exist)
a "torque" cpuset with all the CPUs in it.

It then creates job cpusets beneath that for each job
and a "vnode" (aka per-process) cpuset for each process
created.

So, on an 8 core box running a 4 CPU MPI job you'd
end up with:

/dev/cpuset/torque (8 cores)
/dev/cpuset/torque/1.cluster-m.foo.edu/ (4 cores)
/dev/cpuset/torque/1.cluster-m.foo.edu/1/ (1 core)
/dev/cpuset/torque/1.cluster-m.foo.edu/2/ (1 core)
/dev/cpuset/torque/1.cluster-m.foo.edu/3/ (1 core)
/dev/cpuset/torque/1.cluster-m.foo.edu/4/ (1 core)

SMP processes would end up in the job set whereas
processes launched via PBS's TM API would end up
in their appropriate vnode set.

So if a user launches what they think is a single
CPU serial job that actually turns out to be a code
that detects how many cores are in a system and then
uses all of them it will no longer affect other users'
code on the system - their job will just take a hammering
instead! :-)
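
The directory layout listed above can be generated mechanically, which may help when checking what pbs_mom has created on a node. This is a minimal sketch only: the function name is hypothetical, and which physical cores a job receives is an assumption made for the example, not something stated in the description.

```python
def cpuset_layout(job_id, job_cores, all_cores, root="/dev/cpuset/torque"):
    """Return (path, cores) pairs for the torque, job and per-process cpusets."""
    layout = [
        (root, list(all_cores)),                  # top-level "torque" set
        (f"{root}/{job_id}/", list(job_cores)),   # one set per job
    ]
    # One "vnode" set per process, numbered from 1, each holding one core.
    for vnode, core in enumerate(job_cores, start=1):
        layout.append((f"{root}/{job_id}/{vnode}/", [core]))
    return layout


if __name__ == "__main__":
    # Assumed for illustration: the job was given cores 0-3 of an 8 core box.
    for path, cores in cpuset_layout("1.cluster-m.foo.edu", [0, 1, 2, 3], range(8)):
        noun = "core" if len(cores) == 1 else "cores"
        print(f"{path} ({len(cores)} {noun})")
```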

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


From forum.san at gmail.com  Sun Feb 24 02:53:02 2008
From: forum.san at gmail.com (Sangamesh B)
Date: Sun, 24 Feb 2008 16:23:02 +0530
Subject: [Beowulf] Implementation thru Sun Grid Engine
Message-ID: <cb60cbc40802240253o3cd5a6a6hc45312bdc33d55f7@mail.gmail.com>

Hi All,

  The following are the requirements, which should be implemented through
Sun Grid Engine:

1.    Q1 - One queue with all the cores (e.g, 2 nodes, 8 cores)
2.    Q2 & Q3 - 2 more queues under the large queue (6 cores & 2 cores)
3.    User1 & User2 allowed to submit jobs to Q2
4.    User3 & User4 allowed to submit jobs to Q3 only
5.    Also, User1 is allowed to submit jobs to Q1. (Burstable)
6.    Can an application be allowed to run only through Q2?
7.    Can an application be allowed to run through Q3 & Q1?

The cluster has 2 systems (1 master + 1 node). Each has two dual-core
processors, so 4 cores each and 8 cores in total.

Below I explain what I've done and what I haven't.

Created queues:

q1: Hosts=2 Master+Node slots=4
q2: Hosts=2 Master+Node slots=3
q3: Hosts=1 Node slots=1

Users:
user1
user2
user3
user4

Usersets:
userset12
userset34

Also,
user1 & user2 belongs to userset12
user3 & user4 belongs to userset34


1.    Q1 - One queue with all the cores (e.g, 2 nodes, 8 cores)

    Created queue q1, with 2 nodes, slots=4

2.    Q2 & Q3 - 2 more queues under the large queue (6 cores & 2 cores)

    Mentioned above.
    How can I make these two queues fall under q1, the large queue?

    May I know what subordinate queues are?

    If we make q2 and q3 subordinate to q1 (q2 with 6 cores & q3 with 2 cores),
does it meet our requirement?

   If not, is it possible to do it some other way?


3.    user1 & user2 allowed to submit job to Q2

  I've given userset12 access to q2. With this, user1 and user2 can submit
jobs to q2.

   I've also set userset34 as the xuserset.

Now if user1 or user2 submits a parallel job of 6 MPI processes, will it take 4
cores from the master and 2 cores from the node?

I tested it. The 6-process job did not get executed, but a 3-process job did.

The error is:
...
....

parallel environment:  mpiq2 range: 6
scheduling info:            has no permission for queue "all.q@compute-0-0.local"
                            cannot run in queue instance "q1@compute-0-0.local" because it is not contained in its hard queue list (-q)
                            has no permission for queue "q3@compute-0-0.local"
                            has no permission for queue "san.q@compute-0-0.local"
                            has no permission for queue "all.q@locuzcluster.local"
                            cannot run in queue instance "q1@locuzcluster.local" because it is not contained in its hard queue list (-q)
                            has no permission for queue "san.q@locuzcluster.local"
                            cannot run in PE "mpiq2" because it only offers 0 slots

That is, the job does not run when there are more than 3 MPI processes.

May I know what went wrong here?


In the case of serial jobs it's working. If user1/user2 submits 6 serial jobs, 3
run on the master and 3 on the compute node. If a 7th job is submitted,
it is queued and starts to run when a slot becomes free.

So the problem with parallel jobs has to be resolved.
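
As a rough illustration of the slot arithmetic behind the 4 + 2 question
under item 3, here is a minimal Python sketch of a $fill_up-style allocation.
It is a simplified model, not Grid Engine's scheduler: the function name
fill_up and the host/slot dictionary are made up for illustration, and it
ignores access lists entirely (the permission messages above are a separate
matter). It assumes q2 offers 3 slots on each of the two hosts, as configured
above.

```python
def fill_up(slots_per_host, requested):
    """Simplified model of a fill-up allocation rule.

    Takes every free slot on the first host before moving to the next.
    Returns {host: slots_granted}, or None if the request cannot fit.
    This is an illustrative model, not Grid Engine's actual scheduler.
    """
    allocation = {}
    remaining = requested
    for host, free in slots_per_host.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[host] = take
            remaining -= take
    return allocation if remaining == 0 else None


# q2 as described above: 3 slots on each of the two hosts (assumed layout).
q2 = {"locuzcluster.local": 3, "compute-0-0.local": 3}

six = fill_up(q2, 6)    # split across both hosts, capped at 3 per host
seven = fill_up(q2, 7)  # does not fit: q2 has only 6 slots in total
print(six, seven)
```

Under this model a 6-process job would be split 3 + 3 rather than 4 + 2,
because a queue instance cannot hand out more than its configured slots on
one host.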

4.    User3 & User4 allowed to submit job to Q3 only
         Gave access to userset34 and set xuserset=userset12

     If user3 or user4 submits a 2-process job, the job gets submitted but
doesn't execute. The error is:

parallel environment:  mpiq34 range: 2
scheduling info:            has no permission for host "locuzcluster.local"
                            has no permission for host "compute-0-0.local"
                            cannot run in PE "mpiq34" because it only offers 0 slots


The PE config is as follows:

$ qconf -sp mpiq34
pe_name           mpiq34
slots             2
user_lists        userset34
xuser_lists       userset12
start_proc_args   /share/apps/MPICH2/startmpi.sh  -catch_rsh  $pe_hostfile
stop_proc_args    /share/apps/MPICH2/stopmpi.sh
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task TRUE
urgency_slots     min

I don't see how to resolve this issue. There might be something wrong in
the settings.


5.    Also User1 is allowed to submit job in Q1. (Burstable)

   For this I've added user1 to the owner's list and userset34 as xuserset.

6.    Can Application be only allowed to run through Q2?
7.    Can Application be allowed to run through Q3 & Q1?

  These two still have to be implemented.


Can anyone help me resolve the above-mentioned issues?


Thanks,
Sangamesh

From hahn at mcmaster.ca  Sun Feb 24 22:11:03 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 25 Feb 2008 01:11:03 -0500 (EST)
Subject: [Beowulf] Implementation thru Sun Grid Engine
In-Reply-To: <cb60cbc40802240253o3cd5a6a6hc45312bdc33d55f7@mail.gmail.com>
References: <cb60cbc40802240253o3cd5a6a6hc45312bdc33d55f7@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0802250110230.20963@coffee.psychology.mcmaster.ca>

> Can anyone help me out to resolve above mentioned issues?

isn't there a better list than this for an utterly SGE-specific question?


From hahn at mcmaster.ca  Sun Feb 24 22:16:54 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 25 Feb 2008 01:16:54 -0500 (EST)
Subject: [Beowulf] need for an advice on nfs and diskless clients
In-Reply-To: <1A8B4338-2DDF-4586-9619-BE172426B262@ulb.ac.be>
References: <93756E65-EAFF-497D-9F17-1ABB095357D7@ulb.ac.be>
	<e4d4fd070802201324u782e11a5s55292b1570e09acf@mail.gmail.com>
	<1A8B4338-2DDF-4586-9619-BE172426B262@ulb.ac.be>
Message-ID: <Pine.LNX.4.64.0802250111290.786@coffee.psychology.mcmaster.ca>

> /tftpboot/generic and make the node mount it from there? The problem is I 
> have no idea of which directories have to be local, and which could be 
> shared...

depends.  you can definitely get by with only a few files in /var
being node-specific, since, for instance, compute nodes can all get
their specific IP configuration via dhcp (and clusters normally have 
compute nodes identically configured.)


From landman at scalableinformatics.com  Sun Feb 24 22:30:54 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Mon, 25 Feb 2008 01:30:54 -0500
Subject: [Beowulf] Implementation thru Sun Grid Engine
In-Reply-To: <Pine.LNX.4.64.0802250110230.20963@coffee.psychology.mcmaster.ca>
References: <cb60cbc40802240253o3cd5a6a6hc45312bdc33d55f7@mail.gmail.com>
	<Pine.LNX.4.64.0802250110230.20963@coffee.psychology.mcmaster.ca>
Message-ID: <47C2609E.50307@scalableinformatics.com>

Mark Hahn wrote:
>> Can anyone help me out to resolve above mentioned issues?
> 
> isn't there a better list than this for an utterly SGE-specific question?

users at gridengine.sunsource.net

SGE does have its own lists.  And wikis and howtos, etc. 
http://gridengine.sunsource.net .  You might find some familiar 
faces/names there ...

Chris D on this list is a guru for GE.  He runs the wiki.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From csamuel at vpac.org  Mon Feb 25 00:36:31 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Mon, 25 Feb 2008 19:36:31 +1100 (EST)
Subject: [Beowulf] Re: Time limits in queues
In-Reply-To: <478F85A7.9040600@noaa.gov>
Message-ID: <13705010.01203867729049.JavaMail.csamuel@ubuntu>


----- "Craig Tierney" <Craig.Tierney at noaa.gov> wrote:

/* Just catching up on list email on a plane */

> First of all I agree that it is always a YMMV case.  We are good about that
> here (the list).

Indeed.

> My point was, that in every instance that I have seen, multi-day
> queue limits are not the norm.  Those places do have exceptions for
> particular codes and particular projects.

FWIW as another data point we limit our 600+ users to 3 months for
their job walltimes in Torque. 

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


From csamuel at vpac.org  Mon Feb 25 00:36:27 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Mon, 25 Feb 2008 19:36:27 +1100 (EST)
Subject: [Beowulf] fast disks for fluent
In-Reply-To: <f8b14cb80801230702y3fe4a9baw485c5b1d972ecc8e@mail.gmail.com>
Message-ID: <15954272.21203868842137.JavaMail.csamuel@ubuntu>


----- "andrew holway" <andrew at moonet.co.uk> wrote:

> We were thinking of using tmpfs to make fast scratch disks for
> fluent. Has anyone done this or any other method to ease the disk
> bottleneck?

We run with /tmp on a software RAID using either JFS (across
2 drives on our older SLES9 Power5 cluster) or XFS (across
4 drives on our newer CentOS 5 AMD Opteron cluster).

We're running mainline kernels under CentOS.

-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


From diep at xs4all.nl  Mon Feb 25 10:16:38 2008
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Mon, 25 Feb 2008 19:16:38 +0100
Subject: [Beowulf] Opinions of  Hyper-threading?
In-Reply-To: <47B32FB1.60905@berkeley.edu>
References: <47B32FB1.60905@berkeley.edu>
Message-ID: <67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>

Let's suppose you've inherited 3 GHz dual Xeon nodes and that the
power costs get paid anyway.

The choice is then between:

without hyperthreading you've got
    2 cores @ 3 GHz

with hyperthreading you've got, if you're lucky:
   2 cores @ 3 GHz which can split themselves into
   4 cores @ 1.6 GHz

If you run 2 processes on each node, there are 4 logical cores
{A.1,A.2,B.1,B.2}.
So from the scheduler's point of view there are a number of possibilities.

We can compress those possibilities.

{A.1,A.2}    2 x 1.6 GHz
{A.1,B.1}    2 x 3 GHz
{A.1,B.2}    2 x 3 GHz

So the odds are roughly 33% that you end up getting dicked, as in 33% of
the cases your total throughput is 3.2 GHz instead of 6.0 GHz.
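
The 33% figure can be checked by brute force. The sketch below enumerates
every way of placing 2 processes on the 4 logical CPUs and counts how many
land on the same physical core. It assumes the scheduler picks a placement
uniformly at random (a simplification; a hyperthreading-aware scheduler would
try to avoid the shared case), and it reuses the 3 GHz / 1.6 GHz figures from
the estimate above. All variable names are illustrative.

```python
from itertools import combinations

# Logical CPUs on a dual-socket node with hyperthreading on:
# (physical core, hardware thread), matching {A.1, A.2, B.1, B.2}.
logical_cpus = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]

# Effective clock figures taken from the estimate above (GHz).
FULL_CORE = 3.0      # a process that has a physical core to itself
SHARED_THREAD = 1.6  # a process sharing its core with the other one

# Every unordered pair of logical CPUs the two processes could occupy.
placements = list(combinations(logical_cpus, 2))

# Placements where both processes sit on the same physical core.
shared = [p for p in placements if p[0][0] == p[1][0]]

p_shared = len(shared) / len(placements)
expected = (p_shared * 2 * SHARED_THREAD
            + (1 - p_shared) * 2 * FULL_CORE)

print(f"{len(shared)} of {len(placements)} placements share a core")
print(f"P(shared core) = {p_shared:.2f}")
print(f"expected total throughput = {expected:.2f} GHz")
```

Two of the six possible placements put both processes on one physical core,
which is where the one-in-three odds come from.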

Seymour Cray's principle comes to mind.

Now there seems to exist software on planet earth that just needs a
lot of throughput.

Like the LL/LLR type software, provided that the FFT size isn't too big.
You schedule 4 processes and it wins 5% in throughput compared to 2
Xeons.
Not the predicted 20% nor 30%, but 5%.

Hip hip hooray, Seymour Cray's principle refuted.

So software that just needs throughput might run faster with
hyperthreading under specific circumstances.

That's however very risky.

Therefore, most likely, you want to turn off hyperthreading in hardware.

Vincent

p.s. it's nice if someone else pays your power bill, isn't it?

On Feb 13, 2008, at 6:58 PM, Jon Forrest wrote:

> I inherited a cluster containing a bunch
> of Xeon-based compute nodes. The compute
> nodes were configured with hyper-threading
> turned on. I'm wondering what you HPC cluster
> people think of hyper-threading. I haven't
> heard much about it recently since most
> modern processors are true multi-core.
>
> The main thing I'd like to know is whether
> hyper-threading can do any harm when cpu
> bound jobs are run.
>
> Cordially,
> -- 
> Jon Forrest
> Research Computing Support
> College of Chemistry
> 173 Tan Hall
> University of California Berkeley
> Berkeley, CA
> 94720-1460
> 510-643-1032
> jlforrest at berkeley.edu


From geoff at galitz.org  Mon Feb 25 11:40:08 2008
From: geoff at galitz.org (Geoff Galitz)
Date: Mon, 25 Feb 2008 11:40:08 -0800 (PST)
Subject: [Beowulf] Opinions of  Hyper-threading?
In-Reply-To: <67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
Message-ID: <19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>


As a matter of habit, I usually disable hyper-threading whenever I run
across it.  However... due disclosure;  Jon works in my old facility and
may very well be referring to something I installed many moons ago.

There are lots of reasons to disable HT, but one of the biggies is
resource contention.  Assuming HT is enabled to run multiple jobs (rather
than restricting the system to only run multiple peers or threads of the
same job), those jobs will likely be contending for disk and network I/O.

Really, this has been said in this thread already, but some clusters are
more network-reliant than others, and it bears repeating. I always try to
configure scratch space locally on the node, but I often stored permanent
data one hop away via NFS.  NFS is not the most elegant of protocols in
application. And consider that a node with its NFS traffic may also be
running MPI jobs.

Another thing to consider is that while HT certainly still exists in the
wild, no new systems implement it, hence limited testing.  It is quite
possible that a regression introduced into the Linux kernel at a later
time could break HT support upon installation of that new kernel.


Just my two cents.

-geoff




------------------------------
Geoff Galitz
Blankenheim, DE
http://www.galitz.org


From peter.st.john at gmail.com  Mon Feb 25 11:58:25 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Mon, 25 Feb 2008 14:58:25 -0500
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
Message-ID: <e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>

Just for a clarification (for those of us who are hardware deficient),
hyperthreading (the Intel mechanism) isn't actually simultaneous, correct? I
had assumed so but I appear to be confused about it. Hyperthreading keeps a
thread ready to take advantage of stalls in a preceding thread, but doesn't
ever actually perform a second instruction in one clock tick, correct? One
might think that there are so many pathways on a modern chip that collisions
could be managed among several simultaneous threads.
Thanks,
Peter

From bill at cse.ucdavis.edu  Mon Feb 25 12:12:18 2008
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Mon, 25 Feb 2008 12:12:18 -0800
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
References: <47B32FB1.60905@berkeley.edu>	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
Message-ID: <47C32122.1090505@cse.ucdavis.edu>

I believe it's actually simultaneous, instructions from 2 different processes 
can run in the same cycle against 2 different register files.

Other chips have vertical multithreading where only 1 process runs in any 
given cycle.

Peter St. John wrote:
> Just for a clarification (for those of us who are hardware deficient),
> hyperthreading (the Intel mechanism) isn't actually simultaneous, correct? I
> had assumed so but I appear to be confused about it. Hyperthreading keeps a
> thread ready to take advantage of stalls in a preceding thread, but doesn't
> ever actually perform a second instruction in one clock tick, correct? One
> might think that there are so many pathways on a modern chip that collisions
> could be managed among several simultaneous threads.
> Thanks,
> Peter


From gdjacobs at gmail.com  Mon Feb 25 12:47:58 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Mon, 25 Feb 2008 14:47:58 -0600
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C32122.1090505@cse.ucdavis.edu>
References: <47B32FB1.60905@berkeley.edu>	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu>
Message-ID: <47C3297E.3060200@gmail.com>

Bill Broadley wrote:
> I believe it's actually simultaneous, instructions from 2 different
> processes can run in the same cycle against 2 different register files.
> 
> Other chips have vertical multithreading where only 1 process runs in
> any given cycle.

Multiple threads can reside in any particular stage of execution at any
particular time. If one thread stalls, the processor can proceed with
execution on the other thread.

Somewhat outside the bounds of this list, but would an in order
processor like Power 6 derive more benefit from SMT?

-- 
Geoffrey D. Jacobs


From hahn at mcmaster.ca  Mon Feb 25 13:03:13 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Mon, 25 Feb 2008 16:03:13 -0500 (EST)
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C32122.1090505@cse.ucdavis.edu>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu>
Message-ID: <Pine.LNX.4.64.0802251542280.17613@coffee.psychology.mcmaster.ca>

> I believe it's actually simultaneous, instructions from 2 different processes 
> can run in the same cycle against 2 different register files.

for some definition of 'simultaneous'.  I suspect that netburst-HT simply
runs with a thread until it stalls, then switches.  I don't think Intel 
ever detailed which stalling events do this.  in some of the initial papers
on netburst-HT, it was implied that the implementation was almost a
side-effect of how the chip tracks in-flight operations.  since no modern 
chip really has a unitary pipeline, HT might well tolerate one thread
chugging through a microcoded transcendental at the same time as another,
say, follows a pointer.

>> had assumed so but I appear to be confused about it. Hyperthreading keeps a
>> thread ready to take advantage of stalls in a preceding thread, but doesn't
>> ever actually perform a second instruction in one clock tick, correct? One

I believe HT switching does happen cycle-by-cycle, and would guess that 
in-flight ops from multiple threads can coexist (not executing on the same 
unit in the same cycle, though.)

to me, this makes a lot more sense than manycore chips, actually.
manycore basically assumes that tracking inflight ops is the main scaling 
problem with modern chips.  that may be the case, but I've never really 
heard it described as such.

imagine if, instead of 8 cores onchip, you just had 8 "thread sequence"
units that contained fetch/decode, architected registers and retirement.
and a single big pool of scoreboarded functional units, of course.  the 
advantage being that one thread could use many units.  as opposed to a 
static 8-core where each thread gets only the unit(s) in its core...

I think the main takehome from netburst-HT is that SMT needs to provide 
more units, not just provide a new way for two threads to interfere.

regards, mark hahn.


From jcownie at cantab.net  Mon Feb 25 14:04:17 2008
From: jcownie at cantab.net (James Cownie)
Date: Mon, 25 Feb 2008 22:04:17 +0000
Subject: [Beowulf] Opinions of  Hyper-threading?
In-Reply-To: <19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
Message-ID: <F15C9A00-F020-446E-9C99-D6508BF0CBFB@cantab.net>


On 25 Feb 2008, at 19:40, Geoff Galitz wrote:

> Another thing to consider is that while HT certainly still exists  
> in the
> wild, no new systems implement it, hence limited testing.

However, if you believe The Inquirer, it may be coming back.

"Nehalem is projected as arriving late, supporting two, four and  
eight cores, representing four, eight and 16 threads. "

http://www.theinquirer.net/gb/inquirer/news/2008/02/24/intel-csi-nehalem-dunnington

(Disclaimer: I work for Intel, but The Inquirer appears to know much  
more than me about this :-) (not hard, since I know nothing))
--
-- Jim
--
James Cownie <jcownie at cantab.net>



From andrew at moonet.co.uk  Tue Feb 26 07:03:02 2008
From: andrew at moonet.co.uk (andrew holway)
Date: Tue, 26 Feb 2008 15:03:02 +0000
Subject: [Beowulf] cell processors
Message-ID: <f8b14cb80802260703i7dda1dadyadd6344e6826d7d7@mail.gmail.com>

Has anyone got any up and running? What are you doing with them etc?

Cheers

Andrew


From landman at scalableinformatics.com  Tue Feb 26 07:36:06 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 26 Feb 2008 10:36:06 -0500
Subject: [Beowulf] cell processors
In-Reply-To: <f8b14cb80802260703i7dda1dadyadd6344e6826d7d7@mail.gmail.com>
References: <f8b14cb80802260703i7dda1dadyadd6344e6826d7d7@mail.gmail.com>
Message-ID: <47C431E6.6040703@scalableinformatics.com>

andrew holway wrote:
> Has anyone got any up and running? What are you doing with them etc?

For computing ... ? :)  Oh, their other use ...

Various bio and chem apps are being ported.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From andrew at moonet.co.uk  Tue Feb 26 07:48:57 2008
From: andrew at moonet.co.uk (andrew holway)
Date: Tue, 26 Feb 2008 15:48:57 +0000
Subject: [Beowulf] cell processors
In-Reply-To: <47C431E6.6040703@scalableinformatics.com>
References: <f8b14cb80802260703i7dda1dadyadd6344e6826d7d7@mail.gmail.com>
	<47C431E6.6040703@scalableinformatics.com>
Message-ID: <f8b14cb80802260748q41c076bbgf720f76cedfbb04a@mail.gmail.com>

mmm, I was interested in getting some bio benchmarks. See how they weigh up.


On Tue, Feb 26, 2008 at 3:36 PM, Joe Landman
<landman at scalableinformatics.com> wrote:
> andrew holway wrote:
>  > Has anyone got any up and running? What are you doing with them etc?
>
>  For computing ... ? :)  Oh, their other use ...
>
>  Various bio and chem apps are being ported.


From libo at buaa.edu.cn  Tue Feb 26 07:48:31 2008
From: libo at buaa.edu.cn (Li, Bo)
Date: Tue, 26 Feb 2008 23:48:31 +0800
Subject: [Beowulf] cell processors
References: <f8b14cb80802260703i7dda1dadyadd6344e6826d7d7@mail.gmail.com>
Message-ID: <000801c8788f$0ee08230$6300a8c0@JSIIBM>

Cell is a great processor for HPC in most areas if you plan the data transfers well. 
Generally, a DP version of Cell can show about 90 GFlops in DP Linpack, and the performance depends greatly on the tuning.
Regards,
Li, Bo
----- Original Message ----- 
From: "andrew holway" <andrew at moonet.co.uk>
To: "Beowulf Mailing List" <beowulf at beowulf.org>
Sent: Tuesday, February 26, 2008 11:03 PM
Subject: [Beowulf] cell processors


> Has anyone got any up and running? What are you doing with them etc?
> 
> Cheers
> 
> Andrew


From peter.st.john at gmail.com  Tue Feb 26 07:50:20 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Tue, 26 Feb 2008 10:50:20 -0500
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <F15C9A00-F020-446E-9C99-D6508BF0CBFB@cantab.net>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<F15C9A00-F020-446E-9C99-D6508BF0CBFB@cantab.net>
Message-ID: <e4d4fd070802260750w724db1f4j9cb10820af11881c@mail.gmail.com>

Slashdot points to Daily Tech with the headline "Sun leaks 6 core...Nehalem
Details" at
http://www.dailytech.com/Sun%20Leaks%206core%20Intel%20Xeon%20Nehalem%20Details/article10834.htm
Peter

On Mon, Feb 25, 2008 at 5:04 PM, James Cownie <jcownie at cantab.net> wrote:

>
>  On 25 Feb 2008, at 19:40, Geoff Galitz wrote:
>
>  Another thing to consider is that while HT certainly still exists in the
>  wild, no new systems implement it, hence limited testing.
>
> However, if you believe The Inquirer, it may be coming back.
>
> "Nehalem is projected as arriving late, supporting two, four and eight
> cores, representing four, eight and 16 threads. "
>
>
> http://www.theinquirer.net/gb/inquirer/news/2008/02/24/intel-csi-nehalem-dunnington
>
>
> (Disclaimer: I work for Intel, but The Inquirer appears to know much more
> than me about this :-) (not hard, since I know nothing))
> --
>
> -- Jim
>
> --
>
> James Cownie <jcownie at cantab.net>
>

From forum.san at gmail.com  Mon Feb 25 05:18:23 2008
From: forum.san at gmail.com (Sangamesh B)
Date: Mon, 25 Feb 2008 18:48:23 +0530
Subject: [Beowulf] Implementation thru Sun Grid Engine
In-Reply-To: <47C2609E.50307@scalableinformatics.com>
References: <cb60cbc40802240253o3cd5a6a6hc45312bdc33d55f7@mail.gmail.com>
	<Pine.LNX.4.64.0802250110230.20963@coffee.psychology.mcmaster.ca>
	<47C2609E.50307@scalableinformatics.com>
Message-ID: <cb60cbc40802250518i15791ba1k6f6c93a2f6107101@mail.gmail.com>

I had posted this to the gridengine mailing list before posting to Beowulf, but did
not get a response.

Anyhow, I've resolved the issues.

Thanks for your response.

regards,
Sangamesh

On Mon, Feb 25, 2008 at 12:00 PM, Joe Landman <
landman at scalableinformatics.com> wrote:

> Mark Hahn wrote:
> >> Can anyone help me out to resolve above mentioned issues?
> >
> > isn't there a better list than this for an utterly SGE-specific
> question?
>
> users at gridengine.sunsource.net
>
> SGE does have its own lists.  And wikis and howtos, etc.
> http://gridengine.sunsource.net .  You might find some familiar
> faces/names there ...
>
> Chris D on this list is a guru for GE.  He runs the wiki.

From toon at moene.indiv.nluug.nl  Mon Feb 25 14:21:14 2008
From: toon at moene.indiv.nluug.nl (Toon Moene)
Date: Mon, 25 Feb 2008 23:21:14 +0100
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <3F034AC3-445A-43D0-AA9B-056BDE3BC396@sanger.ac.uk>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>	<47B5B542.7000605@scalableinformatics.com>	<0853B432-4B14-40AA-B246-CC302B81FEFD@sanger.ac.uk>	<47B5CFC9.6060105@scalableinformatics.com>
	<3F034AC3-445A-43D0-AA9B-056BDE3BC396@sanger.ac.uk>
Message-ID: <47C33F5A.5070201@moene.indiv.nluug.nl>

Tim Cutts wrote:

>  I've been a Debian Developer for more than 10 years, but I bought that
> book last year and it's still teaching me useful stuff.  Several of us 
> in my group have bought it now, and we all swear by it.  Pretty much 
> everything it says about Debian applies to Ubuntu as well.

This is definitely the wrong list to ask this question - and therefore, 
I'm only fishing for pointers.

Three months ago I bought a machine (from a vendor I won't name, because 
it was HP) that featured a 320 Gbyte IDE drive and a (removable, but 
kept installed in my case) 320 Gbyte SCSI device.

The Stable install went fine - IDE drive got /dev/hda1 (swap) and 
/dev/hda2 (/ - the rest of the device).  SCSI drive got /dev/sda1 
(/scratch - I need lots of it).

So far, so good.  I downloaded the e1000 ethernet driver, because it 
wasn't included in stable's 2.6.18-5 kernel.  Compiled, modprobed, 
dhclient'ed to my ISP's modem, all OK.

Changed /etc/apt/sources.list to exclude the DVD and include

deb http://ftp.nl.debian.org/debian testing main contrib

and apt-get update && apt-get dist-upgrade'd away.

Unfortunately, the resulting system (unlike the original from the DVD) 
won't boot - it can't find the boot device.  It does (helpfully) display 
the message that the root device might be renamed (/dev/sda2), but 
booting with that root device doesn't bring up the system.

Something rather fundamental must have changed (probably in udev) 
between "stable" and the recent "testing" system - but what?

-- 
Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.indiv.nluug.nl/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html


From poknam at gmail.com  Mon Feb 25 18:49:45 2008
From: poknam at gmail.com (PN)
Date: Tue, 26 Feb 2008 10:49:45 +0800
Subject: [Beowulf] Structural analysis and design
Message-ID: <92daa7bf0802251849l38e1ea2mc832aff3004d8621@mail.gmail.com>

hi all,

our department is using Etabs for structural analysis and design on the
Windows platform.
we want to find similar programs that can run on a beowulf cluster.
does anyone have ideas about it?

thanks in advance,
PN

From landman at scalableinformatics.com  Tue Feb 26 09:15:37 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 26 Feb 2008 12:15:37 -0500
Subject: [Beowulf] centos5 as cluster os
In-Reply-To: <47C33F5A.5070201@moene.indiv.nluug.nl>
References: <a31cd3860802120548i3af1be17j92ddcea2751ce262@mail.gmail.com>	<Pine.LNX.4.64.0802151006480.31809@coffee.psychology.mcmaster.ca>	<47B5B542.7000605@scalableinformatics.com>	<0853B432-4B14-40AA-B246-CC302B81FEFD@sanger.ac.uk>	<47B5CFC9.6060105@scalableinformatics.com>	<3F034AC3-445A-43D0-AA9B-056BDE3BC396@sanger.ac.uk>
	<47C33F5A.5070201@moene.indiv.nluug.nl>
Message-ID: <47C44939.2050504@scalableinformatics.com>


Toon Moene wrote:

> Three months ago I bought a machine (from a vendor I won't name, because 
> it was HP), that featured a 320 Gbyte IDE drive and a (removable, but

Heh...

> kept installed in my case) 320 Gbyte SCSI device).
> 
> The Stable install went fine - IDE drive got /dev/hda1 (swap) and 
> /dev/hda2 (/ - the rest of the device).  SCSI drive got /dev/sda1 
> (/scratch - I need lots of it).
> 
> So far, so good.  I downloaded the e1000 ethernet driver, because it 
> wasn't included in stable's 2.6.18-5 kernel.  Compiled, modprobed, 
> dhclient'ed to my ISP's modem, all OK.
> 
> Changed /etc/apt/sources.list to exclude the DVD and include
> 
> deb http://ftp.nl.debian.org/debian testing main contrib
> 
> and apt-get update && apt-get dist-upgrade'd away.
> 
> Unfortunately, the resulting system (unlike the original from the DVD) 
> won't boot - it can't find the boot device.  It does (helpfully) display 
> the message that the root device might be renamed (/dev/sda2), but 
> booting with that root device doesn't bring up the system.

Did any of these steps do a

	mkinitramfs

that you remember?  If so, is it possible that your scsi stuff got 
excluded from the initrd?

> Something rather fundamental must have changed (probably in udev) 
> between "stable" and the recent "testing" system - but what ?

Might not be udev, could be a missing scsi driver in initrd.

> 

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From peter.st.john at gmail.com  Tue Feb 26 09:56:06 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Tue, 26 Feb 2008 12:56:06 -0500
Subject: [Beowulf] Structural analysis and design
In-Reply-To: <92daa7bf0802251849l38e1ea2mc832aff3004d8621@mail.gmail.com>
References: <92daa7bf0802251849l38e1ea2mc832aff3004d8621@mail.gmail.com>
Message-ID: <e4d4fd070802260956y1c370fb2y50996a82bcdbf641@mail.gmail.com>

I'd ask Civil Engineers, I think.
I see at http://www.icivilengineer.com/Software_Guide/Structural_Analysis/
(which has descriptions and "free demos" of various CE software packages)
that Etabs is "A suite of linear & nonlinear static & dynamic analysis &
design of building systems." The term "structural analysis" didn't get me
any meaningful hits at Sourceforge; unfortunately "civil engineering" isn't
a subproject of the twenty thousand "science and engineering projects", but
you might take a look through the Simulations project at
http://sourceforge.net/softwaremap/trove_list.php?form_cat=600. There are
over a thousand items and some keyword might jump out at you.

Somewhere, there is a Civil Engineer thinking, "it would be crazy for all
the traffic lights in Manhattan to be red while a single car drives the
length of the island, stopping at hundreds of buildings along the way. The
traffic lights should all be green at least half of the time, on a chip with
millions of transistors and millions of possible paths" and I would like to
buy that Civil Engineer a beer :-)
Peter

On Mon, Feb 25, 2008 at 9:49 PM, PN <poknam at gmail.com> wrote:

> hi all,
>
> our department is using Etabs for structural analysis and design in
> windows platform,
> we want to find that kind of programs that can be run in beowulf cluster.
> does anyone have ideas about it?
>
> thanks in advance,
> PN
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
>

From coutinho at dcc.ufmg.br  Tue Feb 26 13:03:22 2008
From: coutinho at dcc.ufmg.br (Bruno Coutinho)
Date: Tue, 26 Feb 2008 18:03:22 -0300
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C3297E.3060200@gmail.com>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
Message-ID: <a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>

2008/2/25, Geoff Jacobs <gdjacobs at gmail.com>:
>
> Bill Broadley wrote:
> > I believe it's actually simultaneous, instructions from 2 different
> > processes can run in the same cycle against 2 different register files.
> >
> > Other chips have vertical multithreading where only 1 process runs in
> > any given cycle.
>
>
> Multiple threads residing in any particular stage of execution at any
> particular time. If one thread stalls, the processor can proceed with
> execution on the other thread.
>
> Somewhat outside the bounds of this list, but would an in order
> processor like Power 6 derive more benefit from SMT?


Yes, and more than an out-of-order processor. An out-of-order processor can
reorder instructions whenever a hazard occurs. An in-order processor, on
the other hand, has to wait. With SMT, it can switch to another thread.

And today a memory access can stall for hundreds of cycles, so any processor
can hide this latency by switching to another thread.
But then you have to make sure the processor has enough cache and memory
bandwidth to handle the increased memory traffic (like Sun Niagara).
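
A toy model of the latency-hiding argument, as a rough Python sketch. It
assumes every thread alternates a fixed burst of compute cycles with a
fixed memory stall, and that switching threads is free; the function name
and the example numbers are made up for illustration and are not
measurements of any real processor.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles an in-order core does useful work (toy model).

    Assumption: each thread repeats `compute_cycles` of work followed by a
    memory stall of `stall_cycles`, and a thread switch costs nothing.
    """
    if threads < 1 or compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, stall_cycles >= 0")
    period = compute_cycles + stall_cycles
    # Each thread keeps the core busy for compute_cycles out of every period;
    # the core saturates once the threads together cover the whole period.
    return min(1.0, threads * compute_cycles / period)


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work per 180-cycle memory stall.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads: {core_utilization(n, 20, 180):.0%} busy")
```

The model says nothing about the memory side: every extra thread that
hides a stall also issues its own memory requests, so the cache and
bandwidth caveat still applies.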

From landman at scalableinformatics.com  Tue Feb 26 13:42:05 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 26 Feb 2008 16:42:05 -0500
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
References: <47B32FB1.60905@berkeley.edu>	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>	<47C32122.1090505@cse.ucdavis.edu>
	<47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
Message-ID: <47C487AD.5050503@scalableinformatics.com>


Bruno Coutinho wrote:

> Yes, and more than an out-of-order processor. An out-of-order processor 
> can reorder the instructions whenever a hazard occurs. An in-order 
> processor, on the other hand, has to wait. With SMT, it can switch to 
> another thread.
> 
> And today memory access can stall up to hundreds of cycles, so any 
> processor can hide this latency by switching to another thread.

My gosh ... we have re-invented the Tera MTA.  ...

> But then you have to make sure the processor has enough cache and memory 
> bandwidth to handle the increased memory traffic (like Sun Niagara).

The problem with many (cores|threads) is that memory bandwidth wall.  A 
fixed size (B) pipe to memory, with N requesters on that pipe ...
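
To put rough numbers on the fixed-pipe picture, here is a minimal Python
sketch. It assumes the pipe B is shared evenly and that each requester
wants the same bandwidth; the function names and the GB/s figures are
hypothetical, chosen only to show the shape of the curve.

```python
def per_requester_bandwidth(pipe_gb_per_s, requesters):
    """Even share of a fixed memory pipe B among N requesters, in GB/s."""
    if requesters < 1:
        raise ValueError("need at least one requester")
    return pipe_gb_per_s / requesters


def bandwidth_bound_speedup(pipe_gb_per_s, demand_gb_per_s, requesters):
    """Speedup over one requester when every requester wants demand_gb_per_s.

    Assumption: throughput is limited only by memory bandwidth.
    """
    if requesters < 1:
        raise ValueError("need at least one requester")
    delivered = min(requesters * demand_gb_per_s, pipe_gb_per_s)
    single = min(demand_gb_per_s, pipe_gb_per_s)
    return delivered / single


if __name__ == "__main__":
    B = 10.0      # hypothetical pipe size, GB/s
    demand = 4.0  # hypothetical per-core demand, GB/s
    for n in (1, 2, 4, 8):
        share = per_requester_bandwidth(B, n)
        speedup = bandwidth_bound_speedup(B, demand, n)
        print(f"N={n}: {share:.2f} GB/s each, speedup {speedup:.2f}x")
```

With these made-up numbers the speedup flattens at B / demand once the
pipe is full, no matter how many more requesters are added.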


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From hahn at mcmaster.ca  Tue Feb 26 15:58:10 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue, 26 Feb 2008 18:58:10 -0500 (EST)
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C487AD.5050503@scalableinformatics.com>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
Message-ID: <Pine.LNX.4.64.0802261838580.16530@coffee.psychology.mcmaster.ca>

>> And today memory access can stall up to hundreds of cycles, so any 
>> processor can hide this latency by switching to another thread.
>
> My gosh ... we have re-invented the Tera MTA.  ...

I think the reason we both know what that name means is that 
they had (have?) a nugget of truth.  after all, a multiplier 
unit on a chip doesn't really care on which thread's behalf 
it's doing work.  MTA is perhaps a bit far towards the pure 
gatling-gun approach, but I think we can all agree that ultimately
any program is just a big hairy dataflow graph.

>> But then you have to make sure the processor has enough cache and memory 
>> bandwidth to handle the increased memory traffic (like Sun Niagara).
>
> The problem with many (cores|threads) is that memory bandwidth wall.  A fixed 
> size (B) pipe to memory, with N requesters on that pipe ...

I think that's why almost everyone agrees with the elegance of AMD's 
system architecture - memory attached to and thus scaling with ncpus.
and yes, there's a lot of work already going on regarding making caches
more intelligent - predicting the multireference or sharing properties
of a cache block, for instance, to choose when to move it and between
which caches in a big system.


From landman at scalableinformatics.com  Tue Feb 26 16:13:32 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 26 Feb 2008 19:13:32 -0500
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <Pine.LNX.4.64.0802261838580.16530@coffee.psychology.mcmaster.ca>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<Pine.LNX.4.64.0802261838580.16530@coffee.psychology.mcmaster.ca>
Message-ID: <47C4AB2C.2010901@scalableinformatics.com>


Mark Hahn wrote:
>>> And today memory access can stall up to hundreds of cycles, so any 
>>> processor can hide this latency by switching to another thread.
>>
>> My gosh ... we have re-invented the Tera MTA.  ...
> 
> I think the reason we both know what that name means is that they had 
> (have?) a nugget of truth.  after all, a multiplier unit on a chip 

had ... morphed into "The New Cray".  Burton Smith is long gone, now at 
Microsoft.

> doesn't really care on which thread's behalf it's doing work.  MTA is 
> perhaps a bit far towards the pure gatling-gun approach, but I think we

It was very interesting when I first heard about it at an SC9x 
conference.  Spoke to Burton for a few minutes on it.

OoO was a (very weak) version of something like this.  SMT is a little 
stronger.  Register renaming and all those fancy OoO optimizations were 
in there to break those dependencies down and enable better IPC 
... which is the name of the game in the end anyway ...

You are absolutely right, in that the functional units don't (and 
shouldn't) care which thread they are working for.

> can all agree that ultimately
> any program is just a big hairy dataflow graph.

I would like to use that as a quote ... :)

> 
>>> But then you have to make sure the processor has enough cache and 
>>> memory bandwidth to handle the increased memory traffic (like Sun 
>>> Niagara).
>>
>> The problem with many (cores|threads) is that memory bandwidth wall.  
>> A fixed size (B) pipe to memory, with N requesters on that pipe ...
> 
> I think that's why almost everyone agrees with the elegance of AMD's 
> system architecture - memory attached to and thus scaling with ncpus.
> and yes, there's a lot of work already going on regarding making caches
> more intelligent - predicting the multireference or sharing properties
> of a cache block, for instance, to choose when to move it and between
> which caches in a big system.

I seem to remember hearing about the processors-in-core idea many moons 
ago.  It seemed hard to program.  But compare that to the big honking 
pile-o-ram, with many processors, few pipes, and bandwidth limits ...

The AMD model is elegant.  As you expand the number of cores you expand 
the number of memory connections.  This is part of the reason the 2350's 
at 2GHz give some 3+ GHz Intel 5472's a run for the money on a number of 
real world tests.  Sort of like the alpha ... you can make the CPU 
"infinitely fast", but there is the little matter of the rest of the 
system to worry about.

Joe


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From apittman at concurrent-thinking.com  Wed Feb 27 03:12:04 2008
From: apittman at concurrent-thinking.com (Ashley Pittman)
Date: Wed, 27 Feb 2008 11:12:04 +0000
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C3297E.3060200@gmail.com>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu>  <47C3297E.3060200@gmail.com>
Message-ID: <1204110724.6787.15.camel@bruce.priv.wark.uk.streamline-computing.com>


On Mon, 2008-02-25 at 14:47 -0600, Geoff Jacobs wrote:
> Bill Broadley wrote:
> > I believe it's actually simultaneous, instructions from 2 different
> > processes can run in the same cycle against 2 different register files.
> > 
> > Other chips have vertical multithreading where only 1 process runs in
> > any given cycle.
> 
> Multiple threads residing in any particular stage of execution at any
> particular time. If one thread stalls, the processor can proceed with
> execution on the other thread.
> 
> Somewhat outside the bounds of this list, but would an in order
> processor like Power 6 derive more benefit from SMT?

I saw a talk which said SMT was worth a maximum of 20% on power5 and
often performed worse than if it had been turned off.  This correlates
well with my experience of it on Intel CPUs.

http://www.hpcx.ac.uk/about/events/annual2006/slides/hague.pdf

It seems most people, myself included, benchmark(ed) with hyperthreading
disabled in the BIOS/at boot time and again with hyperthreading enabled
and jobs scheduled to the meta-CPUs.  Not surprisingly, the performance
often isn't all that different despite having twice as many CPUs;
however, the variance is much higher when it's enabled.

I believe there should be a third way whereby the virtual CPUs are
enabled and running but not used to run parallel jobs, and instead run any
background tasks the OS should happen to throw at them.  If we were to go
down this road I could use the reclaimed cycles to do something sensible
with marshaling data for non-blocking MPI operations.  At least part of
the reason this wasn't tested before is scheduler support for
hyperthreading and CPU binding: by the time kernel support was good
enough to do the tests I'd have liked to have done, the window had closed
and hardware technology had moved on.
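
For what it's worth, the "third way" can be sketched on a current Linux
system roughly as follows. This is an illustration only: it assumes a
kernel that exposes thread_siblings_list under
/sys/devices/system/cpu/cpuN/topology/ and a Python with
os.sched_setaffinity (Linux only). The helper names and the policy
(lowest-numbered sibling of each core runs the job, the others are left
for background work) are hypothetical choices, not something tested in
this thread.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,4' or '0-1' into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def split_siblings(sibling_lists):
    """Split logical CPUs into (compute, background) sets.

    sibling_lists maps a logical CPU number to the text of its
    thread_siblings_list. Policy (an assumption): the lowest-numbered
    sibling of each core is a compute CPU, the rest are background CPUs.
    """
    compute, background = set(), set()
    for cpu, text in sibling_lists.items():
        siblings = parse_cpu_list(text)
        if siblings and cpu == siblings[0]:
            compute.add(cpu)
        else:
            background.add(cpu)
    return compute, background


def read_sibling_lists(base="/sys/devices/system/cpu"):
    """Read the sibling list of every CPU this process may run on."""
    lists = {}
    for cpu in sorted(os.sched_getaffinity(0)):
        with open(f"{base}/cpu{cpu}/topology/thread_siblings_list") as f:
            lists[cpu] = f.read()
    return lists


if __name__ == "__main__":
    compute, background = split_siblings(read_sibling_lists())
    if compute:
        # Bind this process (e.g. one rank's launcher) to compute CPUs only.
        os.sched_setaffinity(0, compute)
    print("compute CPUs:   ", sorted(compute))
    print("background CPUs:", sorted(background))
```

Whether the OS actually keeps its background work on the remaining
siblings is a separate question that this sketch does not address.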

Ashley.


From gdjacobs at gmail.com  Wed Feb 27 06:50:02 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Wed, 27 Feb 2008 08:50:02 -0600
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <1204110724.6787.15.camel@bruce.priv.wark.uk.streamline-computing.com>
References: <47B32FB1.60905@berkeley.edu>	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>	<47C32122.1090505@cse.ucdavis.edu>
	<47C3297E.3060200@gmail.com>
	<1204110724.6787.15.camel@bruce.priv.wark.uk.streamline-computing.com>
Message-ID: <47C5789A.2030207@gmail.com>

Ashley Pittman wrote:
> I saw a talk which said SMT was worth a maximum of 20% on power5 and
> often performed worse than if it had been turned off.  This correlates
> well with my experience of it on Intel CPUs.
> 
> http://www.hpcx.ac.uk/about/events/annual2006/slides/hague.pdf
> 
> It seems most people, myself included, benchmark(ed) with hyperthreading
> disabled in the BIOS/at boot time and again with hyperthreading enabled
> and jobs scheduled to the meta-cpu's.  Not surprisingly the performance
> often isn't all that different despite having twice as many cpu's
> however the variance is much higher when it's enabled.
> 
> I believe there should be a third way whereby the virtual cpu's are
> enabled and running but not used to run parallel jobs, more to run any
> background tasks the OS should happen to throw at them, if we were to go
> down this road I could use the reclaimed cycles to do something sensible
> with marshaling data for non-blocking MPI operations.  At least part of
> the reason this wasn't tested before is scheduler support for
> hyperthreading and CPU binding, by the time kernel support was good
> enough to do the tests I'd have liked to have done the window was closed
> and hardware technology had moved on.

I was wondering about the Power 6 because it is an in order design. To
me, hyperthreading and OoOE are optimizing in the same area, and I
wanted to know if SMT is more beneficial where there is no OoOE.

-- 
Geoffrey D. Jacobs


From diep at xs4all.nl  Wed Feb 27 11:30:29 2008
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Wed, 27 Feb 2008 20:30:29 +0100
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <Pine.LNX.4.64.0802251542280.17613@coffee.psychology.mcmaster.ca>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu>
	<Pine.LNX.4.64.0802251542280.17613@coffee.psychology.mcmaster.ca>
Message-ID: <C77161C8-6298-44D6-A23B-B4C7A440AC9B@xs4all.nl>


On Feb 25, 2008, at 10:03 PM, Mark Hahn wrote:

>> I believe it's actually simultaneous, instructions from 2  
>> different processes can run in the same cycle against 2 different  
>> register files.
>
> for some definition of 'simultaneous'.  I suspect that netburst-HT  
> simply
> runs with a thread until it stalls, then switches.  I don't think  
> Intel ever detailed which stalling events do this.  in some of the  
> initial papers
> on netburst-HT, it was implied that the implementation was almost a
> side-effect of how the chip tracks in-flight operations.  since no  
> modern chip really has a unitary pipeline, HT might well tolerate  
> one thread
> chugging through a microcoded transcendental at the same time as  
> another,
> say, follows a pointer.
>
>>> had assumed so but I appear to be confused about it.  
>>> Hyperthreading keeps a
>>> thread ready to take advantage of stalls in a preceeding thread,  
>>> but doesn't
>>> ever actually perform a second instruction in one click tick,  
>>> correct? One
>
> I believe HT switching does happen cycle-by-cycle, and would guess  
> that in-flight ops from multiple threads can coexist (not executing  
> on the same unit in the same cycle, though.)
>
> to me, this makes a lot more sense than manycore chips, actually.
> manycore basically assumes that tracking inflight ops is the main  
> scaling problem with modern chips.  that may be the case, but I've  
> never really heard it described as such.
>
> imagine if, instead of 8 cores onchip, you just had 8 "thread  
> sequence"
> units that contained fetch/decode, architected registers and  
> retirement.
> and a single big pool of scoreboarded functional units, of course.   
> the advantage being that one thread could use many units.  as  
> opposed to a static 8-core where each thread gets only the unit(s)  
> in its core...

Hi Mark,

Let's calculate with your imaginary chip, where you get rid of the
multicore idea and have to give up out-of-order execution to get
your thread sequence idea to work:

If you've got 8 threads that each execute 1 instruction a cycle,
that's:

8 * 1 * 3 GHz = 24 Gflop/s double precision

Now let's compare with today's quad-core, a system we build for just
600 euro, like the nodes I'm planning to build now:

198 euro for the Intel chip @ 2.4 GHz and 134 euro for the AMD chip @ 2.2 GHz.

Here is the calculation against your 24 Gflop/s:

4 cores * 3 instructions a cycle * 2 DP in each SSE2/SSE3 vector *
2.4 GHz = 24 * 2.4 = 57.6 Gflop/s

It is very hard to beat today's quad-cores with the imaginary CPU of
the future.
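
The same back-of-envelope arithmetic as a small Python sketch. It is a
theoretical peak only: it assumes, as the figures above do, that every
issued instruction is a double-precision floating-point operation. The
function name is invented for illustration.

```python
def peak_gflops(units, instr_per_cycle, flops_per_instr, clock_ghz):
    """Theoretical peak in Gflop/s: units * IPC * flops per instruction * GHz."""
    return units * instr_per_cycle * flops_per_instr * clock_ghz


if __name__ == "__main__":
    # 8 thread units, 1 scalar instruction per cycle each, 3 GHz.
    imaginary = peak_gflops(8, 1, 1, 3.0)
    # 4 cores, 3 instructions per cycle, 2 DP values per SSE vector, 2.4 GHz.
    quadcore = peak_gflops(4, 3, 2, 2.4)
    print(f"imaginary 8-thread chip: {imaginary:.1f} Gflop/s")
    print(f"today's quad-core:       {quadcore:.1f} Gflop/s")
    print(f"ratio:                   {quadcore / imaginary:.1f}x")
```

Changing the per-thread issue width or vector width of the imaginary chip
changes the comparison directly, since all four factors simply multiply.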

Multicore and out-of-order are big winners that completely butcher RISC
and the old Alpha engineers' SMT idea, with the exception of power usage.

Multicore right now means BOOM, you are nearly a factor 4.0 faster (3.8
in the case of my chessproggie), and out-of-order means you have
a potential of 3 to 4 instructions a cycle, which is a big winner too.

Replacing that with some other technique such as SMT means that technique
needs to find a factor 12 in speed somewhere.

Vincent


> I think the main takehome from netburst-HT is that SMT needs to  
> provide more units, not just provide a new way for two threads to  
> interfere.
>
> regards, mark hahn.


From hahn at mcmaster.ca  Wed Feb 27 11:45:51 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 27 Feb 2008 14:45:51 -0500 (EST)
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <C77161C8-6298-44D6-A23B-B4C7A440AC9B@xs4all.nl>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu>
	<Pine.LNX.4.64.0802251542280.17613@coffee.psychology.mcmaster.ca>
	<C77161C8-6298-44D6-A23B-B4C7A440AC9B@xs4all.nl>
Message-ID: <Pine.LNX.4.64.0802271433410.13449@coffee.psychology.mcmaster.ca>

>> imagine if, instead of 8 cores onchip, you just had 8 "thread sequence"
>> units that contained fetch/decode, architected registers and retirement.
>> and a single big pool of scoreboarded functional units, of course.  the 
>> advantage being that one thread could use many units.  as opposed to a 
>> static 8-core where each thread gets only the unit(s) in its core...
>
> Hi Mark,
>
> Let's calculate with your imaginary chip where you get rid of the multicore 
> thought and have to get rid of out of order in order to get your thread 
> sequence idea to work:
>
> If you've got 8 threads that execute each 1 instruction a cycle,
> that's:
>
> 8 * 1 * 3Ghz = 24 Gflop double precision

no.  I did not say that each thread unit was single-scalar.

> 4 cores * 3 instructions a cycle * 2 DP in each SSE2/SSE3 vector * 2.4Ghz = 
> 24 * 2.4 = 48 + 9.6 = 57.6

"my" 8-thread-unit chip would also have superscalar dispatch, and would 
obviously also have SIMD units.  as far as I can tell, you missed 
the WHOLE point of my digression: that the current manycore trend is a form
of static partitioning.

such chopping up into little pieces leads to less efficiency,
since it's hard to ensure your workload is always perfectly balanced.
such balance becomes exponentially more unlikely as the core count explodes.
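
a toy illustration of the static-partitioning point, as a Python sketch.
it assumes work is measured in abstract "unit-cycles", that a thread can
spread its work perfectly over whatever units it is allowed to use, and
that scheduling is free; these are idealizations, and the function names
and workload numbers are hypothetical.

```python
def static_partition_time(work_per_thread, units_per_core):
    """Cycles to finish when each thread may only use its own core's units."""
    if units_per_core < 1 or not work_per_thread:
        raise ValueError("need at least one thread and one unit per core")
    # The chip is done only when the most heavily loaded core is done.
    return max(w / units_per_core for w in work_per_thread)


def shared_pool_time(work_per_thread, total_units):
    """Cycles to finish when any thread may use any unit (idealized pool)."""
    if total_units < 1 or not work_per_thread:
        raise ValueError("need at least one thread and one unit")
    return sum(work_per_thread) / total_units


if __name__ == "__main__":
    # Hypothetical unbalanced workload: one heavy thread, seven light ones.
    work = [800, 100, 100, 100, 100, 100, 100, 200]
    static = static_partition_time(work, units_per_core=1)
    pooled = shared_pool_time(work, total_units=8)
    print(f"static 8-core: {static:.0f} cycles")
    print(f"shared pool:   {pooled:.0f} cycles")
    print(f"static partitioning efficiency: {pooled / static:.0%}")
```

with a perfectly balanced workload the two times are equal; the gap only
opens up as the per-thread work diverges.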

> Multicore and out of order are big winners that butcher RISC and the old 
> Alpha engineers SMT idea completely,
> with exception of power usage.

don't be so naive - all current processors owe most of their credit to 
previous architectures (_especially_ trends like RISC, OOO, Alpha and SMT).

> Multicore right now means BOOM you are factor 4.0 faster nearly (3.8 in case 
> of my chessproggie), and out of order means you have
> a potential of 3 to 4 instructions a cycle which is a big winner too.

you're confusing superscalar with OOO here.  but again, that's not the point.
there's nothing wrong with the manycore trend, just that it's kind of dumb - 
enough to make me think chip architects who cut their teeth on RISC are now 
looking forward to retirement rather than pushing for excellent designs ;)

> Replacing that with some other technique SMT means the other technique SMT 
> needs to find a factor 12 in speed somewhere.

again, replication of cores is trivial, architecturally, and soaks up the 
extra transistors.  my question is: are there improved or better ways?

observe, for instance, that your code is clearly cache-happy.  good for you!
not all workloads are, and because offchip memory interfaces are not
following moore's law, memory is a real problem only exacerbated by manycore.


From richard.walsh at comcast.net  Wed Feb 27 15:18:49 2008
From: richard.walsh at comcast.net (richard.walsh at comcast.net)
Date: Wed, 27 Feb 2008 23:18:49 +0000
Subject: [Beowulf] Opinions of Hyper-threading?
Message-ID: <022720082318.15100.47C5EFD8000BC7F100003AFC2200761438089C040E99D20B9D0E080C079D@comcast.net>


-------------- Original message -------------- 
From: Ashley Pittman <apittman at concurrent-thinking.com> 

> I saw a talk which said SMT was worth a maximum of 20% on power5 and 
> often performed worse than if it had been turned off. This correlates 
> well with my experience of it on Intel CPUs. 

As Joe Landman suggested, the notion of a thread (as a logical construct
representing parallelizable work) can be reduced to a single instruction.
In this case, the logical distance between the two workloads is minimal
and managed by OoO hardware (or VLIW).  As the separation of the parallel
workloads (threads) grows, we have parallel threads within one code that
are defined by code blocks, then workloads in different processes within
the same MPI application, then workloads separated by an even greater
logical distance in different applications, and finally thread groups
virtualized across OS environments.

As Mark H. points out, the functional units do not care who is parent to
their work.  Still, the problem of shepherding the result back to its
proper pen grows with the distance of logical separation.  Hardware
resources and chip surface area are required to manage this.  That is one
reason why Intel delayed SMT in Clovertown and Harpertown, and why
many-core advocates think that threads are a waste of chip space,
especially in a data-parallel universe.

We wish to more fully utilize underutilized functional unit resources in
a core, of course, but as Ashley P. intimated, as we pile threads
disproportionately on top of ever-growing parallel hardware, the chance of
a delaying collision grows in a non-linear way.  Thus, the expanded
variance.  Put another way, the gap between trivial or random schedules
through the hardware and optimal ones grows (like the distance between a
pair and a straight flush in poker) as both workloads and resources grow.
If we are allocating resources based on sampling, we then run into the
problem of not being able to discover where the idle resources are.  This
is visible in scaling tests of very fat server nodes using the VMmark
benchmark.  Efficiency drops off even with benchmarks scaled weakly.

Are there better alternatives?  Well, at the instruction level, we have
VLIW -- prepacked workloads known not to interfere with each other.  What
about at the level of schedulers, which as I understand it are all
sampling based ... ??  There is the notion of resource-requirement-aware
scheduling, which intends to eliminate resource collisions in advance for
virtualized workloads.  The Cray XMT uses hardware resources to insulate
large groups of parallel workloads (at the expense of individual or
related ones sometimes) from interference that might idle usable
resources if additional more or less distant work was not available.

This discussion invokes wild thoughts ... like the notion of compiling
multiple applications together in a cluster ... and running them together
knowing that the compiler can shuffle the work together smartly without
the need for additional hardware resources to do it.

Are folks here familiar with eXludus' resource-requirement-aware
scheduling technology?

Sorry about the length ... but it is an interesting topic.
Regards,
rbw
-- 

"Making predictions is hard, especially about the future." 

Niels Bohr 

-- 

Richard Walsh 
Thrashing River Consulting-- 
5605 Alameda St. 
Shoreview, MN 55126 

From hahn at mcmaster.ca  Wed Feb 27 16:42:50 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 27 Feb 2008 19:42:50 -0500 (EST)
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <022720082318.15100.47C5EFD8000BC7F100003AFC2200761438089C040E99D20B9D0E080C079D@comcast.net>
References: <022720082318.15100.47C5EFD8000BC7F100003AFC2200761438089C040E99D20B9D0E080C079D@comcast.net>
Message-ID: <Pine.LNX.4.64.0802271919250.13449@coffee.psychology.mcmaster.ca>

> alternatives?  Well, at the instruction level, we have the VLIW --
> prepacked workloads known not to interfere with each other. What about at the

there's a good reason VLIW never made much of a splash - even EPIC is 
pretty much past-tense.  the reason is that static schedules don't work
well with things like caches or even early-out ALU's.  they are fine 
when your ops are consistent/deterministic in execution time - heck,
we should consider VLIW to be a sort of mixed-grill SIMD or vector approach.

as far as I can tell, current GPUs are an example of how far this can be
taken, and at what cost.  you get a set of very in-order core-like units
whose memory model is massively constrained by the need to remain in lock-step.
very good for data-parallel workloads like graphics, but not as easy to 
apply to more complicated data access patterns or conditional flows.

(since I'm ranting about GPUs, I'm really surprised those guys haven't 
dived deeply into the processor-in-memory model.  I guess it says something
significant about how differently specialized cpu vs dram fabs are. 
especially when you consider that GPU boards use custom for-GPU-only dram.)


From coutinho at dcc.ufmg.br  Wed Feb 27 18:03:59 2008
From: coutinho at dcc.ufmg.br (Bruno Coutinho)
Date: Wed, 27 Feb 2008 23:03:59 -0300
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <Pine.LNX.4.64.0802271919250.13449@coffee.psychology.mcmaster.ca>
References: <022720082318.15100.47C5EFD8000BC7F100003AFC2200761438089C040E99D20B9D0E080C079D@comcast.net>
	<Pine.LNX.4.64.0802271919250.13449@coffee.psychology.mcmaster.ca>
Message-ID: <a8d96dec0802271803v16578feft240831e9ee45acf6@mail.gmail.com>

2008/2/27, Mark Hahn <hahn at mcmaster.ca>:
>
> > alternatives?  Well, at the instruction level, we have the VLIW --
> > prepacked workloads known not to interfere with each other. What about at
> the
>
> there's a good reason VLIW never made much of a splash - even EPIC is
> pretty much past-tense.  the reason is that static schedules don't work
> well with things like caches or even early-out ALU's.  they are fine
> when your ops are consistent/deterministic in execution time - heck,
> we should consider VLIW to be a sort of mixed-grill SIMD or vector
> approach.


Another alternative to multicore and multithreading is replicating parts of
the core.
In multithreading you replicate nothing, in multicore you replicate
everything, but you can also replicate only the smallest or most-used core
components. Sun did it in Niagara 2: each core has two integer execution units.

> as far as I can tell, current GPUs are an example of how far this can be
> taken, and at what cost.  you get a set of very in-order core-like units
> whose memory model is massively constrained by the need to remain in
> lock-step.
> very good for data-parallel workloads like graphics, but not as easy to
> apply to more complicated data access patterns or conditional flows.
>
> (since I'm ranting about GPUs, I'm really surprised those guys haven't
> dived deeply into the processor-in-memory model.  I guess it says
> something
> significant about how differently specialized cpu vs dram fabs are.
> especially when you consider that GPU boards use custom for-GPU-only
> dram.)
>

As far as I know, GPUs access large amounts of data (most of it textures) in
regular streams at relatively low frequency (once per frame). Most of the time
they only need very high throughput.

From bill at cse.ucdavis.edu  Wed Feb 27 23:30:35 2008
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Wed, 27 Feb 2008 23:30:35 -0800
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C487AD.5050503@scalableinformatics.com>
References: <47B32FB1.60905@berkeley.edu>	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>	<47C32122.1090505@cse.ucdavis.edu>	<47C3297E.3060200@gmail.com>	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
Message-ID: <47C6631B.7000609@cse.ucdavis.edu>

> The problem with many (cores|threads) is that memory bandwidth wall.  A 
> fixed size (B) pipe to memory, with N requesters on that pipe ...

What wall?  Bandwidth is easy, it just costs money, and not much at that.
Want 50GB/sec[1]?  Buy a $170 video card.  Want 100GB/sec?  Buy a better video
card.  Want 200GB/sec?  Buy 2.  Sure, video cards have minimal memory
(512-768MB), no double precision on the normal cards [2] (although I'm not
sure if the now-shipping 9600GT fixed that), and are harder to program (CUDA
vs the normal compilers).  Has anyone programmed both CUDA and the IBM Cell
chip who could comment on how hard it is to do something useful?  In any case,
the gap between the reality and the market acceptance of this approach seems
to be closing aggressively.  Thus machines with 16-32 threads/cores are
becoming rather common (Sun T1000/T2000, quad socket quad core Intel, and
hopefully RSN 4-8 socket 4 core AMDs).

Seems like additional cores|threads are an excellent way to make use of tons 
of memory bandwidth in a latency tolerant fashion to get reasonable real world 
performance on applications that people actually care about (read that as 
willing to pay for).  All the while utilizing more commodity technology than
the vector machines of yesteryear.

Latency on the other hand (especially when measured in clock cycles) is a 
wall, extremely hard to fix, and those nasty laws of physics keep getting in 
the way.

I don't see any particular reason why memory bandwidth can't go through a few
doublings in the near future if there were a market for it; last I checked,
nvidia was doing pretty well ;-)

[1] Sorry to use marketing bandwidth, I've not seen stream numbers for CUDA
     yet.  I hope to work on one though.  If anyone has numbers please speak
     up.
[2] The nvidia 8600/8800 are single precision AFAIK, no idea if the 9600GT
     is one of the new generation DP capable chips.


From john.hearns at streamline-computing.com  Thu Feb 28 00:38:11 2008
From: john.hearns at streamline-computing.com (John Hearns)
Date: Thu, 28 Feb 2008 08:38:11 +0000
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C6631B.7000609@cse.ucdavis.edu>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu>	<47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<47C6631B.7000609@cse.ucdavis.edu>
Message-ID: <1204187901.10209.15.camel@Vigor13>

On Wed, 2008-02-27 at 23:30 -0800, Bill Broadley wrote:

> 
> I don't see any particular reason why memory bandwidth can go through a full 
> doublings in the near future if there was a market for it, last I checked 
> nvidia was doing pretty well ;-)
> 
> [1] Sorry to use marketing bandwidth, I've not seen stream numbers for CUDA
>      yet.  I hope to work on one though.  If anyone has numbers please speak
>      up.

Some numbers here:

http://forums.nvidia.com/index.php?showtopic=52686


Running this on my 8600 card I get:

STREAM Benchmark implementation in CUDA
 Array size (single precision)=2000000
 using 128 threads per block, 15625 blocks
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:      291777.6696       0.0001       0.0001       0.0001
Scale:     291777.6696       0.0001       0.0001       0.0001
Add:       437666.5043       0.0001       0.0001       0.0001
Triad:     437666.5043       0.0001       0.0001       0.0001


All the 0.0001s don't look right to me, but it's 8:30 here and I should
get on with the day job.
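
As a rough sanity check on the table above, the elapsed time implied by each
reported rate can be recovered from the array size. This is only a sketch: it
assumes the usual STREAM convention (2 arrays moved for Copy/Scale, 3 for
Add/Triad) and 1 MB = 1e6 bytes; the function names are made up for
illustration.

```python
# Recover the elapsed time implied by each reported STREAM rate.
# Assumptions (not stated in the post): standard STREAM byte counting
# (2 arrays for Copy/Scale, 3 for Add/Triad) and 1 MB = 1e6 bytes.

ARRAY_ELEMENTS = 2_000_000   # "Array size" line in the output above
BYTES_PER_ELEMENT = 4        # single precision

# (kernel name, arrays touched, reported rate in MB/s)
REPORTED = [
    ("Copy", 2, 291777.6696),
    ("Scale", 2, 291777.6696),
    ("Add", 3, 437666.5043),
    ("Triad", 3, 437666.5043),
]


def implied_seconds(arrays_touched, rate_mb_per_s):
    """Seconds needed to move the kernel's bytes at the reported rate."""
    bytes_moved = arrays_touched * ARRAY_ELEMENTS * BYTES_PER_ELEMENT
    return bytes_moved / (rate_mb_per_s * 1e6)


def implied_times():
    """Map each kernel name to the time its reported rate implies."""
    return {name: implied_seconds(n, rate) for name, n, rate in REPORTED}


if __name__ == "__main__":
    for name, seconds in implied_times().items():
        print(f"{name:6s} {seconds * 1e6:8.2f} microseconds")
```

Under those assumptions all four kernels imply the same time, roughly 55
microseconds, which the output rounds to 0.0001.  Identical times for kernels
that move different amounts of data suggest the timer is capturing a fixed
overhead rather than the memory traffic.  One possible cause (a guess, not
verified against that code) is that CUDA kernel launches are asynchronous, so
timing without a synchronization point measures only the launch.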


From landman at scalableinformatics.com  Thu Feb 28 05:20:01 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 28 Feb 2008 08:20:01 -0500
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C6631B.7000609@cse.ucdavis.edu>
References: <47B32FB1.60905@berkeley.edu>	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>	<47C32122.1090505@cse.ucdavis.edu>	<47C3297E.3060200@gmail.com>	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<47C6631B.7000609@cse.ucdavis.edu>
Message-ID: <47C6B501.4080000@scalableinformatics.com>

Bill Broadley wrote:
>> The problem with many (cores|threads) is that memory bandwidth wall.  
>> A fixed size (B) pipe to memory, with N requesters on that pipe ...
> 
> What wall?  Bandwidth is easy, it just costs money, and not much at 
> that. Want 50GB/sec[1] buy a $170 video card.  Want 100GB/sec... buy a 

Heh... if it were that easy, we would spend extra on more bandwidth for 
Harpertown and Barcelona ...

The point is that the design determines your hard/fixed per socket 
limits, and no programming technique is going to get you around that 
limit per socket.  You need to change your programming technique to go 
many socket.  That limit is the bandwidth wall.

> better video card.  Want 200GB/sec buy 2.  Sure they don't have much 
> memory (512-768MB) and of course no double (although I'm not sure if the 
> now shipping 9600GT fixed that).  Sure video cards have minimal memory 
> (512-768MB), no double precision on the normal cards [2], and are harder 
> to program (CUDA vs the normal compilers).  Any programmed and CUDA and 
> the IBM Cell chip that could comment on how hard it is to do something 
> useful?  In any case, the reality and market acceptance of this approach 
> seem to be aggressively closing.  Thus machines
> with 16-32 threads/cores are becoming rather common (Sun T1000/T2000, quad
> socket quad core Intel, and hopefully RSN 4-8 socket 4 core AMDs).

My point was that it is going to get harder and harder to make effective 
use of those cores.

Basically, I have postulated elsewhere that all computing technology 
evolves to a point where it is bandwidth limited.  Each core on a 
Clovertown can completely swamp the memory controller, yet we put 8 of 
them on a motherboard.  This means that under particular workloads, the 
Clovertown (and Harpertown and Woodcrest) cores are being starved of 
data.  If you can't feed all of them (the cores) fast enough, you can't 
make efficient use of all them (the cores).

I do agree that 16/32 core machines are showing up, and we are excited 
about this, but I am concerned that expectations are not going to be 
met.  Doubling the number of cores won't necessarily double the 
processing power of the machines, especially if a few cores are idling 
while the system is under load, as they cannot get data to compute with.
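
A toy model of that argument, as a minimal sketch only: a fixed pipe of B GB/s
is shared equally by N cores, and each core needs some amount of bandwidth to
stay busy.  The numbers below are hypothetical and are not measurements of
Clovertown or any other chip.

```python
# Toy model of the "bandwidth wall": N cores share one fixed memory pipe.
# All figures used with it are hypothetical, chosen only for illustration.

def core_efficiency(pipe_gb_s, cores, demand_gb_s_per_core):
    """Fraction of the time each core can be fed, assuming an equal share."""
    share = pipe_gb_s / cores
    return min(1.0, share / demand_gb_s_per_core)


def effective_cores(pipe_gb_s, cores, demand_gb_s_per_core):
    """Number of fully busy cores the socket is equivalent to."""
    return cores * core_efficiency(pipe_gb_s, cores, demand_gb_s_per_core)


if __name__ == "__main__":
    pipe = 10.0    # hypothetical pipe bandwidth, GB/s
    demand = 5.0   # hypothetical per-core demand, GB/s
    for n in (1, 2, 4, 8, 16):
        print(n, core_efficiency(pipe, n, demand), effective_cores(pipe, n, demand))
```

In this model, once N times the per-core demand exceeds B, adding cores no
longer adds throughput: the effective core count stays pinned at B divided by
the per-core demand.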

> 
> Seems like additional cores|threads are an excellent way to make use of 
> tons of memory bandwidth in a latency tolerant fashion to get reasonable 
> real world performance on applications that people actually care about 
> (read that as willing to pay for).  All the while utilizing more 
> commodity technology then the vector machines of yesteryear.
> 
> Latency on the other hand (especially when measured in clock cycles) is 
> a wall, extremely hard to fix, and those nasty laws of physics keep 
> getting in the way.

Bandwidth is as much of a physical issue, but latency is harder.  You 
can overcome the bandwidth issue once we make the transition away from 
using fermions (spin 1/2 particles that follow Pauli's exclusion
principle) to using bosons (spin 1 particles that can "sit atop" each
other in configuration space ... sort of like photons).  As light based
ALUs and logic units are not being developed rapidly at this point, I
expect us to languish for a while in indirect band gap semiconductor 
(Silicon).  Even a direct band gap semiconductor would be faster ... 
GaAs (as the joke goes) is the material of the future ... and always 
will be (of the future).

> I don't see any particular reason why memory bandwidth can go through a 
> full doublings in the near future if there was a market for it, last I 
> checked nvidia was doing pretty well ;-)

I would like a 512 bit memory bus ... please!!!

> [1] Sorry to use marketing bandwidth, I've not seen stream numbers for CUDA
>     yet.  I hope to work on one though.  If anyone has numbers please speak
>     up.

Just saw one here, pretty impressive.

> [2] The nvidia 8600/8800 are single precision AFAIK, no idea if the 9600GT
>     is one of the new generation DP capable chips.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From landman at scalableinformatics.com  Thu Feb 28 05:53:36 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 28 Feb 2008 08:53:36 -0500
Subject: [Beowulf] What are people seeing performance-wise for NFS over 10GbE
Message-ID: <47C6BCE0.3050909@scalableinformatics.com>

Hi folks:

   I have a few simple setups, and I am trying to figure out whether I have a
problem or whether what I am seeing is normal.

   Two 10 GbE cards, connected with a CX4 cable.  Server is one of our 
JackRabbits, with 750+ MB/s direct IO and 500-650 MB/s buffered IO 
(read-write), for IO about 10x system ram (~100x RAID cache).

   Getting ok iperf numbers, about 7 Gb/s single thread.  Running NFS, 
and seeing ~200-300 MB/s best case between the systems.  Is this what 
others have seen?  Latest drivers from vendor, sent a note to them to 
see if we can figure this out.  Figured I would tap into the 'wulf 
collective memory to learn what other people see.

   2.6.23.14 kernel on both sides, jumbo frames enabled. No switch, just 
a CX4 cable.

   My expectation is that we would be able to see 80-90% of the
JackRabbit's speed, with the rest being eaten by stack issues.  This is
the case over channel bonded GbE.
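
   To put the figures above on one scale, here is a small arithmetic sketch.
It assumes decimal units (1 Gb/s = 125 MB/s) and uses the buffered IO range
as the baseline; the helper names are invented for this example.

```python
# Put the iperf, NFS and local-disk figures from the post on one scale.
# Assumes decimal units: 1 Gb/s = 1000 Mb/s = 125 MB/s.

def gbit_to_mbyte(gbit_per_s):
    """Convert Gb/s to MB/s."""
    return gbit_per_s * 1000.0 / 8.0


def expected_nfs_range(disk_low, disk_high, low_frac=0.80, high_frac=0.90):
    """80-90% of the local disk rate, per the stated expectation."""
    return disk_low * low_frac, disk_high * high_frac


if __name__ == "__main__":
    wire = gbit_to_mbyte(7.0)                    # iperf, single thread
    expected = expected_nfs_range(500.0, 650.0)  # buffered IO range
    observed = (200.0, 300.0)                    # NFS best case
    print("iperf wire rate MB/s :", wire)
    print("expected NFS MB/s    :", expected)
    print("observed / wire      :", observed[0] / wire, observed[1] / wire)
```

   On those assumptions the wire itself (about 875 MB/s) is not the limit, and
the observed NFS rate is well under the 400-585 MB/s the 80-90% rule would
predict.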

   Do you see 500 MB/s or more?  Less?

   Thanks.

Joe



From james.p.lux at jpl.nasa.gov  Thu Feb 28 06:50:27 2008
From: james.p.lux at jpl.nasa.gov (Jim Lux)
Date: Thu, 28 Feb 2008 06:50:27 -0800
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C6B501.4080000@scalableinformatics.com>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<47C6631B.7000609@cse.ucdavis.edu>
	<47C6B501.4080000@scalableinformatics.com>
Message-ID: <20080228065027.6i2i0yft44ockcc0@webmail.jpl.nasa.gov>

Quoting Joe Landman <landman at scalableinformatics.com>, on Thu 28 Feb  
2008 05:20:01 AM PST:

> Bill Broadley wrote:
>>> The problem with many (cores|threads) is that memory bandwidth   
>>> wall.  A fixed size (B) pipe to memory, with N requesters on that   
>>> pipe ...
>>
>> What wall?  Bandwidth is easy, it just costs money, and not much at  
>>  that. Want 50GB/sec[1] buy a $170 video card.  Want 100GB/sec...   
>> buy a
>
> Heh... if it were that easy, we would spend extra on more bandwidth for
> Harpertown and Barcelona ...
>
> The point is that the design determines your hard/fixed per socket
> limits, and no programming technique is going to get you around that
> limit per socket.  You need to change your programming technique to go
> many socket.  That limit is the bandwidth wall.
>

And this is much the same as the earlier discussions on this list,  
when folks were building 8 and 16 processor clusters.  There, the  
bandwidth wall was the 10Mbps Ethernet interconnect, first through a  
hub, then a switch, etc.

This is sort of why any programming technique for speed up that relies  
on tight coupling (e.g. shared memory) can't scale infinitely.  At  
some point, the speed of light and physical size conspire to do you in.

If one wanted to design revolutionary distributed/parallel computing  
algorithms, one could probably work with floppy disks and sneakernet.   
If it works there, it will certainly work on any faster mechanism.   
See.. true computer science doesn't need a 1000 processor cluster.

Another cluster related computer science issue is to start dealing  
with unreliable links between the nodes of the cluster. The  
overwhelming majority of cluster codes assume that message passing is  
perfect and has no errors.  Sometimes this is provided transparently  
by the communications mechanism (e.g. TCP/IP promises in-order,
error-free delivery).  However, in the TCP case that comes at a cost:
the latency isn't constant (because it achieves reliability by
temporal redundancy, i.e. retries), and if your algorithm does some sort of
scatter/gather and needs barrier synchronization, a late packet on one  
link brings the whole mass to a halt.
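
A rough way to see how quickly this bites, as a sketch only: if each of N
links independently needs a retry with probability p during one
scatter/gather step, the chance that the barrier waits on at least one late
packet is 1 - (1 - p)^N.  Independence and the sample value of p are
assumptions made for illustration.

```python
# Probability that a barrier step is delayed by at least one retransmit.
# Assumes each link retries independently with the same probability.

def stall_probability(links, p_retry):
    """Chance that at least one of `links` suffers a retry in a step."""
    return 1.0 - (1.0 - p_retry) ** links


if __name__ == "__main__":
    p = 0.001  # hypothetical per-link, per-step retry probability
    for n in (1, 10, 100, 1000):
        print(n, round(stall_probability(n, p), 4))
```

With a hypothetical one-in-a-thousand retry chance per link, a single link
almost never stalls, but with 1000 links most steps (about 63%) wait on at
least one retry.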

As data rates get higher, even really good bit error rates on the wire  
get to be too big.  Consider this.. a BER of 1E-10 is quite good, but  
if you're pumping 10Gb/s over the wire, that's an error every second.   
(A BER of 1E-10 is a typical rate for something like 100Mbps link...).  
  So, practical systems use some sort of FEC, but even with that, BERs  
of 1E-14 or 1E-15 are pretty much state of the art over shortish  
(meters) distances.  (It's a power/signal to noise ratio thing..How  
much energy can you put into sending one bit of information?)
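
The arithmetic behind those figures, as a small sketch (the function name is
made up for this example): the mean error rate is simply BER times bit rate.

```python
# Mean time between bit errors for a given bit error rate and line rate.

def seconds_between_errors(ber, bits_per_second):
    """Average seconds between bit errors: 1 / (BER * bit rate)."""
    return 1.0 / (ber * bits_per_second)


if __name__ == "__main__":
    rate = 1e10  # 10 Gb/s
    for ber in (1e-10, 1e-12, 1e-15):
        print(ber, seconds_between_errors(ber, rate), "seconds")
```

At 10Gb/s, a BER of 1E-10 gives one error per second, as stated above; 1E-12
gives one every 100 seconds, and 1E-15 one every 1E5 seconds (a bit over a
day).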

Jim


From hahn at mcmaster.ca  Thu Feb 28 07:33:07 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 28 Feb 2008 10:33:07 -0500 (EST)
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <20080228065027.6i2i0yft44ockcc0@webmail.jpl.nasa.gov>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<47C6631B.7000609@cse.ucdavis.edu>
	<47C6B501.4080000@scalableinformatics.com>
	<20080228065027.6i2i0yft44ockcc0@webmail.jpl.nasa.gov>
Message-ID: <Pine.LNX.4.64.0802281012030.24158@coffee.psychology.mcmaster.ca>

>>>> The problem with many (cores|threads) is that memory bandwidth  wall.  A 
>>>> fixed size (B) pipe to memory, with N requesters on that  pipe ...
>>> 
>>> What wall?  Bandwidth is easy, it just costs money, and not much at  that. 
>>> Want 50GB/sec[1] buy a $170 video card.  Want 100GB/sec...  buy a
>> 
>> Heh... if it were that easy, we would spend extra on more bandwidth for
>> Harpertown and Barcelona ...

I think the point is that chip vendors are not talking about mere
doubling of number of cores, but (apparently with straight faces),
things like 1k GP cores/chip.

personally, I think they're in for a surprise - that there isn't a vast
market for more than 2-4 cores per chip.

>> limits, and no programming technique is going to get you around that
>> limit per socket.  You need to change your programming technique to go
>> many socket.  That limit is the bandwidth wall.

IMO, this is the main fallacy behind the current industry harangue.
the problem is _NOT_ that programmers are dragging their feet, 
but rather some combination of Amdahl's law and the low average _inherent_
parallelism of computation.  (I'm _not_ talking about MC or graphics 
rendering here, but today's most common computer uses: web and email.)
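
for reference, Amdahl's law as a minimal sketch: with a fraction p of the work
parallelizable over N cores, speedup is 1 / ((1 - p) + p / N).  the sample
fractions below are illustrative only, not measurements of any workload.

```python
# Amdahl's law: speedup on N cores when a fraction p of the work is parallel.

def amdahl_speedup(parallel_fraction, cores):
    """Speedup = 1 / ((1 - p) + p / N)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):       # illustrative parallel fractions
        for n in (4, 128, 1000):
            print(p, n, round(amdahl_speedup(p, n), 2))
```

even with 90% of the work parallel, 1000 cores give a speedup just under 10,
since the serial 10% caps it at 1 / 0.1.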

the manycore cart is being put before the horse.  worse, no one has really
shown that manycore (and the presumed ccnuma model) is actually scalable 
to large values on "normal" workloads.  (getting good scaling for an AM
CFD code on 128 cores in an Altix is kind of a different proposition than
scaling to 128 cores in a single chip.)

as far as I know, all current examples of large ccnuma scaling are 
premised on core:memory ratios of about 4:1 (4 it2 cores per bank of 
dram in an Altix, for instance.)  I don't doubt that we can improve 
memory bandwidth (and concurrency) per chip, but it's not an area-driven
process, so will never keep up.

so: do an exponential and a sublinear trend diverge?  yes: meet memory wall.

what's missing is a reason to think that basically all workloads can be made
cache-friendly enough to scale to 10e2 or 10e3 cores.  I just don't see that.

it's really a memory-to-core issue: from what I see, the goal should be 
something in the range of 1GB per core.  there are examples up to 10G/core
and down to 100M/core, but not really beyond that.  (except for stream
processing, which is great stuff but _cries_ for non-general-purpose HW.)

> As data rates get higher, even really good bit error rates on the wire get to 
> be too big.  Consider this.. a BER of 1E-10 is quite good, but if you're 
> pumping 10Gb/s over the wire, that's an error every second.  (A BER of 1E-10 
> is a typical rate for something like 100Mbps link...).  So, practical systems

I'm no expert, but 1e-10 seems quite high to me.  the docs I found about 10G
requirements all specified 1e-12, and claimed to have achieved 1e-15 in
realistic, long-range tests...


From james.p.lux at jpl.nasa.gov  Thu Feb 28 08:17:04 2008
From: james.p.lux at jpl.nasa.gov (Jim Lux)
Date: Thu, 28 Feb 2008 08:17:04 -0800
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <Pine.LNX.4.64.0802281012030.24158@coffee.psychology.mcmaster.ca>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<47C6631B.7000609@cse.ucdavis.edu>
	<47C6B501.4080000@scalableinformatics.com>
	<20080228065027.6i2i0yft44ockcc0@webmail.jpl.nasa.gov>
	<Pine.LNX.4.64.0802281012030.24158@coffee.psychology.mcmaster.ca>
Message-ID: <20080228081704.i73mb2mhwk40ks88@webmail.jpl.nasa.gov>

Quoting Mark Hahn <hahn at mcmaster.ca>, on Thu 28 Feb 2008 07:33:07 AM PST:

>>>>> The problem with many (cores|threads) is that memory bandwidth    
>>>>> wall.  A fixed size (B) pipe to memory, with N requesters on   
>>>>> that  pipe ...
>>>>
>>>> What wall?  Bandwidth is easy, it just costs money, and not much   
>>>> at  that. Want 50GB/sec[1] buy a $170 video card.  Want   
>>>> 100GB/sec...  buy a
>>>
>>> Heh... if it were that easy, we would spend extra on more bandwidth for
>>> Harpertown and Barcelona ...
>
> I think the point is that chip vendors are not talking about mere
> doubling of number of cores, but (apparently with straight faces),
> things like 1k GP cores/chip.
>
> personally, I think they're in for a surprise - that there isn't a vast
> market for more than 2-4 cores per chip.

Perhaps not today.  But then, Thomas Watson said there wasn't a vast  
market for computers.. perhaps 5 world wide.

No question that folks will have to figure out how to effectively use  
all that parallelism.  (e.g. each processor deals with one page of a  
Word document, or a range of Excel cells?).  I can see a lot of fairly  
easily coded things dealing with rapid search (e.g. which of my  
documents have the word hyperthreading and Hahn in them).  Right now,  
search and retrieval of unstructured data is a very computationally  
intensive task that millions of folks suffer through daily. (How many  
of you find Google over the web faster than Microsoft's "Search for  
File or Folder.." (or grepping the entire disk) on your local machine?)
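
A minimal sketch of how such a search decomposes, one document per worker.
The document names and the `parallel_search` helper are hypothetical, and the
documents are held in memory to keep the example self-contained.  Note that
threads in CPython will not speed up CPU-bound matching; a process pool or
native code would be needed for that, so this only shows the shape of the
decomposition.

```python
# Search many documents for all of the given words, one document per task.
# Illustrative only: documents are an in-memory dict of name -> text.
from concurrent.futures import ThreadPoolExecutor


def contains_all(text, words):
    """True if every word occurs in the text (case-insensitive substring)."""
    lowered = text.lower()
    return all(word.lower() in lowered for word in words)


def parallel_search(documents, words, workers=4):
    """Return the sorted names of documents containing every word."""
    names = list(documents)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(lambda name: contains_all(documents[name], words), names))
    return sorted(name for name, hit in zip(names, hits) if hit)


if __name__ == "__main__":
    docs = {
        "a.txt": "Hahn on hyperthreading",
        "b.txt": "hyperthreading only",
        "c.txt": "HAHN and HyperThreading again",
    }
    print(parallel_search(docs, ["hyperthreading", "Hahn"]))
```

Each document is checked independently, so there is no communication between
workers until the final gather of the hit list.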


And we cluster dweebs have a headstart on them... we've been dealing  
with figuring out how to spread problems that are too big to fit on  
one node across multiples for years now.  After all, billg's  
programming fame is from a flood fill graphics algorithm, and look how  
well he's done with that <grin>.


>
>>> limits, and no programming technique is going to get you around that
>>> limit per socket.  You need to change your programming technique to go
>>> many socket.  That limit is the bandwidth wall.
>
> IMO, this is the main fallacy behind the current industry harangue.
> the problem is _NOT_ that programmers are dragging their feet, but
> rather some combination of amdahl's law and the low average _inherent_
> parallelism of computation.  (I'm _not_ talking about MC or graphics
> rendering here, but today's most common computer uses: web and email.)

Text search and retrieval is where it's at.  Almost 30 years ago I
worked on developing a piece of office equipment the size of a 2
drawer file cabinet that would do just that, hooked up to a bunch of
word processors (i.e. find me that letter we sent to John Smith).  It
was expensive!  It had an 80MB (or 160MB) disk drive (huge!), and it could
search thousands of pages in the blink of an eye (called the
OFISfile, sold by Burroughs).  And people DID buy it.  And, without
giving away the internals, it could have made excellent use of a 1000
core type processor.

Granted, the googles of the world will (correctly) contend that an  
equally good solution is to have a good comm link to a centralized  
search and retrieval engine (doesn't even have to be that fast.. just  
comparable to the time it takes me to enter the request and read the  
results).  But, they too can use parallelism.


>
> the manycore cart is being put before the horse.  worse, no one has really
> shown that manycore (and the presumed ccnuma model) is actually
> scalable to large values on "normal" workloads.  (getting good scaling
> for an AM
> CFD code on 128 cores in an Altix is kind of a different proposition than
> scaling to 128 cores in a single chip.)

To a certain extent it's an example of  build it and they will come  
(to 10% of the things that are built, the other 90% are interesting  
blips left by the side of the road).

When compilers were introduced, I'm sure the skilled machine language  
coders said.. hmmph, we can do just fine with our octal and hex,  
there's no expressed demand for high level languages. (Kids..get offa  
my lawn!)  Heck, the plugboard programmers on EAM equipment probably  
said that to the guys working with stored program computers. And  
before that, the supervisor of the computer pool probably said that
to the plugboard guys, as he gazed over a room full of Marchant
calculators with (human) computers punching numbers and pulling the handles.


>
> what's missing is a reason to think that basically all workloads can be made
> cache-friendly enough to scale to 10e2 or 10e3 cores.  I just don't see that.

Not all workloads... just enough so that it forms a significant  
market. and text search and retrieval is a pretty big consumer of CPU  
cycles, in the big wide world (as opposed to the specialized world of  
large numeric simulations and the like that have historically been  
hosted on clusters)

Remember, the recurring cost is basically related to the size of the  
die, not what's on it.  So, if there's a significant market for 10,000  
processor widgets, they'll be made, and cheaply.

>> As data rates get higher, even really good bit error rates on the   
>> wire get to be too big.  Consider this.. a BER of 1E-10 is quite   
>> good, but if you're pumping 10Gb/s over the wire, that's an error   
>> every second.  (A BER of 1E-10 is a typical rate for something like  
>>  100Mbps link...). So, practical systems
>
> I'm no expert, but 1e-10 seems quite high to me.  the docs I found about 10G
> requirements all specified 1e-12, and claimed to have achieved 1e15 in
> realistic, long-range tests...

That's probably the error rate above the PHY layer. I.e. after the  
forward error correction.  And the 10G requirement is tighter than the  
100Mbps requirement, just to make FEC possible with reasonable  
redundancy.  Typically, you want a raw PHY BER at least 100x away from  
the data rate (e.g. 1E8 bps->1E-10 BER, 1E10 bps->1E-12 BER)
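
That rule of thumb written out as a sketch (the function name and the default
margin parameter are made up for this example): the target raw BER is one
error per `margin` seconds' worth of bits, with margin = 100.

```python
# Rule of thumb from the text: raw PHY BER about 100x below 1 / bit rate.

def target_raw_ber(bits_per_second, margin=100.0):
    """Target BER = 1 / (margin * bit rate)."""
    return 1.0 / (margin * bits_per_second)


if __name__ == "__main__":
    for rate in (1e8, 1e10):
        print(rate, target_raw_ber(rate))
```

This reproduces the two pairs in the text: 1E8 bps maps to 1E-10 and 1E10 bps
maps to 1E-12.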


From diep at xs4all.nl  Thu Feb 28 09:13:40 2008
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu, 28 Feb 2008 18:13:40 +0100
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <Pine.LNX.4.64.0802281012030.24158@coffee.psychology.mcmaster.ca>
References: <47B32FB1.60905@berkeley.edu>
	<67E3D08F-18FE-48CF-BFEC-8E07DEDBDA38@xs4all.nl>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<47C6631B.7000609@cse.ucdavis.edu>
	<47C6B501.4080000@scalableinformatics.com>
	<20080228065027.6i2i0yft44ockcc0@webmail.jpl.nasa.gov>
	<Pine.LNX.4.64.0802281012030.24158@coffee.psychology.mcmaster.ca>
Message-ID: <DFAF5B5E-C31A-4A28-AA29-91A4F478E87C@xs4all.nl>


On Feb 28, 2008, at 4:33 PM, Mark Hahn wrote:

>>>>> The problem with many (cores|threads) is that memory bandwidth   
>>>>> wall.  A fixed size (B) pipe to memory, with N requesters on  
>>>>> that  pipe ...
>>>> What wall?  Bandwidth is easy, it just costs money, and not much  
>>>> at  that. Want 50GB/sec[1] buy a $170 video card.  Want 100GB/ 
>>>> sec...  buy a
>>> Heh... if it were that easy, we would spend extra on more  
>>> bandwidth for
>>> Harpertown and Barcelona ...
>
> I think the point is that chip vendors are not talking about mere
> doubling of number of cores, but (apparently with straight faces),
> things like 1k GP cores/chip.
>
> personally, I think they're in for a surprise - that there isn't a  
> vast
> market for more than 2-4 cores per chip.

Microsoft might give a helping hand there by making their own
software more user friendly,
which somehow requires heavier processors :)

>>> limits, and no programming technique is going to get you around that
>>> limit per socket.  You need to change your programming technique  
>>> to go
>>> many socket.  That limit is the bandwidth wall.
>
> IMO, this is the main fallacy behind the current industry harangue.
> the problem is _NOT_ that programmers are dragging their feet, but  
> rather some combination of amdahl's law and the low average _inherent_
> parallelism of computation.  (I'm _not_ talking about MC or  
> graphics rendering here, but today's most common computer uses: web  
> and email.)
>
> the manycore cart is being put before the horse.  worse, no one has  
> really
> shown that manycore (and the presumed ccnuma model) is actually  
> scalable to large values on "normal" workloads.  (getting good  
> scaling for an AM
> CFD code on 128 cores in an Altix is kind of a different  
> proposition than
> scaling to 128 cores in a single chip.)
>
> as far as I know, all current examples of large ccnuma scaling are  
> premised on core:memory ratios of about 4:1 (4 it2 cores per bank  
> of dram in an Altix, for instance.)  I don't doubt that we can  
> improve memory bandwidth (and concurrency) per chip, but it's not  
> an area-driven
> process, so will never keep up.
>
> so: do an exponential and a sublinear trend diverge?  yes: meet  
> memory wall.
>
> what's missing is a reason to think that basically all workloads  
> can be made
> cache-friendly enough to scale to 10e2 or 10e3 cores.  I just don't  
> see that.
>

Some fields that are over-represented on this mailing list simply
require more RAM rather than more cpu.
Only a few fields have embarrassingly parallel software that needs just
cpu power and not much RAM.
Most of them are either encryption or security related searches.

There is however a growing number of fields where communication speed
between the processors is very important:
not so much the bandwidth, but rather the latency.

CPUs are so fast nowadays that algorithms can kick in where the branching
factor (in practice, the time needed to advance 1 step or iteration in the
process) depends heavily upon the communication speed between the processors,
and especially upon reusing data stored in the (huge) RAM of other
memory-nodes.

In the long run of course many fields will converge on such types of
algorithms; field after field is
inventing algorithms like that, which is a logical consequence of the
progress in hardware.

Today's high-end hardware ALLOWS complex algorithms to get invented,
which IMHO is a good thing.

Now let's invent something that makes me coffee :)


> it's really a memory-to-core issue: from what I see, the goal  
> should be something in the range of 1GB per core.  there are  
> examples up to 10G/core

1GB per core is a standard that most supercomputers already met about 8 years ago.

It's quite interesting to see how RAM and latency between cpus
haven't kept pace with cpu crunching power.


> and down to 100M/core, but not really beyond that.  (except for stream
> processing, which is great stuff but _cries_ for non-general- 
> purpose HW.)
>
>> As data rates get higher, even really good bit error rates on the  
>> wire get to be too big.  Consider this.. a BER of 1E-10 is quite  
>> good, but if you're pumping 10Gb/s over the wire, that's an error  
>> every second.  (A BER of 1E-10 is a typical rate for something  
>> like 100Mbps link...).  So, practical systems
>
> I'm no expert, but 1e-10 seems quite high to me.  the docs I found  
> about 10G
> requirements all specified 1e-12, and claimed to have achieved 1e15 in
> realistic, long-range tests...

In the end the conclusion will of course be that we very badly need a newer
process technology from ASML
to get into production, to produce CPUs with even more transistors in
order to push some of the problems to the 1GB L3 cache :)

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf
>


From ppk at ats.ucla.edu  Wed Feb 27 15:35:11 2008
From: ppk at ats.ucla.edu (Korambath, Prakashan)
Date: Wed, 27 Feb 2008 15:35:11 -0800
Subject: [Beowulf] Open source Job Scheduler for Apple Leopard 10.5.2 server
	that will work with Open Directory?
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net><02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu><Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
	<220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>
Message-ID: <43F64E86355A744E9D51506B6C6783B9021AE68C@EM2.ad.ucla.edu>

Does anyone know of an open source job scheduler for Apple Leopard 10.5.2 Server that will work with Open Directory?  SGE seems to have intermittent problems, and Condor and Torque support only 10.4 (Tiger). Thanks.

Prakashan Korambath


From peter.st.john at gmail.com  Thu Feb 28 10:07:59 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu, 28 Feb 2008 13:07:59 -0500
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <20080228065027.6i2i0yft44ockcc0@webmail.jpl.nasa.gov>
References: <47B32FB1.60905@berkeley.edu>
	<19977.80.138.118.154.1203968408.squirrel@webmail.sonic.net>
	<e4d4fd070802251158y1e5bb0fasc6e04bbd2d9ab6b2@mail.gmail.com>
	<47C32122.1090505@cse.ucdavis.edu> <47C3297E.3060200@gmail.com>
	<a8d96dec0802261303g25f2ff22l83d6a5b609bffe1b@mail.gmail.com>
	<47C487AD.5050503@scalableinformatics.com>
	<47C6631B.7000609@cse.ucdavis.edu>
	<47C6B501.4080000@scalableinformatics.com>
	<20080228065027.6i2i0yft44ockcc0@webmail.jpl.nasa.gov>
Message-ID: <e4d4fd070802281007y6e381fc3ua37190226d92d941@mail.gmail.com>

Jim,
Just re:

"If one wanted to design revolutionary distributed/parallel computing
algorithms, one could probably work with floppy disks and sneakernet. If it
works there, it will certainly work on any faster mechanism.
See.. true computer science doesn't need a 1000 processor cluster."

I agree of course, I grew up on paper and pencil, but there's one proviso:
the process of trial and error is just murder with floppies and sneakernet.
If trial and error is part of your development process, then a thousand
processor cluster (or whatever powerful hardware) is so convenient as to be
qualitatively necessary. It still astounds me that Holland came up with
Genetic Algorithms in the sixties, though; so you're right that it's
possible to do without.

Peter

From m.janssens at opencfd.co.uk  Thu Feb 28 10:09:12 2008
From: m.janssens at opencfd.co.uk (Mattijs Janssens)
Date: Thu, 28 Feb 2008 18:09:12 +0000
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <1204187901.10209.15.camel@Vigor13>
References: <47B32FB1.60905@berkeley.edu> <47C6631B.7000609@cse.ucdavis.edu>
	<1204187901.10209.15.camel@Vigor13>
Message-ID: <200802281809.12329.m.janssens@opencfd.co.uk>

How do your Rate numbers correlate to the max bandwidth of 32GB/s 
(http://en.wikipedia.org/wiki/GeForce_8_Series)?

Or do these threads all operate on the same data?

Mattijs

On Thursday 28 February 2008 08:38, John Hearns wrote:
> On Wed, 2008-02-27 at 23:30 -0800, Bill Broadley wrote:
> > I don't see any particular reason why memory bandwidth can go through a
> > full doublings in the near future if there was a market for it, last I
> > checked nvidia was doing pretty well ;-)
> >
> > [1] Sorry to use marketing bandwidth, I've not seen stream numbers for
> > CUDA yet.  I hope to work on one though.  If anyone has numbers please
> > speak up.
>
> Some numbers here:
>
> http://forums.nvidia.com/index.php?showtopic=52686
>
>
> Running this on my 8600 card I get:
>
> STREAM Benchmark implementation in CUDA
>  Array size (single precision)=2000000
>  using 128 threads per block, 15625 blocks
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:      291777.6696       0.0001       0.0001       0.0001
> Scale:     291777.6696       0.0001       0.0001       0.0001
> Add:       437666.5043       0.0001       0.0001       0.0001
> Triad:     437666.5043       0.0001       0.0001       0.0001
>
>
> All the 0.0001 don't look right to me, but its 8:30 here and I should
> get on with the day job.
>


From bill at cse.ucdavis.edu  Thu Feb 28 12:22:58 2008
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Thu, 28 Feb 2008 12:22:58 -0800
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <200802281809.12329.m.janssens@opencfd.co.uk>
References: <47B32FB1.60905@berkeley.edu>
	<47C6631B.7000609@cse.ucdavis.edu>	<1204187901.10209.15.camel@Vigor13>
	<200802281809.12329.m.janssens@opencfd.co.uk>
Message-ID: <47C71822.5010300@cse.ucdavis.edu>

Mattijs Janssens wrote:
> How do your Rate numbers correlate to the max bandwidth of 32GB/s 
> (http://en.wikipedia.org/wiki/GeForce_8_Series)?
> 
> Or do these threads all operate on the same data?

My first guess was some kind of caching; after all, 2M floats is only 8MB.  But 
I couldn't reproduce it on my 8600GT, so I'm guessing it's a timing issue.

I downloaded the source, compiled:
/usr/local/cuda/bin/nvcc -O3 -o stream stream.cu

Ran it:
./stream
  STREAM Benchmark implementation in CUDA
  Array size (single precision)=2000000
  using 128 threads per block, 15625 blocks
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16596.1294       0.0010       0.0010       0.0010
Scale:      16581.7649       0.0010       0.0010       0.0010
Add:        18750.8822       0.0013       0.0013       0.0013
Triad:      18736.6081       0.0013       0.0013       0.0013

I made the array 4 times bigger:
  STREAM Benchmark implementation in CUDA
  Array size (single precision)=8000000
  using 128 threads per block, 62500 blocks
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16706.3212       0.0039       0.0038       0.0044
Scale:      16666.2770       0.0046       0.0038       0.0100
Add:        18408.0866       0.0053       0.0052       0.0056
Triad:      18738.6603       0.0052       0.0051       0.0055

Stream numbers that are 50% of marketing numbers seem relatively common.
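
For reference, the ratio can be worked out directly from the first run
above. A small Python sketch, assuming the benchmark reports decimal
megabytes (1 MB = 1e6 bytes) and taking the 32 GB/s figure quoted
earlier in the thread as the peak (the helper name is made up):

```python
def fraction_of_peak(rate_mb_s, peak_gb_s):
    """Measured STREAM rate (MB/s, 1 MB = 1e6 bytes) as a fraction of peak (GB/s)."""
    return rate_mb_s / (peak_gb_s * 1000.0)

PEAK_GB_S = 32.0  # assumption: the peak figure quoted earlier in the thread

# Rates from the 2000000-element run above.
measured = {
    "Copy": 16596.1294,
    "Scale": 16581.7649,
    "Add": 18750.8822,
    "Triad": 18736.6081,
}

for name, rate in measured.items():
    print(f"{name:6s} {rate:10.1f} MB/s = {fraction_of_peak(rate, PEAK_GB_S):.0%} of peak")
```

Under those assumptions Copy and Scale land a little above 50% of the
quoted peak, and Add and Triad a little below 60%.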

I'm not that familiar with CUDA.  This ran on a video card that happens to be 
driving my 1920x1200 display, so I might get better numbers if I turned off
compiz, let alone X11.

Kudos to Nvidia for having a Linux-friendly toolchain that I could find, 
download, install, and compile a code with minimal hassle.


From landman at scalableinformatics.com  Thu Feb 28 12:47:48 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu, 28 Feb 2008 15:47:48 -0500
Subject: [Beowulf] What are people seeing performance-wise for NFS over
	10GbE
In-Reply-To: <412838240.20080228214217@gmx.net>
References: <47C6BCE0.3050909@scalableinformatics.com>
	<412838240.20080228214217@gmx.net>
Message-ID: <47C71DF4.40701@scalableinformatics.com>

Jan Heichler wrote:
> Hello Joe,
> 
> On Thursday, 28 February 2008, you wrote:

[...]

> The best i saw for NFS over 10 GE was about 350-400 MB/s write and about 450 MB/s read.
> Single server to 8 simultaneous accessing clients (aggregated performance). 

Hi Jan:

   Ok. Thanks.  This is quite helpful.

> 
> On the blockdevice i got 550 MB/s write and 1.1 GB/s read performance. 

Using iSCSI?  To real disks or ramdisk/nullio? Most of the benchmarks I 
have seen online have been to nullio or ramdisks.  We are going to real 
disks.

> JL>    2.6.23.14 kernel on both sides, jumbo frames enabled. No switch, just
> JL> a CX4 cable.
> 
> rsize/wsize are set to? 

I tried a range: 8k through 64k

> 
> NFS3 or NFS4? 

3.

Thanks!

Joe

> 
> Cheers,
> Jan


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


From hahn at mcmaster.ca  Thu Feb 28 12:45:52 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 28 Feb 2008 15:45:52 -0500 (EST)
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <200802281809.12329.m.janssens@opencfd.co.uk>
References: <47B32FB1.60905@berkeley.edu> <47C6631B.7000609@cse.ucdavis.edu>
	<1204187901.10209.15.camel@Vigor13>
	<200802281809.12329.m.janssens@opencfd.co.uk>
Message-ID: <Pine.LNX.4.64.0802281535090.8692@coffee.psychology.mcmaster.ca>

> How do your Rate numbers correlate to the max bandwidth of 32GB/s
> (http://en.wikipedia.org/wiki/GeForce_8_Series)?

good point.  I had assumed the quoted numbers were merely in-cache,
but it does claim to be running on array size 2e6 (8e6 bytes),
which seems a bit large for in-cache.  (though very small for a Stream run).

>> http://forums.nvidia.com/index.php?showtopic=52686

this quotes a plausible 64-65 GB/s on a C870 (76.8 peak theoretical).

>> Running this on my 8600 card I get:
>>
>> STREAM Benchmark implementation in CUDA
>>  Array size (single precision)=2000000
>>  using 128 threads per block, 15625 blocks
>> Function      Rate (MB/s)   Avg time     Min time     Max time
>> Copy:      291777.6696       0.0001       0.0001       0.0001
>> Scale:     291777.6696       0.0001       0.0001       0.0001
>> Add:       437666.5043       0.0001       0.0001       0.0001
>> Triad:     437666.5043       0.0001       0.0001       0.0001

this is implausible.  my guess is the timing code is broken.
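
one way to see it: back out the elapsed time each reported rate implies.
a rough Python sketch, assuming the CUDA port counts bytes the way
standard STREAM does (two arrays touched for Copy/Scale, three for
Add/Triad, 1 MB = 1e6 bytes); the names below are made up:

```python
BYTES_PER_FLOAT = 4
N = 2_000_000  # array size from the quoted run

# Arrays touched per element: Copy/Scale read one and write one,
# Add/Triad read two and write one (standard STREAM accounting, assumed here).
ARRAYS = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Rates reported in the quoted 8600 run, in MB/s.
REPORTED_MB_S = {
    "Copy": 291777.6696,
    "Scale": 291777.6696,
    "Add": 437666.5043,
    "Triad": 437666.5043,
}

def implied_seconds(kernel):
    """Elapsed time implied by a reported rate, taking 1 MB = 1e6 bytes."""
    bytes_moved = ARRAYS[kernel] * N * BYTES_PER_FLOAT
    return bytes_moved / (REPORTED_MB_S[kernel] * 1e6)

for k in ARRAYS:
    print(f"{k:6s} implied time {implied_seconds(k) * 1e6:.1f} us")
```

under those assumptions all four kernels come out at roughly 55
microseconds, even though Add and Triad move 1.5x the data.  a fixed
time that ignores the amount of data moved would fit a timer that is
measuring something other than the memory traffic, though that is only
a guess.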


From hahn at mcmaster.ca  Thu Feb 28 13:02:00 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 28 Feb 2008 16:02:00 -0500 (EST)
Subject: [Beowulf] Opinions of Hyper-threading?
In-Reply-To: <47C71822.5010300@cse.ucdavis.edu>
References: <47B32FB1.60905@berkeley.edu> <47C6631B.7000609@cse.ucdavis.edu>
	<1204187901.10209.15.camel@Vigor13>
	<200802281809.12329.m.janssens@opencfd.co.uk>
	<47C71822.5010300@cse.ucdavis.edu>
Message-ID: <Pine.LNX.4.64.0802281555490.8692@coffee.psychology.mcmaster.ca>

> STREAM Benchmark implementation in CUDA
> Array size (single precision)=8000000
> using 128 threads per block, 62500 blocks
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:       16706.3212       0.0039       0.0038       0.0044
> Scale:      16666.2770       0.0046       0.0038       0.0100
> Add:        18408.0866       0.0053       0.0052       0.0056
> Triad:      18738.6603       0.0052       0.0051       0.0055

I got
  STREAM Benchmark implementation in CUDA
  Array size (single precision)=8000000
  using 128 threads per block, 62500 blocks
Copy:       50006.6051       0.0013       0.0013       0.0013
Scale:      50006.6051       0.0013       0.0013       0.0013
Add:        56409.8044       0.0017       0.0017       0.0017
Triad:      56409.8044       0.0017       0.0017       0.0017

on a "nVidia Corporation G80 [Quadro FX 4600] (rev a2)".
wikipedia quotes 67.2 GB/s theoretical.

it didn't matter whether the machine was in init 3 or 5, though the X 
config was just an idle 1280x1024 server.

> Kudos to Nvidia for having a linux friendly toolchain that I could find, 
> download, install, and compile a code with minimal hassle.

absolutely.  AMD has really dropped the ball on this, even though it looks
like they at least announced availability of DP earlier...


From jan.heichler at gmx.net  Thu Feb 28 12:42:17 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Thu, 28 Feb 2008 21:42:17 +0100
Subject: [Beowulf] What are people seeing performance-wise for NFS over
	10GbE
In-Reply-To: <47C6BCE0.3050909@scalableinformatics.com>
References: <47C6BCE0.3050909@scalableinformatics.com>
Message-ID: <412838240.20080228214217@gmx.net>

Hello Joe,

On Thursday, 28 February 2008, you wrote:

JL>    Have a few simple setups that I am trying to figure out if I have a
JL> problem, or if what I am seeing is normal.

JL>    Two 10 GbE cards, connected with a CX4 cable.  Server is one of our
JL> JackRabbits, with 750+ MB/s direct IO and 500-650 MB/s buffered IO 
JL> (read-write), for IO about 10x system ram (~100x RAID cache).

JL>    Getting ok iperf numbers, about 7 Gb/s single thread.  Running NFS,
JL> and seeing ~200-300 MB/s best case between the systems.  Is this what 
JL> others have seen?  Latest drivers from vendor, sent a note to them to 
JL> see if we can figure this out.  Figured I would tap into the 'wulf 
JL> collective memory to learn what other people see.
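
For scale, the quoted figures can be put into the same units. A small
Python sketch, assuming decimal units throughout (1 Gb = 1e9 bits,
1 MB = 1e6 bytes); the helper name is made up:

```python
def gbit_to_mbyte(gbit_per_s):
    """Convert Gb/s to MB/s using decimal units (1 Gb = 1e9 bits, 1 MB = 1e6 bytes)."""
    return gbit_per_s * 1e9 / 8 / 1e6

DIRECT_IO_MB_S = 750.0          # lower bound quoted for direct IO on the server
wire = gbit_to_mbyte(7.0)       # single-thread iperf figure from the quote

for nfs in (200.0, 300.0):      # best-case NFS range from the quote
    print(f"NFS {nfs:.0f} MB/s = {nfs / wire:.0%} of iperf ({wire:.0f} MB/s), "
          f"{nfs / DIRECT_IO_MB_S:.0%} of direct IO")
```

So 7 Gb/s is 875 MB/s on the wire, and 200-300 MB/s over NFS is roughly
a quarter to a third of that.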

The best I saw for NFS over 10 GE was about 350-400 MB/s write and about 450 MB/s read.
Single server to 8 simultaneously accessing clients (aggregated performance). 

On the block device I got 550 MB/s write and 1.1 GB/s read performance. 


JL>    2.6.23.14 kernel on both sides, jumbo frames enabled. No switch, just
JL> a CX4 cable.

rsize/wsize are set to? 

NFS3 or NFS4? 

Cheers,
Jan

From jan.heichler at gmx.net  Thu Feb 28 13:05:13 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Thu, 28 Feb 2008 22:05:13 +0100
Subject: [Beowulf] What are people seeing performance-wise for NFS over
	10GbE
In-Reply-To: <47C71DF4.40701@scalableinformatics.com>
References: <47C6BCE0.3050909@scalableinformatics.com>
	<412838240.20080228214217@gmx.net>
	<47C71DF4.40701@scalableinformatics.com>
Message-ID: <1019924365.20080228220513@gmx.net>

Hello Joe,

On Thursday, 28 February 2008, you wrote:


>> The best i saw for NFS over 10 GE was about 350-400 MB/s write and about 450 MB/s read.
>> Single server to 8 simultaneous accessing clients (aggregated performance). 

The clients had a 1 gig uplink...  

JL> Hi Jan:

JL>    Ok. Thanks.  This is quite helpful.


>> On the blockdevice i got 550 MB/s write and 1.1 GB/s read performance. 

JL> Using iSCSI?  To real disks or ramdisk/nullio? Most of the benchmarks I
JL> have seen online have been to nullio or ramdisks.  We are going to real
JL> disks.

Real disks. 16 SAS 15k disks on an LSI 8888 controller in RAID-5, connected through a x4 SAS connection on a backplane. Because of the x4 SAS connection the read rate is limited to 1.1 GB/s; with discrete connections to the drives the read speed should be 50% higher. 

I couldn't get a tmpfs exported over NFS, but I did not try very hard because it makes no sense for practical usage; it would only help to find out whether NFS itself is the bottleneck.

I tried several configs including software RAID-0. The performance was a disaster compared to theoretical values. 

>> JL>    2.6.23.14 kernel on both sides, jumbo frames enabled. No switch, just
>> JL> a CX4 cable.

>> rsize/wsize are set to? 

JL> I tried a range: 8k through 64k

Okay. That was the most important improvement I could make. The first kernel I used did not allow going over 8k; with 32k it was much faster. 
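
One way to picture why the transfer size matters: each READ or WRITE RPC carries at most rsize/wsize bytes, so a larger size means fewer round trips for the same data. A minimal Python sketch (the function name is made up, and the real speedup depends on much more than the RPC count):

```python
def rpcs_needed(total_bytes, xfer_size):
    """READ or WRITE RPCs needed to move total_bytes, each carrying at most xfer_size bytes."""
    # Integer ceiling division, so a partial final chunk still counts as one RPC.
    return (total_bytes + xfer_size - 1) // xfer_size

ONE_GIB = 1 << 30

for kib in (8, 32, 64):
    print(f"rsize/wsize {kib:2d}k: {rpcs_needed(ONE_GIB, kib * 1024)} RPCs per GiB")
```

Going from 8k to 32k cuts the number of RPCs per GiB by a factor of four, from 131072 to 32768.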

>> NFS3 or NFS4? 

JL> 3.

I saw no performance improvement with NFS4, I have to say. Everybody I talked to pointed at the bad NFS performance of Linux (and many said: use Solaris, it is much faster ;-) )

Cheers,
Jan

From libo at buaa.edu.cn  Thu Feb 28 16:14:43 2008
From: libo at buaa.edu.cn (Li, Bo)
Date: Fri, 29 Feb 2008 08:14:43 +0800
Subject: [Beowulf] What are people seeing performance-wise for NFS over
	10GbE
References: <47C6BCE0.3050909@scalableinformatics.com>
Message-ID: <002301c87a68$1a7d8400$6300a8c0@JSIIBM>

Just a silly question: has anybody got similar benchmarks for Samba or Windows file sharing? 
Regards,
Li, Bo
----- Original Message ----- 
From: "Joe Landman" <landman at scalableinformatics.com>
To: "Beowulf Mailing List" <beowulf at beowulf.org>
Sent: Thursday, February 28, 2008 9:53 PM
Subject: [Beowulf] What are people seeing performance-wise for NFS over 10GbE


> Hi folks:
> 
>   Have a few simple setups that I am trying to figure out if I have a 
> problem, or if what I am seeing is normal.
> 
>   Two 10 GbE cards, connected with a CX4 cable.  Server is one of our 
> JackRabbits, with 750+ MB/s direct IO and 500-650 MB/s buffered IO 
> (read-write), for IO about 10x system ram (~100x RAID cache).
> 
>   Getting ok iperf numbers, about 7 Gb/s single thread.  Running NFS, 
> and seeing ~200-300 MB/s best case between the systems.  Is this what 
> others have seen?  Latest drivers from vendor, sent a note to them to 
> see if we can figure this out.  Figured I would tap into the 'wulf 
> collective memory to learn what other people see.
> 
>   2.6.23.14 kernel on both sides, jumbo frames enabled. No switch, just 
> a CX4 cable.
> 
>   My expectations are that we would be able to see 80-90% of the 
> JackRabbits speed, with the rest being eaten by stack issues.  This is 
> the case over channel bonded GbE.
> 
>   Do you see 500 MB/s or more?  Less?
> 
>   Thanks.
> 
> Joe
> 
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
>        http://jackrabbit.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 866 888 3112
> cell : +1 734 612 4615


From dag at sonsorol.org  Thu Feb 28 16:49:46 2008
From: dag at sonsorol.org (Chris Dagdigian)
Date: Thu, 28 Feb 2008 19:49:46 -0500
Subject: [Beowulf] Open source Job Scheduler for Apple Leopard 10.5.2
	server that will work with Open Directory?
In-Reply-To: <43F64E86355A744E9D51506B6C6783B9021AE68C@EM2.ad.ucla.edu>
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net><02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu><Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
	<220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>
	<43F64E86355A744E9D51506B6C6783B9021AE68C@EM2.ad.ucla.edu>
Message-ID: <C3C1C79F-2B7F-43D2-AEDD-E572F8837A24@sonsorol.org>


My colleague Bill Van Etten appears to have fixed the SGE and Open  
Directory issues with recent versions of Grid Engine (SGE). Some minor  
patches are required as is compiling the binaries yourself from source  
(until changes get merged into the codebase).

This document describes the patch and build process:

http://gridengine.info/articles/2008/02/25/building-6-1u3-on-mac-osx-10-5-2-leopard-server

So far 2 organizations other than our own have reported success with  
this method. No promises though!

Regards,
Chris


On Feb 27, 2008, at 6:35 PM, Korambath, Prakashan wrote:

> Anyone knows an Open source job scheduler for Apple Leopard 10.5.2  
> server that will work with Open Directory?  SGE seems to have  
> intermittent problems, Condor and Torque supports only 10.4. Tiger.  
> Thanks.
>
> Prakashan Korambath
>


From deadline at clustermonkey.net  Fri Feb 29 05:45:14 2008
From: deadline at clustermonkey.net (Douglas Eadline)
Date: Fri, 29 Feb 2008 08:45:14 -0500 (EST)
Subject: [Beowulf] Harpertown Numbers
Message-ID: <52768.192.168.1.1.1204292714.squirrel@mail.eadline.org>


I finally got around to posting some of my
Harpertown numbers on ClusterMonkey. These numbers were
part of a white paper I wrote for Appro, so I could not
post the whole thing. You can get the full white paper
by going to their website and filling out a
form. (link is in the CM article)

Or stay tuned, I will be running other tests
with the same hardware real soon.

Story is here:

 http://www.clustermonkey.net//content/view/224/34/


--
Doug


From ppk at ats.ucla.edu  Thu Feb 28 19:18:37 2008
From: ppk at ats.ucla.edu (Korambath, Prakashan)
Date: Thu, 28 Feb 2008 19:18:37 -0800
Subject: [Beowulf] Open source Job Scheduler for Apple Leopard 10.5.2
	server that will work with Open Directory?
References: <Pine.LNX.4.64.0802061035340.13167@cain.rgb.private.net><02A63D14-3E34-4C0E-A012-D491922AC023@ee.duke.edu><Pine.LNX.4.64.0802061312001.20835@cain.rgb.private.net>
	<220FE1C2-C27A-4B94-8060-D4D78DFCF50A@staff.uni-marburg.de>
	<43F64E86355A744E9D51506B6C6783B9021AE68C@EM2.ad.ucla.edu>
	<C3C1C79F-2B7F-43D2-AEDD-E572F8837A24@sonsorol.org>
Message-ID: <43F64E86355A744E9D51506B6C6783B9021AE6A2@EM2.ad.ucla.edu>

Thank you very much, Chris, for your efforts.  The problem is that even with the latest patch, applied as you described, someone has to continuously restart the sgeexecd daemon whenever jobs go into the error state.  It worked for a while. Now it doesn't work even though I have restarted sgeexecd several times.

Prakashan




From hahn at MCMASTER.CA  Fri Feb 29 07:09:46 2008
From: hahn at MCMASTER.CA (Mark Hahn)
Date: Fri, 29 Feb 2008 10:09:46 -0500 (EST)
Subject: [Beowulf] Harpertown Numbers
In-Reply-To: <52768.192.168.1.1.1204292714.squirrel@mail.eadline.org>
References: <52768.192.168.1.1.1204292714.squirrel@mail.eadline.org>
Message-ID: <Pine.LNX.4.64.0802291005510.13897@coffee.psychology.mcmaster.ca>

> Or stay tuned, I will be running other tests
> with the same hardware real soon.
> http://www.clustermonkey.net//content/view/224/34/

hopefully using the same compiler on both machines eh?
is there any reason to attribute the differences to hardware vs compiler?
(I'd expect _some_ noticeable differences from the compiler alone,
since those versions are _years_ apart.  for that matter, 8 vs 12M L2
would also explain some of the differences...)


From kalpana0611 at gmail.com  Fri Feb 29 06:07:14 2008
From: kalpana0611 at gmail.com (Cally)
Date: Fri, 29 Feb 2008 22:07:14 +0800
Subject: [Beowulf] Cluster Monitoring Tool
Message-ID: <b05971d10802290607r76b1e067y4538bafb5be4711f@mail.gmail.com>

Hi everyone,


I have built a 2-node cluster to test my visualization code. I can now run
the visualization using MPI, and it appears on both machines. I use
ParaView. Now I need to see the rendering performance recorded by both
machines (this 2-machine cluster is just a prototype; we have a 32-node
cluster at our lab). The thing is, I am quite new to clustering, and I was
not enrolled at the uni when the cluster was built. I do know that we can
see some things using Ganglia, but I think there is more to measuring
performance than that. I need to be able to analyze and compare the results,
come up with a report, and finally propose a load-balancing technique based
on that. Is there some kind of tool I can use if, say, I want to see how
much memory is being used just for a rendering process? Or maybe there is
some code available that I can run to check? I do know that there are some
tools. I am searching on the net, but I still hope someone can point out
some things to me. Thanks

From kekechen at cc.gatech.edu  Fri Feb 29 22:32:22 2008
From: kekechen at cc.gatech.edu (Keke Chen)
Date: Fri, 29 Feb 2008 22:32:22 -0800
Subject: [Beowulf] 
 ODBASE08 2nd CFP: International Conference on Ontologies, DataBases, 
 and Applications of Semantics
Message-ID: <47C8F876.9010902@cc.gatech.edu>


We apologize if you receive multiple copies

======== 2nd Call For Papers ===================


The 7th International Conference on

Ontologies, DataBases, and Applications of Semantics

(ODBASE 2008)

Monterrey, Mexico, Nov 11 - 13, 2008

http://www.cs.rmit.edu.au/fedconf/

Scale of use, ease of use, breadth of use and choice of use have 
earmarked the most important transitions of semantic technologies in the 
years since the first ODBASE conference in 2002. Recent methods allow 
for scaling of semantic technologies to handling dozens of millions of 
triples; they allow for composing intriguing semantic applications 
within a few days; they address target applications from the sciences up 
to eCommerce; and they allow choosing among plenty of existing 
ontologies and half a dozen RDF stores, inferencing engines, or 
ontology mapping systems.

While these developments greatly contribute to the success of semantic 
technologies, for enterprise-wide and Web-scale applications, the 
envelope needs to be pushed much higher, faster, wider, and broader. The 
2008 conference on Ontologies, DataBases, and Applications of Semantics 
(ODBASE'08) solicits original research papers that push the current 
boundaries.

As in recent years, the focus of the conference lies in addressing 
research issues that bridge traditional boundaries between disciplines 
such as databases, artificial intelligence, semantic web, or data 
extraction. Also, ODBASE'08 encourages the submission of papers that 
examine the information needs of various applications, including 
electronic commerce, electronic government, bioinformatics, or emergency 
response.

ODBASE'08 will consider two categories of papers: research and 
experience. Research papers must contain novel, unpublished results. 
Experience papers must describe existing, realistically large systems. 
In the latter case, preference will be given to papers that describe 
software products or systems that are in wide (experimental) use.

ODBASE'08 intends to draw a highly diverse body of researchers and 
practitioners by being part of the Federated conferences Event "On the 
Move to Meaningful Internet Systems 2008" that co-locates five 
conferences: ODBASE'08, DOA'08 (International Symposium on Distributed 
Objects and Applications), CoopIS'08 (International Conference on 
Cooperative Information Systems), GADA'08 (International Conference on 
Grid computing, high-performAnce and Distributed Applications), and 
IS'08 (International Symposium on Information Security).

TOPICS OF INTEREST

Specific areas of interest to ODBASE'08 include but are not limited to:

     * Semantic data models and semantic querying
     * Semantic dataspaces
     * Ontology engineering
     * Semantic integration, including ontology matching, merging, etc.
     * Management of large ontology-driven data and knowledge bases
     * Semantic information retrieval
     * Emergent semantics
     * Social semantic systems
     * Semantic multimedia management
     * Metadata management
     * XML and Semantics
     * Hypertext, multimedia, and hypermedia semantics
     * Semantic middleware
     * Semantic SOA
     * Ontological support for location-aware services and mobile
       information systems
     * Searching and managing dynamic knowledge

Applications, Evaluations, and Experiences in the following domains:

     * Web 2.0
     * Personal Information Management
     * Media Archives and Digital Libraries
     * Enterprise-wide Information Systems
     * Web-based Information Systems
     * Web Services
     * eCommerce
     * eScience
     * eOrganizations (virtual organizations, virtual marketplaces, etc.)
     * Bioinformatics
     * Emergency Response
     * Ubiquitous and Mobile Information Systems


IMPORTANT DATES

     * Abstract Submission Deadline:  June 8, 2008
     * Paper Submission Deadline:     June 15, 2008
     * Acceptance Notification:       August 10, 2008
     * Camera Ready Due:              August 25, 2008
     * Registration Due:              August 25, 2008
     * OTM Conferences:               November 9 - 14, 2008

SUBMISSION GUIDELINES

Papers submitted to ODBASE'08 must not have been accepted for 
publication elsewhere or be under review for another workshop or conference.

All submitted papers will be carefully evaluated based on originality, 
significance, technical soundness, and clarity of expression. All papers 
will be refereed by at least three members of the program committee, and 
at least two will be experts from industry in the case of practice 
reports. All submissions must be in English. Submissions must not exceed 
18 pages in the final camera-ready paper style. Submissions must be laid 
out according to the final camera-ready formatting instructions and must 
be submitted in PDF format.

The paper submission site will be announced shortly.

Failure to comply with the formatting instructions for submitted papers 
will lead to the outright rejection of the paper without review.

Failure to commit to presentation at the conference automatically 
excludes a paper from the proceedings.


ORGANISATION COMMITTEE

General Co-Chairs

     * Robert Meersman, VU Brussels, Belgium
     * Zahir Tari, RMIT University, Australia

Program Committee Co-Chairs

     * Malu Castellanos, HP, USA
     * Fausto Giunchiglia, University of Trento, Italy
     * Feng Ling, Tsinghua University, China

Program Committee Members

     * Harith Alani, University of Southampton, UK
     * Franz Baader, University of Dresden, Germany
     * Renato Barrera, UNAM , Mexico
     * Sonia Bergamaschi, University of Modena and Reggio Emilia, Italy
     * Mohand Boughanem, Université Paul Sabatier of Toulouse, France
     * Francisco Cantu-Ortiz, ITESM-Monterrey , Mexico
     * Edgar Chavez, Universidad de Michoacan, Mexico
     * Oscar Corcho, Universidad Politécnica de Madrid, Spain
     * Umesh Dayal, HP, USA
     * Benjamin Habegger, Nirva Systems Ltd, France
     * Bin He, IBM Almaden Research Center, USA
     * Andreas Hotho, University of Kassel, Germany
     * Farookh Hussain, Curtin University of Technology, Australia
     * Vipul Kashyap, Clinical Informatics R&D, Partners HealthCare 
System, USA
     * Phokion Kolaitis, IBM, USA
     * Manolis Koubarakis, National and Kapodistrian University of 
Athens, Greece
     * Maurizio Lenzerini, Universita di Roma "La Sapienza", Italy
     * Juanzi Li, Tsinghua University, China
     * Alexander Löser, SAP Research, Dresden
     * Riichiro Mizoguchi, Osaka University, Japan
     * Peter Mork, The MITRE Corporation , USA
     * Wolfgang Nejdl, University of Hannover, Germany
     * Erich Neuhold, Universität Darmstadt, Germany
     * Wenny Rahayu, La Trobe University, Australia
     * Rajugan Rajagopalapillai, Curtin University of Technology, Australia
     * Arnon Rosenthal, The MITRE Corporation, USA
     * Pavel Shvaiko, University of Trento, Italy
     * Stefano Spaccapietra, EPFL, Switzerland
     * Umberto Straccia, ISTI-CNR, Italy
     * Eleni Stroulia, University of Alberta, Canada
     * Heiner Stuckenschmidt, University of Mannheim, Germany
     * York Sure, SAP, Germany
     * Michael Uschold, The Boeing Company, USA
     * Yannis Velegrakis, University of Trento, Italy
     * Guido Vetere, IBM, Italy
     * Kevin Wilkinson, HP Labs, UK
     * Jose Luis Zechinelli, CENTIA, Mexico
     * Yanchun Zhang, Victoria University, Australia
     * Baoshi Yan, Bosch Research, USA
     * Jingshan Huang, University of South Carolina, USA
     * Laura Zavala, University of South Carolina, USA
     * Octavian Udrea, University of Toronto, Canada
     * Li Ma, IBM, USA
     * Maurizio Marchese, U. Trento, Italy
     * Vijayan Sugumaran, Oakland University, USA
     * Veda C. Storey, Georgia State University, USA
     * Leopoldo Bertossi, Carleton University, Canada
     * Lois M. L. Delcambre, Portland State University, USA
     * Sudha Ram, University of Arizona, USA
     * Il-Yeol Song, Drexel University, USA
     * Satya Sahoo, Wright State University, USA
     * Matthew Perry, Wright State University, USA
     * María Auxilio Mendina, Polytechnic University of Puebla, Mexico
     * Jon Atle Gulla, Norwegian University of Science and Technology, 
Norway


From kekechen at cc.gatech.edu  Fri Feb 29 22:35:29 2008
From: kekechen at cc.gatech.edu (Keke Chen)
Date: Fri, 29 Feb 2008 22:35:29 -0800
Subject: [Beowulf] CoopIS08 CFP: international conference on COOPERATIVE
 INFORMATION SYSTEMS 
Message-ID: <47C8F931.3010809@cc.gatech.edu>

We apologize if you receive multiple copies

======== 2nd Call For Papers ===================

16th International Conference on

COOPERATIVE INFORMATION SYSTEMS

(CoopIS 2008)

Monterrey, Mexico, Nov 12 - 14, 2008

http://www.cs.rmit.edu.au/fedconf

Acceptance rate of CoopIS in recent years was approx. 20%

Cooperative Information Systems are the cornerstone for moving the 
technical network infrastructure to a meaningful integrated information 
infrastructure.

The CIS paradigm has traditionally encompassed distributed systems 
technologies such as middleware, business process management (BPM) and 
Web technologies. In recent years service oriented architectures have 
fundamentally altered the technological landscape of CIS systems. 
Service Oriented Computing (SOC) introduces the service abstraction (a 
remotely accessible software component) as the building block of both 
inter- and intra-organizational distributed applications and their 
supporting middleware.

Cooperative Information Systems applications are heavily distributed and 
highly coordinated, often exhibiting inter-organizational interaction 
patterns and requiring distributed access and sharing of computing and 
information resources. Typically they fall under the categories of 
e-Business, e-Commerce, e-Government, e-Health and e-Science, among others.

The CoopIS conference series has established itself as a major 
international forum for exchanging ideas and results on scientific 
research for practitioners in fields such as computer supported 
cooperative work (CSCW), middleware, Internet data management, 
electronic commerce, human-computer interaction, workflow management, 
agent technologies, and software architectures, to name a few. In 
addition, the 2008 edition of CoopIS aims to highlight the impact of 
service oriented computing and the importance of sustainability of CIS 
as a necessary prerequisite for mission critical applications.

As in previous years, CoopIS'08 will be part of a joint event with other 
conferences, in the context of the OTM ("On The Move") federated 
conferences, covering different aspects of distributed information systems.

Topics that are addressed by CoopIS'08 are logically grouped in three 
broad areas, and include but are not limited to:

     * Business Process Management and Compliance
           o Business Process Integration and Management
           o Cooperation Aspects in Business Process Management
           o Distributed Workflow Management and Systems
           o Service orchestration and service compositions
           o Process choreographies
           o Business process compliance
           o Integrated supply chains
           o Concurrent engineering and distributed groupware
           o Business level policies
           o Governance, risk and compliance models and runtimes
           o Sustainability of processes
     * Advanced middleware, architectures and runtimes
           o Service oriented middleware
           o Web services standards and runtimes
           o Grid computing infrastructure
           o Enterprise Grids architectures and services
           o Web centric information and processing architectures
           o Semantic interoperability
           o Self-adapting and self-healing systems
           o Model driven middleware architectures
           o Multi-agent systems and architectures for CIS
           o Peer-to-peer technologies
           o Security and privacy in CIS
           o Quality of service in cooperative information systems
           o Mediation, matchmaking, and brokering architectures
           o Collaboration and negotiation protocols
           o Markets, auctions, exchanges, and coalitions
     * CIS Applications
           o Novel CIS applications for large organizations: 
e-business, e-commerce, e-government
           o Advances in e-science and Grid computing applications
           o Medical and biological information systems
           o Industrial applications of CIS
           o Web 2.0


IMPORTANT DATES

Abstract Submission Deadline    June 8, 2008
Paper Submission Deadline       June 15, 2008
Acceptance Notification         August 10, 2008
Camera Ready Due                August 25, 2008
Registration Due                August 25, 2008
OTM Conferences                 November 9 - 14, 2008

SUBMISSION GUIDELINES

Papers submitted to CoopIS'08 must not have been accepted for 
publication elsewhere or be under review for another workshop or 
conference. All submitted papers will be carefully evaluated based on 
originality, significance, technical soundness, and clarity of 
expression. All papers will be refereed by at least three members of the 
program committee, and at least two will be experts from industry in the 
case of practice reports. All submissions must be in English. 
Submissions must not exceed 18 pages in the final camera-ready paper 
style. Submissions must be laid out according to the final camera-ready 
formatting instructions and must be submitted in PDF format.

The paper submission site will be announced later.

Failure to comply with the formatting instructions for submitted papers 
will lead to the outright rejection of the paper without review.

Failure to commit to presentation at the conference automatically 
excludes a paper from the proceedings.

ORGANISATION COMMITTEE

General Co-Chairs

     * Robert Meersman, VU Brussels, Belgium
     * Zahir Tari, RMIT University, Australia

Program Committee Co-Chairs

     * Johann Eder, University of Klagenfurt, Austria
     * Masaru Kitsuregawa, University of Tokyo, Japan
     * Ling Liu, Georgia Institute of Technology, USA


Program Committee Members (to be extended and confirmed)

     * Ghaleb Abdulla, Lawrence Livermore National Laboratory, USA
     * Marco Aiello, University of Groningen, The Netherlands
     * Joonsoo Bae, Chonbuk National University, South Korea
     * Alistair Barros, SAP, Research Centre Brisbane, Australia
     * Zohra Bellahsene, LIRMM-CNRS/Université Montpellier 2, France
     * Salima Benbernou, University Lyon 1, France
     * Djamal Benslimane, University of Lyon, France
     * M. Brian Blake, Georgetown University, Washington DC, USA
     * Klemens Böhm, University of Karlsruhe, Germany
     * Christoph Bussler, Cisco Systems, Inc, USA
     * Ying Cai, Iowa State University, USA
     * James Caverlee, Texas A&M University, USA
     * Keke Chen, Yahoo!, USA
     * Vincenzo D'Andrea, University of Trento, Italy
     * Umesh Dayal, HP Labs
     * Xiaoyoung Du, Renmin University of China, PR China
     * Marlon Dumas, University of Tartu, Estonia
     * Schahram Dustdar, Vienna University of Technology, Austria
     * Rik Eshuis, Eindhoven University, The Netherlands
     * Opher Etzion, IBM Israel Software Lab
     * Renato Fileto, Federal University of Santa Catarina, Brazil
     * Klaus Fischer, DFKI, Germany
     * Avigdor Gal, Technion Israel Institute of Technology, Israel
     * Bugra Gedik, IBM TJ Watson, USA
     * Dimitrios Georgakopoulos, Telcordia, USA
     * Paul Grefen, Eindhoven University of Technology, The Netherlands
     * Amarnath Gupta, University of California San Diego, USA
     * Mohand-Said Hacid, Lyon University, France
     * Thorsten Hampel, University of Paderborn, Germany
     * Geert-Jan Houben, TU Eindhoven & VUB Brussels
     * Richard Hull, Lucent Technologies, USA
     * Patrick Hung, University of Ontario Institute of Technology 
(UOIT), Canada
     * Paul Johannesson, Royal Institute of Technology (KTH), Sweden
     * Dimka Karastoyanova, University of Stuttgart, Germany
     * Rania Khalaf, IBM Research
     * Hiroyuki Kitagawa, University of Tsukuba
     * Shim Kyusock, Seoul National Univ.
     * Akhil Kumar, Penn State University, USA
     * Wang-Chien Lee, Pennsylvania State University, USA
     * Frank Leymann, University of Stuttgart, Germany
     * Chen Li, University of California, Irvine, USA
     * Sanjay K. Madria, Missouri University of Science and Technology, USA
     * Leo Mark, Georgia Institute of Technology
     * Maristella Matera, DEI - Politecnico di Milano, Italy
     * Massimo Mecella, Universita' di Roma, Italy
     * Nirmal Mukhi, IBM T J Watson Research Center
     * Mohamed Mokbel, University of Minnesota, USA
     * Jörg Müller, Technische Universität Clausthal
     * Miyuki Nakano, University of Tokyo, Japan
     * Moira Norrie, ETH Zurich, Switzerland
     * Werner Nutt, Free University of Bozen-Bolzano, Italy
     * Andreas Oberweis, University of Karlsruhe, Germany
     * Cesare Pautasso, University of Lugano, Switzerland
     * Barbara Pernici, Politecnico di Milano, Italy
     * Frank Puhlmann, Hasso Plattner Institut, Germany
     * Manfred Reichert, Ulm University, Germany
     * Stefanie Rinderle-Ma, Ulm University, Germany
     * Lakshmish Ramaswamy, University of Georgia, USA
     * Duncan Ruiz, Catholic University of RS, Brazil
     * Kai-Uwe Sattler, TU Ilmenau, Germany
     * Ralf Schenkel, Max-Planck-Institut Informatik, Germany
     * Jialie Shen, Singapore Management University, Singapore
     * Aameek Singh, IBM Almaden Research Center
     * Mudhakar Srivatsa, IBM TJ Watson Research Center, USA
     * Jianwen Su, University of California, Santa Barbara, USA
     * Wei Tang, Teradata Corp., USA
     * Anthony Tung, National University of Singapore, Singapore
     * Susan Urban, Texas Tech University, USA
     * Willem-Jan Van den Heuvel, Tilburg University, The Netherlands
     * Maria Esther Vidal, Universidad Simon Bolivar, Caracas, Venezuela
     * Shan Wang, Renmin University of China, PR China
     * X. Sean Wang, University of Vermont, USA
     * Jeffrey Yu, Chinese University of Hong Kong, HK
     * Matthias Weske, University of Potsdam, Germany
     * Li Xiong, Emory University, USA
     * Jian Yang, Macquarie University, Australia
     * Masatoshi Yoshikawa, Kyoto University, Japan
     * Leon Zhao, University of Arizona, USA
     * Xiaofang Zhou, University of Queensland, Australia
     * Aoying Zhou, East China Normal University, PR China
     * Michael zur Muehlen, Stevens Institute of Technology, USA


From kekechen at cc.gatech.edu  Fri Feb 29 22:36:48 2008
From: kekechen at cc.gatech.edu (Keke Chen)
Date: Fri, 29 Feb 2008 22:36:48 -0800
Subject: [Beowulf] GADA08 CFP: International Conference on Grid computing,
 high-performAnce and Distributed Applications
Message-ID: <47C8F980.50409@cc.gatech.edu>

We apologize if you receive multiple copies

======== 2nd Call For Papers ===================

    	
International Conference on

Grid computing, high-performAnce
and Distributed Applications (GADA'08)

Monterrey, Mexico, Nov 13 - 14, 2008

http://www.cs.rmit.edu.au/fedconf


In the last decade, grid computing has developed into one of the most 
important topics in the computing field. The research area of grid 
computing has been making particularly rapid progress in the last few 
years, due to the increasing number of scientific applications that are 
demanding intensive use of computational resources and a dynamic and 
heterogeneous infrastructure.

Within this framework, the GADA workshop arose in 2004 as a forum for 
researchers in grid computing whose aim was to extend their background 
in this area, and more specifically, for those who used grid 
environments in managing and analyzing data. Both GADA'04 and GADA'05 
were successful events, owing to the large number of high-quality papers 
received and to the exchange of experiences and ideas in the associated 
forums. Because of this demonstrated success, GADA was upgraded to a 
conference within the On The Move Federated Conferences and Workshops 
(OTM'06). GADA'06 covered a broader set of disciplines, although grid 
computing kept a key role among the main topics of the conference.

The objective of grid computing is the integration of heterogeneous 
computing systems and data resources with the aim of providing a global 
computing space. The achievement of this goal is creating revolutionary 
changes in the field of computation, because it enables resource sharing 
across networks, with data being one of the most important resources. 
Thus, data access, management and analysis within grid and distributed 
environments are also treated as a main part of the conference.

Therefore, the main goal of GADA'08 is to provide a framework in which a 
community of researchers, developers and users can exchange ideas and 
work related to grid, high-performance and distributed applications and 
systems. The second goal of GADA'08 is to create interaction between 
grid computing researchers and the other OTM attendees.

GADA'08 intends to draw a highly diverse body of researchers and 
practitioners by being part of the "On the Move to Meaningful Internet 
Systems and Ubiquitous Computing 2008" federated conferences event that 
includes five co-located conferences:

     * GADA'08 (International Conference on Grid computing, 
high-performAnce and Distributed Applications)
     * CoopIS'08 (International Conference on Cooperative Information 
Systems)
     * DOA'08 (International Symposium on Distributed Objects and 
Applications)
     * ODBASE'08 (International Conference on Ontologies, DataBases, and 
applications of Semantics)
     * IS'08 (Information Security Symposium)

TOPICS OF INTEREST

Topics of interest include, but are not limited to:

     * Computational grids
     * Data grids
     * High-performance computing
     * Distributed applications
     * Cluster computing
     * Parallel applications
     * Grid infrastructures for data analysis
     * High-performance computing for data-intensive applications
     * Grid computing infrastructures, middleware and tools
     * Mobile Grid Computing
     * Grid computing services
     * Collaboration technologies
     * Data analysis and management on grids
     * Distributed and parallel I/O systems
     * Extracting knowledge from data grids
     * Agent architectures for grid and distributed environments
     * Agent-based data extraction in distributed systems
     * Semantic Grid
     * Security in distributed environments
     * Security in computational and data grids
     * Grid standards as related to applications


IMPORTANT DATES

Abstract Submission Deadline    June 8, 2008
Paper Submission Deadline       June 15, 2008
Acceptance Notification         August 10, 2008
Camera Ready Due                August 25, 2008
Registration Due                August 25, 2008
OTM Conferences                 November 9 - 14, 2008

SUBMISSION GUIDELINES

Papers submitted to GADA'08 must not have been accepted for publication 
elsewhere or be under review for another workshop or conference.

All submitted papers will be carefully evaluated based on originality, 
significance, technical soundness, and clarity of expression. All 
submissions must be in English. Submissions should be in PDF format and 
must not exceed 18 pages in the final camera-ready format.

The paper submission site will be announced shortly.

Failure to commit to presentation at the conference automatically 
excludes a paper from the proceedings.

GADA PC co-chairs

     * Dennis Gannon
       Computer Science Department
       Indiana University
       Lindley Hall, Room 215
       150 S. Woodlawn Ave.
       Bloomington, IN 47405-7104
       Phone: (812) 855-5184
       Fax: (812) 855-4829
       Email: gannon at cs.indiana.edu

     * Pilar Herrero
       Facultad de Informática
       Universidad Politécnica de Madrid
       Madrid (Spain)
       Phone: (+34) 91.336.74.56
       Fax: (+34) 91.336.65.95
       Email: pherrero at fi.upm.es

     * Daniel S. Katz
       Louisiana State University
       Louisiana (USA)
       Phone: (+1) 225.578.2750
       Fax: (+1) 225.578.5362
       Email: d.katz at ieee.org

     * María S. Pérez
       Facultad de Informática
       Universidad Politécnica de Madrid
       Madrid (Spain)
       Phone: (+34) 91.336.73.80
       Fax: (+34) 91.336.73.73
       Email: mperez at fi.upm.es

Program Committee (to be confirmed and extended)

     * Adam Wierzbicki, Polish-Japanese Institute of Information 
Technology, Poland
     * Akshai Aggarwal, University of Windsor, Canada
     * Alan Sussman, University of Maryland, College Park, USA
     * Alberto Sanchez, UPM, Spain
     * Anastasios Gounaris, Aristotle University of Thessaloniki, Greece
     * Artur Andrzejak, Zuse Institute Berlin (ZIB), Germany
     * Beniamino Di Martino, Department of Information Engineering, 
Seconda Università di Napoli, Italy
     * Bhanu Prasad, Florida A&M University, USA
     * Blanca Caminero Herraez, Universidad de Castilla-La Mancha, Spain
     * Carmela Comito, University of Calabria, Italy
     * Cho-Li Wang, Hong Kong University, China
     * Costin Badica, University of Craiova, Romania
     * Dana Petcu, Western University of Timisoara, Romania
     * Edgar Magana, CISCO Systems, USA
     * Eduardo Huedo, Universidad Complutense de Madrid, Spain
     * Elghazali Talbi, University of Lille, France
     * Ewa Deelman, USC Information Sciences Institute, USA
     * Felix García, Universidad Carlos III, Spain
     * Félix J. García Clemente, Universidad de Murcia, Spain
     * Francisco José da Silva e Silva, Universidade Federal do 
Maranhão, Brazil
     * Francisco Luna, University of Malaga, Spain
     * Geoff Coulson, Lancaster University, UK
     * Gregorio Martinez, Universidad de Murcia, Spain
     * Hamid Sarbazi-Azad, Sharif University of Technology, Iran
     * Heinz Stockinger, Swiss Institute of Bioinformatics, Lausanne, 
Switzerland
     * Hong Ong, Oak Ridge National Laboratory, USA
     * Ignacio M. Llorente, UCM-CAB, Madrid, Spain
     * Jemal Abawajy, Deakin University, Victoria, Australia
     * Jesús Carretero, Universidad Carlos III, Spain
     * Jinjun Chen, Swinburne University of Technology, Australia
     * Jordi Torres, Barcelona SuperComputing Center (BSC-CNS), Spain
     * Jose Cunha, Universidade Nova de Lisboa, Portugal
     * Jose L. Bosque, Universidad de Cantabria, Spain
     * José Luis Vázquez Poletti, Universidad Complutense de Madrid, Spain
     * Jose M. Peña, UPM, Spain
     * Juan A. Botía Blaya, Universidad de Murcia, Spain
     * Kamil Kuliberda, Polish-Japanese Institute of Information 
Technology, Poland
     * Kostas Karasavvas, National e-Science Centre, UK
     * Laurent Lefevre, INRIA, France
     * Manish Parashar, Rutgers University, NJ
     * Manuel Salvadores, University of Southampton, UK
     * Marcin Paprzycki, Systems Research Institute Polish Academy of 
Science, Poland
     * Maria Ganzha, Elblag University of Humanities and Economy, Poland
     * Mario Cannataro, Univ. of Catanzaro, Italy
     * Marios Dikaiakos, University of Cyprus, Cyprus
     * Mark Baker, University of Reading, UK
     * Markus Endler, PUC-Rio, Brazil
     * Mirela Notare, Barddal University, Brazil
     * Neil P Chue Hong, The University of Edinburgh, UK
     * Oscar Ardaiz, Universidad de Navarra, Spain
     * Pascal Bouvry, Université du Luxembourg, Luxembourg
     * Rajkumar Buyya, University of Melbourne, Melbourne, Australia
     * Reagan Moore, San Diego Supercomputer Center (SDSC), USA
     * Rizos Sakellariou, Univ. of Manchester, UK
     * Rosa M. Badia, UPC, Barcelona, Spain
     * Ruben S. Montero, UCM-CAB, Madrid, Spain
     * Santi Caballé Llobet, Open University of Catalonia, Spain
     * Sattar B. Sadkhan Almaliky, Iraq - Alnahrain University, Iraq
     * Toni Cortes, UPC, Barcelona, Spain
     * Víctor Robles, UPM, Spain
     * Geoffrey Fox, Indiana University, USA
     * Shantenu Jha, Louisiana State University, USA
     * Alfredo Cuzzocrea, University of Calabria, Italy
     * Liviu Joita, University of Oxford, UK