[Beowulf] Which distro for the cluster?
Geoff Jacobs
gdjacobs at gmail.com
Fri Dec 29 12:48:33 PST 2006
Robert G. Brown wrote:
<snip>
> The problem is REALLY evident for laptops -- there are really major
> changes in the way the kernel, rootspace, and userspace manage devices,
> changes that are absolutely necessary for us to be able to plug cameras,
> memory sticks, MP3 players, printers, bluetooth devices, and all of that
> right into the laptop and have it "just work". NetworkManager simply
> doesn't work for most laptops and wireless devices before FC5, and it
> doesn't really work "right" until you get to FC6 AND update to at least
> 0.6.4. On RHEL/Centos 4 (FC4 frozen, basically), well...
Laptops. *shudder* How often do laptop manufacturers change their
hardware configurations? Every other week?
I've found the optimal way to purchase laptop hardware is with a good
live CD for testing. No boot, no buy.
> One of the major disadvantages linux has had relative to WinXX over the
> years has been hardware support that lags, often by years, behind the
> WinXX standard. Because of the way linux is developed, the ONLY way one
> can fix this is to ride a horse that is rapidly co-developed as new
> hardware is released, and pray for ABI and API level standards in the
> hardware industry in front of your favorite brazen idol every night
> (something that is unlikely to work but might make you feel better:-).
>
> The fundamental "advantage" of FC6 is that its release timing actually
> matches up pretty well against the frenetic pace of new hardware
> development -- six to twelve month granularity means that you can
> "usually" buy an off-the-shelf laptop or computer and have a pretty good
> chance of it either being fully supported right away (if it is older
> than six months) or being fully supported within weeks to months --
> maybe before you smash it with a sledgehammer out of sheer frustration.
> From what I've seen, ubuntu/debian has a somewhat similar aspect, user
> driven to get that new hardware running even more aggressively than with
> FC (and with a lot of synergy, of course, even though the two
> communities in some respects resemble Sunnis vs the Shiites in Iraq:-).
RGB must now go into hiding due to the fatwa against him.
> SINCE they are user driven, they also tend to have lots of nifty
> userspace apps, and since we have entered the age of the massive, fully
> compatible, contributed package repo I expect FC7 to provide something
> on the order of 10K packages, maybe 70% of them square in userspace (and
> the rest libraries etc).
>
> This might even be the "nextgen" revolution -- Windows cannot yet
> provide fully transparent application installation (for money or not)
> over the network -- they have security issues, payment issues,
> installshield/automation issues, permission issues, and
> compatibility/library issues all to resolve before they get anywhere
> close to what yum and friends (or debian's older and also highly
> functional equivalents) can do already for linux. What the software
> companies that are stuck in the "RHEL grove" don't realize is that RPMs,
> yum and the idea of a repo enable them to set up a completely different
> software distribution paradigm, one that can in fact be built for and
> run on all the major RPM distros with minimal investment or risk on
> their part. They don't "get it" yet. When they do, there could be an
> explosion in commercial grade, web-purchased linux software and
> something of a revolution in software distribution and maintenance (as
> this would obviously drive WinXX to clone/copy). Or not.
>
> Future cloudy, try again later.
>
<snip>
>
> Ooo, then you really don't like pretty much ANY of the traditional "true
> beowulf" designs. They are all pretty much cream eggs. Hell, lots of
> them use rsh without passwords, or open sockets with nothing like a
> serious handshaking layer to do things like distribute binary
> applications and data between nodes.
How things have improved...
> Grid designs, of course, are
> another matter -- they tend to use e.g. ssh and so on but they have to
> because nodes are ultimately exposed to users, probably not in a chroot
> jail. Even so, has anyone really done a proper security audit of e.g.
> pvm or mpi? How difficult is it to take over a PVM virtual machine and
> insert your own binary? I suspect that it isn't that difficult, but I
> don't really know. Any comments, any experts out there?
Would compromising PVM frag a user or the whole system?
> In the specific case of my house, anybody who gets to where they can
> actually bounce a packet off of my server is either inside its walls and
> hence has cracked e.g. WPA or my DSL firewall, or has cracked one of my
> personal accounts elsewhere that hits the single (ssh) passthrough port. In all
> of these cases the battle is lost already, as I am God on my LAN of
> course, so a trivial password trap on my personal account would give
> them root everywhere in zero time. In fact, being a truly lazy
> individual who doesn't mind exposing his soft belly to the world, if
> they get root anywhere they've GOT it everywhere -- I have root set up
> to permit free ssh between all client/nodes so that I have to type a
> root password only once and can then run commands as root on any node
> from an xterm as one-liners.
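For anyone who wants the same trick without retyping ssh lines, here's a rough
sketch of the fan-out. It's just a loop over ssh, assuming root's keys are
already pushed to every node; the node names are placeholders for whatever your
LAN actually uses.

#!/usr/bin/env python
# Run one command as root on every node over ssh.
# Assumes passwordless root ssh (keys already distributed); the node
# names below are hypothetical -- substitute your own.
import subprocess

NODES = ["n01", "n02", "n03", "n04"]

def run_everywhere(command):
    for node in NODES:
        # BatchMode makes ssh fail fast instead of prompting for a password
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "root@" + node, command],
            capture_output=True, text=True)
        out = proc.stdout.strip() or proc.stderr.strip()
        print("%s: %s" % (node, out))

if __name__ == "__main__":
    run_everywhere("uptime")

Nothing clever -- serial, no error recovery -- but it captures the
one-password-then-one-liners workflow.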
>
> This security model is backed up by a threat of physical violence
> against my sons and their friends, who have carefully avoided learning
> linux at anything like the required level for cracking because they know
> I'd like them to, and the certain knowledge that my wife is doing very
> well if she can manage to crank up a web browser and read her mail
> without forgetting something and making me get up out of bed to help her
> at 5:30 am. So while I do appreciate your point on a
> production/professional network level, it really is irrelevant here.
<snip>
> There are three reasons I haven't upgraded it. One is sheer bandwidth.
> It takes three days or so to push FCX through my DSL link, and while I'm
> doing it all of my sons and wife and I myself scream because there
> ain't no bandwidth leftover for things like WoW and reading mail and
> working. This can be solved with a backpack disk and my laptop -- I can
> take my laptop into Duke and rsync mirror a primary mirror, current
> snapshot, with at worst a 100 Mbps network bottleneck (I actually think
> that the disk bottleneck might be slower, but it is still way faster
> than 384 kbps or thereabouts:-).
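That lines up with the back-of-the-envelope numbers. Assuming a tree of roughly
10 GB (my guess -- adjust to taste):

# Rough transfer times for a ~10 GB distro tree (the size is a guess).
size_bits = 10 * 8 * 10**9        # ~10 GB expressed in bits

for label, bps in (("384 kbps DSL", 384e3), ("100 Mbps LAN", 100e6)):
    seconds = size_bits / bps
    print("%-13s %8.1f hours  (%.1f days)" % (label, seconds / 3600, seconds / 86400))

About 2.4 days on the DSL line versus a quarter of an hour on the LAN, before
disk and protocol overhead.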
>
> The second is the bootstrapping problem. The system in question is my
> internal PXE/install server, a printer server, and an md raid
> fileserver. I really don't feel comfortable trying an RH9 -> FC6
> "upgrade" in a single jump, and a clean reinstall requires that I
> preserve all the critical server information and restore it post
> upgrade. At the same time it would be truly lovely to rebuild the MD
> partitions from scratch, as I believe that MD has moved along a bit in
> the meantime.
>
> This is the third problem -- I need to construct a full backup of the
> /home partition, at least, which is around 100 GB and almost full.
> Hmmm, it might be nice to upgrade the RAID disks from 80 GB to 160's or
> 250's and get some breathing room at the same time, which requires a
> small capital investment -- say $300 or thereabouts. Fortunately I do
> have a SECOND backpack disk with 160 GB of capacity that I use as a
> backup, so I can do an rsync mirror of /home to it while I do the
> reinstall shuffle, with a bit of effort.
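The rsync shuffle is easy enough to script. A minimal sketch, assuming the
backpack disk mounts at /mnt/backpack (the mount point is hypothetical):

#!/usr/bin/env python
# Mirror /home onto the external backup disk before the reinstall.
# The destination mount point is hypothetical -- adjust to your setup.
import subprocess, sys

SRC = "/home/"                # trailing slash: copy contents, not the dir
DST = "/mnt/backpack/home/"   # hypothetical mount point for the 160 GB disk

# -a archive mode, -H keep hard links, -x stay on one filesystem,
# --delete make the mirror exact by pruning files removed at the source
sys.exit(subprocess.call(["rsync", "-aHx", "--delete", SRC, DST]))

Run it once before pulling the old disks, and once more right before the wipe
to pick up stragglers.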
>
> All of this takes time, time, time. And I cannot begin to describe my
> life to you, but time is what I just don't got to spare unless my life
> depends on it. That's the level of triage here -- staunch the spurting
> arteries first and apply CPR as necessary -- the mere compound fractures
> and contusions have to wait. You might have noticed I've been strangely
> quiet on-list for the last six months or so... there is a reason:-)
Time. The great equalizer.
> At the moment, evidently, I do have some time and am kind of catching
> up. Next week I might have even more time -- perhaps even the full day
> and change the upgrade will take. I actually do really want to do it --
> both because I do want it to be nice and current and secure and because
> there are LOTS OF IMPROVEMENTS at the server level in the meantime --
> managing printers with RH9 tools sucks, for example, USB support is
> trans-dubious, md is iffy, and I'd like to be able to test out all sorts
> of things like the current version of samba, a RADIUS server so I can
> stop using PSK in WPA, and so on. So sure, I'll take your advice
> "any day now", but it isn't that simple a matter.
Walled gardens and VPNs for wireless access? Sweet.
<snip>
> No arguments. But remember, you say "users" because you're looking at
> topdown managed clusters with many users. There are lots of people with
> self-managed clusters with just a very few. And honestly,
> straightforward numerical code is generally cosmically portable -- I
> almost never even have to do a recompile to get it to work perfectly
> across upgrades. So YMMV as far as how important that stability is to
> users of any given cluster. There is a whole spectrum here, no simple
> or universal answers.
<snip>
> Truthfully, it is trans great. I started doing Unix admin in 1986, and
> have used just about every clumsy horrible scheme you can imagine to
> handle add-on open source packages without which Unix (of whatever
> vendor-supplied flavor) was pretty damn useless even way back then.
> They still don't have things QUITE as simple as they could be -- setting
> up a diskless boot network for PXE installs or standalone operation is
> still an expert-friendly sort of thing and not for the faint of heart or
> tyro -- but it is down to where a single relatively simple HOWTO or set
> of READMEs can guide a moderately talented sysadmin type through the
> process.
>
> With these tools, you can administer at the theoretical/practical limit
> of scalability. One person can take care of literally hundreds of
> machines, either nodes or LAN clients, limited only by the need to
> provide USER support and by the rate of hardware failure. I could see a
> single person taking care of over a thousand nodes for a small and
> undemanding user community, with onsite service on all node hardware. I
> think Mark Hahn pushes this limit, as do various others on list. That's
> just awesome. If EVER corporate America twigs to the cost advantages of
> this sort of management scalability on TOP of free as in beer software
> for all standard needs in the office workplace... well, one day it will.
> Too much money involved for it not to.
<snip>
> I thought there was such a party, but I'm too lazy to google for it. I
> think Seth mentioned it on the yum or dulug list. It's the kind of
> thing a lot of people would pay for, actually.
<snip>
> And I _won't_ care...;-)
Come to think of it, the only way you can lose in such a contest is if
quality slips. Pretty much plusses across the board. :-D
> It took me two days to wade through extras in FC6, "shopping", and now
> there are another 500 packages I haven't even looked at a single time.
> The list of games on my laptop is something like three screenfuls long,
> and it would take me weeks to just explore the new applications I did
> install. And truthfully, the only reason I push FC is because (as noted
> above) it a) meets my needs pretty well; and b) has extremely scalable
> installation and maintenance; and c) (most important) I know how to
> install and manage it. I could probably manage debian as well, or
> mandriva, or SuSE, or Gentoo -- one advantage of being a 20 year
> administrator is I do know how everything works and where everything
> lives at the etc level beneath all GUI management tool gorp layers
> shovelled on top by a given distro -- but I'm lazy. Why learn YALD?
> One can be a master of one distro, or mediocre at several...
This is absolutely valid. There is no need to move to the latest
whiz-bang distro if what you're using works fine.
> Pretty much all of the current generation do this. Yum yum.
>
> Where one is welcome to argue about what constitutes a "fast-moving"
> repository. yum doesn't care, really. Everything else is up to the
> conservative versus experimental inclinations of the admin.
How usable is the FC development repository?
> The last time I looked at FAI it was Not Ready For Prime Time and
> languishing unloved. Of course this was a long time ago. I'm actually
> glad that it is loved. The same is true of replicators and system
> imagers -- I've written them myself (many years ago) and found them to
> be a royal PITA to maintain as things evolve, but at this point they
> SHOULD be pretty stable and functional. One day I'll play with them, as
> I'd really like to keep a standard network bootable image around to
> manage disk crashes on my personal systems, where I can't quite boot to
> get to a local disk to recover any data that might be still accessible.
> Yes there are lots of ways to do this and I do have several handy but a
> pure PXE boot target is very appealing.
>
>>> Yes, one can (re)invent many wheels to make all this happen --
>>> package up stuff, rsync stuff, use cfengine (in FC6 extras:-), write
>>> bash or python scripts. Sheer torture. Been there, done that, long
>>> ago and never again.
>> Hey, some people like this. Some people compete in Japanese game shows.
>
> Yes, but from the point of view of perfect scaling theory, heterogeneity
> and nonstandard anything is all dark evil. Yes, many people like to
> lose themselves in customization hell, but there is a certain zen
> element here and Enlightenment consists of realizing that all of this is
> Illusion and that there is a great Satori to be gained by following the
> right path....
>
> OK, enough system admysticstration...;-)
>
> rgb
>
--
Geoffrey D. Jacobs