I've got 8 linux boxes, what now
lindahl at conservativecomputer.com
Fri Dec 7 08:59:01 PST 2001
On Thu, Dec 06, 2001 at 04:40:43PM -0800, Chris Majewski wrote:
> We're a computer science department investigating, very tentatively,
> the possibility of installing a linux cluster as our next
> general-purpose compute server. To date we've been using things like
> expensive multiprocessor SUN machines.
You need to think about your user interface. Here are 3 possibilities:
1) When your user logs in, they are dumped on 1 of N Linux boxes. All
their processes run on that box. You can use LVS (Linux Virtual
Server) to do this. NFS mount their directories. If a user starts a
long-running job, they have to remember which box they started it on
if they want to kill it. And there's no guarantee that load averages
will remain similar, although LVS can stop sending users to a box with
a higher load average. Fortunately, most of your users don't start
long-running jobs, so for most people, it just works.
2) As (1), but users can also use a special scheme for long-running
jobs: a batch queue. Use Condor for the batch queue. Condor lets
people find out where their jobs are, and it can migrate some
long-running jobs to different boxes to balance the load. Since only
a subset of your users have long-running jobs, most people don't have
to learn about Condor.
3) As (1), but use MOSIX. MOSIX can automagically migrate long-running
jobs to a different system, but "ps" still shows the job on the "home"
system. This is more transparent to the users, but now the job dies if
either the home system or the system it migrated to crashes, so it's
less reliable than Condor.
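To make the load-balancing idea in (1) concrete, here is a rough Python
sketch of choosing a login box by load average. It is not how LVS is
implemented or configured; the host names, the load figures and the
max_load cutoff are all made up for illustration.

```python
def pick_box(loads, max_load=4.0):
    """Return the least-loaded box, skipping boxes above max_load.

    loads maps a host name to its 1-minute load average.
    max_load is a hypothetical cutoff, not an LVS setting.
    If every box is over the cutoff, fall back to the least-loaded one.
    """
    if not loads:
        raise ValueError("no boxes to choose from")
    # Boxes at or under the cutoff are preferred.
    eligible = {host: load for host, load in loads.items() if load <= max_load}
    candidates = eligible or loads
    # Break ties on host name so the choice is deterministic.
    return min(candidates, key=lambda host: (candidates[host], host))


if __name__ == "__main__":
    # Hypothetical snapshot of load averages on three boxes.
    snapshot = {"node1": 0.7, "node2": 5.2, "node3": 0.3}
    print(pick_box(snapshot))
```

A picker like this only decides where a new login lands. It does nothing
about jobs that are already running, which is the gap that (2) and (3)
try to fill.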
With any of the 3, you still need to work out a way of administering
the systems to keep them synchronized.
TurboLinux has a cluster admin system that helps you keep system disks
synchronized.
You can use "rsync" or cfengine, which are traditional Linux sysadmin
tools.
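As a rough illustration of the rsync approach, the Python sketch below
pushes one directory from a master box to each node over ssh. The
directory, the node names and the choice of flags (-a, -z, --delete)
are assumptions for the example, not a recommended setup; by default it
only builds the commands so you can look at them before running
anything.

```python
import subprocess


def rsync_command(source_dir, host, dest_dir):
    """Build an rsync command that mirrors source_dir onto host:dest_dir."""
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    src = source_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", "-e", "ssh", src, f"{host}:{dest_dir}"]


def sync_all(source_dir, hosts, dest_dir, dry_run=True):
    """Return the rsync commands for all hosts; run them unless dry_run."""
    commands = [rsync_command(source_dir, host, dest_dir) for host in hosts]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if rsync fails on a node.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical directory and node names.
    for cmd in sync_all("/etc/cluster", ["node1", "node2"], "/etc/cluster"):
        print(" ".join(cmd))
```

Note that --delete removes files on the node that are missing on the
master, so it is worth checking the dry-run output first.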
Scyld Beowulf doesn't really address this situation. However, it
could, with a modest amount of work. Don, do you have any comments
about this? Since it's many users running many jobs, including
interactive ones, it's not really the area that "Beowulf clusters"
traditionally address. I wish people would work on this, though, as
I'd love to have a prepackaged solution I could sell in this area.