andrew at moonet.co.uk
Thu Jul 26 02:38:13 PDT 2007
Would this mean that a user's environment could never exceed the
resources of a single node?
On 26/07/07, Julien Leduc <julien.leduc at lri.fr> wrote:
> >> I'm interested in utilising the hardware to create something akin to
> >> the Sun Grid or the Amazon Elastic Compute Cloud whereby the
> >> resources available to the environment are automatically expanded and
> >> contracted. Maybe I have the wrong end of the stick on how these
> >> services operate.
> > no, I think you're right on, and there's not much to it. why do you
> > think Sun or Amazon have any special magic? beowulf clusters running
> > multi-user queueing systems are precisely such an "elastic", "compute-
> > on-demand" thingy, just without paying for the isolation, because such
> > clusters are mainly motivated by performance.
> Running a multi-user queueing system, you can have a cluster that
> behaves like the Sun or Amazon projects: you just choose the nodes that
> can fulfill the user's needs and requirements, fetch a VM onto those
> chosen nodes (during the 'prolog' section of the batch scheduler), start
> the VMs on the physical nodes, and ensure the user can log on to them or
> fetch his data / run a passive job. Then, once finished, clean up all
> that mess by destroying the VM, and let another user reserve the node.
> More isolation can be achieved, if the user needs to be root on the
> node, to run a modified version of the kernel, or run several VMs on top
> of his environment. For that, you have to let him deploy his own
> environment on the node.
> This last technique ensures reproducible experiments and better
> performance; the drawbacks are: more work on the middleware that makes
> all that magic come
> Combining the 2 previous techniques could help users to test their
> OS+experimentation program in a VM and then deploy it at larger scale
> for a true run on all the cluster(s ;) ).
> This is a very interesting approach (at least for computer scientists),
> and the second approach gives quite good results for the moment. The
> combination of the 2 techniques still has to be implemented to free up
> more resources, so that users can test their environments on many
> virtual nodes while consuming fewer physical nodes.
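To put a rough number on "many virtual nodes, fewer physical nodes": assuming every physical node can host the same fixed number of VMs (my simplification, not something stated above), the count of physical nodes needed is just a ceiling division. The function name and parameters below are hypothetical.

```python
def physical_nodes_needed(virtual_nodes: int, vms_per_host: int) -> int:
    """Physical nodes required to host `virtual_nodes` VMs,
    assuming each host runs at most `vms_per_host` VMs."""
    if virtual_nodes < 0 or vms_per_host < 1:
        raise ValueError("need virtual_nodes >= 0 and vms_per_host >= 1")
    # Ceiling division using integers only: -(-a // b) == ceil(a / b)
    return -(-virtual_nodes // vms_per_host)
```

With 8 VMs per host, a 64-node virtual test run would occupy 8 physical nodes, and a 65-node run would need 9.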
> The main problem is to be able to control the nodes remotely, with
> hardware supporting remote reboots, remote console management...
> Julien Leduc