[Beowulf] open mosix alternative

Kenneth Duncan Strouts K.D.Strouts at sms.ed.ac.uk
Fri Jul 4 01:39:26 PDT 2008


Hi Jon,


> Quoting Tony Travis <ajt at rri.sari.ac.uk>:
> Although Kerrighed looks very promising, it is also quite fragile in  
> our hands. If one node crashes, you lose the entire cluster. That  
> said, the Kerrighed project is extremely well supported and I  
> believe it will be a good alternative in the near future.


We also found that with Kerrighed, one node crashing takes the whole
cluster down.  The following is written to kern.log just before the
cluster dies:

Jul  2 13:57:03 nodeC at kghed kernel: TIPC: Resetting link <1.1.2:eth1-1.1.3:eth1>, peer not responding
Jul  2 13:57:03 nodeC at kghed kernel: TIPC: Lost link <1.1.2:eth1-1.1.3:eth1> on network plane B
Jul  2 13:57:03 nodeC at kghed kernel: TIPC: Lost contact with <1.1.3>
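
If it helps anyone watching for this, here is a rough sketch of a scan for the "Lost contact" message shown above. It only assumes the message text matches the excerpt; the function name lost_tipc_peers is made up for illustration and this is not anything shipped with Kerrighed or TIPC.

```python
import re

# Matches the "Lost contact" kernel message as it appears in the excerpt above.
# The exact wording on other kernel/TIPC versions is an assumption.
LOST_CONTACT = re.compile(r"kernel: TIPC: Lost contact with <(?P<peer>[\d.]+)>")


def lost_tipc_peers(lines):
    """Return TIPC addresses reported as lost, in order of first appearance."""
    peers = []
    for line in lines:
        match = LOST_CONTACT.search(line)
        if match and match.group("peer") not in peers:
            peers.append(match.group("peer"))
    return peers
```

Passing it an open log file (for example the kern.log on a surviving node, wherever your syslog puts it) should give the list of peers that dropped out. It will not stop the cluster going down, but it does show which node went first.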

From the Kerrighed mailing list (Louis Rilling):

"Indeed, Kerrighed does not tolerate node failures yet. We have no
precise date for this, and giving a date right now would be
meaningless. The first step for us is to support dynamic cluster
resizing (IOW live node additions and removals), and we've just
started working on it. We will work on node failures in a second
step."

It seems they are working on this, and on a new framework for
configurable process scheduling.  Kerrighed will probably be a good
alternative in the future.

Kenneth









