
[Beowulf] Can one Infiniband net support MPI and a parallel filesystem?


Mark Hahn hahn at mcmaster.ca
Thu Aug 14 22:23:53 PDT 2008


> Gus' numbers make sense to me. I assume his workload consists of jobs of
> various sizes: serial, modestly parallel, and parallel jobs using all resources.
> Without pre-emptive scheduling, the batch queue system has to starve the
> system in order to run the larger jobs.

unless backfill can utilize those temporarily idle cpus.
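As a rough illustration of how backfill reclaims those idle CPUs, here is a minimal sketch of EASY-style backfill. It is a generic textbook version, not the scheduler used on any particular cluster, and the class names, the job mix and the 128-CPU cluster are all hypothetical. The large job at the head of the queue gets a reservation at the earliest time enough CPUs will be free (the "shadow time"). A smaller job may jump ahead only if it fits on the CPUs idle right now and either finishes before that reservation or uses only CPUs the large job will not need. This assumes running jobs end at their walltime limits.

```python
from dataclasses import dataclass


@dataclass
class Running:
    cpus: int
    end: float  # time the job is expected to finish (its walltime limit)


@dataclass
class Queued:
    name: str
    cpus: int
    walltime: float  # user-requested runtime limit


def shadow_time(total, now, running, head_cpus):
    """Earliest time the head job can start, and CPUs left over at that time."""
    free = total - sum(r.cpus for r in running)
    if free >= head_cpus:
        return now, free - head_cpus
    for r in sorted(running, key=lambda r: r.end):
        free += r.cpus
        if free >= head_cpus:
            return r.end, free - head_cpus
    raise ValueError("head job needs more CPUs than the cluster has")


def easy_backfill(total, now, running, head, candidates):
    """Names of queued jobs that can start now without delaying `head`."""
    shadow, extra = shadow_time(total, now, running, head.cpus)
    free = total - sum(r.cpus for r in running)
    started = []
    for job in candidates:
        if job.cpus > free:
            continue  # does not fit on the currently idle CPUs
        ends_in_time = now + job.walltime <= shadow
        # Safe if it finishes before the head job's reservation, or if it
        # only uses CPUs the head job will not need.
        if ends_in_time or job.cpus <= extra:
            started.append(job.name)
            free -= job.cpus
            if not ends_in_time:
                extra -= job.cpus
    return started


running = [Running(cpus=64, end=10.0), Running(cpus=32, end=4.0)]
head = Queued("whole-cluster", cpus=128, walltime=24.0)
queue = [Queued("short", 16, 8.0), Queued("long", 16, 20.0), Queued("tiny", 8, 2.0)]
print(easy_backfill(128, 0.0, running, head, queue))  # ['short', 'tiny']
```

The "long" job is held back because it would still be running when the whole-cluster job is due to start. Backfill is only as good as the users' walltime estimates: overly generous limits shrink the set of jobs that can safely fill the gap.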

> Obviously, before a job which
> consumes all resources can start, all resources have to be idle, which
> means no other jobs can be scheduled, even though the resources are idle.

true enough, but it does depend on the size of large, high-priority jobs
relative to the size of the cluster.
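To see why the size of the large job matters, one can count the CPU-hours left idle while the cluster drains for it, assuming no backfill at all. The sketch below is a hypothetical model: running jobs end exactly at their listed times, and the 128-CPU job mix is invented for illustration.

```python
def drain_idle_cpu_hours(total, now, running, head_cpus):
    """CPU-hours left idle, with no backfill, until `head_cpus` CPUs are free.

    `running` is a list of (cpus, end_time) pairs for jobs already running.
    """
    free = total - sum(cpus for cpus, _ in running)
    t, idle = now, 0.0
    for cpus, end in sorted(running, key=lambda job: job[1]):
        if free >= head_cpus:
            break  # the head job can start at time t
        idle += free * (end - t)  # these CPUs sit idle until the next job ends
        t, free = end, free + cpus
    if free < head_cpus:
        raise ValueError("head job needs more CPUs than the cluster has")
    return idle


# Hypothetical 128-CPU cluster, fully busy with four 32-CPU jobs that end
# at hours 2, 4, 6 and 8.
running = [(32, 2.0), (32, 4.0), (32, 6.0), (32, 8.0)]
for size in (32, 64, 128):
    print(size, drain_idle_cpu_hours(128, 0.0, running, size))
# 32 0.0
# 64 64.0
# 128 384.0
```

With the same running jobs, a quarter-cluster job wastes nothing, while a whole-cluster job leaves 384 CPU-hours idle (37.5% of the 1024 CPU-hours available over those 8 hours).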

> Another interesting metric is of course how many of the jobs run to
> successful completion, i.e., are not killed due to resource limits, or
> crashes, or for other reasons. That's what I call net vs. gross utilization.

surely this survival rate is quite high, no? again, it depends largely
on the design of the cluster: I see few node crashes (maybe 1 of 768 nodes
per week) and few resource crashes (perhaps a couple of buggy jobs per week).
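One way to make both ideas concrete is sketched below, under stated assumptions. "Gross" utilization is read here as CPU-hours consumed by all jobs over CPU-hours available, and "net" as CPU-hours of successfully completed jobs only; that reading of the quoted definition is an interpretation. The node-crash figure of 1 in 768 nodes per week is turned into a per-node-hour rate assuming independent crashes spread evenly over nodes and time (an exponential failure model), which ignores the buggy-job crashes entirely. `JobRecord`, `utilization` and `p_no_node_crash` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class JobRecord:
    cpus: int
    hours: float
    succeeded: bool  # False if killed by limits, crashed, etc.


def utilization(jobs, total_cpus, period_hours):
    """Return (gross, net): all CPU-hours vs. CPU-hours of successful jobs."""
    capacity = total_cpus * period_hours
    gross = sum(j.cpus * j.hours for j in jobs) / capacity
    net = sum(j.cpus * j.hours for j in jobs if j.succeeded) / capacity
    return gross, net


# Per-node crash rate implied by "maybe 1 of 768 nodes per week",
# assuming crashes are independent and spread evenly over nodes and time.
NODE_CRASH_RATE = 1.0 / (768 * 7 * 24)  # crashes per node-hour


def p_no_node_crash(nodes, hours):
    """Probability a job spanning `nodes` nodes for `hours` sees no crash."""
    return math.exp(-NODE_CRASH_RATE * nodes * hours)


print(utilization([JobRecord(50, 10.0, True), JobRecord(40, 10.0, False)], 100, 10.0))
# (0.9, 0.5)
print(round(p_no_node_crash(64, 24), 3))    # 0.988
print(round(p_no_node_crash(768, 168), 3))  # 0.368
```

Under this model a 64-node, one-day job almost always survives, which fits the "quite high" survival rate; only a job occupying all 768 nodes for a full week would be more likely than not to be hit by a node crash.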



More information about the Beowulf mailing list