[Beowulf] Please help to setup Beowulf
Chris Dagdigian
dag at sonsorol.org
Tue Feb 17 11:51:39 PST 2009
On Feb 17, 2009, at 2:29 PM, Michael Will wrote:
> What features differentiate SGE in support of life science workflow
> from LSF/PBS/Torque/Condor?
>
> Michael
They all have their pros and cons. Heck, I'm still an LSF zealot when
cost is not an issue as Platform has the best APIs, documentation and
layered products for the industry types who need to stand these things
up in full production mode within enterprise organizations that may
have varying levels of Linux/HPC/MPI experience.
The short list of why Grid Engine became popular in the life sciences:
LSF: great product but commercial-only and a pricing model that can
get out of hand (I remember when having more than 4GB RAM in a Linux
1U pushed me into an obscene license tier ...).
Condor: Did not have the fine grained policy and resource allocation
tools that make life easier when you need to have a shared cluster
resource supporting multiple competing users, groups, projects and
workflows. The policy tools for LSF/SGE/PBS were more capable. When I
saw Condor out in the field, it seemed to be used mostly at academic
sites and in situations where cycles from PC systems were being
aggregated across LAN, metro and WAN-scale distances. Bio problems
tend to be more I/O or memory bound rather than CPU bound so most bio
clusters tend to be closely situated racks of gear.
PBS/TORQUE: I'll ignore the FUD from back in the day when people were
claiming that PBS lost jobs and data at high scale and concentrate on
just one key differentiator. At the time when life science was
transitioning from big SGI Altix and Tru64 AlphaServer machines to
commodity compute farms, PBS did not support the concept of array
jobs. If there was one overwhelming cluster resource management
feature essential for bio work, it would be array tasks. This is
because we tend to have a very high
concentration of batch/serial workflows that involve running an
application many many times in a row with varying input files and
parameter options. The cliche example in bioinformatics is needing to
run half a million blast searches. Without array task scheduling this
would require 500,000 individual job submissions. The fact that I
never met a serious PBS shop that had not made local custom changes to
the source code also soured me on deploying it when I was putting such
things into conservative IT shops who were still new and fearful of
Linux.
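A minimal sketch of how one array task typically picks its own input, assuming Grid Engine's SGE_TASK_ID environment variable and a plain-text list of query files, one path per line. The function names, the list-file convention and the legacy blastall command line are illustrative assumptions, not what any particular site runs:

```python
import os
import sys


def current_task_id(env=os.environ):
    """Return the 1-based array task index, or 1 outside an array job."""
    # Grid Engine exports SGE_TASK_ID to every task of an array job.
    # Outside an array job the value is unset or non-numeric, so fall back to 1.
    raw = env.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else 1


def input_for_task(task_id, inputs):
    """Map a 1-based task index onto the list of input files."""
    if not 1 <= task_id <= len(inputs):
        raise IndexError(f"task id {task_id} outside 1..{len(inputs)}")
    return inputs[task_id - 1]


def blast_command(query_path, database="nr"):
    """Build a hypothetical legacy blastall command line for one query file."""
    # Program, database and output naming are assumptions; adjust to the local install.
    return ["blastall", "-p", "blastp", "-d", database,
            "-i", query_path, "-o", query_path + ".out"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1] is the (hypothetical) file listing one query path per line.
    with open(sys.argv[1]) as fh:
        inputs = [line.strip() for line in fh if line.strip()]
    query = input_for_task(current_task_id(), inputs)
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(blast_command(query)))
```

The point is that the same script is submitted once with a task range (in Grid Engine, the -t option to qsub) and the scheduler fans it out, instead of the user looping over half a million separate submissions.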
We also don't make heavy use of the Globus-style WAN-scale capital "G"
grid computing, as many of our workflows and pipelines are actually
performance bound by the speed of storage rather than CPU or memory
issues. It was always easier, cheaper and more secure to colocate
dedicated CPU resources local to fast storage rather than distribute
things out as far as possible.
The big news in Bio-IT these days is actually the terabyte scale wet
lab instruments such as confocal microscopes and next-gen DNA
sequencing systems that can produce 1-3TB of raw data per experiment.
Some of these lab instruments ship with software pipelines developed
to run under grid engine. A popular example is the Solexa/Illumina
Genome Analyzer which alone has driven SGE uptake in our field. A
notable exception is the SOLiD system which (I think) ships with a
Windows front end that hides a back end ROCKS cluster running either
PBS or TORQUE under the hood.
And from Mark:
> how about providing some useful content - for instance, what is it
> that you think is especially valuable about sge?
Hopefully I've done some of that with this message. It basically boils
down to the fact that at the time our field started using compute
farms in a serious manner, SGE offered the best overall combination of
features, price and fine grained resource allocation & policy control.
I think what made us a bit different from some other use cases is our
heavy use of serial/batch workflows combined with our tendency to
require that our HPC infrastructures support multiple (and potentially
competing) workflows and pipelines, which made the policy/allocation
features a key selection criterion. We also do little if any true
WAN-scale "grid" computing due to workflows that tend to be more
storage/IO bound than anything else. For people starting fresh with a
cluster scheduling layer who did not have an investment in time,
expertise and/or software licensing costs, Grid Engine turned out to be
a popular
choice. With that popularity came a good set of people in the
community who can now support and configure these systems (as well as
evangelize them) so the cycle is fairly self perpetuating.
General life science cluster cheat sheet:
- Workloads tend to be far more serial/batch in nature than true
parallel
- Policy and resource allocation features are very important to people
deploying these systems
- Storage speed is often more important than network speed or latency
- Fast interconnects are often used for cluster/distributed
filesystems rather than application message passing
- Our MPI codes are often quite horrific from an efficiency/tuning
standpoint - gigE works just as well as Myrinet or IB
- Exceptions to the MPI rule: computational chemistry, modeling and
structure prediction (those fields have well written commercial MPI
codes in use)
- Huge resistance to improved algorithms as scientists want to use
*exactly* the same code that was used to publish the journal paper
-Chris