[Beowulf] [tjrc at sanger.ac.uk: Re: [Bioclusters] topbiocluster.org]
Eugen Leitl
eugen at leitl.org
Fri Jun 24 09:18:08 PDT 2005
----- Forwarded message from Tim Cutts <tjrc at sanger.ac.uk> -----
From: Tim Cutts <tjrc at sanger.ac.uk>
Date: Fri, 24 Jun 2005 17:02:50 +0100
To: "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
Subject: Re: [Bioclusters] topbiocluster.org
X-Mailer: Apple Mail (2.730)
Reply-To: "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
On 24 Jun 2005, at 4:06 pm, Brodie, Kent wrote:
>Taking into account the whole pipeline (including networked I/O,
>formatdb, etc) is both a great idea and will give much more realistic
>results.
>
>I also think that a collection of data would be a catalyst for great
>future discussions and questions, e.g. "how the heck did you get your
>formatdb to run so fast on the 20K data?" The responses would then
>give the rest of us who may be a bit behind in these things great
>insight and ideas.
>
>I'd be VERY interested to see if anyone has results from using cluster
>filesystems, for example.....
Cluster filesystems have *drastically* cut our data distribution
time. We can distribute a new multi-GB genome data set to all the
machines that use cluster filesystems in a few minutes. The old RLX
blades, which have to rely on the hierarchy of rsync processes to
which James referred, trail in a dismal few hours later.
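For contrast, here's a minimal sketch (in Python; hostnames, paths and
tier sizes are hypothetical, and this is not James' actual setup) of
the kind of two-tier rsync fan-out the blades depend on. Every extra
tier adds copying latency and failure modes that a shared cluster
filesystem simply avoids:

#!/usr/bin/env python3
# Two-tier rsync fan-out sketch: the head node pushes the dataset to a
# few first-tier nodes, which then push it on to the remaining blades.
# Hostnames and paths below are made up for illustration.
import subprocess
from concurrent.futures import ThreadPoolExecutor

DATASET = "/data/blastdb/"                        # hypothetical source path
TIER1 = ["node001", "node002", "node003", "node004"]
TIER2 = [f"node{n:03d}" for n in range(5, 101)]   # hypothetical blade names

def push(src_host, dst_host, path):
    # rsync can't copy remote-to-remote in one call, so tier-2 copies
    # run rsync on the tier-1 node via ssh.
    cmd = ["rsync", "-a", "--delete", path, f"{dst_host}:{path}"]
    if src_host:
        cmd = ["ssh", src_host] + cmd
    subprocess.run(cmd, check=True)

# Stage 1: head node copies the dataset out to the first tier.
with ThreadPoolExecutor(max_workers=len(TIER1)) as pool:
    list(pool.map(lambda h: push("", h, DATASET), TIER1))

# Stage 2: each tier-1 node fans out to its slice of the remaining blades.
with ThreadPoolExecutor(max_workers=16) as pool:
    jobs = [(TIER1[i % len(TIER1)], dst) for i, dst in enumerate(TIER2)]
    list(pool.map(lambda j: push(j[0], j[1], DATASET), jobs))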
They've also increased performance when running jobs; the machines
can suck data over the filesystem's gigabit Ethernet faster than the
individual spindles could supply it locally.
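To put rough numbers on that (the rates below are round assumptions
for illustration, not measurements from our cluster): gigabit
Ethernet carries roughly 125 MB/s of payload, while a single
mid-2000s spindle might sustain around 50 MB/s, so a node reading a
multi-GB dataset over the network comes out well ahead:

# Back-of-envelope only; rates are assumed round numbers, not measurements.
GIGE_MB_S = 125       # ~1 Gbit/s of payload, ignoring protocol overhead
SPINDLE_MB_S = 50     # plausible sustained rate for one mid-2000s disk
DATASET_GB = 10       # a "multi-GB" genome dataset

for name, rate in (("gigabit Ethernet", GIGE_MB_S),
                   ("single spindle", SPINDLE_MB_S)):
    minutes = DATASET_GB * 1024 / rate / 60
    print(f"{name:>16}: ~{minutes:.1f} min to read {DATASET_GB} GB")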
We've been using cluster filesystems (specifically, GPFS) in
production since October 2003 for the static datasets: blastables
and so on. This is going to continue, and we've been pleased enough
with the approach that it's going to be extended. The number of nodes
per cluster filesystem (currently 14) will be expanded, hopefully to
the entire cluster, and the cluster's scratch filesystems will be
moved from NFS, where they live currently, to GPFS or Lustre. We're
not wedded to GPFS - Lustre looks good too.
LSF is already running off a GPFS cluster filesystem so that it can
fail over without the performance sucking because of NFS. (Yay! No
more LSF masters on Tru64! Woohoo!)
The dream of a 1000+ node cluster entirely without NFS takes a step
closer to reality...
I'd be happy to run one of James' mini pipelines on Sanger's cluster,
if I could actually persuade Ensembl to give me a couple of hours of
completely clear air to actually get the benchmark done. :-)
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233
_______________________________________________
Bioclusters maillist - Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters
----- End forwarded message -----
--
Eugen* Leitl <a href="http://leitl.org">leitl</a>
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE