[Beowulf] Project Heron at the Sanger Institute [EXT]
hearnsj at gmail.com
Thu Feb 4 11:25:19 UTC 2021
In the seminar, the graph of sequencing effort for Sanger / rest of UK /
worldwide is very impressive.
On Thu, 4 Feb 2021 at 10:21, Tim Cutts <tjrc at sanger.ac.uk> wrote:
> > On 3 Feb 2021, at 18:23, Jörg Saßmannshausen <sassy-work at sassy.formativ.net> wrote:
> > Hi John,
> > interesting stuff and good reading.
> > For the IT interests on here: these sequencing machines are chucking out a
> > large amount of data per day. The project I am involved in can chew out
> > 400 GB or so of raw data per day. That is a small machine. That then needs
> > to be processed before you actually can analyze it. So there is quite some
> > data movement involved here.
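
To put the 400 GB per day figure in network terms, here is a rough sketch of the average rate needed to move that much data within 24 hours. It assumes decimal units (1 GB = 1000 MB) and a perfectly even transfer, which real instruments will not produce; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(gb_per_day: float) -> float:
    """Average rate in MB/s needed to move gb_per_day (decimal GB) in 24 hours."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


rate = sustained_rate_mb_per_s(400)
# Report the same rate in megabytes and megabits per second.
print(f"{rate:.1f} MB/s, {rate * 8:.0f} Mbit/s")  # 4.6 MB/s, 37 Mbit/s
```

So a single small machine is modest as a sustained average; the pain presumably comes from bursts and from many instruments at once.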
> If anyone wants any details, just ask me, since the IT supporting all that
> sequencing is my team’s baby.
> Actually, the sequencing capacity needed for this volume of COVID samples is
> not great. The virus genome is so small (only 30,000 bases, compared to a
> human’s 3 billion base pairs) that you can massively multiplex the samples
> in a single sequencing run.
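
As a quick check on the scale difference quoted above, using only the two round numbers from the message (30,000 bases and 3 billion base pairs), the ratio works out as follows. This is plain arithmetic and says nothing about coverage depth or read length.

```python
VIRUS_GENOME_BASES = 30_000            # figure quoted in the message
HUMAN_GENOME_BASES = 3_000_000_000     # figure quoted in the message

# How many virus-sized genomes fit into one human-sized genome.
ratio = HUMAN_GENOME_BASES // VIRUS_GENOME_BASES
print(f"{ratio:,}")  # 100,000
```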
> Currently, we multiplex 384 samples per NovaSeq sequencing lane. There
> are four lanes per flowcell, and two flowcells per sequencer. The
> sequencing run takes about 24 hours, so each instrument can sequence about
> 3,000 samples per day.
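
The "about 3,000 samples per day" figure follows directly from the numbers above. A minimal sketch of the arithmetic, assuming every lane is fully loaded and one run completes per 24 hours (the function name is made up for illustration):

```python
def samples_per_run(samples_per_lane: int, lanes_per_flowcell: int, flowcells: int) -> int:
    """Samples sequenced in one run when every lane on every flowcell is full."""
    return samples_per_lane * lanes_per_flowcell * flowcells


# 384 samples per lane, 4 lanes per flowcell, 2 flowcells per sequencer.
per_run = samples_per_run(384, 4, 2)
print(per_run)  # 3072, i.e. about 3,000 per instrument per ~24-hour run
```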
> We have about 20 of these sequencers, so our total capacity is very high;
> in fact we only use three sequencers for COVID at the moment, because
> sample and library preparation (getting those 384 samples ready for the
> sequencer) is actually the bottleneck. We are planning to increase throughput
> though, both by increasing multiplexing and by using more sequencers.
> Sequencing itself is a bit less than a day, and the computational analysis
> to de-multiplex and reconstruct the genomes is less than a day running on
> our production-oriented OpenStack cluster (we keep critical projects like
> Heron on a physically separate cluster from normal faculty research); we
> can easily keep up with the sequencers. We then upload our results to the
> folks at CLIMB, and that’s where the comparative genomics tends to take place.
> There’s a lot of effort at the moment going into speeding up the
> end-to-end process; for this sequencing to be as useful as possible for
> close-to-real-time outbreak and mutation analysis, the turnaround time
> needs to be as short as possible. It turns out you can see statistically
> significant new mutation signatures very early on before infection rates
> really start to rise (this was visible in Kent data for B.1.1.7), so the
> sooner we can see this sort of thing the better we will get at taking
> appropriate measures.
> For more details on the actual analysis, we released a public seminar a
> couple of weeks ago: