[Beowulf] Project Heron at the Sanger Institute [EXT]

Thu Feb 4 11:25:19 UTC 2021

In the seminar, the graph of sequencing effort for Sanger / rest of UK /
worldwide is very impressive.

On Thu, 4 Feb 2021 at 10:21, Tim Cutts <tjrc at sanger.ac.uk> wrote:

>
>
> > On 3 Feb 2021, at 18:23, Jörg Saßmannshausen <sassy-work at sassy.formativ.net> wrote:
> >
> > Hi John,
> >
> > interesting stuff and good reading.
> >
> > For the IT interests on here: these sequencing machines are chucking out
> > large amounts of data per day. The project I am involved in can chew out
> > 400 GB or so of raw data per day. That is a small machine. That then needs
> > to be processed before you can actually analyze it. So there is quite some
> > data movement etc. involved here.
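
To put the 400 GB/day figure in networking terms, here is a rough sketch that converts it into a sustained transfer rate. It assumes decimal units (1 GB = 10^9 bytes) and a perfectly even spread over 24 hours, neither of which is stated in the message; the names are illustrative only.

```python
# Convert a daily data volume into the average sustained rate it implies.
# Assumptions (not from the original message): decimal gigabytes and a
# perfectly even spread over the day; real instruments write in bursts.
GB_PER_DAY = 400
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = GB_PER_DAY * 10**9 / SECONDS_PER_DAY
mb_per_second = bytes_per_second / 10**6
mbit_per_second = mb_per_second * 8

print(f"{mb_per_second:.2f} MB/s sustained, about {mbit_per_second:.0f} Mbit/s")
```

The average is modest; the data movement cost comes from the raw output being copied and reprocessed several times before analysis.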
>
>
> If anyone wants any details, just ask me, since the IT supporting all that
> sequencing is my team’s baby.
>
> Actually, the sequencing capacity needed for this volume of COVID samples
> is not large.  The virus genome is so small (only 30,000 bases, compared to
> a human’s 3 billion base pairs) that you can massively multiplex the samples
> in a single sequencing run.
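
As a back-of-envelope check on the multiplexing argument, this minimal sketch uses only the two genome sizes quoted above. The variable names are illustrative, and the ratio ignores read depth and coverage, which determine how many samples can really share a run.

```python
# Approximate genome sizes as quoted in the message.
VIRUS_GENOME_BASES = 30_000
HUMAN_GENOME_BASES = 3_000_000_000

# How many viral genomes add up to one human genome's worth of bases.
size_ratio = HUMAN_GENOME_BASES // VIRUS_GENOME_BASES

print(f"A human genome is about {size_ratio:,} times larger than the virus genome")
```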
>
> Currently, we multiplex 384 samples per Novaseq sequencing lane.  There
> are four lanes per flowcell, and two flowcells per sequencer.  The
> sequencing run takes about 24 hours, so each instrument can sequence about
> 3,000 samples per day.
>
> We have about 20 of these sequencers, so our total capacity is very high;
> in fact we only use three sequencers for COVID at the moment, because
> sample and library preparation is actually the bottleneck: getting those
> 384 samples ready for the sequencer.  We are planning to increase
> throughput, though, both by increasing multiplexing and by using more
> sequencers.
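
The throughput figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: the function name `samples_per_day` is hypothetical, runs are taken to be back to back with no turnaround time, and the 20-sequencer figure is a theoretical ceiling rather than a number from the message.

```python
# Figures quoted in the message.
SAMPLES_PER_LANE = 384
LANES_PER_FLOWCELL = 4
FLOWCELLS_PER_SEQUENCER = 2
RUN_HOURS = 24  # "about 24 hours" per sequencing run


def samples_per_day(sequencers, samples_per_lane=SAMPLES_PER_LANE):
    """Estimate daily sample throughput for a number of sequencers.

    Assumes back-to-back runs with no loading or turnaround time.
    """
    per_run = samples_per_lane * LANES_PER_FLOWCELL * FLOWCELLS_PER_SEQUENCER
    runs_per_day = 24 / RUN_HOURS
    return int(sequencers * per_run * runs_per_day)


print(samples_per_day(1))   # one instrument: the "about 3,000" figure
print(samples_per_day(3))   # the three sequencers currently used for COVID
print(samples_per_day(20))  # theoretical ceiling if all ~20 were used
```

Passing a larger `samples_per_lane` shows the effect of the planned increase in multiplexing; none of this models the sample and library preparation step, which the message identifies as the real bottleneck.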
>
> Sequencing itself is a bit less than a day, and the computational analysis
> to de-multiplex and reconstruct the genomes is less than a day running on
> our production-oriented OpenStack cluster (we keep critical projects like
> Heron on a physically separate cluster from normal faculty research); we
> can easily keep up with the sequencers.  We then upload our results to the
> folks at CLIMB, and that’s where the comparative genomics tends to take
> place.
>
> There’s a lot of effort at the moment going into speeding up the
> end-to-end process; for this sequencing to be as useful as possible for
> close-to-real-time outbreak and mutation analysis, the turnaround time
> needs to be as short as possible.  It turns out you can see statistically
> significant new mutation signatures very early on before infection rates
> really start to rise (this was visible in Kent data for B.1.1.7), so the
> sooner we can see this sort of thing the better we will get at taking
> appropriate measures.
>
> For more details on the actual analysis, we released a public seminar a
> couple of weeks ago:
>
> https://stream.venue-av.com/e/sanger_seminars/Barrett
>
> Tim