<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
Hi Tim,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
Maybe SETI@home wasn't the right project to mention. I just remembered there is another project on that kind of distributed platform, though not in genomics, called Folding@home. So with genomics, you cannot break the work down into smaller chunks that are crunched remotely,
returned to the sender, and then processed once the data is back or as it is being received?</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
Regards,<br>
Jonathan<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Tim Cutts &lt;tjrc@sanger.ac.uk&gt;<br>
<b>Sent:</b> 04 February 2021 11:35<br>
<b>To:</b> Jonathan Aquilina &lt;jaquilina@eagleeyet.net&gt;<br>
<b>Cc:</b> Beowulf &lt;beowulf@beowulf.org&gt;<br>
<b>Subject:</b> Re: [Beowulf] Project Heron at the Sanger Institute [EXT]</font>
<div> </div>
</div>
<div class="" style="word-wrap:break-word; line-break:after-white-space">Compute capacity is not generally the issue. For this pipeline, we only need about 200 cores to keep up with each sequencer, so a couple of servers. Genomics has not, historically,
been a good fit for SETI@home-style cycle-stealing, because the amount of compute you perform on a given unit of data is quite low. A lot of genomics is already I/O bound even when the compute is right next to the data, so you don’t gain much by shipping
it off to cycle-stealing desktops.
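<div class=""><br class="">
</div>
<div class="">As a rough back-of-envelope sketch of that trade-off, the snippet below compares processing a chunk where it already sits with shipping it over a home broadband link first. Every number and name in it is an illustrative assumption, not a measurement from any real pipeline, and it assumes the remote machine computes as fast as the local one and ignores the cost of returning results.</div>

```python
def offload_ratio(chunk_gb, compute_s, link_mbit_s):
    """Return local time divided by remote time for one chunk.

    Values well below 1.0 mean shipping the chunk away is slower than
    processing it where it already sits. Returning results is ignored.
    """
    transfer_s = chunk_gb * 8000.0 / link_mbit_s  # 1 GB = 8000 megabits
    return compute_s / (transfer_s + compute_s)


# Hypothetical genomics-like chunk: 1 GB of reads, 60 s of compute.
genomics = offload_ratio(chunk_gb=1.0, compute_s=60.0, link_mbit_s=20.0)

# Hypothetical folding-like work unit: 1 MB of input, 10 hours of compute.
folding = offload_ratio(chunk_gb=0.001, compute_s=36000.0, link_mbit_s=20.0)

print(f"genomics-like: {genomics:.2f}")  # genomics-like: 0.13
print(f"folding-like:  {folding:.2f}")   # folding-like:  1.00
```

<div class="">With these made-up figures, the genomics-like chunk spends most of its time in transit, while the folding-like work unit loses almost nothing to the transfer. That is the difference between low and high compute per unit of data.</div>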
<div class=""><br class="">
</div>
<div class="">In fact, the direction most sequencing instrument suppliers are going is embedding the compute in the sequencer itself, at least for use cases where you don’t really need the sequence at all, you just need to know how it varies from a reference
genome. In such cases, it’s much more sensible to run the pipeline on or right next to the sequencer and just spit out the (very small) diffs.</div>
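<div class=""><br class="">
</div>
<div class="">To make the idea of emitting only the diffs concrete, here is a toy sketch. It assumes the sample has already been assembled and aligned to the reference with substitutions only. Real variant callers do not assume that: they work from reads and handle insertions and deletions. The function names are hypothetical.</div>

```python
def substitutions(reference, sample):
    """List (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("toy example: sequences must be the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


def apply_substitutions(reference, diffs):
    """Rebuild the sample sequence from the reference plus its diffs."""
    bases = list(reference)
    for pos, ref_base, alt_base in diffs:
        if bases[pos - 1] != ref_base:
            raise ValueError(f"diff at position {pos} does not match the reference")
        bases[pos - 1] = alt_base
    return "".join(bases)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
diffs = substitutions(reference, sample)
print(diffs)  # [(5, 'A', 'T'), (10, 'C', 'A')]
print(apply_substitutions(reference, diffs) == sample)  # True
```

<div class="">When the sample is close to the reference, the diff list is far smaller than the sequence itself. Note that the reconstruction recovers only the sequence, not the underlying raw reads.</div>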
<div class=""><br class="">
</div>
<div class="">Scientists are conservative folks, though; they sometimes get a bit nervous at the thought of discarding the raw sequence data.</div>
<div class=""><br class="">
</div>
<div class="">Tim</div>
<div class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On 4 Feb 2021, at 10:27, Jonathan Aquilina &lt;<a href="mailto:jaquilina@eagleeyet.net" class="">jaquilina@eagleeyet.net</a>&gt; wrote:</div>
<br class="x_Apple-interchange-newline">
<div class="">
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; background-color:rgb(255,255,255)">
Would love to help you guys out in any way I can in terms of hardware processing.<span class="x_Apple-converted-space"> </span><br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; background-color:rgb(255,255,255)">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; background-color:rgb(255,255,255)">
Have you guys thought of doing something like SETI@home and similar projects, to harness idle compute power to help churn through the massive amounts of data?</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; background-color:rgb(255,255,255)">
<br class="">
</div>
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; background-color:rgb(255,255,255)">
Regards,<br class="">
Jonathan<br class="">
</div>
<div id="x_appendonsend" class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
</div>
<hr tabindex="-1" class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; display:inline-block; width:673.25px">
<span class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; float:none; display:inline!important"></span>
<div id="x_divRplyFwdMsg" dir="ltr" class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none">
<font face="Calibri, sans-serif" class="" style="font-size:11pt"><b class="">From:</b><span class="x_Apple-converted-space"> </span>Tim Cutts &lt;<a href="mailto:tjrc@sanger.ac.uk" class="">tjrc@sanger.ac.uk</a>&gt;<br class="">
<b class="">Sent:</b><span class="x_Apple-converted-space"> </span>04 February 2021 11:26<br class="">
<b class="">To:</b><span class="x_Apple-converted-space"> </span>Jonathan Aquilina &lt;<a href="mailto:jaquilina@eagleeyet.net" class="">jaquilina@eagleeyet.net</a>&gt;<br class="">
<b class="">Cc:</b><span class="x_Apple-converted-space"> </span>Beowulf &lt;<a href="mailto:beowulf@beowulf.org" class="">beowulf@beowulf.org</a>&gt;<br class="">
<b class="">Subject:</b><span class="x_Apple-converted-space"> </span>Re: [Beowulf] Project Heron at the Sanger Institute [EXT]</font>
<div class=""> </div>
</div>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; word-wrap:break-word; line-break:after-white-space">
<br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On 4 Feb 2021, at 10:14, Jonathan Aquilina via Beowulf &lt;<a href="mailto:beowulf@beowulf.org" class="">beowulf@beowulf.org</a>&gt; wrote:</div>
<br class="x_x_Apple-interchange-newline">
<div class="">
<div class="" style="font-style:normal; font-variant-caps:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; text-decoration:none; font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; background-color:rgb(255,255,255)">
I am curious, though: to chunk up such large data, is something like Hadoop/HBase or a similar platform what is being used?</div>
<br class="x_x_Apple-interchange-newline">
</div>
</blockquote>
</div>
<br class="">
<div class="">It’s a combination of our home-grown sequencing pipeline, which we use across the board, and then a specific COG-UK analysis of the genomes themselves. That analysis pipeline is common to all consortium members who are contributing sequence data. It’s
a Nextflow pipeline, and the code is here:</div>
<div class=""><br class="">
</div>
<div class=""><a href="https://github.com/connor-lab/ncov2019-artic-nf" class="">https://github.com/connor-lab/ncov2019-artic-nf</a></div>
<div class=""><br class="">
</div>
<div class="">Being a Nextflow pipeline, it can run on anything for which Nextflow has a backend scheduler. It supports data from both Illumina and Oxford Nanopore sequencers.</div>
<div class=""><br class="">
</div>
<div class="">Tim</div>
-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</body>
</html>