<div dir="ltr">All our nodes, even most of our fileservers (non-DDN), boot statelessly (warewulf) and all local disks are managed by ZFS, either with JBOD controllers or with non-JBOD controllers configuring each disk as a 1 drive RAID0. So if at all possible, ZFS gets control of the raw disk.<div><br></div><div>ZFS has been extremely reliable. The only problems we have encountered was an underflow that broke quota's on one of our servers and a recent problem using a zvol as swap on CentOS 7.x. The ZFS on linux community is pretty solid at this point and it's nice to know that anything written to disk is correct. </div><div><br></div><div>Compute nodes use striping with no disk redundancy, storage nodes are almost all raidz3 (3 parity disks per vdev). Because we tend to use large drives, raidz3 gives us a cushion should a rebuild from a failed drive take a long time on a full filesystem. There are some mirrors in a few places, we even have the occasional workstation where we've set up a 3 disk mirror to provide extra protection for some critical data and work.</div><div><br></div><div>jbh<br><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Feb 14, 2017 at 1:45 PM Jörg Saßmannshausen <<a href="mailto:j.sassmannshausen@ucl.ac.uk">j.sassmannshausen@ucl.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi John,<br class="gmail_msg">
jbh

On Tue, Feb 14, 2017 at 1:45 PM Jörg Saßmannshausen <j.sassmannshausen@ucl.ac.uk> wrote:
> Hi John,
>
> thanks for the very interesting and informative post.
> I am looking into large storage space right now as well, so this came really
> timely for me! :-)
>
> One question: I have noticed you were using ZFS on Linux (CentOS 6.8). What
> are your experiences with this? Does it work reliably? How did you configure the
> file space?
> From what I have read, the best way of setting up ZFS is to give ZFS direct
> access to the discs and then install the ZFS 'raid5' or 'raid6' on top of
> that. Is that what you do as well?
>
> You can contact me offline if you like.
>
> All the best from London
>
> Jörg
>
> On Tuesday 14 Feb 2017 10:31:00 John Hanks wrote:
> > I can't compare it to Lustre currently, but in the theme of general, we
> > have 4 major chunks of storage:
> >
> > 1. (~500 TB) DDN SFA12K running GRIDScaler (GPFS), but without GPFS clients
> > on nodes; this is presented to the cluster through cNFS.
> >
> > 2. (~250 TB) SuperMicro 72-bay server running CentOS 6.8, ZFS presented
> > via NFS.
> >
> > 3. (~460 TB) SuperMicro 90-bay JBOD fronted by a SuperMicro 2U server
> > with 2 x LSI 3008 SAS/SATA cards, running CentOS 7.2, ZFS and BeeGFS
> > 2015.xx. BeeGFS clients on all nodes.
> >
> > 4. (~12 TB) SuperMicro 48-bay NVMe server, running CentOS 7.2, ZFS
> > presented via NFS (a minimal export sketch follows this list).
> >
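For the ZFS-over-NFS boxes in 2 and 4, the export side is just a ZFS property. A minimal sketch, with hypothetical pool, dataset and host names:

  # create a dataset and let ZFS manage the NFS export
  zfs create tank/export
  zfs set sharenfs=on tank/export   # export options can also be set via this property

  # clients mount it like any other NFS share (assuming the mountpoint exists)
  mount -t nfs storage01:/tank/export /mnt/export

The real export options (allowed networks, root squashing, etc.) obviously depend on the site.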
> > Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls over
> > wheezing under load. ZFS/NFS single server falls over wheezing under
> > slightly less load. BeeGFS tends to fall over a bit more gracefully under
> > load. Number 4, NVMe, doesn't care what you do, your load doesn't impress
> > it at all, bring more.
> >
> > We move workloads around to whichever storage has free space and works best,
> > and put anything metadata- or random-I/O-ish that will fit onto the
> > NVMe-based storage.
> >
> > Now, in the theme of specific, why are we using BeeGFS and why are we
> > currently planning to buy about 4 PB of SuperMicro to put behind it? When
> > we asked about improving the performance of the DDN, one recommendation was
> > to buy GPFS client licenses for all our nodes. The quoted price was about
> > 100k more than we wound up spending on the 460 additional TB of SuperMicro
> > storage and BeeGFS, which performs as well or better. I fail to see the
> > inherent value of DDN/GPFS that makes it worth that much of a premium in
> > our environment. My personal opinion is that I'll take hardware over
> > licenses any day of the week. My general grumpiness towards vendors isn't
> > improved by the DDN looking suspiciously like a SuperMicro system when I
> > pull the shiny cover off. Of course, YMMV certainly applies here. But
> > there's also that incident where we had to do an offline fsck to clean up
> > some corrupted GPFS foo and the mmfsck tool had an assertion error, not a
> > warm fuzzy moment...
> >
> > Last example: we recently stood up a small test cluster built out of
> > workstations and threw some old 2 TB drives in every available slot, then
> > used BeeGFS to glue them all together. Suddenly there is a 36 TB filesystem
> > where before there was just old hardware. And as a bonus, it'll do
> > sustained 2 GB/s for streaming large writes. It's worth a look.
> >
> > jbh
> >
> > > On Tue, Feb 14, 2017 at 10:02 AM, Jon Tegner <tegner@renget.se> wrote:
> > > BeeGFS sounds interesting. Is it possible to say something general about
> > > how it compares to Lustre regarding performance?
> > >
> > > /jon
> > >
> > >
> > > On 02/13/2017 05:54 PM, John Hanks wrote:
> > >
> > > We've had pretty good luck with BeeGFS lately, running on SuperMicro
> > > vanilla hardware with ZFS as the underlying filesystem. It works pretty
> > > well for the cheap end of the hardware spectrum, and BeeGFS is free and
> > > pretty amazing. It has held up to abuse under a very mixed and heavy
> > > workload, and we can stream large sequential data into it fast enough to
> > > saturate a QDR IB link, all without any in-depth tuning. While we don't
> > > have redundancy (other than raidz3), BeeGFS can be set up with some
> > > redundancy between metadata servers and mirroring between storage.
> > > http://www.beegfs.com/content/
> > >
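On the ZFS side, a BeeGFS storage target is just a directory on a local filesystem, so it can sit on its own dataset on top of the raidz3 pool. A rough sketch with hypothetical pool, dataset and path names; the property choices are one reasonable option, not necessarily what we run:

  # one dataset per BeeGFS storage target
  zfs create -o mountpoint=/data/beegfs/storage01 tank/beegfs_storage01
  zfs set recordsize=1M tank/beegfs_storage01   # larger records suit big streaming writes
  zfs set xattr=sa tank/beegfs_storage01        # keep extended attribute lookups cheap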
> > > jbh
> > >
> > > On Mon, Feb 13, 2017 at 7:40 PM Alex Chekholko <alex.chekholko@gmail.com> wrote:
> > >> If you have a preference for Free Software, GlusterFS would work, unless
> > >> you have many millions of small files. It would also depend on your
> > >> available hardware, as there is not a 1-to-1 correspondence between a
> > >> typical GPFS setup and a typical GlusterFS setup. But at least it is free
> > >> and easy to try out. The mailing list is active, the software is now
> > >> mature (I last used GlusterFS a few years ago) and you can buy support
> > >> from Red Hat if you like.
> > >>
> > >> Take a look at the RH whitepapers about typical GlusterFS architecture.
> > >>
> > >> CephFS, on the other hand, is not yet mature enough, IMHO.
> > >> On Mon, Feb 13, 2017 at 8:31 AM Justin Y. Shi <shi@temple.edu> wrote:
> > >>
> > >> Maybe you would consider Scality (http://www.scality.com/) for your
> > >> growth concerns. If you need speed, DDN is faster in rapid data ingestion
> > >> and for extreme HPC data needs.
> > >>
> > >> Justin
> > >>
> > >> On Mon, Feb 13, 2017 at 4:32 AM, Tony Brian Albers <tba@kb.dk> wrote:
> > >>
> > >> On 2017-02-13 09:36, Benson Muite wrote:
> > >> > Hi,
> > >> >
> > >> > Do you have any performance requirements?
> > >> >
> > >> > Benson
> > >> >
> > >> > On 02/13/2017 09:55 AM, Tony Brian Albers wrote:
> > >> >> Hi guys,
> > >> >>
> > >> >> So, we're running a small (as in a small number of nodes (10), not
> > >> >> storage (170 TB)) Hadoop cluster here. Right now we're on IBM Spectrum
> > >> >> Scale (GPFS), which works fine and has POSIX support. On top of GPFS we
> > >> >> have a GPFS transparency connector so that HDFS uses GPFS.
> > >> >>
> > >> >> Now, if I'd like to replace GPFS with something else, what should I
> > >> >> use?
> > >> >>
> > >> >> It needs to be a fault-tolerant DFS, with POSIX support (so that users
> > >> >> can move data to and from it with standard tools).
> > >> >>
> > >> >> I've looked at MooseFS, which seems to be able to do the trick, but are
> > >> >> there any others that might do?
> > >> >>
> > >> >> TIA
> > >>
> > >> Well, we're not going to be doing a huge amount of I/O, so performance
> > >> requirements are not high. But ingest needs to be really fast; we're
> > >> talking tens of terabytes here.
> > >>
> > >> /tony
> > >>
> > >> --
> > >> Best regards,
> > >>
> > >> Tony Albers
> > >> Systems administrator, IT-development
> > >> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
> > >> Tel: +45 2566 2383 / +45 8946 2316
> > >
> > > --
> > > ‘[A] talent for following the ways of yesterday, is not sufficient to
> > > improve the world of today.’
> > >
> > > - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
> > >
>
>
> --
> *************************************************************
> Dr. Jörg Saßmannshausen, MRSC
> University College London
> Department of Chemistry
> 20 Gordon Street
> London
> WC1H 0AJ
>
> email: j.sassmannshausen@ucl.ac.uk
> web: http://sassy.formativ.net
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
‘[A] talent for following the ways of yesterday, is not sufficient to improve the world of today.’
 - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC