<div dir="ltr">If I can help, I'm inside IBM. I'm the marketing lead for IBM Spectrum Scale (aka GPFS), but I have solid connections to the field tech support and development teams.<div><br></div><div>my corporate email is <a href="mailto:douglasof@us.ibm.com">douglasof@us.ibm.com</a></div><div><br></div><div>IBM just announced that HortonWorks will be supported on IBM Spectrum Scale. IBM has a lot of development focus on the Hadoop/Spark use case.</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 14, 2017 at 12:00 PM, Jeffrey Layton <span dir="ltr"><<a href="mailto:laytonjb@gmail.com" target="_blank">laytonjb@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div><div><div>Of course there are tons of options depending upon what you want and your IO patterns of the applications.<br><br></div>Doug's comments about HDFS are great - he's a very good expert in this area.<br><br></div>Depending upon your IO patterns and workload, NFS may work well. I've found it work quite well unless you have a bunch of clients really hammering it. There are some tuning options you can use to improve this behavior (i.e. more clients beating on it before it collapses). It's good to have lots of memory in the NFS server. Google for "Dell, NSS" and you should find some documents on tuning options that Dell created that work VERY well.<br><br></div>Another option for NFS is to consider using async mounts. This can definitely increase performance but you just have to be aware of the downside - if the server goes down, you could lose data from the clients (data in flight). But I've seen some massive performance gains when using async mounts.<br><br></div>BTW - if you have IB, consider using NFS with IPoIB. This can boost performance as well. The recent kernels have RDMA capability for NFS.<br><br></div>If you need encryption over the wire, then consider sshfs. It uses FUSE so you can mount directories from any host you have SSH access (be sure to NOT use password-less SSH :) ). There are some pretty good tuning options for it as well.<br><br></div>For distributed file systems there are some good options: Lustre, BeeGFS, OrangeFS, Ceph, Gluster, Moose, OCFS2, etc. (my apologies to any open-source file systems that I've forgotten). I personally like all of them :) I've used Lustre, BeeGFS, and OrangeFS in current and past lives. I've found BeeGFS to be very easy to configure. The performance seems to be on par with Lustre for the limited testing I did but it's always best to test your own applications (that's true for any file system or storage solution).<br><br></div>There are also commercial solutions that should not be ignored if you want to go that route. 
For distributed file systems there are some good options: Lustre, BeeGFS, OrangeFS, Ceph, Gluster, Moose, OCFS2, etc. (my apologies to any open-source file systems that I've forgotten). I personally like all of them :) I've used Lustre, BeeGFS, and OrangeFS in current and past lives. I've found BeeGFS to be very easy to configure. Its performance seems to be on par with Lustre for the limited testing I did, but it's always best to test your own applications (that's true for any file system or storage solution).

There are also commercial solutions that should not be ignored if you want to go that route. There are a bunch of them out there - GPFS, Panasas, Scality, and others.

I hope some of these pointers help.

Jeff


On Tue, Feb 14, 2017 at 5:47 AM, John Hanks <griznog@gmail.com> wrote:

Should have included this in my last message:

https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS

One other aspect of ZFS I overlooked in my earlier messages is the built-in compression. At one point I backed up 460 TB of data from our GPFS system onto ~300 TB of space on a ZFS system using gzip-9 compression on the target filesystem, thereby gaining compression that was transparent to the users. The benefits of ZFS are really too numerous to cover, and the flexibility it adds for managing storage opens up whole new solution spaces to explore. For me it is the go-to filesystem for the first layer on the disks.
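For what it's worth, once ZFS is installed (the wiki link above covers CentOS), that first layer is only a couple of commands; the pool, dataset, and disk names here are just placeholders:

  # Build a raidz2 pool directly on whole disks (use /dev/disk/by-id names
  # so the pool survives device renumbering)
  zpool create tank raidz2 \
      /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
      /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4 \
      /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6

  # Turn on gzip-9 compression for a backup dataset and check what it saves
  zfs create -o compression=gzip-9 tank/backup
  zfs get compressratio tank/backup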
jbh


On Tue, Feb 14, 2017 at 4:16 PM Tony Brian Albers <tba@kb.dk> wrote:

On 2017-02-14 11:44, Jörg Saßmannshausen wrote:
> Hi John,
>
> thanks for the very interesting and informative post.
> I am looking into large storage space right now as well, so this came really
> timely for me! :-)
>
> One question: I have noticed you were using ZFS on Linux (CentOS 6.8). What
> are your experiences with this? Does it work reliably? How did you configure the
> file space?
> From what I have read, the best way of setting up ZFS is to give ZFS direct
> access to the disks and then build the ZFS 'raid5' or 'raid6' on top of
> that. Is that what you do as well?
>
> You can contact me offline if you like.
>
> All the best from London
>
> Jörg
>
> On Tuesday 14 Feb 2017 10:31:00 John Hanks wrote:
>> I can't compare it to Lustre currently, but in the theme of general, we
>> have 4 major chunks of storage:
>>
>> 1. (~500 TB) DDN SFA12K running GRIDScaler (GPFS), but without GPFS clients
>> on the nodes; this is presented to the cluster through cNFS.
>>
>> 2. (~250 TB) SuperMicro 72-bay server running CentOS 6.8, ZFS presented
>> via NFS.
>>
>> 3. (~460 TB) SuperMicro 90-bay JBOD fronted by a SuperMicro 2U server
>> with 2 x LSI 3008 SAS/SATA cards, running CentOS 7.2, ZFS and BeeGFS
>> 2015.xx. BeeGFS clients on all nodes.
>>
>> 4. (~12 TB) SuperMicro 48-bay NVMe server, running CentOS 7.2, ZFS
>> presented via NFS.
>>
>> Depending on your benchmark, 1, 2 or 3 may be faster. GPFS falls over
>> wheezing under load. ZFS/NFS on a single server falls over wheezing under
>> slightly less load. BeeGFS tends to fall over a bit more gracefully under
>> load. Number 4, the NVMe box, doesn't care what you do; your load doesn't
>> impress it at all, bring more.
>>
>> We move workloads around to whichever storage has free space and works best,
>> and put anything metadata-heavy or random-I/O-ish that will fit onto the
>> NVMe-based storage.
>>
>> Now, in the theme of specific, why are we using BeeGFS and why are we
>> currently planning to buy about 4 PB of SuperMicro to put behind it? When
>> we asked about improving the performance of the DDN, one recommendation was
>> to buy GPFS client licenses for all our nodes. The quoted price was about
>> 100k more than we wound up spending on the 460 additional TB of SuperMicro
>> storage and BeeGFS, which performs as well or better. I fail to see the
>> inherent value of DDN/GPFS that makes it worth that much of a premium in
>> our environment. My personal opinion is that I'll take hardware over
>> licenses any day of the week. My general grumpiness towards vendors isn't
>> improved by the DDN looking suspiciously like a SuperMicro system when I
>> pull the shiny cover off. Of course, YMMV certainly applies here. But
>> there's also that incident where we had to do an offline fsck to clean up
>> some corrupted GPFS foo and the mmfsck tool had an assertion error - not a
>> warm fuzzy moment...
>>
>> Last example: we recently stood up a small test cluster built out of
>> workstations, threw some old 2 TB drives in every available slot, then
>> used BeeGFS to glue them all together. Suddenly there is a 36 TB filesystem
>> where before there was just old hardware. And as a bonus, it'll do
>> sustained 2 GB/s for streaming large writes. It's worth a look.
>>
>> jbh
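(As a rough sketch of what gluing machines together with BeeGFS like John describes involves: one management service, one or more metadata services, a storage service on every box contributing disks, and a client on each node. Hostnames, paths, and IDs below are placeholders, and the setup-script flags are from memory, so check the BeeGFS quickstart guide rather than trusting this verbatim.)

  # Management service (one node)
  /opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/mgmtd

  # Metadata service (one or more nodes, each with a unique numeric ID)
  /opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/meta -s 1 -m mgmt-host

  # Storage service on each box with disks (unique service and target IDs)
  /opt/beegfs/sbin/beegfs-setup-storage -p /mnt/disk1 -s 1 -i 101 -m mgmt-host

  # Clients only need to know where the management daemon lives
  /opt/beegfs/sbin/beegfs-setup-client -m mgmt-host

  # Then start beegfs-mgmtd, beegfs-meta and beegfs-storage on the servers,
  # and beegfs-helperd plus beegfs-client on the compute nodes.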

That sounds very interesting, I'd like to hear more about that. How did
you manage to use ZFS on CentOS?

/tony

--
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316
--
‘[A] talent for following the ways of yesterday, is not sufficient to improve the world of today.’
 - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf