[Beowulf] NFS alternative for 200 core compute (beowulf) cluster

leo camilo lhcamilo at gmail.com
Fri Aug 11 05:30:05 UTC 2023


hi there,

Thanks for the advice.

From the messages here, I think I have grokked how to proceed:

- Swap the HDDs for NVMe drives
- Replace the 1 Gb Ethernet with InfiniBand
- Configure NFS to use IPoIB or RDMA (rough sketch of what I have in mind below)
- Tune NFS
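
For the last two steps, this is roughly what I have in mind on the
config side. It is only a sketch: the export path, subnet, hostname and
thread count below are placeholders, and I still need to verify that the
RDMA transport behaves with my IB stack before settling on it.

    # --- on the file server ---
    # /etc/exports (placeholder path and subnet)
    /scratch  10.0.0.0/24(rw,async,no_subtree_check)

    modprobe svcrdma                             # server-side NFS/RDMA transport
    echo "rdma 20049" > /proc/fs/nfsd/portlist   # listen for NFS over RDMA on the IB fabric
    rpc.nfsd 64                                  # more nfsd threads for ~200 client cores
    exportfs -ra

    # --- on each compute node ---
    modprobe xprtrdma                            # client-side NFS/RDMA transport
    mount -t nfs -o vers=4.2,proto=rdma,port=20049,rsize=1048576,wsize=1048576 \
        fileserver:/scratch /scratch

    # IPoIB fallback if RDMA misbehaves:
    # mount -t nfs -o vers=4.2,proto=tcp,nconnect=8,rsize=1048576,wsize=1048576 \
    #     fileserver:/scratch /scratch

Even before any NFS tuning, the network swap alone should help: 1 Gb
Ethernet tops out around 120 MB/s, while a single NVMe drive can sustain
several GB/s, so behind the 1 Gb link the new drives would mostly sit idle.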

I will need to get my hands on Lustre eventually, but that can wait.

Thanks for the help

On Thu, 10 Aug 2023 at 23:47, Renfro, Michael <Renfro at tntech.edu> wrote:

> As the definitely-not-proud owner of a 2016 purchase of a 60-bay disk
> shelf attached to a single server with an Infiniband connection back to 54
> compute nodes, NFS on spinning disks can definitely handle 5 40-core jobs,
> but your particular setup really can’t. Mine has hit its limits at times as
> well, but it’s about the IOPS from the disk array, the speed of the SAS
> cable connecting the disk shelf to the server, everything **but** NFS
> itself.
>
>
>
> Swapping to NVMe should make a world of difference on its own, as long as
> you don’t have a bottleneck of 1 Gb Ethernet between your storage and the
> compute capacity.
>
>
>
> From: Beowulf <beowulf-bounces at beowulf.org> on behalf of leo camilo <lhcamilo at gmail.com>
> Date: Thursday, August 10, 2023 at 3:04 PM
> To: Jeff Johnson <jeff.johnson at aeoncomputing.com>
> Cc: Bernd Schubert <bernd.schubert at fastmail.fm>, Beowulf at beowulf.org <Beowulf at beowulf.org>
> Subject: Re: [Beowulf] NFS alternative for 200 core compute (beowulf) cluster
>
>
> Awesome, thanks for the info!
>
> Best,
>
>
>
> leo
>
>
>
> On Thu, 10 Aug 2023 at 22:01, Jeff Johnson <jeff.johnson at aeoncomputing.com>
> wrote:
>
> Leo,
>
>
>
> Both BeeGFS and Lustre require a backend file system on the disks
> themselves. Both Lustre and BeeGFS support ZFS backend.
>
>
>
> --Jeff
>
>
>
>
>
> On Thu, Aug 10, 2023 at 1:00 PM leo camilo <lhcamilo at gmail.com> wrote:
>
> Hi there,
>
> thanks for your response.
>
>
>
> BeeGFS indeed looks like a good option, though realistically I can
> only afford to use a single node/server for it.
>
> Would it be feasible to use ZFS as the volume manager coupled with
> BeeGFS for the shares, or should I write ZFS off altogether?
>
> thanks again,
>
> best,
>
> leo
>
>
>
> On Thu, 10 Aug 2023 at 21:29, Bernd Schubert <bernd.schubert at fastmail.fm>
> wrote:
>
>
>
> On 8/10/23 21:18, leo camilo wrote:
> > Hi everyone,
> >
> > I was hoping to get some sage advice from you guys.
> >
> > At my department we have built this small prototyping cluster with
> > 5 compute nodes, 1 name node and 1 file server.
> >
> > Up until now, the name node has held the scratch partition, which
> > consists of 2x4TB HDDs forming an 8 TB striped ZFS pool. The pool is
> > shared to all the nodes over NFS. The name node and the compute nodes
> > are connected with both Cat6 Ethernet and InfiniBand. Each compute
> > node has 40 cores.
> >
> > Recently I attempted to launch a computation from each node (40 tasks
> > per node, so 1 computation per node), and the performance was abysmal.
> > I reckon I might have reached the limits of NFS.
> >
> > I then realised that this was due to very poor performance from NFS.
> > I am not using stateless nodes, so each node has about 200 GB of local
> > SSD storage, and running directly from there was a lot faster.
> >
> > So, to solve the issue, I reckon I should replace NFS with something
> > better. I have ordered 2x4TB NVMe drives for the new scratch and I was
> > thinking of:
> >
> >   * using the 2x4TB NVMe drives in a striped ZFS pool and using
> >     single-node GlusterFS to replace NFS
> >   * using the 2x4TB NVMe drives with GlusterFS in a distributed
> >     arrangement (still single node)
> >
> > Some people told me to use Lustre, but I reckon that might be
> > overkill, and I would only be using a single file server machine
> > (1 node).
> >
> > Could you guys give me some sage advice here?
> >
>
> So GlusterFS uses FUSE, which doesn't have the best performance
> reputation (although hopefully not for long - feel free to search for
> "fuse" + "uring").
>
> If you want to avoid the complexity of Lustre, maybe look into BeeGFS.
> Well, I would recommend looking into it anyway (as a former developer
> I'm biased, of course ;) ).
>
>
> Cheers,
> Bernd
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
>
>
> --
>
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.johnson at aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>
>
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>
>

