[Beowulf] NFS alternative for 200 core compute (beowulf) cluster

Michael DiDomenico mdidomenico4 at gmail.com
Thu Aug 10 21:51:44 UTC 2023


i would definitely look more at tuning nfs and the backend disks rather
than going down the rabbit hole of gluster/lustre/beegfs.  you only have
five nodes.  nfs is a hog, but you're not likely to bottleneck the nfs
protocol with only five nodes.
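
for example, the linux defaults are pretty conservative.  something like
the below is a reasonable starting point -- the exact knobs depend on
your distro and kernel, and the hostnames/paths are just placeholders,
so treat it as a sketch rather than gospel:

  # on the server: bump the nfsd thread count (default is often 8),
  # e.g. in /etc/nfs.conf
  #   [nfsd]
  #   threads=64

  # on the clients: larger transfer sizes and several tcp connections
  # (nconnect needs a reasonably recent kernel, roughly 5.3+)
  mount -t nfs -o vers=4.2,nconnect=8,rsize=1048576,wsize=1048576 \
      fileserver:/scratch /scratch

it's also worth checking that the nfs traffic actually goes over the
infiniband (ipoib) interface rather than the cat6 link.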

but for anyone here to give you better advice you'd have to share what
you're doing in more detail :)

On Thu, Aug 10, 2023 at 4:04 PM leo camilo <lhcamilo at gmail.com> wrote:
>
> Awesome, thanks for the info!
>
> Best,
>
> leo
>
> On Thu, 10 Aug 2023 at 22:01, Jeff Johnson <jeff.johnson at aeoncomputing.com> wrote:
>>
>> Leo,
>>
>> Both BeeGFS and Lustre require a backend file system on the disks themselves. Both Lustre and BeeGFS support ZFS backend.
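>>
>> For example, on top of a striped ZFS pool over the two NVMe drives (pool
>> and dataset names here are just placeholders):
>>
>>   # dataset to hand to BeeGFS as its storage target; a large recordsize
>>   # and no atime updates tend to suit streaming HPC I/O
>>   zfs create -o recordsize=1M -o atime=off scratch/beegfs
>>
>> and then point the BeeGFS storage service at /scratch/beegfs when you set
>> it up.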
>>
>> --Jeff
>>
>>
>> On Thu, Aug 10, 2023 at 1:00 PM leo camilo <lhcamilo at gmail.com> wrote:
>>>
>>> Hi there,
>>>
>>> thanks for your response.
>>>
>>> BeeGFS indeed looks like a good option, though realistically I can only afford to use a single node/server for it.
>>>
>>> Would it be feasible to use ZFS as the volume manager coupled with BeeGFS for the shares, or should I write ZFS off altogether?
>>>
>>> thanks again,
>>>
>>> best,
>>>
>>> leo
>>>
>>> On Thu, 10 Aug 2023 at 21:29, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
>>>>
>>>>
>>>>
>>>> On 8/10/23 21:18, leo camilo wrote:
>>>> > Hi everyone,
>>>> >
>>>> > I was hoping I would seek some sage advice from you guys.
>>>> >
>>>> > At my department we have built this small prototyping cluster with 5
>>>> > compute nodes, 1 name node and 1 file server.
>>>> >
>>>> > Up until now, the name node has contained the scratch partition, which
>>>> > consists of 2x 4 TB HDDs forming an 8 TB striped ZFS pool. The pool is
>>>> > shared to all the nodes using NFS. The compute nodes and the name node
>>>> > are connected with both Cat6 Ethernet cable and InfiniBand. Each
>>>> > compute node has 40 cores.
>>>> >
>>>> > Recently I attempted to launch a computation from each node (40 tasks
>>>> > per node, so 1 computation per node), and the performance was abysmal.
>>>> > I reckon I might have reached the limits of NFS.
>>>> >
>>>> > I then realised that this was due to very poor NFS performance. I am
>>>> > not using stateless nodes, so each node has about 200 GB of SSD
>>>> > storage, and running directly from there was a lot faster.
>>>> >
>>>> > So, to solve the issue, I reckon I should replace NFS with something
>>>> > better. I have ordered 2x 4 TB NVMe drives for the new scratch and I
>>>> > was thinking of:
>>>> >
>>>> >   * using the 2x 4 TB NVMe drives in a striped ZFS pool with a
>>>> >     single-node GlusterFS on top to replace NFS (rough sketch below)
>>>> >   * using the 2x 4 TB NVMe drives with GlusterFS in a distributed
>>>> >     arrangement (still single node)
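>>>> >
>>>> > For the first option, roughly what I have in mind (device, pool and
>>>> > volume names are just placeholders):
>>>> >
>>>> >   # striped ZFS pool over the two NVMe drives
>>>> >   zpool create scratch /dev/nvme0n1 /dev/nvme1n1
>>>> >
>>>> >   # single-brick GlusterFS volume on a directory inside the pool
>>>> >   gluster volume create gv_scratch fileserver:/scratch/gv0
>>>> >   gluster volume start gv_scratch
>>>> >
>>>> >   # on each compute node
>>>> >   mount -t glusterfs fileserver:/gv_scratch /scratch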
>>>> >
>>>> > Some people told me to use Lustre, but I reckon that might be overkill,
>>>> > and I would only use a single file server machine (1 node).
>>>> >
>>>> > Could you guys give me some sage advice here?
>>>> >
>>>>
>>>> So GlusterFS uses fuse, which doesn't have the best performance
>>>> reputation (although hopefully not for much longer - feel free to search
>>>> for "fuse" + "uring").
>>>>
>>>> If you want to avoid the complexity of Lustre, maybe look into BeeGFS.
>>>> Well, I would recommend looking into it anyway (as a former developer
>>>> I'm biased, of course ;) ).
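>>>>
>>>> For a single file server it is not much work to set up -- roughly the
>>>> following, from memory (paths, IDs and hostnames are just examples;
>>>> please check the BeeGFS quick start guide for the exact invocations):
>>>>
>>>>   # all services on the file server
>>>>   /opt/beegfs/sbin/beegfs-setup-mgmtd   -p /data/beegfs/mgmtd
>>>>   /opt/beegfs/sbin/beegfs-setup-meta    -p /data/beegfs/meta -s 2 -m localhost
>>>>   /opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/storage -s 3 -i 301 -m localhost
>>>>
>>>>   # on each compute node
>>>>   /opt/beegfs/sbin/beegfs-setup-client -m <your file server>
>>>>   systemctl start beegfs-helperd beegfs-client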
>>>>
>>>>
>>>> Cheers,
>>>> Bernd
>>>>
>>
>>
>>
>> --
>> ------------------------------
>> Jeff Johnson
>> Co-Founder
>> Aeon Computing
>>
>> jeff.johnson at aeoncomputing.com
>> www.aeoncomputing.com
>> t: 858-412-3810 x1001   f: 858-412-3845
>> m: 619-204-9061
>>
>> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>>
>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

