[Beowulf] NFS alternative for 200 core compute (beowulf) cluster
Bernd Schubert
bernd.schubert at fastmail.fm
Thu Aug 10 19:29:06 UTC 2023
On 8/10/23 21:18, leo camilo wrote:
> Hi everyone,
>
> I was hoping I would seek some sage advice from you guys.
>
> At my department we have built this small prototyping cluster with 5
> compute nodes, 1 name node and 1 file server.
>
> Up until now, the name node contained the scratch partition, which
> consisted of 2x4TB HDDs forming an 8 TB striped ZFS pool. The pool is
> shared to all the nodes over NFS. The name node and compute nodes are
> connected with both Cat6 ethernet cable and InfiniBand. Each compute
> node has 40 cores.
>
> Recently I attempted to launch computations from every node at once
> (40 tasks per node, one computation per node), and the performance was
> abysmal. I then realised this was due to very poor NFS performance; I
> reckon I might have reached the limits of NFS. I am not using stateless
> nodes, so each node has about 200 GB of local SSD storage, and running
> directly from there was a lot faster.
>
> So, to solve the issue, I reckon I should replace NFS with something
> better. I have ordered 2x4TB NVMe drives for the new scratch space,
> and I was thinking of:
>
> * using the 2x4TB NVMe drives in a striped ZFS pool, exported with a
> single-node GlusterFS instead of NFS
> * using the 2x4TB NVMe drives with GlusterFS in a distributed
> arrangement (still single node)
>
> Some people told me to use Lustre, but I reckon that might be
> overkill, since I would only use a single file server machine (1 node).
>
> Could you guys give me some sage advice here?
>
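For reference, the striped scratch pool described above would be created roughly as follows (pool name, device names, and the export network are placeholders, not anything from the original setup; note that a striped pool has no redundancy, so it is only suitable for scratch data):

```shell
# Create a striped (RAID-0 equivalent) ZFS pool from two NVMe drives.
# WARNING: no redundancy - losing either drive loses the whole pool,
# which is acceptable only for scratch space. Device names are examples.
zpool create scratch /dev/nvme0n1 /dev/nvme1n1

# One way to export it over NFS is ZFS's sharenfs property, which on
# Linux passes exportfs-style options; the subnet below is illustrative.
zfs set sharenfs="rw=@10.0.0.0/24,no_root_squash" scratch
```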
So GlusterFS is using FUSE, which doesn't have the best performance
reputation (although hopefully not for much longer - feel free to
search for "fuse" + "uring").
If you want to avoid the complexity of Lustre, maybe look into BeeGFS.
Well, I would recommend looking into it anyway (as a former developer
I'm biased, of course ;) ).
Cheers,
Bernd