[Beowulf] NFS alternative for 200 core compute (beowulf) cluster

leo camilo lhcamilo at gmail.com
Thu Aug 10 19:53:21 UTC 2023


Hi Robert,

Thanks for your reply.

I am pretty sure the storage is going over ethernet (cat6 gigabit; 10-gig
copper is coming soon, maybe). I was not aware I could use NFS over IB.
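
From a quick search, it looks like the client side is just a mount option
once the RDMA transport is enabled on the NFS server, something along the
lines of (hostname and paths here are only placeholders):

mount -t nfs -o rdma,port=20049 fileserver-ib:/scratch /mnt/scratch

Does that match what you had in mind? I suppose plain IPoIB with a normal
TCP mount would already beat gigabit either way.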

I will try running the tests over the weekend.

thanks for the tip.

Best,

leo

On Thu, 10 Aug 2023 at 21:43, Robert Taylor <rgt at wi.mit.edu> wrote:

> Two 4tb spinning drives are not going to have a lot of throughput, and
> with 40 tasks all working on different files, if it's random IO, I think
> they will get crushed.
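>
> If you want to put a number on that and fio happens to be installed on a
> node, a rough stand-in for that 40-task random-read pattern would be
> something like this (the directory is just wherever the NFS share is
> mounted):
>
> fio --name=nfs-randread --directory=/mnt/nfsshare --rw=randread --bs=4k \
>     --size=256m --numjobs=40 --runtime=60 --time_based --group_reporting
>
> I'd expect two striped spinning disks to manage a few hundred 4k IOPS at
> best under a load like that.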
>
> What are the sequential read and write rates from any one node doing
> single threaded io to the nfs server?
>
> Can you do a dd test?
> This should write a 1gig file straight from memory on the node it is run
> on.
>
> dd if=/dev/zero of=/mnt/nfsshare/testfile bs=1M count=1000
>
> (testfile is any file name on the NFS mount; also make sure zfs compression
> is off, or writing zeroes will give bogus numbers)
> You should get a time summary, and a throughput speed.
> That is pure sequential IO that comes from memory, which is probably the
> best that one machine can do, (unless the dd becomes cpu bound).
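>
> The read direction is worth measuring too. Something like this, reusing the
> test file from the write above (adjust the name to whatever you used) and
> dropping the client's page cache first as root, so you measure the wire and
> the disks rather than RAM:
>
> sync; echo 3 > /proc/sys/vm/drop_caches
> dd if=/mnt/nfsshare/testfile of=/dev/null bs=1M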
>
> We have some high-end NetApp and Isilon storage systems where I work, and
> I've gotten between 400 MB/s and 1 GB/s out of NFS; the 1 GB/s case, I
> believe, was bottlenecked at the source node, because all it had was a
> 10-gig connection to the network. Once I can get the nodes to 25G, I will
> test again, but I'm not there yet.
>
> Also, are you sure the storage is going over IB and not GigE? (Is the cat6
> link 1-gig ethernet, or do you have copper 10-gig?)
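>
> One quick way to check which network the mount actually uses: nfsstat -m
> shows the server address in the mount options (addr=...), and then on the
> client something like
>
> ip route get 10.0.0.10
>
> (substituting the address nfsstat reported) prints a "dev" field telling
> you which interface the NFS traffic really goes out on.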
>
> rgt
>
> On Thu, Aug 10, 2023 at 3:29 PM Bernd Schubert <bernd.schubert at fastmail.fm>
> wrote:
>
>>
>>
>> On 8/10/23 21:18, leo camilo wrote:
>> > Hi everyone,
>> >
>> > I was hoping I would seek some sage advice from you guys.
>> >
>> > At my department we have built this small prototyping cluster with 5
>> > compute nodes, 1 name node and 1 file server.
>> >
>> > Up until now, the name node contained the scratch partition, which
>> > consists of 2x4TB HDDs forming an 8 TB striped zfs pool. The pool is
>> > shared to all the nodes using nfs. The compute nodes and the name node
>> > are connected with both cat6 ethernet cable and infiniband. Each
>> > compute node has 40 cores.
>> >
>> > Recently I attempted to launch a computation from each node (40 tasks
>> > per node, i.e. 1 computation per node), and the performance was
>> > abysmal. I reckon I might have reached the limits of NFS.
>> >
>> > I then realised the slowdown was due to very poor NFS performance. I am
>> > not using stateless nodes, so each node has about 200 GB of local SSD
>> > storage, and running directly from there was a lot faster.
>> >
>> > So, to solve the issue, I reckon I should replace NFS with something
>> > better. I have ordered 2x4TB NVMe drives for the new scratch and I was
>> > thinking of:
>> >
>> >   * putting the 2x4TB NVMe drives in a striped ZFS pool and using a
>> >     single-node GlusterFS on top of it to replace NFS
>> >   * using the 2x4TB NVMe drives directly as GlusterFS bricks in a
>> >     distributed arrangement (still a single node)
>> >
>> > Some people told me to use Lustre, but I reckon that might be overkill,
>> > and I would only be using a single file server machine (1 node).
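>> >
>> > For the single-node GlusterFS option, what I have in mind is roughly the
>> > following (hostname and brick path are just placeholders):
>> >
>> >   gluster volume create scratch fileserver:/tank/scratch/brick force
>> >   gluster volume start scratch
>> >   mount -t glusterfs fileserver:/scratch /mnt/scratch   # on each node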
>> >
>> > Could you guys give me some sage advice here?
>> >
>>
>> So glusterfs uses fuse, which doesn't have the best performance
>> reputation (although hopefully not for much longer - feel free to search
>> for "fuse" + "uring").
>>
>> If you want to avoid the complexity of Lustre, maybe look into BeeGFS.
>> Well, I would recommend looking into it anyway (as a former developer I'm
>> admittedly biased ;) ).
>>
>>
>> Cheers,
>> Bernd
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>

