[Beowulf] Considering BeeGFS for parallel file system

Mon Mar 18 12:32:37 PDT 2019

On Mon, Mar 18, 2019 at 8:52 AM Will Dennis <wdennis at nec-labs.com> wrote:
>
> I am considering using BeeGFS for a parallel file system for one (and if successful, more) of our clusters here. Just wanted to get folks’ opinions on that, and if there are any “gotchas” or better-fit solutions out there... The first cluster I am considering it for has ~50TB storage off a single ZFS server serving the data over NFS currently; looking to increase not only storage capacity, but also I/O speed. The cluster nodes that are consuming the storage have 10GBASE-T interconnects, as does the ZFS server. As we are a smaller shop, we want to keep the solution simple. BeeGFS was recommended to me as a good solution on another list, and I wanted to get people’s opinions on this list.

We're in the midst of migrating our cluster storage from a, err,
network appliance to BeeGFS.  We currently have 4 storage servers (2
HA pairs) and 2 metadata servers (each running 4 metadata threads,
mirrored between the servers) serving 1.4PB of available space.  As
configured, we've seen the system put out over 600,000 IOPS and
aggregate read speeds of over 12,000MB/s.  We're actually going to be
adding 6 more storage servers and 2 more metadata servers in the near
future.  So, yeah, we're pretty happy with it.  One rather nice
feature is the ability to see, at any point, which users and/or hosts
are generating the most load.
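
For a rough sense of scale, the aggregate read figure above can be
split across the four storage servers.  This is only a
back-of-the-envelope sketch: it assumes the load is spread evenly
across servers (not stated in the post) and compares the result with
the nominal rate of a single 10 Gbit/s link, ignoring protocol
overhead.  The variable names are illustrative.

```python
# Figures quoted in the post.
aggregate_read_mb_s = 12_000   # "over 12,000MB/s" aggregate read
storage_servers = 4            # current number of storage servers

# Assumption: reads are spread evenly across the storage servers.
per_server_mb_s = aggregate_read_mb_s / storage_servers

# Nominal 10 Gbit/s link with no protocol overhead: 10,000 Mbit/s / 8.
ten_gbe_mb_s = 10_000 / 8

print(per_server_mb_s)  # 3000.0
print(ten_gbe_mb_s)     # 1250.0
```

Under that even-spread assumption, each storage server would be
delivering roughly 3,000 MB/s, which is more than one nominal
10 Gbit/s link can carry.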

That being said, there are currently a few gotchas/pain points:

1) We're using ZFS under BeeGFS, and the storage servers are rather
cycle hungry.  If you go that route, get boxes with lots of fast
cores.

2) In previous versions, you could mix and match point releases
between servers and clients -- as long as the major version was the
same, you were fine.  As of v7, that's no longer the case.  IOW,
moving from 7.0 to 7.1 requires unmounting all the clients, shutting
down all the daemons, updating all the software, and then restarting
everything.  Painful.

3) Also as of v7, the mgmtd service is *critical*.  Any communication
interruption to/from the mgmtd results in the clients immediately
hanging.  And, unlike storage and metadata, there is currently no
mirroring/HA mechanism within BeeGFS for the mgmtd.
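
The version rule in point 2 can be written down as a small check.
This is a minimal sketch of the rule as described above, not
anything taken from BeeGFS itself: the function names and the
"major.minor" string format are assumptions made for illustration.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'major.minor[.patch]' string into (major, minor)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def compatible(server: str, client: str) -> bool:
    """Hypothetical helper encoding the rule from point 2."""
    s_major, s_minor = parse_version(server)
    c_major, c_minor = parse_version(client)
    if s_major != c_major:
        # Different major versions never mix.
        return False
    if s_major < 7:
        # Before v7: the same major version was enough.
        return True
    # As of v7: the point release has to match as well.
    return s_minor == c_minor


print(compatible("6.17", "6.18"))  # True
print(compatible("7.0", "7.1"))    # False
print(compatible("7.1", "7.1"))    # True
```

A check like this could be run against an inventory of installed
package versions before an upgrade to confirm that every server and
client will end up on the same point release.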

We do have a support contract and the folks from ThinkParQ are
responsive.  If you have more questions, please feel free to ask away.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF