[Beowulf] Considering BeeGFS for parallel file system

Jan Wender j.wender at web.de
Mon Mar 18 21:48:33 PDT 2019


I suggest also reading the license, because it is not a standard open source one. Depending on your situation this might not be an issue. As far as I remember:
- As a service provider you need a contract with Thinkparq to provide BeeGFS to others. 
- Thinkparq reserves for themselves the copyright on changes you make to the source code. 
Just some things to be aware of. 

In comparison, GPFS is totally closed source, but Lustre is GPL (v2). 

Cheerio, Jan 
Jan Wender - j.wender at web.de

> Am 18.03.2019 um 20:32 schrieb Joshua Baker-LePain <joshua.bakerlepain at gmail.com>:
>> On Mon, Mar 18, 2019 at 8:52 AM Will Dennis <wdennis at nec-labs.com> wrote:
>> I am considering using BeeGFS as a parallel file system for one (and, if successful, more) of our clusters here. I just wanted to get folks’ opinions on that, and whether there are any “gotchas” or better-fit solutions out there. The first cluster I’m considering it for currently has ~50TB of storage on a single ZFS server serving the data over NFS; I’m looking to increase not only storage capacity but also I/O speed. The cluster nodes that consume the storage have 10GBASE-T interconnects, as does the ZFS server. As we are a smaller shop, we want to keep the solution simple. BeeGFS was recommended to me as a good solution on another list, and I wanted to get people’s opinions from this list.
> We're in the midst of migrating our cluster storage from a, err,
> network appliance to BeeGFS.  We currently have 4 storage servers (2
> HA pairs) and 2 metadata servers (each running 4 metadata threads,
> mirrored between the servers) serving 1.4PB of available space.  As
> configured, we've seen the system put out over 600,000 IOPS and
> aggregate read speeds of over 12,000MB/s.  We're actually going to be
> adding 6 more storage servers and 2 more metadata servers in the near
> future.  So, yeah, we're pretty happy with it.  One rather nice
> feature is the ability to see, at any point, which users and/or hosts
> are generating the most load.
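[For reference, the per-host/per-user load view described above is exposed through the beegfs-ctl tool. A sketch, assuming a stock BeeGFS 7 install with beegfs-utils on a node that can reach the servers; exact flags can vary between versions, so check `beegfs-ctl --help`:

```shell
# Show which client hosts are sending the most requests to the storage
# servers, refreshed every 5 seconds, with read/write volume in MB
beegfs-ctl --clientstats --nodetype=storage --interval=5 --rwunit=MB

# The same view aggregated per user rather than per host, here against
# the metadata servers
beegfs-ctl --userstats --nodetype=meta --interval=5
```

Both modes poll the servers themselves, so they work even when the offending client is unresponsive. -ed.]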
> That being said, there are currently a few gotchas/pain points:
> 1) We're using ZFS under BeeGFS, and the storage servers are rather
> cycle hungry.  If you go that route, get boxes with lots of fast
> cores.
> 2) In previous versions, you could mix and match point releases
> between servers and clients -- as long as the major version was the
> same, you were fine.  As of v7, that's no longer the case.  IOW,
> moving from 7.0 to 7.1 requires unmounting all the clients, shutting
> down all the daemons, updating all the software, and then restarting
> everything.  Painful.
> 3) Also as of v7, the mgmtd service is *critical*.  Any communication
> interruption to/from the mgmtd results in the clients immediately
> hanging.  And, unlike storage and metadata, there is currently no
> mirroring/HA mechanism within BeeGFS for the mgmtd.
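[To make the upgrade pain in point 2 concrete: moving 7.0 to 7.1 is a full-stop upgrade of the whole cluster. A rough sketch, assuming systemd-managed services with the standard beegfs unit names and RPM-based servers (adjust the package step for apt); each command runs on the relevant set of nodes, not all from one host:

```shell
# 1) On every client: stop the client service (stopping beegfs-client
#    also unmounts the file system) and its helper daemon
systemctl stop beegfs-client beegfs-helperd

# 2) On the servers: stop all daemons, with mgmtd last
systemctl stop beegfs-storage beegfs-meta beegfs-mgmtd

# 3) Everywhere: update all BeeGFS packages to the new point release
yum update 'beegfs*'

# 4) Restart in reverse order: mgmtd first, then meta/storage, then clients
systemctl start beegfs-mgmtd
systemctl start beegfs-meta beegfs-storage
systemctl start beegfs-helperd beegfs-client
```

Until every node is on the new release, clients on the old point release will refuse to talk to upgraded servers, which is exactly why the whole cluster has to come down at once. -ed.]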
> We do have a support contract and the folks from Thinkparq are
> responsive.  If you have more questions, please feel free to ask away.
> -- 
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
