[Beowulf] Lustre Upgrades

Thu Jul 26 01:24:46 PDT 2018

Hi John,

Thanks. I should have said that this was one of the reasons I became
interested in BeeGFS, and this experience was some years ago. I believe at the
time I was not aware of BeeGFS.
In any case, that was at my old workplace, and at the current one we don't
have these demands on the hardware.

All the best

Jörg

On Thursday, 26 July 2018, 09:53:35 BST, John Hearns wrote:
> Jorg,
> you should look at BeeGFS and BeeOND (BeeGFS On Demand)  https://www.beegfs.io/wiki/BeeOND
> 
> On Thu, 26 Jul 2018 at 09:15, Jörg Saßmannshausen <sassy-work at sassy.formativ.net> wrote:
> > Dear all,
> > 
> > I once had this idea as well: using the spinning discs I had in the
> > compute nodes as part of a distributed scratch space. I used GlusterFS
> > for that, as I thought it might be a good idea. It was not. The reason is
> > that as soon as a job creates, say, 700 GB of scratch data (a real job,
> > not a fictional one!), the performance of the node hosting part of that
> > data approaches zero due to the high disc I/O. This meant that the job
> > running on that node was affected. So in the end this led to an
> > installation with a separate file server for the scratch space.
> > I should also add that this was a rather small setup of 8 nodes, and it
> > was a few years back.
> > The problem I found in computational chemistry is that some jobs require
> > either large amounts of memory, i.e. significantly more than the usual
> > 2 GB per core, or large amounts of scratch space (if there is insufficient
> > memory). You are in trouble if a job requires both. :-)
> > 
> > All the best from a still hot London
> > 
> > Jörg
> > 
> > On Tuesday, 24 July 2018, 17:02:43 BST, John Hearns via Beowulf wrote:
> > > Paul, thanks for the reply.
> > > I would like to ask, if I may. I rather like Gluster, but have not
> > > deployed it in HPC. I have heard a few people comment about Gluster not
> > > working well in HPC. Would you be willing to be more specific?
> > > 
> > > One research site I talked to did the classic 'converged infrastructure'
> > > idea of attaching storage drives to their compute nodes and distributing
> > > Gluster storage across them. They were not happy with that, I was told,
> > > and I can very much understand why. But I would be interested to hear
> > > about Gluster on dedicated servers.
> > > 
> > > On Tue, 24 Jul 2018 at 16:41, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
> > > > While I agree with you in principle, one also has to deal with the
> > > > reality one finds oneself in. In our case we have more experience with
> > > > Lustre than Ceph in HPC, and we got burned pretty badly by Gluster.
> > > > While I like Ceph in principle, I haven't seen it do what Lustre can
> > > > do in an HPC setting over IB. Now, it may be able to do that, which is
> > > > great. However, you then have to get your system set up to do that and
> > > > prove that it can. After all, users have a funny way of breaking
> > > > things that work amazingly well in controlled test environments,
> > > > especially when you have no control over how they will actually use
> > > > the system (as in a research environment).
> > > > Certainly we are working on exploring this option too, as it would be
> > > > awesome and save many headaches.
> > > > 
> > > > Anyway, no worries about you being a smartarse; it is a valid point.
> > > > One just needs to consider the realities on the ground in one's own
> > > > environment.
> > > > 
> > > > -Paul Edmon-
> > > > 
> > > > On 07/24/2018 10:31 AM, John Hearns via Beowulf wrote:
> > > > 
> > > > Forgive me for saying this, but the philosophy of software-defined
> > > > storage such as Ceph and Gluster is that forklift-style upgrades
> > > > should not be necessary.
> > > > When a storage server is to be retired, the data is copied onto the
> > > > new server and then the old one is taken out of service. Well,
> > > > 'copied' is not the correct word, as there are erasure-coded copies of
> > > > the data. 'Rebalanced' is probably a better word.
> > > > 
> > > > Sorry if I seem to be a smartarse. I have gone through the pain of
> > > > forklift-style upgrades in the past when storage arrays reached end of
> > > > life.
> > > > I just really like the software-defined storage mantra: no component
> > > > should be a single point of failure.
> > > > 
> > > > 
> > > > _______________________________________________
> > > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > > > To change your subscription (digest mode or unsubscribe) visit
> > > > http://www.beowulf.org/mailman/listinfo/beowulf
> > > > 