[Beowulf] Lustre Upgrades

Paul Edmon pedmon at cfa.harvard.edu
Mon Jul 23 11:19:04 PDT 2018


The main issue we see is that OSTs occasionally get hung up, which 
causes writes to hang as the OST flaps, connecting to and disconnecting 
from the MDS.  Rebooting the OSSes fixes the issue, as it forces a remount.  
It seems to happen only when the filesystem is nearly full (i.e. above 95% 
usage) and under heavy load.  Before our CentOS 7 upgrade we didn't see 
this issue, so we are convinced it is due to the mismatch in Lustre 
versions.  That said, the fullness of the filesystem is almost certainly 
contributing, as the problem seems to go away when usage is lower.  Still, 
I have seen it a few times when the filesystem was at 85%.
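
Since the hangs track how full the OSTs are, a small check on per-OST 
usage can warn before things get into the danger zone.  The following is 
only a rough sketch, assuming text in the usual `lfs df` column layout 
(UUID, 1K-blocks, Used, Available, Use%, Mounted on).  The function name 
full_osts, the 85% default threshold, and the sample data are made up for 
illustration and are not output from our system.

```python
# Sketch: flag OSTs whose fill level is at or above a threshold.
# Assumes `lfs df`-style rows: UUID, 1K-blocks, Used, Available, Use%, Mounted on.

def full_osts(lfs_df_text, threshold_pct=85):
    """Return (uuid, use_pct) pairs for OSTs at or above threshold_pct."""
    flagged = []
    for line in lfs_df_text.splitlines():
        fields = line.split()
        # Only OST rows are of interest; skip headers, MDT rows, and summaries.
        if len(fields) < 5 or "OST" not in fields[0]:
            continue
        use = fields[4]
        if not use.endswith("%"):
            continue
        pct = int(use.rstrip("%"))
        if pct >= threshold_pct:
            flagged.append((fields[0], pct))
    return flagged


# Hypothetical sample data, invented for this example.
SAMPLE = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
scratch-MDT0000_UUID     1000000      100000      900000  10% /mnt/scratch[MDT:0]
scratch-OST0000_UUID     1000000      960000       40000  96% /mnt/scratch[OST:0]
scratch-OST0001_UUID     1000000      700000      300000  70% /mnt/scratch[OST:1]
scratch-OST0002_UUID     1000000      860000      140000  86% /mnt/scratch[OST:2]
"""

if __name__ == "__main__":
    for uuid, pct in full_osts(SAMPLE):
        print(f"{uuid}: {pct}% full")
```

In practice the text would come from running `lfs df` against the mount 
point on a client, and the threshold would be tuned to wherever the 
trouble starts on a given system.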

Anyway, the obvious culprit is the version mismatch.  It may also be 
that some of the additional features/enhancements in 2.5.34 
conflict with the mainline version, as 2.5.34 is something we got 
from Intel for the IEEL appliance we have been running.

Odds are your systems are fine, as they aren't taking quite the pounding 
ours is.  The problem doesn't happen that frequently.

-Paul Edmon-


On 07/23/2018 02:03 PM, Michael Di Domenico wrote:
> On Mon, Jul 23, 2018 at 1:34 PM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>> Yeah, we've found out firsthand that it's problematic, as we have been seeing
>> issues :).  Hence the urge to upgrade.
> what issues are you seeing?  I have 2.10.4 clients pointing at 2.5.1
> servers, haven't seen any obvious issues and it's been running for
> some time now.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


