[Beowulf] Lustre Upgrades
pedmon at cfa.harvard.edu
Mon Jul 23 11:19:04 PDT 2018
The main issue we see is that OSTs occasionally hang, which causes
writes to hang as the OST flaps, connecting to and disconnecting from
the MDS. Rebooting the OSSs fixes the issue as it forces a remount.
It seems to happen only when the filesystem is full (i.e. above 95%
usage) and under heavy load. Before our CentOS 7 upgrade we didn't see
this issue, so we are convinced it is due to the mismatch in Lustre
versions. That said, the fullness of the filesystem is almost certainly
contributing, as the problem seems to go away when filesystem usage is
lower. Still, I have seen it a few times when the filesystem was at
85%.
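For anyone who wants to watch for the fullness condition described
above, here is a minimal sketch in Python. It assumes the usual
`lfs df` column layout (UUID, 1K-blocks, Used, Available, Use%,
Mounted on); the function name, the sample text, and the 85% default
threshold are illustrative choices, not something taken from our
actual monitoring.

```python
def osts_over_threshold(lfs_df_output, threshold_pct=85):
    """Return (target, use_pct) pairs for OSTs at or above threshold_pct.

    Assumes whitespace-separated `lfs df` style rows where the first
    field is the target UUID and the fifth field is the Use% column.
    """
    flagged = []
    for line in lfs_df_output.splitlines():
        fields = line.split()
        # Skip the header, MDT rows, summary rows, and short/odd lines.
        if len(fields) < 6 or "OST" not in fields[0]:
            continue
        use = fields[4]
        if not use.endswith("%"):
            continue
        try:
            pct = int(use.rstrip("%"))
        except ValueError:
            continue
        if pct >= threshold_pct:
            flagged.append((fields[0], pct))
    return flagged


# Hypothetical sample text, made up for illustration only.
SAMPLE = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
testfs-MDT0000_UUID      1000000      100000      900000  10% /mnt/testfs[MDT:0]
testfs-OST0000_UUID     10000000     9600000      400000  96% /mnt/testfs[OST:0]
testfs-OST0001_UUID     10000000     5000000     5000000  50% /mnt/testfs[OST:1]
"""

if __name__ == "__main__":
    for target, pct in osts_over_threshold(SAMPLE):
        print(f"{target} is at {pct}%")
```

In practice the text would come from running `lfs df` on a client and
passing its output to the function; the parsing is deliberately
forgiving and simply skips rows it does not recognize.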
Anyway, the obvious culprit is the version mismatch. It may also be
that some of the additional features/enhancements in 2.5.34 conflict
with the mainline version, as 2.5.34 is something we got from Intel
for the IEEL appliance we have been running.
Odds are your systems are fine, as they aren't taking quite the
pounding ours is. The problem doesn't happen that frequently.
On 07/23/2018 02:03 PM, Michael Di Domenico wrote:
> On Mon, Jul 23, 2018 at 1:34 PM, Paul Edmon <pedmon at cfa.harvard.edu> wrote:
>> Yeah we've found out firsthand that it's problematic as we have been seeing
>> issues :). Hence the urge to upgrade.
> what issues are you seeing? I have 2.10.4 clients pointing at 2.5.1
> servers, haven't seen any obvious issues and it's been running for
> some time now.