[Beowulf] Putting /home on Lustre or GPFS

Prentice Bisbal prentice.bisbal at rutgers.edu
Wed Dec 24 07:54:12 PST 2014


Thanks for the feedback you've provided to my query below. I'm glad I'm 
not the only one who thought of this, and a lot of you raised very good 
points I haven't thought about. While I've been following parallel 
filesystems for years, I have very little experience actually managing 
them up to this point. (My BG/P came with a GPFS filesystem for /scratch, 
but everything was already set up before I got here, so I've only had to 
deal with it when something breaks.)

You've all convinced me that this may not be an ideal arrangement, 
but if I go this route, GPFS might be a better fit for this 
than Lustre (mainly because Chris Samuels has proven it *is* possible 
with GPFS, and GPFS has snapshotting).

Joe Landman, as always, has provided a wealth of information, and the 
rest of you have pointed out other potential pitfalls with this approach.

Thanks again for the feedback, and please keep the conversation going.


On 12/23/2014 12:12 PM, Prentice Bisbal wrote:
> Beowulfers,
> I have limited experience managing parallel filesystems like GPFS or 
> Lustre. I was discussing putting /home and /usr/local for my cluster 
> on a GPFS or Lustre filesystem, in addition to using it just for 
> /scratch. I've never done this before, but it doesn't seem like all 
> that bad an idea. My logic for this is the following:
> 1. Users often try to run programs from within /home, which leads to 
> errors, no matter how many times I tell them not to do that. This 
> would make the system more user-friendly. I could use quotas/policies 
> to 'steer' them to other filesystems if needed.
> 2. Having one storage system to manage is much better than 3.
> 3. Profit?
> Anyway, another person in the conversation felt that this would be 
> bad, because if someone was running a job that would hammer the 
> filesystem, it would make the filesystem unresponsive, and keep other 
> people from logging in and doing work. I'm not buying this concern for 
> the following reasons:
> If a job can hammer your parallel filesystem so that the login nodes 
> become unresponsive, you've got bigger problems, because that means 
> other jobs can't run on the cluster, and the job hitting the 
> filesystem hard has probably slowed down to a crawl, too.
> I know there are some concerns with the stability of parallel 
> filesystems, so if someone wants to comment on the dangers of that, 
> too, I'm all ears. I think that the relative instability of parallel 
> filesystems compared to NFS would be the biggest concern, not 
> performance.
