[Beowulf] Putting /home on Lustre or GPFS

Wed Dec 24 07:58:35 PST 2014

On 12/24/2014 10:54 AM, Prentice Bisbal wrote:
> Everyone,
>
> Thanks for the feedback you've provided to my query below. I'm glad 
> I'm not the only one who thought of this, and a lot of you raised very 
> good points I haven't thought about. While I've been following 
> parallel filesystems for years, I have very little experience actually 
> managing them up to this point. (My BG/P came with a GPFS filesystem for 
> /scratch, but everything was already set up before I got here, so I've 
> only had to deal with it when something breaks).
>
> You've all convinced me that this may not be an ideal arrangement, 
> but if I go this route, GPFS might be a better fit for 
> this than Lustre (mainly because Chris Samuels has proven it *is* 
> possible with GPFS, and GPFS has snapshotting).
>
> Joe Landman, as always, has provided a wealth of information, and the 
> rest of you have pointed out other potential pitfalls with this 
> approach.
>
My pleasure ... I do think asking James Cuff, Chris Dwan, and others 
running/managing big kit (and the teams running the kit), what they are 
doing and why would be quite instructive in a bigger picture sense.

Which to a degree suggests that mebbe a devops/best practices BoF or 
talk series, or educational workshop at SC15 wouldn't be a bad thing 
...  I'd be happy to submit a proposal for this year.

Let me know ...

> Thanks again for the feedback, and please keep the conversation going.
>
> Prentice
>
> On 12/23/2014 12:12 PM, Prentice Bisbal wrote:
>> Beowulfers,
>>
>> I have limited experience managing parallel filesystems like GPFS or 
>> Lustre. I was discussing putting /home and /usr/local for my cluster 
>> on a GPFS or Lustre filesystem, in addition to using it just for 
>> /scratch. I've never done this before, but it doesn't seem like all 
>> that bad an idea. My logic for this is the following:
>>
>> 1. Users often try to run programs from within /home, which leads to 
>> errors, no matter how many times I tell them not to do that. This 
>> would make the system more user-friendly. I could use quotas/policies 
>> to 'steer' them to other filesystems if needed.
>>
>> 2. Having one storage system to manage is much better than 3.
>>
>> 3. Profit?
>>
>> Anyway, another person in the conversation felt that this would be 
>> bad, because if someone was running a job that would hammer the 
>> filesystem, it would make the filesystem unresponsive, and keep other 
>> people from logging in and doing work. I'm not buying this concern 
>> for the following reasons:
>>
>> If a job can hammer your parallel filesystem so that the login nodes 
>> become unresponsive, you've got bigger problems, because that means 
>> other jobs can't run on the cluster, and the job hitting the 
>> filesystem hard has probably slowed down to a crawl, too.
>>
>> I know there are some concerns with the stability of parallel 
>> filesystems, so if someone wants to comment on the dangers of that, 
>> too, I'm all ears. I think that the relative instability of parallel 
>> filesystems compared to NFS would be the biggest concern, not 
>> performance.
>>
>

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615