[Beowulf] Putting /home on Lustre or GPFS

Prentice Bisbal prentice.bisbal at rutgers.edu
Wed Dec 24 08:38:57 PST 2014


On 12/24/2014 10:58 AM, Joe Landman wrote:
>
> On 12/24/2014 10:54 AM, Prentice Bisbal wrote:
>> Everyone,
>>
>> Thanks for the feedback you've provided to my query below. I'm glad 
>> I'm not the only one who thought of this, and a lot of you raised 
>> very good points I hadn't thought about. While I've been following 
>> parallel filesystems for years, I have very little experience 
>> actually managing them up to this point. (My BG/P came with a GPFS 
>> filesystem for /scratch, but everything was already set up before I 
>> got here, so I've only had to deal with it when something breaks.)
>>
>> You've all convinced me that this may not be an ideal arrangement, 
>> but if I go this route, GPFS might be a better fit for this than 
>> Lustre (mainly because Chris Samuels has proven it *is* possible 
>> with GPFS, and GPFS has snapshotting).
>>
>> Joe Landman, as always, has provided a wealth of information, and the 
>> rest of you have pointed out other potential pitfalls with this 
>> approach.
>>
> My pleasure ... I do think asking James Cuff, Chris Dwan, and others 
> running/managing big kit (and the teams running the kit), what they 
> are doing and why would be quite instructive in a bigger picture sense.
>
> Which to a degree suggests that mebbe a devops/best practices BoF or 
> talk series, or educational workshop at SC15 wouldn't be a bad thing 
> ...  I'd be happy to submit a proposal for this year.
>
> Let me know ...

Actually, several other System Admins and I are trying to get more 
emphasis on System Administration at the SC conferences, and even to 
have a SysAdmin track. Practical issues in managing filesystems, like 
those brought up here, would be a great topic to include in this.

>
>
>> Thanks again for the feedback, and please keep the conversation going.
>>
>> Prentice
>>
>> On 12/23/2014 12:12 PM, Prentice Bisbal wrote:
>>> Beowulfers,
>>>
>>> I have limited experience managing parallel filesystems like GPFS or 
>>> Lustre. I was discussing putting /home and /usr/local for my cluster 
>>> on a GPFS or Lustre filesystem, in addition to using it just for 
>>> /scratch. I've never done this before, but it doesn't seem like all 
>>> that bad an idea. My logic for this is the following:
>>>
>>> 1. Users often try to run programs from /home, which leads to 
>>> errors, no matter how many times I tell them not to do that. This 
>>> would make the system more user-friendly. I could use 
>>> quotas/policies to 'steer' them to other filesystems if needed.
>>>
>>> 2. Having one storage system to manage is much better than 3.
>>>
>>> 3. Profit?
>>>
>>> Anyway, another person in the conversation felt that this would be 
>>> bad, because if someone was running a job that would hammer the 
>>> filesystem, it would make the filesystem unresponsive, and keep other 
>>> people from logging in and doing work. I'm not buying this concern 
>>> for the following reasons:
>>>
>>> If a job can hammer your parallel filesystem so that the login nodes 
>>> become unresponsive, you've got bigger problems, because that means 
>>> other jobs can't run on the cluster, and the job hitting the 
>>> filesystem hard has probably slowed down to a crawl, too.
>>>
>>> I know there are some concerns with the stability of parallel 
>>> filesystems, so if someone wants to comment on the dangers of that, 
>>> too, I'm all ears. I think that the relative instability of parallel 
>>> filesystems compared to NFS would be the biggest concern, not 
>>> performance.
>>>
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit 
>> http://www.beowulf.org/mailman/listinfo/beowulf
>


