[Beowulf] { hopefully on topic } tools for measuring dir sizes and growth trends on 30TB+ scale-out NAS?

Jeff White jaw171 at pitt.edu
Tue Jul 2 08:34:24 PDT 2013

On 07/02/2013 10:36 AM, Hearns, John wrote:
> > Someone somewhere in beowulf land has certainly dealt with this before
> > ... looking for a clue, tip or URL pointer if possible
> > For well-curated directories that use year/month/day in the directory
> > names a simple "du -mcs" recursing to a certain depth works fine for
> > printing out a CSV with a directory name, a size in MB and the
> > year/month it was last modified. That's all I'm really looking for. I
> > want to show growth month-by-month and year-by-year and break it down
> > by top-level directories that match either genome sequencing platform
> > types or big project names...
> I use agedu to display usage charts on storage:
> http://www.chiark.greenend.org.uk/~sgtatham/agedu/ 
> It's a great tool - you can 'drill down' into each directory and get 
> the underlying usage.
> Web interface naturally.
> Great for standing over users and pointing them towards the amounts of 
> storage they are using.
> Sadly I don't think it does the historic trending that you are looking 
> for.
> You COULD run an agedu scan every day on a cron job and squirrel the 
> results away, but I guess comparing the historic plots would not be easy.

Rather than using du, you can enable disk quotas on the filesystem. Don't
set any actual limits; use the quota accounting purely for reporting. Use
xfs_quota (or your filesystem's equivalent) to report usage by user, group,
or (if your filesystem supports it) directory. That way you don't have to
scan the filesystem to get the current usage.
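
For the month-by-month trending itself, here is a minimal Python sketch. It
assumes you already append dated snapshots to a CSV from cron, one line per
directory in the form date,directory,size_mb, whether the numbers come from
du or from a quota report. That CSV layout, the function names and the
/nas/... paths are all made up for illustration; adjust them to whatever
your reporting tool actually emits.

```python
import csv
import io
from collections import defaultdict


def read_snapshots(text):
    """Parse CSV text with assumed columns: date (YYYY-MM-DD), directory, size_mb."""
    reader = csv.reader(io.StringIO(text))
    return [(date, directory, int(size_mb)) for date, directory, size_mb in reader]


def monthly_growth(rows):
    """Return {directory: [(month, size_mb, delta_mb), ...]}.

    The last snapshot seen in each month is taken as that month's size.
    delta_mb is the change from the previous recorded month, or None for
    the first month of a directory.
    """
    latest = defaultdict(dict)  # directory -> {month: (date, size_mb)}
    for date, directory, size_mb in rows:
        month = date[:7]  # "YYYY-MM"
        previous = latest[directory].get(month)
        # ISO dates sort correctly as plain strings.
        if previous is None or date >= previous[0]:
            latest[directory][month] = (date, size_mb)

    report = {}
    for directory, months in latest.items():
        series = []
        previous_size = None
        for month in sorted(months):
            size_mb = months[month][1]
            delta = None if previous_size is None else size_mb - previous_size
            series.append((month, size_mb, delta))
            previous_size = size_mb
        report[directory] = series
    return report


# Hypothetical sample data; real input would be the accumulated cron output.
SAMPLE = """\
2013-05-01,/nas/illumina,1000
2013-05-31,/nas/illumina,1500
2013-06-30,/nas/illumina,2100
2013-05-31,/nas/projectx,400
2013-06-30,/nas/projectx,350
"""

if __name__ == "__main__":
    growth = monthly_growth(read_snapshots(SAMPLE))
    for directory in sorted(growth):
        for month, size_mb, delta in growth[directory]:
            change = "" if delta is None else f"{delta:+d}"
            print(f"{directory},{month},{size_mb},{change}")
```

The output is again a CSV (directory, month, size in MB, change in MB), so
it can go straight into a spreadsheet or plotting tool. Year-by-year totals
would be the same idea with date[:4] as the key.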
