[Beowulf] { hopefully on topic } tools for measuring dir sizes and growth trends on 30TB+ scale-out NAS?

ChrisDag dag at sonsorol.org
Tue Jul 2 07:29:42 PDT 2013

Someone somewhere in beowulf land has certainly dealt with this before 
... looking for a clue, tip or URL pointer if possible

I'm trying to stay on top of capacity planning and generating useful 
reports on how an expensive scale-out NAS volume is being used 
month-to-month. The 200TB namespace is filling up fast and I'm 
particularly interested in tracking about 30TB in genomic data that is 
growing at about 1.5TB/mo with both instrument- and human-generated data.

For well-curated directories that use year/month/day in the directory 
names, a simple "du -mcs" recursing to a certain depth works fine for 
printing out a CSV with a directory name, a size in MB and the 
year/month it was last modified. That's all I'm really looking for. I 
want to show growth month-by-month and year-by-year and break it down by 
top-level directories that match either genome sequencing platform types 
or big project names...
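
A minimal Python sketch of that du-style report, offered as an illustration rather than a tested tool. The function names (dir_usage, report) and the CSV column names are made up for this example. It sums apparent file sizes (st_size), so the numbers can differ from what du reports in allocated blocks, and it does not de-duplicate hard links. Files that sit directly in directories shallower than the requested depth are not reported.

```python
import csv
import os
import time

MIB = 1024 * 1024  # du -m reports in units of 1 MiB


def dir_usage(path):
    """Return (total_bytes, newest_file_mtime) for everything under path."""
    total = 0
    newest = 0.0
    for root, _dirs, files in os.walk(path):  # does not follow symlinked dirs
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable: skip it
            total += st.st_size  # apparent size, not allocated blocks
            newest = max(newest, st.st_mtime)
    if newest == 0.0:
        # No files found: fall back to the directory's own mtime.
        newest = os.lstat(path).st_mtime
    return total, newest


def report(top, depth, out):
    """Write one CSV row per directory found exactly `depth` levels below top."""
    writer = csv.writer(out)
    writer.writerow(["directory", "size_mb", "last_modified"])
    top = os.path.abspath(top)
    for root, dirs, _files in os.walk(top):
        dirs.sort()  # deterministic output order
        rel = os.path.relpath(root, top)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if level == depth:
            size, newest = dir_usage(root)
            month = time.strftime("%Y-%m", time.localtime(newest))
            writer.writerow([root, round(size / MIB, 3), month])
            dirs[:] = []  # already summed below this point, do not descend
```

Calling report("/mnt/nas/genomics", 2, sys.stdout) (the path is hypothetical, and sys must be imported) would print one row per directory two levels down. Each reported directory is walked once, so on a very large tree the runtime is dominated by the metadata (stat) calls against the NAS.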

The manual / hacked / du methods are starting to fall over and/or just 
take too much human time to deal with.

Anyone aware of an open source or freely available system for reporting 
on NAS usage trends? Something that can dump into a database so I could 
do custom reporting off of the results?

I figure someone somewhere has written a smart and efficient filesystem 
trawler that can dump into a MySQL table or similar.  I could hack 
something together myself but even a quick look at the CLI tools and 
misc Perl/Python modules for file and dir statistics seems to indicate 
that there are a lot of possibilities to make dumb novice mistakes in 
the traversal, the size summing or the reporting. I'd like to avoid my 
the traversal, the size summing or the reporting. I'd like to avoid my 
own bad coding if at all possible.
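
To make the "trawler that dumps into a table" idea concrete, here is a rough Python sketch. It uses SQLite from the standard library as a stand-in for MySQL. The table layout, column names and function names (scan, usage_by_month) are all assumptions invented for this example, not an existing tool. It tries to sidestep two of the mistakes mentioned above: symlinks are not followed, and hard-linked files are counted only once per scan.

```python
import os
import sqlite3
import time

# Hypothetical schema: one row per file per scan.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    scan_date   TEXT    NOT NULL,
    top_dir     TEXT    NOT NULL,
    path        TEXT    NOT NULL,
    size_bytes  INTEGER NOT NULL,
    mtime_month TEXT    NOT NULL
)
"""
INSERT = "INSERT INTO files VALUES (?, ?, ?, ?, ?)"
BATCH = 10000  # flush rows in batches to keep memory use bounded


def scan(top, conn, scan_date=None):
    """Walk `top`, store one row per file, and return the number of rows."""
    scan_date = scan_date or time.strftime("%Y-%m-%d")
    conn.execute(SCHEMA)
    top = os.path.abspath(top)
    seen = set()  # (device, inode) of hard-linked files already counted
    rows, count = [], 0
    for root, _dirs, files in os.walk(top):  # symlinked dirs are not followed
        rel = os.path.relpath(root, top)
        top_dir = "." if rel == "." else rel.split(os.sep)[0]
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # vanished or unreadable: skip it
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # second name for the same data
                seen.add(key)
            month = time.strftime("%Y-%m", time.localtime(st.st_mtime))
            rows.append((scan_date, top_dir, full, st.st_size, month))
            if len(rows) >= BATCH:
                with conn:  # commits the batch
                    conn.executemany(INSERT, rows)
                count += len(rows)
                rows = []
    with conn:
        conn.executemany(INSERT, rows)
    return count + len(rows)


def usage_by_month(conn, scan_date):
    """Bytes per top-level directory and last-modified month for one scan."""
    cur = conn.execute(
        "SELECT top_dir, mtime_month, SUM(size_bytes) FROM files "
        "WHERE scan_date = ? GROUP BY top_dir, mtime_month "
        "ORDER BY top_dir, mtime_month",
        (scan_date,),
    )
    return cur.fetchall()
```

Keeping one row per file per scan means growth between two scan dates is a plain SQL GROUP BY rather than more scripting. The cost is table size: millions of files per scan add up quickly, so storing only per-directory totals may be the more practical choice at this scale.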

Anyone aware of systems that do something like this?

