[Beowulf] { hopefully on topic } tools for measuring dir sizes and growth trends on 30TB+ scale-out NAS?
ChrisDag
dag at sonsorol.org
Tue Jul 2 07:29:42 PDT 2013
Someone somewhere in beowulf land has certainly dealt with this before
... looking for a clue, tip or URL pointer if possible
I'm trying to stay on top of capacity planning and generating useful
reports on how an expensive scale-out NAS volume is being used
month-to-month. The 200TB namespace is filling up fast and I'm
particularly interested in tracking about 30TB in genomic data that is
growing at about 1.5TB/mo with both instrument and human generated data.
For well-curated directories that use year/month/day in the directory
names, a simple "du -mcs" recursing to a certain depth works fine for
printing out a CSV with a directory name, a size in MB and the
year/month it was last modified. That's all I'm really looking for. I
want to show growth month-by-month and year-by-year and break it down by
top-level directories that match either genome sequencing platform types
or big project names...
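For the well-curated directories, that du-based CSV report can be
approximated in Python roughly as follows. This is a minimal sketch, not a
drop-in replacement for du. It sums apparent file sizes (st_size) rather
than allocated blocks. It also takes "last modified" to mean the newest
mtime anywhere in the subtree. The function names and the default depth of
2 are hypothetical.

```python
import csv
import os
import sys
import time
from pathlib import Path

MB = 1024 * 1024  # du -m reports in 1 MiB units


def subtree_stats(top):
    """Return (total_bytes, newest_mtime) for all files under `top`.

    Symlinked directories are not descended (os.walk default), and
    files that vanish or cannot be stat'ed mid-scan are skipped.
    """
    total = 0
    newest = os.lstat(top).st_mtime
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue
            total += st.st_size
            newest = max(newest, st.st_mtime)
    return total, newest


def dirs_at_depth(root, depth):
    """Return real (non-symlink) directories exactly `depth` levels below root."""
    level = [Path(root)]
    for _ in range(depth):
        nxt = []
        for d in level:
            try:
                nxt.extend(sorted(p for p in d.iterdir()
                                  if p.is_dir() and not p.is_symlink()))
            except OSError:
                pass  # unreadable directory: skip rather than abort the report
        level = nxt
    return level


def write_report(root, depth, out):
    """Write one CSV row per directory: name, size in MB, newest year-month."""
    writer = csv.writer(out)
    writer.writerow(["directory", "size_mb", "last_modified"])
    for d in dirs_at_depth(root, depth):
        size, newest = subtree_stats(d)
        writer.writerow([str(d), round(size / MB, 1),
                         time.strftime("%Y-%m", time.localtime(newest))])


if __name__ == "__main__" and len(sys.argv) > 1:
    write_report(sys.argv[1], int(sys.argv[2]) if len(sys.argv) > 2 else 2,
                 sys.stdout)
```

Saved as a script (file name up to you), it would be run as something like
`python dir_report.py /path/to/genomics 2 > usage.csv`.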
The manual / hacked / du methods are starting to fall over and/or just
take too much human time to deal with.
Anyone aware of an open source or freely available system for reporting
on NAS usage trends? Something that can dump into a database so I could
do custom reporting off of the results?
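Once each monthly scan produces (directory, size) rows, the "dump into a
database" part can be fairly small. A hedged sketch follows. It uses
Python's built-in sqlite3 as a stand-in for MySQL, and the table and column
names are assumptions. Storing one row per (month, directory) makes both
month-by-month growth and per-platform or per-project totals simple queries.
Note that INSERT OR REPLACE is SQLite syntax; MySQL's rough equivalent is
REPLACE INTO.

```python
import sqlite3

# Hypothetical schema: one row per directory per monthly scan.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dir_usage (
    snapshot_month TEXT NOT NULL,  -- 'YYYY-MM' of the scan
    directory      TEXT NOT NULL,
    size_mb        REAL NOT NULL,
    PRIMARY KEY (snapshot_month, directory)
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_snapshot(conn, month, rows):
    """Store one month's (directory, size_mb) rows; re-running a month overwrites it."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO dir_usage (snapshot_month, directory, size_mb) "
            "VALUES (?, ?, ?)",
            [(month, d, s) for d, s in rows],
        )


def monthly_growth(conn, directory):
    """Return [(month, size_mb, growth_mb)] for one directory, oldest first.

    growth_mb is None for the first snapshot.
    """
    cur = conn.execute(
        "SELECT snapshot_month, size_mb FROM dir_usage "
        "WHERE directory = ? ORDER BY snapshot_month",
        (directory,),
    )
    out, prev = [], None
    for month, size in cur:
        out.append((month, size, None if prev is None else size - prev))
        prev = size
    return out


def totals_by_month(conn, prefix=""):
    """Total size_mb per month for directories whose path starts with `prefix`."""
    # substr() comparison avoids LIKE treating '_' or '%' in paths as wildcards.
    return conn.execute(
        "SELECT snapshot_month, SUM(size_mb) FROM dir_usage "
        "WHERE substr(directory, 1, ?) = ? "
        "GROUP BY snapshot_month ORDER BY snapshot_month",
        (len(prefix), prefix),
    ).fetchall()
```

The paths in any example data would be placeholders; a real setup would
point open_db() at a file and call record_snapshot() once per monthly scan.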
I figure someone somewhere has written a smart and efficient filesystem
trawler that can dump into a MySQL table or similar. I could hack
something together myself, but even a quick look at the CLI tools and
misc Perl/Python modules for file and dir statistics seems to indicate
that there are a lot of opportunities to make dumb novice mistakes in
the traversal, the size summing or the reporting. I'd like to avoid my
own bad coding if at all possible.
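Most of the traversal and size-summing mistakes come down to a few cases:
following symlinks into other trees, counting hardlinked files more than
once, wandering into NAS snapshot directories, confusing apparent size with
allocated space, and silently skipping directories that could not be read.
Below is a minimal sketch that guards against each of these. It assumes a
POSIX client mount, since st_blocks is not available on Windows. It also
assumes snapshot directories are named `.snapshot`, which is common but
should be checked against your NAS. The function name is hypothetical.

```python
import os


def tree_usage(top, exclude=(".snapshot",)):
    """Summarize one directory tree while avoiding common du-style mistakes."""
    seen_inodes = set()  # (device, inode) of multiply-linked files already counted
    errors = []          # os.walk failures plus per-file stat failures
    files = apparent = allocated = 0
    # followlinks=False (the default): symlinks to directories appear in
    # dirnames but are never descended, so other trees are not double-counted.
    for dirpath, dirnames, filenames in os.walk(top, onerror=errors.append):
        # Prune excluded names in place so os.walk never enters them
        # (assumption: snapshot dirs are called ".snapshot").
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # never follow symlinks
            except OSError as exc:
                errors.append(exc)  # vanished or unreadable mid-scan
                continue
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen_inodes:
                    continue  # another name for a file already counted
                seen_inodes.add(key)
            files += 1
            apparent += st.st_size           # logical bytes, like du --apparent-size
            allocated += st.st_blocks * 512  # st_blocks counts 512-byte units
    return {
        "files": files,
        "apparent_bytes": apparent,
        "allocated_bytes": allocated,
        "errors": len(errors),
    }
```

A nonzero `errors` count means the totals are an undercount, so the scan
should probably be rerun with the right permissions before the numbers go
into a report. On a deduplicating or compressing NAS, allocated bytes as
seen from the client may also differ from what the array itself reports.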
Anyone aware of systems that do something like this?
Regards,
Chris