[Beowulf] Small files

Kilian Cavalotti kilian.cavalotti.work at gmail.com
Thu Jun 12 10:19:48 PDT 2014

Hi Tom,

On Wed, Jun 11, 2014 at 12:03 PM, Tom Harvill <tom.harvill at unl.edu> wrote:
> I want to ask this general question: how does your shop deal with the
> general problem of small files in filesystems on (beowulf) compute
> clusters? Specifically, files that users expect to actively use for
> read and write operations for their research.
> Do you distinguish and segregate them (and/or the people that use them)
> on special hardware/filesystems?

Segregating small files on their own filesystem is one option. You
could also enforce quotas on inodes, so that the overall number of
files stays within a reasonable range.
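
To get a feel for what an inode limit would catch before turning on
real quotas, something like the rough Python sketch below could be run
over a directory tree. It is only an illustration, not a substitute for
the filesystem's own quota tools: the function names inodes_by_owner
and over_limit are made up for this example, and the limit is whatever
number your site decides on.

```python
import os
from collections import Counter


def inodes_by_owner(root):
    """Count distinct inodes under root, grouped by owning uid.

    Hypothetical helper: walks the tree with lstat() so symlinks are
    counted as themselves, and counts hard-linked files only once.
    The root directory itself is not counted.
    """
    seen = set()
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            counts[st.st_uid] += 1
    return counts


def over_limit(counts, limit):
    """Return {uid: inode_count} for owners above the (assumed) limit."""
    return {uid: n for uid, n in counts.items() if n > limit}
```

Calling over_limit(inodes_by_owner("/some/path"), 1000000) would list
the uids holding more than a million inodes under that path. A full
walk like this is slow on a large filesystem, which is one reason to
prefer real quotas or a tool that keeps its own database.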

Beyond that, you could use Robinhood
(http://robinhood.sf.net) to track and monitor your filesystem usage.
It was developed primarily for Lustre (although you can use it on any
kind of POSIX filesystem) and can take advantage of Lustre changelogs
to keep an always up-to-date view of your filesystems. It's a policy
engine, so you can define file classes, plus actions or alerts that
are triggered on specific conditions (for instance, if a directory
contains more than a certain number of files). That can be very handy
for determining whether users are following your site's best practices,
and for helping them adapt their workflow if needed.
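
To illustrate the kind of condition I mean (too many entries in one
directory), here is a minimal Python sketch. To be clear, this is not
Robinhood's configuration syntax or API, just a plain directory walk
that checks the same sort of rule; the name crowded_dirs and the
max_entries threshold are assumptions for the example.

```python
import os


def crowded_dirs(root, max_entries):
    """Yield (path, entry_count) for directories above the threshold.

    Hypothetical helper: counts the immediate files and subdirectories
    of every directory under root (root included) and reports those
    holding more than max_entries entries.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        n = len(dirnames) + len(filenames)
        if n > max_entries:
            yield dirpath, n
```

For example, list(crowded_dirs("/scratch/someuser", 100000)) would
return the directories under that (made-up) path with more than
100000 entries, which you could then raise with the user.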

See: http://www.hpcwire.com/off-the-wire/cea-releases-robinhood-2-5/
and http://opensfs.org/wp-content/uploads/2013/04/lug13-robinhood.pdf

All those solutions require the same level of communication and user
education, though. :)

