[Beowulf] Small files

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Fri Jun 13 11:46:21 PDT 2014



On 6/13/14, 7:03 AM, "Ellis H. Wilson III" <ellis at cse.psu.edu> wrote:

>On 06/13/2014 09:31 AM, Joe Landman wrote:
>> On 06/13/2014 09:17 AM, Skylar Thompson wrote:
>>> We've recently implemented a quota of 1 million files per 1TB of
>>> filesystem space. And yes, we had to clean up a number of groups' and
>>> individuals' spaces before implementing that. There seems to be a trend
>>> in the bioinformatics community for using the filesystem as a database.
>>
>> I wasn't going to say anything about this, but, yes, there are some
>> significant abuses of file systems going on in this community.  But this
>> is nothing new, sadly ...  I've seen this since the late 90's.
>
>I think we're all probably too close to the tool in question (HPC
>storage).  Ultimately this is just a hammer for scientists and other
>non-CS/IT types, so of course they are going to scoff when we tell them
>they are holding the hammer such that it hits sideways.  "Who's to tell
>me how to hold the hammer?!  This side has more metallic surface area
>anyhow, making it easier to hit the nail this way!"
>
>So you can either:
>a) Fix it transparently with automatic policies/FS's in the back-end.
>(I know of at least one FS that packs small files with metadata
>transparently on SSDs to expedite small file IOPS, but message me
>off-list for that as I start work for that shop soon and don't want to
>so blatantly advertise).  There are limits to how much these

Let's not let "concern for efficiency" get in the way of "users solving
problems".  I suspect that for a LOT of problems, buying more/faster
hardware is more cost-effective than changing how the
scientist/engineer/user works.

Sure, there are HPC applications which are run repeatedly and for which
performance is very important (numerical weather simulations, for
instance).

If it's that big a deal, why not make it transparent?  Ellis gave an
example of a system that "blocks" small transactions into better ones
transparently.  That is the way it should be: the user doesn't care how
it happens.
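
To make the idea concrete, here is a minimal sketch in Python of the
general trick (names like PackedStore and results.pack are hypothetical,
and this is not the unnamed system Ellis alluded to): append many tiny
records to one container file and keep an index, so the storage system
sees a few large sequential writes instead of millions of small files.

    class PackedStore:
        """Append small records to one big file; keep an index in memory."""
        def __init__(self, path):
            self.path = path
            self.index = {}              # name -> (offset, length)
            self.fh = open(path, "ab")   # one file, one inode

        def put(self, name, data):
            offset = self.fh.tell()
            self.fh.write(data)          # sequential append, no per-file inode
            self.index[name] = (offset, len(data))

        def get(self, name):
            offset, length = self.index[name]
            with open(self.path, "rb") as fh:
                fh.seek(offset)
                return fh.read(length)

    store = PackedStore("results.pack")
    for i in range(100000):              # 100k "files", one inode on disk
        store.put(f"sample_{i}", b"ACGT" * 10)
    store.fh.flush()                     # make appends visible to readers
    print(store.get("sample_42"))

A real implementation has to persist the index, handle crashes and
concurrent writers, and so on; the point is that the user-visible names
stay the same while the I/O pattern changes underneath.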

Do you manually manage memory allocation and caching?  Or do you let the
OS take care of it?  Heartbleed is a fine example of what happens when
someone tries to "optimize" the performance.
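
The OS already does this well.  A trivial demonstration in Python
(big_input.dat is a hypothetical input file): read the same file twice,
and the second pass is typically served from the kernel's page cache,
with no caching code in the application at all.

    import time

    def checksum(path):
        total = 0
        with open(path, "rb") as fh:
            chunk = fh.read(1 << 20)     # 1 MiB reads
            while chunk:
                total = (total + sum(chunk)) & 0xFFFFFFFF
                chunk = fh.read(1 << 20)
        return total

    for attempt in ("cold", "warm"):
        t0 = time.perf_counter()
        checksum("big_input.dat")
        print(attempt, time.perf_counter() - t0, "seconds")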

Obviously, if you're a "developer of HPC" as opposed to a "user of HPC",
then understanding what works better or worse or is more or less efficient
is important.  But there are a LOT more "users of HPC" who are NOT
"developers of HPC", and that's who should be the focus.

Doesn't this hark back to the perennial assembler vs. high-level language
dispute?  I think you should spend your time making better optimizing
compilers (or better languages for specifying what it is you want to do)
rather than advocating programming in assembler.
