[Beowulf] Small files
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Fri Jun 13 11:46:21 PDT 2014
On 6/13/14, 7:03 AM, "Ellis H. Wilson III" <ellis at cse.psu.edu> wrote:
>On 06/13/2014 09:31 AM, Joe Landman wrote:
>> On 06/13/2014 09:17 AM, Skylar Thompson wrote:
>>> We've recently implemented a quota of 1 million files per 1TB of
>>> filesystem space. And yes, we had to clean up a number of groups' and
>>> individuals' spaces before implementing that. There seems to be a trend
>>> in the bioinformatics community for using the filesystem as a database.
>>
>> I wasn't going to say anything about this, but, yes, there are some
>> significant abuses of file systems going on in this community. But this
>> is nothing new, sadly ... I've seen this since the late 90's.
>
>I think we're all probably too close to the tool in question (HPC
>storage). Ultimately this is just a hammer for scientists and other
>non-CS/IT types, so of course they are going to scoff when we tell them
>they are holding the hammer such that it hits sideways. "Who's to tell
>me how to hold the hammer?! This side has more metallic surface area
>anyhow, making it easier to hit the nail this way!"
>
>So you can either:
>a) Fix it transparently with automatic policies/FS's in the back-end.
>(I know of at least one FS that packs small files with metadata
>transparently on SSDs to expedite small file IOPS, but message me
>off-list for that as I start work for that shop soon and don't want to
>so blatantly advertise). There are limits to how much these
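To make the packing idea above concrete, here is a minimal sketch in Python (the class and layout are invented for illustration, not the actual filesystem Ellis is describing): many tiny logical files are appended into one large container, and an index maps each name to an (offset, length) pair, so a read costs one seek into the pack instead of a per-file open and metadata lookup.

class PackFile:
    """Toy container that packs many small logical files into one real file."""

    def __init__(self, path):
        self.path = path
        self.index = {}              # name -> (offset, length)
        open(path, "ab").close()     # create the container if it is missing

    def add(self, name, data):
        # Append the payload and remember where it landed.
        with open(self.path, "ab") as f:
            f.write(data)
            self.index[name] = (f.tell() - len(data), len(data))

    def read(self, name):
        # One seek + one read, no per-file metadata traffic.
        offset, length = self.index[name]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

if __name__ == "__main__":
    pack = PackFile("small_files.pack")
    for i in range(1000):
        pack.add("record-%d" % i, b"x" * 100)   # 1000 tiny "files", one inode
    print(len(pack.read("record-42")))          # -> 100

A real implementation would keep the container handle open, persist the index, and place both on fast media such as SSD, but the core trade is the same: one big file plus an index in place of millions of inodes.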
Let's not let "concern for efficiency" get in the way of "users solving
problems". I suspect that for a LOT of problems, buying more/faster
hardware is more cost-effective than changing how the
scientist/engineer/user works.
Sure, there are HPC applications which are run repeatedly and for which
performance is very important (numerical weather simulations, for
instance).
If it's that big a deal, why not make it transparent? Ellis gave the
example of a system that "blocks" small transactions into better ones
behind the scenes. That is the way it should be: the user doesn't care how
it happens.
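As a minimal sketch of that kind of transparency (BatchingWriter and the 1 MiB block size are illustrative assumptions, not any particular system's API): the caller keeps issuing tiny writes, and the wrapper quietly coalesces them into large ones.

class BatchingWriter:
    """Coalesce many small writes into large ones before touching storage."""

    def __init__(self, raw, block_size=1 << 20):   # flush in ~1 MiB chunks
        self.raw = raw
        self.block_size = block_size
        self.buf = bytearray()

    def write(self, data):
        self.buf += data
        while len(self.buf) >= self.block_size:    # only full blocks go out
            self.raw.write(self.buf[:self.block_size])
            del self.buf[:self.block_size]

    def close(self):
        if self.buf:                               # flush the tail
            self.raw.write(bytes(self.buf))
        self.raw.close()

if __name__ == "__main__":
    w = BatchingWriter(open("output.dat", "wb"))
    for i in range(100000):
        w.write(b"tiny record\n")   # 100k tiny writes from the caller's view
    w.close()                       # far fewer, far larger writes underneath

This is essentially what Python's io.BufferedWriter and the OS page cache already do on the application's behalf, which is the point: the batching belongs below the user, not in their workflow.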
Do you manually manage memory allocation and caching, or do you let the
OS take care of it? Heartbleed is a fine example of what happens when
someone tries to "optimize" performance (OpenSSL's home-grown freelist
allocator, added for speed, helped hide the missing bounds check).
Obviously, if you're a "developer of HPC" as opposed to a "user of HPC",
then understanding what works better or worse, or is more or less
efficient, is important. But there are a LOT more "users of HPC" who are
NOT "developers of HPC", and they are the ones who should be the focus.
Doesn't this hark back to the perennial assembler vs. high-level-language
dispute? I think you should spend your time making better optimizing
compilers (or better languages for specifying what it is you want to do)
rather than advocating programming in assembler.