[Beowulf] help for metadata-intensive jobs (imagenet)

Joe Landman joe.landman at gmail.com
Fri Jun 28 10:51:09 PDT 2019

On 6/28/19 1:47 PM, Mark Hahn wrote:
> Hi all,
> I wonder if anyone has comments on ways to avoid metadata bottlenecks
> for certain kinds of small-I/O-intensive jobs.  For instance, ML on
> ImageNet, which appears to be a massive collection of trivial-sized files.
> A good answer is "beef up your MD server, since it helps everyone".
> That's a bit naive, though (no money-trees here.)
> How about things like putting the dataset into squashfs or some other
> image that can be loop-mounted on demand?  sqlite?  Perhaps even a format
> that can simply be mmapped as a whole?
> Personally, I tend to dislike the approach of having a job stage tons of
> stuff onto node storage (when it exists), simply because that guarantees
> wasted CPU/GPU/memory resources for however long the stage-in takes...
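One concrete way to realize the "mmap it as a whole" idea above is to pack
the tiny files into a single blob with an in-memory (offset, length) index,
then mmap the blob.  A minimal sketch (file names, index layout, and helper
names are my own, purely illustrative):

```python
import mmap
import os
import tempfile

def pack(files):
    """Concatenate {name: bytes} into one blob, recording (offset, length)
    per name -- one metadata lookup for the whole dataset, instead of one
    open()/stat() per tiny file.  (Layout is illustrative, not a standard.)"""
    index = {}
    blob = bytearray()
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return index, bytes(blob)

def read_packed(mm, index, name):
    """Slice one logical file straight out of the mapped region."""
    off, length = index[name]
    return mm[off:off + length]

# Demo: pack three tiny "images", mmap the blob, read one back.
files = {f"img_{i}.jpg": os.urandom(64) for i in range(3)}
index, blob = pack(files)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(blob)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    assert read_packed(mm, index, "img_1.jpg") == files["img_1.jpg"]
    mm.close()
os.unlink(path)
```

This is essentially what a loop-mounted squashfs buys you for free, with
compression and a real directory tree on top.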

I'd suggest something akin to a collection of ramdisks using zram, 
distributed across your nodes.  Then put a BeeGFS file system atop 
those.  Stage in the images.  Run.

This is cheap compared to building the storage you actually need for 
this workload.
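On the sqlite idea floated in the question: staging in one database file is
also far cheaper, metadata-wise, than staging millions of tiny files.  A
minimal sketch using the stdlib sqlite3 module (table name and schema are my
own, purely illustrative):

```python
import sqlite3

def build_db(path, files):
    """Store {name: bytes} as BLOBs in a single SQLite file, so the whole
    dataset is one file to stage in and one inode on the metadata server.
    (Schema is illustrative.)"""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS images (name TEXT PRIMARY KEY, data BLOB)"
    )
    con.executemany("INSERT OR REPLACE INTO images VALUES (?, ?)", files.items())
    con.commit()
    return con

def read_one(con, name):
    """Fetch one file's bytes by name, or None if absent."""
    row = con.execute(
        "SELECT data FROM images WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

# Demo with an in-memory database standing in for a file on disk.
con = build_db(":memory:", {"img_0.jpg": b"\x00" * 16, "img_1.jpg": b"\x01" * 16})
assert read_one(con, "img_1.jpg") == b"\x01" * 16
con.close()
```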

Joe Landman
e: joe.landman at gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
