[Beowulf] help for metadata-intensive jobs (imagenet)

Michael Di Domenico mdidomenico4 at gmail.com
Fri Jun 28 10:51:08 PDT 2019

i'm not familiar with the imagenet set, but i'm surprised you'd see a
bottleneck.  my understanding of the ML image sets is that they're
mostly read.  do you have things like noatime set on the filesystem?
do you know specifically which ops are pounding the metadata?
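[Editor's note: one quick way to answer the noatime question is to read the
mount options straight from /proc/mounts. The sketch below uses "/" as a
stand-in mount point; substitute the filesystem that actually holds the
dataset. If the options show neither noatime nor relatime, every file open
during training also issues an inode atime update, i.e. a metadata write
per image read.]

```shell
# Print the mount options for the filesystem holding the dataset.
# "/" is a stand-in; replace it with your dataset's mount point.
awk '$2 == "/" {print $4}' /proc/mounts
```

[To see which operations are pounding the metadata server, running the
training process under `strace -c` and looking at the counts for
open/stat-family syscalls is a common first step.]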

On Fri, Jun 28, 2019 at 1:47 PM Mark Hahn <hahn at mcmaster.ca> wrote:
> Hi all,
> I wonder if anyone has comments on ways to avoid metadata bottlenecks
> for certain kinds of small-io-intensive jobs.  For instance, ML on imagenet,
> which seems to be a massive collection of trivial-sized files.
> A good answer is "beef up your MD server, since it helps everyone".
> That's a bit naive, though (no money-trees here).
> How about things like putting the dataset into squashfs or some other
> image that can be loop-mounted on demand?  sqlite?  perhaps even a format
> that can simply be mmapped as a whole?
> personally, I tend to dislike the approach of having a job stage tons of
> stuff onto node storage (when it exists) simply because that guarantees a
> waste of cpu/gpu/memory resources for however long the stage-in takes...
> thanks, mark hahn.
> --
> operator may differ from spokesperson.              hahn at mcmaster.ca
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
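
[Editor's note: the "format that can simply be mmapped as a whole" idea
from the quoted message can be sketched in a few lines of Python. This is
a minimal illustration, not a production format: the helper names (`pack`,
`open_reader`) and the JSON index layout are invented for this example.
The point is that after one open and one mmap, every per-image fetch is a
pure memory read with zero filesystem metadata traffic.]

```python
import json
import mmap
import os

def pack(src_dir, blob_path, index_path):
    """Concatenate every file in src_dir into one blob file,
    writing a name -> (offset, size) index as JSON."""
    index = {}
    offset = 0
    with open(blob_path, "wb") as blob:
        for name in sorted(os.listdir(src_dir)):
            with open(os.path.join(src_dir, name), "rb") as f:
                data = f.read()
            blob.write(data)
            index[name] = (offset, len(data))
            offset += len(data)
    with open(index_path, "w") as f:
        json.dump(index, f)

def open_reader(blob_path, index_path):
    """Map the blob once; return a lookup function. After the initial
    open/mmap, per-item reads touch no filesystem metadata at all."""
    with open(index_path) as f:
        index = json.load(f)
    blob = open(blob_path, "rb")
    mm = mmap.mmap(blob.fileno(), 0, access=mmap.ACCESS_READ)
    def get(name):
        offset, size = index[name]
        return mm[offset:offset + size]
    return get
```

[squashfs plus a loop mount gets the same effect without custom tooling
(`mksquashfs dataset/ dataset.sqsh` then `mount -o loop`), at the cost of
needing root or a setuid helper to mount on each node.]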
