[Beowulf] help for metadata-intensive jobs (imagenet)

Sat Jun 29 01:55:14 PDT 2019

I've not had any issues training on ImageNet in the past. We're using
a ZFS box with a large L2ARC over 10GbE. If you are having problems, you
might consider packing ImageNet into a single HDF5 file. There may even
be one on Academic Torrents or something. I suspect this would help
quite a bit, since the metadata server then sees one large file instead
of over a million small ones.
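
As a rough sketch of the single-file idea, here it is with Python's
sqlite3 (which Mark mentions below) rather than HDF5, only because
sqlite3 is in the standard library. The function names, the "files"
table and the paths are made up for illustration, and I have not
benchmarked this against a real ImageNet copy.

```python
import sqlite3
from pathlib import Path


def pack_directory(src_dir, db_path):
    """Pack every regular file under src_dir into one SQLite file.

    Hypothetical helper: db_path should live outside src_dir, otherwise
    the database file itself would be picked up by the directory walk.
    Returns the number of rows in the table afterwards.
    """
    src = Path(src_dir)
    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB)"
        )
        # Store the raw (still compressed) bytes keyed by relative path.
        rows = (
            (p.relative_to(src).as_posix(), p.read_bytes())
            for p in sorted(src.rglob("*"))
            if p.is_file()
        )
        with con:  # one transaction for the whole import
            con.executemany("INSERT OR REPLACE INTO files VALUES (?, ?)", rows)
        count = con.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    finally:
        con.close()
    return count


def read_file(db_path, name):
    """Return the bytes stored under name, or raise KeyError.

    Opens a connection per call to keep the sketch short; a data loader
    would more likely keep one connection open per worker.
    """
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT data FROM files WHERE name = ?", (name,)
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise KeyError(name)
    return row[0]
```

You would run pack_directory() once, on a machine where walking the
tree is cheap, and then the training job only ever opens the one
database file. The bytes that come back from read_file() are the
original JPEG data, so they can go to whatever decoder you already use.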

Interested to hear if you try this!

Thanks,
Aaron

Mark Hahn writes:

> Hi all,
> I wonder if anyone has comments on ways to avoid metadata bottlenecks
> for certain kinds of small-io-intensive jobs.  For instance, ML on imagenet,
> which seems to be a massive collection of trivial-sized files.
>
> A good answer is "beef up your MD server, since it helps everyone".
> That's a bit naive, though (no money-trees here.)
>
> How about things like putting the dataset into squashfs or some other 
> image that can be loop-mounted on demand?  sqlite?  perhaps even a format
> that can simply be mmaped as a whole?
>
> personally, I tend to dislike the approach of having a job stage tons of
> stuff onto node storage (when it exists) simply because that guarantees a
> waste of cpu/gpu/memory resources for however long the stagein takes...
>
> thanks, mark hahn.

-- 
Aaron Jackson - M6PIU
Researcher at University of Nottingham
http://aaronsplace.co.uk/