[Beowulf] help for metadata-intensive jobs (imagenet)

Sat Jun 29 02:42:01 PDT 2019

Hi John, great to hear from you. I assume you are asking about image
augmentation and pre-processing.
There are more or less standard steps to organise the downloaded images. If
you google you should be able to find suitable scripts. I recall I
followed the ones provided by Soumith Chintala, although he also used bits
provided by someone else. The thing is, you do it once and then forget about
it. You can also remove some bad images. I recall there are some which give
a warning on read due to bad EXIF info etc.; these can be overwritten.
Cropping to the relevant area using the bounding boxes might be an
interesting option.
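
To make the bounding-box idea a bit more concrete, here is a minimal sketch.
It assumes the annotations are the PASCAL VOC style XML files that ship with
ImageNet; read_boxes and union_box are hypothetical helper names of mine, not
taken from any of the scripts mentioned above.

```python
import xml.etree.ElementTree as ET


def read_boxes(xml_text):
    """Return (xmin, ymin, xmax, ymax) tuples from a VOC-style annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        if bb is None:
            continue  # object without a box: skip it
        boxes.append(
            tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        )
    return boxes


def union_box(boxes, width, height, margin=0):
    """Smallest box covering all objects, padded by margin, clamped to the image."""
    if not boxes:
        return (0, 0, width, height)  # no annotation: keep the whole image
    left = max(0, min(b[0] for b in boxes) - margin)
    top = max(0, min(b[1] for b in boxes) - margin)
    right = min(width, max(b[2] for b in boxes) + margin)
    bottom = min(height, max(b[3] for b in boxes) + margin)
    return (left, top, right, bottom)
```

The resulting (left, top, right, bottom) tuple has the layout that Pillow's
Image.crop() expects, so the actual cropping could be done once, offline,
along with the rest of the clean-up.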
Augmentation is more interesting. There are many papers covering the
overall training process from scratch. Reading "Accurate, Large Minibatch
SGD: Training ImageNet in 1 Hour" could be one starting point:
https://arxiv.org/abs/1706.02677
Then follow the references on data augmentation and you'll end up with a
few key papers which everyone references.
The ResNet "school" does things slightly differently than VGG.
Horovod provides examples for starters:
https://github.com/horovod/horovod/tree/master/examples
What those examples don't do is random cropping.
Also keep in mind how the final quality of the training is assessed:
random crop, central crop, nine crops + reflection etc.
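
As a rough illustration of the evaluation variants, the sketch below only
computes the crop coordinates. It reads "nine crops + reflection" as a 3x3
grid of crops, each taken with and without a horizontal flip (18 views in
total); that reading, the function names and the 224/256 sizes are my
assumptions, and individual papers differ in the details.

```python
def center_crop_box(width, height, size):
    """Single central crop of side `size` as (left, top, right, bottom)."""
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)


def grid_crop_boxes(width, height, size, n=3):
    """n x n crops of side `size`, evenly spaced from corner to corner.

    Assumes n >= 2 and that the image is at least `size` pixels on each side.
    """
    xs = [i * (width - size) // (n - 1) for i in range(n)]
    ys = [i * (height - size) // (n - 1) for i in range(n)]
    return [(x, y, x + size, y + size) for y in ys for x in xs]


def eval_views(width, height, size):
    """Pair every grid crop with flip=False and flip=True."""
    boxes = grid_crop_boxes(width, height, size)
    return [(box, flip) for box in boxes for flip in (False, True)]


if __name__ == "__main__":
    print(center_crop_box(256, 256, 224))
    for box, flip in eval_views(256, 256, 224):
        print(box, "flipped" if flip else "as is")
```

The predictions over all views would then be averaged per image. Whichever
scheme is used, it has to match the one used in the paper you are comparing
against, otherwise the accuracy numbers are not comparable.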

Thanks for the pointer to the new meetup. I love both HPC and AI. However I
don't see the announcement about the meeting on 21 August. Hope it will
appear later.

On Sat, 29 Jun 2019 at 07:49, John Hearns via Beowulf <beowulf at beowulf.org>
wrote:

> Igor, if there are any papers published on what you are doing with these
> images I would be very interested.
> I went to the new London HPC and AI Meetup on Thursday, one talk was by
> Odin Vision which was excellent.
> Recommend the new Meetup to anyone in the area. Next meeting 21st August.
>
> And a plug to Verne Global - they provided free Icelandic beer.
>
> On Sat, 29 Jun 2019 at 05:43, INKozin via Beowulf <beowulf at beowulf.org>
> wrote:
>
>> Converting the files to TFRecords or similar would be one obvious
>> approach if you are concerned about metadata. But then I'd understand why
>> some people would not want that (size, augmentation process). I assume you
>> are doing the training in a distributed fashion using MPI via Horovod
>> or similar, and it might be tempting to partition the files across the
>> nodes. However, doing so introduces a bias into minibatches (and custom
>> preprocessing). If you partition carefully by mapping classes to nodes it
>> may work, but I also understand why some wouldn't be totally happy with
>> that. I've trained Keras/TF/Horovod models on ImageNet using up to 6 nodes,
>> each with four P100/V100 GPUs, and it worked reasonably well. As the training
>> still took a few days, copying to local NVMe disks was a good option.
>> HTH
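
One way to avoid the minibatch bias mentioned above, sketched under the
assumption that every node can read the full dataset (shared file system or a
complete local NVMe copy): reshuffle the global file list each epoch with a
seed shared by all workers, then let each worker take a strided slice.
epoch_shard is a hypothetical helper, not part of Horovod.

```python
import random


def epoch_shard(files, rank, world_size, epoch, seed=0):
    """Return this worker's files for one epoch.

    All workers build the same permutation because they share seed + epoch,
    so the strided slices are disjoint and together cover every file.
    """
    order = list(files)
    random.Random(seed + epoch).shuffle(order)
    return order[rank::world_size]


if __name__ == "__main__":
    files = ["img%03d.JPEG" % i for i in range(10)]  # made-up file names
    for rank in range(4):
        print(rank, epoch_shard(files, rank, world_size=4, epoch=0))
```

With Horovod, rank and world_size would come from hvd.rank() and hvd.size().
Because the permutation changes with the epoch number, no worker is tied to a
fixed subset of classes.
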
>>
>> On Fri, 28 Jun 2019, 18:47 Mark Hahn, <hahn at mcmaster.ca> wrote:
>>
>>> Hi all,
>>> I wonder if anyone has comments on ways to avoid metadata bottlenecks
>>> for certain kinds of small-io-intensive jobs.  For instance, ML on
>>> imagenet,
>>> which seems to be a massive collection of trivial-sized files.
>>>
>>> A good answer is "beef up your MD server, since it helps everyone".
>>> That's a bit naive, though (no money-trees here.)
>>>
>>> How about things like putting the dataset into squashfs or some other
>>> image that can be loop-mounted on demand?  sqlite?  perhaps even a format
>>> that can simply be mmaped as a whole?
>>>
>>> personally, I tend to dislike the approach of having a job stage tons of
>>> stuff onto node storage (when it exists) simply because that guarantees a
>>> waste of cpu/gpu/memory resources for however long the stagein takes...
>>>
>>> thanks, mark hahn.
>>> --
>>> operator may differ from spokesperson.              hahn at mcmaster.ca
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>