Deep Learning Weekly Issue #133
Recommendation at Instagram, Snapchat's age GAN, fast sparse ConvNets, Apollo 11 deepfakes, and more...
Happy (almost) Thanksgiving from us in the US. This week in deep learning we bring you a peek into Instagram’s Explore recommender system, new age GANs from Snapchat, Apollo 11 deepfakes from MIT starring Richard Nixon, and some tips on how to recognize AI snake oil (pdf).
In addition to turkey, you may also enjoy a new paper from FAIR generalizing the lottery ticket hypothesis to other ML tasks, Hypertunity: a hyperparameter optimization library, an exploration of depth estimation models applied to Escher illusions, fast sparse convolutional neural networks, a guide to audio analysis, and a new few-shot face re-enactment model.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Facebook provides details on how it generates Instagram’s content recommendations.
Environments for training RL algorithms with safety constraints from OpenAI.
Deepfakes used to generate video of Nixon reading the (thankfully) unused draft of a speech prepared in case Apollo 11 ended in tragedy.
Snapchat expands its GAN-powered filters with a really impressive age transformer.
An interesting paper shows correlations between real-world use of apps (Messages, email, Google, etc.) and cognitive impairment.
A nice reproduction and expansion of work on the lottery ticket hypothesis: the small subnetworks within neural networks that account for most of the accuracy.
Exploring what depth estimation models see when pointed at illusions like Escher drawings.
Some heuristics for evaluating the myriad AI claims flying around these days.
Getting started with audio analysis. Examples use sklearn and Keras but are very general.
Generating talking head videos with only a few images from the target identity.
Nice introduction to BERT models for those who are visually inclined.
A new dataset of security camera footage containing violence.
Libraries & Code
A toolset for black-box hyperparameter optimisation.
Learning Spatial Fusion for Single-Shot Object Detection
A simple and uniform way to quantize weights and activations by formulating quantization as a differentiable non-linear function.
Papers & Publications
Abstract: …. In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. While the idea of using sparsity to decrease the parameter count is not new, the conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly, which we open-source for the benefit of the community as part of the XNNPACK library. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve. On Snapdragon 835 our sparse networks outperform their dense equivalents by 1.3−2.4× -- equivalent to approximately one entire generation of MobileNet-family improvement.
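To make the FLOPs argument above concrete, here is a minimal sketch of a sparse matrix-vector product in compressed sparse row (CSR) form. It is not the XNNPACK kernel from the paper. It assumes a 1x1 convolution can be viewed as a weight matrix (output channels by input channels) applied to one pixel's channel vector, and the names `dense_to_csr` and `csr_matvec` are hypothetical helpers written for this illustration.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)   # keep only the non-zero weights
                col_idx.append(j)  # remember which input channel each one reads
        row_ptr.append(len(values))  # where the next output channel starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector, counting multiply-adds."""
    out, macs = [], 0
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect (irregular) read of x
            macs += 1
        out.append(acc)
    return out, macs


# Toy weights (illustrative values only): 3 output channels, 4 input channels.
weights = [
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, col_idx, row_ptr = dense_to_csr(weights)
y, macs = csr_matvec(values, col_idx, row_ptr, x)
dense_macs = len(weights) * len(weights[0])
print(y, macs, dense_macs)
```

The sparse version does 3 multiply-adds where the dense one would do 12, but every read of `x` goes through `col_idx`. That indirect access is one plausible reason a drop in theoretical FLOPs does not automatically become a wall-clock speedup without carefully written kernels.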
Abstract: Adversarial examples are commonly viewed as a threat to ConvNets. Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner. We propose AdvProp, an enhanced adversarial training scheme which treats adversarial examples as additional examples, to prevent overfitting. Key to our method is the usage of a separate auxiliary batch norm for adversarial examples, as they have different underlying distributions to normal examples.
We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger. For instance, by applying AdvProp to the latest EfficientNet-B7 on ImageNet, we achieve significant improvements on ImageNet (+0.7%), ImageNet-C (+6.5%), ImageNet-A (+7.0%), Stylized-ImageNet (+4.8%). With an enhanced EfficientNet-B8, our method achieves the state-of-the-art 85.5% ImageNet top-1 accuracy without extra data. This result even surpasses the best model trained with 3.5B Instagram images (~3000X more than ImageNet) and ~9.4X more parameters.
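The "separate auxiliary batch norm" idea can be sketched in a few lines. This is a toy illustration, not the authors' implementation: it normalizes a single feature, omits the learnable scale and shift, and the class name `DualBatchNorm`, the `kind` argument, and the sample values are all assumptions made for the example.

```python
from statistics import fmean, pvariance


class DualBatchNorm:
    """Toy 1-D batch norm with separate statistics for clean and adversarial batches."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # One set of running statistics per input type (the "auxiliary" set is "adv").
        self.running = {
            "clean": {"mean": 0.0, "var": 1.0},
            "adv": {"mean": 0.0, "var": 1.0},
        }

    def forward(self, batch, kind):
        """Normalize `batch` with its own statistics and update the matching running stats."""
        mean, var = fmean(batch), pvariance(batch)
        stats = self.running[kind]
        stats["mean"] += self.momentum * (mean - stats["mean"])
        stats["var"] += self.momentum * (var - stats["var"])
        return [(v - mean) / (var + self.eps) ** 0.5 for v in batch]


bn = DualBatchNorm()
clean_batch = [1.0, 2.0, 3.0]    # hypothetical clean activations
adv_batch = [11.0, 12.0, 13.0]   # hypothetical adversarial activations, shifted distribution

out_clean = bn.forward(clean_batch, "clean")
out_adv = bn.forward(adv_batch, "adv")
print(bn.running["clean"]["mean"], bn.running["adv"]["mean"])
```

Because each input type updates only its own running mean and variance, the shifted adversarial batch does not contaminate the statistics accumulated for clean images, which matches the abstract's motivation that the two kinds of examples follow different underlying distributions.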