Deep Learning Weekly Issue #116
Distilling BERT, deep learning containers, Stephen Wolfram's testimony on AI, and more...
This week in deep learning we bring you a new shoe try-on app from Gucci, a TensorBoard replacement from Microsoft, deep learning containers from Google, and a new recommendation model from Facebook.
You may also enjoy Fast.ai’s new course on Swift for TensorFlow, Stephen Wolfram’s testimony to Congress on AI, a look at what neural classifiers use to make decisions, and the State of AI report.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Gucci’s iOS app lets you try on shoes remotely in AR [VentureBeat]
Gucci has added an ML-powered home try-on feature to its app.
Controversial deepfake app DeepNude shuts down hours after being exposed [The Verge]
The creators of an app that used GANs to replace women’s clothes with nude bodies have shut it down. This is not controversial, and we’re glad the app was shut down; it shouldn’t have been made in the first place.
Microsoft makes AI debugging and visualization tool TensorWatch open source
Microsoft officially launches a TensorBoard replacement.
Google: Introducing Deep Learning Containers
Google Cloud users can now use pre-packaged containers with popular deep learning libraries pre-installed.
DLRM: An advanced, open source deep learning recommendation model
Facebook releases a deep learning model suited for sparse categorical data often found in recommender systems.
Fast.ai’s Deep Learning from the Foundations with Swift for TensorFlow
Fast.ai’s new course using Swift for TensorFlow is out.
New AI programming language goes beyond deep learning [MIT News]
Researchers at MIT design a new programming language specifically tailored for AI model development.
Stephen Wolfram: Testifying at the Senate about A.I.
Stephen Wolfram on the future of content curation and other challenges related to AI.
Distilling BERT — How to achieve BERT performance using Logistic Regression
Interesting application of distillation to reduce the computation cost of large transformers.
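For readers new to distillation, here is a minimal sketch of the general idea: a small student model is trained to match the softened output distribution of a large teacher. This is not necessarily the recipe used in the linked post. The function names, the temperature value, and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    A higher temperature flattens both distributions, so the student also
    learns from the teacher's relative ranking of the wrong classes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Hypothetical logits for a three-class problem (made-up values).
teacher = [4.0, 1.0, -2.0]
close_student = [3.5, 1.2, -1.8]  # roughly agrees with the teacher
far_student = [-2.0, 1.0, 4.0]    # ranks the classes in reverse

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

The student that agrees with the teacher gets the lower loss, which is the signal a training loop would minimize.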
Where We See Shapes, AI Sees Textures [Quanta Magazine]
New research suggests that ImageNet models derive most of their predictive power from texture, not shapes.
Nathan Benaich and Ian Hogarth have released their annual State of AI report.
Neural Style Transfer with Adversarially Robust Classifiers
Great blog post exploring the differences in style transfer quality with and without robust feature extractors.
Libraries & Code
Monocular Total Capture: Posing Face, Body, and Hands in the Wild
A Core ML compatible version of BERT.
Multi-Level Intermediate Representation (MLIR) comes to TensorFlow
The first pieces are in place for compiling TensorFlow code down to MLIR.
Papers & Publications
Importance Estimation for Neural Network Pruning
Abstract: Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet.
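To make the first-order variant concrete, here is a minimal sketch. It assumes a filter's score is the squared sum of gradient times weight over that filter's parameters, which is one reading of the first-order Taylor criterion and not the authors' code. The filter names and numbers are made up for illustration.

```python
def taylor_importance(weights, grads):
    """First-order Taylor score for one filter: (sum_i g_i * w_i) ** 2."""
    return sum(g * w for g, w in zip(grads, weights)) ** 2


def rank_filters(filters):
    """Score every filter and return names ordered least important first."""
    scores = {name: taylor_importance(w, g) for name, (w, g) in filters.items()}
    return sorted(scores, key=scores.get), scores


# Hypothetical toy filters: name -> (flattened weights, flattened gradients).
toy_filters = {
    "conv1/f0": ([0.5, -0.2, 0.1], [0.1, 0.3, -0.2]),
    "conv1/f1": ([1.0, 0.8, -0.6], [0.5, -0.4, 0.2]),
    "conv1/f2": ([0.01, 0.02, -0.01], [0.2, 0.1, 0.3]),
}

order, scores = rank_filters(toy_filters)
prune_first = order[0]  # the filter with the smallest estimated contribution
```

In an iterative pruning loop, the lowest-scoring filters would be removed, the network fine-tuned briefly, and the scores recomputed from fresh gradients.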
Learning Data Augmentation Strategies for Object Detection
Abstract: ….[W]e investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allow a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy…