Deep Learning Weekly Issue #116
Distilling BERT, deep learning containers, Stephen Wolfram's testimony on AI, and more...
This week in deep learning we bring you a new shoe try-on app from Gucci, a TensorBoard replacement from Microsoft, deep learning containers from Google, and a new recommendation model from Facebook.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Gucci has added an ML-powered home try-on feature to their app.
The creators of an app that used GANs to replace women’s clothes with nude bodies have shut it down. This is not controversial: we’re glad this app got shut down. It shouldn’t have been made in the first place.
Microsoft officially launches a TensorBoard replacement.
Google Cloud users can now use pre-packaged containers with popular deep learning libraries pre-installed.
Facebook releases a deep learning model suited for sparse categorical data often found in recommender systems.
Fast.ai’s new course using Swift for TensorFlow is out.
Researchers at MIT design a new programming language specifically tailored for AI model development.
Stephen Wolfram on the future of content curation and other challenges related to AI.
Interesting application of distillation to reduce the computation cost of large transformers.
New research suggests that ImageNet models derive most of their predictive power from texture, not shapes.
Nathan Benaich and Ian Hogarth have released their annual AI report.
Great blog post exploring the differences in style transfer quality with and without robust feature extractors.
Libraries & Code
Monocular Total Capture: Posing Face, Body, and Hands in the Wild
A Core ML compatible version of BERT.
The first pieces are in place for compiling TensorFlow code down to MLIR.
Papers & Publications
Abstract: Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we measured experimentally a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over state-of-the-art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet.
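The scoring step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the first-order score for a filter is the squared sum of gradient-times-weight products over that filter's parameters, and the function names and toy numbers are hypothetical.

```python
# Hypothetical sketch of first-order Taylor importance for filter pruning.
# Assumption: a filter's score is (sum of gradient * weight)^2 over its
# parameters. Each filter is a flat list of weights with matching gradients.

def taylor_importance(weights, grads):
    """Squared sum of gradient * weight over one filter's parameters."""
    return sum(g * w for g, w in zip(grads, weights)) ** 2


def prune_lowest(filters, grads, num_to_remove):
    """Return indices of the filters kept after dropping the lowest scores."""
    scores = [taylor_importance(w, g) for w, g in zip(filters, grads)]
    ranked = sorted(range(len(filters)), key=lambda i: scores[i])
    removed = set(ranked[:num_to_remove])
    return [i for i in range(len(filters)) if i not in removed]


# Toy data: three filters with two weights each (made-up values).
filters = [[0.5, -0.2], [0.01, 0.02], [1.0, 1.0]]
grads = [[0.1, 0.3], [0.5, -0.5], [0.2, 0.1]]
kept = prune_lowest(filters, grads, num_to_remove=1)
print(kept)
```

In a real setting the gradients would presumably be accumulated over many training batches, and the score-then-remove step repeated, which is what the abstract means by removing filters iteratively.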
Abstract: …[W]e investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allow a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy…