Deep Learning Weekly Issue #113
TensorFlow 2.0, PyTorch Hub, training sparse graphs, self-attention for images, visualizing BERT and more...
This week in deep learning we bring you news of the TensorFlow 2.0 beta, a look at the carbon footprint of AI, a new reinforcement learning paper from DeepMind, and an introduction to PyTorch Hub.
You may also enjoy MelNet, a frequency-domain generative model for audio that handles both unconditional generation and text-to-speech; visualizations of BERT embeddings; a deep dive into what EfficientNet looks at to make predictions; a new method for finding sparse subnetworks; and Selfie, an application of self-supervised pretraining to image embedding.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
TensorFlow 2.0 has finally reached beta. The alpha has been available since the TensorFlow Dev Summit earlier this year; with the beta, the API is now locked.
Training a single AI model can emit as much carbon as five cars in their lifetimes [MIT Technology Review]
A new study finds that training a single state-of-the-art NLP model with neural architecture search emits roughly 17 times as much carbon as an average American does in a year, with most of that footprint coming from the architecture search itself.
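The headline and the summary compare one emissions figure against two different baselines. Here is a quick arithmetic check. It uses figures as widely reported from the underlying study (Strubell et al., 2019), which are treated as assumptions to verify against the paper: about 626,155 lbs of CO2-equivalent for one large Transformer trained with neural architecture search, 126,000 lbs for an average car over its lifetime including fuel, and 36,156 lbs for an average American over one year.

```python
# Figures in pounds of CO2-equivalent, as widely reported from Strubell et al. (2019).
# They are assumptions here; check them against the paper before reusing.
TRANSFORMER_WITH_NAS_LBS = 626_155  # one large Transformer trained with neural architecture search
CAR_LIFETIME_LBS = 126_000          # average car over its lifetime, including fuel
AMERICAN_YEAR_LBS = 36_156          # average American, one year


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator rounded to one decimal place."""
    return round(numerator / denominator, 1)


cars = ratio(TRANSFORMER_WITH_NAS_LBS, CAR_LIFETIME_LBS)
american_years = ratio(TRANSFORMER_WITH_NAS_LBS, AMERICAN_YEAR_LBS)

print(f"~{cars} car lifetimes")             # the "five cars" headline
print(f"~{american_years} American-years")  # the "roughly 17 times" comparison
```

Both comparisons describe the same total. Only the baseline changes.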
Capture the Flag: the emergence of complex cooperative agents [DeepMind]
DeepMind describes a reinforcement learning agent that outperforms human players in Quake III Arena's Capture the Flag mode.
Towards Reproducible Research with PyTorch Hub
Like TensorFlow Hub, PyTorch Hub lets developers load model definitions and pre-trained weights through a simple API.
Intel researchers compress AI models without compromising accuracy [VentureBeat]
A new algorithm from Intel researchers for training sparse networks.
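This item doesn't describe the algorithm itself. Assume it belongs to the dynamic-sparse-training family, where the network stays sparse throughout training and connections are periodically pruned and regrown. Under that assumption, the core step looks roughly like the generic sketch below. It is closer to Sparse Evolutionary Training than to any specific Intel variant, and the function name and toy numbers are hypothetical.

```python
import random


def prune_and_regrow(weights, mask, prune_fraction, rng):
    """One generic prune-and-regrow step on a flat weight vector (hypothetical helper).

    weights: list of floats; mask: list of bools (True = active connection).
    The smallest-magnitude fraction of active weights is removed, then the same
    number of previously inactive positions is activated at random with weight 0,
    so the count of active parameters stays constant.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    # Never prune more than can be regrown, so the parameter budget is preserved exactly.
    n_move = min(int(len(active) * prune_fraction), len(inactive))
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_move]
    to_grow = rng.sample(inactive, n_move)

    new_weights, new_mask = list(weights), list(mask)
    for i in to_prune:
        new_mask[i], new_weights[i] = False, 0.0
    for i in to_grow:
        new_mask[i], new_weights[i] = True, 0.0  # regrown connections start at zero
    return new_weights, new_mask


rng = random.Random(0)
weights = [0.9, -0.05, 0.0, 0.4, 0.0, -0.7, 0.01, 0.0]
mask = [True, True, False, True, False, True, True, False]
new_weights, new_mask = prune_and_regrow(weights, mask, prune_fraction=0.4, rng=rng)
print(sum(mask), sum(new_mask))  # active-parameter count before and after
```

Because memory is spent only on active connections during the whole run, this family can shrink the model while it trains, rather than pruning a dense model afterwards.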
MelNet: A Generative Model for Audio in the Frequency Domain
Models audio as spectrograms rather than raw waveforms, and supports unconditional generation of both speech and music as well as text-to-speech.
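We assume MelNet works on mel-scaled spectrograms, as its name suggests. As background, here is a minimal sketch of the widely used HTK-style Hz-to-mel conversion and of choosing band centres evenly spaced in mel. Whether MelNet uses exactly this variant (rather than, say, the Slaney formulation) is an assumption.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: compresses high frequencies relative to low ones."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centres(n_mels: int, f_min: float, f_max: float) -> list:
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    # Band edges sit at lo and hi; the centres are the n_mels interior points.
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_mels)]


centres = mel_band_centres(8, 0.0, 8000.0)
print([round(c) for c in centres])
```

Equal steps in mel become ever wider steps in Hz. This is why a mel spectrogram gives low frequencies, where speech carries much of its detail, more resolution per Hz.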
Text-based Editing of Talking-head Video
Edit the text of a transcript, and neural networks generate audio and video frames to match.
Language, trees, and geometry in neural networks
A visualization technique to understand BERT.
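A key idea behind this line of work is that distance in a parse tree can match *squared* Euclidean distance between embeddings. The authors call such a layout a Pythagorean embedding. Here is a minimal sketch of that construction: give each tree edge its own orthogonal unit vector, and place each node at the sum of the edge vectors on its path from the root. The sentence and tree are hypothetical, given as a child-to-parent map, and are not taken from the post.

```python
def path_edges(node, parent):
    """Edges on the path from node up to the root, each edge named by its child node."""
    edges = []
    while parent[node] is not None:
        edges.append(node)
        node = parent[node]
    return edges


def pythagorean_embedding(parent):
    """Map each node to a 0/1 vector with one coordinate per tree edge."""
    edge_ids = {child: i for i, child in enumerate(c for c in parent if parent[c] is not None)}
    embedding = {}
    for node in parent:
        vec = [0.0] * len(edge_ids)
        for edge in path_edges(node, parent):
            vec[edge_ids[edge]] = 1.0
        embedding[node] = vec
    return embedding


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tree_distance(u, v, parent):
    """Number of edges on the tree path between u and v."""
    # Edges shared by both root paths lie above the lowest common ancestor and cancel out.
    return len(set(path_edges(u, parent)) ^ set(path_edges(v, parent)))


# Hypothetical dependency tree for "the chef cooked the meal" (tokens made unique).
parent = {"cooked": None, "chef": "cooked", "the_1": "chef", "meal": "cooked", "the_2": "meal"}
emb = pythagorean_embedding(parent)
for u, v in [("the_1", "the_2"), ("chef", "meal"), ("cooked", "the_2")]:
    print(u, v, tree_distance(u, v, parent), squared_distance(emb[u], emb[v]))
```

The plain Euclidean distance between `the_1` and `the_2` is 2, while their tree distance is 4. This is why squared distance is the natural quantity to compare with tree distance.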
A nice deep dive into on-device training and supported operations in Apple’s machine learning framework.
Libraries & Code
An interesting visualization of what Google’s new EfficientNet is looking at in comparison to other popular model architectures.
A new technique to find small, efficient subnetworks in over-parameterized models to reduce training time and model size.
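The item doesn't name its method. One well-known recipe for finding such subnetworks is lottery-ticket-style magnitude pruning: train the network, keep the largest-magnitude weights, and rewind the survivors to their initial values before retraining. The sketch below shows only the masking-and-rewinding step under that assumption. Training is omitted, and the weight values are illustrative.

```python
def magnitude_mask(trained, keep_fraction):
    """Keep the keep_fraction of weights with the largest trained magnitude."""
    n_keep = max(1, round(len(trained) * keep_fraction))
    order = sorted(range(len(trained)), key=lambda i: abs(trained[i]), reverse=True)
    keep = set(order[:n_keep])
    return [i in keep for i in range(len(trained))]


def rewind(initial, mask):
    """Reset surviving weights to their initial values; pruned weights become zero."""
    return [w if m else 0.0 for w, m in zip(initial, mask)]


initial = [0.10, -0.20, 0.05, 0.30, -0.15, 0.02]  # illustrative values at initialization
trained = [0.80, -0.05, 0.60, 0.01, -0.90, 0.03]  # illustrative values after training
mask = magnitude_mask(trained, keep_fraction=0.5)
ticket = rewind(initial, mask)
print(mask, ticket)
```

The mask is chosen by trained magnitude, but the surviving weights restart from their original initialization. That pairing is what distinguishes this recipe from ordinary post-training pruning.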
SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver
Papers & Publications
Butterfly Transform: An Efficient FFT Based Neural Architecture Design
Abstract: In this paper, we introduce the Butterfly Transform (BFT), a lightweight channel fusion method that reduces the computational complexity of point-wise convolutions from O(n^2) of conventional solutions to O(n log n) with respect to the number of channels while improving the accuracy of the networks under the same range of FLOPs. The proposed BFT generalizes the Discrete Fourier Transform in a way that its parameters are learned at training time. Our experimental evaluations show that replacing channel fusion modules with BFT results in significant accuracy gains at similar FLOPs across a wide range of network architectures….
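To make the complexity claim concrete: a butterfly network on n = 2^k channels has log2(n) layers of n/2 two-by-two blocks. It therefore needs 2·n·log2(n) multiplications, compared with n² for a dense point-wise (1×1) convolution. The pure-Python sketch below is our illustration, not the authors' implementation. It applies such a network to one pixel's channel vector. If every block is set to the FFT's twiddle factors (after a bit-reversal permutation of the input), the network computes the Discrete Fourier Transform. This is the sense in which a learned version generalizes it.

```python
import cmath
import math


def butterfly_apply(x, weights):
    """Apply a butterfly network to a channel vector x of length n = 2**k.

    weights[layer][block] = (w00, w01, w10, w11) maps the pair (x_i, x_j)
    to (w00*x_i + w01*x_j, w10*x_i + w11*x_j).
    """
    n = len(x)
    y = list(x)
    span, layer = 1, 0
    while span < n:
        out = list(y)
        block = 0
        for start in range(0, n, 2 * span):
            for k in range(span):
                i, j = start + k, start + k + span
                w00, w01, w10, w11 = weights[layer][block]
                out[i] = w00 * y[i] + w01 * y[j]
                out[j] = w10 * y[i] + w11 * y[j]
                block += 1
        y, span, layer = out, span * 2, layer + 1
    return y


def fft_weights(n):
    """Twiddle-factor blocks that turn the butterfly network into a radix-2 FFT."""
    layers, span = [], 1
    while span < n:
        m = 2 * span
        layer = []
        for _start in range(0, n, m):
            for k in range(span):
                w = cmath.exp(-2j * math.pi * k / m)
                layer.append((1, w, 1, -w))
        layers.append(layer)
        span *= 2
    return layers


def bit_reverse(x):
    """Permute x into bit-reversed index order, the input ordering of an iterative FFT."""
    bits = len(x).bit_length() - 1
    return [x[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(len(x))]


def naive_dft(x):
    """Direct O(n^2) Discrete Fourier Transform, used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) for f in range(n)]


def multiply_counts(n):
    """(dense 1x1 conv, butterfly) multiplications per pixel for n input and n output channels."""
    return n * n, 2 * n * int(math.log2(n))


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
fft_out = butterfly_apply(bit_reverse(x), fft_weights(len(x)))
max_err = max(abs(p - q) for p, q in zip(fft_out, naive_dft(x)))
print(f"max |butterfly - DFT| = {max_err:.2e}")
print(multiply_counts(256))
```

In BFT the block weights are learned during training instead of being fixed to twiddle factors. The connectivity, and therefore the O(n log n) cost, stays the same.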
Selfie: Self-supervised Pretraining for Image Embedding
We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images. Given masked-out patches in an input image, our method learns to select the correct patch, among other "distractor" patches sampled from the same image, to fill in the masked location. This classification objective sidesteps the need for predicting exact pixel values of the target patches….Our pretraining method provides consistent improvements to ResNet-50 across all settings compared to the standard supervised training of the same network….Our pretraining method also improves ResNet-50 training stability, especially on low data regime, by significantly lowering the standard deviation of test accuracies across datasets.
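The patch-selection objective is a softmax classification over candidate patches. Here is a minimal sketch. It assumes the network has already produced a query vector for the masked location and one embedding per candidate patch; the vectors and the function name are hypothetical. Each candidate is scored by its dot product with the query, and cross-entropy is applied against the true patch.

```python
import math


def patch_selection_loss(query, candidates, target):
    """Cross-entropy for picking candidates[target] as the patch that fills the masked slot.

    query: vector summarising the visible patches plus the masked position.
    candidates: embeddings of the true patch and the distractor patches.
    """
    logits = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]


query = [0.9, 0.1, -0.3]
candidates = [
    [1.0, 0.0, -0.5],   # true patch: well aligned with the query
    [-0.8, 0.2, 0.4],   # distractor from the same image
    [0.1, -1.0, 0.0],   # distractor from the same image
]
loss_true = patch_selection_loss(query, candidates, target=0)
loss_wrong = patch_selection_loss(query, candidates, target=1)
print(loss_true, loss_wrong)
```

Because the loss only asks which candidate fits, the model never has to regress exact pixel values. This matches the abstract's point about sidestepping pixel prediction.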