Deep Learning Weekly Issue #113
TensorFlow 2.0, PyTorch Hub, training sparse graphs, self-attention for images, visualizing BERT and more...
You may also enjoy MelNet, an unconditional frequency-based text-to-speech model, visualizations of BERT embeddings, a deep dive into what EfficientNet looks at to make predictions, a new method for finding sparse subnetworks, and Selfie, an application of self-supervised pretraining to image embedding.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
TensorFlow 2.0 has finally reached beta. The alpha has been out since the TF Dev Summit earlier this year, but with the beta, the APIs are locked and final.
A new study finds that training a single state-of-the-art NLP model can emit nearly 17 times the average American's annual carbon footprint, mostly due to neural architecture search.
DeepMind describes a new RL algorithm that beats human players at Capture the Flag in Quake III Arena.
Similar to TensorFlow Hub, PyTorch Hub will allow developers to import models and pre-trained weights from a simple API.
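A quick sketch of how publishing and loading work. A repo opts in by adding a `hubconf.py` whose top-level callables become entrypoints; the model object below is a stand-in dict (with a hypothetical `tiny_model` entrypoint) so the sketch runs without torch installed, and the real consumer-side call is shown in a comment:

```python
# hubconf.py -- how a repo exposes models to torch.hub (sketch; the
# entrypoint name `tiny_model` is hypothetical)
dependencies = []  # pip packages torch.hub should verify before loading


def tiny_model(pretrained=False, **kwargs):
    """Entrypoint: torch.hub discovers top-level callables like this one.

    A real entrypoint would build an nn.Module and optionally load
    pre-trained weights; a dict stands in for the model here.
    """
    model = {"arch": "tiny", "pretrained": pretrained}
    return model


# Consumer side (requires torch and network access):
# import torch
# model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
```

The `pretrained` keyword is conventionally forwarded to the entrypoint, so weight download stays in the publisher's control.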
A new sparse network training algorithm from researchers at Intel.
Text-to-speech using unconditional spectrogram generation. Works with both voice and music.
Update the text of a transcript, and neural networks generate matching audio and video frames.
A visualization technique to understand BERT.
A nice deep dive into on-device training and supported operations in Apple’s machine learning framework.
Libraries & Code
An interesting visualization of what Google’s new EfficientNet is looking at in comparison to other popular model architectures.
A new technique to find small, efficient subnetworks in over-parameterized models to reduce training time and model size.
SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver
Papers & Publications
Abstract: In this paper, we introduce the Butterfly Transform (BFT), a lightweight channel fusion method that reduces the computational complexity of point-wise convolutions from O(n^2) of conventional solutions to O(n log n) with respect to the number of channels while improving the accuracy of the networks under the same range of FLOPs. The proposed BFT generalizes the Discrete Fourier Transform in a way that its parameters are learned at training time. Our experimental evaluations show that replacing channel fusion modules with BFT results in significant accuracy gains at similar FLOPs across a wide range of network architectures….
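The O(n log n) structure comes from DFT-style butterfly stages: log n stages, each mixing n/2 pairs of channels with a 2×2 transform. BFT learns those 2×2 coefficients at training time; as a fixed-coefficient stand-in that shows the same wiring, here is a pure-Python fast Walsh–Hadamard transform, whose butterfly is the special case with coefficients (1, 1) and (1, −1):

```python
def fwht(x):
    """In-place-style fast Walsh-Hadamard transform over a channel vector.

    len(x) must be a power of two. Each of the log2(n) stages touches
    every element once, giving O(n log n) total work -- the same stage
    structure BFT uses, except BFT learns each 2x2 mixing matrix.
    """
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        # Mix channel pairs (j, j + h) within blocks of size 2h.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # fixed 2x2 butterfly
        h *= 2
    return x
```

For example, `fwht([1, 0, 0, 0])` spreads a single channel across all outputs, which is exactly the full cross-channel mixing a point-wise convolution provides, at a fraction of the multiply count.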
We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images. Given masked-out patches in an input image, our method learns to select the correct patch, among other "distractor" patches sampled from the same image, to fill in the masked location. This classification objective sidesteps the need for predicting exact pixel values of the target patches…. Our pretraining method provides consistent improvements to ResNet-50 across all settings compared to the standard supervised training of the same network…. Our pretraining method also improves ResNet-50 training stability, especially in the low data regime, by significantly lowering the standard deviation of test accuracies across datasets.
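The classification objective above can be sketched as a softmax over candidate patches: score each candidate (the true patch plus same-image distractors) by its dot product with the predicted embedding for the masked location, then take cross-entropy against the true patch. This minimal sketch uses plain vectors and omits the paper's attention-based patch networks:

```python
import math


def selfie_style_loss(pred, candidates, target_idx):
    """Cross-entropy for picking the true patch among distractors.

    pred: predicted embedding for the masked location.
    candidates: patch embeddings (true patch + distractors from the
    same image). Returns -log softmax(logits)[target_idx].
    """
    # Dot-product similarity between prediction and each candidate patch.
    logits = [sum(p * c for p, c in zip(pred, cand)) for cand in candidates]
    m = max(logits)  # stabilize log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_idx]
```

When the prediction aligns with the true patch the loss approaches zero, and no pixel-level regression target is ever needed, which is the point the abstract makes.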