Deep Learning Weekly Issue #113

TensorFlow 2.0, PyTorch Hub, training sparse graphs, self-attention for images, visualizing BERT and more...

Hey folks,

This week in deep learning we bring you news of the TensorFlow 2.0 beta, a look at the carbon footprint of AI, a new reinforcement learning paper from DeepMind, and a look at PyTorch Hub.

You may also enjoy MelNet, an unconditional frequency-based text-to-speech model, visualizations of BERT embeddings, a deep dive into what EfficientNet looks at to make predictions, a new method for finding sparse subnetworks, and Selfie, an application of self-supervised pretraining to image embedding.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


TensorFlow 2.0 Beta

TensorFlow 2.0 has finally reached beta. The alpha has been out since the TF Dev Summit earlier this year, but with the beta, the APIs are locked and final.

Training a single AI model can emit as much carbon as five cars in their lifetimes [MIT Technology Review]

A new study finds that training a single state-of-the-art NLP model has nearly 17 times the average carbon footprint of a typical American over a year’s time, mostly due to neural architecture search.

Capture the Flag: the emergence of complex cooperative agents [DeepMind]

DeepMind describes a new RL algorithm that beats human players at Capture the Flag in Quake III Arena.

Towards Reproducible Research with PyTorch Hub

Similar to TensorFlow Hub, PyTorch Hub will allow developers to import models and pre-trained weights through a simple API.

Intel researchers compress AI models without compromising accuracy [VentureBeat]

A new sparse network training algorithm from researchers at Intel.


MelNet: A Generative Model for Audio in the Frequency Domain

A spectrogram-based generative model that supports both unconditional audio generation and text-to-speech. Works with both voice and music.

Text-based Editing of Talking-head Video

Update the text of a transcript, and neural networks generate audio and video frames to match.

Language, trees, and geometry in neural networks

A visualization technique to understand BERT.

An in-depth look at Core ML 3

A nice deep dive into on-device training and supported operations in Apple’s machine learning framework.

Libraries & Code

[GitHub] sidml/EfficientNet-GradCam-Visualization

An interesting visualization of what Google’s new EfficientNet is looking at in comparison to other popular model architectures.

Targeted Dropout

A new technique to find small, efficient subnetworks in over-parameterized models to reduce training time and model size.
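The core idea of targeted dropout is simple enough to sketch: rank weights by magnitude, treat the lowest fraction as pruning candidates, and stochastically drop only those candidates during training so the network learns to tolerate later pruning. Below is a minimal NumPy sketch of that selection-then-drop step; the function name and parameter names are hypothetical, not from the paper's code.

```python
import numpy as np

def targeted_dropout(w, targ_frac=0.5, drop_prob=0.5, rng=None):
    """Sketch of targeted dropout on a weight matrix w: the targ_frac
    fraction of weights with the smallest magnitudes become pruning
    candidates, and each candidate is independently zeroed with
    probability drop_prob. (Illustrative only; names are assumptions.)"""
    rng = rng or np.random.default_rng(0)
    flat = np.abs(w).ravel()
    k = int(targ_frac * flat.size)            # number of candidate weights
    if k == 0:
        return w.copy()
    thresh = np.partition(flat, k - 1)[k - 1]  # magnitude cutoff for candidates
    candidates = np.abs(w) <= thresh           # low-magnitude weights only
    drop = candidates & (rng.random(w.shape) < drop_prob)
    return np.where(drop, 0.0, w)
```

Because only the smallest weights are ever dropped, the important (large-magnitude) weights are untouched, which is what makes post-training pruning nearly free.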

[GitHub] locuslab/SATNet

SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver

Papers & Publications

Butterfly Transform: An Efficient FFT Based Neural Architecture Design

Abstract: In this paper, we introduce the Butterfly Transform (BFT), a lightweight channel fusion method that reduces the computational complexity of point-wise convolutions from O(n^2) of conventional solutions to O(n log n) with respect to the number of channels while improving the accuracy of the networks under the same range of FLOPs. The proposed BFT generalizes the Discrete Fourier Transform in a way that its parameters are learned at training time. Our experimental evaluations show that replacing channel fusion modules with BFT results in significant accuracy gains at similar FLOPs across a wide range of network architectures….

Selfie: Self-supervised Pretraining for Image Embedding

Abstract: We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images. Given masked-out patches in an input image, our method learns to select the correct patch, among other "distractor" patches sampled from the same image, to fill in the masked location. This classification objective sidesteps the need for predicting exact pixel values of the target patches…. Our pretraining method provides consistent improvements to ResNet-50 across all settings compared to the standard supervised training of the same network…. Our pretraining method also improves ResNet-50 training stability, especially on low data regime, by significantly lowering the standard deviation of test accuracies across datasets.
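The classification objective described in the abstract boils down to a contrastive softmax: score the correct patch embedding and the distractor embeddings against a query vector for the masked location, then apply cross-entropy on the true index. A toy NumPy sketch of that scoring step (the function and argument names are illustrative assumptions, not the paper's code):

```python
import numpy as np

def masked_patch_loss(query, candidates, target):
    """Selfie-style objective, in miniature: score each candidate patch
    embedding against the query for a masked location by dot product,
    then compute softmax cross-entropy against the true patch's index.
    Returns (loss, predicted index)."""
    logits = candidates @ query        # one similarity score per candidate
    logits = logits - logits.max()     # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[target]), int(probs.argmax())
```

Because the model only has to pick the right patch among distractors, it never needs to regress exact pixel values, which is what makes the objective tractable for continuous image data.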