Deep Learning Weekly Issue #128
TensorFlow 2.0, NLP Transformers, Berkeley’s Deep RL course, learning video representations, and more...
Hey folks,
This week in deep learning we bring you TensorFlow 2.0, a facial recognition policy proposal from Amazon, CI/CD for machine learning models from Paperspace, and the Hugging Face Transformers library for TensorFlow.
You may also enjoy Berkeley’s course on deep reinforcement learning, research from Facebook rethinking neural network architectures, a code search dataset and challenge from GitHub, a new movement-generating GAN, and a very tiny BERT model.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
TensorFlow 2.0 is now available
At long last, the next major release of TensorFlow is here, making Keras and eager execution first-class citizens.
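For a taste of the new defaults, here's a minimal sketch; the model shape and data are just placeholders:

```python
import tensorflow as tf

# In TF 2.0, eager execution is on by default: ops run immediately.
x = tf.constant([[1.0, 2.0]])
print(tf.square(x))  # tf.Tensor([[1. 4.]], shape=(1, 2), dtype=float32)

# tf.keras is the recommended high-level API.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# @tf.function traces Python code into a graph for performance.
@tf.function
def predict(inputs):
    return model(inputs)

print(predict(x))
```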
Hugging Face launches popular Transformers NLP library for TensorFlow
The team behind the PyTorch-Transformers library has ported its work to TensorFlow 2.0.
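As a quick illustration, loading a pretrained BERT in TF 2.0 looks roughly like this; the model name and output indexing follow the library's documented conventions, so treat the specifics as an assumption:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

# Load a pretrained tokenizer and its TensorFlow 2.0 model counterpart.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

# Encode a sentence and run it through the model eagerly.
input_ids = tf.constant([tokenizer.encode("Hello, TensorFlow 2.0!")])
outputs = model(input_ids)
last_hidden_state = outputs[0]  # (batch, sequence, hidden) embeddings
print(last_hidden_state.shape)
```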
Everything Amazon announced at its Alexa event
Amazon announced new, Alexa-powered devices running small neural networks, making it possible to talk to your glasses, oven, and even a ring on your finger.
Paperspace adds machine learning model development pipeline to GPU service
Paperspace announces CI/CD tools for machine learning models to complement their GPU cloud.
Jeff Bezos says Amazon is writing its own facial recognition laws to pitch to lawmakers
Companies are looking to get ahead of regulation.
Learning
Contributing Data to Deepfake Detection Research
Google joins other tech giants in supporting research into deepfake detection.
On Network Design Spaces for Visual Recognition
Recent work from Facebook that re-examines neural architecture search and identifies structures that have been overlooked.
DistInit: Learning Video Representations Without a Single Labeled Video
Using pre-trained image recognition models to teach video recognition models on unlabeled videos.
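For intuition, here's a hypothetical sketch of what such image-to-video distillation can look like; the teacher/student architectures and loss are illustrative choices, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# A frozen 2D image model supervises a 3D video model on unlabeled clips.
teacher = models.resnet18(pretrained=True).eval()  # frozen image teacher
student = models.video.r3d_18(num_classes=1000)    # trainable video student
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

clips = torch.randn(4, 3, 8, 112, 112)  # unlabeled batch: (N, C, T, H, W)

with torch.no_grad():
    # Run the teacher on each frame and average its predictions over time.
    frames = clips.permute(0, 2, 1, 3, 4).flatten(0, 1)  # (N*T, C, H, W)
    teacher_logits = teacher(frames).view(4, 8, -1).mean(dim=1)

# Match student predictions to the teacher's soft targets (KL divergence).
student_logits = student(clips)
loss = F.kl_div(F.log_softmax(student_logits, dim=1),
                F.softmax(teacher_logits, dim=1), reduction="batchmean")
loss.backward()
optimizer.step()
```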
CS 285 at UC Berkeley: Deep Reinforcement Learning
Lectures, notes, and assignments for Berkeley’s reinforcement learning course.
Datasets
Introducing the CodeSearchNet challenge
GitHub releases a corpus of code and an evaluation environment for models that perform code search.
Libraries & Code
[GitHub] svip-lab/impersonator
PyTorch implementation of our ICCV 2019 paper: Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
[GitHub] moabitcoin/ig65m-pytorch
PyTorch 3D video classification models pre-trained on 65 million Instagram videos
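The repo exposes the models through torch.hub; here's a usage sketch, assuming the entry point names from the README:

```python
import torch

# Load a 32-frame R(2+1)D-34 model pre-trained on IG-65M via torch.hub.
# Entry point name and class count are taken from the repo's README;
# treat them as assumptions.
model = torch.hub.load("moabitcoin/ig65m-pytorch",
                       "r2plus1d_34_32_ig65m",
                       num_classes=359, pretrained=True)
model.eval()

# Classify a clip of 32 RGB frames at 112x112: (N, C, T, H, W).
clip = torch.randn(1, 3, 32, 112, 112)
with torch.no_grad():
    scores = model(clip)
print(scores.shape)  # (1, 359) class scores
```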
[GitHub] szymonmaszke/torchfunc
PyTorch functions to improve performance, analyze models, and make your deep learning life easier.
Papers & Publications
Extreme Language Model Compression with Optimal Subwords and Shared Projections
Abstract: …We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.
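For intuition, here's a toy sketch of the shared-projection idea from the abstract: a single learned matrix maps teacher hidden states into the student's smaller space so layers can be matched directly. Dimensions and the layer pairing are assumptions, not the paper's settings:

```python
import torch
import torch.nn as nn

# Shared down-projection U: teacher dim d_t -> student dim d_s.
# Values below are illustrative, not the paper's configuration.
d_t, d_s, num_layers = 768, 192, 12
U = nn.Parameter(torch.randn(d_t, d_s) * 0.02)

def layerwise_loss(teacher_states, student_states):
    """MSE between projected teacher layers and student layers."""
    loss = 0.0
    for h_t, h_s in zip(teacher_states, student_states):
        loss = loss + nn.functional.mse_loss(h_t @ U, h_s)
    return loss / num_layers

# Fake hidden states: num_layers tensors of shape (batch, seq_len, dim).
teacher_states = [torch.randn(2, 16, d_t) for _ in range(num_layers)]
student_states = [torch.randn(2, 16, d_s) for _ in range(num_layers)]
print(layerwise_loss(teacher_states, student_states))
```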
High Fidelity Speech Synthesis with Adversarial Networks
Abstract: …[W]e introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism, as well as how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (MOS - Mean Opinion Score), as well as novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator.
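For intuition, here's a toy sketch of a random-window discriminator ensemble: each discriminator scores a randomly cropped window of the waveform, and the ensemble covers several window sizes. The window sizes and conv stack here are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class RandomWindowDiscriminator(nn.Module):
    """Scores a random fixed-size crop of a raw audio waveform."""
    def __init__(self, window_size):
        super().__init__()
        self.window_size = window_size
        self.net = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=15, stride=4), nn.LeakyReLU(0.2),
            nn.Conv1d(64, 64, kernel_size=15, stride=4), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, audio):  # audio: (batch, 1, num_samples)
        max_start = audio.size(-1) - self.window_size + 1
        start = torch.randint(0, max_start, (1,)).item()
        window = audio[..., start:start + self.window_size]
        return self.net(window)  # realism score for the crop

# Ensemble over multiple window sizes; scores averaged.
discriminators = nn.ModuleList(
    RandomWindowDiscriminator(w) for w in (240, 480, 960, 1920, 3600))
fake_audio = torch.randn(2, 1, 8000)
score = torch.stack([d(fake_audio) for d in discriminators]).mean()
print(score)
```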