Deep Learning Weekly Issue #128
TensorFlow 2.0, NLP Transforms, Berkeley's Deep RL course, learning video representations, and more...
Jameson Toole | Oct 2, 2019
This week in deep learning, we bring you TensorFlow 2.0, a facial recognition policy proposal from Amazon, CI/CD for machine learning models from Paperspace, and the Hugging Face Transformers library for TensorFlow.
You may also enjoy Berkeley’s course on deep reinforcement learning, research from Facebook rethinking neural network architectures, a code search dataset and challenge from GitHub, a new movement-generating GAN, and a very tiny BERT model.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
At long last, the next major release of TensorFlow is here, making Keras and eager execution first-class citizens.
The team behind the PyTorch-Transformers library has ported their work over to TensorFlow 2.0.
Amazon announced new, Alexa-powered devices running small neural networks, making it possible to talk to your glasses, oven, and even a ring on your finger.
Paperspace announces CI/CD tools for machine learning models to complement their GPU cloud.
Amazon is putting forward its own facial recognition policy proposal as companies look to get ahead of regulation.
Google joins other tech giants in supporting research into deepfake detection.
Recent work from Facebook re-examines neural architecture search and identifies structures that have been overlooked.
Pre-trained image recognition models are used to teach video recognition models from unlabeled videos.
Lectures, notes, and assignments for Berkeley’s reinforcement learning course.
GitHub releases a corpus of code and evaluation environment for models performing code search.
Libraries & Code
PyTorch implementation of our ICCV 2019 paper: Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis
PyTorch 3D video classification models pre-trained on 65 million Instagram videos
PyTorch functions to improve performance, analyze models, and make your deep learning life easier.
Papers & Publications
Abstract: … We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.
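For readers who want a feel for the "shared projection matrices" idea, here is a minimal pure-Python sketch. It is one plausible reading of the abstract, not the authors' implementation: it assumes a single projection matrix maps teacher hidden states down to the student's width and that the same matrix is reused at every layer. The function names, toy sizes, and the choice of mean squared error are all illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mse(a, b):
    """Mean squared error between two equally shaped matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def projection_loss(teacher_layers, student_layers, projection):
    """Sum per-layer MSE between projected teacher states and student states.

    The same `projection` (d_teacher x d_student) is reused for every layer,
    which is the "shared" part of this sketch (an assumption, see above).
    """
    return sum(
        mse(matmul(t, projection), s)
        for t, s in zip(teacher_layers, student_layers)
    )


# Toy sizes: 2 layers, 3 tokens, teacher width 4, student width 2.
# This hand-written projection simply keeps the first two teacher dimensions.
projection = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
teacher_layers = [
    [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 1.0, 0.0], [3.0, 1.0, 4.0, 1.0]],
]

# A student that already matches the projected teacher has zero loss.
student_layers = [matmul(t, projection) for t in teacher_layers]
loss = projection_loss(teacher_layers, student_layers, projection)
```

In a real training setup the projection would be learned alongside the student rather than fixed by hand, and this term would be only one part of the overall distillation objective.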
Abstract: …[W]e introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism, as well as how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (MOS - Mean Opinion Score), as well as novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator.
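The "ensemble of discriminators which operate on random windows of different sizes" can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the window sizes, the `toy_score` stand-in for a learned discriminator, and the averaging of scores are hypothetical and not taken from the paper, and the conditional discriminators that check the audio against the utterance are left out.

```python
import random


def sample_window(audio, window_size, rng):
    """Return a random contiguous slice of `audio` with `window_size` samples."""
    if window_size > len(audio):
        raise ValueError("window_size exceeds audio length")
    start = rng.randrange(len(audio) - window_size + 1)
    return audio[start:start + window_size]


def ensemble_score(audio, discriminators, rng):
    """Average the scores of discriminators, each seeing its own random window."""
    scores = []
    for window_size, score_fn in discriminators:
        window = sample_window(audio, window_size, rng)
        scores.append(score_fn(window))
    return sum(scores) / len(scores)


def toy_score(window):
    """Hypothetical stand-in for a learned discriminator: mean absolute amplitude."""
    return sum(abs(x) for x in window) / len(window)


rng = random.Random(0)
# Placeholder sawtooth waveform with values in [-1.0, 0.9].
audio = [((i % 20) - 10) / 10.0 for i in range(4800)]
# Window sizes are illustrative, not the ones used by GAN-TTS.
discriminators = [(240, toy_score), (480, toy_score), (960, toy_score)]
score = ensemble_score(audio, discriminators, rng)
```

Because each discriminator only ever sees a short crop, it can be much cheaper to evaluate than one that consumes the full utterance, while the mix of window sizes lets the ensemble look at the audio at several time scales.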