Deep Learning Weekly Issue #111

Samsung's avatar generator, Google's clever depth model, a primer on word embeddings, and more

Hey folks,

This week in deep learning we bring you a few-shot avatar generator from Samsung, a look into Facebook’s content moderation fight, a new edge AI platform from Nvidia, and a look into Intel’s AI roadmap.

We also recommend a clever 3D depth estimation model from Google, a nice primer on word embeddings, illustrated cheat sheets for Stanford's AI course, a TensorBoard alternative, and more.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Samsung’s new AI can create talking avatars with a single photo [TNW]

New research out of Samsung’s AI lab synthesizes videos of faces using a few-shot learning technique that combines facial embedding, keypoint recognition, and image synthesis.

Facebook’s A.I. Whiz Now Faces the Task of Cleaning It Up. Sometimes That Brings Him to Tears. [New York Times]

Facebook’s CTO talks about the challenges of moderating content with AI.

Nvidia EGX takes AI computing to the edge of the network [VentureBeat]

Nvidia launched a new platform to help engineers deploy models on devices outside of data centers.

Intel’s present and future AI chip business [VentureBeat]

Intel unveils a roadmap for AI chip development and the corresponding software stack.


Artificial Intelligence cheat sheets for CS 221

Illustrated Artificial Intelligence cheat sheets covering the content of Stanford's CS 221 Artificial Intelligence class.

Optimizing Steering Car Paths with PyTorch

A neat use of PyTorch’s automatic differentiation capabilities to solve optimization problems.

Moving Camera, Moving People: A Deep Learning Approach to Depth Prediction

Very clever project by Google that uses videos of people doing the mannequin challenge to train a 3D depth model for people.

Intuition & Use-Cases of Embeddings in NLP & beyond

A nice presentation on word embeddings.

Deep Geometric Learning of Big Data and Applications

Interesting set of presentations from a recent workshop on deep geometric learning.

Libraries & Code

[GitHub] microsoft/tensorwatch

Debugging, monitoring, and visualization for Deep Learning and Reinforcement Learning from Microsoft Research. A TensorBoard alternative.

[GitHub] hellochick/MarioO_O-flow-curioisty

An RL agent that plays Super Mario using flow-based curiosity exploration.

Papers & Publications

FastSpeech: Fast, Robust and Controllable Text to Speech

Abstract: …. In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length regulator to expand the source phoneme sequence to match the length of target mel-spectrogram sequence for parallel mel-spectrogram generation. Experiments on the LJSpeech dataset show that our parallel model matches autoregressive models in terms of speech quality, nearly eliminates the skipped words and repeated words, and can adjust voice speed smoothly. Most importantly, compared with autoregressive models, our model speeds up the mel-spectrogram generation by 270x….
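
For readers curious about the length regulator mentioned in the abstract, here is a minimal sketch of the idea: each phoneme is repeated according to its predicted duration so the expanded sequence lines up with the mel-spectrogram frames. This is not the authors' code. The function name length_regulate, the use of plain Python lists in place of hidden-state tensors, and the speed parameter (which simply divides each duration before rounding) are all assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulate(
    phonemes: Sequence[T],
    durations: Sequence[int],
    speed: float = 1.0,
) -> List[T]:
    """Repeat each phoneme by its duration (in frames).

    Hypothetical helper: `speed` is an assumed knob that divides every
    duration before rounding, so values above 1.0 shorten the output.
    """
    if len(phonemes) != len(durations):
        raise ValueError("phonemes and durations must have the same length")
    if speed <= 0:
        raise ValueError("speed must be positive")

    expanded: List[T] = []
    for phoneme, duration in zip(phonemes, durations):
        # Scale the predicted duration, round to whole frames, never go negative.
        repeats = max(0, round(duration / speed))
        expanded.extend([phoneme] * repeats)
    return expanded


if __name__ == "__main__":
    # Made-up phonemes and durations, for illustration only.
    print(length_regulate(["HH", "AH", "L", "OW"], [2, 3, 1, 4]))
```

Because every output position is known once the durations are fixed, a model built this way can produce all frames in one pass instead of one frame at a time, which is the property the abstract credits for the speedup.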

ERNIE: Enhanced Language Representation with Informative Entities

Abstract: …. In this paper, we utilize both large-scale textual corpora and [Knowledge Graphs] to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks.