Deep Learning Weekly Issue #111
Samsung's avatar generator, Google's clever depth model, a primer on word embeddings, and more
Hey folks,
This week in deep learning, we bring you a few-shot avatar generator from Samsung, a look at Facebook’s fight to moderate content with AI, a new edge AI platform from Nvidia, and Intel’s AI chip roadmap.
We also recommend a clever 3D depth estimation model from Google, a nice primer on word embeddings, illustrated cheat sheets for Stanford's AI course, a TensorBoard alternative, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Samsung’s new AI can create talking avatars with a single photo [TNW]
New research out of Samsung’s AI lab synthesizes videos of faces using a few-shot learning technique that combines facial embedding, keypoint recognition, and image synthesis.
Facebook’s CTO talks about the challenges of moderating content with AI.
Nvidia EGX takes AI computing to the edge of the network [VentureBeat]
Nvidia launched a new platform to help engineers deploy models on devices outside of data centers.
Intel’s present and future AI chip business [VentureBeat]
Intel unveils a roadmap for AI chip development and the corresponding software stack.
Learning
Artificial Intelligence cheat sheets for CS 221
Illustrated Artificial Intelligence cheat sheets covering the content of Stanford's CS 221 Artificial Intelligence class.
Optimizing Steering Car Paths with PyTorch
A neat use of PyTorch’s automatic differentiation capabilities to solve optimization problems.
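To make the idea concrete, here is a minimal sketch (not the linked post's code) of path optimization with PyTorch autograd. It assumes a path of 2-D waypoints with fixed start and goal points, uses squared second differences as a stand-in for steering effort, and adds a quadratic penalty for waypoints inside a circular obstacle. The function name `optimize_path` and all constants are hypothetical choices for illustration.

```python
import torch

def optimize_path(start, goal, obstacle, radius, n_points=21, steps=500,
                  lr=0.05, penalty_weight=100.0):
    """Hypothetical example: smooth a 2-D waypoint path around a circular obstacle."""
    start = torch.tensor(start, dtype=torch.float32)
    goal = torch.tensor(goal, dtype=torch.float32)
    center = torch.tensor(obstacle, dtype=torch.float32)

    # Straight-line initialisation; only the interior waypoints are trainable,
    # so the start and goal stay fixed.
    t = torch.linspace(0.0, 1.0, n_points).unsqueeze(1)
    init = start + t * (goal - start)
    interior = init[1:-1].clone().requires_grad_(True)
    optimizer = torch.optim.Adam([interior], lr=lr)

    def cost(inner):
        path = torch.cat([start.unsqueeze(0), inner, goal.unsqueeze(0)])
        # Squared second differences approximate curvature (steering effort).
        second_diff = path[2:] - 2 * path[1:-1] + path[:-2]
        smoothness = (second_diff ** 2).sum()
        # Quadratic penalty for waypoints that sit inside the obstacle.
        dist = torch.linalg.norm(path - center, dim=1)
        intrusion = torch.relu(radius - dist)
        return smoothness + penalty_weight * (intrusion ** 2).sum(), path

    initial_cost = cost(interior)[0].item()
    for _ in range(steps):
        optimizer.zero_grad()
        loss, _ = cost(interior)
        loss.backward()  # autograd computes d(loss)/d(interior)
        optimizer.step()

    final_cost, path = cost(interior)
    return path.detach(), initial_cost, final_cost.item()
```

For example, `optimize_path((0.0, 0.0), (10.0, 0.0), (5.0, 0.3), 1.0)` bends a straight line so that it passes below an obstacle centred just above it. The point of the design is that only the cost function is written by hand. The gradients come from `loss.backward()`, so you can swap in a different cost without deriving anything.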
Moving Camera, Moving People: A Deep Learning Approach to Depth Prediction
A very clever Google project that uses videos of people doing the Mannequin Challenge as training data for a model that predicts depth in scenes where both the camera and the people are moving.
Intuition & Use-Cases of Embeddings in NLP & beyond
A nice presentation on word embeddings.
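For readers new to the topic, this is a tiny sketch of the two operations most embedding demos rely on: cosine similarity and the classic "king - man + woman ≈ queen" analogy. The 3-dimensional vectors below are hand-made toy values, not real learned embeddings. Models such as word2vec or GloVe learn vectors with hundreds of dimensions from text.

```python
import math

# Hypothetical toy vectors; the dimensions loosely mean "royalty", "gender", "fruit".
EMBEDDINGS = {
    "king":  [0.9,  0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "man":   [0.1,  0.8, 0.0],
    "woman": [0.1, -0.8, 0.0],
    "apple": [0.0,  0.0, 0.9],
}

def cosine(u, v):
    """Cosine similarity: the dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vocab=EMBEDDINGS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vocab[w], target))
```

With these toy vectors, `analogy("king", "man", "woman")` returns `"queen"`. The input words are excluded from the candidates because the query vector is often closest to one of them.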
Deep Geometric Learning of Big Data and Applications
An interesting set of presentations from a recent workshop on deep geometric learning.
Libraries & Code
[Github] microsoft/tensorwatch
Debugging, monitoring, and visualization for Deep Learning and Reinforcement Learning from Microsoft Research. A TensorBoard alternative.
[Github] hellochick/MarioO_O-flow-curioisty
An RL agent that plays Super Mario using flow-based, curiosity-driven exploration.
Papers & Publications
FastSpeech: Fast, Robust and Controllable Text to Speech
Abstract: ….In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length regulator to expand the source phoneme sequence to match the length of target mel-spectrogram sequence for parallel mel-spectrogram generation. Experiments on the LJSpeech dataset show that our parallel model matches autoregressive models in terms of speech quality, nearly eliminates the skipped words and repeated words, and can adjust voice speed smoothly. Most importantly, compared with autoregressive models, our model speeds up the mel-spectrogram generation by 270x….
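The length regulator mentioned in the abstract is simple to picture. Each phoneme's hidden vector is repeated once for every mel frame it should last, so the expanded sequence matches the length of the mel-spectrogram. The pure-Python sketch below assumes integer frame durations have already been predicted. It also includes a hypothetical speed factor `alpha` that scales the durations, which is our reading of "adjust voice speed"; the paper's exact formulation may differ. The name `length_regulate` is ours, and a real model would operate on batched tensors.

```python
from typing import List, Sequence

def length_regulate(hidden: Sequence[Sequence[float]],
                    durations: Sequence[int],
                    alpha: float = 1.0) -> List[List[float]]:
    """Hypothetical sketch: repeat each phoneme vector by its (scaled) duration."""
    if len(hidden) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[List[float]] = []
    for vec, d in zip(hidden, durations):
        # Scale the duration by alpha (assumed speed control), rounding half up.
        n = max(0, int(d * alpha + 0.5))
        frames.extend(list(vec) for _ in range(n))  # fresh copy per frame
    return frames
```

For example, three phonemes with durations `[2, 1, 3]` expand to six frames. With `alpha=2.0` they expand to twelve frames, giving slower speech, and with `alpha=0.5` to four frames, giving faster speech.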
ERNIE: Enhanced Language Representation with Informative Entities
Abstract: …. In this paper, we utilize both large-scale textual corpora and [Knowledge Graphs] to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks.