Deep Learning Weekly Issue #111
Samsung's avatar generator, Google's clever depth model, a primer on word embeddings, and more
This week in deep learning we bring you a few-shot avatar generator from Samsung, a look into Facebook’s content moderation fight, a new edge AI platform from Nvidia, and Intel’s AI roadmap.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
New research out of Samsung’s AI lab synthesizes videos of faces using a few-shot learning technique that combines facial embedding, keypoint recognition, and image synthesis.
Facebook’s CTO talks about the challenges of moderating content with AI.
Nvidia launched a new platform to help engineers deploy models on devices outside of data centers.
Intel unveils a roadmap for AI chip development and the corresponding software stack.
Illustrated cheat sheets covering the content of Stanford's CS 221 Artificial Intelligence class.
A neat use of PyTorch’s automatic differentiation capabilities to solve optimization problems.
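The idea can be sketched in a few lines: define any differentiable objective, let autograd compute its gradients, and hand them to a standard optimizer. This is a generic illustration (minimizing the Rosenbrock function with Adam), not the linked project's code:

```python
import torch

# Minimize the Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2,
# whose global minimum is at (1, 1). Autograd supplies the gradients.
xy = torch.tensor([-1.0, 2.0], requires_grad=True)
optimizer = torch.optim.Adam([xy], lr=0.05)

for step in range(5000):
    optimizer.zero_grad()
    x, y = xy[0], xy[1]
    loss = (1 - x) ** 2 + 100 * (y - x ** 2) ** 2
    loss.backward()   # autograd fills xy.grad
    optimizer.step()

print(loss.item())    # should be close to 0 after optimization
```

Any function built from PyTorch operations works the same way, which is what makes autograd handy well beyond training neural networks.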
A clever project by Google that uses videos of people doing the mannequin challenge to train a 3D depth model for people.
A nice presentation on word embeddings.
Interesting set of presentations from a recent workshop on deep geometric learning.
Libraries & Code
Debugging, monitoring, and visualization for Deep Learning and Reinforcement Learning from Microsoft Research. A TensorBoard alternative.
Playing Super Mario with a flow-based, curiosity-driven RL agent.
Papers & Publications
Abstract: ….In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length regulator to expand the source phoneme sequence to match the length of target mel-spectrogram sequence for parallel mel-spectrogram generation. Experiments on the LJSpeech dataset show that our parallel model matches autoregressive models in terms of speech quality, nearly eliminates the skipped words and repeated words, and can adjust voice speed smoothly. Most importantly, compared with autoregressive models, our model speeds up the mel-spectrogram generation by 270x….
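The length regulator described in the abstract can be sketched roughly as follows; this is not the paper's implementation, and the function name, shapes, and the `alpha` speed-control parameter are assumptions based on the abstract's description:

```python
import torch

def length_regulator(phoneme_hidden, durations, alpha=1.0):
    """Expand per-phoneme hidden states to frame level.

    phoneme_hidden: (num_phonemes, hidden_dim) encoder outputs
    durations: (num_phonemes,) predicted frames per phoneme
    alpha: voice-speed control -- scaling durations changes speed
    """
    scaled = torch.round(durations.float() * alpha).long()
    # Repeat each phoneme's hidden state by its (scaled) duration,
    # yielding a (total_frames, hidden_dim) sequence whose length
    # matches the target mel-spectrogram, so all frames can be
    # generated in parallel.
    return torch.repeat_interleave(phoneme_hidden, scaled, dim=0)

h = torch.randn(3, 8)        # 3 phonemes, hidden size 8
d = torch.tensor([2, 3, 1])  # frames per phoneme
out = length_regulator(h, d)
print(out.shape)             # torch.Size([6, 8])
```

Because the whole expanded sequence is available at once, the decoder need not run autoregressively, which is where the reported speedup comes from.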
Abstract: …. In this paper, we utilize both large-scale textual corpora and [Knowledge Graphs] to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks.