Deep Learning Weekly Issue #135
Everyone goes to NeurIPS 2019, a new DeepSpeech release from Mozilla, GANs in Swift for TensorFlow, NLP recipes, and more...
Hey folks,
This week in deep learning we bring you lots of research from NeurIPS 2019, one of the most popular deep learning conferences. We’ve collected paper indexes from many of the big names (Google, Apple, Facebook, Microsoft, Amazon, and NVIDIA), as well as the full list of conference talks and slides.
You may also enjoy a new DeepSpeech model from Mozilla, a Global AI Survey from McKinsey, GANs in Swift for TensorFlow, a deep dive into on-device portrait segmentation, a new dataset containing typos checked into GitHub, an updated version of AI Dungeon based on the 1.5 billion parameter GPT-2 model, and NLP recipes from Microsoft.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
NeurIPS 2019
Videos from this year’s NeurIPS conference. Slides also available.
An index of papers by Google researchers being presented at NeurIPS this year.
An index of papers by Apple researchers being presented at NeurIPS this year.
An index of papers by Facebook researchers being presented at NeurIPS this year.
An index of papers by Microsoft researchers being presented at NeurIPS this year.
An index of papers by Amazon researchers being presented at NeurIPS this year.
An index of papers by NVIDIA researchers being presented at NeurIPS this year.
Industry
DeepSpeech 0.6: Mozilla’s Speech-to-Text Engine Gets Fast, Lean, and Ubiquitous
Mozilla releases a new version of their DeepSpeech ASR model with on-device support via TensorFlow Lite.
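For the curious, here is a minimal inference sketch using the deepspeech Python package (pip install deepspeech). The model path, beam width, and audio format are assumptions based on the 0.6 release; exact signatures may differ across versions.

```python
# Minimal DeepSpeech 0.6 inference sketch.
# Assumes the published 0.6 model file and a 16 kHz, 16-bit mono WAV.
import wave
import numpy as np
from deepspeech import Model

MODEL_PATH = "deepspeech-0.6.0-models/output_graph.pbmm"  # assumed path
BEAM_WIDTH = 500  # decoder beam width; larger is slower but more accurate

ds = Model(MODEL_PATH, BEAM_WIDTH)

with wave.open("audio.wav", "rb") as wav:
    frames = wav.readframes(wav.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16)

print(ds.stt(audio))  # returns the transcript as a string
```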
Global AI Survey: AI proves its worth, but few scale impact
McKinsey surveys enterprises on AI adoption. Many are seeing positive returns, but have yet to scale deployments.
Datasets
[GitHub] mhagiwara/github-typo-corpus
GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
Learning
AI Dungeon 2
The second version of the AI-generated text adventure game has been released. The core model is based on the 1.5 billion parameter GPT-2 model.
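As a hedged illustration of what "based on the 1.5 billion parameter GPT-2" means in practice, the sketch below samples a continuation from the base gpt2-xl checkpoint with the Hugging Face transformers library. AI Dungeon 2's actual model is fine-tuned on text-adventure data, so this only shows the raw base model; the prompt and sampling settings are illustrative.

```python
# Sample a continuation from the 1.5B-parameter GPT-2 ("gpt2-xl").
# Requires: pip install transformers torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

prompt = "You are a knight standing at the gates of a ruined castle."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Top-k sampling, the kind of decoding typically used for open-ended text.
output = model.generate(input_ids, max_length=80, do_sample=True,
                        top_k=40, temperature=0.8,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```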
Deep Double Descent
OpenAI documents a curious phenomenon in which performance first improves, then worsens, then improves again as model size increases.
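OpenAI's experiments use deep networks such as ResNets, but the same model-wise double descent curve typically shows up in much smaller settings. Below is a hedged toy sketch using minimum-norm random-features regression; all sizes and constants here are illustrative, not OpenAI's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model-wise double descent: minimum-norm least squares on random
# ReLU features. Test error typically spikes near n_features == n_train
# (the interpolation threshold) and falls again beyond it.
n_train, n_test, d = 100, 1000, 10
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

for n_features in [10, 50, 90, 100, 110, 200, 1000]:
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    phi_train = np.maximum(X_train @ W, 0.0)
    phi_test = np.maximum(X_test @ W, 0.0)
    # Pseudo-inverse gives the minimum-norm interpolating solution.
    beta = np.linalg.pinv(phi_train) @ y_train
    mse = np.mean((phi_test @ beta - y_test) ** 2)
    print(f"{n_features:5d} features: test MSE = {mse:.3f}")
```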
[GitHub] anilsathyan7/Portrait-Segmentation
Extremely comprehensive writeup on optimizing a portrait segmentation model for inference on mobile devices.
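Much of the writeup concerns getting a model to run efficiently on-device. As one illustrative step (a sketch, not the repo's exact pipeline), a Keras segmentation model can be converted to TensorFlow Lite with post-training quantization; the filenames here are assumptions.

```python
import tensorflow as tf

# Convert a trained Keras segmentation model to TensorFlow Lite
# with default post-training (weight) quantization.
model = tf.keras.models.load_model("portrait_segmentation.h5")  # assumed file

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
tflite_model = converter.convert()

with open("portrait_segmentation.tflite", "wb") as f:
    f.write(tflite_model)
```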
An introduction to Generative Adversarial Networks (in Swift for TensorFlow)
Simple GAN implementation to generate handwritten digits in Swift for TensorFlow.
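The tutorial's code is in Swift for TensorFlow; for orientation, here is a minimal sketch of the same adversarial training step written in Python/TensorFlow 2. Model shapes, losses, and hyperparameters are illustrative, not the article's.

```python
import tensorflow as tf

# Toy MNIST-style GAN: generator maps 100-d noise to a flattened 28x28
# image in [-1, 1]; discriminator outputs a real/fake logit.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(28 * 28,)),
    tf.keras.layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(2e-4)
d_opt = tf.keras.optimizers.Adam(2e-4)

@tf.function
def train_step(real_images):  # real_images: float32, flattened, in [-1, 1]
    noise = tf.random.normal([tf.shape(real_images)[0], 100])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator: real -> 1, fake -> 0; generator tries to fool it.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
```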
KingSoft WPS: document image dewarping based on TensorFlow
Neat writeup on de-warping pictures of book pages using neural networks.
Libraries & Code
[GitHub] microsoft/nlp-recipes
Natural language processing best practices and examples from Microsoft.
Papers & Publications
Improving Policies via Search in Cooperative Partially Observable Games
Abstract: …In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy. In contrast, in multi-agent search all agents carry out the same common-knowledge search procedure whenever doing so is computationally feasible, and fall back to playing according to the agreed-upon policy otherwise. We prove that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error). In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and when applied to a policy trained using RL achieves a new state-of-the-art score of 24.61 / 25 in the game, compared to a previous-best of 24.08 / 25.
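A hedged pseudocode sketch of the single-agent search idea described in the abstract; all function names and signatures here are invented for illustration, and the paper's belief sampling over hidden states is omitted.

```python
import statistics

def single_agent_search(root_state, my_actions, blueprint, simulate_turn,
                        num_rollouts=100):
    """Evaluate each of our candidate actions by rollouts in which every
    later turn is played by the agreed-upon (blueprint) policy, then pick
    the best. The simulator and blueprint may be stochastic.
    simulate_turn(state, action) -> (next_state, reward, done)
    blueprint(state) -> action for whichever agent is to move
    """
    def rollout(action):
        state, total = root_state, 0.0
        state, reward, done = simulate_turn(state, action)
        total += reward
        while not done:
            # All agents fall back to the agreed-upon policy.
            state, reward, done = simulate_turn(state, blueprint(state))
            total += reward
        return total

    return max(my_actions,
               key=lambda a: statistics.mean(rollout(a)
                                             for _ in range(num_rollouts)))
```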
WaveFlow: A Compact Flow-based Model for Raw Audio
Abstract: In this work, we present WaveFlow, a small-footprint generative flow for raw audio, which is trained with maximum likelihood without probability density distillation and auxiliary losses as used in Parallel WaveNet and ClariNet. It provides a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow as special cases. We systematically study these likelihood-based generative models for raw waveforms in terms of test likelihood and speech fidelity. We demonstrate that WaveFlow can synthesize high-fidelity speech as WaveNet, while only requiring a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, WaveFlow closes the significant likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has 5.91M parameters and can generate 22.05kHz high-fidelity speech 42.6 times faster than real-time on a GPU without engineered inference kernels.
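For context on "trained with maximum likelihood": flow-based models like WaveGlow and WaveFlow learn an invertible map z = f(x) from the waveform x to a simple base density p_Z, and optimize the exact log-likelihood via the standard change-of-variables formula (a general note, not notation from the paper):

```latex
\log p_X(x) = \log p_Z\bigl(f(x)\bigr)
            + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

Autoregressive models like WaveNet instead factorize log p(x) over time steps directly; the paper's unified view treats both as special cases of one likelihood objective.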