Deep Learning Weekly Issue #110

Facebook's robot lab, AI cancer detection, SF bans face recognition, new tools for optimizing CNNs and more...

Hey folks,

This week in deep learning, we bring you a look inside Facebook's new robotics lab, an AI model from Google that detects lung cancer better than doctors, a new neural engine from LG, and an explainer on SF's new facial recognition ban.

We also recommend Figure Eight's “State of AI and Machine Learning” report, a new framework from Facebook to help with training for visual question answering, tricks for training image enhancement models from DeOldify's creator, and more.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Inside Facebook’s new robotics lab, where AI and machines friend one another [Wired]
A look into Facebook’s nascent robotics skunkworks.

A.I. Took a Test to Detect Lung Cancer. It Got an A. [New York Times]
Google’s new lung cancer detection algorithm bests doctors on detection performance.

San Francisco’s facial recognition technology ban, explained [Vox]
San Francisco has become the first major city to ban local government agencies from using facial recognition after growing concerns over ethics and unintended consequences.

Microsoft aims to train and certify 15,000 workers on AI skills by 2022 [TechCrunch]
Microsoft is partnering with General Assembly to train thousands of new workers in ML, AI, and data science.

Google has opened its first Africa Artificial Intelligence lab in Ghana [CNN]
Google has opened a new AI lab in Accra, Ghana, to help promote new approaches to areas such as agriculture, health, and education.

LG developed its own AI chip to make its smart home products even smarter [TechCrunch]
Electronics giant LG will begin shipping everything from phones to appliances with a dedicated AI accelerator dubbed the LG Neural Engine.


Figure Eight: The State of AI and Machine Learning
A survey of over 300 individuals (technical and non-technical) on the state of AI adoption across industries.

Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
Google provides details on a new approach to speech translation that doesn’t rely on first converting audio into text.

How Dementia Affects Conversation: Building a More Accessible Conversational AI
A great overview of how we might build a conversational AI that is robust to errors related to cognitive impairments of users.

New Approaches to Image and Video Reconstruction Using Deep Learning
The creator of DeOldify explains some of the recent improvements to boost training speeds of image enhancement models.


Announcing Open Images V5 and the ICCV 2019 Open Images Challenge
Open Images V5 now contains 2.8 million segmentation masks.

Libraries & Code

Facebook releases deep learning framework Pythia for image and language models [Facebook]
Pythia is a deep learning framework, built on PyTorch, that supports multitasking in the vision and language domain.

TensorFlow Model Optimization Toolkit — Pruning API
A new API for TensorFlow helps shrink models by pruning weights.
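
For a rough feel of what pruning weights means, here is a minimal sketch of magnitude-based pruning in plain Python. It is not the toolkit's API: the function name `magnitude_prune` and the `sparsity` parameter are made up for illustration, and the real toolkit prunes gradually during training rather than in one shot.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    `sparsity` is the fraction of weights to zero out (hypothetical parameter,
    not taken from the TensorFlow toolkit).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Rank weight positions by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    # Keep large weights untouched; zero the ones selected above.
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

Zeroed weights can then be stored in a sparse format or compressed, which is where the size savings come from.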

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
A PyTorch implementation of AUTOVC, a many-to-many non-parallel voice conversion framework.

Papers & Publications

Fixup Initialization: Residual Learning Without Normalization
Abstract: Normalization layers are a staple in state-of-the-art deep neural network architectures…. In this work, we challenge the commonly-held beliefs by showing that none of the perceived benefits is unique to normalization. Specifically, we propose fixed-update initialization (Fixup), an initialization motivated by solving the exploding and vanishing gradient problem at the beginning of training via properly rescaling a standard initialization. We find training residual networks with Fixup to be as stable as training with normalization -- even for networks with 10,000 layers.
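
To make the "properly rescaling a standard initialization" part concrete, here is a small sketch of the scaling factor as we understand it from the full paper (the excerpt above does not spell it out): weights inside each residual branch are multiplied by L^(-1/(2m-2)), where L is the number of residual blocks and m is the number of layers per branch. The function name `fixup_scale` is ours, and the other parts of the recipe (zero-initializing the last layer of each branch, plus extra scalar biases and multipliers) are left out.

```python
def fixup_scale(num_residual_blocks, layers_per_branch):
    """Scaling factor applied to a standard init inside each residual branch.

    Assumes the rule L ** (-1 / (2m - 2)) with L = num_residual_blocks and
    m = layers_per_branch; both argument names are illustrative.
    """
    if num_residual_blocks < 1:
        raise ValueError("need at least one residual block")
    if layers_per_branch < 2:
        raise ValueError("the rule is defined for branches with 2+ layers")
    exponent = -1.0 / (2 * layers_per_branch - 2)
    return num_residual_blocks ** exponent


# Deeper networks get a smaller factor, so each branch starts out
# contributing less to the residual stream.
for depth in (16, 256, 10000):
    print(depth, fixup_scale(depth, layers_per_branch=2))
```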

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
Abstract: Several recent works have shown how highly realistic human head images can be obtained by training convolutional neural networks to generate them. In order to create a personalized talking head model, these works require training on a large dataset of images of a single person. However, in many practical scenarios, such personalized talking head models need to be learned from a few image views of a person, potentially even a single image. Here, we present a system with such few-shot capability. It performs lengthy meta-learning on a large dataset of videos, and after that is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high capacity generators and discriminators…