Deep Learning Weekly Issue #127

Google absorbs DeepMind Health, Apple details Overton, PyTorch gets a data API, and more...

Hey folks,

This week in deep learning we bring you an AI lab in India and a DeepMind reorg from Google, a piece on those ImageNet Roulette memes you’ve been seeing, a look at China’s facial-recognition giants, and Facebook’s latest acquisition: a startup building a neural-interface armband.

You may also enjoy a fine-tuned GPT-2 model from OpenAI, a rare publication on machine learning pipelines from Apple, a unique dataset containing hundreds of thousands of medical images, a data API for PyTorch, and more.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

Behind the Rise of China’s Facial-Recognition Giants [Wired]

A deep dive into the facial-recognition startups leading China’s current generation of AI companies.

‘Nerd,’ ‘Nonsmoker,’ ‘Wrongdoer’: How Might A.I. Label You? [New York Times]

The New York Times on ImageNet Roulette, the project that labels selfies using categories from ImageNet’s “person” subtree, whose results you’ve probably been seeing all over social media.

Facebook buys startup building neural monitoring armband [TechCrunch]

Facebook is acquiring CTRL-labs, a startup whose armband uses machine learning to translate neural signals from the arm into input for software.

Google Research India: an AI lab in Bangalore

Google opens its first AI research lab in India.

DeepMind’s health team joins Google Health

A major reorg for Google and DeepMind, more than five years after Google acquired the lab.


Learning

Fine-Tuning GPT-2 from Human Preferences

OpenAI fine-tunes the 774M-parameter GPT-2 model with reinforcement learning, optimizing it against a reward model trained on human preference judgments.
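
OpenAI’s accompanying paper describes two pieces: a reward model trained on human choices (a labeler picks the best of several sampled continuations), and RL fine-tuning of GPT-2 against that reward with a penalty that keeps the policy close to the original model. The minimal Python sketch below computes both quantities. The function names, the default β of 0.1, and the toy numbers are assumptions for illustration, not OpenAI’s code.

```python
import math
from typing import Sequence


def preference_loss(rewards: Sequence[float], chosen: int) -> float:
    """Negative log-likelihood that the labeler's pick wins a softmax over
    the reward model's scores for all candidate samples of one prompt."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(rewards)
    log_norm = m + math.log(sum(math.exp(r - m) for r in rewards))
    return log_norm - rewards[chosen]


def penalized_reward(reward: float, logp_policy: float, logp_ref: float,
                     beta: float = 0.1) -> float:
    """Reward used during RL fine-tuning: the learned reward minus
    beta * (log pi(y|x) - log rho(y|x)), a KL-style penalty that discourages
    the policy pi from drifting far from the original model rho.
    The beta value here is an illustrative assumption."""
    return reward - beta * (logp_policy - logp_ref)


if __name__ == "__main__":
    # Hypothetical scores for four candidate continuations; labeler chose index 2.
    print(preference_loss([0.1, -0.3, 1.2, 0.4], chosen=2))
    # Hypothetical reward and sequence log-probabilities under policy and reference.
    print(penalized_reward(1.5, logp_policy=-12.0, logp_ref=-14.0))
```

The penalty term grows when the fine-tuned model assigns much higher probability to a sample than the original GPT-2 does, which limits the policy’s ability to exploit quirks of the learned reward.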

[Apple] Overton: A Data System for Monitoring and Improving Machine-Learned Products

A rare paper from researchers at Apple detailing a high-level architecture for training, deploying, and monitoring deep learning pipelines.

Fast Sample Efficient Q-Learning With Recurrent IQN

A great explainer on a new reinforcement learning technique that adds recurrence to Implicit Quantile Networks (IQN), with state-of-the-art results on Atari games.

This Picture Does Not Exist

Infinite-resolution image generation in the browser using CPPNs and GANs.
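
The "infinite resolution" follows from how a CPPN (compositional pattern-producing network) works: it is a function from pixel coordinates to colours, so the same network can be sampled on a grid of any size. Below is a minimal NumPy sketch with random, untrained weights. The layer sizes, the tanh/sigmoid activations, and the latent vector `z` (standing in for a GAN-style latent input) are illustrative assumptions, not the site’s actual model.

```python
import numpy as np


def make_cppn(hidden: int = 32, z_dim: int = 8, seed: int = 0):
    """Build a random CPPN mapping (x, y, r, z) -> RGB. Weights are untrained."""
    rng = np.random.default_rng(seed)
    in_dim = 3 + z_dim  # x, y, distance from centre r, plus latent vector z
    w1 = rng.normal(0.0, 1.0, (in_dim, hidden))
    w2 = rng.normal(0.0, 1.0, (hidden, hidden))
    w3 = rng.normal(0.0, 1.0, (hidden, 3))

    def render(width: int, height: int, z: np.ndarray) -> np.ndarray:
        # Coordinates always span [-1, 1], so the resolution only changes
        # how densely the same continuous function is sampled.
        xs = np.linspace(-1.0, 1.0, width)
        ys = np.linspace(-1.0, 1.0, height)
        x, y = np.meshgrid(xs, ys)              # each (height, width)
        r = np.sqrt(x ** 2 + y ** 2)
        coords = np.stack([x, y, r], axis=-1).reshape(-1, 3)
        zs = np.broadcast_to(z, (coords.shape[0], z.shape[0]))
        h = np.tanh(np.concatenate([coords, zs], axis=1) @ w1)
        h = np.tanh(h @ w2)
        rgb = 1.0 / (1.0 + np.exp(-(h @ w3)))   # sigmoid -> colours in [0, 1]
        return rgb.reshape(height, width, 3)

    return render


if __name__ == "__main__":
    render = make_cppn()
    z = np.random.default_rng(1).normal(size=8)
    small = render(64, 64, z)
    large = render(512, 512, z)  # same image, sampled 8x more densely per axis
    print(small.shape, large.shape)
```

Because the network never sees the output size, a higher resolution just means more coordinate samples; in a GAN setup the CPPN would play the generator, with `z` as its latent input.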

Anatomy of a high-performing convolution

A deep dive into efficient, low-level implementations of a convolution layer.
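
One technique commonly covered in discussions of fast convolutions is lowering the convolution to a single matrix multiplication ("im2col" followed by GEMM), so that a heavily optimized matrix-multiply routine does the work. The NumPy sketch below assumes a single image in CHW layout, stride 1, and no padding, and checks the lowered version against a naive loop. It illustrates the general idea and is not code from the linked article.

```python
import numpy as np


def conv2d_naive(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Direct convolution (cross-correlation, as in DL frameworks).
    x: (C, H, W) input, w: (K, C, R, S) filters -> (K, H-R+1, W-S+1)."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    out = np.zeros((K, H - R + 1, W - S + 1))
    for k in range(K):
        for i in range(H - R + 1):
            for j in range(W - S + 1):
                out[k, i, j] = np.sum(x[:, i:i + R, j:j + S] * w[k])
    return out


def im2col(x: np.ndarray, R: int, S: int) -> np.ndarray:
    """Unroll every R x S patch (across all channels) into one column."""
    C, H, W = x.shape
    Ho, Wo = H - R + 1, W - S + 1
    cols = np.empty((C * R * S, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + R, j:j + S].ravel()
    return cols


def conv2d_gemm(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Same convolution as one matrix multiply: (K, C*R*S) @ (C*R*S, Ho*Wo)."""
    K, C, R, S = w.shape
    Ho, Wo = x.shape[1] - R + 1, x.shape[2] - S + 1
    out = w.reshape(K, C * R * S) @ im2col(x, R, S)
    return out.reshape(K, Ho, Wo)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 8, 8))
    w = rng.normal(size=(4, 3, 3, 3))
    print(np.allclose(conv2d_naive(x, w), conv2d_gemm(x, w)))  # True
```

In production libraries the unrolling is typically done with strided memory views and the product by a tuned BLAS routine; the trade-off is memory, since the unrolled matrix holds roughly R·S copies of the input.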


Datasets

xBD Dataset - Annotated high-resolution satellite imagery for building damage assessment

A dataset of satellite imagery and annotations for over 500k buildings in 6 disaster zones.

MIMIC-CXR Database

The dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA.

Libraries & Code

[GitHub] Tony607/Keras-Trigger-Word

How to do real-time trigger word detection with Keras.

[GitHub] RTIInternational/gobbli

A uniform interface to various deep learning models for text

[GitHub] szymonmaszke/torchdata

Implement (and extend) with PyTorch's

Papers & Publications

3D Ken Burns Effect from a Single Image

Abstract: .... In this paper, we introduce a framework that synthesizes the 3D Ken Burns effect from a single image, supporting both a fully automatic mode and an interactive mode with the user controlling the camera. Our framework first leverages a depth prediction pipeline, which estimates scene depth that is suitable for view synthesis tasks....[W]e develop a semantic-aware neural network for depth prediction, couple its estimate with a segmentation-based depth adjustment process, and employ a refinement neural network that facilitates accurate depth predictions at object boundaries. According to this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions. To address disocclusions while maintaining geometrically and temporally coherent synthesis results, we utilize context-aware color- and depth-inpainting to fill in the missing information in the extreme views of the camera path, thus extending the scene geometry of the point cloud….
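
The step where the framework "maps the input image to a point cloud" is, in its simplest form, an inverse pinhole projection: every pixel is pushed back along its camera ray to the predicted depth, and rendering from a new camera position projects the points again. The NumPy sketch below assumes a pinhole camera with focal length `f` in pixels, a principal point at the image centre, and depth measured along the optical axis. The paper’s depth networks, camera path, and inpainting are not reproduced here.

```python
import numpy as np


def unproject(depth: np.ndarray, image: np.ndarray, f: float):
    """Lift an (H, W) depth map and (H, W, 3) image to a coloured point cloud.

    Assumes a pinhole camera: focal length f in pixels, principal point at
    the image centre, depth measured along the optical (z) axis.
    Returns (H*W, 3) points and (H*W, 3) colours.
    """
    H, W = depth.shape
    cx, cy = (W - 1) / 2.0, (H - 1) / 2.0
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel column, row
    z = depth
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colours = image.reshape(-1, 3)
    return points, colours


def project(points: np.ndarray, f: float, W: int, H: int) -> np.ndarray:
    """Project 3D points back to pixel coordinates (u, v) for a camera at the
    origin with the same intrinsics: the inverse of unproject()."""
    cx, cy = (W - 1) / 2.0, (H - 1) / 2.0
    u = f * points[:, 0] / points[:, 2] + cx
    v = f * points[:, 1] / points[:, 2] + cy
    return np.stack([u, v], axis=-1)


if __name__ == "__main__":
    H, W, f = 4, 6, 5.0
    depth = np.full((H, W), 2.0)           # toy flat scene 2 units away
    image = np.zeros((H, W, 3))
    pts, cols = unproject(depth, image, f)
    # Moving the camera 0.5 units forward is equivalent to moving every point
    # 0.5 units towards it; re-projecting gives the zoomed pixel positions.
    uv_zoomed = project(pts - np.array([0.0, 0.0, 0.5]), f, W, H)
    print(pts.shape, uv_zoomed.shape)  # (24, 3) (24, 2)
```

Pixels that become visible only after the camera moves have no source point at all, which is the disocclusion problem the abstract’s colour- and depth-inpainting step addresses.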

Learning Question-Guided Video Representation for Multi-Turn Video Question Answering

Abstract: ....Our proposed question-guided video representation module efficiently generates the token-level video summary guided by each word in the question. The learned representations are then fused with the question to generate the answer. Through empirical evaluation on the Audio Visual Scene-aware Dialog (AVSD) dataset (Alamri et al., 2019a), our proposed models in single-turn and multiturn question answering achieve state-of-the-art performance on several automatic natural language generation evaluation metrics.
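
A token-level video summary "guided by each word in the question" can be pictured as word-to-frame attention: each question token scores every video frame, and the frames are averaged with those weights, giving one video vector per token. The NumPy sketch below uses plain scaled dot-product attention over random features. The dimensions, the dot-product scoring, and fusion by concatenation are assumptions for illustration, not the paper’s exact module.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    a = a - a.max(axis=axis, keepdims=True)  # stabilise exp
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def question_guided_summary(question: np.ndarray, video: np.ndarray) -> np.ndarray:
    """question: (T_q, d) token embeddings; video: (T_v, d) frame features.
    Returns (T_q, d): for each question token, an attention-weighted
    average of the video frames."""
    d = question.shape[1]
    scores = question @ video.T / np.sqrt(d)   # (T_q, T_v) token-frame relevance
    weights = softmax(scores, axis=1)          # each row sums to 1 over frames
    return weights @ video


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(5, 16))   # 5 question tokens (toy features)
    v = rng.normal(size=(30, 16))  # 30 video frames (toy features)
    summary = question_guided_summary(q, v)
    # Hypothetical fusion step: concatenate each token with its video summary
    # before passing it to an answer decoder.
    fused = np.concatenate([q, summary], axis=1)
    print(summary.shape, fused.shape)  # (5, 16) (5, 32)
```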