Deep Learning Weekly Issue #127
Google absorbs DeepMind Health, Apple details Overton, PyTorch gets a data API, and more...
This week in deep learning we bring you an AI lab in India and a DeepMind reorg from Google, a profile on those ImageNet Roulette memes you’ve been seeing, a look at the AI giants of China, and Facebook’s latest acquisition of a brain input device startup.
You may also enjoy a fine-tuned GPT-2 model from OpenAI, a rare publication on machine learning pipelines from Apple, a unique dataset containing hundreds of thousands of medical images, a data API for PyTorch, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
A deep dive into the current generation of AI startups in China
The New York Times on the ImageNet Roulette posts you’ve probably been seeing all over social media.
Facebook is rumored to be buying CTRL-labs, a startup using machine learning models to control software via brain activity.
Google opens its first AI research lab in India.
A major reorg for Google and DeepMind nearly 5 years after the acquisition.
OpenAI releases a new iteration of GPT-2 models fine-tuned on a large corpus of human-annotated preference data.
A rare paper from researchers at Apple detailing a high-level architecture for training, deploying, and monitoring deep learning pipelines.
A great explainer on a new reinforcement learning technique with SOTA results on Atari games.
Infinite-resolution image generation in the browser using CPPNs and GANs.
A deep dive into efficient, low-level implementations of a convolution layer.
A dataset of satellite imagery and annotations for over 500k buildings in 6 disaster zones.
The MIMIC-CXR dataset contains 377,110 images corresponding to 227,835 radiographic studies performed at the Beth Israel Deaconess Medical Center in Boston, MA.
Libraries & Code
How to do Real Time Trigger Word Detection with Keras
A uniform interface to various deep learning models for text
Implement (and extend) tensorflow.data.Dataset with PyTorch's torch.utils.data.Dataset
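For context, a minimal, framework-free sketch of the chaining pattern such a library provides — a map-style dataset (the `__getitem__`/`__len__` contract `torch.utils.data.Dataset` uses) extended with `tf.data`-style `map` and `batch` methods. All class and method names here are illustrative, not the linked library's actual API.

```python
# Sketch of a tf.data-style chaining API over an index-based dataset.
# Illustrative only; not the linked library's implementation.

class ListDataset:
    """Wraps a list and exposes __getitem__/__len__, like a map-style dataset."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, i):
        return self.items[i]

    def __len__(self):
        return len(self.items)

    def map(self, fn):
        # Apply fn to every element; a real library would do this lazily.
        return ListDataset(fn(x) for x in self.items)

    def batch(self, size):
        # Group consecutive items into fixed-size batches (last may be short).
        return ListDataset(
            self.items[i:i + size] for i in range(0, len(self.items), size)
        )

ds = ListDataset(range(5)).map(lambda x: x * x).batch(2)
print(list(ds))  # [[0, 1], [4, 9], [16]]
```

Because each transformation returns a new dataset, pipelines compose the same way `tf.data.Dataset` chains do, while the result still satisfies the indexable contract PyTorch's `DataLoader` expects.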
Papers & Publications
Abstract: … In this paper, we introduce a framework that synthesizes the 3D Ken Burns effect from a single image, supporting both a fully automatic mode and an interactive mode with the user controlling the camera. Our framework first leverages a depth prediction pipeline, which estimates scene depth that is suitable for view synthesis tasks. … [W]e develop a semantic-aware neural network for depth prediction, couple its estimate with a segmentation-based depth adjustment process, and employ a refinement neural network that facilitates accurate depth predictions at object boundaries. According to this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions. To address disocclusions while maintaining geometrically and temporally coherent synthesis results, we utilize context-aware color- and depth-inpainting to fill in the missing information in the extreme views of the camera path, thus extending the scene geometry of the point cloud. …
Abstract: … Our proposed question-guided video representation module efficiently generates the token-level video summary guided by each word in the question. The learned representations are then fused with the question to generate the answer. Through empirical evaluation on the Audio Visual Scene-aware Dialog (AVSD) dataset (Alamri et al., 2019a), our proposed models in single-turn and multi-turn question answering achieve state-of-the-art performance on several automatic natural language generation evaluation metrics.