Deep Learning Weekly Issue #143
Acquisitions by Ikea and Niantic, Huawei's AI framework, Facebook's RegNet model, Core ML + TFLite and more...
This week in deep learning we bring you acquisitions by Ikea and Niantic, a PyTorch / TensorFlow competitor from Huawei, a look at TensorFlow Lite’s new Core ML delegate, and an impressive new background matting technique from researchers at the University of Washington.
You may also enjoy Facebook’s new GPU-optimized RegNet architecture, AutoAugment at Waymo, a PyTorch implementation of lossless image compression with super-resolution, a new method for simultaneous object detection and tracking, using neural radiance fields for view synthesis, a great webinar on tinyML, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Coronavirus has decimated many industries. AI startups in the early days of revenue growth are particularly vulnerable right now.
An interesting application of deep learning to identify candidate fragrances for perfume makers.
Retailers are continuing to invest in new technologies to help consumers try and buy items.
Niantic (of Pokemon Go fame) has acquired 6D.ai, an AR startup with technology for rapidly building 3D representations of physical spaces.
Huawei open-sources a PyTorch / TensorFlow competitor that is (naturally) optimized for their hardware.
Mobile + Edge
TensorFlow Lite will soon gain access to the Apple Neural Engine on iOS devices via a Core ML delegate.
Pete Warden gives a nice introduction to tinyML, running ML models on embedded devices.
TensorFlow.js (TFJS) is reorganizing its backends to make deployments smaller.
A new style transfer training method produces results suitable for multi-view applications like VR.
Dive into Deep Learning: an interactive deep learning book delivered as Jupyter notebooks, using the NumPy interface.
Waymo improves car perception by using AutoAugment during model training.
New research out of Facebook finds a much faster architecture using a mix of bespoke and computer-aided network design.
Libraries & Code
PyTorch Implementation of "Lossless Image Compression through Super-Resolution"
Simultaneous object detection and tracking using center points.
A PyTorch library for the reproducibility of GAN research.
Keras-like shape inference for PyTorch layers.
Papers & Publications
Abstract: We propose a method for creating a matte – the per-pixel foreground color and alpha – of a person by taking photos or videos in an everyday setting with a handheld camera. Most existing matting methods require a green screen background or a manually created trimap to produce a good matte. Automatic, trimap-free methods are appearing, but are not of comparable quality. In our trimap free approach, we ask the user to take an additional photo of the background without the subject at the time of capture. This step requires a small amount of foresight but is far less time consuming than creating a trimap. We train a deep network with an adversarial loss to predict the matte. We first train a matting network with supervised loss on ground truth data with synthetic composites. To bridge the domain gap to real imagery with no labeling, we train another matting network guided by the first network and by a discriminator that judges the quality of composites. We demonstrate results on a wide variety of photos and videos and show significant improvement over the state of the art.
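For readers new to matting, here is a minimal sketch of what a matte is used for once it has been predicted: blending the foreground over a new background with the standard compositing equation, composite = alpha * F + (1 - alpha) * B. The function name `composite` and the per-pixel tuple representation are assumptions made for illustration. This is not the authors' code and says nothing about how their network predicts the matte.

```python
def composite(foreground, alpha, background):
    """Blend one RGB pixel with the standard compositing equation.

    foreground, background: (r, g, b) tuples with values in [0, 1].
    alpha: opacity of the foreground at this pixel, in [0, 1].
    The name and signature are hypothetical, chosen for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Per channel: alpha * F + (1 - alpha) * B
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


# A half-transparent red foreground pixel over a blue background.
blended = composite((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
```

With alpha equal to 1 the foreground fully replaces the background, and with alpha equal to 0 the background shows through unchanged. Fractional values are what make hair and soft edges look natural in a composite.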
Abstract: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,ϕ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
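To make the "classic volume rendering" step more concrete, here is a rough sketch of one common discretization: samples along a camera ray are alpha-composited front to back, with each sample's opacity derived from its density and the spacing to the next sample. The function name `render_ray` and its list-based inputs are assumptions for illustration. In the method described above the densities and colors would come from the network, whereas here they are simply passed in. This is not the authors' implementation.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: volume density at each sample (non-negative floats).
    colors:    (r, g, b) tuple emitted at each sample.
    deltas:    distance from each sample to the next one along the ray.
    Returns the rendered (r, g, b) pixel and the leftover transmittance.
    All names here are hypothetical, chosen for illustration.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        # Light that survives this segment continues to the next sample.
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance


# Three samples: empty space, a semi-dense red region, a dense green region.
pixel, leftover = render_ray(
    densities=[0.0, 2.0, 50.0],
    colors=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    deltas=[0.5, 0.5, 0.5],
)
```

Every operation in the loop is a smooth function of the densities and colors, which is why a renderer of this form can be optimized end to end from images with known camera poses, as the abstract notes.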