Deep Learning Weekly Issue #148
Pac-Man generated by a neural network, a YOLOv3 implementation, new mobile chip designs from Arm, and more
This week in deep learning, we bring you voice matching to secure Google Assistant purchases, a Pac-Man game generated by a neural network, a public autonomous driving dataset, a platform that enables developers to collect data, train models, and deploy them to Arduino devices, and more!
You may also enjoy these papers and their public implementations: Instance-aware Image Colorization (code) and Rethinking Performance Estimation in Neural Architecture Search (code).
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
The provider of the world’s leading AI training data platform is now able to accelerate AI initiatives across all agencies in the federal government and intelligence community.
GameGAN, a generative adversarial network trained on 50,000 PAC-MAN episodes, produces a fully functional version of the dot-munching classic without an underlying game engine.
With fleets out of commission, COVID-19 left self-driving car and truck companies frozen in time. Now they’re finding new life in old data.
Google Assistant’s Voice Match feature can now secure purchases made through its voice assistant as part of a limited pilot program.
This model can detect age-related macular degeneration from 3-D eye scans as well as human experts.
Mobile + Edge
Supporting 4k30 encode and 4k60 streaming from an on-board camera module, plus 4 TOPS of compute, the megaAI is small but mighty.
The lineup of new chips includes two GPUs and the Ethos-N78: an accelerator optimized for the sole purpose of speeding up AI-powered apps.
Edge Impulse enables developers to collect real-world sensor data, train ML models using this data in the cloud, and then deploy the model back to an Arduino device.
This post shows how to run an object detection model on a live video feed using ImageAI and OpenCV Python packages.
BLEURT is a new evaluation metric for natural language generation models that is a reliable substitute for labor-intensive human evaluation.
Big Transfer (BiT) is a recipe for pre-training image classification models on large supervised datasets and efficiently fine-tuning them on any given target task.
PixelCNN is a generative model that produces novel output based on user-supplied input. For example, given a single image of an unseen face, it generates a variety of new portraits of the same person.
This dataset includes data collected using Hesai’s forward-facing PandarGT LiDAR with image-like resolution, as well as its mechanical spinning LiDAR known as Pandar64.
Libraries & Code
YOLOv3 implementation in Keras and TensorFlow 2.2.
Implementation of the colorization model and data pipelines for the Instance-aware Image Colorization paper in the Papers section below.
Implementation of the methods outlined in the paper Rethinking Performance Estimation in Neural Architecture Search in the Papers section below.
Papers & Publications
Abstract: Image colorization is inherently an ill-posed problem with multi-modal uncertainty. Previous methods leverage the deep neural network to map input grayscale images to plausible color outputs directly. Although these learning-based methods have shown impressive performance, they usually fail on the input images that contain multiple objects. The leading cause is that existing models perform learning and colorization on the entire image. In the absence of a clear figure-ground separation, these models cannot effectively locate and learn meaningful object-level semantics. In this paper, we propose a method for achieving instance-aware colorization. Our network architecture leverages an off-the-shelf object detector to obtain cropped object images and uses an instance colorization network to extract object-level features. We use a similar network to extract the full-image features and apply a fusion module to fuse object-level and image-level features to predict the final colors. Both colorization networks and fusion modules are learned from a large-scale dataset. Experimental results show that our work outperforms existing methods on different quality metrics and achieves state-of-the-art performance on image colorization.
Abstract: Neural architecture search (NAS) remains a challenging problem, which is attributed to the indispensable and time-consuming component of performance estimation (PE). In this paper, we provide a novel yet systematic rethinking of PE in a resource constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space. Since searching an optimal BPE is extremely time-consuming as it requires to train a large number of networks for evaluation, we propose a Minimum Importance Pruning (MIP) approach. Given a dataset and a BPE search space, MIP estimates the importance of hyper-parameters using random forest and subsequently prunes the minimum one from the next iteration. In this way, MIP effectively prunes less important hyper-parameters to allocate more computational resources on more important ones, thus achieving an effective exploration. By combining BPE with various search algorithms including reinforcement learning, evolution algorithm, random search, and differentiable architecture search, we achieve 1,000x of NAS speed up with a negligible performance drop comparing to the SOTA.
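For readers who want a feel for the prune-the-least-important loop described in the abstract above, here is a minimal toy sketch. It is not the authors' implementation, and it makes several assumptions: the paper's random-forest importance estimate is swapped for a much simpler stand-in (the variance of each hyper-parameter's marginal mean score), "pruning" is read as fixing the hyper-parameter to its best observed value, the small grid is evaluated exhaustively rather than sampled under a budget, and `toy_score`, `space`, and all function names are hypothetical.

```python
import itertools
from statistics import mean, pvariance


def marginal_means(trials, name, values):
    """Mean score of all trials that used each candidate value of `name`."""
    return {v: mean([s for cfg, s in trials if cfg[name] == v]) for v in values}


def minimum_importance_pruning(space, evaluate):
    """Repeatedly fix the least important hyper-parameter (toy stand-in for MIP)."""
    remaining = {name: list(values) for name, values in space.items()}
    fixed, order = {}, []
    while remaining:
        names = list(remaining)
        # Evaluate every combination of the hyper-parameters still being searched.
        trials = []
        for combo in itertools.product(*(remaining[n] for n in names)):
            cfg = {**fixed, **dict(zip(names, combo))}
            trials.append((cfg, evaluate(cfg)))
        # Stand-in importance (assumption): spread of the marginal mean scores.
        means = {n: marginal_means(trials, n, remaining[n]) for n in names}
        importance = {n: pvariance(list(means[n].values())) for n in names}
        # Prune the minimum-importance hyper-parameter by fixing its best value.
        weakest = min(names, key=importance.get)
        fixed[weakest] = max(remaining[weakest], key=means[weakest].get)
        del remaining[weakest]
        order.append(weakest)
    return fixed, order


def toy_score(cfg):
    """Hypothetical objective: epochs matter a lot, width a little, flip not at all."""
    return 10 * cfg["epochs"] + cfg["width"] + 0 * cfg["flip"]


space = {"epochs": [1, 2, 4], "width": [1, 2], "flip": [0, 1]}
fixed, order = minimum_importance_pruning(space, toy_score)
print(order)  # hyper-parameters in the order they were pruned
print(fixed)  # the value each one was fixed to
```

Each round shrinks the grid, so later rounds spend their evaluations only on the hyper-parameters that still appear to matter, which is the budget-reallocation idea the abstract describes.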