Deep Learning Weekly Issue #144
Reducing the carbon footprint of deep learning, Google's chips designed by AI, YOLOv4, and more...
This week in deep learning we bring you highlights from AWS & Facebook’s open-source model server for PyTorch, a warning from MIT about relaxing quarantine rules, and Apple’s learn-by-listening smart devices.
You may also enjoy Intel & Udacity’s new Edge AI program, MIT’s work on reducing the carbon footprint of artificial intelligence, real-time 3D object detection on mobile devices with MediaPipe, Facebook’s training with quantization noise for extreme model compression, Google’s AI-designed chips, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
New MIT machine learning model shows relaxing quarantine rules will spike COVID-19 cases
A newly developed MIT model of the spread of COVID-19 indicates that an immediate or near-term reversal of quarantine measures would lead to exponential growth in the number of infections.
AWS and Facebook have announced two open-source projects for PyTorch, TorchServe and TorchElastic, to make it easier for developers to put their work into production.
MIT system cuts the energy required for training and running neural networks.
Andrew Ng’s startup Landing AI has created a new workplace monitoring tool that issues an alert when anyone is less than the desired distance from a colleague.
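Landing AI has not published its implementation, so the snippet below is only a minimal sketch of the final alerting step, assuming people have already been detected and mapped to ground-plane (x, y) coordinates in metres. The function name `too_close_pairs` and the input format are hypothetical.

```python
import math
from itertools import combinations

def too_close_pairs(positions, min_distance):
    """Return index pairs of people standing closer than min_distance.

    positions: list of (x, y) ground-plane coordinates, assumed to be in metres
    and already estimated from the camera image (hypothetical input format).
    """
    alerts = []
    # Check every unordered pair of people once
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if math.dist(p, q) < min_distance:
            alerts.append((i, j))
    return alerts

people = [(0.0, 0.0), (1.0, 0.5), (4.0, 4.0)]
print(too_close_pairs(people, min_distance=2.0))  # [(0, 1)]
```

The pairwise check is O(n²), which is fine for the handful of people visible in a single camera frame.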
Mobile + Edge
Scientists at Google describe a learning-based approach to chip design that can learn from past experience.
Researchers from Apple and Carnegie Mellon University present a system for embedded AI that learns by listening to sounds in its environment, without needing up-front training data.
Pushing the limits of on-device machine learning.
Google explores real-time 3D object detection on mobile and edge devices using MediaPipe’s ML pipelines.
Intel and Udacity launch a course aimed at accelerating the development and deployment of artificial intelligence (AI) models at the edge using the Intel Distribution of the OpenVINO toolkit.
The Quant-Noise technique enables extreme compression of models that still deliver high performance when deployed in practical applications.
Ahmed Gad investigates this machine learning technique's ability to guarantee data privacy.
A step-by-step guide to using Q-learning to solve the simple Taxi-v3 environment in OpenAI Gym.
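To keep the example self-contained without installing Gym, the sketch below swaps Taxi-v3 for a hypothetical six-state corridor (move left or right, reward 1 for reaching the last state). It applies the standard tabular Q-learning update Q(s, a) ← Q(s, a) + α[r + γ·max Q(s′, ·) − Q(s, a)] with ε-greedy exploration; the environment, the `step` helper, and the hyperparameters are illustrative assumptions, not the tutorial's code.

```python
import random

# Hypothetical 1-D corridor standing in for Taxi-v3: states 0..N_STATES-1,
# actions 0 = left, 1 = right, reward 1 for reaching the last state.
N_STATES, ACTIONS = 6, (0, 1)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection (ties broken randomly)
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action (0 at terminal)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1, 1]: always move right
```

In Taxi-v3 the same loop applies unchanged; only the table size (500 states × 6 actions) and the environment's `step` differ.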
Creators explore artificial intelligence’s potential to innovate.
A practical guide for beginners to understand and implement a neural network.
In this post, you will discover degrees of freedom in statistics and machine learning.
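A familiar instance of degrees of freedom is the sample variance. Because the deviations from the sample mean always sum to zero, only n − 1 of them are free to vary once the mean has been estimated, so the unbiased estimator divides by n − 1. The helper `sample_variance` below is illustrative; Python's `statistics.variance` uses the same n − 1 denominator, while `statistics.pvariance` divides by n.

```python
import statistics

def sample_variance(xs):
    """Unbiased sample variance: divide by n - 1 degrees of freedom.

    One degree of freedom is used up by estimating the mean from the
    same data, so only n - 1 deviations are free to vary.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data))       # 4.571428571428571 (32 / 7)
print(statistics.variance(data))   # same value, also divides by n - 1
print(statistics.pvariance(data))  # 4.0 (divides by n)
```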
Libraries & Code
YOLOv4 (v3/v2): Windows and Linux version of Darknet neural networks for object detection (uses Tensor Cores).
Clone a voice in 5 seconds to generate arbitrary speech in real-time.
A PyTorch reimplementation of the official EfficientDet, with SOTA real-time performance and pretrained weights.
A collection of machine learning examples and tutorials.
Papers & Publications
Training with Quantization Noise for Extreme Model Compression
Abstract: We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression methods where the approximations introduced by STE are severe, such as Product Quantization. Our proposal is to only quantize a different random subset of weights during each forward, allowing for unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result we establish new state-of-the-art compromises between accuracy and model size both in natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we can achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14MB and 80.0 top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3MB.
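The core idea of the abstract, quantizing only a random subset of weights on each forward pass, can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it uses plain uniform scalar quantization as a stand-in for product quantization and selects individual weights rather than blocks. The names `quantize` and `quant_noise` and the `scale` parameter are hypothetical.

```python
import random

def quantize(w, scale):
    """Uniform scalar quantization: snap w to the nearest multiple of scale."""
    return round(w / scale) * scale

def quant_noise(weights, p, scale, rng):
    """Quantize a random subset (fraction ~p) of weights for one forward pass.

    Weights outside the subset stay in full precision, so in a real training
    loop their gradients would not need the straight-through approximation.
    """
    return [quantize(w, scale) if rng.random() < p else w for w in weights]

rng = random.Random(0)
weights = [0.12, -0.47, 0.33, 0.91, -0.05]
print(quant_noise(weights, p=0.0, scale=0.25, rng=rng))  # unchanged
print(quant_noise(weights, p=1.0, scale=0.25, rng=rng))  # [0.0, -0.5, 0.25, 1.0, 0.0]
```

Drawing a fresh subset on every forward pass means each weight is sometimes seen quantized and sometimes not, which is what lets unbiased gradients flow through the unquantized part during training.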
YOLOv4: Optimal Speed and Accuracy of Object Detection
Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.
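Of the features listed, the Mish activation is the easiest to show concretely: it is defined as mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + eˣ). The scalar sketch below uses a numerically stable softplus; a framework implementation would apply the same formula element-wise to tensors.

```python
import math

def softplus(x):
    # Numerically stable log(1 + e^x): avoids overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

for x in (-2.0, 0.0, 2.0):
    print(x, round(mish(x), 4))
```

For large positive inputs Mish approaches the identity, and for large negative inputs it approaches zero, while staying smooth and slightly negative in between.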
Chip Placement with Deep Reinforcement Learning
Abstract: In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.
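The abstract states the objective as PPA; RL placement rewards are typically built from cheaper proxies that correlate with it. As an assumption for illustration only, the sketch below uses half-perimeter wirelength (HPWL), a standard placement proxy, and negates it to form a reward. The netlist format and the names `hpwl` and `placement_reward` are hypothetical and do not reproduce the paper's exact reward.

```python
def hpwl(nets, positions):
    """Half-perimeter wirelength of a placement.

    nets: list of nets, each a list of node names (hypothetical format).
    positions: dict mapping node name -> (x, y) on the chip canvas.
    """
    total = 0.0
    for net in nets:
        xs = [positions[n][0] for n in net]
        ys = [positions[n][1] for n in net]
        # Half perimeter of the bounding box enclosing this net's nodes
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def placement_reward(nets, positions):
    # Lower wirelength -> higher reward (a single-term proxy, not full PPA)
    return -hpwl(nets, positions)

nets = [["a", "b"], ["a", "b", "c"]]
positions = {"a": (0, 0), "b": (2, 1), "c": (1, 3)}
print(hpwl(nets, positions))  # 8.0 (3 for the first net + 5 for the second)
```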
Abstract: The notion of memory capacity, originally introduced for echo state and linear networks with independent inputs, is generalized to nonlinear recurrent networks with stationary but dependent inputs. The presence of dependence in the inputs makes natural the introduction of the network forecasting capacity, that measures the possibility of forecasting time series values using network states. Generic bounds for memory and forecasting capacities are formulated in terms of the number of neurons of the network and the autocovariance function of the input. These bounds generalize well-known estimates in the literature to a dependent inputs setup. Finally, for linear recurrent networks and independent inputs it is proved that the memory capacity is given by the rank of the associated controllability matrix.
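The last claim of the abstract can be made concrete. Assuming a linear recurrent network with N neurons, state update x_t = A x_{t−1} + C z_t and scalar input z_t, the controllability matrix is [C, AC, A²C, …, A^(N−1)C], and per the abstract its rank gives the memory capacity for independent inputs. The sketch only computes that rank, using exact fractions to avoid floating-point tolerance issues; the function names are hypothetical.

```python
from fractions import Fraction

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def controllability_matrix(A, C):
    """N x N matrix whose columns are C, AC, A^2 C, ..., A^(N-1) C."""
    cols, v = [], list(C)
    for _ in range(len(A)):
        cols.append(v)
        v = mat_vec(A, v)
    # Transpose the list of columns into rows
    return [[col[i] for col in cols] for i in range(len(A))]

def rank(M):
    """Matrix rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every other row
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

shift = [[0, 0], [1, 0]]  # each neuron passes its state to the next one
print(rank(controllability_matrix(shift, [1, 0])))             # 2
print(rank(controllability_matrix([[0, 0], [0, 0]], [1, 0])))  # 1
```

The shift-register network retains the last N inputs and reaches full rank, while a network with no recurrence (A = 0) remembers only the current input.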
Curated by Derrick Mwiti