Deep Learning Weekly Issue #152
ML updates from WWDC, Image GPT, Google Brain rethinks pre-training and self-training, and more
This week in deep learning, we bring you OpenAI's Image GPT: a large transformer model trained on pixel sequences; Neural Magic: the startup making deep learning possible without specialized hardware; and Amazon's ‘distance assistants’ that help warehouse workers practice social distancing.
You may also be interested in learning about Google Brain’s SimCLRv2 model that achieved a new SOTA in semi-supervised learning on ImageNet, how GPUs accelerate deep learning, or how to implement a simple GAN in PyTorch.
On the mobile side, we bring you this video from WWDC about using model deployment and security with Core ML, this post about on-device machine learning solutions with ML Kit, and this article about why you should use Kotlin for ML on Android.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Following up on the February release of its contrastive learning framework SimCLR, the same team of Google Brain researchers guided by Turing Award honoree Dr. Geoffrey Hinton has presented SimCLRv2, an upgraded approach that boosts the SOTA results by 21.6 percent.
The startup making deep learning possible without specialized hardware
GPUs have long been the chip of choice for performing AI tasks. Neural Magic wants to change that.
Image GPT from OpenAI
OpenAI found that, just as a large transformer model trained on language can generate coherent text, the exact same model trained on pixel sequences can generate coherent image completions and samples.
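To make "trained on pixel sequences" concrete, here is a minimal sketch of how an image could be turned into next-pixel training examples. It assumes a plain raster-order flattening and a tiny grayscale grid. The helper names `to_pixel_sequence` and `next_pixel_examples` are hypothetical, and OpenAI's actual preprocessing is not shown here.

```python
def to_pixel_sequence(image):
    """Flatten a 2D grid of pixel values into a 1D sequence (raster order is an assumption)."""
    return [pixel for row in image for pixel in row]


def next_pixel_examples(sequence):
    """Build (context, target) pairs: predict each pixel from all pixels before it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# A toy 2x3 "image"; real inputs would be far larger.
image = [
    [0, 1, 2],
    [3, 4, 5],
]

sequence = to_pixel_sequence(image)
examples = next_pixel_examples(sequence)

for context, target in examples:
    print(context, "->", target)
```

An autoregressive model trained on pairs like these can complete an image by repeatedly sampling the next pixel given the ones it has so far, in the same way a language model continues a sentence.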
AI researchers say scientific publishers help perpetuate racist algorithms
Since George Floyd’s death sparked an international movement for racial justice, the AI field and the tech industry at large have faced a reckoning about the role they have played in reinforcing structural racism.
Amazon deploys AI ‘distance assistants’ to notify warehouse workers if they get too close
TV screens give workers live feedback on social distancing.
Mobile + Edge
SqueezeBERT is a mobile NLP neural network architecture that is 4.3 times faster than BERT on a Pixel 3 smartphone while achieving accuracy similar to MobileBERT in GLUE benchmark tasks.
Apple’s ARKit 4 introduces new depth capabilities and expands face tracking to more devices
The latest ARKit from Apple includes a Depth API that provides a new way to access depth information on the iPad Pro, and it extends face tracking to any device with the Apple Neural Engine and a front-facing camera.
On-device machine learning solutions with ML Kit, now even easier to use
The original version of ML Kit was tightly integrated with Firebase, but now all the on-device APIs are available in a new standalone ML Kit SDK that no longer requires a Firebase project.
Use model deployment and security with Core ML
This video covers how to deploy Core ML models outside of your app binary, protect models with encryption, and preview performance in Xcode.
Why Should You Use Kotlin For Machine Learning on Android?
Powerful array manipulations make it simple.
Check out RepNet, a single model that analyzes video to provide counting statistics and identify changes in patterns for a broad range of repeating processes — exercising, a bird flapping its wings, pendulums swinging, and more.
How GPUs accelerate deep learning
The embarrassingly parallel nature of neural networks.
Google Brain Rethinks Pre-training and Self-training
A team of researchers from Google Brain has proposed a rethink of the dominant computer vision paradigm of pre-training.
PyTorch and GANs: A Micro Tutorial
Building the Simplest of GANs in PyTorch.
PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
PIFuHD is a multi-level framework that infers 3D geometry of clothed humans at an unprecedentedly high 1k image resolution in a pixel-aligned manner, retaining the details in the original inputs without any post-processing.
DriveSeg contains precise, pixel-level representations of many common road objects, but through the lens of a continuous video driving scene.
Libraries & Code
Hummingbird converts trained ML models, including scikit-learn Decision Trees and Random Forests as well as LightGBM and XGBoost Classifiers/Regressors, into PyTorch.
This repository contains the implementation of Differentiable Augmentation (DiffAugment) in both PyTorch and TensorFlow. It can be used to significantly improve the data efficiency of GAN training. See the paper below.
This repository contains the implementation of SimCLRv2 from the paper Big Self-Supervised Models are Strong Semi-Supervised Learners.
Papers & Publications
Abstract: The performance of generative adversarial networks (GANs) heavily deteriorates given a limited amount of training data. This is mainly because the discriminator is memorizing the exact training set. To combat it, we propose Differentiable Augmentation (DiffAugment), a simple method that improves the data efficiency of GANs by imposing various types of differentiable augmentations on both real and fake samples. Previous attempts to directly augment the training data manipulate the distribution of real images, yielding little benefit; DiffAugment enables us to adopt the differentiable augmentation for the generated samples, effectively stabilizes training, and leads to better convergence. Experiments demonstrate consistent gains of our method over a variety of GAN architectures and loss functions for both unconditional and class-conditional generation. With DiffAugment, we achieve a state-of-the-art FID of 6.80 with an IS of 100.8 on ImageNet 128x128. Furthermore, with only 20% training data, we can match the top performance on CIFAR-10 and CIFAR-100. Finally, our method can generate high-fidelity images using only 100 images without pre-training, while being on par with existing transfer learning algorithms. Code is available at this https URL.
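The core idea in the abstract, applying the same kind of augmentation to both real and generated samples while keeping it differentiable, can be sketched in plain Python. Everything here is a toy assumption rather than the authors' implementation: the augmentation is a fixed brightness/contrast change, the generator has a single parameter `theta`, the discriminator is linear, and the hinge loss is just one illustrative choice. A finite-difference check stands in for autograd to show that the generator's gradient passes through the augmentation.

```python
import random


def augment(batch, scale, shift):
    """Toy augmentation T: a contrast/brightness change, differentiable in its input."""
    return [[scale * value + shift for value in sample] for sample in batch]


def generator(latents, theta):
    """Toy generator G with one parameter: scales each latent vector by theta."""
    return [[theta * z for z in latent] for latent in latents]


def discriminator(batch, weights):
    """Toy linear discriminator D: one score per sample."""
    return [sum(w * v for w, v in zip(weights, sample)) for sample in batch]


def discriminator_loss(real, latents, theta, weights, scale, shift):
    """Hinge loss on D(T(x)) and D(T(G(z))): both branches are augmented."""
    real_scores = discriminator(augment(real, scale, shift), weights)
    fake_scores = discriminator(augment(generator(latents, theta), scale, shift), weights)
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_loss(latents, theta, weights, scale, shift):
    """Generator loss on D(T(G(z))): the augmentation sits between G and D."""
    scores = discriminator(augment(generator(latents, theta), scale, shift), weights)
    return -sum(scores) / len(scores)


rng = random.Random(0)
dim, batch_size = 4, 8
weights = [rng.uniform(-1, 1) for _ in range(dim)]
real = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(batch_size)]
latents = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(batch_size)]
theta, scale, shift = 0.5, 1.3, 0.2

d_loss = discriminator_loss(real, latents, theta, weights, scale, shift)

# Finite-difference gradient of the generator loss with respect to theta.
eps = 1e-4
numeric_grad = (
    generator_loss(latents, theta + eps, weights, scale, shift)
    - generator_loss(latents, theta - eps, weights, scale, shift)
) / (2 * eps)

# Analytic gradient through T: d/dtheta of -mean(w . (scale * theta * z + shift)).
mean_wz = sum(sum(w * z for w, z in zip(weights, latent)) for latent in latents) / batch_size
analytic_grad = -scale * mean_wz

print("discriminator loss:", d_loss)
print("generator gradient (numeric vs analytic):", numeric_grad, analytic_grad)
```

The gradient carries the factor `scale`, which is the augmentation's contribution. With a non-differentiable augmentation, that path from the discriminator back to the generator would be cut. In a real setup the augmentation parameters would be sampled randomly, and a framework such as PyTorch or TensorFlow would compute these gradients automatically.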
Big Self-Supervised Models are Strong Semi-Supervised Learners
Abstract: One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels (≤13 labeled images per class) using ResNet-50, a 10x improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
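The third step of the recipe above, distillation with unlabeled examples, can be sketched as a cross-entropy between the teacher's and the student's temperature-scaled predictions. This is a minimal illustration under stated assumptions, not the paper's code: the logits are made up, the function names are hypothetical, and the ImageNet training-set size of 1,281,167 images does not come from the abstract. That figure is the standard ILSVRC-2012 count, used here only to check the "≤13 labeled images per class" number.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student against the teacher's soft labels for one example."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


# Made-up logits for one unlabeled image over three classes.
teacher_logits = [2.0, 0.5, -1.0]
matching = distillation_loss(teacher_logits, [2.0, 0.5, -1.0])
mismatched = distillation_loss(teacher_logits, [-1.0, 0.5, 2.0])
print("loss when student agrees with teacher:", matching)
print("loss when student disagrees:", mismatched)

# Label budget check: 1% of the ImageNet training set spread over 1000 classes.
IMAGENET_TRAIN_IMAGES = 1_281_167  # assumption: ILSVRC-2012 training-set size
NUM_CLASSES = 1000
labels_per_class = 0.01 * IMAGENET_TRAIN_IMAGES / NUM_CLASSES
print("labeled images per class at 1%:", labels_per_class)
```

No ground-truth labels appear in the loss, which is why the unlabeled images can be reused for this step. The label budget works out to roughly 12.8 images per class, consistent with the "≤13" figure in the abstract.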