Deep Learning Weekly: Issue #193

Accelerating CNNs on edge devices, use cases for SOTA speech-to-text models, a new book about the history of AI, a guide to model inference optimization, and more

Sponsored by Ray Summit

Ray Summit | June 22-24

Want to learn the best way to scale ML? Find out how Ray is being used for large-scale machine learning. Topics include: ML in production, MLOps, deep learning, reinforcement learning, cloud computing, serverless & Ray libraries. Register for free to join live & on-demand.


Hey folks,

This week in deep learning, we bring you a new method to speed up drug development, a self-supervised learning framework for hyperparameter tuning, a few tricks to accelerate convolutional neural network inference on mobile devices, as well as nice use cases built with the latest speech-to-text models.

You may also enjoy learning about the history of AI, Snorkel AI’s fundraising to make data labeling more efficient, and a nice overview of model inference optimization!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

Error-riddled data sets are warping our sense of how good AI really is

Recent studies have found that datasets used in AI research can contain serious flaws, such as racist or incorrect labels. This is very likely distorting our understanding of the field’s progress.

Alignment of Language Agents

The DeepMind Safety Research team analyzes the harms that can arise when a language AI system is misspecified. The misspecification can come from the training data, from the training process, or from differences between training and deployment environments.

Snorkel AI scores $35M Series B to automate data labeling in machine learning

Snorkel AI wants to make it easier for subject matter experts to build labeled datasets, and announced a new tool to build common ML applications.

Unique AI method for generating proteins will speed up drug development

In a recently published work, researchers developed a new method to generate novel proteins. This offers fantastic potential for a number of future applications, such as faster and more cost-efficient development of drugs.

Large-scale forecasting: Self-supervised learning framework for hyperparameter tuning

Facebook Research introduces a new self-supervised learning framework for model selection and hyperparameter tuning, which works much faster than baseline algorithms.

Artificial intelligence bias can be countered, if not erased

This article details where the algorithmic biases of AI systems come from and how they can be mitigated.

Mobile+Edge

Improve PyTorch App Performance with Android NNAPI Support

This post is a step-by-step look at using PyTorch Mobile with the Android Neural Networks API (NNAPI) to run state-of-the-art computer vision models on mobile devices.

Google's Apollo AI for Chip Design Improves Deep Learning Performance by 25%

This article summarizes how ML is used to design accelerator hardware for improving AI inference. The latest significant work in this field is Google Research’s APOLLO, which achieves up to 25% speedup over baseline algorithms.

Maximizing Edge AI Performance

A very concise post describing a few simple steps to accelerate the edge inference of convolutional neural networks.

Learning

“Chain-linking” NLP tasks With Wav2Vec2 & Transformers

This tutorial outlines two experiments with Wav2Vec2, made possible by its addition to Hugging Face’s library: speech-to-text-to-translation and speech-to-text-to-summarization.

When are Neural Networks more powerful than Neural Tangent Kernels?

Salesforce Research presents the latest techniques they’ve developed to perform a theoretical analysis of wide neural networks.

Genius Makers: The Mavericks Who Brought A.I. to Google, Facebook, and the World

This book is a comprehensive history of AI told through the lives of its major players and the companies bringing those technologies to life.

Deep learning model compression

This post covers the optimization of a deep learning model’s inference. It includes engineering topics like model quantization and binarization, more research-oriented topics like knowledge distillation, as well as well-known hacks.
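To give a feel for what quantization means in practice, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is not taken from the linked post: the function names `quantize_int8` and `dequantize` are hypothetical, and real frameworks typically add per-channel scales, zero points, and calibration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale.

    Illustrative only: assumes symmetric, per-tensor quantization.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero (or empty) tensor has no range; fall back to a scale of 1.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    # Each restored value should sit within about half a step of the original.
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.4f} (error {abs(w - r):.6f})")
```

The trade-off is that each weight is stored in 8 bits instead of 32, at the cost of a rounding error of at most about half the scale per weight.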

Libraries & Code

DL Translate

A deep learning-based translation library built on Hugging Face transformers and Facebook's mBART-Large model.

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

The official implementation of StyleCLIP, a method to manipulate images using a text prompt.

Language Interpretability Tool (LIT)

LIT is an open-source platform developed by Google Research for visualizing and understanding NLP models. Major improvements have been added recently.

Papers & Publications

CvT: Introducing Convolutions to Vision Transformers

Abstract: We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is accomplished through two primary modifications: a hierarchy of Transformers containing a new convolutional token embedding, and a convolutional Transformer block leveraging a convolutional projection. These changes introduce desirable properties of convolutional neural networks (CNNs) to the ViT architecture (i.e., shift, scale, and distortion invariance) while maintaining the merits of Transformers (i.e., dynamic attention, global context, and better generalization). We validate CvT by conducting extensive experiments, showing that this approach achieves state-of-the-art performance over other Vision Transformers and ResNets on ImageNet-1k, with fewer parameters and lower FLOPs. In addition, performance gains are maintained when pretrained on larger datasets (e.g., ImageNet-22k) and fine-tuned to downstream tasks. Pre-trained on ImageNet-22k, our CvT-W24 obtains a top-1 accuracy of 87.7% on the ImageNet-1k val set. Finally, our results show that the positional encoding, a crucial component in existing Vision Transformers, can be safely removed in our model, simplifying the design for higher resolution vision tasks.

Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks

Abstract: We identify label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets, and subsequently study the potential for these label errors to affect benchmark results. Errors in test sets are numerous and widespread: we estimate an average of 3.4% errors across the 10 datasets, where for example 2916 label errors comprise 6% of the ImageNet validation set. Putative label errors are identified using confident learning algorithms and then human-validated via crowdsourcing (54% of the algorithmically-flagged candidates are indeed erroneously labeled). Traditionally, machine learning practitioners choose which model to deploy based on test accuracy - our findings advise caution here, proposing that judging models over correctly labeled test sets may be more useful, especially for noisy real-world datasets. Surprisingly, we find that lower capacity models may be practically more useful than higher capacity models in real-world datasets with high proportions of erroneously labeled data. For example, on ImageNet with corrected labels: ResNet-18 outperforms ResNet-50 if the prevalence of originally mislabeled test examples increases by just 6%. On CIFAR-10 with corrected labels: VGG-11 outperforms VGG-19 if the prevalence of originally mislabeled test examples increases by just 5%.

Learning advanced mathematical computations from examples

Abstract: Using transformers over large generated datasets, we train models to learn mathematical properties of differential systems, such as local stability, behavior at infinity and controllability. We achieve near perfect prediction of qualitative characteristics, and good approximations of numerical features of the system. This demonstrates that neural networks can learn to perform complex computations, grounded in advanced theory, from examples, without built-in mathematical knowledge.