Deep Learning Weekly: Issue #189
The 2021 AI Index Report, ML on Raspberry Pi, generative art with ML, the mathematical foundation of deep learning, a new approach to transformers, and more.
This week in deep learning, we bring you the 2021 AI Index Report, the latest research on self-supervised learning along with a dedicated Python library, machine learning on a Raspberry Pi, a new autonomous-driving dataset, and a summary of generative modeling applied to art.
You may also enjoy learning about transformers, the mathematical foundations of deep learning, or the integration of fairness approaches into production systems!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Let your data teams focus on core research and business objectives while our in-house, advanced workforce handles the data annotation work for you. iMerit is trusted by Microsoft, Autodesk, and TripAdvisor for their computer vision and natural language processing data annotation needs. Get a free quote today. (Sponsored Content)
FAIR’s latest paper describes an image classification system pretrained on unlabeled data using self-supervised learning. It outperforms state-of-the-art models on ImageNet. The researchers state that self-supervision is one step on the path to building machines with human-level intelligence.
This post presents the Deep Bootstrap framework to approach the generalization problem of deep learning models by connecting it to the field of online optimization.
Big news for Hugging Face, which will use this funding round to accelerate work on its open source library, the most popular one for NLP developers.
Waymo is expanding the Waymo Open Dataset with the publication of a motion dataset for research into behavior prediction and motion forecasting for autonomous driving.
Lightmatter, a startup born at MIT, will release a new chip optimized for deep learning that uses light to perform calculations: it could run up to 10 times faster than the current fastest GPUs, while consuming 6 times less energy.
Learn how to build a bird detection engine on a Raspberry Pi.
Researchers at MIT have developed a new deep-learning based method to produce holograms nearly instantly on a smartphone.
The next iterations of Raspberry Pi CPUs will integrate lightweight accelerators for machine learning applications.
In this post, the author shares a few links to materials he used to better understand and implement transformer models from scratch.
This course, written by researchers, focuses on building an understanding of the engineering mathematics that drives deep learning. It covers a wide range of models and techniques, from reinforcement learning and GANs to regularization.
A very well-written tutorial about training transformers on a custom dataset and using pre-trained ones with Hugging Face.
An introduction to JAX, a Python library that speeds up numerical and machine learning code through just-in-time compilation. It has been gaining traction recently.
Libraries & Code
The basic idea of self-supervised learning is to automatically generate a supervisory signal from an unlabeled dataset, then train a model on the resulting task. FAIR open-sourced VISSL, a library for state-of-the-art self-supervised learning with PyTorch.
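To make the idea of generating labels from unlabeled data concrete, here is a minimal sketch of one well-known pretext task, rotation prediction. It is not taken from VISSL and does not use its API; the helper names `rotate90` and `make_rotation_task` and the tiny 2x2 "images" are hypothetical and chosen only for illustration.

```python
import random


def rotate90(image, k):
    """Rotate a 2D list-of-lists image clockwise by k quarter turns."""
    image = [list(row) for row in image]  # copy so the input is untouched
    for _ in range(k % 4):
        # Reversing the rows and transposing gives a clockwise quarter turn.
        image = [list(row) for row in zip(*image[::-1])]
    return image


def make_rotation_task(images, seed=0):
    """Turn unlabeled images into (rotated_image, label) training pairs.

    The label is the number of quarter turns applied, so it is produced
    automatically, with no human annotation.
    """
    rng = random.Random(seed)
    examples = []
    for image in images:
        k = rng.randrange(4)
        examples.append((rotate90(image, k), k))
    return examples


# Toy "unlabeled dataset" (hypothetical data for illustration only).
unlabeled = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
dataset = make_rotation_task(unlabeled)
```

A classifier trained to predict the label from the rotated image has to pick up on image content, which is the property self-supervised pretraining relies on.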
A list of Google Colab notebooks on machine learning for artistic purposes (music, image, and text generation).
Papers & Publications
Abstract: Political scientists commonly seek to make statements about how a word’s use and meaning varies over circumstances—whether that be time, partisan identity, or some other document-level covariate. A promising avenue is the use of domain-specific word embeddings, that simultaneously allow for statements of uncertainty and statistical inference. We introduce the à la Carte on Text (ConText) embedding regression model for this purpose. We extend and validate a simple model-based linear method of refitting pre-trained embeddings to local contexts that requires minimal input data. It outperforms well-known competitors for studying changes in meaning across groups and time. Our approach allows us to speak descriptively of systematic differences across covariates in the way that words are used, and to comment on whether a particular use is statistically significantly different to another. We provide evidence of excellent relative performance of the model, and show how it might be used in substantive research.
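The "linear method of refitting pre-trained embeddings to local contexts" can be illustrated with a small sketch. It assumes the general à la carte recipe: average the pre-trained vectors of the words around a target word, then apply a learned linear transform. This is not the ConText package or the authors' code; the data is synthetic, and the names `alc_embed`, `true_A`, and `ctx_avg` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_words = 4, 50

# Synthetic setup (assumption): each word's pre-trained vector is an exact
# linear function of the average vector of its context words.
true_A = rng.standard_normal((dim, dim))
ctx_avg = rng.standard_normal((n_words, dim))  # one average context vector per word
word_vecs = ctx_avg @ true_A.T                 # matching "pre-trained" vectors

# Learn the transform by least squares: find X with ctx_avg @ X ~= word_vecs.
X, *_ = np.linalg.lstsq(ctx_avg, word_vecs, rcond=None)
A = X.T


def alc_embed(context_vectors, A):
    """Embed one usage of a word from the vectors of its surrounding words."""
    return A @ np.mean(context_vectors, axis=0)


# Embed a single (synthetic) occurrence with five context words.
context = rng.standard_normal((5, dim))
usage_embedding = alc_embed(context, A)
```

Because every occurrence gets its own embedding, those embeddings can then be compared across groups or time periods, which is the kind of analysis the abstract describes.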
Abstract: Many technical approaches have been proposed for ensuring that decisions made by machine learning systems are fair, but few of these proposals have been stress-tested in real-world systems. This paper presents an example of one team’s approach to the challenge of applying algorithmic fairness approaches to complex production systems within the context of a large technology company. We discuss how we disentangle normative questions of product and policy design (like, “how should the system trade off between different stakeholders’ interests and needs?”) from empirical questions of system implementation (like, “is the system achieving the desired tradeoff in practice?”). We also present an approach for answering questions of the latter sort, which allows us to measure how machine learning systems and human labelers are making these tradeoffs across different relevant groups. We hope our experience integrating fairness tools and approaches into large-scale and complex production systems will be useful to other practitioners facing similar challenges, and illuminating to academics and researchers looking to better address the needs of practitioners.
Abstract: Transformer is a type of self-attention-based neural networks originally applied for NLP tasks. Recently, pure transformer-based models are proposed to solve computer vision problems. These visual transformers usually view an image as a sequence of patches while they ignore the intrinsic structure information inside each patch. In this paper, we propose a novel Transformer-iN-Transformer (TNT) model for modeling both patch-level and pixel-level representation. In each TNT block, an outer transformer block is utilized to process patch embeddings, and an inner transformer block extracts local features from pixel embeddings. The pixel-level feature is projected to the space of patch embedding by a linear transformation layer and then added into the patch. By stacking the TNT blocks, we build the TNT model for image recognition. Experiments on ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the proposed TNT architecture. For example, our TNT achieves 81.3% top-1 accuracy on ImageNet which is 1.5% higher than that of DeiT with similar computational cost.
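The data flow of a TNT block described in the abstract can be sketched in a few lines of NumPy. This is a shape-level illustration under simplifying assumptions, not the authors' implementation: each "transformer block" is reduced to single-head self-attention with random weights and a residual connection (no layer norm, MLP, or training), and flattening each patch's pixel features before the linear projection is an assumption. The names `self_attention` and `tnt_block` are hypothetical.

```python
import numpy as np


def self_attention(x, rng):
    """Single-head self-attention with a residual connection.

    Stand-in for a full transformer block; weights are random, not learned.
    Works on (tokens, d) or batched (batch, tokens, d) inputs.
    """
    d = x.shape[-1]
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return x + weights @ v


def tnt_block(pixel_emb, patch_emb, rng):
    """pixel_emb: (n_patches, n_pixels, c); patch_emb: (n_patches, d)."""
    n_patches, n_pixels, c = pixel_emb.shape
    d = patch_emb.shape[-1]
    # Inner block: attention among the pixel embeddings inside each patch.
    pixel_emb = self_attention(pixel_emb, rng)
    # Project pixel-level features into patch space and add them to the patch.
    w_proj = rng.standard_normal((n_pixels * c, d)) / np.sqrt(n_pixels * c)
    patch_emb = patch_emb + pixel_emb.reshape(n_patches, -1) @ w_proj
    # Outer block: attention among the patch embeddings.
    patch_emb = self_attention(patch_emb, rng)
    return pixel_emb, patch_emb


rng = np.random.default_rng(0)
pixels = rng.standard_normal((4, 16, 6))  # 4 patches, 16 pixels each, 6 channels
patches = rng.standard_normal((4, 12))    # 4 patch embeddings of size 12
new_pixels, new_patches = tnt_block(pixels, patches, rng)
```

Since the block returns tensors with the same shapes it received, several such blocks can be stacked, as the abstract describes.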