Deep Learning Weekly: Issue #191
PyTorch Profiler, Transformers for longer sequences, a new deepfake singing app, a guide to transfer learning, and more
Before jumping in this week, we have a small favor to ask. We’re working to make Deep Learning Weekly even better, and we want your input. The survey linked below is short (< 5 minutes), and your feedback will be invaluable in helping us improve the content we deliver each week.
This week in deep learning, we bring you a new sparse attention mechanism used in Transformers, an experiment on how an AI can learn to talk about colors, a new framework for self-supervised learning, and contactless deep learning-based sleep sensing.
You may also enjoy this tutorial on deep learning for satellite imagery analysis, a practical introduction to transfer learning, and a toy library to understand what’s under the hood of deep learning frameworks.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Improved technologies and processes for data preparation are on the rise. iMerit, a leader in providing high-quality data, shares the latest in how annotation workflows, workforces, and tooling capabilities can scale with accuracy and precision. (Sponsored Content)
OpenAI highlights a few of the 300+ applications built on top of their GPT-3-powered search, conversation, and text completion APIs. The post also provides an overview of the main improvements the team is working on.
Facebook AI shows that when two AI systems are trained to create a way to communicate with each other about colors, they develop a system that balances complexity and accuracy, much as we do. This shows the potential of AI as an experimental tool to answer scientific questions about human language.
Facebook announced PyTorch Profiler, an improved performance debugging profiler for PyTorch. This open-source tool enables accurate and efficient performance analysis for large-scale deep learning models.
In this post, Google Research introduces BigBird, a sparse attention mechanism for the Transformer architecture that reduces the quadratic dependency on input length to linear. It achieves state-of-the-art results on long-sequence tasks.
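Sparse attention of this kind combines a local sliding window, a few global tokens, and random connections, so each query attends to a roughly constant number of keys. As a toy illustration only (not the paper's implementation; the function and parameter names below are made up), a NumPy sketch of such a mask shows the nonzero count growing roughly linearly with sequence length:

```python
import numpy as np

def bigbird_style_mask(n, window=3, n_global=1, n_random=2, seed=0):
    """Toy sparse attention mask: local window + global tokens + random keys.

    Returns an (n, n) boolean array; True means query i may attend to key j.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True                         # sliding-window attention
        mask[i, rng.choice(n, size=n_random)] = True  # random attention
    mask[:, :n_global] = True                         # all queries see global tokens
    mask[:n_global, :] = True                         # global tokens see everything
    return mask

# Nonzeros grow roughly linearly in n, unlike dense n*n attention.
for n in (64, 128, 256):
    m = bigbird_style_mask(n)
    print(n, int(m.sum()), n * n)
```

Each row (except the global ones) has only about `2*window + 1 + n_random + n_global` active keys, which is the structural trick behind the linear scaling.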
In this interview, Andrew Ng reflects on how companies can use machine learning to transform their operations and solve critical problems.
Google introduced Sleep Sensing in its new smart assistant, Nest Hub. It consists of a radar that emits an ultra-low-power radio wave and processes the reflected signal using deep learning algorithms.
Flex Logix, a startup designing reconfigurable AI accelerator chips, will use its latest funding round to accelerate the availability of its hardware and software for enterprise edge applications.
A new app based on the latest deep learning techniques makes just about any photo literally sing. It has become hugely popular on social media.
This course presents a scalable second-order optimization method that can be used to train deep learning models. It achieves state-of-the-art results on very large learning tasks such as language translation or image classification.
This post presents a step-by-step guide on how to build a deep learning system to perform land cover classification using satellite imagery.
This tutorial gives a simple introduction to transfer learning and presents practical applications for image classification and natural language processing.
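As a minimal illustration of the core idea (not taken from the tutorial; the data and names here are synthetic stand-ins), transfer learning reuses a frozen pretrained feature extractor and trains only a small new task head. A pure-NumPy sketch, with a random projection standing in for a pretrained backbone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" feature extractor. In real transfer learning these
# weights come from a model trained on a large source dataset (e.g. an
# ImageNet CNN backbone) and are kept fixed while a new head is trained.
W_backbone = rng.normal(size=(20, 8))
def features(x):
    return np.tanh(x @ W_backbone)  # frozen: never updated below

# Tiny synthetic target task whose labels are learnable from the features.
X = rng.normal(size=(200, 20))
F = features(X)
y = (F @ rng.normal(size=8) > 0).astype(float)

# Train only the new task head: logistic regression on frozen features.
w, b = np.zeros(8), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(F @ w + b)))  # sigmoid
    grad = p - y                        # dLoss/dlogits for cross-entropy
    w -= 0.5 * F.T @ grad / len(X)
    b -= 0.5 * grad.mean()

accuracy = ((F @ w + b > 0) == (y > 0.5)).mean()
print(f"head-only training accuracy: {accuracy:.2f}")
```

Because only the 8-dimensional head is trained, the target task needs far less data than training the full model from scratch, which is the practical appeal of transfer learning in both vision and NLP.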
This post is a walkthrough of training OpenAI’s CLIP (Contrastive Language–Image Pre-training) on Google Colab.
Libraries & Code
GPT Neo is an open-source implementation of GPT-2- and GPT-3-style models, with the ability to scale up to GPT-3 sizes. Pretrained models are also provided.
SmallPebble is a toy deep learning library written from scratch in Python, using NumPy / CuPy. It is ideal for learning and understanding what’s under the hood of more popular deep learning frameworks.
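At their core, from-scratch libraries like this implement reverse-mode automatic differentiation. The following minimal scalar autograd sketch (an illustration of the general technique, not SmallPebble's actual API) shows the essential idea:

```python
class Var:
    """Minimal scalar reverse-mode autodiff node."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __add__(self, other):
        # d(out)/d(self) = 1, d(out)/d(other) = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(out)/d(self) = other, d(out)/d(other) = self
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, grad=1.0):
        # Accumulate the incoming gradient, then apply the chain rule.
        self.grad += grad
        for parent, local_grad in self.parents:
            parent.backward(grad * local_grad)

x, y = Var(3.0), Var(4.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

Production frameworks do the same thing over arrays instead of scalars, and traverse the computation graph in topological order rather than by naive recursion, but the chain-rule bookkeeping above is the heart of it.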
This open-source Python framework can be used to annotate and filter raw image data using the latest self-supervised learning and active learning techniques.
Papers & Publications
Abstract: In this work, we generalize the reaction-diffusion equation from statistical physics, the Schrödinger equation from quantum mechanics, and the Helmholtz equation from paraxial optics into neural partial differential equations (NPDEs), which can be considered fundamental equations in the field of artificial intelligence research. We use the finite difference method to discretize the NPDEs and find numerical solutions, generating the basic building blocks of deep neural network architectures, including the multi-layer perceptron, convolutional neural networks, and recurrent neural networks. Learning strategies such as adaptive moment estimation, L-BFGS, pseudoinverse learning algorithms, and partial differential equation constrained optimization are also presented. We believe the clear physical picture of interpretable deep neural networks presented here is significant: it opens the possibility of applying them to analog computing device design and paves the road to physical artificial intelligence.
Abstract: In the standard Markov decision process formalism, users specify tasks by writing down a reward function. However, in many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved. Motivated by this observation, we derive a control algorithm from first principles that aims to visit states that have a high probability of leading to successful outcomes, given only examples of successful outcome states. Prior work has approached similar problem settings in a two-stage process, first learning an auxiliary reward function and then optimizing this reward function using another reinforcement learning algorithm. In contrast, we derive a method based on recursive classification that eschews auxiliary reward functions and instead directly learns a value function from transitions and successful outcomes. Our method therefore requires fewer hyperparameters to tune and lines of code to debug. We show that our method satisfies a new data-driven Bellman equation, where examples take the place of the typical reward function term. Experiments show that our approach outperforms prior methods that learn explicit reward functions.
Abstract: When fine-tuning pretrained models for classification, researchers either use a generic model head or a task-specific prompt for prediction. Proponents of prompting have argued that prompts provide a method for injecting task-specific guidance, which is beneficial in low-data regimes. We aim to quantify this benefit through rigorous testing of prompts in a fair setting: comparing prompted and head-based fine-tuning in equal conditions across many tasks and data sizes. By controlling for many sources of advantage, we find that prompting does indeed provide a benefit, and that this benefit can be quantified per task. Results show that prompting is often worth 100s of data points on average across classification tasks.