Deep Learning Weekly: Issue #206
General AI, AI for drug discovery and self-driving vehicles, TinyML, PyTorch Mobile, Federated Learning, PyTorch 1.9 release, and more
This week in deep learning, we bring you DeepMind’s latest take on AGI, using AI for R&D on renewable energy storage, a self-driving bicycle and a paper benchmarking the different optimizers typically used in deep learning.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
In a new paper, DeepMind’s scientists argue that intelligence will emerge not from formulating and solving complicated problems but by sticking to a simple but powerful principle: reward maximization.
Insilico Medicine announced a Series C financing, reflecting a recent breakthrough for the company: proof that its AI-based platform can create a new target for a disease and begin the clinical trial process.
Waymo announced a $2.5 billion investment round, with participation from Alphabet alongside investment funds. The funding will be used to continue advancing its self-driving system.
This collaboration between Facebook AI and Carnegie Mellon University aims to use ML to accelerate the search for low-cost catalysts that can drive reactions to convert renewable energy to easily storable forms.
DeepMind introduces the latest iteration of WaveNet, a generative model trained on speech samples, able to generate speech incorporating natural-sounding elements like intonation, accents or emotion.
Hundreds of scientists worldwide are collaborating to understand the biases embedded in the largest language models like OpenAI’s GPT-3.
Mobile & Edge
A comprehensive list of papers, projects, articles and talks about TinyML, i.e. machine learning on ultra-low-power systems.
This post presents a quick overview of demo apps powered by PyTorch Mobile, running ML models that span images, video, audio and text.
Nvidia announced new infrastructure designed to power AI applications running over 5G.
A vlogger from Beijing built an autonomous bicycle using hardware sensors and chips together with custom perception and control algorithms.
This tutorial shows how to use Flower, a federated learning framework, to train a Convolutional Neural Network on the CIFAR-10 dataset.
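To make the idea behind federated learning concrete, here is a minimal sketch of federated averaging (FedAvg) in plain Python. It deliberately does not use Flower's API. The one-weight model, the toy client datasets, and the helper names `local_update`, `fed_avg` and `train` are all assumptions made for illustration, not taken from the tutorial.

```python
# Minimal federated averaging (FedAvg) sketch in plain Python.
# Hypothetical setup: each client holds private (x, y) pairs drawn from y = 3x,
# and the shared model is a single weight w. This is NOT Flower's API.

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few epochs of per-sample gradient descent on one client's data."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Average client weights, weighted by each client's dataset size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

def train(clients, rounds=20, w=0.0):
    """Alternate local training and server-side averaging for a few rounds."""
    for _ in range(rounds):
        # Every client starts from the current global weight.
        # Only the updated weights are shared; the raw data stays local.
        updates = [local_update(w, data) for data in clients]
        w = fed_avg(updates, [len(data) for data in clients])
    return w

# Three illustrative clients with different amounts of data.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
global_w = train(clients)
print(f"learned weight: {global_w:.4f}")
```

A real setup would replace the scalar weight with the CNN's parameter tensors and let a framework such as Flower handle client/server communication. The averaging step stays conceptually the same.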
A deep reflection on AI’s current ethics standards, detailing the problems we need to anticipate when deploying AI systems in real life, including surveillance, manipulation and bias.
A Natural Language Processing course teaching how to use libraries from the HuggingFace ecosystem: Transformers, Datasets, Tokenizers, Accelerate, and the HuggingFace Hub.
This book develops an effective approach to understanding deep neural networks using theoretical physics of dynamic systems.
Libraries & Code
Built by Facebook Research, Accentor consists of human-annotated chit-chat additions to 23.8K dialogues.
PyTorch 1.9 has been released, with major improvements to scientific computing libraries and to the RPC framework used to support large-scale distributed training on GPUs.
This repository contains several implementations of transformer language models with JAX, as well as several pretrained models.
Papers & Publications
Choosing the optimizer is considered to be among the most crucial design decisions in deep learning, and it is not an easy one. The growing literature now lists hundreds of optimization methods. In the absence of clear theoretical guidance and conclusive empirical evidence, the decision is often made based on anecdotes. In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers while giving a concise overview of the wide range of possible choices. Analyzing more than 50,000 individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks. (ii) We observe that evaluating multiple optimizers with default parameters works approximately as well as tuning the hyperparameters of a single, fixed optimizer. (iii) While we cannot discern an optimization method clearly dominating across all tested tasks, we identify a significantly reduced subset of specific optimizers and parameter choices that generally lead to competitive results in our experiments: Adam remains a strong contender, with newer methods failing to significantly and consistently outperform it. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods without requiring any further computational efforts.
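For readers who want a reminder of what Adam does, the following is a minimal sketch of its standard update rule in plain Python, applied to a toy one-dimensional quadratic. The objective, the function name `adam_minimize`, the step count, and the learning rate of 0.01 (larger than the commonly used default of 0.001, so the toy run converges quickly) are assumptions for illustration and are not taken from the benchmark.

```python
import math

def adam_minimize(grad_fn, x0, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction for m
        v_hat = v / (1 - beta2 ** t)           # bias correction for v
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 5)^2, whose gradient is 2 * (x - 5).
x_min = adam_minimize(lambda x: 2 * (x - 5.0), x0=0.0)
print(f"approximate minimizer: {x_min:.4f}")
```

The per-parameter scaling by the second-moment estimate is what makes Adam relatively insensitive to gradient magnitude, which may be part of why its default settings transfer reasonably well across tasks.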
In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
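As a very small illustration of learning by trial and error to maximise reward, here is a sketch of an epsilon-greedy agent on a multi-armed bandit. This is a textbook toy, not the authors' method. The arm probabilities, the function name `run_bandit`, and all hyperparameters are assumptions chosen for the example.

```python
import random

def run_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking arm, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))                         # explore
        else:
            arm = max(range(len(arm_probs)), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return values, counts, total_reward

# Hypothetical arms that pay out with probability 0.2, 0.5 and 0.8.
values, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("pulls per arm:", counts)
```

The agent is never told which arm is best. Its preference emerges only from the reward signal, which is the kind of mechanism the paper argues could scale to far richer abilities.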
What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no such familiar parallel. In this paper we aim to change that, proposing a computational model for the transformer-encoder in the form of a programming language. We map the basic components of a transformer-encoder -- attention and feed-forward computation -- into simple primitives, around which we form a programming language: the Restricted Access Sequence Processing Language (RASP). We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer, and how a Transformer can be trained to mimic a RASP solution. In particular, we provide RASP programs for histograms, sorting, and Dyck-languages. We further use our model to relate their difficulty in terms of the number of required layers and attention heads: analyzing a RASP program implies a maximum number of heads and layers necessary to encode a task in a transformer. Finally, we see how insights gained from our abstraction might be used to explain phenomena seen in recent works.
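To give a feel for the style of program the abstract describes, here is a loose Python emulation of select and aggregate style primitives, used to compute a per-token histogram. This is not the RASP language itself. The function names echo primitives described in the paper, but the signatures and semantics below are simplified assumptions for illustration.

```python
def select(keys, queries, predicate):
    """Boolean attention pattern: row i marks the key positions that query i attends to."""
    return [[predicate(k, q) for k in keys] for q in queries]

def selector_width(selector):
    """Number of positions each query attends to."""
    return [sum(row) for row in selector]

def aggregate(selector, values):
    """Average the selected values for each query position (0 if nothing is selected)."""
    out = []
    for row in selector:
        picked = [v for keep, v in zip(row, values) if keep]
        out.append(sum(picked) / len(picked) if picked else 0)
    return out

def histogram(tokens):
    """For each position, count how often its token occurs in the whole sequence."""
    same_token = select(tokens, tokens, lambda k, q: k == q)
    return selector_width(same_token)

hist = histogram(list("hello"))
print(hist)  # [1, 1, 2, 2, 1]
```

Each `select` call plays the role of an attention pattern and each `aggregate` the role of the weighted averaging an attention head performs, which is how counting such operations can relate a program to the heads and layers a transformer would need.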