Deep Learning Weekly: Issue #206
General AI, AI for drug discovery and self-driving vehicles, TinyML, PyTorch Mobile, Federated Learning, PyTorch 1.9 release, and more
Hey folks,
This week in deep learning, we bring you DeepMind’s latest take on AGI, using AI for R&D on renewable energy storage, a self-driving bicycle and a paper benchmarking the different optimizers typically used in deep learning.
You may also enjoy Insilico Medicine’s large funding round, the latest iteration of the speech generation model WaveNet, HuggingFace’s NLP course, a book on deep learning theory, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
DeepMind says reinforcement learning is ‘enough’ to reach general AI
In a new paper, DeepMind’s scientists argue that intelligence will emerge not from formulating and solving complicated problems, but from sticking to a simple but powerful principle: reward maximization.
AI drug discovery platform Insilico Medicine announces $255 million in Series C funding
Insilico Medicine announced a $255 million Series C financing round, reflecting a recent breakthrough for the company: proof that its AI-based platform can identify a new target for a disease and take a drug candidate into the clinical trial process.
Transforming mobility with the confidence of world-class investors
Waymo announced an investment round of $2.5 billion, with participation from parent company Alphabet alongside outside investment funds. The round will be used to continue advancing its self-driving system.
Open Catalyst Challenge: Using AI to discover catalysts for renewable energy storage
This collaboration between Facebook AI and Carnegie Mellon University aims to use ML to accelerate the search for low-cost catalysts that can drive reactions to convert renewable energy to easily storable forms.
DeepMind introduces the latest iteration of WaveNet, a generative model trained on speech samples that can generate speech with natural-sounding elements such as intonation, accents, and emotion.
The race to understand the exhilarating, dangerous world of language AI
Hundreds of scientists worldwide are collaborating to understand the biases embedded in the largest language models like OpenAI’s GPT-3.
Mobile & Edge
A comprehensive list of papers, projects, articles, and talks about TinyML, i.e. machine learning on ultra-low-power systems.
An Overview of the PyTorch Mobile Demo Apps
This post presents a quick overview of PyTorch Mobile powered demo apps running various ML models spanning images, video, audio and text.
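For context, these demo apps typically rely on a model exported for PyTorch Mobile: script it, optimize it, and save it for the lite interpreter. A minimal sketch of that export path (the MobileNetV2 choice and the file name are illustrative, not taken from the post):

```python
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

# Script a pretrained model and prepare it for on-device execution.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted)

# Save for the mobile (lite) interpreter used by the demo apps.
optimized._save_for_lite_interpreter("mobilenet_v2.ptl")
```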
Nvidia eyes ARM roadmap for AI, 5G integration from server to card to chip
Nvidia announced that it will offer new infrastructure capable of powering AI applications running over 5G networks.
Beijing’s Hardcore Vlogger Comes Up With Self-driving Bicycles
A vlogger from Beijing built an autonomous bicycle based on a set of hardware sensors and chips and on customized perception and control algorithms.
Learning
FedBN Example: PyTorch - From Centralized To Federated
This tutorial shows how to use Flower, a federated learning framework, to train a Convolutional Neural Network on the CIFAR-10 dataset.
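As a rough sketch of what the client side of such a setup can look like (method signatures vary across Flower versions, the network below is a stand-in rather than the tutorial’s model, and the training and evaluation bodies are elided):

```python
from collections import OrderedDict

import flwr as fl
import torch
import torch.nn as nn

# Stand-in model; the tutorial uses a CIFAR-10 CNN with proper dataloaders.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

class CifarClient(fl.client.NumPyClient):
    def get_parameters(self, config=None):
        # Send model weights to the server as a list of NumPy arrays.
        return [v.cpu().numpy() for v in net.state_dict().values()]

    def set_parameters(self, parameters):
        keys = net.state_dict().keys()
        net.load_state_dict(OrderedDict(
            {k: torch.tensor(v) for k, v in zip(keys, parameters)}))

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        # ... local training on this client's CIFAR-10 shard would go here ...
        return self.get_parameters(), 1, {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        # ... local evaluation would go here ...
        return 0.0, 1, {"accuracy": 0.0}

# Connect to a running Flower server (the address is an assumption).
fl.client.start_numpy_client(server_address="[::]:8080", client=CifarClient())
```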
A deep reflection on AI’s current ethics standards, detailing the problems we need to anticipate when deploying AI systems in real life, including surveillance, manipulation and bias.
A Natural Language Processing course teaching how to use libraries from the HuggingFace ecosystem: Transformers, Datasets, Tokenizers, Accelerate, and the HuggingFace Hub.
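The Transformers library’s pipeline API gives a quick taste of the kind of tooling the course covers (the first call downloads a default model, and exact scores will vary):

```python
from transformers import pipeline

# Downloads a default sentiment-analysis model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Deep Learning Weekly is a great read!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```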
The Principles of Deep Learning Theory
This book develops an effective theory approach to understanding deep neural networks, drawing on tools from theoretical physics.
Libraries & Code
ACCENTOR: Adding Chit-Chat to Enhance Task-Oriented Dialogues
Built by Facebook Research, ACCENTOR consists of human-annotated chit-chat additions to 23.8K task-oriented dialogues.
PyTorch 1.9 Release, including torch.linalg and Mobile Interpreter
PyTorch 1.9 has been released, with major improvements to scientific computing support, including the torch.linalg module, a new Mobile Interpreter for on-device inference, and updates to the RPC framework used to support large-scale distributed training on GPUs.
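The torch.linalg module mirrors NumPy’s numpy.linalg; a small illustration:

```python
import torch

A = torch.randn(4, 4)
b = torch.randn(4)

x = torch.linalg.solve(A, b)            # solve Ax = b
eigvals = torch.linalg.eigvals(A)       # (complex) eigenvalues of A
fro = torch.linalg.norm(A, ord="fro")   # Frobenius norm

print(torch.allclose(A @ x, b, atol=1e-5))  # True
```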
This repository contains implementations of transformer language models in JAX, along with several pretrained models.
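A toy illustration of the attention primitive such models are built from (not code from the repository itself):

```python
import jax.numpy as jnp
from jax import nn, random

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention; x has shape (seq_len, d_model).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / jnp.sqrt(q.shape[-1])
    return nn.softmax(scores, axis=-1) @ v

d = 16
keys = random.split(random.PRNGKey(0), 4)
x = random.normal(keys[0], (8, d))
wq, wk, wv = (random.normal(k, (d, d)) for k in keys[1:])
print(self_attention(x, wq, wk, wv).shape)  # (8, 16)
```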
Papers & Publications
Descending through a Crowded Valley -- Benchmarking Deep Learning Optimizers
Abstract:
Choosing the optimizer is considered to be among the most crucial design decisions in deep learning, and it is not an easy one. The growing literature now lists hundreds of optimization methods. In the absence of clear theoretical guidance and conclusive empirical evidence, the decision is often made based on anecdotes. In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers while giving a concise overview of the wide range of possible choices. Analyzing more than 50,000 individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks. (ii) We observe that evaluating multiple optimizers with default parameters works approximately as well as tuning the hyperparameters of a single, fixed optimizer. (iii) While we cannot discern an optimization method clearly dominating across all tested tasks, we identify a significantly reduced subset of specific optimizers and parameter choices that generally lead to competitive results in our experiments: Adam remains a strong contender, with newer methods failing to significantly and consistently outperform it. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods without requiring any further computational efforts.
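Finding (ii) suggests a pragmatic recipe: rather than exhaustively tuning one optimizer, try several with their default hyperparameters and keep the best. A toy illustration of that idea (the model, data, and optimizer list are placeholders, not the paper’s benchmark):

```python
import torch
import torch.nn as nn

def final_loss(optimizer_cls, steps=200):
    # Train a tiny regression model with a given optimizer at its default settings.
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = optimizer_cls(model.parameters())  # default hyperparameters only
    x, y = torch.randn(256, 10), torch.randn(256, 1)
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# SGD is omitted because torch.optim.SGD has no default learning rate.
candidates = [torch.optim.Adam, torch.optim.AdamW, torch.optim.RMSprop, torch.optim.Adagrad]
results = {c.__name__: final_loss(c) for c in candidates}
print(sorted(results.items(), key=lambda kv: kv[1]))
```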
Reward is Enough
Abstract:
In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
Thinking Like Transformers
Abstract:
What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no such familiar parallel. In this paper we aim to change that, proposing a computational model for the transformer-encoder in the form of a programming language. We map the basic components of a transformer-encoder -- attention and feed-forward computation -- into simple primitives, around which we form a programming language: the Restricted Access Sequence Processing Language (RASP). We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer, and how a Transformer can be trained to mimic a RASP solution. In particular, we provide RASP programs for histograms, sorting, and Dyck-languages. We further use our model to relate their difficulty in terms of the number of required layers and attention heads: analyzing a RASP program implies a maximum number of heads and layers necessary to encode a task in a transformer. Finally, we see how insights gained from our abstraction might be used to explain phenomena seen in recent works.
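A toy NumPy illustration of the two primitives the language is built around (this is a sketch of the idea, not RASP’s actual syntax): “select” builds an attention-like selection matrix from a pairwise predicate, and “aggregate” averages a value sequence over the selected positions.

```python
import numpy as np

def select(keys, queries, predicate):
    # Selection matrix: entry (i, j) says whether query i attends to key j.
    return np.array([[predicate(k, q) for k in keys] for q in queries], dtype=float)

def aggregate(selector, values):
    # Average the selected values for each query position.
    return (selector @ np.asarray(values, dtype=float)) / selector.sum(axis=1)

tokens = list("hello")
same = select(tokens, tokens, lambda k, q: k == q)

# Histogram task from the paper: each position counts how often its own token occurs.
print(same.sum(axis=1))                          # [1. 1. 2. 2. 1.]

# Aggregation example: mean index of the positions holding the same token.
print(aggregate(same, np.arange(len(tokens))))   # [0.  1.  2.5 2.5 4. ]
```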