Deep Learning Weekly: Issue #229
A machine learning and game-theory model for animal poaching, DeepMind's 280-billion-parameter model, new datasets to democratize speech recognition, a paper on the risks from language models, & more
This week in deep learning, we bring you a machine learning and game-theory model for animal poaching, DeepMind's 280-billion-parameter model named Gopher, new datasets to democratize speech recognition, and a paper on the ethical and social risks from language models.
You may also enjoy Meta's AI method for bringing hand-drawn figures to life, the top five edge AI trends to watch in 2022, a deep dive into the implementation of Perceiver IO, a paper on partially local federated learning, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
The machine learning system, dubbed PAWS (Protection Assistant for Wildlife Security), uses data from past patrols to predict where poaching is likely to occur and a game-theory model to help generate randomized, unpredictable patrol routes.
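The randomized-patrol idea can be illustrated with a toy sketch. This is an assumption-laden illustration, not the actual PAWS system: it turns per-zone poaching-risk predictions into a mixed strategy, sampling patrol stops in proportion to risk so that high-risk zones are covered more often without any zone being visited on a predictable schedule.

```python
# Hedged illustration (not PAWS itself): convert predicted poaching risk
# per zone into a randomized patrol plan. Zone names and risk values are
# made up for the example.
import random

def sample_patrol(risk_by_zone, n_stops, rng):
    zones = list(risk_by_zone)
    weights = [risk_by_zone[z] for z in zones]
    # Mixed strategy: higher-risk zones are sampled more often, but the
    # route itself is random, so poachers cannot learn a fixed schedule.
    return rng.choices(zones, weights=weights, k=n_stops)

rng = random.Random(0)
risk = {"river": 0.6, "ridge": 0.3, "plain": 0.1}
print(sample_patrol(risk, n_stops=5, rng=rng))
```

The game-theoretic point is that any deterministic route can be exploited once observed; sampling from a risk-weighted distribution keeps expected coverage high while staying unpredictable.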
DeepMind, which regularly feeds its work into Google products, has probed the capabilities of large language models by building a language model with 280 billion parameters, named Gopher.
Meta announces a first-of-its-kind method that uses AI to automatically animate children's hand-drawn figures of people and humanlike characters, bringing these drawings to life in a matter of minutes.
OctoML Inc. introduced a new release of its artificial intelligence platform that includes a collection of highly efficient neural networks.
Latitude, the startup behind the GPT-3 based text game called AI Dungeon, is expanding into a new AI-powered game platform called Voyage.
Mobile & Edge
A short blog highlighting the top five edge AI trends NVIDIA expects to see in 2022.
A technical and comprehensive article that dives into matrix storage/memory representation, introduces Cachegrind, explains memory formats supported by PyTorch Operators, and showcases best practices for model execution with XNNPACK.
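The memory-format distinction the article covers can be made concrete with a small sketch (this is an illustration of the layouts, not code from the article): the same tensor element lands at different flat offsets depending on whether the tensor is stored channels-first (PyTorch's default contiguous layout) or channels-last (the layout XNNPACK kernels prefer).

```python
# Flat-offset arithmetic for a 4-D tensor (n, c, h, w) under the two
# memory formats PyTorch supports.
def offset_nchw(n, c, h, w, C, H, W):
    # Channels-first ("contiguous"): w varies fastest in memory.
    return ((n * C + c) * H + h) * W + w

def offset_nhwc(n, c, h, w, C, H, W):
    # Channels-last (torch.channels_last): c varies fastest in memory.
    return ((n * H + h) * W + w) * C + c

# For a 1x3x2x2 tensor, element (0, 1, 0, 1) sits at different offsets:
print(offset_nchw(0, 1, 0, 1, C=3, H=2, W=2))  # -> 5
print(offset_nhwc(0, 1, 0, 1, C=3, H=2, W=2))  # -> 4
```

Because the bytes are laid out differently, an operator tuned for one format pays a transposition cost when handed the other, which is why matching model layout to the kernel library matters for mobile inference.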
Edge Impulse is announcing $34 million in Series B funding led by Coatue, tripling its 2022 market valuation and growth forecast.
MLCommons set out to create public datasets to ease two pressing bottlenecks for open source speech recognition resources.
A deep dive into the implementation of Perceiver IO, the first Transformer-based neural network that works on all kinds of modalities and combinations thereof.
A comprehensive study explaining how AI will fundamentally affect the nature of work in the near future.
A step-by-step guide on how to train a large GPT-2 model called CodeParrot, entirely from scratch.
Libraries & Code
Determined is an open-source deep learning training platform that makes building models fast and easy.
Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Google services, Picovoice runs entirely on-device while being more accurate.
SLEAP is an open source deep-learning based framework for estimating positions of animal body parts.
Papers & Publications
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
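The retrieval step the abstract describes can be caricatured in a few lines. This sketch is a stand-in under loose assumptions, not DeepMind's implementation: RETRO uses frozen-BERT embeddings over a 2-trillion-token database, whereas here token-set overlap over a tiny in-memory list plays the role of "local similarity with preceding tokens."

```python
# Minimal retrieval sketch: find the corpus chunk most similar to the
# current context, which a retrieval-augmented model would then attend
# to (via chunked cross-attention in RETRO) when predicting tokens.
def tokenize(text):
    return text.lower().split()

def similarity(a, b):
    # Crude Jaccard overlap, standing in for RETRO's BERT-embedding
    # nearest-neighbour search.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / max(len(sa | sb), 1)

def retrieve(context, corpus_chunks):
    return max(corpus_chunks, key=lambda chunk: similarity(context, chunk))

chunks = [
    "the eiffel tower is in paris",
    "gradient descent minimizes a loss function",
]
context = "which city is the eiffel tower located in"
print(retrieve(context, chunks))  # -> "the eiffel tower is in paris"
```

The design point is that the relevant facts live in an external, swappable database rather than in the model's weights, which is how RETRO matches far larger models with 25x fewer parameters.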
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences.
We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information, including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities.
In total, we review 21 risks in depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs.
Personalization methods in federated learning aim to balance the benefits of federated and local training for data availability, communication cost, and robustness to client heterogeneity. Approaches that require clients to communicate all model parameters can be undesirable due to privacy and communication constraints. Other approaches require always-available or stateful clients, impractical in large-scale cross-device settings. We introduce Federated Reconstruction, the first model-agnostic framework for partially local federated learning suitable for training and inference at scale. We motivate the framework via a connection to model-agnostic meta learning, empirically demonstrate its performance over existing approaches for collaborative filtering and next word prediction, and release an open-source library for evaluating approaches in this setting. We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.
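The "partially local" split the abstract describes can be sketched with a toy one-feature model. This is an illustrative sketch under stated assumptions, not Google's released library: each client holds a private local parameter (here a bias) that is reconstructed from scratch every round, so clients can stay stateless and the server only ever sees updates to the shared global parameter (here a slope).

```python
# Federated Reconstruction sketch: model is y ~ global_w * x + local_b,
# where local_b never leaves the client.
def client_round(global_w, client_data, lr=0.1, recon_steps=30):
    local_b = 0.0  # reconstructed from scratch each round (stateless client)
    # Reconstruction phase: fit the local parameter with global_w frozen.
    for _ in range(recon_steps):
        grad_b = sum(2 * (global_w * x + local_b - y) for x, y in client_data)
        local_b -= lr * grad_b / len(client_data)
    # Update phase: fit the global parameter with local_b frozen; only
    # this update is communicated back to the server.
    grad_w = sum(2 * (global_w * x + local_b - y) * x for x, y in client_data)
    return global_w - lr * grad_w / len(client_data)

# Two clients sharing slope 2.0 but with different private offsets.
clients = [[(1.0, 3.0), (2.0, 5.0)], [(1.0, 2.5), (2.0, 4.5)]]
w = 0.0
for _ in range(100):
    # Server step: average the global parameter across clients.
    w = sum(client_round(w, data) for data in clients) / len(clients)
print(round(w, 1))  # converges near the shared slope, 2.0
```

Because the per-client offsets are absorbed by the reconstructed local parameter, the averaged global parameter recovers the structure the clients share, without the privacy and communication cost of exchanging full models.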