Deep Learning Weekly: Issue #266
Meta's model that can decode language directly from noninvasive brain recordings, best practices for testing ML pipelines, a unified interface for distributed computing, and more.
Hey Folks,
This week in deep learning, we bring you Meta's model that can decode language directly from noninvasive brain recordings, best practices for testing ML pipelines, a unified interface for distributed computing, and a paper on an adaptive Nesterov momentum algorithm for faster optimization of deep models.
You may also enjoy DALL·E Outpainting, distributed training with EKS and Torch Distributed Elastic, a theoretical deep dive on contrastive representation learning, a paper on improving voice trigger detection using metric learning, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
DALL·E: Introducing Outpainting
OpenAI introduces Outpainting, a new feature that lets users extend an image beyond its original borders using natural language descriptions.
Using AI to decode speech from brain activity
Meta shares research on an AI model that can decode language directly from noninvasive brain recordings.
Launching ML Model Registry and Deployment on DagsHub with MLflow
DagsHub launches support for zero-configuration MLflow artifact storage based on DagsHub storage, support for MLflow Model Registry, an MLflow UI built into every DagsHub project, and full support for the MLflow API.
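For readers who want to try the registry, here is a minimal sketch of logging and registering a scikit-learn model against a remote MLflow tracking server. The tracking URI pattern and model name are placeholders; consult DagsHub's docs for the exact setup.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Placeholder remote tracking URI (DagsHub exposes an MLflow server per project).
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # registered_model_name also adds the logged model to the MLflow Model Registry.
    mlflow.sklearn.log_model(model, "model", registered_model_name="iris-classifier")
```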
Saving the bees just got a boost from AI
Researchers at Oregon State University have trained a machine learning model that analyzes the molecular structure of proposed insecticides, herbicides, and fungicides to determine their potential danger to bees.
Announcing the Patent Phrase Similarity Dataset
Google announces the release of the Patent Phrase Similarity dataset, a new human-rated contextual phrase-to-phrase semantic matching dataset for phrase disambiguation, adversarial keyword matching, and hard negative keywords.
MLOps
Versioning Datasets with Git & DVC
A blog post that explores how DVC works in tandem with Git, and how to create experiments that track both code and data files.
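As a quick taste of the workflow, here is a hedged sketch of reading a DVC-tracked file at a specific Git revision through DVC's Python API; the repo URL, file path, and tag are placeholders.

```python
import dvc.api

# Read a DVC-tracked file pinned to a Git revision (branch, tag, or commit).
data = dvc.api.read(
    "data/train.csv",
    repo="https://github.com/example/project",
    rev="v1.0",
)

# Or stream it without loading the whole file into memory.
with dvc.api.open(
    "data/train.csv",
    repo="https://github.com/example/project",
    rev="v1.0",
) as f:
    header = f.readline()
```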
Building ML Pipeline: 6 Problems & Solutions [From a Data Scientist’s Experience]
An article that introduces common ML pipeline pitfalls observed at other companies, and how they were solved.
Distributed training with Amazon EKS and Torch Distributed Elastic
A technical post on training PyTorch models using the Torch Distributed Elastic framework in a distributed data parallel fashion using Amazon Elastic Kubernetes Service (Amazon EKS).
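Below is a minimal single-script sketch of distributed data parallel training that a torchrun / Torch Distributed Elastic launch (as in the post) could invoke; the model and data are toy placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun (Torch Distributed Elastic) sets RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = torch.nn.Linear(32, 2).to(device)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    for _ in range(10):
        x = torch.randn(64, 32, device=device)
        y = torch.randint(0, 2, (64,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across workers here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```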
How to Test Data & ML Pipelines and Make Testing Less Brittle
A technical deep dive into the best practices for data pipeline testing.
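To give a flavor of the kind of tests discussed, here is a small pytest sketch for a hypothetical pandas transformation step; the function and column names are made up for illustration.

```python
import pandas as pd
import pytest


def add_click_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step: derive a click-through-rate column."""
    out = df.copy()
    out["ctr"] = out["clicks"] / out["impressions"]
    return out


def test_add_click_rate_preserves_rows_and_bounds():
    df = pd.DataFrame({"clicks": [1, 0, 5], "impressions": [10, 4, 5]})
    result = add_click_rate(df)
    assert len(result) == len(df)             # no rows silently dropped
    assert result["ctr"].between(0, 1).all()  # derived values stay in range


def test_add_click_rate_rejects_missing_columns():
    with pytest.raises(KeyError):
        add_click_rate(pd.DataFrame({"clicks": [1]}))
```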
Learning
The Noisy Elephant - by Lydia Nemec
A blog post that explores the effect of noise (quality) and dataset size (quantity) on Gaussian process regression.
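To see the quality-versus-quantity trade-off concretely, here is a small scikit-learn sketch that fits Gaussian process regressors on synthetic data at different noise levels and dataset sizes; this is an illustration, not the post's code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)


def fit_gp(n_points: int, noise_std: float) -> GaussianProcessRegressor:
    X = rng.uniform(0, 10, size=(n_points, 1))
    y = np.sin(X).ravel() + rng.normal(0, noise_std, size=n_points)
    # WhiteKernel lets the GP estimate the observation noise from the data.
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=noise_std**2)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)


# Compare small/clean, small/noisy, and large/noisy training sets.
X_test = np.linspace(0, 10, 100).reshape(-1, 1)
for n, sigma in [(20, 0.05), (20, 0.5), (200, 0.5)]:
    gp = fit_gp(n, sigma)
    mean, std = gp.predict(X_test, return_std=True)
    print(n, sigma, std.mean())  # average predictive uncertainty
```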
An exploratory post on the various activities that encompass error analysis.
JAX on the Web with TensorFlow.js
The TensorFlow team demonstrates how to convert and run Python-based JAX functions and Flax machine learning models in the browser using TensorFlow.js.
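The post goes through the TensorFlow.js converter tooling; as a hedged sketch of the underlying first step, here is a toy JAX function wrapped as a TensorFlow function via jax2tf (the model is a stand-in). A SavedModel exported from such a wrapped function is what the TensorFlow.js converter would then consume.

```python
import jax.numpy as jnp
import tensorflow as tf
from jax.experimental import jax2tf

# Toy "model": a fixed linear layer held in a params dict.
params = {"w": jnp.ones((4, 2)), "b": jnp.zeros((2,))}


def predict(x):
    return jnp.dot(x, params["w"]) + params["b"]


# Wrap the JAX function so it runs as TensorFlow ops.
tf_predict = tf.function(
    jax2tf.convert(predict, with_gradient=False),
    autograph=False,
)
print(tf_predict(tf.ones((3, 4))))
```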
Contrastive Representation Learning
A mathematical article on the theory of contrastive representation learning, from contrastive training objectives to sentence embeddings.
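For a concrete anchor, here is a short PyTorch sketch of the InfoNCE-style objective with in-batch negatives that much of this literature builds on; it is an illustration rather than any specific paper's loss.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(queries: torch.Tensor, keys: torch.Tensor, temperature: float = 0.1):
    """InfoNCE with in-batch negatives: the positive key for each query sits at
    the same row index; every other key in the batch serves as a negative."""
    q = F.normalize(queries, dim=1)
    k = F.normalize(keys, dim=1)
    logits = q @ k.t() / temperature                    # (N, N) cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)


# Toy usage: embeddings of two augmented "views" of the same 8 items.
loss = info_nce_loss(torch.randn(8, 16), torch.randn(8, 16))
```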
Libraries & Code
fugue-project/fugue
A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask without any rewrites.
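A minimal sketch of Fugue's transform pattern: the same typed pandas function runs locally here and could be handed a Spark or Dask engine instead. The column names are illustrative.

```python
import pandas as pd
from fugue import transform


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    df["total"] = df["price"] * df["qty"]
    return df


pdf = pd.DataFrame({"price": [1.5, 2.0], "qty": [3, 4]})

# Runs locally on Pandas; passing an engine (e.g. a SparkSession) would run the
# same function distributed, without rewriting it.
result = transform(pdf, add_total, schema="*, total:double")
print(result)
```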
great-expectations/great_expectations
Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
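Below is a small sketch using the long-standing pandas-backed API; newer Great Expectations releases reorganize this around data contexts and validators, so treat the exact calls as version-dependent.

```python
import great_expectations as ge
import pandas as pd

# Wrap a pandas DataFrame so expectation methods are available on it.
df = ge.from_pandas(pd.DataFrame({"user_id": [1, 2, 3], "age": [25, 41, 37]}))

# Declarative expectations double as tests and documentation.
df.expect_column_values_to_not_be_null("user_id")
result = df.expect_column_values_to_be_between("age", min_value=0, max_value=120)
print(result.success)
```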
uber/causalml
A Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research.
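Here is a hedged sketch of estimating an average treatment effect on synthetic data with one of the package's meta-learners; the data generation and the choice of LRSRegressor are illustrative.

```python
import numpy as np
from causalml.inference.meta import LRSRegressor

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))
treatment = rng.binomial(1, 0.5, size=n)
# Synthetic outcome with a true treatment effect of about 1.0.
y = X[:, 0] + 1.0 * treatment + rng.normal(size=n)

learner = LRSRegressor()
ate, lower, upper = learner.estimate_ate(X=X, treatment=treatment, y=y)
print(ate, lower, upper)
```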
Papers & Publications
In conversation with Artificial Intelligence: aligning language models with human values
Abstract:
Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Abstract:
Adaptive gradient algorithms borrow the moving average idea of heavy ball acceleration to estimate accurate first- and second-order moments of the gradient for accelerating convergence. However, Nesterov acceleration, which converges faster than heavy ball acceleration in theory and in many empirical cases, is much less investigated under the adaptive gradient setting. In this work, we propose the ADAptive Nesterov momentum algorithm, Adan for short, to speed up the training of deep neural networks effectively. Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra computation and memory overhead of computing the gradient at the extrapolation point. Then Adan adopts NME to estimate the first- and second-order moments of the gradient in adaptive gradient algorithms for convergence acceleration. Besides, we prove that Adan finds an ϵ-approximate first-order stationary point within O(ϵ^−3.5) stochastic gradient complexity on nonconvex stochastic problems (e.g., deep learning problems), matching the best-known lower bound. Extensive experimental results show that Adan surpasses the corresponding SoTA optimizers on both vision transformers (ViTs) and CNNs, and sets new SoTAs for many popular networks, e.g., ResNet, ConvNext, ViT, Swin, MAE, LSTM, Transformer-XL, and BERT. More surprisingly, Adan can use half the training cost (epochs) of SoTA optimizers to achieve higher or comparable performance on ViT, ResNet, etc., and also shows great tolerance to a large range of minibatch sizes, e.g., from 1k to 32k. We hope Adan can contribute to the development of deep learning by reducing training cost and relieving the engineering burden of trying different optimizers on various architectures.
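As a rough, schematic reading of the update described above, here is a NumPy sketch of an Adan-style step for a single parameter tensor: the difference between consecutive gradients stands in for the gradient at the Nesterov extrapolation point. The coefficients, weight-decay handling, and other details are simplified relative to the paper's actual algorithm.

```python
import numpy as np


def adan_step(theta, grad, prev_grad, state, lr=1e-3,
              beta1=0.02, beta2=0.08, beta3=0.01, eps=1e-8, weight_decay=0.0):
    """One schematic Adan-style step (simplified, illustrative only)."""
    diff = grad - prev_grad  # gradient difference replaces the gradient at
                             # the extrapolation point (the NME idea)
    state["m"] = (1 - beta1) * state["m"] + beta1 * grad
    state["v"] = (1 - beta2) * state["v"] + beta2 * diff
    state["n"] = (1 - beta3) * state["n"] + beta3 * (grad + (1 - beta2) * diff) ** 2
    update = (state["m"] + (1 - beta2) * state["v"]) / (np.sqrt(state["n"]) + eps)
    return (theta - lr * update) / (1 + lr * weight_decay)


# Toy usage: initialize the parameter and the moment buffers.
theta = np.zeros(4)
state = {k: np.zeros_like(theta) for k in ("m", "v", "n")}
```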
Improving Voice Trigger Detection with Metric Learning
Abstract:
Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task. However, such a speaker-independent voice trigger detector typically suffers from performance degradation on speech from underrepresented groups, such as accented speakers. In this work, we propose a novel voice trigger detector that can use a small number of utterances from a target speaker to improve detection accuracy. Our proposed model employs an encoder-decoder architecture. While the encoder performs speaker-independent voice trigger detection, similar to the conventional detector, the decoder predicts a personalized embedding for each utterance. A personalized voice trigger score is then obtained as a similarity score between the embeddings of enrollment utterances and a test utterance. The personalized embedding allows adapting to the target speaker's speech when computing the voice trigger score, hence improving voice trigger detection accuracy. Experimental results show that the proposed approach achieves a 38% relative reduction in the false rejection rate (FRR) compared to a baseline speaker-independent voice trigger model.
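A schematic sketch of the scoring idea described in the abstract: average the enrollment embeddings into a speaker profile and score a test utterance by cosine similarity. This is illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


def personalized_trigger_score(enrollment_embs: torch.Tensor,
                               test_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between the averaged enrollment embeddings and a test
    utterance embedding, used as a personalized voice-trigger score."""
    profile = F.normalize(enrollment_embs.mean(dim=0), dim=0)
    test = F.normalize(test_emb, dim=0)
    return profile @ test


# Toy usage: 5 enrollment utterances and one test utterance, 128-dim embeddings.
score = personalized_trigger_score(torch.randn(5, 128), torch.randn(128))
```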