Deep Learning Weekly: Issue #268
Private Optimal Energy Training for training BERT on small devices, regulating distribution shifts using density models and Lyapunov functions, incredibly fast BLOOM inference, and more.
This week in deep learning, we bring you Private Optimal Energy Training that allows the training of BERT on small devices, regulating distribution shifts by combining density models and Lyapunov functions, incredibly fast BLOOM inference, and a paper on neural radiance fields trained solely from data with only single views of each object.
You may also enjoy Metaflow Sandbox, learning search query representations at Pinterest, machine learning A/B tests, a paper on improving self-supervised speech representations by disentangling speakers, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
A new AI training method called Private Optimal Energy Training (POET) expands the training capabilities of smaller devices, potentially helping to preserve privacy.
The Bulwark, a US political news and analysis site, has quietly started using AI to help illustrate its articles.
Together, NVIDIA and Google are delighted to announce new milestones and plans to optimize TensorFlow and JAX for the Ampere and recently announced Hopper GPU architectures by leveraging the power of XLA.
Metaflow now offers a browser-based test environment for learning and evaluating workflows.
An article that covers Make and its functionality in detail.
A comprehensive article that explains some of the concepts behind feature stores, and the issues they solve, as if the feature store were an in-house platform.
An article that highlights the practical considerations, design, and extensions of machine learning A/B tests.
This article shows how to get incredibly fast per-token throughput when generating with the 176B-parameter BLOOM model.
In this article, you will learn how to create and finetune a Cohere sentiment analysis classification model, and generate predictions by making API calls to it using FastAPI.
An article that provides an overview of the TabTransformer paper, a deep dive into the model, and a technical guide.
A blog post that focuses on Pinterest’s SearchSage, the corresponding search query representation, and details on how it was built and launched for increasing relevance of search recommendations across organic Pins, Product Pins, and ads.
BAIR presents an approach that regulates the distribution shifts of learning-based controllers by combining features of density models and Lyapunov functions.
An article that goes through what positional encoding is, why it is important, how it is used in transformers, and how to code it using NumPy.
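For readers who want a feel for the positional encoding article before clicking through, here is a minimal sketch in NumPy. It is not taken from the linked article; it assumes the standard sinusoidal scheme from the original Transformer paper, where even columns hold sines and odd columns hold cosines of position-dependent angles. The function name `sinusoidal_positional_encoding` and the parameter `n` (the base, conventionally 10000) are our own illustrative choices.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len, d_model, n=10000.0):
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings.

    Illustrative sketch: assumes the standard sine/cosine scheme and an
    even d_model. The name and signature are hypothetical, not from the article.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")

    # Column vector of positions 0..seq_len-1, shape (seq_len, 1).
    positions = np.arange(seq_len)[:, np.newaxis]
    # Row vector of frequency indices 0..d_model/2-1, shape (1, d_model/2).
    i = np.arange(d_model // 2)[np.newaxis, :]

    # Broadcasting gives one angle per (position, frequency) pair.
    angles = positions / np.power(n, 2 * i / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even columns
    pe[:, 1::2] = np.cos(angles)  # odd columns
    return pe


pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
print(pe.shape)
```

In a transformer, a matrix like `pe` is typically added element-wise to the token embeddings so that each position carries a distinct, deterministic signal.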
Libraries & Code
The open big data serving engine - Store, search, organize and make machine-learned inferences over big data at serving time.
A Python deep learning library for LAnguage-and-VISion intelligence research and applications.
Papers & Publications
We present a method for learning a generative 3D model based on neural radiance fields, trained solely from data with only single views of each object. While generating realistic images is no longer a difficult task, producing the corresponding 3D structure such that they can be rendered from different views is non-trivial. We show that, unlike existing methods, one does not need multi-view data to achieve this goal. Specifically, we show that by reconstructing many images aligned to an approximate canonical pose with a single network conditioned on a shared latent space, you can learn a space of radiance fields that models shape and appearance for a class of objects. We demonstrate this by training models to reconstruct object categories using datasets that contain only one view of each subject without depth or geometry information. Our experiments show that we achieve state-of-the-art results in novel view synthesis and high-quality results for monocular depth prediction.
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted variations, such as speaker variations, from the content. However, disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well, and the damage of the latter usually far outweighs the benefit of the former. In this paper, we propose a new SSL method that can achieve speaker disentanglement without severe loss of content. Our approach is adapted from the HuBERT framework, and incorporates disentangling mechanisms to regularize both the teacher labels and the learned representations. We evaluate the benefit of speaker disentanglement on a set of content-related downstream tasks, and observe a consistent and notable performance advantage of our speaker-disentangled representations.
Class-incremental learning for semantic segmentation (CiSS) is presently a highly researched field which aims at updating a semantic segmentation model by sequentially learning new semantic classes. A major challenge in CiSS is overcoming the effects of catastrophic forgetting, which describes the sudden drop of accuracy on previously learned classes after the model is trained on a new set of classes. Despite latest advances in mitigating catastrophic forgetting, the underlying causes of forgetting specifically in CiSS are not well understood. Therefore, in a set of experiments and representational analyses, we demonstrate that the semantic shift of the background class and a bias towards new classes are the major causes of forgetting in CiSS. Furthermore, we show that both causes mostly manifest themselves in deeper classification layers of the network, while the early layers of the model are not affected. Finally, we demonstrate how both causes are effectively mitigated utilizing the information contained in the background, with the help of knowledge distillation and an unbiased cross-entropy loss.