Deep Learning Weekly: Issue #281
Dendrocentric Learning from Stanford, building a GitOps ML Model Registry with DVC and GTO, federated learning for tabular data & a paper on few-shot learning with retrieval augmented language models.
Hey Folks,
This week in deep learning, we bring you dendrocentric learning from Stanford, building a GitOps ML Model Registry with DVC and GTO, federated learning for tabular data using the Flower framework, and a paper on Atlas: few-shot learning with retrieval augmented language models.
You may also enjoy OpenAI's Point-E which can generate 3D objects from text prompts, MLflow SDK Implementation, understanding KS tests for data drift on profiled data, a paper on confident adaptive language modeling, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
OpenAI releases Point-E, an AI that generates 3D models
OpenAI just open-sourced Point-E, a machine learning model that creates a 3D object given a text prompt.
Dendrocentric AI Could Run on Watts, Not Megawatts
A neuromorphic engineer at Stanford proposes and begins to explore dendrocentric learning, a biologically inspired approach in which AI systems send fewer signals while conveying more information with each one.
OpenAI's New and Improved Embedding Model
OpenAI announced a new embedding model, text-embedding-ada-002, that is significantly more capable and cheaper to use. It replaces five separate models for text search, text similarity, and code search, and outperforms the previous most capable embedding model, Davinci, at most tasks.
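For readers who want to try it, here is a minimal sketch of calling the embeddings endpoint with the openai Python package; the model name comes from the announcement, while the input text and key handling are placeholders:

```python
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Request an embedding for a single piece of text with the new model.
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="Deep Learning Weekly: Issue #281",
)

# The API returns one embedding per input; ada-002 vectors have 1536 dimensions.
embedding = response["data"][0]["embedding"]
print(len(embedding))  # 1536
```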
New Linux Foundation dataset aids in food traceability, carbon tracking and crop production
The Linux Foundation announced that its AgStack project will host a new open-source code base and computation engine that offers a dataset for agricultural fields to aid in food traceability, carbon tracking, crop production, and other field-level analytics.
MLOps
Scaling ML model development with MLflow
A technical blog post that deconstructs a possible MLflow SDK implementation for data science teams that need one.
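As context, a wrapper like the one the post describes would typically standardize the underlying MLflow tracking calls; a minimal sketch, with a hypothetical experiment name and illustrative values:

```python
import mlflow

# An in-house SDK would wrap calls like these behind team conventions
# for experiment naming, tagging, and metric logging.
mlflow.set_experiment("churn-model")  # hypothetical experiment name

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("val_auc", 0.91)  # illustrative value
    mlflow.set_tag("team", "growth")
```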
Troubleshooting Productionalized Notebooks using Dagster and Noteable
A practical tutorial that demonstrates an approach for dramatically shortening testing cycles and reducing the number of reruns required, boosting developer and practitioner productivity, and reducing team frustration.
Faster Training and Inference: Habana Gaudi2 vs Nvidia A100 80GB
An article that covers the results and analysis of several benchmarks that were performed to assess the abilities of Gaudi, Gaudi2, and A100 80GB for both training and inference.
Building a GitOps ML Model Registry with DVC and GTO
A GitOps article that covers how to register semantic model versions, assign stages to them, and employ CI/CD to act on those with DVC and GTO.
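For flavor, here is a sketch of what the Git-native registry operations look like, driven from Python via the gto CLI. The artifact name and version are hypothetical, and exact flag spellings vary between gto releases, so treat this as the shape of the workflow rather than copy-paste commands:

```python
import subprocess

def run(cmd: str) -> None:
    """Run a shell command, raising if it fails."""
    subprocess.run(cmd, shell=True, check=True)

# Register a new semantic version of a (hypothetical) artifact, then
# assign it a stage; GTO records both as Git tags, which is what makes
# the registry GitOps-friendly. Check `gto --help` for exact flags.
run("gto register churn-model --version v1.2.0")
run("gto assign churn-model --stage prod")
run("git push origin --tags")  # publish the registry state
```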
Learning
Federated Learning for Tabular Data Using Flower Framework
An article that covers how to write a federated learning application for tabular data using the Flower federated learning framework.
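To give a sense of Flower's client API, here is a generic sketch (not the article's code) with a stand-in scikit-learn model and random tabular data:

```python
import flwr as fl
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in tabular data; the article loads a real dataset here.
X, y = np.random.rand(100, 8), np.random.randint(0, 2, 100)

# warm_start lets fit() continue from parameters assigned by the server.
model = LogisticRegression(warm_start=True, max_iter=50)
model.fit(X, y)  # initial fit so coef_/intercept_ exist

class TabularClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        return [model.coef_, model.intercept_]

    def fit(self, parameters, config):
        model.coef_, model.intercept_ = parameters
        model.fit(X, y)  # local training round
        return [model.coef_, model.intercept_], len(X), {}

    def evaluate(self, parameters, config):
        model.coef_, model.intercept_ = parameters
        acc = model.score(X, y)
        return 1.0 - acc, len(X), {"accuracy": acc}

# Point the client at a running Flower server (address is illustrative).
fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=TabularClient())
```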
Make Your Art Move with Stable Diffusion Animations
Introducing Giffusion, a simple web UI to create animated GIFs and videos with Stable Diffusion.
10 Metrics to Evaluate Supervised Machine Learning Models
A compiled review of the common metrics used to evaluate supervised machine learning models.
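Most of the usual suspects are one-liners in scikit-learn; a quick sketch with dummy labels and probabilities:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Dummy ground truth, hard predictions, and predicted probabilities.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("roc auc  :", roc_auc_score(y_true, y_prob))  # needs scores, not labels
```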
Understanding Kolmogorov-Smirnov (KS) Tests for Data Drift on Profiled Data
A technical article (with code) that covers the KS test, how it is used for data drift, and the limitations of data profiling for KS drift detection.
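For orientation, the two-sample KS statistic compares the empirical CDFs of a reference window and a current window; here is a minimal sketch with synthetic data (the drift threshold is illustrative, not a recommendation):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Reference (training-time) feature values vs. a drifted production window.
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
current = rng.normal(loc=0.3, scale=1.0, size=5_000)  # mean shift = drift

statistic, p_value = ks_2samp(reference, current)

# A common (illustrative) rule: flag drift when p falls below a threshold.
if p_value < 0.05:
    print(f"drift detected: KS={statistic:.3f}, p={p_value:.2e}")
```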
Building an End-to-End Retail Analytics Application with NVIDIA DeepStream and NVIDIA TAO Toolkit
This post provides a tutorial on how to build a sample application that can perform real-time intelligent video analytics (IVA) in the retail domain using the NVIDIA DeepStream SDK and the NVIDIA TAO Toolkit.
Classification in ML: Lessons Learned From Building and Deploying a Large-Scale Model
An article that covers examples of large-scale classification models, how to mitigate data sparsity, how to implement deep metric learning, and more.
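Since deep metric learning comes up in the piece, here is a minimal triplet-loss sketch in PyTorch (toy encoder and random tensors, not the article's model):

```python
import torch
import torch.nn as nn

# Toy encoder mapping raw features to an embedding space.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
loss_fn = nn.TripletMarginLoss(margin=1.0)

# Anchor and positive share a class; negative comes from a different class.
anchor = encoder(torch.randn(8, 32))
positive = encoder(torch.randn(8, 32))
negative = encoder(torch.randn(8, 32))

# Pulls anchor toward positive and pushes it away from negative by the margin.
loss = loss_fn(anchor, positive, negative)
loss.backward()
print(loss.item())
```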
Libraries & Code
auditok
An audio/acoustic activity detection and audio segmentation tool.
RecList
An open source library providing behavioral, "black-box" testing for recommender systems.
Papers & Publications
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Abstract:
Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples. We perform evaluations on a wide range of tasks, including MMLU, KILT and NaturalQuestions, and study the impact of the content of the document index, showing that it can easily be updated. Notably, Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B-parameter model by 3% despite having 50x fewer parameters.
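The retrieve-then-read pattern underlying Atlas can be illustrated with a toy dense retriever feeding a prompt; this is a schematic sketch with random stand-in embeddings, not the Atlas code (the real system jointly trains a Contriever-based retriever with a T5 reader):

```python
import numpy as np

# Toy document index with stand-in dense embeddings.
docs = ["Paris is the capital of France.", "Atlas uses a dense retriever."]
doc_embs = np.random.rand(len(docs), 128)  # stand-in for a real doc encoder

def embed(text: str) -> np.ndarray:
    return np.random.rand(128)  # stand-in query encoder

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = doc_embs @ embed(query)  # dot-product relevance
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The reader conditions on retrieved passages plus the question, so
# knowledge can live in the (easily updated) index, not in parameters.
question = "What is the capital of France?"
prompt = "\n".join(retrieve(question)) + "\n" + question
print(prompt)  # fed to the seq2seq reader in the real system
```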
Confident Adaptive Language Modeling
Abstract:
Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models' size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we introduce Confident Adaptive Language Modeling (CALM), a framework for dynamically allocating different amounts of compute per input and generation timestep. Early exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending back to missing hidden representations due to early exits in previous tokens. Through theoretical analysis and empirical experiments on three diverse text generation tasks, we demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to ×3 -- while provably maintaining high performance.
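The early-exit idea at the heart of CALM can be sketched in a few lines: score the current token's confidence after each layer and stop computing once it clears a threshold. A toy illustration with random weights, not the paper's calibrated procedure:

```python
import torch

torch.manual_seed(0)

n_layers, d_model, vocab = 12, 64, 100
layers = [torch.nn.Linear(d_model, d_model) for _ in range(n_layers)]
lm_head = torch.nn.Linear(d_model, vocab)
threshold = 0.9  # illustrative; CALM calibrates this against a quality constraint

h = torch.randn(1, d_model)  # hidden state for the current timestep
for i, layer in enumerate(layers):
    h = torch.relu(layer(h))
    probs = torch.softmax(lm_head(h), dim=-1)
    confidence = probs.max().item()  # one of several measures studied in CALM
    if confidence >= threshold:
        print(f"early exit at layer {i + 1}/{n_layers}")
        break
```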
Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-lingual Text-to-Text Transfer
Abstract:
Multi-lingual language models (LM), such as mBERT, XLM-R, mT5, mBART, have been remarkably successful in enabling natural language tasks in low-resource languages through cross-lingual transfer from high-resource ones. In this work, we try to better understand how such models, specifically mT5, transfer any linguistic and semantic knowledge across languages, even though no explicit cross-lingual signals are provided during pre-training. Rather, only unannotated texts from each language are presented to the model separately and independently of one another, and the model appears to implicitly learn cross-lingual connections. This raises several questions that motivate our study, such as: Are the cross-lingual connections between every language pair equally strong? What properties of source and target language impact the strength of cross-lingual transfer? Can we quantify the impact of those properties on the cross-lingual transfer?
In our investigation, we analyze a pre-trained mT5 to discover the attributes of cross-lingual connections learned by the model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret cross-lingual understanding of the mT5 model. Based on this, one can favorably choose the best source language for a task, and can anticipate its training data demands. A key finding of this work is that similarity of syntax, morphology and phonology are good predictors of cross-lingual transfer, significantly more than just the lexical similarity of languages. For a given language, we are able to predict zero-shot performance, that increases on a logarithmic scale with the number of few-shot target language data points.
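The paper's closing observation, that performance grows logarithmically with the number of target-language examples, amounts to fitting something like score ≈ a + b·log(n); a toy curve fit with made-up numbers:

```python
import numpy as np

# Made-up (n_examples, score) pairs to illustrate the logarithmic fit;
# the paper fits real transfer scores, not these numbers.
n = np.array([8, 16, 32, 64, 128, 256])
score = np.array([41.0, 44.2, 47.1, 50.3, 52.9, 56.0])

# Least-squares fit of score = a + b * log(n).
b, a = np.polyfit(np.log(n), score, deg=1)
print(f"score ~= {a:.1f} + {b:.1f} * ln(n)")
print("predicted at n=512:", a + b * np.log(512))
```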