Deep Learning Weekly: Issue #296
Stanford Institute for HAI's Index Report 2023, Meta's approach to measuring model maturity and tracking outcomes, a hands-on guide to train LLaMA with RLHF, a paper on Segment Anything, and many more
This week in deep learning, we bring you Stanford Institute for Human-Centered AI's Index Report 2023, Meta's approach to measuring model maturity and tracking outcomes, a hands-on guide to train LLaMA with RLHF, and a paper on Segment Anything.
You may also enjoy Vicuna, Baby AGI, Spiking Neural Networks, a paper on Continuous Pseudo-Labeling from the Start, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
From Deep Learning Foundations to Stable Diffusion
Fast.ai releases a new course, From Deep Learning Foundations to Stable Diffusion, which is part 2 of Practical Deep Learning for Coders.
AI Index Report 2023 - Artificial Intelligence Index
Stanford Institute for Human-Centered Artificial Intelligence (HAI) released their annual report for 2023.
Speeding up drug discovery with diffusion generative models
MIT researchers built DiffDock, a model that may one day be able to find new drugs faster than traditional methods and reduce the potential for adverse side effects.
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
A team with members from UC Berkeley, CMU, Stanford, and UC San Diego introduces Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
NVIDIA Takes Inference to New Heights Across MLPerf Tests
NVIDIA H100 and L4 GPUs took generative AI and all other workloads to new levels in the latest MLPerf benchmarks, while Jetson AGX Orin made performance and efficiency gains.
Announcing OpenAI’s Bug Bounty Program
OpenAI announces a Bug Bounty Program that invites security researchers to report vulnerabilities, as part of its commitment to developing safe and advanced AI.
MLOps
How Meta measures the management of its AI ecosystem
Meta walks through approaches they’ve developed for measuring the maturity of AI models and tracking management outcomes.
Constructing and Visualizing Kangas DataGrid on Kangas UI
Exploring a dataset becomes more cumbersome as it grows, and Pandas can make tasks like grouping, filtering, and sorting painful at scale. This tutorial shows how to construct a Kangas DataGrid and explore it in the Kangas UI for large, complex queries, as sketched below.
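For a sense of the workflow, here is a minimal sketch of building and viewing a DataGrid with the `kangas` package; the column names and values are invented for illustration.

```python
import kangas as kg

# Build a DataGrid with invented columns and rows (illustrative only).
dg = kg.DataGrid(name="predictions", columns=["image_id", "label", "score"])
for i in range(1000):
    dg.append([i, "cat" if i % 2 == 0 else "dog", i / 1000.0])

dg.save()
dg.show()  # launches the Kangas UI for grouping, filtering, and sorting
```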
A comprehensive guide that explores the key concepts, challenges, and best practices for ML model packaging.
Deploy large language models on AWS Inferentia2 using large model inference containers
An article that explains how to deploy large language models on AWS Inferentia2 using large model inference containers.
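As a rough sketch of what such a deployment can look like with the SageMaker Python SDK; the container image URI and environment key below are placeholders and should be taken from the article, not from here:

```python
import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()  # requires a SageMaker execution context

model = Model(
    image_uri="<large-model-inference-container-uri>",  # placeholder LMI image
    env={"OPTION_MODEL_ID": "<model-id>"},  # assumed config key; see the article
    role=role,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.48xlarge",  # an Inferentia2 instance type
)
```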
Learning
Prompt Engineering: An introduction to “the career of the future”
An article that explains prompt engineering as a technique used to provide specific and relevant input instructions to large language models.
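A toy example of the idea, using the OpenAI chat API as the target model; the prompt structure here (role, task, constraints) is one common pattern, not a prescription from the article:

```python
import openai

# A structured prompt: explicit role, task, output format, and constraints.
prompt = (
    "You are a careful technical writer.\n"
    "Task: summarize the text below in exactly three bullet points.\n"
    "Constraints: plain language, under 20 words per bullet.\n\n"
    "Text: {document}"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt.format(document="...")}],
    temperature=0,  # deterministic output makes prompt changes easier to compare
)
print(response["choices"][0]["message"]["content"])
```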
The Complete Guide to Spiking Neural Networks
An article on everything you need to know about Spiking Neural Networks from architecture, temporal behavior, and encoding to neuromorphic hardware.
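To make the temporal behavior concrete, here is a minimal leaky integrate-and-fire (LIF) neuron in NumPy; the constants are arbitrary, chosen only so that the neuron fires:

```python
import numpy as np

dt, tau = 1.0, 20.0                      # time step and membrane time constant (ms)
v_rest, v_thresh, v_reset = 0.0, 1.0, 0.0
v, spikes = v_rest, []

rng = np.random.default_rng(0)
input_current = rng.uniform(0.5, 2.0, size=200)  # synthetic input drive

for t, i_t in enumerate(input_current):
    v += dt / tau * (-(v - v_rest) + i_t)  # leak toward rest, integrate input
    if v >= v_thresh:                      # emit a spike at threshold...
        spikes.append(t)
        v = v_reset                        # ...and reset the membrane potential

print(f"{len(spikes)} spikes, first few at steps {spikes[:5]}")
```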
Diffusion Models — DDPMs, DDIMs, and Classifier Free Guidance
A comprehensive article about the evolution of diffusion models, from DDPMs and DDIMs to Classifier-Free Guidance.
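The classifier-free guidance step covered in the article reduces to a two-line blend of noise predictions; `model`, `x_t`, and `cond` below are placeholders, with a stub model so the snippet runs:

```python
import torch

def guided_noise(model, x_t, t, cond, w=7.5):
    """Classifier-free guidance: blend conditional and unconditional predictions."""
    eps_cond = model(x_t, t, cond)    # prediction with the conditioning signal
    eps_uncond = model(x_t, t, None)  # prediction with a null condition
    # w = 0 is unconditional, w = 1 is conditional; w > 1 amplifies the condition.
    return eps_uncond + w * (eps_cond - eps_uncond)

# Stub standing in for a trained noise predictor, just to exercise the function.
stub = lambda x, t, c: x * (0.9 if c is not None else 1.1)
print(guided_noise(stub, torch.randn(1, 4), t=10, cond="a cat").shape)
```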
Experimenting with LLMs to Research, Reflect, and Plan
Eugene Yan shares his LLM experiments on building assistants and his observations on retrieval issues.
StackLLaMA: A hands-on guide to train LLaMA with RLHF
A blog post that shows all the steps involved in training a LLaMA model to answer questions on Stack Exchange with RLHF.
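The post builds on Hugging Face's TRL library; a heavily condensed sketch of one PPO step looks roughly like this, with a placeholder checkpoint path and a dummy reward standing in for the post's Stack Exchange reward model:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "<llama-checkpoint>"  # placeholder path
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
ppo_trainer = PPOTrainer(PPOConfig(batch_size=1), model, tokenizer=tokenizer)

query = tokenizer("How do I reverse a list in Python?", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=48)[0]
reward = torch.tensor(1.0)  # dummy scalar; the post scores answers with a reward model
ppo_trainer.step([query], [response], [reward])  # one PPO optimization step
```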
Libraries & Code
yoheinakajima/babyagi
A system that uses OpenAI and Pinecone APIs to create, prioritize, and execute tasks.
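A toy version of the control loop, with a hypothetical `llm` function in place of the OpenAI API and the Pinecone memory step omitted:

```python
from collections import deque

def llm(prompt: str) -> str:
    """Hypothetical completion call; swap in a real LLM API here."""
    return "Research one subtopic\nDraft a short summary"

objective = "Write a report on spiking neural networks"
tasks = deque(["Make an initial research plan"])

for _ in range(3):  # bounded loop for illustration; BabyAGI runs indefinitely
    task = tasks.popleft()
    result = llm(f"Objective: {objective}\nTask: {task}\nComplete the task.")
    new_tasks = llm(f"Given this result:\n{result}\nList follow-up tasks, one per line.")
    tasks.extend(line.strip() for line in new_tasks.splitlines() if line.strip())
    # BabyAGI additionally asks the model to re-prioritize the queue here.
```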
CalculatedContent/WeightWatcher
An open-source diagnostic tool for analyzing deep neural networks (DNNs) without needing access to training or even test data.
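Usage is a few lines; this sketch runs the analysis on a stock torchvision model, which is just an example input:

```python
import torchvision.models as models
import weightwatcher as ww

model = models.resnet18(pretrained=True)   # any trained network works
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()                # per-layer power-law metrics
summary = watcher.get_summary(details)     # aggregate quality indicators
print(summary)
```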
OptimalScale/LMFlow
An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, fast, reliable, and accessible to the entire community.
Papers & Publications
Segment Anything | Meta AI Research
Abstract:
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
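A sketch of the promptable interface from the released repository, prompting with a single foreground point; the checkpoint path and image file are placeholders:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="<path/to/sam_vit_h.pth>")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # one prompt point (x, y)
    point_labels=np.array([1]),           # 1 marks a foreground point
    multimask_output=True,                # return several candidate masks
)
print(masks.shape, scores)
```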
Continuous Pseudo-Labeling from the Start
Abstract:
Self-training (ST), or pseudo-labeling, has sparked significant interest in the automatic speech recognition (ASR) community recently because of its success in harnessing unlabeled data. Unlike prior semi-supervised learning approaches that relied on iteratively regenerating pseudo-labels (PLs) from a trained model and using them to train a new model, recent state-of-the-art methods perform ‘continuous training’ where PLs are generated using a very recent version of the model being trained. Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone. We believe this has the potential for over-fitting to the labeled dataset in low resource settings and that ST from the start of training should reduce over-fitting. In this paper we show how we can do this by dynamically controlling the evolution of PLs during the training process in ASR. To the best of our knowledge, this is the first study that shows the feasibility of generating PLs from the very start of the training. We are able to achieve this using two techniques that avoid instabilities which lead to degenerate models that do not generalize. Firstly, we control the evolution of PLs through a curriculum that uses the online changes in PLs to control the membership of the cache of PLs and improve generalization. Secondly, we find that by sampling transcriptions from the predictive distribution, rather than only using the best transcription, we can stabilize training further. With these techniques, our ST models match prior works without an external language model.
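An illustrative fragment of the second technique, sampling a pseudo-label from the predictive distribution rather than taking only the best transcription; this is a frame-level toy with random logits, not the paper's actual decoding:

```python
import torch

logits = torch.randn(50, 32)          # stand-in model output: (frames, vocab)
probs = torch.softmax(logits, dim=-1)

greedy_pl = probs.argmax(dim=-1)                                  # best-path PL
sampled_pl = torch.multinomial(probs, num_samples=1).squeeze(-1)  # sampled PL
print((greedy_pl != sampled_pl).float().mean())  # fraction of frames that differ
```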
Scaling Vision Transformers to 22 Billion Parameters
Abstract:
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between fairness and performance, state-of-the-art alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.
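The "lightweight linear model on frozen features" evaluation mentioned in the abstract looks roughly like the following, with a small torchvision backbone and random tensors standing in for ViT-22B and real data:

```python
import torch
import torchvision.models as models

backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()        # expose the 512-d features
for p in backbone.parameters():
    p.requires_grad = False              # the backbone stays frozen

probe = torch.nn.Linear(512, 10)         # the only trainable component
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
with torch.no_grad():
    feats = backbone(x)                  # frozen feature extraction
loss = torch.nn.functional.cross_entropy(probe(feats), y)
loss.backward()
opt.step()
```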