Deep Learning Weekly: Issue #296
Stanford Institute for HAI's Index Report 2023, Meta's approach to measuring model maturity and tracking outcomes, a hands-on guide to train LLaMA with RLHF, a paper on Segment Anything, and many more
This week in deep learning, we bring you Stanford Institute for Human-Centered AI's Index Report 2023, Meta's approach to measuring model maturity and tracking outcomes, a hands-on guide to train LLaMA with RLHF, and a paper on Segment Anything.
You may also enjoy Vicuna, Baby AGI, Spiking Neural Networks, a paper on Continuous Pseudo-Labeling from the Start, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
From Deep Learning Foundations to Stable Diffusion
Fast.ai releases a new course, From Deep Learning Foundations to Stable Diffusion, which is part 2 of Practical Deep Learning for Coders.
AI Index Report 2023 - Artificial Intelligence Index
Stanford Institute for Human-Centered Artificial Intelligence (HAI) released their annual report for 2023.
Speeding up drug discovery with diffusion generative models
MIT researchers built DiffDock, a model that may one day be able to find new drugs faster than traditional methods and reduce the potential for adverse side effects.
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
A team with members from UC Berkeley, CMU, Stanford, and UC San Diego introduces Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.
NVIDIA Takes Inference to New Heights Across MLPerf Tests
NVIDIA H100 and L4 GPUs took generative AI and all other workloads to new levels in the latest MLPerf benchmarks, while Jetson AGX Orin made performance and efficiency gains.
Announcing OpenAI’s Bug Bounty Program
OpenAI announces a Bug Bounty Program that invites security researchers to report vulnerabilities, as part of its commitment to developing safe and advanced AI.
MLOps
How Meta measures the management of its AI ecosystem
Meta walks through approaches they’ve developed for measuring the maturity of AI models and tracking management outcomes.
Constructing and Visualizing Kangas DataGrid on Kangas UI
Exploring a dataset becomes more cumbersome as it grows, and Pandas can make tasks like grouping, filtering, and sorting painful at scale. This tutorial shows how to construct a Kangas DataGrid and explore it in the Kangas UI for large, complex queries, as sketched below.
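For a sense of the workflow, here is a minimal sketch of building and viewing a DataGrid with the `kangas` package; the column names and values are invented for illustration.

```python
import kangas as kg

# Build a DataGrid with invented columns and rows (illustrative only).
dg = kg.DataGrid(name="predictions", columns=["image_id", "label", "score"])
for i in range(1000):
    dg.append([i, "cat" if i % 2 == 0 else "dog", i / 1000.0])

dg.save()
dg.show()  # launches the Kangas UI for grouping, filtering, and sorting
```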
A comprehensive guide that explores the key concepts, challenges, and best practices for ML model packaging.
Deploy large language models on AWS Inferentia2 using large model inference containers
An article that explains how to deploy large language models on AWS Inferentia2 using large model inference containers.
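As a rough sketch of what such a deployment can look like with the SageMaker Python SDK; the container image URI and environment key below are placeholders and should be taken from the article, not from here:

```python
import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()  # requires a SageMaker execution context

model = Model(
    image_uri="<large-model-inference-container-uri>",  # placeholder LMI image
    env={"OPTION_MODEL_ID": "<model-id>"},  # assumed config key; see the article
    role=role,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.48xlarge",  # an Inferentia2 instance type
)
```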
Learning
Prompt Engineering: An introduction to “the career of the future”
An article that explains prompt engineering as a technique used to provide specific and relevant input instructions to large language models.
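A toy example of the idea, using the OpenAI chat API as the target model; the prompt structure here (role, task, constraints) is one common pattern, not a prescription from the article:

```python
import openai

# A structured prompt: explicit role, task, output format, and constraints.
prompt = (
    "You are a careful technical writer.\n"
    "Task: summarize the text below in exactly three bullet points.\n"
    "Constraints: plain language, under 20 words per bullet.\n\n"
    "Text: {document}"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt.format(document="...")}],
    temperature=0,  # deterministic output makes prompt changes easier to compare
)
print(response["choices"][0]["message"]["content"])
```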
The Complete Guide to Spiking Neural Networks
An article on everything you need to know about Spiking Neural Networks from architecture, temporal behavior, and encoding to neuromorphic hardware.
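To make the temporal behavior concrete, here is a minimal leaky integrate-and-fire (LIF) neuron in NumPy; the constants are arbitrary, chosen only so that the neuron fires:

```python
import numpy as np

dt, tau = 1.0, 20.0                      # time step and membrane time constant (ms)
v_rest, v_thresh, v_reset = 0.0, 1.0, 0.0
v, spikes = v_rest, []

rng = np.random.default_rng(0)
input_current = rng.uniform(0.5, 2.0, size=200)  # synthetic input drive

for t, i_t in enumerate(input_current):
    v += dt / tau * (-(v - v_rest) + i_t)  # leak toward rest, integrate input
    if v >= v_thresh:                      # emit a spike at threshold...
        spikes.append(t)
        v = v_reset                        # ...and reset the membrane potential

print(f"{len(spikes)} spikes, first few at steps {spikes[:5]}")
```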
Diffusion Models — DDPMs, DDIMs, and Classifier Free Guidance
A comprehensive article about the evolution of diffusion models, from DDPMs and DDIMs to Classifier-Free Guidance.
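The classifier-free guidance step covered in the article reduces to a two-line blend of noise predictions; `model`, `x_t`, and `cond` below are placeholders, with a stub model so the snippet runs:

```python
import torch

def guided_noise(model, x_t, t, cond, w=7.5):
    """Classifier-free guidance: blend conditional and unconditional predictions."""
    eps_cond = model(x_t, t, cond)    # prediction with the conditioning signal
    eps_uncond = model(x_t, t, None)  # prediction with a null condition
    # w = 0 is unconditional, w = 1 is conditional; w > 1 amplifies the condition.
    return eps_uncond + w * (eps_cond - eps_uncond)

# Stub standing in for a trained noise predictor, just to exercise the function.
stub = lambda x, t, c: x * (0.9 if c is not None else 1.1)
print(guided_noise(stub, torch.randn(1, 4), t=10, cond="a cat").shape)
```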
Experimenting with LLMs to Research, Reflect, and Plan
Eugene Yan shares his LLM experiments on building assistants and his observations on retrieval issues.
StackLLaMA: A hands-on guide to train LLaMA with RLHF
A blog post that shows all the steps involved in training a LLaMA model to answer questions on Stack Exchange with RLHF.
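The post builds on Hugging Face's TRL library; a heavily condensed sketch of one PPO step looks roughly like this, with a placeholder checkpoint path and a dummy reward standing in for the post's Stack Exchange reward model:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "<llama-checkpoint>"  # placeholder path
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
ppo_trainer = PPOTrainer(PPOConfig(batch_size=1), model, tokenizer=tokenizer)

query = tokenizer("How do I reverse a list in Python?", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=48)[0]
reward = torch.tensor(1.0)  # dummy scalar; the post scores answers with a reward model
ppo_trainer.step([query], [response], [reward])  # one PPO optimization step
```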
Libraries & Code
yoheinakajima/babyagi
A system that uses OpenAI and Pinecone APIs to create, prioritize, and execute tasks.
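A toy version of the control loop, with a hypothetical `llm` function in place of the OpenAI API and the Pinecone memory step omitted:

```python
from collections import deque

def llm(prompt: str) -> str:
    """Hypothetical completion call; swap in a real LLM API here."""
    return "Research one subtopic\nDraft a short summary"

objective = "Write a report on spiking neural networks"
tasks = deque(["Make an initial research plan"])

for _ in range(3):  # bounded loop for illustration; BabyAGI runs indefinitely
    task = tasks.popleft()
    result = llm(f"Objective: {objective}\nTask: {task}\nComplete the task.")
    new_tasks = llm(f"Given this result:\n{result}\nList follow-up tasks, one per line.")
    tasks.extend(line.strip() for line in new_tasks.splitlines() if line.strip())
    # BabyAGI additionally asks the model to re-prioritize the queue here.
```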
CalculatedContent/WeightWatcher
An open-source diagnostic tool for analyzing deep neural networks (DNNs) without needing access to training or even test data.
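Usage is a few lines; this sketch runs the analysis on a stock torchvision model, which is just an example input:

```python
import torchvision.models as models
import weightwatcher as ww

model = models.resnet18(pretrained=True)   # any trained network works
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()                # per-layer power-law metrics
summary = watcher.get_summary(details)     # aggregate quality indicators
print(summary)
```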
OptimalScale/LMFlow
An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, fast, reliable, and accessible to the entire community.
Papers & Publications
Segment Anything | Meta AI Research
Abstract:
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
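A sketch of the promptable interface from the released repository, prompting with a single foreground point; the checkpoint path and image file are placeholders:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="<path/to/sam_vit_h.pth>")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # one prompt point (x, y)
    point_labels=np.array([1]),           # 1 marks a foreground point
    multimask_output=True,                # return several candidate masks
)
print(masks.shape, scores)
```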
Continuous Pseudo-Labeling from the Start
Abstract:
Self-training (ST), or pseudo-labeling, has sparked significant interest in the automatic speech recognition (ASR) community recently because of its success in harnessing unlabeled data. Unlike prior semi-supervised learning approaches that relied on iteratively regenerating pseudo-labels (PLs) from a trained model and using them to train a new model, recent state-of-the-art methods perform ‘continuous training’ where PLs are generated using a very recent version of the model being trained. Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone. We believe this has the potential for over-fitting to the labeled dataset in low resource settings and that ST from the start of training should reduce over-fitting. In this paper we show how we can do this by dynamically controlling the evolution of PLs during the training process in ASR. To the best of our knowledge, this is the first study that shows the feasibility of generating PLs from the very start of the training. We are able to achieve this using two techniques that avoid instabilities which lead to degenerate models that do not generalize. Firstly, we control the evolution of PLs through a curriculum that uses the online changes in PLs to control the membership of the cache of PLs and improve generalization. Secondly, we find that by sampling transcriptions from the predictive distribution, rather than only using the best transcription, we can stabilize training further. With these techniques, our ST models match prior works without an external language model.
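An illustrative fragment of the second technique, sampling a pseudo-label from the predictive distribution rather than taking only the best transcription; this is a frame-level toy with random logits, not the paper's actual decoding:

```python
import torch

logits = torch.randn(50, 32)          # stand-in model output: (frames, vocab)
probs = torch.softmax(logits, dim=-1)

greedy_pl = probs.argmax(dim=-1)                                  # best-path PL
sampled_pl = torch.multinomial(probs, num_samples=1).squeeze(-1)  # sampled PL
print((greedy_pl != sampled_pl).float().mean())  # fraction of frames that differ
```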
Scaling Vision Transformers to 22 Billion Parameters
Abstract:
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between fairness and performance, state-of-the-art alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.
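The "lightweight linear model on frozen features" evaluation mentioned in the abstract looks roughly like the following, with a small torchvision backbone and random tensors standing in for ViT-22B and real data:

```python
import torch
import torchvision.models as models

backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()        # expose the 512-d features
for p in backbone.parameters():
    p.requires_grad = False              # the backbone stays frozen

probe = torch.nn.Linear(512, 10)         # the only trainable component
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
with torch.no_grad():
    feats = backbone(x)                  # frozen feature extraction
loss = torch.nn.functional.cross_entropy(probe(feats), y)
loss.backward()
opt.step()
```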