Deep Learning Weekly: Issue #324
Fuyu-8B, Conversational Memory in LangChain with Milvus, The Foundation Model Transparency Index, a paper on Sparse Fine-tuning for Inference Acceleration of LLMs, and many more!
This week in deep learning, we bring you Fuyu-8B: A Multimodal Architecture for AI Agents, Conversational Memory in LangChain with Milvus, The Foundation Model Transparency Index, and a paper on Sparse Fine-tuning for Inference Acceleration of Large Language Models.
You may also enjoy NVIDIA's AI Agent for Robotic Learning, DeepMind's Evaluation Repository for Generative AI Systems, The N Implementation Details of RLHF with PPO, a paper on Improved Baselines with Visual Instruction Tuning, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Fuyu-8B: A Multimodal Architecture for AI Agents
Adept open-sources Fuyu-8B, a small multimodal decoder-only transformer with no specialized image encoder.
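If you want to poke at the model yourself, here is a rough sketch of querying it with Hugging Face Transformers; it assumes the checkpoint is published as adept/fuyu-8b and that your installed transformers version ships the Fuyu classes, so details may differ in practice.

```python
# Rough sketch: querying Fuyu-8B about an image with Hugging Face Transformers.
# Assumes the checkpoint id "adept/fuyu-8b" and a transformers release that ships
# FuyuProcessor / FuyuForCausalLM; details may differ in practice.
import torch
from PIL import Image
from transformers import FuyuForCausalLM, FuyuProcessor

processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")
model = FuyuForCausalLM.from_pretrained("adept/fuyu-8b", torch_dtype=torch.float16).to("cuda:0")

image = Image.open("chart.png")  # any local image
prompt = "Describe this image.\n"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=64)

# Strip the prompt tokens and decode only the newly generated text.
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```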
NVIDIA Research Breakthrough Puts New Spin on Robot Learning
A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks.
A method to interpret AI might not be so interpretable after all
A study finds humans struggle to understand the outputs of formal specifications, a method that some researchers claim can be used to make AI decision-making interpretable to humans.
Taiwan’s Foxconn to build ‘AI factories’ with Nvidia
Taiwan’s Foxconn says it plans to build “AI factories” with Nvidia as the electronics maker ramps up its push to become a major global player in electric vehicle manufacturing.
Chinese AI startup Zhipu raises $300M in funding
Chinese AI developer Zhipu has raised more than 2.5 billion yuan, or $342 million, from investors since the start of the year.
Signos raises $20M to sprinkle AI fairy dust on the weight-loss industry
Signos, a metabolic health platform that utilizes AI with a continuous glucose monitor for healthy weight management, has announced a successful $20 million Series B funding round.
MLOps & LLMOps
Branches Are All You Need: Our Opinionated ML Versioning Framework
A practical approach to versioning machine learning projects with Git branches, simplifying workflows and keeping data and models organized.
Conversational Memory in LangChain with Milvus
A post on how to use conversational memory with LangChain and the open source vector database Milvus.
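As a rough illustration of the idea (not the post's exact code), the sketch below stores past exchanges in a Milvus collection and retrieves them as memory for a LangChain conversation chain; the class names follow the LangChain 0.0.x-era API, and the connection settings are placeholders.

```python
# Rough sketch (not the post's exact code): conversational memory backed by Milvus.
# Assumes a local Milvus instance and the LangChain 0.0.x-era APIs shown here.
from langchain.chains import ConversationChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import Milvus

vectorstore = Milvus(
    embedding_function=OpenAIEmbeddings(),
    collection_name="conversation_memory",
    connection_args={"host": "localhost", "port": "19530"},  # placeholder connection details
)

# Retrieve the k most relevant past exchanges and inject them as "history".
memory = VectorStoreRetrieverMemory(retriever=vectorstore.as_retriever(search_kwargs={"k": 3}))

chain = ConversationChain(llm=OpenAI(temperature=0), memory=memory, verbose=True)
chain.predict(input="My favorite vector database is Milvus.")
print(chain.predict(input="Which vector database did I say I like?"))
```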
Deploy Idefics 9B & 80B on Amazon SageMaker
An article on how to deploy the Idefics model to Amazon SageMaker using a new purpose-built Inference Container.
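For orientation, a hedged sketch of the deployment flow with the Hugging Face LLM (TGI) container is below; the container version, instance type, environment values, and prompt format are illustrative, so follow the article for the exact configuration.

```python
# Hedged sketch of deploying an Idefics checkpoint to a SageMaker real-time endpoint
# with the Hugging Face LLM (TGI) container; values here are illustrative only.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()            # assumes a SageMaker execution role
llm_image = get_huggingface_llm_image_uri("huggingface")

model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "HuggingFaceM4/idefics-9b-instruct",  # 9B variant; 80B needs larger hardware
        "SM_NUM_GPUS": "4",                                   # ml.g5.12xlarge has 4 GPUs
        "MAX_INPUT_LENGTH": "1024",
        "MAX_TOTAL_TOKENS": "2048",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=600,
)

# Prompt format is illustrative; Idefics expects images referenced by URL inside the prompt.
print(predictor.predict({
    "inputs": "User: What is shown in this image? ![](https://example.com/cat.jpg)\nAssistant:",
    "parameters": {"max_new_tokens": 50},
}))
```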
Learning
LLMOps: Building Real World Applications With Large Language Models
Learn to build your own LLM-powered applications using real data, all while working with the best tools in the modern LLMOps ecosystem.
Introducing The Foundation Model Transparency Index
A new index – designed by a multidisciplinary team from Stanford, MIT, and Princeton – rates the transparency of 10 foundation model companies and finds them lacking.
The N Implementation Details of RLHF with PPO
A technical blog post that attempts to reproduce OpenAI’s RLHF work and presents a checklist of implementation details.
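As background, the PPO update the post dissects is built around the clipped surrogate objective; a generic PyTorch rendering (not the post's code) looks like this.

```python
# Generic PPO clipped surrogate loss, shown as background for the post (not its exact code).
import torch

def ppo_policy_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO, averaged over the batch."""
    ratio = torch.exp(logprobs_new - logprobs_old)            # pi_new / pi_old per action
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.mean(torch.minimum(unclipped, clipped))     # maximize surrogate -> minimize negative
```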
Announcing SteerLM: A Simple and Practical Technique to Customize LLMs During Inference
An in-depth look at NVIDIA’s new four-step technique for LLM customization and dynamic steering of model outputs.
Step-By-Step Walk-Through of Pytorch Lightning
A step-by-step tutorial on how to train a convolutional neural network for image classification, including callbacks and loggers for monitoring model performance.
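A condensed sketch of that workflow (LightningModule, callback, logger, Trainer) might look like the following; the architecture, data, and hyperparameters are placeholders rather than the tutorial's.

```python
# Condensed sketch of a PyTorch Lightning image classifier with a callback and a logger.
# The architecture, data, and hyperparameters are placeholders, not the tutorial's.
import torch
from torch import nn
from torch.nn import functional as F
import pytorch_lightning as pl

class LitCNN(pl.LightningModule):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Tiny CNN for 1x28x28 inputs (MNIST-like); swap in your own architecture.
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(16 * 14 * 14, num_classes),
        )

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)   # picked up by the attached logger
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

trainer = pl.Trainer(
    max_epochs=3,
    callbacks=[pl.callbacks.ModelCheckpoint(monitor="train_loss")],
    logger=pl.loggers.CSVLogger("logs"),
)
# trainer.fit(LitCNN(), train_dataloaders=...)  # supply a DataLoader of (image, label) batches
```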
Libraries & Code
Evaluation Repository for 'Sociotechnical Safety Evaluation of Generative AI Systems'
DeepMind’s comprehensive list of safety evaluations for Generative AI.
Stanford DSPy: The framework for programming with foundation models
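A minimal sketch of DSPy's declarative style, using the API as of late 2023 (dspy.OpenAI, dspy.settings, dspy.Predict); class names may differ in newer releases.

```python
# Minimal sketch of DSPy's declarative style: configure an LM, declare a signature, call it.
# Uses the late-2023 DSPy API; exact class names may differ in newer releases.
import dspy

lm = dspy.OpenAI(model="gpt-3.5-turbo")      # any supported LM client works here
dspy.settings.configure(lm=lm)

qa = dspy.Predict("question -> answer")      # a declarative signature instead of a prompt string
prediction = qa(question="What is the capital of France?")
print(prediction.answer)
```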
OpenAgents: An Open Platform for Language Agents in the Wild
Papers & Publications
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Abstract:
We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard loss-based fine-tuning may fail to recover accuracy, especially at high sparsities. To address this, we perform a detailed study of distillation-type losses, determining an L2-based distillation approach we term SquareHead which enables accurate recovery even at higher sparsities, across all model types. On the practical efficiency side, we show that sparse LLMs can be executed with speedups by taking advantage of sparsity, for both CPU and GPU runtimes. While the standard approach is to leverage sparsity for computational reduction, we observe that in the case of memory-bound LLMs sparsity can also be leveraged for reducing memory bandwidth. We exhibit end-to-end results showing speedups due to sparsity, while recovering accuracy, on T5 (language translation), Whisper (speech translation), and open GPT-type (MPT for text generation). For MPT text generation, we show for the first time that sparse fine-tuning can reach 75% sparsity without accuracy drops, provide notable end-to-end speedups for both CPU and GPU inference, and highlight that sparsity is also compatible with quantization approaches. Models and software for reproducing our results are provided in Section 6.
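To make the distillation idea concrete, here is a rough PyTorch sketch of a per-layer L2 feature-distillation loss in the spirit of the SquareHead objective described above; the paper's exact normalization and layer selection may differ.

```python
# Rough sketch of a per-layer L2 (MSE) feature-distillation loss in the spirit of
# SquareHead; the paper's exact normalization and layer selection may differ.
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_hiddens, teacher_hiddens, eps=1e-6):
    """Sum of teacher-normalized MSE between matching student/teacher hidden states."""
    loss = 0.0
    for h_s, h_t in zip(student_hiddens, teacher_hiddens):
        mse = F.mse_loss(h_s, h_t)
        loss = loss + mse / (h_t.pow(2).mean() + eps)   # normalize by teacher feature magnitude
    return loss
```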
Improved Baselines with Visual Instruction Tuning
Abstract:
Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available.
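The "simple modification" at the heart of the note, swapping the single linear vision-language connector for a small MLP, can be sketched in a few lines; the dimensions below are illustrative placeholders.

```python
# Illustrative sketch of an MLP vision-language projector of the kind LLaVA-1.5 uses
# in place of a single linear layer; the dimensions here are placeholders.
import torch
from torch import nn

class VisionLanguageProjector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 5120):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from the CLIP ViT encoder
        return self.mlp(image_features)   # (batch, num_patches, llm_dim) tokens for the LLM
```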
Llemma: An Open Language Model For Mathematics
Abstract:
We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
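Since the artifacts are released openly, trying the model is a short script with Transformers; the sketch below assumes the 7B checkpoint is hosted as EleutherAI/llemma_7b on the Hugging Face Hub, and the prompt is just an example.

```python
# Quick-start sketch for the released 7B model; assumes the checkpoint is hosted
# as "EleutherAI/llemma_7b" on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/llemma_7b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/llemma_7b", torch_dtype=torch.float16, device_map="auto"
)

prompt = "Problem: Compute the derivative of f(x) = x^3 sin(x).\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```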