Deep Learning Weekly: Issue #324
Fuyu-8B, Conversational Memory in LangChain with Milvus, The Foundation Model Transparency Index, a paper on Sparse Fine-tuning for Inference Acceleration of LLMs, and many more!
This week in deep learning, we bring you Fuyu-8B: A Multimodal Architecture for AI Agents, Conversational Memory in LangChain with Milvus, The Foundation Model Transparency Index, and a paper on Sparse Fine-tuning for Inference Acceleration of Large Language Models.
You may also enjoy NVIDIA's AI Agent for Robotic Learning, DeepMind's Evaluation Repository for Generative AI Systems, The N Implementation Details of RLHF with PPO, a paper on Improved Baselines with Visual Instruction Tuning, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Adept open-sources Fuyu-8B, a small multimodal decoder-only transformer with no specialized image encoder.
A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks.
A study finds humans struggle to understand the outputs of formal specifications, a method that some researchers claim can be used to make AI decision-making interpretable to humans.
Taiwan’s Foxconn says it plans to build AI factories with Nvidia, as the electronics maker ramps up efforts to become a major global player in electric car manufacturing.
Chinese AI developer Zhipu has raised more than 2.5 billion yuan, or $342 million, from investors since the start of the year.
Signos, a metabolic health platform that utilizes AI with a continuous glucose monitor for healthy weight management, has announced a successful $20 million Series B funding round.
MLOps & LLMOps
A practical approach to versioning machine learning projects with Git branches that simplifies workflows and keeps data and models organized.
A post on how to use conversational memory with LangChain and the open-source vector database Milvus.
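The core mechanism behind conversational memory, independent of the LangChain and Milvus APIs (which are not reproduced here), is simply to persist prior turns and prepend them to each new prompt. A minimal stdlib-only sketch; the class name and prompt format are illustrative, not LangChain's:

```python
class ConversationBuffer:
    """Minimal conversational memory: store turns, replay them as context."""

    def __init__(self, max_turns=10):
        self.max_turns = max_turns  # cap history to bound prompt length
        self.turns = []             # list of (role, text) pairs

    def add(self, role, text):
        self.turns.append((role, text))
        # keep only the most recent messages
        self.turns = self.turns[-self.max_turns:]

    def render_prompt(self, user_input):
        """Prepend stored history to the new user message."""
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        tail = f"human: {user_input}\nai:"
        return f"{history}\n{tail}" if history else tail

memory = ConversationBuffer(max_turns=4)
memory.add("human", "My name is Ada.")
memory.add("ai", "Nice to meet you, Ada!")
prompt = memory.render_prompt("What is my name?")
```

In the linked post, Milvus plays the role of the store: instead of a flat buffer, past turns are embedded, and the most relevant ones are retrieved by vector similarity before being prepended.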
An article on how to deploy the Idefics model to Amazon SageMaker using a new purpose-built Inference Container.
Learn to build your own LLM-powered applications using real data, all while working with the best tools in the modern LLMOps ecosystem.
A new index – designed by a multidisciplinary team from Stanford, MIT, and Princeton – rates the transparency of 10 foundation model companies and finds them lacking.
A technical blog post that attempts to reproduce OpenAI’s RLHF, and presents a checklist of implementation details.
An in-depth look at NVIDIA’s new four-step technique for LLM customization and dynamic steering of model outputs.
A step-by-step tutorial on how to train a convolutional neural network for image classification, including callbacks and loggers for monitoring model performance.
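The callback pattern such tutorials rely on can be shown without any deep-learning framework: the training loop exposes hooks, and loggers subscribe to them. A framework-free sketch, with class and hook names that are illustrative rather than taken from any particular library:

```python
class Callback:
    def on_epoch_end(self, epoch, logs):
        pass

class LossLogger(Callback):
    """Records the loss reported at the end of each epoch."""
    def __init__(self):
        self.history = []
    def on_epoch_end(self, epoch, logs):
        self.history.append(logs["loss"])

class EarlyStopping(Callback):
    """Sets a stop flag after `patience` epochs without improvement."""
    def __init__(self, patience=2):
        self.patience, self.best, self.wait = patience, float("inf"), 0
        self.stop = False
    def on_epoch_end(self, epoch, logs):
        if logs["loss"] < self.best:
            self.best, self.wait = logs["loss"], 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.stop = True

def train(losses, callbacks):
    """Stand-in for a real fit loop: `losses` mocks the per-epoch loss."""
    for epoch, loss in enumerate(losses):
        for cb in callbacks:
            cb.on_epoch_end(epoch, {"loss": loss})
        if any(getattr(cb, "stop", False) for cb in callbacks):
            return epoch + 1  # number of epochs actually run
    return len(losses)

logger, stopper = LossLogger(), EarlyStopping(patience=2)
ran = train([1.0, 0.8, 0.9, 0.95, 0.7], [logger, stopper])
```

A real training loop would compute the loss from forward/backward passes; the hook interface stays the same.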
Libraries & Code
DeepMind’s comprehensive list of safety evaluations for Generative AI.
Stanford DSPy: the framework for programming with foundation models.
OpenAgents: an open platform for language agents in the wild.
Papers & Publications
Sparse Fine-tuning for Inference Acceleration of Large Language Models
We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks while inducing sparsity in their weights. On the accuracy side, we observe that standard loss-based fine-tuning may fail to recover accuracy, especially at high sparsities. To address this, we perform a detailed study of distillation-type losses, determining an L2-based distillation approach we term SquareHead which enables accurate recovery even at higher sparsities, across all model types. On the practical efficiency side, we show that sparse LLMs can be executed with speedups by taking advantage of sparsity, for both CPU and GPU runtimes. While the standard approach is to leverage sparsity for computational reduction, we observe that in the case of memory-bound LLMs, sparsity can also be leveraged for reducing memory bandwidth. We exhibit end-to-end results showing speedups due to sparsity, while recovering accuracy, on T5 (language translation), Whisper (speech translation), and open GPT-type models (MPT, for text generation). For MPT text generation, we show for the first time that sparse fine-tuning can reach 75% sparsity without accuracy drops, provide notable end-to-end speedups for both CPU and GPU inference, and highlight that sparsity is also compatible with quantization approaches. Models and software for reproducing our results are provided in Section 6.
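The abstract's L2-based distillation idea can be illustrated with a toy, framework-free loss: match the student's per-layer hidden states to the teacher's, normalizing each layer by the teacher's feature norm so all layers contribute on a comparable scale. The normalization and layer-wise averaging here are illustrative assumptions, not the paper's exact SquareHead definition:

```python
def l2_feature_distillation(student_layers, teacher_layers, eps=1e-8):
    """Per-layer normalized squared error between hidden-state vectors.

    student_layers / teacher_layers: lists of equal-length feature lists,
    one entry per transformer layer.
    """
    total = 0.0
    for s, t in zip(student_layers, teacher_layers):
        sq_err = sum((si - ti) ** 2 for si, ti in zip(s, t))
        t_norm = sum(ti ** 2 for ti in t)
        total += sq_err / (t_norm + eps)  # scale-invariant per layer
    return total / len(student_layers)   # average across layers

teacher = [[1.0, 2.0], [0.5, -0.5]]
perfect = l2_feature_distillation(teacher, teacher)            # identical features
off = l2_feature_distillation([[1.0, 2.5], [0.5, -0.5]], teacher)
```

During sparse fine-tuning, a term like this is added to the task loss, so the pruned student is pulled toward the dense teacher's internal representations, not just its final outputs.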
Improved Baselines with Visual Instruction Tuning
Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available.
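The "MLP projection" connector the note highlights is just a small two-layer network mapping vision-encoder features into the language model's embedding space. A stdlib-only sketch with toy dimensions; the GELU activation and two-layer shape follow the general MLP-projector design, while the weights and sizes here are placeholders:

```python
import math

def gelu(x):
    # exact GELU via the error function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(vec, weights, bias):
    """weights: out_dim rows of in_dim columns."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def mlp_projector(vision_feat, w1, b1, w2, b2):
    """vision feature -> hidden (GELU) -> LM embedding dimension."""
    hidden = [gelu(h) for h in linear(vision_feat, w1, b1)]
    return linear(hidden, w2, b2)

# toy sizes: 3-dim vision feature -> 4-dim hidden -> 2-dim LM embedding
w1 = [[0.1] * 3 for _ in range(4)]; b1 = [0.0] * 4
w2 = [[0.5] * 4 for _ in range(2)]; b2 = [0.1, -0.1]
token = mlp_projector([1.0, 2.0, 3.0], w1, b1, w2, b2)
```

In LLaVA-style models, each projected vector is then treated as an ordinary token embedding in the language model's input sequence.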
Llemma: A Large Language Model for Mathematics
We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
Thanks for reading Deep Learning Weekly! Subscribe for free to receive new posts and support my work.