Deep Learning Weekly: Issue 368
The Jamba 1.5 Open Model Family, From Basics to Advanced: Exploring LangGraph, Optimizing LLM Training: A Comprehensive Overview of Techniques, a paper on Automated Design of Agentic Systems, & more!
This week in deep learning, we bring you The Jamba 1.5 Open Model Family, From Basics to Advanced: Exploring LangGraph, Optimizing LLM Training: A Comprehensive Overview of Techniques, and a paper on Automated Design of Agentic Systems.
You may also enjoy Microsoft's Phi-3.5 SLMs, Enabling Fast Gradient Clipping and Ghost Clipping in Opacus, a paper on The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
The Jamba 1.5 Open Model Family: The Most Powerful and Efficient Long Context Models
AI21 debuted Jamba 1.5 Mini and Jamba 1.5 Large, both built on the company's novel SSM-Transformer architecture.
Microsoft's Phi-3.5 SLMs
Microsoft announced Phi-3.5-mini, Phi-3.5-vision, and a new addition to the Phi family: Phi-3.5-MoE, a Mixture-of-Experts (MoE) model.
Artifacts are now generally available
Anthropic recently made Artifacts available for all Claude.ai users across the Free, Pro, and Team plans. Artifacts can now also be created and viewed through the iOS and Android apps.
Goodfire raises $7M for its AI observability platform
Goodfire, a startup developing mechanistic interpretability tools, announced that it has raised $7 million in seed funding led by Lightspeed Venture Partners.
3 Questions: How to prove humanity online
In a new white paper, researchers from MIT, OpenAI, Microsoft, and other organizations propose personhood credentials, a verification technique that lets people prove they are real humans online.
Google debuts free 'Prompt Gallery' in AI Studio, supercharging developer tools
Google has unveiled a new Prompt Gallery feature in its AI Studio platform, significantly enhancing the toolset available to developers working with the Gemini API.
MLOps & LLMOps
From Basics to Advanced: Exploring LangGraph
An article that explores LangGraph’s key features and capabilities, including multi-agent applications.
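For a flavor of the API the article walks through, here is a minimal two-node LangGraph sketch; the node names and state fields are illustrative, not taken from the article.

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class State(TypedDict):
    question: str
    answer: str

def research(state: State) -> dict:
    # First node: gather raw material for the answer.
    return {"answer": f"Notes on: {state['question']}"}

def write(state: State) -> dict:
    # Second node: turn the notes into a final answer.
    return {"answer": state["answer"] + " (polished)"}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("write", write)
graph.set_entry_point("research")
graph.add_edge("research", "write")
graph.add_edge("write", END)

app = graph.compile()
print(app.invoke({"question": "What is LangGraph?", "answer": ""}))
```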
Practical Strategies for Optimizing LLM Inference Sizing and Performance
Dmitry Mironov and Sergio Perez, senior deep learning solutions architects at NVIDIA, guide you through the critical aspects of LLM inference sizing.
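Much of inference sizing reduces to back-of-envelope arithmetic over weights and KV-cache memory. The sketch below assumes a Llama-2-7B-class model served in fp16; the config numbers are assumptions, not figures from the talk.

```python
# Back-of-envelope GPU memory sizing for LLM inference (fp16 assumed).
params = 7e9
bytes_per_param = 2                        # fp16
layers, kv_heads, head_dim = 32, 32, 128   # Llama-2-7B-like config

weights_gb = params * bytes_per_param / 1e9
# KV cache per token: keys + values across all layers and heads.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_param
batch, seq_len = 8, 4096
kv_gb = kv_bytes_per_token * batch * seq_len / 1e9

print(f"weights:  {weights_gb:.1f} GB")  # ~14 GB
print(f"KV cache: {kv_gb:.1f} GB")       # ~17 GB for 8 sequences of 4096 tokens
```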
Tutorial: Building a Chat Application with Function Calling
A Haystack article on how to build chat applications that demonstrate agent-like behavior using OpenAI’s function calling feature.
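The article builds on Haystack; as a quick orientation to the underlying mechanism, here is a minimal function-calling round trip using the OpenAI Python SDK directly. The `get_weather` tool is hypothetical.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as a JSON string.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```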
Learning
Enabling Fast Gradient Clipping and Ghost Clipping in Opacus
An article that describes implementations of Fast Gradient Clipping and Ghost Clipping in Opacus that enable memory-efficient training of models with differential privacy.
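A rough sketch of what enabling ghost clipping looks like, based on the interface the post describes; the `grad_sample_mode="ghost"` switch and the wrapped criterion follow that description, so check the post for the exact signature.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=8
)

privacy_engine = PrivacyEngine()
# Ghost clipping is selected via grad_sample_mode; the returned criterion is
# wrapped so that loss.backward() triggers the double backward pass it needs.
model, optimizer, criterion, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    criterion=criterion,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
    grad_sample_mode="ghost",
)

for x, y in train_loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
```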
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2
An article that discusses how training with packed instruction tuning examples (without padding) is now compatible with Flash Attention 2 in Hugging Face.
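A minimal sketch of padding-free packing with the `DataCollatorWithFlattening` collator the article introduces; the model choice and `tokenized_dataset` are assumptions for illustration.

```python
import torch
from transformers import (AutoModelForCausalLM, DataCollatorWithFlattening,
                          Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",               # assumed model choice
    attn_implementation="flash_attention_2",  # packing relies on FA2 kernels
    torch_dtype=torch.bfloat16,
)

# Concatenates examples with no padding and passes boundary information so
# FlashAttention 2 never attends across example boundaries.
collator = DataCollatorWithFlattening()

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=4),
    train_dataset=tokenized_dataset,  # assumed: a pre-tokenized instruction dataset
    data_collator=collator,
)
trainer.train()
```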
Optimizing LLM Training: A Comprehensive Overview of Techniques
A guide that offers a comprehensive exploration of various optimization strategies, covering the basics from memory consumption to distributed training.
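As a taste of the memory-side basics such a guide covers, here is a toy PyTorch loop combining mixed precision with gradient accumulation; the model and dummy loss are illustrative, not from the guide.

```python
import torch
from torch import nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=6,
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # simulate a 4x larger batch without 4x the activation memory
batches = [torch.randn(8, 128, 512, device="cuda") for _ in range(16)]

for step, batch in enumerate(batches):
    with torch.autocast("cuda", dtype=torch.float16):  # mixed precision
        loss = model(batch).mean() / accum_steps       # dummy loss for the sketch
    scaler.scale(loss).backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```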
Libraries & Code
An advanced automatic evaluation framework designed to assess and diagnose Retrieval-Augmented Generation (RAG) systems.
A framework for continual learning research.
Papers & Publications
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models
Abstract:
In order to oversee advanced AI systems, it is important to understand their reasons for generating a given output. When prompted, large language models (LLMs) can provide natural language explanations or reasoning traces that sound plausible and receive high ratings from human annotators. However, it is unclear to what extent these explanations are truly capturing the factors responsible for the model's predictions: the most "human-like" explanation may be different from the one that is most faithful to the model's true decision making process. In this work, we introduce the correlational counterfactual test (CCT), a faithfulness metric based on counterfactual input edits that takes into account not just the binary label change, but the total shift in the model's predicted label distribution. We evaluate the faithfulness of free-text explanations generated by few-shot-prompted LLMs from the Llama-2 family on three NLP tasks. We find that these explanations are indeed more likely to mention factors when they are impactful to the model's prediction, with the degree of association increasing with model size but varying significantly by task.
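For intuition, here is a hypothetical sketch of a CCT-style computation; `counterfactual_pairs`, `predict_proba`, and the choice of point-biserial correlation are stand-ins, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import pointbiserialr

def total_variation(p, q):
    # Distance between two predicted label distributions.
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

shifts, mentioned = [], []
# counterfactual_pairs and predict_proba are hypothetical stand-ins
# for your edited-input dataset and your model's label distribution.
for original, edited, explanation, inserted_word in counterfactual_pairs:
    shifts.append(total_variation(predict_proba(original), predict_proba(edited)))
    mentioned.append(inserted_word.lower() in explanation.lower())

# CCT-style score: how strongly "the edit moved the prediction" correlates
# with "the explanation mentions the edit".
corr, _ = pointbiserialr(mentioned, shifts)
print(f"faithfulness correlation: {corr:.3f}")
```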
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Abstract:
Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.
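The control flow described in the abstract can be sketched in a few lines; `retrieve`, `draft_lm`, and `verifier_lm` are hypothetical stand-ins, and the parallel drafting is written sequentially for clarity.

```python
# Minimal sketch of the Speculative RAG draft-then-verify flow.
def speculative_rag(question, retrieve, draft_lm, verifier_lm, n_drafts=3):
    docs = retrieve(question)
    # Partition retrieved documents into distinct subsets, one per draft,
    # so each draft sees a diverse slice of the evidence.
    subsets = [docs[i::n_drafts] for i in range(n_drafts)]
    # The smaller specialist LM produces one answer draft per subset.
    drafts = [draft_lm(question, subset) for subset in subsets]
    # The larger generalist LM performs a single verification pass,
    # scoring each draft instead of generating from scratch.
    scores = [verifier_lm.score(question, draft) for draft in drafts]
    return drafts[scores.index(max(scores))]
```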
Automated Design of Agentic Systems
Abstract:
Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We formulate a new research area, Automated Design of Agentic Systems (ADAS), which aims to automatically create powerful agentic system designs, including inventing novel building blocks and/or combining them in new ways. We further demonstrate that there is an unexplored yet promising approach within ADAS where agents can be defined in code and new agents can be automatically discovered by a meta agent programming ever better ones in code. Given that programming languages are Turing Complete, this approach theoretically enables the learning of any possible agentic system: including novel prompts, tool use, control flows, and combinations thereof. We present a simple yet effective algorithm named Meta Agent Search to demonstrate this idea, where a meta agent iteratively programs interesting new agents based on an ever-growing archive of previous discoveries. Through extensive experiments across multiple domains including coding, science, and math, we show that our algorithm can progressively invent agents with novel designs that greatly outperform state-of-the-art hand-designed agents. Importantly, we consistently observe the surprising result that agents invented by Meta Agent Search maintain superior performance even when transferred across domains and models, demonstrating their robustness and generality. Provided we develop it safely, our work illustrates the potential of an exciting new research direction toward automatically designing ever-more powerful agentic systems to benefit humanity.
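A schematic of the Meta Agent Search loop as the abstract describes it; `meta_agent`, `evaluate`, and the sandboxed execution are hypothetical stand-ins, not the paper's implementation.

```python
# Meta Agent Search, schematically: a meta agent writes new agents as code,
# conditioned on an ever-growing archive of prior discoveries.
def meta_agent_search(meta_agent, evaluate, n_iterations=25):
    archive = []  # record of (agent_code, score) for everything tried so far
    for _ in range(n_iterations):
        # The meta agent proposes a new agent as executable code,
        # informed by all previously discovered agents and their scores.
        agent_code = meta_agent.propose(archive)
        score = evaluate(agent_code)  # run on held-out tasks in a sandbox
        archive.append((agent_code, score))
    return max(archive, key=lambda entry: entry[1])
```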