Deep Learning Weekly: Issue 349
Meta Training and Inference Accelerator, Verba: Building an Open Source, Modular RAG Application, Many-shot jailbreaking, a paper on Jamba: A Hybrid Transformer-Mamba Language Model, and many more!
This week in deep learning, we bring you Meta Training and Inference Accelerator, Verba: Building an Open Source, Modular RAG Application, Many-shot jailbreaking from Anthropic, and a paper on Jamba: A Hybrid Transformer-Mamba Language Model.
You may also enjoy Announcing MLCommons AI Safety v0.5 Proof of Concept, How to Make the Most Out of LLM Production Data: Simulated User Feedback, Tracking Compute-Intensive AI Models, a paper on AutoCodeRover: Autonomous Program Improvement, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Our next generation Meta Training and Inference Accelerator
Meta shares details about the next generation of the Meta Training and Inference Accelerator (MTIA), their family of custom-made chips designed for AI workloads.
Announcing MLCommons AI Safety v0.5 Proof of Concept
The MLCommons AI Safety working group achieved an important first step towards standardization with the release of the AI Safety v0.5 benchmark proof-of-concept.
Grok-1.5 Vision Preview
x.ai introduces Grok-1.5V, their first-generation multimodal AI model that combines strong text capabilities with the ability to process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.
The Importance of Data Pipelines in the Era of Generative AI
This is a real-world example of modern DevOps practices, offering a peek into industry-standard methods for deploying and managing cloud infrastructure.
Meta is testing an AI-powered search bar in Instagram
Meta is experimenting with an AI-powered search bar in Instagram.
Microsoft invests $1.5B in UAE-based AI company G42
Microsoft announced that it’s investing $1.5 billion in G42, a UAE-based AI developer.
MLOps & LLMOps
How to Make the Most Out of LLM Production Data: Simulated User Feedback
An article highlighting a novel approach that uses production data to simulate user feedback for testing and evaluating LLM applications.
Chronon, Airbnb’s ML Feature Platform, Is Now Open Source
A blog post that covers the main motivation and functionality of Chronon, Airbnb’s ML Feature Platform.
Verba: Building an Open Source, Modular RAG Application
A visual article highlighting the different components of an open-source RAG application called Verba, which is built using a modular, customizable architecture for personalized answers.
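The modular pattern the article describes — pluggable retrieval and generation components behind a shared pipeline interface — can be sketched in plain Python. Everything below (the class names, the toy bag-of-words retriever, the stubbed generator) is a hypothetical illustration of the pattern, not Verba's actual API.

```python
import math
from collections import Counter

class Retriever:
    """Toy bag-of-words retriever; a real app would plug in a vector database."""
    def __init__(self, documents):
        self.documents = documents
        self.vectors = [Counter(doc.lower().split()) for doc in documents]

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def retrieve(self, query, k=1):
        q = Counter(query.lower().split())
        ranked = sorted(range(len(self.documents)),
                        key=lambda i: self._cosine(q, self.vectors[i]),
                        reverse=True)
        return [self.documents[i] for i in ranked[:k]]

class Generator:
    """Stub generator; a real pipeline would call an LLM with the context."""
    def generate(self, query, context):
        return f"Q: {query}\nContext: {' | '.join(context)}"

class RAGPipeline:
    """Composes swappable components, mirroring a modular RAG architecture."""
    def __init__(self, retriever, generator):
        self.retriever = retriever
        self.generator = generator

    def answer(self, query):
        return self.generator.generate(query, self.retriever.retrieve(query))

docs = ["Weaviate is a vector database.",
        "RAG combines retrieval with generation."]
pipeline = RAGPipeline(Retriever(docs), Generator())
print(pipeline.answer("retrieval with generation"))
```

The point of the composition is that either component can be replaced — a different embedding model, reader, or LLM — without touching the rest of the pipeline.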
Learning
Tracking Compute-Intensive AI Models
Epoch AI presents a new dataset tracking compute-intensive AI models, with training compute over 10^23 floating point operations.
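For intuition on that threshold, the standard back-of-the-envelope estimate C ≈ 6ND (training FLOP ≈ 6 × parameter count × training tokens) shows roughly what it takes to cross 10^23 FLOP. The model configurations below are illustrative examples, not entries from the Epoch AI dataset.

```python
def training_flop(params: float, tokens: float) -> float:
    """Rough training compute via the common C ~= 6 * N * D approximation."""
    return 6.0 * params * tokens

# Illustrative configurations (not actual tracked models):
small = training_flop(params=7e9, tokens=1e12)   # 7B params, 1T tokens
large = training_flop(params=70e9, tokens=2e12)  # 70B params, 2T tokens

THRESHOLD = 1e23  # Epoch AI's cutoff for "compute-intensive"
print(f"{small:.1e} FLOP, over threshold: {small > THRESHOLD}")
print(f"{large:.1e} FLOP, over threshold: {large > THRESHOLD}")
```

By this estimate, a 7B-parameter model trained on 1T tokens (about 4.2 × 10^22 FLOP) falls just under the cutoff, while a 70B model on 2T tokens clears it comfortably.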
Many-shot jailbreaking
Anthropic investigated a “jailbreaking” technique — a method that can be used to evade the safety guardrails put in place by the developers of large language models (LLMs).
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
MedARC releases a state-of-the-art fMRI-to-image reconstruction model that is shown to work well with just 1 hour of training data.
Diffusion Models for Video Generation
Lilian Weng’s technical article about diffusion models for video generation, particularly their inner workings and adaptations.
Libraries & Code
A PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs.
A Python quantization toolkit that provides several features that are either not supported or limited by the base PyTorch quantization tools.
A library for data streaming and Python stream processing.
Papers & Publications
Jamba: A Hybrid Transformer-Mamba Language Model
Abstract:
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
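The interleaving the abstract describes can be pictured as a layer schedule: mostly Mamba layers with an occasional attention layer, and a mixture-of-experts feed-forward substituted for the dense MLP on a fixed stride. The sketch below generates such a schedule; the specific ratios are placeholders chosen for illustration, not necessarily the configuration the paper uses.

```python
def hybrid_schedule(n_layers: int, attn_every: int = 8, moe_every: int = 2):
    """Build a hybrid layer schedule: one attention layer per `attn_every`
    layers (the rest Mamba), with an MoE feed-forward replacing the dense
    MLP on every `moe_every`-th layer."""
    schedule = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        ffn = "moe" if i % moe_every == moe_every - 1 else "mlp"
        schedule.append((mixer, ffn))
    return schedule

for i, (mixer, ffn) in enumerate(hybrid_schedule(8)):
    print(f"layer {i}: {mixer:>9} + {ffn}")
```

Keeping attention layers sparse is what gives the hybrid its small KV-cache memory footprint at long context, while the MoE layers add capacity without increasing the active parameters per token.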
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Abstract:
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline.
For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.
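The three pre-writing stages in the abstract read naturally as a pipeline. The skeleton below stubs each stage with toy logic — the function names, data, and return shapes are illustrative, not STORM's implementation, which grounds expert answers in retrieved Internet sources.

```python
def discover_perspectives(topic):
    """Stage 1: find diverse angles for researching the topic (stubbed)."""
    return [f"{topic}: history", f"{topic}: technical details", f"{topic}: criticism"]

def simulate_conversation(perspective, expert_sources):
    """Stage 2: a writer holding a perspective questions a grounded expert
    (stubbed lookup; STORM grounds answers in trusted sources)."""
    return [(f"What about {perspective}?", expert_sources.get(perspective, "unknown"))]

def curate_outline(topic, qa_pairs):
    """Stage 3: organize the collected information into an article outline."""
    outline = [f"# {topic}"]
    for question, answer in qa_pairs:
        outline.append(f"## {question} -> {answer}")
    return outline

sources = {"Mamba: history": "introduced 2023",
           "Mamba: technical details": "selective SSM"}
qa = []
for perspective in discover_perspectives("Mamba"):
    qa.extend(simulate_conversation(perspective, sources))
outline = curate_outline("Mamba", qa)
print("\n".join(outline))
```

Separating the stages this way is what lets the paper evaluate the pre-writing step (the outline) on its own, before any article text is generated.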
AutoCodeRover: Autonomous Program Improvement
Abstract:
Researchers have made significant progress in automating the software development process in the past decades. Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves the process of program improvement apart from coding, specifically to enable software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance the LLM's understanding of the issue's root cause, and effectively retrieve a context via iterative search. The use of spectrum-based fault localization using tests further sharpens the context, as long as a test suite is available. Experiments on SWE-bench-lite, which consists of 300 real-life GitHub issues, show increased efficacy in solving GitHub issues (22-23% on SWE-bench-lite). On the full SWE-bench consisting of 2294 GitHub issues, AutoCodeRover solved around 16% of issues, which is higher than the efficacy of the recently reported AI software engineer Devin from Cognition Labs, while taking time comparable to Devin. We posit that our workflow enables autonomous software engineering, where, in future, auto-generated code from LLMs can be autonomously improved.
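The structure-aware code search the abstract emphasizes — indexing a project by its classes and methods rather than treating it as a bag of files — can be sketched with Python's standard `ast` module. The snippet is a minimal illustration of that idea, not AutoCodeRover's actual retrieval code, and the sample project is invented.

```python
import ast

def index_code(source: str, filename: str = "<module>"):
    """Map class and function names to their definitions, so an agent can
    retrieve the code behind a name mentioned in a GitHub issue."""
    tree = ast.parse(source)
    index = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            snippet = ast.get_source_segment(source, node)
            index[node.name] = (filename, node.lineno, snippet)
    return index

# A toy "project" containing a buggy method an issue might describe:
project = """\
class Cart:
    def total(self):
        return sum(self.items)  # bug: ignores discounts

def checkout(cart):
    return cart.total()
"""

index = index_code(project, "shop.py")
filename, lineno, snippet = index["total"]
print(f"{filename}:{lineno}\n{snippet}")
```

Given an issue that mentions `total`, a lookup in this index returns exactly the enclosing definition — a far tighter context for the LLM than the whole file, which is the intuition behind the paper's iterative, structure-driven search.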