Deep Learning Weekly: Issue 349
Meta Training and Inference Accelerator, Verba: Building an Open Source, Modular RAG Application, Many-shot jailbreaking, a paper on Jamba: A Hybrid Transformer-Mamba Language Model, and many more!
This week in deep learning, we bring you Meta Training and Inference Accelerator, Verba: Building an Open Source, Modular RAG Application, Many-shot jailbreaking from Anthropic, and a paper on Jamba: A Hybrid Transformer-Mamba Language Model.
You may also enjoy Announcing MLCommons AI Safety v0.5 Proof of Concept, How to Make the Most Out of LLM Production Data: Simulated User Feedback, Tracking Compute-Intensive AI Models, a paper on AutoCodeRover: Autonomous Program Improvement, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Our next generation Meta Training and Inference Accelerator
Meta shares details about the next generation of the Meta Training and Inference Accelerator (MTIA), their family of custom-made chips designed for AI workloads.
Announcing MLCommons AI Safety v0.5 Proof of Concept
The MLCommons AI Safety working group achieved an important first step towards standardization with the release of the AI Safety v0.5 benchmark proof-of-concept.
Grok-1.5 Vision Preview
x.ai introduces Grok-1.5V, their first-generation multimodal AI model that combines strong text capabilities with the ability to process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.
The Importance of Data Pipelines in the Era of Generative AI
This is a real-world example of modern DevOps practices, offering a peek into industry-standard methods for deploying and managing cloud infrastructure.
Meta is testing an AI-powered search bar in Instagram
Meta is experimenting with an AI-powered search bar in Instagram.
Microsoft invests $1.5B in UAE-based AI company G42
Microsoft announced that it’s investing $1.5 billion in G42, a UAE-based AI developer.
MLOps & LLMOps
How to Make the Most Out of LLM Production Data: Simulated User Feedback
An article highlighting a novel approach that uses production data to simulate user feedback for testing and evaluating LLM applications.
Chronon, Airbnb’s ML Feature Platform, Is Now Open Source
A blog post that covers the main motivation and functionality of Chronon, Airbnb’s ML Feature Platform.
Verba: Building an Open Source, Modular RAG Application
A visual article highlighting the different components of an open-source RAG application called Verba, which is built using a modular, customizable architecture for personalized answers.
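The modular pattern the article describes — pluggable retrieval and generation components behind a shared pipeline interface — can be sketched in plain Python. Everything below (the class names, the toy bag-of-words retriever, the stubbed generator) is a hypothetical illustration of the pattern, not Verba's actual API.

```python
import math
from collections import Counter

class Retriever:
    """Toy bag-of-words retriever; a real app would plug in a vector database."""
    def __init__(self, documents):
        self.documents = documents
        self.vectors = [Counter(doc.lower().split()) for doc in documents]

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def retrieve(self, query, k=1):
        q = Counter(query.lower().split())
        ranked = sorted(range(len(self.documents)),
                        key=lambda i: self._cosine(q, self.vectors[i]),
                        reverse=True)
        return [self.documents[i] for i in ranked[:k]]

class Generator:
    """Stub generator; a real pipeline would call an LLM with the context."""
    def generate(self, query, context):
        return f"Q: {query}\nContext: {' | '.join(context)}"

class RAGPipeline:
    """Composes swappable components, mirroring a modular RAG architecture."""
    def __init__(self, retriever, generator):
        self.retriever = retriever
        self.generator = generator

    def answer(self, query):
        return self.generator.generate(query, self.retriever.retrieve(query))

docs = ["Weaviate is a vector database.",
        "RAG combines retrieval with generation."]
pipeline = RAGPipeline(Retriever(docs), Generator())
print(pipeline.answer("retrieval with generation"))
```

The point of the composition is that either component can be replaced — a different embedding model, reader, or LLM — without touching the rest of the pipeline.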
Learning
Tracking Compute-Intensive AI Models
Epoch AI presents a new dataset tracking compute-intensive AI models, with training compute over 10^23 floating point operations.
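For intuition on that threshold, the standard back-of-the-envelope estimate C ≈ 6ND (training FLOP ≈ 6 × parameter count × training tokens) shows roughly what it takes to cross 10^23 FLOP. The model configurations below are illustrative examples, not entries from the Epoch AI dataset.

```python
def training_flop(params: float, tokens: float) -> float:
    """Rough training compute via the common C ~= 6 * N * D approximation."""
    return 6.0 * params * tokens

# Illustrative configurations (not actual tracked models):
small = training_flop(params=7e9, tokens=1e12)   # 7B params, 1T tokens
large = training_flop(params=70e9, tokens=2e12)  # 70B params, 2T tokens

THRESHOLD = 1e23  # Epoch AI's cutoff for "compute-intensive"
print(f"{small:.1e} FLOP, over threshold: {small > THRESHOLD}")
print(f"{large:.1e} FLOP, over threshold: {large > THRESHOLD}")
```

By this estimate, a 7B-parameter model trained on 1T tokens (about 4.2 × 10^22 FLOP) falls just under the cutoff, while a 70B model on 2T tokens clears it comfortably.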
Many-shot jailbreaking
Anthropic investigated a “jailbreaking” technique — a method that can be used to evade the safety guardrails put in place by the developers of large language models (LLMs).
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
MedARC releases a state-of-the-art fMRI-to-image reconstruction model that is shown to work well with just 1 hour of training data.
Diffusion Models for Video Generation
Lilian Weng’s technical article about diffusion models for video generation, particularly their inner workings and adaptations.
Libraries & Code
A PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs.
A Python quantization toolkit that provides several features that are either not supported or limited by the base PyTorch quantization tools.
A library for data streaming and Python stream processing.
Papers & Publications
Jamba: A Hybrid Transformer-Mamba Language Model
Abstract:
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
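The interleaving the abstract describes can be pictured as a layer schedule: mostly Mamba layers with an occasional attention layer, and a mixture-of-experts feed-forward substituted for the dense MLP on a fixed stride. The sketch below generates such a schedule; the specific ratios are placeholders chosen for illustration, not necessarily the configuration the paper uses.

```python
def hybrid_schedule(n_layers: int, attn_every: int = 8, moe_every: int = 2):
    """Build a hybrid layer schedule: one attention layer per `attn_every`
    layers (the rest Mamba), with an MoE feed-forward replacing the dense
    MLP on every `moe_every`-th layer."""
    schedule = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        ffn = "moe" if i % moe_every == moe_every - 1 else "mlp"
        schedule.append((mixer, ffn))
    return schedule

for i, (mixer, ffn) in enumerate(hybrid_schedule(8)):
    print(f"layer {i}: {mixer:>9} + {ffn}")
```

Keeping attention layers sparse is what gives the hybrid its small KV-cache memory footprint at long context, while the MoE layers add capacity without increasing the active parameters per token.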
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Abstract:
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline.
For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.
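The three pre-writing stages in the abstract read naturally as a pipeline. The skeleton below stubs each stage with toy logic — the function names, data, and return shapes are illustrative, not STORM's implementation, which grounds expert answers in retrieved Internet sources.

```python
def discover_perspectives(topic):
    """Stage 1: find diverse angles for researching the topic (stubbed)."""
    return [f"{topic}: history", f"{topic}: technical details", f"{topic}: criticism"]

def simulate_conversation(perspective, expert_sources):
    """Stage 2: a writer holding a perspective questions a grounded expert
    (stubbed lookup; STORM grounds answers in trusted sources)."""
    return [(f"What about {perspective}?", expert_sources.get(perspective, "unknown"))]

def curate_outline(topic, qa_pairs):
    """Stage 3: organize the collected information into an article outline."""
    outline = [f"# {topic}"]
    for question, answer in qa_pairs:
        outline.append(f"## {question} -> {answer}")
    return outline

sources = {"Mamba: history": "introduced 2023",
           "Mamba: technical details": "selective SSM"}
qa = []
for perspective in discover_perspectives("Mamba"):
    qa.extend(simulate_conversation(perspective, sources))
outline = curate_outline("Mamba", qa)
print("\n".join(outline))
```

Separating the stages this way is what lets the paper evaluate the pre-writing step (the outline) on its own, before any article text is generated.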
AutoCodeRover: Autonomous Program Improvement
Abstract:
Researchers have made significant progress in automating the software development process in the past decades. Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves the process of program improvement apart from coding, specifically to enable software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance the LLM's understanding of the issue's root cause, and effectively retrieve a context via iterative search. The use of spectrum-based fault localization using tests further sharpens the context, as long as a test suite is available. Experiments on SWE-bench-lite, which consists of 300 real-life GitHub issues, show increased efficacy in solving GitHub issues (22-23% on SWE-bench-lite). On the full SWE-bench consisting of 2294 GitHub issues, AutoCodeRover solved around 16% of issues, which is higher than the efficacy of the recently reported AI software engineer Devin from Cognition Labs, while taking time comparable to Devin. We posit that our workflow enables autonomous software engineering, where, in future, auto-generated code from LLMs can be autonomously improved.
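The structure-aware code search the abstract emphasizes — indexing a project by its classes and methods rather than treating it as a bag of files — can be sketched with Python's standard `ast` module. The snippet is a minimal illustration of that idea, not AutoCodeRover's actual retrieval code, and the sample project is invented.

```python
import ast

def index_code(source: str, filename: str = "<module>"):
    """Map class and function names to their definitions, so an agent can
    retrieve the code behind a name mentioned in a GitHub issue."""
    tree = ast.parse(source)
    index = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            snippet = ast.get_source_segment(source, node)
            index[node.name] = (filename, node.lineno, snippet)
    return index

# A toy "project" containing a buggy method an issue might describe:
project = """\
class Cart:
    def total(self):
        return sum(self.items)  # bug: ignores discounts

def checkout(cart):
    return cart.total()
"""

index = index_code(project, "shop.py")
filename, lineno, snippet = index["total"]
print(f"{filename}:{lineno}\n{snippet}")
```

Given an issue that mentions `total`, a lookup in this index returns exactly the enclosing definition — a far tighter context for the LLM than the whole file, which is the intuition behind the paper's iterative, structure-driven search.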