Deep Learning Weekly: Issue 358
Sakana AI to become unicorn in under a year, RAG Evaluation with Prometheus 2, Apple’s On-Device and Server Foundation Models, Llama for Scalable Image Generation, and many more!
This week in deep learning, we bring you Japan's Sakana AI by Google alums to become unicorn in under a year, RAG Evaluation with Prometheus 2, Introducing Apple’s On-Device and Server Foundation Models, and a paper titled Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.
You may also enjoy Paris-based AI startup Mistral AI raises $640M, Uncensor any LLM with abliteration, a paper titled CodeR: Issue Resolving with Multi-Agent and Task Graphs, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Japan's Sakana AI by Google Alums to Become Unicorn in Under a Year
Sakana AI, a foundation model developer focused on evolution-inspired AI, is planning a funding round that would value it at over $1 billion.
Paris-Based AI Startup Mistral AI Raises $640M
Mistral AI has closed its much-rumored Series B funding round, raising around $640 million in a mix of equity and debt.
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
NVIDIA announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models.
From Bytes to Bushels: How Gen AI Can Shape the Future of Agriculture
A McKinsey article on how generative AI can reshape agriculture by enhancing decision-making, increasing efficiency, and creating new economic opportunities for farmers.
MLOps & LLMOps
Task-Aware RAG Strategies for When Sentence Similarity Fails
An article exploring advanced RAG strategies that improve retrieval performance by incorporating signals beyond sentence similarity, addressing the limitations of similarity-only retrieval.
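As a rough illustration of the general idea (not code from the article), the sketch below blends dense cosine similarity with a simple keyword-overlap signal so that ranking is not driven by sentence similarity alone; the corpus, stand-in embeddings, and the alpha weighting are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_score(query, doc):
    # Fraction of query terms appearing in the document (toy lexical signal).
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.6):
    # Blend dense similarity with the lexical signal; alpha is a tunable weight.
    scores = [
        alpha * cosine(query_vec, dv) + (1 - alpha) * keyword_score(query, d)
        for d, dv in zip(docs, doc_vecs)
    ]
    return sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = ["error codes for the payment API", "company holiday policy"]
    doc_vecs = rng.normal(size=(2, 8))   # stand-ins for real embeddings
    query_vec = rng.normal(size=8)
    print(hybrid_rank("payment API error", query_vec, docs, doc_vecs))
```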
How Meta Trains Large Language Models at Scale
An article on how Meta builds the data infrastructure and reliability tooling needed to train large language models efficiently, and on the distinct challenges posed by models with very large parameter counts.
RAG Evaluation with Prometheus 2
An article detailing, through practical experimentation, how Prometheus 2 can be used with Haystack to evaluate the responses of a RAG pipeline.
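For readers who want a feel for the pattern, here is a minimal, library-agnostic sketch of LLM-as-judge RAG evaluation; the rubric wording and the call_judge_model stub are placeholders and do not reflect the actual Prometheus 2 or Haystack APIs.

```python
RUBRIC = """Score the answer from 1 (poor) to 5 (excellent) for faithfulness:
the answer must be fully supported by the retrieved context."""

PROMPT_TEMPLATE = """{rubric}

Question: {question}
Retrieved context: {context}
Answer: {answer}

Return only the integer score."""

def call_judge_model(prompt: str) -> str:
    # Placeholder: replace with a call to a judge model such as Prometheus 2.
    return "4"

def evaluate_rag_response(question: str, context: str, answer: str) -> int:
    prompt = PROMPT_TEMPLATE.format(
        rubric=RUBRIC, question=question, context=context, answer=answer
    )
    return int(call_judge_model(prompt).strip())

if __name__ == "__main__":
    score = evaluate_rag_response(
        question="When was the library founded?",
        context="The library opened to the public in 1895.",
        answer="It was founded in 1895.",
    )
    print("faithfulness score:", score)
```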
Learning
Uncensor any LLM with abliteration
A technical article exploring "abliteration," a technique that can uncensor any LLM without retraining.
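At a high level, abliteration estimates a "refusal direction" in the residual stream from activations on harmful versus harmless prompts and then projects that direction out. The sketch below shows the core linear algebra on random stand-in activations, assuming a mean-difference direction estimate; it is not the article's code and omits everything model-specific.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    # Difference of mean activations, normalized to a unit vector.
    r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return r / np.linalg.norm(r)

def ablate(acts, r_hat):
    # Remove the component of each activation along the refusal direction.
    return acts - np.outer(acts @ r_hat, r_hat)

def orthogonalize_weight(W, r_hat):
    # Equivalent edit applied to a weight matrix that writes into the residual
    # stream, so the model can no longer express the refusal direction.
    return W - np.outer(r_hat, r_hat) @ W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 16
    harmful = rng.normal(size=(32, d_model))
    harmless = rng.normal(size=(32, d_model))
    r_hat = refusal_direction(harmful, harmless)
    edited = ablate(harmful, r_hat)
    print("component along r after ablation:", np.abs(edited @ r_hat).max())
```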
Introducing Apple’s On-Device and Server Foundation Models
An article detailing the two models that have been built into Apple Intelligence — a ~3 billion parameter on-device language model, and a larger server-based language model running on Apple silicon servers.
Accelerating ML Model Training with Active Learning Techniques
An article providing a detailed explanation of active learning and the techniques used to make ML model training more efficient.
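As a concrete, minimal example of one such technique, the sketch below runs pool-based uncertainty sampling with scikit-learn on synthetic data; the model, query size, and number of rounds are arbitrary choices for illustration, not taken from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy pool-based active learning with uncertainty sampling.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=20, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

for round_id in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    print(f"round {round_id}: {len(labeled)} labels, "
          f"accuracy on full set = {model.score(X, y):.2f}")
    # Query the pool points the current model is least certain about.
    probs = model.predict_proba(X[pool])
    least_certain = np.argsort(probs.max(axis=1))[:10]
    query = [pool[i] for i in least_certain]
    labeled += query                      # simulate sending these for annotation
    pool = [i for i in pool if i not in query]
```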
Libraries & Code
Stability-AI/stable-audio-tools
Generative models for conditional audio generation.
Interact, analyze and structure massive text, image, embedding, audio and video datasets.
Papers & Publications
CodeR: Issue Resolving with Multi-Agent and Task Graphs
Abstract:
GitHub issue resolving has recently attracted significant attention from academia and industry. SWE-bench has been proposed to measure performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within a code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues when submitting only once for each issue. We examine the performance impact of each design choice in CodeR and offer insights to advance this research direction.
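The abstract does not spell out the agents or graph structure, so the sketch below is purely illustrative: it wires a few hypothetical agent stubs (reproduce, localize, edit, verify) into a pre-defined dependency graph and runs them in dependency order. This shows the general shape of a multi-agent, task-graph approach rather than CodeR's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical agent roles; each stub would wrap an LLM call in a real system.
def reproduce(state: dict) -> dict:
    return {**state, "repro": f"test that triggers: {state['title']}"}

def localize(state: dict) -> dict:
    return {**state, "files": ["src/payments.py"]}   # placeholder fault localization

def edit(state: dict) -> dict:
    return {**state, "patch": f"candidate fix in {state['files'][0]}"}

def verify(state: dict) -> dict:
    return {**state, "passed": True}                  # would rerun the repro test

# Pre-defined task graph: each task lists the tasks it depends on.
GRAPH: Dict[str, List[str]] = {
    "reproduce": [],
    "localize": ["reproduce"],
    "edit": ["localize"],
    "verify": ["edit"],
}
AGENTS: Dict[str, Callable[[dict], dict]] = {
    "reproduce": reproduce, "localize": localize, "edit": edit, "verify": verify,
}

def run(issue: dict) -> dict:
    state, done = dict(issue), set()
    while len(done) < len(GRAPH):
        for task, deps in GRAPH.items():
            if task not in done and all(d in done for d in deps):
                state = AGENTS[task](state)
                done.add(task)
    return state

print(run({"title": "TypeError when amount is None"}))
```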
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Abstract:
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an affirmative answer to the question of whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance when scaled properly. We reexamine the design space of image tokenizers, the scalability properties of image generation models, and the quality of their training data. The outcome of this exploration consists of: (1) an image tokenizer with a downsample ratio of 16, reconstruction quality of 0.94 rFID, and codebook usage of 97% on the ImageNet benchmark; (2) a series of class-conditional image generation models ranging from 111M to 3.1B parameters that achieve 2.18 FID on the ImageNet 256x256 benchmark, outperforming popular diffusion models such as LDM and DiT; (3) a text-conditional image generation model with 775M parameters, trained in two stages on LAION-COCO and images of high aesthetic quality, demonstrating competitive visual quality and text alignment; and (4) verification that LLM serving frameworks are effective at optimizing the inference speed of image generation models, yielding 326%-414% speedups. We release all models and code to support the open-source community around visual generation and multimodal foundation models.
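To make the next-token-prediction-over-image-tokens idea concrete, here is a toy sampling loop over a 16x16 grid of codebook indices (matching the 16x downsample of a 256x256 image). The random-logits placeholder stands in for a Llama-style transformer, and a real pipeline would decode the sampled tokens back to pixels with the image tokenizer; none of this is the paper's code.

```python
import numpy as np

CODEBOOK_SIZE = 1024   # assumed VQ codebook size for illustration
GRID = 16              # 256x256 image at a 16x downsample ratio

rng = np.random.default_rng(0)

def next_token_logits(class_id: int, tokens: list) -> np.ndarray:
    # Placeholder conditional "model": random logits instead of a transformer.
    return rng.normal(size=CODEBOOK_SIZE)

def sample_image_tokens(class_id: int, temperature: float = 1.0) -> np.ndarray:
    tokens = []
    for _ in range(GRID * GRID):
        logits = next_token_logits(class_id, tokens) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(CODEBOOK_SIZE, p=probs)))
    return np.array(tokens).reshape(GRID, GRID)

print(sample_image_tokens(class_id=207).shape)  # (16, 16) grid of codebook ids
```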
TextGrad: Automatic "Differentiation" via Text
Abstract:
AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in their early days, until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic "differentiation" via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural-language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstractions and is flexible and easy to use. It works out of the box for a variety of tasks, where users only provide the objective function without tuning the framework's components or prompts. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o on Google-Proof Question Answering from 51% to 55%, yields a 20% relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new drug-like small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next generation of AI systems.
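As a rough sketch of the backpropagate-textual-feedback loop (not the TextGrad library's actual API), the snippet below alternates a critic call that produces natural-language "gradient" feedback with an editor call that applies it to a variable; both LLM calls are stubbed out and purely hypothetical.

```python
def critic(variable: str, objective: str) -> str:
    # Placeholder for an LLM that returns natural-language feedback on the
    # variable with respect to the objective (the "textual gradient").
    return "Be more specific and state the evaluation criterion explicitly."

def editor(variable: str, feedback: str) -> str:
    # Placeholder for an LLM that rewrites the variable using the feedback.
    return variable + " (revised: " + feedback + ")"

def textual_gradient_descent(variable: str, objective: str, steps: int = 3) -> str:
    for _ in range(steps):
        feedback = critic(variable, objective)   # "backward": compute textual gradient
        variable = editor(variable, feedback)    # "step": update the variable
    return variable

prompt = "Answer the question."
print(textual_gradient_descent(prompt, objective="Maximize answer accuracy."))
```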