Deep Learning Weekly: Issue 358
Sakana AI to become unicorn in under a year, RAG Evaluation with Prometheus 2, Apple’s On-Device and Server Foundation Models, Llama for Scalable Image Generation, and many more!
This week in deep learning, we bring you Japan's Sakana AI by Google alums to become unicorn in under a year, RAG Evaluation with Prometheus 2, Introducing Apple’s On-Device and Server Foundation Models, and a paper titled Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation.
You may also enjoy Paris-based AI startup Mistral AI raises $640M, Uncensor any LLM with abliteration, a paper titled CodeR: Issue Resolving with Multi-Agent and Task Graphs, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Japan's Sakana AI by Google Alums to Become Unicorn in Under a Year
Sakana AI, a foundation model developer focused on evolution-inspired AI, is planning a funding round that would value it at over $1 billion.
Paris-Based AI Startup Mistral AI Raises $640M
Mistral AI has closed its much-rumored Series B funding round, raising around $640 million in a mix of equity and debt.
NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
NVIDIA announced Nemotron-4 340B, a family of open models that developers can use to generate synthetic data for training large language models.
From Bytes to Bushels: How Gen AI Can Shape the Future of Agriculture
A McKinsey article on how generative AI can reshape agriculture by enhancing decision-making, increasing efficiency, and creating new economic opportunities for farmers.
MLOps & LLMOps
Task-Aware RAG Strategies for When Sentence Similarity Fails
An article exploring advanced RAG strategies that improve retrieval performance by incorporating signals beyond sentence similarity, addressing the limitations of similarity-only retrieval.
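As a rough illustration of the general idea (not code from the article), the sketch below blends dense cosine similarity with a simple keyword-overlap signal so that ranking is not driven by sentence similarity alone; the corpus, stand-in embeddings, and the alpha weighting are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_score(query, doc):
    # Fraction of query terms appearing in the document (toy lexical signal).
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.6):
    # Blend dense similarity with the lexical signal; alpha is a tunable weight.
    scores = [
        alpha * cosine(query_vec, dv) + (1 - alpha) * keyword_score(query, d)
        for d, dv in zip(docs, doc_vecs)
    ]
    return sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = ["error codes for the payment API", "company holiday policy"]
    doc_vecs = rng.normal(size=(2, 8))   # stand-ins for real embeddings
    query_vec = rng.normal(size=8)
    print(hybrid_rank("payment API error", query_vec, docs, doc_vecs))
```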
How Meta Trains Large Language Models at Scale
An article on how Meta builds the data infrastructure and reliability tooling needed to train large language models efficiently, and on the distinct challenges posed by models with very large parameter counts.
RAG Evaluation with Prometheus 2
An article detailing, through practical experimentation, how Prometheus 2 can be used with Haystack to evaluate the responses of a RAG pipeline.
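For readers who want a feel for the pattern, here is a minimal, library-agnostic sketch of LLM-as-judge RAG evaluation; the rubric wording and the call_judge_model stub are placeholders and do not reflect the actual Prometheus 2 or Haystack APIs.

```python
RUBRIC = """Score the answer from 1 (poor) to 5 (excellent) for faithfulness:
the answer must be fully supported by the retrieved context."""

PROMPT_TEMPLATE = """{rubric}

Question: {question}
Retrieved context: {context}
Answer: {answer}

Return only the integer score."""

def call_judge_model(prompt: str) -> str:
    # Placeholder: replace with a call to a judge model such as Prometheus 2.
    return "4"

def evaluate_rag_response(question: str, context: str, answer: str) -> int:
    prompt = PROMPT_TEMPLATE.format(
        rubric=RUBRIC, question=question, context=context, answer=answer
    )
    return int(call_judge_model(prompt).strip())

if __name__ == "__main__":
    score = evaluate_rag_response(
        question="When was the library founded?",
        context="The library opened to the public in 1895.",
        answer="It was founded in 1895.",
    )
    print("faithfulness score:", score)
```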
Learning
Uncensor any LLM with abliteration
A technical article exploring "abliteration," a technique that can uncensor any LLM without retraining.
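At a high level, abliteration estimates a "refusal direction" in the residual stream from activations on harmful versus harmless prompts and then projects that direction out. The sketch below shows the core linear algebra on random stand-in activations, assuming a mean-difference direction estimate; it is not the article's code and omits everything model-specific.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    # Difference of mean activations, normalized to a unit vector.
    r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return r / np.linalg.norm(r)

def ablate(acts, r_hat):
    # Remove the component of each activation along the refusal direction.
    return acts - np.outer(acts @ r_hat, r_hat)

def orthogonalize_weight(W, r_hat):
    # Equivalent edit applied to a weight matrix that writes into the residual
    # stream, so the model can no longer express the refusal direction.
    return W - np.outer(r_hat, r_hat) @ W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model = 16
    harmful = rng.normal(size=(32, d_model))
    harmless = rng.normal(size=(32, d_model))
    r_hat = refusal_direction(harmful, harmless)
    edited = ablate(harmful, r_hat)
    print("component along r after ablation:", np.abs(edited @ r_hat).max())
```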
Introducing Apple’s On-Device and Server Foundation Models
An article detailing the two models that have been built into Apple Intelligence — a ~3 billion parameter on-device language model, and a larger server-based language model running on Apple silicon servers.
Accelerating ML Model Training with Active Learning Techniques
An article providing a detailed explanation of active learning and the techniques used to make ML model training more efficient.
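As a concrete, minimal example of one such technique, the sketch below runs pool-based uncertainty sampling with scikit-learn on synthetic data; the model, query size, and number of rounds are arbitrary choices for illustration, not taken from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy pool-based active learning with uncertainty sampling.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

labeled = list(rng.choice(len(X), size=20, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

for round_id in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    print(f"round {round_id}: {len(labeled)} labels, "
          f"accuracy on full set = {model.score(X, y):.2f}")
    # Query the pool points the current model is least certain about.
    probs = model.predict_proba(X[pool])
    least_certain = np.argsort(probs.max(axis=1))[:10]
    query = [pool[i] for i in least_certain]
    labeled += query                      # simulate sending these for annotation
    pool = [i for i in pool if i not in query]
```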
Libraries & Code
Stability-AI/stable-audio-tools
Generative models for conditional audio generation.
Interact, analyze and structure massive text, image, embedding, audio and video datasets.
Papers & Publications
CodeR: Issue Resolving with Multi-Agent and Task Graphs
Abstract:
GitHub issue resolving has recently attracted significant attention from academia and industry. SWE-bench has been proposed to measure performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within a code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues when submitting only once for each issue. We examine the performance impact of each design choice in CodeR and offer insights to advance this research direction.
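The abstract does not spell out the agents or graph structure, so the sketch below is purely illustrative: it wires a few hypothetical agent stubs (reproduce, localize, edit, verify) into a pre-defined dependency graph and runs them in dependency order. This shows the general shape of a multi-agent, task-graph approach rather than CodeR's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical agent roles; each stub would wrap an LLM call in a real system.
def reproduce(state: dict) -> dict:
    return {**state, "repro": f"test that triggers: {state['title']}"}

def localize(state: dict) -> dict:
    return {**state, "files": ["src/payments.py"]}   # placeholder fault localization

def edit(state: dict) -> dict:
    return {**state, "patch": f"candidate fix in {state['files'][0]}"}

def verify(state: dict) -> dict:
    return {**state, "passed": True}                  # would rerun the repro test

# Pre-defined task graph: each task lists the tasks it depends on.
GRAPH: Dict[str, List[str]] = {
    "reproduce": [],
    "localize": ["reproduce"],
    "edit": ["localize"],
    "verify": ["edit"],
}
AGENTS: Dict[str, Callable[[dict], dict]] = {
    "reproduce": reproduce, "localize": localize, "edit": edit, "verify": verify,
}

def run(issue: dict) -> dict:
    state, done = dict(issue), set()
    while len(done) < len(GRAPH):
        for task, deps in GRAPH.items():
            if task not in done and all(d in done for d in deps):
                state = AGENTS[task](state)
                done.add(task)
    return state

print(run({"title": "TypeError when amount is None"}))
```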
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Abstract:
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an affirmative answer to the question of whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance when scaled properly. We reexamine the design space of image tokenizers, the scalability properties of image generation models, and the quality of their training data. The outcome of this exploration consists of: (1) an image tokenizer with a downsample ratio of 16, reconstruction quality of 0.94 rFID, and codebook usage of 97% on the ImageNet benchmark; (2) a series of class-conditional image generation models ranging from 111M to 3.1B parameters that achieve 2.18 FID on the ImageNet 256x256 benchmark, outperforming popular diffusion models such as LDM and DiT; (3) a text-conditional image generation model with 775M parameters, trained in two stages on LAION-COCO and images of high aesthetic quality, demonstrating competitive visual quality and text alignment; and (4) verification that LLM serving frameworks are effective at optimizing the inference speed of image generation models, yielding 326%-414% speedups. We release all models and code to support the open-source community around visual generation and multimodal foundation models.
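To make the next-token-prediction-over-image-tokens idea concrete, here is a toy sampling loop over a 16x16 grid of codebook indices (matching the 16x downsample of a 256x256 image). The random-logits placeholder stands in for a Llama-style transformer, and a real pipeline would decode the sampled tokens back to pixels with the image tokenizer; none of this is the paper's code.

```python
import numpy as np

CODEBOOK_SIZE = 1024   # assumed VQ codebook size for illustration
GRID = 16              # 256x256 image at a 16x downsample ratio

rng = np.random.default_rng(0)

def next_token_logits(class_id: int, tokens: list) -> np.ndarray:
    # Placeholder conditional "model": random logits instead of a transformer.
    return rng.normal(size=CODEBOOK_SIZE)

def sample_image_tokens(class_id: int, temperature: float = 1.0) -> np.ndarray:
    tokens = []
    for _ in range(GRID * GRID):
        logits = next_token_logits(class_id, tokens) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(CODEBOOK_SIZE, p=probs)))
    return np.array(tokens).reshape(GRID, GRID)

print(sample_image_tokens(class_id=207).shape)  # (16, 16) grid of codebook ids
```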
TextGrad: Automatic "Differentiation" via Text
Abstract:
AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in their early days, until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic "differentiation" via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural-language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstractions and is flexible and easy to use. It works out of the box for a variety of tasks, where users only provide the objective function without tuning the framework's components or prompts. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o on Google-Proof Question Answering from 51% to 55%, yields a 20% relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new drug-like small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next generation of AI systems.
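As a rough sketch of the backpropagate-textual-feedback loop (not the TextGrad library's actual API), the snippet below alternates a critic call that produces natural-language "gradient" feedback with an editor call that applies it to a variable; both LLM calls are stubbed out and purely hypothetical.

```python
def critic(variable: str, objective: str) -> str:
    # Placeholder for an LLM that returns natural-language feedback on the
    # variable with respect to the objective (the "textual gradient").
    return "Be more specific and state the evaluation criterion explicitly."

def editor(variable: str, feedback: str) -> str:
    # Placeholder for an LLM that rewrites the variable using the feedback.
    return variable + " (revised: " + feedback + ")"

def textual_gradient_descent(variable: str, objective: str, steps: int = 3) -> str:
    for _ in range(steps):
        feedback = critic(variable, objective)   # "backward": compute textual gradient
        variable = editor(variable, feedback)    # "step": update the variable
    return variable

prompt = "Answer the question."
print(textual_gradient_descent(prompt, objective="Maximize answer accuracy."))
```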