Deep Learning Weekly: Issue 396
Gemma 3, Improving Recommendation Systems & Search in the Age of LLMs, a paper on Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, and many more!
This week in deep learning, we bring you Gemma 3, Improving Recommendation Systems & Search in the Age of LLMs, and a paper on Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning.
You may also enjoy Mistral Small 3.1, Scaling Recommendation Systems Training to Thousands of GPUs with 2D Sparse Parallelism, a paper on Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing Gemma 3: The most capable model you can run on a single GPU or TPU
Google introduced Gemma 3, a collection of lightweight, state-of-the-art open models built from the same research and technology that powers Gemini 2.0 models.
Mistral Small 3.1
The Mistral team announced Mistral Small 3.1, which outperforms comparable models like Gemma 3 and GPT-4o Mini while delivering inference speeds of 150 tokens per second.
AI code assist startup Graphite raises $52M to try and keep ahead of the competition
An AI-powered code review startup called Graphite has raised $52 million in a Series B round of funding led by Accel.
Nvidia's new reasoning models and building blocks pave the way for advanced AI agents
NVIDIA unveiled a new family of Llama Nemotron AI models with advanced reasoning capabilities.
Mirage: World’s First Foundation Model for UGC Video
Captions introduced Mirage—the world's first video foundation model designed for generating UGC-style ads and talking content.
MLOps & LLMOps
Improving Recommendation Systems & Search in the Age of LLMs
Eugene Yan discusses how industrial search and recommendation systems have evolved over the past year and covers model architectures, data generation, training paradigms, and unified frameworks.
Introducing the Weaviate Transformation Agent
A Weaviate blog post introducing the Transformation Agent, a tool for agentic database management that uses natural language to transform data in Weaviate collections.
Scaling Recommendation Systems Training to Thousands of GPUs with 2D Sparse Parallelism
A PyTorch blog post that highlights 2D embedding parallel, a novel parallelism strategy that overcomes the sparse scaling challenges inherent in training large recommendation models across thousands of GPUs.
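The core idea behind 2D embedding parallel is to combine row-wise sharding of embedding tables along one axis of the device mesh with replication along the other, so lookups scale past a single sharding group while gradient sync stays cheap. Below is a minimal, framework-free sketch of that layout (the mesh shape, table size, and helper names are illustrative, not the TorchRec API):

```python
import torch

# Illustrative 2D mesh: 2 replica groups x 4 shards = 8 "devices" (simulated in-process).
NUM_REPLICAS, NUM_SHARDS = 2, 4
NUM_ROWS, DIM = 1_000, 16  # embedding table size

# Row-wise shard the table along the shard axis of the mesh.
rows_per_shard = NUM_ROWS // NUM_SHARDS
table = torch.randn(NUM_ROWS, DIM)
shards = [table[i * rows_per_shard:(i + 1) * rows_per_shard] for i in range(NUM_SHARDS)]

# Each replica group holds the same shards, so gradients only need to be
# synchronized across the (small) replica axis rather than all devices.
replica_groups = [shards for _ in range(NUM_REPLICAS)]

def lookup(ids: torch.Tensor) -> torch.Tensor:
    """Route each id to the shard that owns its row range (an all-to-all in a real system)."""
    out = torch.empty(len(ids), DIM)
    for j, idx in enumerate(ids):
        shard_id = idx.item() // rows_per_shard
        out[j] = replica_groups[0][shard_id][idx.item() % rows_per_shard]
    return out

print(lookup(torch.tensor([3, 997, 512])).shape)  # torch.Size([3, 16])
```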
Learning
The State of LLM Reasoning Models
An article that explores recent research advancements in reasoning-optimized LLMs, with a particular focus on the inference-time compute scaling methods that have emerged since the release of DeepSeek R1.
Google Gemma 3 Function Calling Example
A practical blog post demonstrating how to implement function calling with Google's Gemma 3 model using Python and the GenAI API.
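For a sense of what this looks like in practice, here is a minimal sketch using the google-genai Python SDK with prompt-based function calling, a common approach for Gemma models. The model name, tool schema, and prompt convention are assumptions for illustration and may differ from the post's exact setup:

```python
import json
from google import genai  # pip install google-genai

client = genai.Client(api_key="YOUR_API_KEY")

# Describe the available tool in the prompt and ask the model to reply with a JSON call.
# (The tool name, schema, and JSON-only convention are illustrative assumptions.)
prompt = """You can call this function by replying with JSON only:
{"name": "get_weather", "arguments": {"city": "<string>"}}

User: What's the weather like in Paris right now?"""

response = client.models.generate_content(model="gemma-3-27b-it", contents=prompt)

text = response.text.strip()
if text.startswith("```"):  # strip an optional markdown fence around the JSON
    text = text.strip("`").removeprefix("json").strip()

try:
    call = json.loads(text)
    print("Call", call["name"], "with", call["arguments"])
except json.JSONDecodeError:
    print("Plain-text reply:", text)
```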
Libraries & Code
An open agentic framework that uses computers like a human.
ReasonGraph is an open-source web platform for visualizing and analyzing the reasoning processes of Large Language Models.
Papers & Publications
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Abstract:
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Retrieval augmentation and tool-use training approaches where a search engine is treated as a tool lack complex multi-turn retrieval flexibility or require large-scale supervised data. Prompting advanced LLMs with reasoning capabilities during inference to use search engines is not optimal, since the LLM does not learn how to optimally interact with the search engine. This paper introduces Search-R1, an extension of the DeepSeek-R1 model where the LLM learns -- solely through reinforcement learning (RL) -- to autonomously generate (multiple) search queries during step-by-step reasoning with real-time retrieval. Search-R1 optimizes LLM rollouts with multi-turn search interactions, leveraging retrieved token masking for stable RL training and a simple outcome-based reward function. Experiments on seven question-answering datasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21% (Qwen2.5-3B), and 10% (LLaMA3.2-3B) over SOTA baselines. This paper further provides empirical insights into RL optimization methods, LLM choices, and response length dynamics in retrieval-augmented reasoning.
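The "retrieved token masking" detail in the abstract is worth unpacking: during RL, tokens copied in from the search engine are excluded from the policy-gradient loss, so the model is only optimized on tokens it actually generated. A rough PyTorch sketch of that masking step (the tensor names and the simple REINFORCE-style objective are illustrative, not the paper's exact implementation):

```python
import torch

# Per-token log-probs of one rollout under the current policy, plus a mask that is
# 1 for model-generated tokens and 0 for tokens pasted in from retrieved passages.
logprobs = torch.randn(12, requires_grad=True)  # [seq_len]
generated_mask = torch.tensor([1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=torch.float)
reward = 1.0  # simple outcome-based reward (e.g., exact match on the final answer)

# REINFORCE-style loss: only model-generated tokens contribute to the gradient.
loss = -(reward * logprobs * generated_mask).sum() / generated_mask.sum()
loss.backward()

# Gradients at retrieved-token positions are exactly zero.
print(logprobs.grad)
```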
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Abstract:
Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation. In this work, we introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models. Block diffusion overcomes key limitations of both approaches by supporting flexible-length generation and improving inference efficiency with KV caching and parallel token sampling. We propose a recipe for building effective block diffusion models that includes an efficient training algorithm, estimators of gradient variance, and data-driven noise schedules to minimize the variance. Block diffusion sets a new state-of-the-art performance among diffusion models on language modeling benchmarks and enables generation of arbitrary-length sequences.
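To make the interpolation concrete: block diffusion generates a sequence one block at a time, running a short discrete-denoising loop inside each block while conditioning autoregressively on all previously committed blocks, which is what makes KV caching applicable. A schematic sketch of that control flow with a stubbed denoiser (the block size, step count, and denoise function are placeholders, not the paper's model):

```python
import torch

VOCAB, BLOCK, STEPS, MASK_ID = 100, 4, 3, -1

def denoise_step(prefix: torch.Tensor, block: torch.Tensor, step: int) -> torch.Tensor:
    """Placeholder denoiser: the real model is a transformer that attends to `prefix`
    through a KV cache and predicts cleaner tokens for the current block in parallel."""
    logits = torch.randn(len(block), VOCAB)
    still_masked = block == MASK_ID
    num_to_fill = max(1, still_masked.sum().item() // (STEPS - step))  # unmask gradually
    fill_positions = still_masked.nonzero().flatten()[:num_to_fill]
    block = block.clone()
    block[fill_positions] = logits[fill_positions].argmax(dim=-1)
    return block

sequence = torch.empty(0, dtype=torch.long)
for _ in range(3):                               # blocks are generated autoregressively
    block = torch.full((BLOCK,), MASK_ID)        # each block starts fully masked
    for step in range(STEPS):                    # short denoising loop inside the block
        block = denoise_step(sequence, block, step)
    sequence = torch.cat([sequence, block])      # commit the block; the cached prefix grows
print(sequence.tolist())
```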