Deep Learning Weekly: Issue 394
LLM Evaluation Frameworks: Head-to-Head Comparison, TAID: A Novel Method for Efficient Knowledge Transfer from Large Language Models to Small Language Models, and a paper on Fractal Generative Models.
This week in deep learning, we bring you LLM Evaluation Frameworks: Head-to-Head Comparison, TAID: A Novel Method for Efficient Knowledge Transfer from Large Language Models to Small Language Models, and a paper on Fractal Generative Models.
You may also enjoy Introducing NextGenAI: A consortium to advance research and education with AI, LLM Evaluation Frameworks: Head-to-Head Comparison, a paper on self-rewarding correction for mathematical reasoning, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing NextGenAI: A consortium to advance research and education with AI
OpenAI launched NextGenAI, a first-of-its-kind consortium with 15 leading research institutions dedicated to using AI to accelerate research breakthroughs and transform education.
Salesforce launches AI agent skills marketplace with 200+ initial partners
Salesforce announced the launch of AgentExchange, a trusted marketplace that allows enterprise customers to expand the capabilities of Agentforce AI agents.
AI Model Offers Conservationists New Tools to Protect Fisheries, Wildlife at Scale
The Allen Institute for AI recently released a lightweight model named Atlantes to analyze more than five billion GPS signals a day emanating from the world’s nearly 600,000 ocean-going vessels.
CoreWeave acquires AI developer platform Weights & Biases
NVIDIA-backed data center company CoreWeave has acquired AI developer platform Weights & Biases for an undisclosed sum.
MLOps & LLMOps
LLM Evaluation Frameworks: Head-to-Head Comparison
An article comparing several leading LLM evaluation frameworks, examining their core features, performance, usability, and unique functionalities.
LLM Agents in Production: Architectures, Challenges, and Best Practices
An article from ZenML discussing the architectures, tools, challenges, and best practices for deploying large language model (LLM) agents in production environments.
An informative blog post from PyTorch details various techniques like ahead-of-time compilation and reduced precision to achieve fast and low-latency inference with Segment Anything 2 (SAM2).
Learning
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality
A detailed blog post from Hugging Face announcing the release of Aya Vision, Cohere For AI's new family of multilingual and multimodal vision-language models.
TAID: A Novel Method for Efficient Knowledge Transfer from Large Language Models to Small Language Models
Sakana’s post introduces TAID, a novel method for efficient knowledge transfer from large language models to smaller language models.
Libraries & Code
A comprehensive and open suite of video foundation models that pushes the boundaries of video generation.
A Python package for approximating any-order Shapley interactions and more.
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Papers & Publications
Self-rewarding correction for mathematical reasoning
Abstract:
We study self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback. This integrated approach allows a single model to independently guide its reasoning process, offering computational advantages for model deployment. We particularly focus on the representative task of self-correction, where models autonomously detect errors in their responses, revise outputs, and decide when to terminate iterative refinement loops. To enable this, we propose a two-stage algorithmic framework for constructing self-rewarding reasoning models using only self-generated data. In the first stage, we employ sequential rejection sampling to synthesize long chain-of-thought trajectories that incorporate both self-rewarding and self-correction mechanisms. Fine-tuning models on this curated data allows them to learn the patterns of self-rewarding and self-correction. In the second stage, we further enhance the models' ability to assess response accuracy and refine outputs through reinforcement learning with rule-based signals. Experiments with Llama-3 and Qwen-2.5 demonstrate that our approach surpasses intrinsic self-correction capabilities and achieves performance comparable to systems that rely on external reward models.
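The self-correction behavior the abstract describes — generate, self-evaluate, revise, and decide when to stop — can be sketched as a simple inference loop. This is a minimal illustration, not the paper's implementation; `generate` and `self_evaluate` are hypothetical stand-ins for calls to the fine-tuned model.

```python
# Hedged sketch of the self-rewarding self-correction loop: the model
# produces an answer, scores its own output (no external reward model),
# and revises until it judges the answer correct or hits an iteration cap.

def generate(prompt: str) -> str:
    # Placeholder for an LLM call producing step-by-step reasoning + answer.
    return f"answer({prompt})"

def self_evaluate(prompt: str, answer: str) -> bool:
    # Placeholder for the model's own correctness verdict on its answer.
    return answer.endswith(")")

def self_correct(prompt: str, max_rounds: int = 3) -> str:
    answer = generate(prompt)
    for _ in range(max_rounds):
        if self_evaluate(prompt, answer):
            break  # model judges its answer correct: terminate refinement
        # Otherwise, revise: condition the next attempt on the failed one.
        answer = generate(prompt + "\nPrevious attempt:\n" + answer)
    return answer
```

The rule-based reinforcement learning in the paper's second stage trains the model so that `self_evaluate` and the revisions it triggers are actually reliable; the loop itself stays this simple.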
Fractal Generative Models
Abstract:
Modularization is a cornerstone of computer science, abstracting complex functions into atomic building blocks. In this paper, we introduce a new level of modularization by abstracting generative models into atomic generative modules. Analogous to fractals in mathematics, our method constructs a new type of generative model by recursively invoking atomic generative modules, resulting in self-similar fractal architectures that we call fractal generative models. As a running example, we instantiate our fractal framework using autoregressive models as the atomic generative modules and examine it on the challenging task of pixel-by-pixel image generation, demonstrating strong performance in both likelihood estimation and generation quality. We hope this work could open a new paradigm in generative modeling and provide a fertile ground for future research.