Deep Learning Weekly: Issue 430
Kimi K2 Thinking, Nested Learning: A new ML paradigm for continual learning, a paper on SPICE: Self-Play In Corpus Environments Improves Reasoning, and many more!
This week in deep learning, we bring you Kimi K2 Thinking, Nested Learning: A new ML paradigm for continual learning, and a paper on SPICE: Self-Play In Corpus Environments Improves Reasoning.
You may also enjoy Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages, TabPFN-2.5 Model Report, a paper on Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Kimi K2 Thinking
The Moonshot team introduced Kimi K2 Thinking, an open-source thinking model that sets new records across benchmarks that assess reasoning, coding, and agent capabilities.
Omnilingual ASR: Advancing Automatic Speech Recognition for 1,600+ Languages
Meta introduced Omnilingual Automatic Speech Recognition (ASR), a suite of models providing automatic speech recognition capabilities for more than 1,600 languages.
TabPFN-2.5 Model Report
Prior Labs released TabPFN-2.5, a tabular foundation model that matches complex AutoGluon ensembles while scaling to 50,000 samples and 2,000 features.
Anthropic and Iceland announce one of the world’s first national AI education pilots
Anthropic and Iceland’s Ministry of Education and Children announced a partnership to bring Claude to teachers across the nation, launching one of the world’s first comprehensive national AI education pilots.
AI-powered visual presentation platform Gamma raises $68M at $2.1B valuation
Gamma announced that it has raised $68 million, led by Andreessen Horowitz, at a valuation of $2.1 billion.
MLOps & LLMOps
Human-in-the-Loop Review Workflows for LLM Applications & Agents
A blog post explaining Human-in-the-Loop review workflows, including systematic tracing and structured rubric design.
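Structured rubric design typically reduces each human review to a few fixed criteria scored on a shared scale, tied back to a logged trace. A minimal sketch of that idea (the criterion names and the 1-5 scale are illustrative assumptions, not the post's actual rubric):

```python
# Sketch: a structured review rubric for human-in-the-loop LLM review.
# Criteria and scoring scheme are assumptions for illustration only.
from dataclasses import dataclass, field


@dataclass
class RubricReview:
    trace_id: str  # links the review back to a logged application trace
    scores: dict[str, int] = field(default_factory=dict)  # criterion -> 1..5
    notes: str = ""

    def add(self, criterion: str, score: int) -> None:
        """Record one criterion score, enforcing the shared 1-5 scale."""
        if not 1 <= score <= 5:
            raise ValueError("scores are on a 1-5 scale")
        self.scores[criterion] = score

    def overall(self) -> float:
        """Unweighted average across criteria; a real rubric might weight them."""
        return sum(self.scores.values()) / len(self.scores)
```

Keeping scores keyed by trace ID is what makes reviews aggregable across an application's runs rather than one-off judgments.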
Building powerful RAG pipelines with Docling and OpenSearch
A technical blog post detailing how to build RAG pipelines by integrating the Docling document processing toolkit with OpenSearch for high-performance, metadata-aware vector retrieval.
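The core of such a pipeline is turning converted documents into chunks and shipping them, with metadata and an embedding field, to a k-NN index. A self-contained sketch of that step (the chunking scheme and payload shapes are assumptions, not the post's exact pipeline; the Docling conversion and the actual `opensearch-py` client call are omitted):

```python
# Sketch: preparing document chunks as OpenSearch bulk-API actions
# with a knn_vector embedding field and retrieval metadata.

def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def to_bulk_actions(chunks: list[str], source: str, embed) -> list[dict]:
    """Build action/document pairs in the shape OpenSearch's bulk API expects."""
    actions = []
    for i, chunk in enumerate(chunks):
        actions.append({"index": {"_index": "docs", "_id": f"{source}-{i}"}})
        actions.append({
            "text": chunk,
            "metadata": {"source": source, "chunk_id": i},  # metadata-aware filtering
            "embedding": embed(chunk),                       # knn_vector field
        })
    return actions
```

In a full pipeline, Docling's converter would supply the text and an `opensearch-py` client would index the actions against a mapping whose `embedding` field is declared as `knn_vector`.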
Where to use sub-agents versus agents as tools
A blog post explaining the key difference between sub-agents and agents as tools in multi-agent systems.
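The distinction comes down to who keeps control: an agent used as a tool answers a single call and hands the result back to the parent, while a sub-agent takes over the task entirely. A minimal sketch (the `Agent` class is hypothetical, not any framework's API):

```python
# Sketch: contrasting the two multi-agent delegation patterns.

class Agent:
    """A hypothetical agent that applies one skill function to a task."""

    def __init__(self, name: str, skill):
        self.name = name
        self.skill = skill

    def run(self, task: str) -> str:
        return self.skill(task)


def agent_as_tool(parent: Agent, tool_agent: Agent, task: str) -> str:
    """Pattern 1: the parent invokes the child for one stateless answer,
    then keeps control and integrates the result itself."""
    partial = tool_agent.run(task)
    return parent.run(partial)


def delegate_to_subagent(sub_agent: Agent, task: str) -> str:
    """Pattern 2: control transfers to the sub-agent, which owns the task
    until it is finished; the parent sees only the final outcome."""
    return sub_agent.run(task)
```

The tool pattern suits narrow lookups the parent composes; the sub-agent pattern suits multi-step work where the child needs its own context and control flow.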
Learning
Best LLM Observability Tools of 2025: Top Platforms & Features
Learn about the top LLM observability tools of 2025, including Opik, Langfuse, and Datadog, to monitor, evaluate, and optimize model performance.
Nested Learning: A new ML paradigm for continual learning
A research blog post introducing the Nested Learning paradigm, which unifies model architecture and optimization as interconnected problems to create continuum memory systems.
5 Thoughts on Kimi K2 Thinking - by Nathan Lambert
An article providing a few quick reactions and a technical analysis of the Kimi K2 Thinking model, highlighting its strong performance and the growing influence of Chinese AI labs.
Libraries & Code
An open-source LLM evaluation tool used to debug, evaluate, and monitor LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Papers & Publications
SPICE: Self-Play In Corpus Environments Improves Reasoning
Abstract:
Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts in two roles: a Challenger that mines documents from a large corpus to generate diverse reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics, the Challenger creates an automatic curriculum at the frontier of the Reasoner’s capability, while corpus grounding provides the rich, near-inexhaustible external signal necessary for sustained improvement. Unlike existing ungrounded self-play methods that offer more limited benefits, SPICE achieves consistent gains across mathematical (+8.9%) and general reasoning (+9.8%) benchmarks on multiple model families. Our analysis reveals how document grounding is a key ingredient in SPICE to continuously generate its own increasingly challenging goals and achieve them, enabling sustained self-improvement.
Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
Abstract:
As large language models (LLMs) are increasingly used in morally sensitive domains, it is crucial to understand how persona traits affect their moral reasoning and persuasive behavior. We present the first large-scale study of multi-dimensional persona effects in AI-AI debates over real-world moral dilemmas. Using a 6-dimensional persona space (age, gender, country, social class, ideology, and personality), we simulate structured debates between AI agents over 131 relationship-based cases. Our results show that personas affect initial moral stances and debate outcomes, with political ideology and personality traits exerting the strongest influence. Persuasive success varies across traits, with liberal and open personalities reaching higher consensus. While logit-based confidence grows during debates, emotional and credibility-based appeals diminish, indicating more tempered argumentation over time. These trends mirror findings from psychology and cultural studies, reinforcing the need for persona-aware evaluation frameworks for AI moral reasoning.


