Deep Learning Weekly: Issue 405
OpenAI Codex, the Opik Agent Optimizer Public Beta, a paper on Parallel Scaling Law for Language Models, and many more!
This week in deep learning, we bring you OpenAI Codex, the Opik Agent Optimizer Public Beta, and a paper on Parallel Scaling Law for Language Models.
You may also enjoy AlphaEvolve, The State of LLM Reasoning Model Inference, a paper on From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
OpenAI launched a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel.
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
DeepMind announced AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization.
Transforming R&D with agentic AI: Introducing Microsoft Discovery
Microsoft announced a new enterprise agentic platform called Microsoft Discovery to accelerate research and development (R&D).
Fuel your creativity with new generative media models and tools
Google introduced Veo 3, Imagen 4, and a new filmmaking tool called Flow.
MLOps & LLMOps
Announcing the Opik Agent Optimizer Public Beta
The Comet team has released the public beta of Opik Agent Optimizer, a new suite of tools designed to automate and elevate your prompt and agent optimization workflows.
Deploy AI Pipelines Faster with Hayhooks
The Haystack team announced Hayhooks, an open-source package that can turn Haystack pipelines into production-ready REST APIs or expose them as MCP tools with full customization and minimal code.
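To give a sense of what "pipeline as a REST API" means in practice, here is a minimal conceptual sketch using FastAPI and a placeholder pipeline object. It is not the Hayhooks wrapper API, which exists precisely so you don't write this boilerplate by hand; see the Hayhooks docs for the real interface.

```python
# Conceptual sketch only: expose a (placeholder) Haystack-style pipeline over HTTP.
# Hayhooks automates this kind of wrapping; the FastAPI code below just illustrates the idea.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class QueryRequest(BaseModel):
    query: str


class FakePipeline:
    """Stand-in for a Haystack Pipeline; replace with your own pipeline object."""

    def run(self, data: dict) -> dict:
        return {"answer": f"echo: {data['query']}"}


pipeline = FakePipeline()


@app.post("/run")
def run_pipeline(request: QueryRequest) -> dict:
    # Hayhooks generates endpoints like this from a pipeline definition automatically.
    return pipeline.run({"query": request.query})
```

If saved as `main.py`, the sketch can be served with `uvicorn main:app`; the appeal of Hayhooks is skipping this endpoint code entirely.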
Learning
The State of LLM Reasoning Model Inference
An article that explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling.
Lilian Weng explores how giving LLMs extra "thinking time" and enabling them to show intermediate steps (like Chain-of-Thought) significantly improves their ability to solve complex problems.
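As a concrete, heavily simplified illustration of trading inference-time compute for accuracy, the sketch below uses self-consistency: sample several chain-of-thought completions and majority-vote their final answers. The `generate` function is a placeholder for whichever LLM client you use, and the answer-extraction convention is an assumption made for the example.

```python
# Conceptual sketch of inference-time scaling via self-consistency:
# sample several chain-of-thought completions and majority-vote the final answers.
from collections import Counter


def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder LLM call; swap in your provider's chat completion API.

    Returns a canned chain of thought so the sketch runs end to end.
    """
    return "2 + 2 is 4, and 4 * 3 is 12.\nAnswer: 12"


def extract_answer(completion: str) -> str:
    # Assumes the model is instructed to end with a line "Answer: <value>".
    return completion.rsplit("Answer:", 1)[-1].strip()


def solve_with_self_consistency(question: str, n_samples: int = 5) -> str:
    prompt = (
        f"{question}\n"
        "Think step by step, then finish with a line of the form 'Answer: <value>'."
    )
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    # More samples = more inference-time compute = (often) better accuracy.
    return Counter(answers).most_common(1)[0][0]
```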
Improved Long & Short-Term Memory for LlamaIndex Agents
An article that walks through some of the core features of the new LlamaIndex memory component, and how you can start using it in your own agentic applications.
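To make the short- versus long-term distinction concrete, here is a purely illustrative sketch of an agent memory that keeps a rolling window of recent turns plus a durable fact store. The class and method names are invented for this example and are not the LlamaIndex memory API described in the article.

```python
# Illustrative sketch of short- plus long-term agent memory; NOT the LlamaIndex API.
from collections import deque


class SimpleAgentMemory:
    def __init__(self, short_term_limit: int = 20):
        self.short_term = deque(maxlen=short_term_limit)  # recent chat turns
        self.long_term: list[str] = []                    # durable facts / summaries

    def add_turn(self, role: str, content: str) -> None:
        self.short_term.append((role, content))

    def remember_fact(self, fact: str) -> None:
        # In a real system this might be a vector store or a fact-extraction block.
        self.long_term.append(fact)

    def build_context(self, query: str) -> str:
        facts = "\n".join(f"- {f}" for f in self.long_term)
        history = "\n".join(f"{r}: {c}" for r, c in self.short_term)
        return f"Known facts:\n{facts}\n\nRecent conversation:\n{history}\n\nUser: {query}"
```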
Deeper insights into retrieval augmented generation: The role of sufficient context
The Google Research team introduced a new notion of sufficient context for examining retrieval augmented generation (RAG) systems, developing a method to classify instances by whether the retrieved evidence is enough to answer the query, analyzing common failure modes of RAG systems, and proposing a way to reduce hallucinations.
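A rough sketch of the classification idea: ask a judge model whether the retrieved passages alone contain enough information to answer the query. The prompt wording and the `judge` client below are illustrative placeholders, not the paper's exact autorater.

```python
# Conceptual sketch of a "sufficient context" check for RAG: ask a judge model whether
# the retrieved passages alone contain enough information to answer the query.
def judge(prompt: str) -> str:
    """Placeholder LLM call; returns a canned verdict so the sketch runs end to end."""
    return "SUFFICIENT"


def has_sufficient_context(query: str, passages: list[str]) -> bool:
    prompt = (
        "Question:\n"
        f"{query}\n\n"
        "Retrieved context:\n"
        + "\n---\n".join(passages)
        + "\n\nBased only on the retrieved context, is there enough information to "
          "answer the question? Reply with exactly 'SUFFICIENT' or 'INSUFFICIENT'."
    )
    return judge(prompt).strip().upper().startswith("SUFFICIENT")
```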
Introduction to Context Parallel
A tutorial on Context Parallel, an approach used in large language model training to reduce peak activation size by sharding long input sequences across multiple devices.
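The core data layout can be sketched in a few lines: each rank keeps only a contiguous slice of the token dimension, so activation size scales with `seq_len / world_size`. This toy example shows the sharding only; a real context-parallel implementation also handles attention across shard boundaries and gradient synchronization.

```python
# Conceptual sketch of context parallelism: shard the sequence (token) dimension of a
# long input across devices so each rank only materializes activations for its chunk.
import torch


def shard_sequence(input_ids: torch.Tensor, world_size: int, rank: int) -> torch.Tensor:
    """Return this rank's contiguous slice along the sequence dimension.

    input_ids: shape (batch, seq_len); seq_len is assumed divisible by world_size.
    """
    chunk = input_ids.shape[1] // world_size
    return input_ids[:, rank * chunk : (rank + 1) * chunk]


# Example: a 16-token sequence split across 4 ranks -> each rank sees 4 tokens.
x = torch.arange(16).reshape(1, 16)
shards = [shard_sequence(x, world_size=4, rank=r) for r in range(4)]
```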
Libraries & Code
A Kubernetes-native high-performance distributed LLM inference framework
An LLM serving engine extension that reduces time-to-first-token (TTFT) and increases throughput, especially under long-context scenarios.
Papers & Publications
Parallel Scaling Law for Language Models
Abstract:
It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce the third and more inference-efficient scaling paradigm: increasing the model's parallel computation during both training and inference time. We apply P diverse and learnable transformations to the input, execute forward passes of the model in parallel, and dynamically aggregate the P outputs. This method, namely parallel scaling (ParScale), scales parallel computation by reusing existing parameters and can be applied to any model structure, optimization procedure, data, or task. We theoretically propose a new scaling law and validate it through large-scale pre-training, which shows that a model with P parallel streams is similar to scaling the parameters by O(logP) while showing superior inference efficiency. For example, ParScale can use up to 22× less memory increase and 6× less latency increase compared to parameter scaling that achieves the same performance improvement. It can also recycle an off-the-shelf pre-trained model into a parallelly scaled one by post-training on a small amount of tokens, further reducing the training budget. The new scaling law we discovered potentially facilitates the deployment of more powerful models in low-resource scenarios, and provides an alternative perspective for the role of computation in machine learning.
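A rough reading of the mechanism in code: replicate the input into P streams, perturb each with a learnable transformation, run the shared backbone on all streams, and combine the outputs with learned weights. The module below is a toy illustration of that idea, not the paper's implementation; the shapes, the additive transformation, and the aggregation rule are assumptions made for this example.

```python
# Toy illustration of the ParScale idea described in the abstract: P learnable input
# transformations, a shared backbone run on all P streams, and dynamic aggregation.
import torch
import torch.nn as nn


class ParallelScaledModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_dim: int, num_streams: int = 4):
        super().__init__()
        self.backbone = backbone  # shared, reused parameters
        self.num_streams = num_streams
        # One learnable additive transformation per stream (a stand-in for the
        # "diverse and learnable transformations" in the paper).
        self.stream_offsets = nn.Parameter(torch.randn(num_streams, hidden_dim) * 0.02)
        # Dynamic aggregation: score each stream's output, then softmax-weight them.
        self.aggregator = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim). Replicate into P streams and perturb each one.
        streams = x.unsqueeze(1) + self.stream_offsets           # (batch, P, hidden)
        outputs = self.backbone(streams)                          # (batch, P, hidden)
        weights = torch.softmax(self.aggregator(outputs), dim=1)  # (batch, P, 1)
        return (weights * outputs).sum(dim=1)                     # (batch, hidden)


# Example with a toy backbone that operates on the last dimension.
model = ParallelScaledModel(nn.Linear(64, 64), hidden_dim=64, num_streams=4)
y = model(torch.randn(8, 64))
```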
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery
Abstract:
Large Language Models (LLMs) are catalyzing a paradigm shift in scientific discovery, evolving from task-specific automation tools into increasingly autonomous agents and fundamentally redefining research processes and human-AI collaboration. This survey systematically charts this burgeoning field, placing a central focus on the changing roles and escalating capabilities of LLMs in science. Through the lens of the scientific method, we introduce a foundational three-level taxonomy-Tool, Analyst, and Scientist-to delineate their escalating autonomy and evolving responsibilities within the research lifecycle. We further identify pivotal challenges and future research trajectories such as robotic automation, self-improvement, and ethical governance. Overall, this survey provides a conceptual architecture and strategic foresight to navigate and shape the future of AI-driven scientific discovery, fostering both rapid innovation and responsible advancement.