Deep Learning Weekly: Issue 405
OpenAI Codex, the Opik Agent Optimizer Public Beta, a paper on Parallel Scaling Law for Language Models, and many more!
This week in deep learning, we bring you OpenAI Codex, the Opik Agent Optimizer Public Beta, and a paper on Parallel Scaling Law for Language Models.
You may also enjoy AlphaEvolve, The State of LLM Reasoning Model Inference, a paper on From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
OpenAI launched a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel.
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
DeepMind announced AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization.
Transforming R&D with agentic AI: Introducing Microsoft Discovery
Microsoft announced a new enterprise agentic platform called Microsoft Discovery to accelerate research and development (R&D).
Fuel your creativity with new generative media models and tools
Google introduced Veo 3, Imagen 4, and a new filmmaking tool called Flow.
MLOps & LLMOps
Announcing the Opik Agent Optimizer Public Beta
The Comet team has released the public beta of Opik Agent Optimizer, a new suite of tools designed to automate and elevate your prompt and agent optimization workflows.
Deploy AI Pipelines Faster with Hayhooks
The Haystack team announced Hayhooks, an open-source package that can turn Haystack pipelines into production-ready REST APIs or expose them as MCP tools with full customization and minimal code.
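To give a sense of what "pipeline as a REST API" means in practice, here is a minimal conceptual sketch using FastAPI and a placeholder pipeline object. It is not the Hayhooks wrapper API, which exists precisely so you don't write this boilerplate by hand; see the Hayhooks docs for the real interface.

```python
# Conceptual sketch only: expose a (placeholder) Haystack-style pipeline over HTTP.
# Hayhooks automates this kind of wrapping; the FastAPI code below just illustrates the idea.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class QueryRequest(BaseModel):
    query: str


class FakePipeline:
    """Stand-in for a Haystack Pipeline; replace with your own pipeline object."""

    def run(self, data: dict) -> dict:
        return {"answer": f"echo: {data['query']}"}


pipeline = FakePipeline()


@app.post("/run")
def run_pipeline(request: QueryRequest) -> dict:
    # Hayhooks generates endpoints like this from a pipeline definition automatically.
    return pipeline.run({"query": request.query})
```

If saved as `main.py`, the sketch can be served with `uvicorn main:app`; the appeal of Hayhooks is skipping this endpoint code entirely.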
Learning
The State of LLM Reasoning Model Inference
An article that explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling.
Lilian Weng explores how giving LLMs extra "thinking time" and enabling them to show intermediate steps (like Chain-of-Thought) significantly improves their ability to solve complex problems.
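As a concrete, heavily simplified illustration of trading inference-time compute for accuracy, the sketch below uses self-consistency: sample several chain-of-thought completions and majority-vote their final answers. The `generate` function is a placeholder for whichever LLM client you use, and the answer-extraction convention is an assumption made for the example.

```python
# Conceptual sketch of inference-time scaling via self-consistency:
# sample several chain-of-thought completions and majority-vote the final answers.
from collections import Counter


def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder LLM call; swap in your provider's chat completion API.

    Returns a canned chain of thought so the sketch runs end to end.
    """
    return "2 + 2 is 4, and 4 * 3 is 12.\nAnswer: 12"


def extract_answer(completion: str) -> str:
    # Assumes the model is instructed to end with a line "Answer: <value>".
    return completion.rsplit("Answer:", 1)[-1].strip()


def solve_with_self_consistency(question: str, n_samples: int = 5) -> str:
    prompt = (
        f"{question}\n"
        "Think step by step, then finish with a line of the form 'Answer: <value>'."
    )
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    # More samples = more inference-time compute = (often) better accuracy.
    return Counter(answers).most_common(1)[0][0]
```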
Improved Long & Short-Term Memory for LlamaIndex Agents
An article that walks through some of the core features of the new LlamaIndex memory component, and how you can start using it in your own agentic applications.
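To make the short- versus long-term distinction concrete, here is a purely illustrative sketch of an agent memory that keeps a rolling window of recent turns plus a durable fact store. The class and method names are invented for this example and are not the LlamaIndex memory API described in the article.

```python
# Illustrative sketch of short- plus long-term agent memory; NOT the LlamaIndex API.
from collections import deque


class SimpleAgentMemory:
    def __init__(self, short_term_limit: int = 20):
        self.short_term = deque(maxlen=short_term_limit)  # recent chat turns
        self.long_term: list[str] = []                    # durable facts / summaries

    def add_turn(self, role: str, content: str) -> None:
        self.short_term.append((role, content))

    def remember_fact(self, fact: str) -> None:
        # In a real system this might be a vector store or a fact-extraction block.
        self.long_term.append(fact)

    def build_context(self, query: str) -> str:
        facts = "\n".join(f"- {f}" for f in self.long_term)
        history = "\n".join(f"{r}: {c}" for r, c in self.short_term)
        return f"Known facts:\n{facts}\n\nRecent conversation:\n{history}\n\nUser: {query}"
```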
Deeper insights into retrieval augmented generation: The role of sufficient context
The Google Research team introduced a new notion of sufficient context for examining retrieval augmented generation (RAG) systems, developing a method to classify instances by whether the retrieved evidence is enough to answer the query, analyzing common failure modes of RAG systems, and proposing a way to reduce hallucinations.
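A rough sketch of the classification idea: ask a judge model whether the retrieved passages alone contain enough information to answer the query. The prompt wording and the `judge` client below are illustrative placeholders, not the paper's exact autorater.

```python
# Conceptual sketch of a "sufficient context" check for RAG: ask a judge model whether
# the retrieved passages alone contain enough information to answer the query.
def judge(prompt: str) -> str:
    """Placeholder LLM call; returns a canned verdict so the sketch runs end to end."""
    return "SUFFICIENT"


def has_sufficient_context(query: str, passages: list[str]) -> bool:
    prompt = (
        "Question:\n"
        f"{query}\n\n"
        "Retrieved context:\n"
        + "\n---\n".join(passages)
        + "\n\nBased only on the retrieved context, is there enough information to "
          "answer the question? Reply with exactly 'SUFFICIENT' or 'INSUFFICIENT'."
    )
    return judge(prompt).strip().upper().startswith("SUFFICIENT")
```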
Introduction to Context Parallel
A tutorial on Context Parallel, an approach used in large language model training to reduce peak activation size by sharding long input sequences across multiple devices.
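The core data layout can be sketched in a few lines: each rank keeps only a contiguous slice of the token dimension, so activation size scales with `seq_len / world_size`. This toy example shows the sharding only; a real context-parallel implementation also handles attention across shard boundaries and gradient synchronization.

```python
# Conceptual sketch of context parallelism: shard the sequence (token) dimension of a
# long input across devices so each rank only materializes activations for its chunk.
import torch


def shard_sequence(input_ids: torch.Tensor, world_size: int, rank: int) -> torch.Tensor:
    """Return this rank's contiguous slice along the sequence dimension.

    input_ids: shape (batch, seq_len); seq_len is assumed divisible by world_size.
    """
    chunk = input_ids.shape[1] // world_size
    return input_ids[:, rank * chunk : (rank + 1) * chunk]


# Example: a 16-token sequence split across 4 ranks -> each rank sees 4 tokens.
x = torch.arange(16).reshape(1, 16)
shards = [shard_sequence(x, world_size=4, rank=r) for r in range(4)]
```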
Libraries & Code
A Kubernetes-native high-performance distributed LLM inference framework
An LLM serving engine extension that reduces time-to-first-token (TTFT) and increases throughput, especially under long-context scenarios.
Papers & Publications
Parallel Scaling Law for Language Models
Abstract:
It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce the third and more inference-efficient scaling paradigm: increasing the model's parallel computation during both training and inference time. We apply P diverse and learnable transformations to the input, execute forward passes of the model in parallel, and dynamically aggregate the P outputs. This method, namely parallel scaling (ParScale), scales parallel computation by reusing existing parameters and can be applied to any model structure, optimization procedure, data, or task. We theoretically propose a new scaling law and validate it through large-scale pre-training, which shows that a model with P parallel streams is similar to scaling the parameters by O(logP) while showing superior inference efficiency. For example, ParScale can use up to 22× less memory increase and 6× less latency increase compared to parameter scaling that achieves the same performance improvement. It can also recycle an off-the-shelf pre-trained model into a parallelly scaled one by post-training on a small amount of tokens, further reducing the training budget. The new scaling law we discovered potentially facilitates the deployment of more powerful models in low-resource scenarios, and provides an alternative perspective for the role of computation in machine learning.
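A rough reading of the mechanism in code: replicate the input into P streams, perturb each with a learnable transformation, run the shared backbone on all streams, and combine the outputs with learned weights. The module below is a toy illustration of that idea, not the paper's implementation; the shapes, the additive transformation, and the aggregation rule are assumptions made for this example.

```python
# Toy illustration of the ParScale idea described in the abstract: P learnable input
# transformations, a shared backbone run on all P streams, and dynamic aggregation.
import torch
import torch.nn as nn


class ParallelScaledModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_dim: int, num_streams: int = 4):
        super().__init__()
        self.backbone = backbone  # shared, reused parameters
        self.num_streams = num_streams
        # One learnable additive transformation per stream (a stand-in for the
        # "diverse and learnable transformations" in the paper).
        self.stream_offsets = nn.Parameter(torch.randn(num_streams, hidden_dim) * 0.02)
        # Dynamic aggregation: score each stream's output, then softmax-weight them.
        self.aggregator = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim). Replicate into P streams and perturb each one.
        streams = x.unsqueeze(1) + self.stream_offsets           # (batch, P, hidden)
        outputs = self.backbone(streams)                          # (batch, P, hidden)
        weights = torch.softmax(self.aggregator(outputs), dim=1)  # (batch, P, 1)
        return (weights * outputs).sum(dim=1)                     # (batch, hidden)


# Example with a toy backbone that operates on the last dimension.
model = ParallelScaledModel(nn.Linear(64, 64), hidden_dim=64, num_streams=4)
y = model(torch.randn(8, 64))
```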
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery
Abstract:
Large Language Models (LLMs) are catalyzing a paradigm shift in scientific discovery, evolving from task-specific automation tools into increasingly autonomous agents and fundamentally redefining research processes and human-AI collaboration. This survey systematically charts this burgeoning field, placing a central focus on the changing roles and escalating capabilities of LLMs in science. Through the lens of the scientific method, we introduce a foundational three-level taxonomy-Tool, Analyst, and Scientist-to delineate their escalating autonomy and evolving responsibilities within the research lifecycle. We further identify pivotal challenges and future research trajectories such as robotic automation, self-improvement, and ethical governance. Overall, this survey provides a conceptual architecture and strategic foresight to navigate and shape the future of AI-driven scientific discovery, fostering both rapid innovation and responsible advancement.