Deep Learning Weekly: Issue 424
OpenAI's Sora 2, Developing an open standard for agentic commerce, a paper on Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering, and many more!
This week in deep learning, we bring you OpenAI’s Sora 2, Developing an open standard for agentic commerce, and a paper on Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering.
You may also enjoy Claude Sonnet 4.5, CWM: An Open-Weights LLM for Research on Code Generation with World Models, a paper on Scaling Agents via Continual Pre-training and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
OpenAI released Sora 2, their flagship video and audio generation model.
Anthropic unveiled Claude Sonnet 4.5, which boasts state-of-the-art coding and computer use performance, and accompanies the release of the Claude Agent SDK.
Introducing Liquid Nanos — frontier‑grade performance on everyday devices
The Liquid AI team launched Liquid Nanos — a family of models that deliver frontier‑model quality on specialized, agentic tasks while running directly on embedded devices.
OpenAI adds Instant Checkout shopping feature to ChatGPT
OpenAI launched a new ChatGPT feature that enables users to make online purchases directly in the chatbot’s interface.
Vibe working: Introducing Agent Mode and Office Agent in Microsoft 365 Copilot
The Microsoft team brought “vibe working” to Microsoft 365 Copilot with Agent Mode in Office apps and Office Agent in Copilot chat.
Meta strikes expanded $14.2B AI infrastructure deal with CoreWeave
Shares of CoreWeave jumped more than 11% after the company announced a new multibillion-dollar agreement to provide Meta with AI compute infrastructure.
MLOps & LLMOps
Why Multi-Agent Systems Need Memory Engineering
An article about how shared memory infrastructure is essential for multi-agent AI systems to coordinate effectively and avoid the failures that plague stateless individual agents.
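To make the coordination idea concrete, here is a minimal sketch of a shared, append-only memory store that multiple agents write to instead of keeping private state. The class name and API are hypothetical illustrations of the pattern, not anything from the article.

```python
import threading

class SharedMemory:
    """Minimal shared key-value memory for multi-agent coordination.
    Hypothetical sketch: a lock-protected store plus an append-only log,
    so every agent sees the same state and a full audit trail."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}   # key -> (value, author agent)
        self._log = []     # append-only event log

    def write(self, agent, key, value):
        with self._lock:
            self._store[key] = (value, agent)
            self._log.append((agent, key, value))

    def read(self, key):
        with self._lock:
            entry = self._store.get(key)
            return entry[0] if entry else None

    def history(self):
        with self._lock:
            return list(self._log)

# Two "agents" coordinate through the shared store rather than private state.
mem = SharedMemory()
mem.write("planner", "task", "summarize report")
mem.write("worker", "status", "done")
print(mem.read("task"))       # prints "summarize report": visible to every agent
print(len(mem.history()))     # prints 2: full trail of writes
```

The log is what lets a stateless agent reconstruct what its peers have already done, which is the failure mode the article attributes to purely per-agent state.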
Developing an open standard for agentic commerce
A foundational blog post about the Agentic Commerce Protocol (ACP), an open standard co-developed by Stripe and OpenAI that enables AI agents like ChatGPT to conduct secure, programmatic commerce transactions.
Sandboxing agents at the kernel level
A deep dive motivating kernel-level sandboxing for AI agents by analyzing the Linux open syscall and explaining how containerization technology combines mount namespaces and root changes to enforce file access control.
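The policy the post describes, that a sandboxed process can only reach files under its own root, can be sketched in user space. The check below is purely illustrative: a real mount-namespace/chroot sandbox enforces this in the kernel during path resolution inside `open`, so paths outside the new root simply do not exist.

```python
import os

def is_allowed(path, sandbox_root):
    """Return True if `path` resolves inside `sandbox_root`.
    A user-space sketch of the containment a mount namespace plus a root
    change gives you in the kernel. Illustrative only, not a real sandbox:
    it checks paths but does not prevent anything."""
    real = os.path.realpath(path)          # resolve symlinks and ".." first
    root = os.path.realpath(sandbox_root)
    return real == root or real.startswith(root + os.sep)

print(is_allowed("/srv/agent/work/notes.txt", "/srv/agent"))  # True
print(is_allowed("/srv/agent/../etc/passwd", "/srv/agent"))   # False: ".." escapes the root
```

Resolving the path before comparing prefixes matters: checking the raw string would pass `..` and symlink escapes, which is exactly why the kernel-level version resolves every component against the namespace's root.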
How to Integrate Computer Vision Pipelines with Generative AI and Reasoning
A technical blog detailing the NVIDIA AI Blueprint VSS 2.4 release, which integrates the Cosmos Reason VLM to improve physical world understanding, enhance Q&A using agentic knowledge graph traversal, and support edge AI deployments.
Learning
CWM: An Open-Weights LLM for Research on Code Generation with World Models
A foundational research release about the Code World Model (CWM), designed to advance code generation research using reasoning and planning in computational environments.
The anatomy of a personal health agent
A research blog post describing the Personal Health Agent (PHA) framework, which uses a collaborative multi-agent architecture to provide personalized, evidence-based guidance.
To Understand AI, Watch How It Evolves
An article presenting an argument that understanding LLMs requires an evolutionary perspective, focusing on training rather than final, static internal structures.
Gemini Robotics 1.5 brings AI agents into the physical world
A post detailing the Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 models, which combine in an agentic framework to enable physical robots to perceive, plan, and solve complex, multi-step physical tasks.
Libraries & Code
An open-source tool for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows, with comprehensive tracing, automated evaluations, and production-ready dashboards.
A scikit-learn-compatible library for estimating prediction intervals and controlling risks, based on conformal predictions.
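The core idea behind conformal prediction intervals is simple enough to show directly. This is a generic sketch of split conformal prediction under the exchangeability assumption, not the library's API: calibrate a quantile of absolute residuals on held-out data, then widen each new point prediction by that amount.

```python
import math
import random

def split_conformal_interval(residuals_cal, y_pred_new, alpha=0.1):
    """Split conformal prediction: wrap a point prediction in an interval
    with roughly (1 - alpha) coverage, assuming exchangeable data.
    residuals_cal: |y - y_hat| on a held-out calibration set."""
    scores = sorted(residuals_cal)
    n = len(scores)
    # Finite-sample correction: take the ceil((n+1)(1-alpha))-th smallest score.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1]
    return y_pred_new - q, y_pred_new + q

random.seed(0)
# Toy setup: the model always predicts 0; true targets are standard normal noise.
residuals = [abs(random.gauss(0, 1)) for _ in range(2000)]
lo, hi = split_conformal_interval(residuals, y_pred_new=0.0, alpha=0.1)
fresh = [random.gauss(0, 1) for _ in range(2000)]
coverage = sum(lo <= y <= hi for y in fresh) / len(fresh)
print(coverage)  # empirically close to the 0.9 target
```

The coverage guarantee is distribution-free: it comes from ranking residuals, not from any assumption about the model or the noise, which is what makes the approach attractive as a wrapper around arbitrary scikit-learn estimators.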
Papers & Publications
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering
Abstract:
As robots become increasingly capable of operating over extended periods -- spanning days, weeks, and even months -- they are expected to accumulate knowledge of their environments and leverage this experience to assist humans more effectively. This paper studies the problem of Long-term Active Embodied Question Answering (LA-EQA), a new task in which a robot must both recall past experiences and actively explore its environment to answer complex, temporally-grounded questions. Unlike traditional EQA settings, which typically focus either on understanding the present environment alone or on recalling a single past observation, LA-EQA challenges an agent to reason over past, present, and possible future states, deciding when to explore, when to consult its memory, and when to stop gathering observations and provide a final answer. Standard EQA approaches based on large models struggle in this setting due to limited context windows, absence of persistent memory, and an inability to combine memory recall with active exploration. To address this, we propose a structured memory system for robots, inspired by the mind palace method from cognitive science. Our method encodes episodic experiences as scene-graph-based world instances, forming a reasoning and planning algorithm that enables targeted memory retrieval and guided navigation. To balance the exploration-recall trade-off, we introduce a value-of-information-based stopping criterion that determines when the agent has gathered sufficient information. We evaluate our method in real-world experiments and introduce a new benchmark that spans popular simulation environments and actual industrial sites. Our approach significantly outperforms state-of-the-art baselines, yielding substantial gains in both answer accuracy and exploration efficiency.
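The value-of-information stopping idea from the abstract can be sketched in a few lines. This is a simplified illustration of the general principle, not the paper's algorithm, and the action names and gain numbers are invented: keep exploring or recalling only while some action's expected reduction in answer uncertainty exceeds its cost.

```python
def should_stop(candidate_actions, cost_per_step):
    """Value-of-information stopping rule (simplified sketch, not the
    paper's method): stop once no candidate action's expected information
    gain about the answer exceeds the cost of taking it.
    candidate_actions: list of (name, expected_info_gain) pairs."""
    best_gain = max((gain for _, gain in candidate_actions), default=0.0)
    return best_gain <= cost_per_step

# Hypothetical candidates for an embodied agent answering a question.
actions = [("recall memory of room A", 0.40), ("navigate to room B", 0.15)]
print(should_stop(actions, cost_per_step=0.25))  # False: recalling still pays off
print(should_stop(actions, cost_per_step=0.50))  # True: no action is worth its cost
```

The same comparison covers both sides of the exploration-recall trade-off: memory lookups are cheap low-cost actions, navigation is expensive, and the agent answers as soon as neither is expected to be worth its cost.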
Scaling Agents via Continual Pre-training
Abstract:
Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models consistently underperform in agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models forces models during post-training to simultaneously learn diverse agentic behaviors while aligning them to expert demonstrations, thereby creating fundamental optimization tensions. To this end, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the deep research agent training pipeline to build powerful agentic foundation models. Based on this approach, we develop a deep research agent model named AgentFounder. We evaluate our AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.