Deep Learning Weekly: Issue 462
GPT-5.6 Sol, Scaling Laws, a paper on Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents, and many more!
This week in deep learning, we bring you Previewing GPT-5.6 Sol: a next-generation model, Scaling Laws, Carefully and a paper on Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents.
You may also enjoy Redeploying Claude Fable 5, R&B-EnCoRe: Self-Improving Pretraining of Embodied Reasoning Vision-Language-Action Models, a paper on Unlimited OCR Works, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Previewing GPT-5.6 Sol: a next-generation model
OpenAI begins a limited, government-coordinated preview of GPT-5.6 Sol, Terra, and Luna, its strongest cybersecurity model yet, paired with a layered safeguard stack and 700,000 GPU hours of automated red-teaming.
Redeploying Claude Fable 5
Anthropic restores access to Claude Fable 5 and Mythos 5 after US export controls are lifted, rolling out with a strengthened cybersecurity classifier and a new cross-industry jailbreak severity framework.
Anthropic launches Claude Sonnet 5, its most agentic Sonnet model yet, narrowing the performance gap to Opus 4.8 on coding and tool use.
Claude Science, an AI workbench for scientists
Anthropic launches Claude Science, an AI workbench that unifies research tools like PubMed, Jupyter, and cluster compute into one environment with over 60 scientific skills and auditable, reproducible outputs.
LFM2.5-230M: Built to Run Anywhere
Liquid AI releases LFM2.5-230M, its smallest open-weight model yet, delivering 213 tok/s on a Galaxy S25 Ultra and outperforming larger models on tool use and data extraction benchmarks.
Start building with Nano Banana 2 Lite and Gemini Omni Flash
Google launches Nano Banana 2 Lite, generating images in 4 seconds at $0.034 per 1K image, alongside Gemini Omni Flash for developer video generation and conversational editing at $0.10 per second.
MLOps/LLMOps/AgentOps
Opik + Oracle Agent Specification: Build Once, Run Anywhere
Comet integrates Opik with Oracle’s Open Agent Specification, letting developers define agents once and then trace, evaluate, and swap them across frameworks like LangGraph, AutoGen, and WayFlow without rebuilding.
AI Evaluation Simplified: Automate Dataset & Metric Eval Workflows with Test Suites
A guide to Opik’s Test Suites feature, which replaces traditional dataset-and-metric AI evaluation workflows with plain-English assertions that return pass/fail results instead of raw scores.
Learning
Scaling Laws, Carefully
A comprehensive explainer tracing how LLM scaling laws evolved from Kaplan to Chinchilla to data-constrained regimes, and why fitting these power-law relationships is more fragile in practice than it looks.
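To make the fragility point concrete, here is a minimal sketch of one common pitfall: fitting a pure power law to losses that also contain an irreducible floor. The constants, the synthetic data, and the helper name `fit_power_law` are illustrative assumptions, not values or code from the explainer.

```python
import math

def fit_power_law(ns, losses):
    """Least-squares fit of log(loss) = log(a) - alpha * log(n).

    Returns (a, alpha). Hypothetical helper for illustration only.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

# Made-up constants for a synthetic loss curve L(N) = E + A * N^(-alpha).
TRUE_A, TRUE_ALPHA, IRREDUCIBLE = 400.0, 0.34, 1.7
ns = [10 ** k for k in range(6, 11)]  # model sizes from 1e6 to 1e10

pure = [TRUE_A * n ** -TRUE_ALPHA for n in ns]   # no loss floor
with_floor = [IRREDUCIBLE + loss for loss in pure]  # floor included

# The pure curve is recovered almost exactly.
a_pure, alpha_pure = fit_power_law(ns, pure)

# Ignoring the floor flattens the log-log curve and understates alpha.
a_floor, alpha_floor = fit_power_law(ns, with_floor)

print(f"alpha without floor: {alpha_pure:.3f}")
print(f"alpha with ignored floor: {alpha_floor:.3f}")
```

The same data yields a very different exponent depending on whether the functional form includes the floor term, which is one reason extrapolations from fitted scaling laws deserve caution.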
R&B-EnCoRe: Self-Improving Pretraining of Embodied Reasoning Vision-Language-Action Models
A research blog post introducing R&B-EnCoRe, a self-supervised pretraining cycle that lets vision-language-action models discover which reasoning steps actually improve action prediction, rather than following fixed templates.
Core dump epidemiology: fixing an 18-year-old bug
An engineering deep-dive on how OpenAI used population-level core dump analysis, rather than single-case debugging, to uncover two unrelated crash causes in its Rockset data infrastructure.
MirrorCode: What’s the largest software project AI can complete on its own?
Epoch AI, with METR, launches MirrorCode, a benchmark measuring whether AI can reimplement entire real-world programs end-to-end without source access, on which Claude Opus 4.7 leads at a 56% solve rate.
Libraries & Code
An open-source AI observability tool used to debug, evaluate, and monitor LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
An enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
Papers & Publications
Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
Abstract:
Despite recent progress, LLM agents still struggle with reasoning over long interaction histories. While current memory-augmented agents rely on a static retrieve-then-reason paradigm, this rigid pipeline design prevents them from dynamically adapting memory access to intermediate evidence discovered during inference. To bridge this gap, we propose MRAgent, a framework that combines an associative memory graph with an active reconstruction mechanism. We represent memory as a Cue-Tag-Content graph, where associative tags serve as semantic bridges connecting fine-grained cues to memory contents. Operating on this structure, our active reconstruction mechanism integrates LLM reasoning directly into memory access, allowing the agent to iteratively explore and prune retrieval paths based on accumulated evidence. This ensures that memory retrieval is dynamically adapted to the reasoning context while avoiding combinatorial explosion caused by unconstrained expansion. Experiments on the LoCoMo benchmark and LongMemEval benchmark demonstrate significant improvements over strong baselines (up to 23%), while substantially reducing token and runtime cost, highlighting the effectiveness of active and associative reconstruction for long-horizon memory reasoning.
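For readers who want a feel for the idea, the following is a toy sketch of a Cue-Tag-Content graph with iterative explore-and-prune access. It is not the MRAgent implementation: the class `CueTagContentGraph`, the function `reconstruct`, the `beam` and `max_hops` parameters, and the keyword-based `score_tag` (a stand-in for the LLM reasoning step) are all assumptions made for illustration.

```python
from collections import defaultdict

class CueTagContentGraph:
    """Toy memory graph: cues link to tags, tags link to contents."""

    def __init__(self):
        self.cue_to_tags = defaultdict(set)
        self.tag_to_contents = defaultdict(set)
        self.content_to_cues = defaultdict(set)

    def add_memory(self, content, cues, tags):
        for cue in cues:
            self.cue_to_tags[cue].update(tags)
        for tag in tags:
            self.tag_to_contents[tag].add(content)
        self.content_to_cues[content].update(cues)

def reconstruct(graph, query_cues, score_tag, beam=2, max_hops=2):
    """Expand from cues, prune tags by score, and follow new cues."""
    evidence = []
    seen_tags = set()
    visited_cues = set(query_cues)
    frontier = set(query_cues)
    for _ in range(max_hops):
        candidates = set()
        for cue in frontier:
            candidates |= graph.cue_to_tags.get(cue, set())
        candidates -= seen_tags
        # Prune: keep only positively scored tags, at most `beam` of them.
        scored = [(score_tag(tag, evidence), tag) for tag in candidates]
        kept = [t for s, t in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]
        kept = kept[:beam]
        if not kept:
            break
        seen_tags.update(kept)
        frontier = set()
        for tag in kept:
            for content in sorted(graph.tag_to_contents[tag]):
                if content not in evidence:
                    evidence.append(content)
                    # Cues found in new evidence drive the next hop.
                    frontier |= graph.content_to_cues[content] - visited_cues
        visited_cues |= frontier
    return evidence

graph = CueTagContentGraph()
graph.add_memory("Alice adopted a dog named Rex in March", {"alice", "rex"}, {"pets"})
graph.add_memory("Rex was treated by Dr. Chen", {"rex", "dr. chen"}, {"vet visits"})
graph.add_memory("Alice started a new job in May", {"alice"}, {"career"})

# Stand-in for LLM judgment: which tags matter for "Who is the vet of Alice's dog?"
relevant = {"pets", "vet visits"}
evidence = reconstruct(graph, {"alice"}, lambda tag, ev: 1 if tag in relevant else 0)
print(evidence)
```

Starting from the single cue "alice", the first hop surfaces the pet memory, whose new cue "rex" leads to the vet memory on the second hop, while the unrelated career tag is pruned before its content is ever read.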
Unlimited OCR Works
Abstract:
Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that employing a large language model (LLM) as the decoder allows the model to leverage the prior distribution of language, leading to improved OCR performance. However, the downside is equally evident: as the output sequence lengthens, the accumulated KV cache drives up memory consumption and progressively slows down generation. This stands in stark contrast to humans, who exhibit no such decline in efficiency during long-horizon copying tasks. In this technical report, we propose Unlimited OCR, a model designed to emulate human parsing working memory. Taking DeepSeek OCR as the baseline, we replace all attention layers in the decoder with our proposed Reference Sliding Window Attention (R-SWA), which reduces attention computation costs while maintaining a constant KV cache throughout the entire decoding process. By combining the high compression rate of DeepSeek OCR’s encoder with our constant KV cache design, Unlimited OCR can transcribe dozens of pages of documents in a single forward pass under a standard maximum length of 32K. More importantly, R-SWA is a general-purpose parsing attention mechanism - beyond OCR, it is equally applicable to tasks such as ASR, translation, etc.
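The constant-cache idea can be illustrated with a generic sliding-window KV cache. This is a plain sliding-window sketch, not the paper's R-SWA: the class `WindowedKVCache`, the notion of a fixed block of always-visible reference entries, and the toy two-dimensional keys and values are assumptions for illustration.

```python
import math
from collections import deque

def attend(query, keys, values):
    """Scaled dot-product attention for one query over small Python lists."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]

class WindowedKVCache:
    """Fixed reference entries plus a bounded window of recent entries."""

    def __init__(self, reference_kv, window):
        self.reference = list(reference_kv)  # assumed always visible
        self.recent = deque(maxlen=window)   # oldest entries fall out

    def __len__(self):
        return len(self.reference) + len(self.recent)

    def step(self, query, key, value):
        """Cache the new token, then attend over reference and window."""
        self.recent.append((key, value))
        pairs = self.reference + list(self.recent)
        keys = [k for k, _ in pairs]
        values = [v for _, v in pairs]
        return attend(query, keys, values)

# Four hypothetical reference entries (for example, encoded image tokens).
reference = [([math.sin(i), math.cos(i)], [float(i), 1.0]) for i in range(4)]
cache = WindowedKVCache(reference, window=8)

sizes = []
output = None
for t in range(1000):  # decode many steps
    vec = [math.sin(0.1 * t), math.cos(0.1 * t)]
    output = cache.step(query=vec, key=vec, value=[float(t % 5), 0.5])
    sizes.append(len(cache))

print(max(sizes))  # cache size is bounded by reference + window
```

Because the deque discards the oldest entry on each append once it is full, memory use stops growing after the first few steps no matter how long the output gets, which is the property the abstract contrasts with an ever-growing KV cache.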


