Deep Learning Weekly: Issue 438
Comet, Vercel, and Google DeepMind launch a month-long AI Agents hackathon with $30K prizes, Claude Cowork, a paper on Prompt Repetition Improves Non-Reasoning LLMs, and many more!
This week in deep learning, we bring you news that Comet, Vercel, and Google DeepMind have launched a month-long AI Agents hackathon with $30K in prizes, Claude Cowork, and a paper on Prompt Repetition Improves Non-Reasoning LLMs.
You may also enjoy Sakana AI Agent Wins AtCoder Heuristic Contest, Use multiple models, a paper on Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Comet, Vercel, and Google DeepMind launch a month-long AI Agents hackathon with $30K prizes
A virtual hackathon that focuses on shipping LLM-powered apps that turn New Year’s resolutions into measurable outcomes across six impact categories.
Claude Cowork
Anthropic launches the Cowork research preview, extending Claude Code’s agentic capabilities to non-coding workflows for Claude Max subscribers on macOS.
Sakana AI Agent Wins AtCoder Heuristic Contest (First AI to Place 1st)
Sakana AI’s ALE-Agent became the first AI to win a competitive programming contest, defeating 804 human participants by discovering novel optimization algorithms.
OpenAI buys Torch to bring unified medical data into ChatGPT Health
OpenAI acquires Torch to integrate unified medical data aggregation into ChatGPT Health, consolidating fragmented patient records from multiple healthcare providers into a single AI-powered interface.
New tech and tools for retailers to succeed in an agentic shopping era
Google launches Universal Commerce Protocol (UCP) with Shopify, Target, Walmart and 20+ partners, enabling AI-powered checkout in Search, branded Business Agent chatbots, and Direct Offers for personalized discounts.
MLOps & LLMOps
Building Agents with the Gemini Interactions API
A practical guide about building AI agents using Google’s Gemini Interactions API (Beta), demonstrating how server-side state management simplifies agent development from basic chatbots to multi-turn CLI agents in under 100 lines of code.
Best practices for coding with agents · Cursor
A comprehensive guide about maximizing productivity with Cursor’s AI coding agents through planning workflows, context management, parallel execution, and iterative debugging strategies.
Learning
MIPRO: The Optimizer That Brought Science to Prompt Engineering
An article about MIPRO (Multiprompt Instruction Proposal Optimizer), which achieves up to 13% better performance than hand-crafted prompts.
Use multiple models - by Nathan Lambert
An article about the emerging multi-model workflow strategy for AI power users in 2026, where switching between GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro for different tasks yields better results than relying on any single model due to uneven “jagged” capabilities.
An article about Google’s MedGemma 1.5 4B update, which adds 3D medical imaging interpretation and a MedASR speech transcription model, launching alongside a $100,000 Kaggle hackathon for healthcare AI applications.
Supercharging LLMs: Scalable RL with torchforge and Weaver
A technical post about Meta’s torchforge RL library achieving 4x faster training on 512 GPUs when combined with Stanford’s Weaver verifier system, capturing 44-65% of supervised learning performance without requiring human annotations.
Libraries & Code
An open-source LLM evaluation tool used to debug, evaluate, and monitor LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
AI pair programming in your terminal
Papers & Publications
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Abstract:
While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e.g., BBH +5.0; ARC-Challenge +3.7) and code/math domains (HumanEval +3.0; MATH +2.4). Mechanistic analyses reveal that Engram relieves the backbone’s early layers from static reconstruction, effectively deepening the network for complex reasoning. Furthermore, by delegating local dependencies to lookups, it frees up attention capacity for global context, substantially boosting long-context retrieval (e.g., Multi-Query NIAH: 84.2 to 97.0). Finally, Engram establishes infrastructure-aware efficiency: its deterministic addressing enables runtime prefetching from host memory, incurring negligible overhead. We envision conditional memory as an indispensable modeling primitive for next-generation sparse models.
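To make the "O(1) lookup via hashed N-grams" idea concrete, here is a minimal sketch of a conditional memory table keyed by trailing token N-grams. It is loosely inspired by the Engram description above; the class name, bucket count, and embedding size are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

class NgramMemory:
    """Toy conditional memory: O(1) lookup table keyed by hashed token N-grams.

    Hypothetical sketch for illustration only; sizes and hashing scheme
    are not taken from the Engram paper.
    """

    def __init__(self, num_buckets=4096, dim=16, n=2, seed=0):
        rng = np.random.default_rng(seed)
        # Static embedding table: memory capacity that costs no FLOPs to "compute".
        self.table = rng.standard_normal((num_buckets, dim)).astype(np.float32)
        self.num_buckets = num_buckets
        self.n = n

    def _bucket(self, ngram):
        # Deterministic addressing: the same N-gram always maps to the same
        # bucket, which is what would make runtime prefetching feasible.
        return hash(ngram) % self.num_buckets

    def lookup(self, token_ids):
        # For each position, fetch the embedding of the trailing N-gram.
        out = np.zeros((len(token_ids), self.table.shape[1]), dtype=np.float32)
        for i in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, i - self.n + 1): i + 1])
            out[i] = self.table[self._bucket(ngram)]
        return out

mem = NgramMemory()
feats = mem.lookup([5, 7, 7, 9])  # one memory vector per position
```

In a real model these looked-up vectors would be fused into the backbone, letting attention and MoE layers spend capacity on global context rather than reconstructing local N-gram statistics.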
Prompt Repetition Improves Non-Reasoning LLMs
Abstract:
When not using reasoning, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and Deepseek) without increasing the number of generated tokens or latency.
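The trick is simple enough to sketch in a few lines: duplicate the prompt on the input side only, so output length and latency are unchanged. The helper name, separator, and repetition count below are illustrative choices, not the paper's exact recipe.

```python
def repeat_prompt(prompt: str, k: int = 2, sep: str = "\n\n") -> str:
    """Repeat the user prompt k times before sending it to a model.

    Only the input grows; the model still generates one answer, so the
    number of generated tokens is unaffected.
    """
    return sep.join([prompt] * k)

# Hypothetical chat-style payload using the repeated prompt.
messages = [{"role": "user", "content": repeat_prompt("List three prime numbers.")}]
```

The same payload shape works with any chat-completions-style API; only the `content` string changes.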


