Deep Learning Weekly: Issue 438
Comet, Vercel, and Google DeepMind launch a month-long AI Agents hackathon with $30K prizes, Claude Cowork, a paper on Prompt Repetition Improves Non-Reasoning LLMs, and many more!
This week in deep learning, we bring you news that Comet, Vercel, and Google DeepMind have launched a month-long AI Agents hackathon with $30K in prizes, Claude Cowork, and a paper on Prompt Repetition Improves Non-Reasoning LLMs.
You may also enjoy Sakana AI Agent Wins AtCoder Heuristic Contest, Use multiple models, a paper on Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Comet, Vercel, and Google DeepMind launch a month-long AI Agents hackathon with $30K prizes
A virtual hackathon that focuses on shipping LLM-powered apps that turn New Year’s resolutions into measurable outcomes across six impact categories.
Claude Cowork
Anthropic launches the Cowork research preview, extending Claude Code’s agentic capabilities to non-coding workflows for Claude Max subscribers on macOS.
Sakana AI Agent Wins AtCoder Heuristic Contest (First AI to Place 1st)
Sakana AI’s ALE-Agent became the first AI to win a competitive programming contest, defeating 804 human participants by discovering novel optimization algorithms.
OpenAI buys Torch to bring unified medical data into ChatGPT Health
OpenAI acquires Torch to integrate unified medical data aggregation into ChatGPT Health, consolidating fragmented patient records from multiple healthcare providers into a single AI-powered interface.
New tech and tools for retailers to succeed in an agentic shopping era
Google launches Universal Commerce Protocol (UCP) with Shopify, Target, Walmart and 20+ partners, enabling AI-powered checkout in Search, branded Business Agent chatbots, and Direct Offers for personalized discounts.
MLOps & LLMOps
Building Agents with the Gemini Interactions API
A practical guide about building AI agents using Google’s Gemini Interactions API (Beta), demonstrating how server-side state management simplifies agent development from basic chatbots to multi-turn CLI agents in under 100 lines of code.
Best practices for coding with agents · Cursor
A comprehensive guide about maximizing productivity with Cursor’s AI coding agents through planning workflows, context management, parallel execution, and iterative debugging strategies.
Learning
MIPRO: The Optimizer That Brought Science to Prompt Engineering
An article about MIPRO (Multiprompt Instruction Proposal Optimizer), which achieves up to 13% better performance than hand-crafted prompts.
Use multiple models - by Nathan Lambert
An article about the emerging multi-model workflow strategy for AI power users in 2026, where switching between GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro for different tasks yields better results than relying on any single model due to uneven “jagged” capabilities.
An article about Google’s MedGemma 1.5 4B update, which adds 3D medical imaging interpretation and a MedASR speech transcription model, launching alongside a $100,000 Kaggle hackathon for healthcare AI applications.
Supercharging LLMs: Scalable RL with torchforge and Weaver
A technical post about Meta’s torchforge RL library achieving 4x faster training on 512 GPUs when combined with Stanford’s Weaver verifier system, capturing 44-65% of supervised learning performance without requiring human annotations.
Libraries & Code
An open-source LLM evaluation tool used to debug, evaluate, and monitor LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
AI pair programming in your terminal
Papers & Publications
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Abstract:
While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e.g., BBH +5.0; ARC-Challenge +3.7) and code/math domains (HumanEval +3.0; MATH +2.4). Mechanistic analyses reveal that Engram relieves the backbone’s early layers from static reconstruction, effectively deepening the network for complex reasoning. Furthermore, by delegating local dependencies to lookups, it frees up attention capacity for global context, substantially boosting long-context retrieval (e.g., Multi-Query NIAH: 84.2 to 97.0). Finally, Engram establishes infrastructure-aware efficiency: its deterministic addressing enables runtime prefetching from host memory, incurring negligible overhead. We envision conditional memory as an indispensable modeling primitive for next-generation sparse models.
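To make the "O(1) lookup via hashed N-grams" idea concrete, here is a minimal sketch of a conditional memory table keyed by trailing token N-grams. It is loosely inspired by the Engram description above; the class name, bucket count, and embedding size are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

class NgramMemory:
    """Toy conditional memory: O(1) lookup table keyed by hashed token N-grams.

    Hypothetical sketch for illustration only; sizes and hashing scheme
    are not taken from the Engram paper.
    """

    def __init__(self, num_buckets=4096, dim=16, n=2, seed=0):
        rng = np.random.default_rng(seed)
        # Static embedding table: memory capacity that costs no FLOPs to "compute".
        self.table = rng.standard_normal((num_buckets, dim)).astype(np.float32)
        self.num_buckets = num_buckets
        self.n = n

    def _bucket(self, ngram):
        # Deterministic addressing: the same N-gram always maps to the same
        # bucket, which is what would make runtime prefetching feasible.
        return hash(ngram) % self.num_buckets

    def lookup(self, token_ids):
        # For each position, fetch the embedding of the trailing N-gram.
        out = np.zeros((len(token_ids), self.table.shape[1]), dtype=np.float32)
        for i in range(len(token_ids)):
            ngram = tuple(token_ids[max(0, i - self.n + 1): i + 1])
            out[i] = self.table[self._bucket(ngram)]
        return out

mem = NgramMemory()
feats = mem.lookup([5, 7, 7, 9])  # one memory vector per position
```

In a real model these looked-up vectors would be fused into the backbone, letting attention and MoE layers spend capacity on global context rather than reconstructing local N-gram statistics.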
Prompt Repetition Improves Non-Reasoning LLMs
Abstract:
When not using reasoning, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and Deepseek) without increasing the number of generated tokens or latency.
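The trick is simple enough to sketch in a few lines: duplicate the prompt on the input side only, so output length and latency are unchanged. The helper name, separator, and repetition count below are illustrative choices, not the paper's exact recipe.

```python
def repeat_prompt(prompt: str, k: int = 2, sep: str = "\n\n") -> str:
    """Repeat the user prompt k times before sending it to a model.

    Only the input grows; the model still generates one answer, so the
    number of generated tokens is unaffected.
    """
    return sep.join([prompt] * k)

# Hypothetical chat-style payload using the repeated prompt.
messages = [{"role": "user", "content": repeat_prompt("List three prime numbers.")}]
```

The same payload shape works with any chat-completions-style API; only the `content` string changes.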


