Deep Learning Weekly: Issue 411
AlphaGenome: AI for better understanding the genome, TPU Deep Dive, a paper on OmniGen2: Exploration to Advanced Multimodal Generation, and many more!
This week in deep learning, we bring you AlphaGenome: AI for better understanding the genome, TPU Deep Dive, and a paper on OmniGen2: Exploration to Advanced Multimodal Generation.
You may also enjoy Anthropic Economic Futures Program Launch, The New Skill in AI is Not Prompting, It's Context Engineering, a paper on Biomni: A General-Purpose Biomedical AI Agent, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
AlphaGenome: AI for better understanding the genome
DeepMind introduced AlphaGenome, a new AI tool that more accurately predicts how single variants or mutations in human DNA sequences impact a wide range of biological processes.
Anthropic Economic Futures Program Launch
Anthropic announced the Anthropic Economic Futures Program, a new initiative to support research and policy development focused on addressing AI’s economic impacts.
Researchers Uncover Hidden Ingredients Behind AI Creativity
A recent study by two physicists suggests that the creativity of a diffusion model stems from locality and equivariance, two by-products of its architecture.
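To make those two properties concrete, here is a tiny self-contained check (illustrative only, not code from the study) that a convolution, the kind of local operation the authors point to, is translation-equivariant: shifting the input shifts the output.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=32)
kernel = np.array([0.25, 0.5, 0.25])

def conv(x):
    # A 3-tap convolution: each output depends only on a local neighborhood.
    return np.convolve(x, kernel, mode="same")

shift = 3
shifted_input = np.roll(signal, shift)

# Translation equivariance: convolving a shifted input matches shifting the
# convolved output (compared away from the edges, where np.roll wraps around
# but np.convolve zero-pads).
lhs = conv(shifted_input)[shift + 1 : -shift - 1]
rhs = np.roll(conv(signal), shift)[shift + 1 : -shift - 1]
print(np.allclose(lhs, rhs))  # True
```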
Creative Commons introduces CC Signals framework for AI data use
Creative Commons has previewed an upcoming framework designed to help creators manage how AI models use their content.
AI startup Tandem Health raises $50M to reduce notetaking burden on European doctors
Tandem Health, an AI-powered startup aiming to ease the administrative burden on doctors and clinicians, announced that it has raised $50 million in early-stage funding.
MLOps & LLMOps
Beyond chatbots: adopting Agentic Document Workflows for enterprises
A blog post about the stages, building blocks, and real-world applications of Agentic Document Workflows (ADW) for enterprises.
The New Skill in AI is Not Prompting, It's Context Engineering
An article highlighting the concept of "Context Engineering" as a new skill in AI, shifting from prompt engineering to providing comprehensive, dynamic information and tools.
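A rough sketch of the idea, with every name below an illustrative placeholder rather than any real API: the model's input is assembled at request time from retrieval, memory, and the task itself, instead of being a single static prompt.

```python
# Hypothetical "context engineering" sketch: build the prompt dynamically
# from several sources. The helper functions are stand-ins, not a real API.

def retrieve_documents(query: str, k: int = 3) -> list[str]:
    # Stand-in for a vector-store lookup.
    return [f"[doc {i}] relevant passage for: {query}" for i in range(k)]

def fetch_user_history(limit: int = 5) -> list[str]:
    # Stand-in for conversation memory.
    return ["user: earlier question", "assistant: earlier answer"][:limit]

def build_context(query: str) -> str:
    sections = [
        "## Retrieved documents\n" + "\n".join(retrieve_documents(query)),
        "## Recent conversation\n" + "\n".join(fetch_user_history()),
        "## Task\n" + query,
    ]
    return "\n\n".join(sections)

print(build_context("Summarize this week's model releases."))
```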
Life of an inference request (vLLM V1): How LLMs are served efficiently at scale
An article highlighting the architecture and lifecycle of an inference request in vLLM V1, explaining how large language models are served efficiently at scale.
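For readers who want to poke at that lifecycle directly, here is a minimal offline-inference sketch using vLLM's Python API; the model checkpoint and sampling settings are arbitrary examples, not taken from the article.

```python
from vllm import LLM, SamplingParams

# A small model keeps the example cheap to run.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

# Each prompt becomes an inference request that the engine schedules,
# batches continuously, and streams through paged KV-cache memory.
outputs = llm.generate(["What is continuous batching?"], params)
print(outputs[0].outputs[0].text)
```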
Learning
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5
An article about training and fine-tuning sparse embedding models with Sentence Transformers v5, detailing model types, datasets, training arguments, evaluators, and more.
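As a quick taste of the v5 API, the sketch below encodes a query and a document with the new SparseEncoder class; the SPLADE checkpoint name is an assumption chosen for illustration, not necessarily the one used in the article.

```python
from sentence_transformers import SparseEncoder

# Load a pretrained SPLADE-style sparse embedding model (example checkpoint).
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

queries = ["how do sparse embeddings work"]
docs = ["Sparse embeddings activate only a few vocabulary dimensions per text."]

# The vectors are high-dimensional but mostly zero, so they can be served
# from an inverted index like traditional lexical retrieval.
q_emb = model.encode(queries)
d_emb = model.encode(docs)
print(model.similarity(q_emb, d_emb))
```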
TPU Deep Dive
A deep dive into Google's Tensor Processing Units (TPUs), covering the single-chip, multi-chip, rack, and multi-pod levels, as well as the design philosophy behind them: systolic arrays, pipelining, and Ahead-of-Time (AoT) compilation.
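The AoT point is easy to demo from Python. In the JAX sketch below (an illustrative example, not from the post), a matmul is lowered and compiled for fixed shapes before any data flows, which is how XLA targets the chip's systolic arrays; the shapes and dtype are arbitrary.

```python
import jax
import jax.numpy as jnp

def matmul(a, b):
    # On a TPU, XLA maps this onto the MXU (a systolic array).
    return a @ b

a = jnp.ones((128, 128), dtype=jnp.bfloat16)
b = jnp.ones((128, 128), dtype=jnp.bfloat16)

# Ahead-of-time: lower and compile for these exact shapes/dtypes,
# then reuse the resulting executable.
compiled = jax.jit(matmul).lower(a, b).compile()
print(compiled(a, b).shape)
```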
Libraries & Code
A neuro-symbolic framework combining classical Python programming with the differentiable, programmable nature of LLMs.
An LLM serving engine extension that reduces time-to-first-token (TTFT) and increases throughput, especially in long-context scenarios.
Papers & Publications
OmniGen2: Exploration to Advanced Multimodal Generation
Abstract:
In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field.
Biomni: A General-Purpose Biomedical AI Agent
Abstract:
Biomedical research underpins progress in our understanding of human health and disease, drug discovery, and clinical care. However, with the growth of complex lab experiments, large datasets, many analytical tools, and expansive literature, biomedical research is increasingly constrained by repetitive and fragmented workflows that slow discovery and limit innovation, underscoring the need for a fundamentally new way to scale scientific expertise. Here, we introduce Biomni, a general-purpose biomedical AI agent designed to autonomously execute a wide spectrum of research tasks across diverse biomedical subfields. To systematically map the biomedical action space, Biomni first employs an action discovery agent to create the first unified agentic environment – mining essential tools, databases, and protocols from tens of thousands of publications across 25 biomedical domains. Built on this foundation, Biomni features a generalist agentic architecture that integrates large language model (LLM) reasoning with retrieval-augmented planning and code-based execution, enabling it to dynamically compose and carry out complex biomedical workflows – entirely without relying on predefined templates or rigid task flows. Systematic benchmarking demonstrates that Biomni achieves strong generalization across heterogeneous biomedical tasks – including causal gene prioritization, drug repurposing, rare disease diagnosis, microbiome analysis, and molecular cloning – without any task-specific prompt tuning. Real-world case studies further showcase Biomni’s ability to interpret complex, multi-modal biomedical datasets and autonomously generate experimentally testable protocols. Biomni envisions a future where virtual AI biologists operate alongside and augment human scientists to dramatically enhance research productivity, clinical insight, and healthcare. Biomni is ready to use at https://biomni.stanford.edu, and we invite scientists to explore its capabilities, stress-test its limits, and co-create the next era of biomedical discoveries.
Can $50 million in early-stage funding realistically bring AI clinical tools into hospital workflows, given the regulatory approvals, data-governance requirements, and integration hurdles they face?