Deep Learning Weekly: Issue 370
AlphaProteo, Navigating the New Types of LLM Agents and Architectures, Late Chunking: Balancing Precision and Cost in Long Context Retrieval, a paper on CTRLorALTer, and many more!
This week in deep learning, we bring you AlphaProteo generates novel proteins for biology and health research, Navigating the New Types of LLM Agents and Architectures, Late Chunking: Balancing Precision and Cost in Long Context Retrieval, and a paper on CTRLorALTer: Conditional LoRAdapter for Efficient Zero-Shot Control & Altering of T2I Models.
You may also enjoy The Current State of AI Markets, Tackle Complex LLM Decision-Making with Language Agent Tree Search (LATS) & GPT-4o, a paper on Autonomous LLM-driven research from data to human-verifiable research papers, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
AlphaProteo generates novel proteins for biology and health research
New AI system designs proteins that successfully bind to target molecules, with potential for advancing drug design, disease understanding and more.
Salesforce teams up with Anthropic to enhance Einstein capabilities with Claude
Salesforce customers can now select Claude models for AI-powered business applications and experiences built with Einstein 1 Studio.
The Current State of AI Markets
An article about the current state of AI markets, providing a quantitative analysis of where value has accrued in the AI value chain.
New open source AI leader Reflection 70B's performance questioned, accused of 'fraud'
Reflection 70B, a variant of Meta’s Llama 3.1 LLM released by HyperWrite, initially boasted leading benchmark scores but is now under scrutiny as third-party evaluators struggle to reproduce its performance.
Transparency is often lacking in datasets used to train large language models
Researchers developed an easy-to-use tool that enables an AI practitioner to find data that suits the purpose of their model, which could improve accuracy and reduce bias.
MLOps & LLMOps
Navigating the New Types of LLM Agents and Architectures
An article about the evolution of LLM agents and architectures, highlighting the transition from ReAct agents to more structured, second-generation agents with defined solution spaces.
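For contrast with the structured agents the article describes, here is a minimal sketch of a first-generation ReAct loop; the `llm` stub, the `Action:`/`Final Answer:` line format, and the toy tool registry are illustrative assumptions, not code from the article.

```python
import re

TOOLS = {"search": lambda q: f"(stub search results for {q!r})"}  # toy tool registry

def llm(transcript: str) -> str:
    # Placeholder for a chat-completion call that continues the ReAct transcript.
    return "Final Answer: stub"

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)               # model emits a Thought/Action or an answer
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match:                            # dispatch the tool call, append observation
            name, arg = match.groups()
            transcript += f"Observation: {TOOLS[name](arg)}\n"
    return transcript                        # loop ended without a final answer
```

Second-generation agents replace this free-form loop with an explicitly defined solution space, which is exactly the shift the article walks through.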
A post-training approach to AI regulation with Model Specs
A blog post about a post-training approach to AI regulation built around Model Specs: documents that list principles and examples of desired model behavior.
A blog post that discusses the methods used to achieve FP16 inference with popular LLM models such as Meta’s Llama3-8B, where 100% of computation is performed using OpenAI’s Triton Language.
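As a taste of what "100% Triton" means in practice, below is a minimal FP16 vector-add kernel in Triton; it only illustrates the Python-embedded kernel style the post builds on and is not code from the post, which covers the much larger kernels needed for attention and matrix multiplies.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # one program instance per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda", dtype=torch.float16)
y = torch.randn_like(x)
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```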
Learning
Late Chunking: Balancing Precision and Cost in Long Context Retrieval
An article about the late chunking method, which balances precision and cost in long context retrieval by embedding entire documents before chunking to preserve contextual information.
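A minimal sketch of the method, assuming a Hugging Face long-context embedding model (the model name and character-span chunk interface here are illustrative): the whole document is embedded once, and per-chunk vectors are pooled from the resulting token embeddings, so each chunk retains document-wide context.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "jinaai/jina-embeddings-v2-base-en"  # assumed long-context embedding model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, trust_remote_code=True)

def late_chunk(document: str, chunk_spans: list[tuple[int, int]]) -> torch.Tensor:
    """Embed the whole document once, then mean-pool token embeddings per chunk."""
    enc = tokenizer(document, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]               # (seq_len, 2) char spans
    with torch.no_grad():
        token_embs = model(**enc).last_hidden_state[0]   # (seq_len, hidden)
    chunk_vecs = []
    for start, end in chunk_spans:
        # Keep tokens whose character span overlaps this chunk.
        mask = (offsets[:, 0] < end) & (offsets[:, 1] > start)
        chunk_vecs.append(token_embs[mask].mean(dim=0))
    return torch.stack(chunk_vecs)
```

Contrast this with naive chunking, which embeds each chunk in isolation and so loses cross-chunk references such as pronouns and abbreviations.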
Tackle Complex LLM Decision-Making with Language Agent Tree Search (LATS) & GPT-4o
An article about enhancing LLM decision-making by integrating Language Agent Tree Search (LATS) with GPT-4o, providing a robust framework for solving complex problems through dynamic, tree-based search methodologies.
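A toy, best-first simplification to make the search idea concrete; the real LATS algorithm uses Monte Carlo tree search with value backpropagation and self-reflection, and `propose` and `evaluate` here are mere stand-ins for GPT-4o calls.

```python
import heapq

def propose(state: str, k: int = 3) -> list[str]:
    # Placeholder: in practice, a GPT-4o call returning k candidate next steps.
    return [f"{state} | step{i}" for i in range(k)]

def evaluate(state: str) -> float:
    # Placeholder: in practice, a GPT-4o call scoring the partial solution in [0, 1].
    return 1.0 / (1.0 + len(state))

def tree_search(root: str, budget: int = 20) -> str:
    """Best-first expansion of LLM-proposed steps, keeping the highest-value state."""
    frontier = [(-evaluate(root), root)]
    best_score, best_state = evaluate(root), root
    for _ in range(budget):
        if not frontier:
            break
        neg, state = heapq.heappop(frontier)
        if -neg > best_score:
            best_score, best_state = -neg, state
        for child in propose(state):         # branch on the model's proposals
            heapq.heappush(frontier, (-evaluate(child), child))
    return best_state
```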
Auto-Retrieval with LlamaCloud
A notebook on how to implement auto-retrieval, an advanced RAG technique that uses an LLM to dynamically infer metadata filter parameters before initiating retrieval.
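A hypothetical sketch of the pattern rather than LlamaCloud's actual API: an LLM first turns the natural-language query into structured metadata filters plus a rewritten query, which are then passed to the vector store (`vector_index.search` is an assumed interface).

```python
FILTER_PROMPT = (
    "Given the user query, return JSON with 'rewritten_query' and 'filters', "
    "a list of {{key, operator, value}} objects.\nQuery: {query}"
)

def llm_json(prompt: str) -> dict:
    # Placeholder: in practice, one chat-completion call with a JSON response format.
    return {"rewritten_query": "late chunking evaluation",
            "filters": [{"key": "year", "operator": ">=", "value": 2023}]}

def auto_retrieve(query: str, vector_index, top_k: int = 5):
    spec = llm_json(FILTER_PROMPT.format(query=query))
    # The inferred filters narrow the candidate set before the vector search runs.
    return vector_index.search(spec["rewritten_query"],
                               filters=spec["filters"], top_k=top_k)
```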
What's Missing From LLM Chatbots: A Sense of Purpose
An article about the importance of purposeful dialogue in LLM chatbots, emphasizing the need for multi-round conversations to achieve specific user goals and improve human-AI collaboration.
Libraries & Code
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
An open-source multimodal large language model that can hear and talk while thinking.
Papers & Publications
CTRLorALTer: Conditional LoRAdapter for Efficient Zero-Shot Control & Altering of T2I Models
Abstract:
Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images. However, guiding the generative process of these models to consider detailed forms of conditioning reflecting style and/or structure information remains an open problem. In this paper, we present LoRAdapter, an approach that unifies both style and structure conditioning under the same formulation using a novel conditional LoRA block that enables zero-shot control. LoRAdapter is an efficient, powerful, and architecture-agnostic approach to condition text-to-image diffusion models, which enables fine-grained control conditioning during generation and outperforms recent state-of-the-art approaches.
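A speculative PyTorch sketch of the core mechanism as the abstract describes it, a LoRA update whose strength is modulated by a conditioning embedding; the per-rank gating through `to_scale` is an assumption for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class ConditionalLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int, cond_dim: int):
        super().__init__()
        self.base = base                           # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Linear(base.in_features, rank, bias=False)
        self.B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)              # update starts at zero
        self.to_scale = nn.Linear(cond_dim, rank)  # condition -> per-rank gains

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale = self.to_scale(cond)                # (batch, rank), condition-dependent
        return self.base(x) + self.B(self.A(x) * scale.unsqueeze(1))
```

Because the base weights stay frozen and only the low-rank path depends on the condition, a new style or structure can in principle be applied zero-shot by swapping the conditioning embedding.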
Autonomous LLM-driven research from data to human-verifiable research papers
Abstract:
As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability and verifiability. Mimicking human scientific practices, we built data-to-paper, an automation platform that guides interacting LLM agents through a complete stepwise research process, while programmatically back-tracing information flow and allowing human oversight and interactions. In autopilot mode, provided with annotated data alone, data-to-paper raised hypotheses, designed research plans, wrote and debugged analysis code, generated and interpreted results, and created complete and information-traceable research papers. Even though research novelty was relatively limited, the process demonstrated autonomous generation of de novo quantitative insights from data. For simple research goals, a fully autonomous cycle can create manuscripts which recapitulate peer-reviewed publications without major errors in about 80-90% of cases, yet as goal complexity increases, human co-piloting becomes critical for assuring accuracy. Beyond the process itself, the created manuscripts are themselves inherently verifiable, as information tracing allows one to programmatically chain results, methods and data. Our work thereby demonstrates a potential for AI-driven acceleration of scientific discovery while enhancing, rather than jeopardizing, traceability, transparency and verifiability.
FLUX that Plays Music
Abstract:
This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic. Following the design of the advanced Flux model, we transfer it into a latent VAE space of the mel-spectrogram. The model first applies a sequence of independent attention layers to the double text-music stream, followed by a stacked single music stream for denoised patch prediction. We employ multiple pre-trained text encoders to sufficiently capture caption semantics and to provide inference flexibility. Coarse textual information, in conjunction with time-step embeddings, is used in a modulation mechanism, while fine-grained textual details are concatenated with the music patch sequence as input. Through an in-depth study, we demonstrate that rectified-flow training with an optimized architecture significantly outperforms established diffusion methods for the text-to-music task, as evidenced by various automatic metrics and human preference evaluations.
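To make the training objective concrete, here is the generic rectified-flow loss in PyTorch: the model regresses the constant velocity of a straight-line path between data and noise. This is the standard formulation, not FluxMusic's specific architecture.

```python
import torch

def rectified_flow_loss(model, x0: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """x0: clean latents (e.g., mel-spectrogram VAE latents); model predicts velocity."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)    # uniform timesteps in [0, 1)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))         # broadcast over latent dims
    x_t = (1 - t_) * x0 + t_ * noise                 # straight-line interpolation
    v_pred = model(x_t, t, text_emb)                 # predicted velocity field
    return ((v_pred - (noise - x0)) ** 2).mean()     # regress the true velocity
```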