Deep Learning Weekly: Issue 389
Tencent introduces Hunyuan3D 2.0, Common pitfalls when building generative AI applications, a paper on Self-adaptive LLMs, and many more!
This week in deep learning, we bring you Tencent introduces Hunyuan3D 2.0, Common pitfalls when building generative AI applications and a paper on Self-adaptive LLMs.
You may also enjoy DeepSeek open-sources its R1 reasoning model series, How Deepseek R1 was trained, a paper on Zero-Shot Mono-to-Binaural Speech Synthesis, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Tencent introduces 'Hunyuan3D 2.0,' AI that speeds up 3D design from days to seconds
Tencent has unveiled Hunyuan3D 2.0, an AI system that turns single images or text descriptions into detailed 3D models within seconds.
DeepSeek open-sources its R1 reasoning model series
DeepSeek released a new large language model family, the R1 series, that’s optimized for reasoning tasks.
Perplexity launches Sonar, an API for AI search
Perplexity launched an API service called Sonar, allowing enterprises and developers to build the startup’s generative AI search tools into their own applications.
OpenAI teams up with SoftBank and Oracle on $500B data center project
OpenAI says it will team up with the Japanese conglomerate SoftBank, Oracle, and others to build multiple AI data centers in the U.S.
Delivery management platform Package.ai raises $14M for ‘Uber-like’ customer experience
Package.ai, an AI-based delivery platform for retail operations and customer service, has raised $14 million in an early-stage funding round led by Susquehanna Growth Equity.
MLOps & LLMOps
Common pitfalls when building generative AI applications
A blog post by Chip Huyen that discusses common pitfalls teams run into when building generative AI applications.
Building knowledge graph agents with LlamaIndex Workflows
An informative blog post about the process and challenges of building a knowledge graph for answering natural language questions, using LlamaIndex Workflows.
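For readers new to the framework, here is a minimal sketch of the Workflows API the post builds on: a workflow is a class whose @step methods route events from StartEvent to StopEvent. The single extraction step and its hard-coded triple are illustrative stand-ins, not the post's actual pipeline.

import asyncio
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class KGWorkflow(Workflow):
    @step
    async def extract(self, ev: StartEvent) -> StopEvent:
        # Stand-in: a real pipeline would prompt an LLM to pull
        # (subject, relation, object) triples out of ev.text.
        triples = [("LlamaIndex", "provides", "Workflows")]
        return StopEvent(result=triples)

async def main():
    result = await KGWorkflow().run(text="LlamaIndex provides Workflows.")
    print(result)

asyncio.run(main())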
Lessons Learned from Building an AI Sales Assistant
A post that explores how NVIDIA built an AI sales assistant to streamline sales workflows, covering the challenges involved, core solution components, and key lessons learned.
Learning
How DeepSeek R1 was trained
A technical article about how DeepSeek trained its R1 model using a combination of supervised fine-tuning and reinforcement learning.
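The RL stage of the R1 recipe uses GRPO, which samples a group of completions per prompt, scores them with rule-based rewards (e.g., whether the final answer is correct), and normalizes rewards within each group to form advantages. A minimal sketch of that normalization step, with made-up rewards:

import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_prompts, group_size) rule-based scores, one per sampled completion
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)  # per-completion advantage

# Made-up rewards: 4 completions for one prompt, 1.0 when the answer checks out
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))  # correct completions get positive advantage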
How to use Anthropic MCP Server with open LLMs, OpenAI or Google Gemini
A blog post on how to use Anthropic MCP Servers with any open LLM, OpenAI, or Google Gemini.
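As a rough sketch of the client side, the MCP Python SDK can launch a server over stdio and list its tools, whose schemas you then translate into whichever LLM's function-calling format you use. The reference sqlite server below is just one example server:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Example: launch the reference sqlite MCP server over stdio.
    params = StdioServerParameters(
        command="uvx", args=["mcp-server-sqlite", "--db-path", "demo.db"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Translate each tool's inputSchema into your LLM's
            # function-calling format here.
            print([t.name for t in tools.tools])

asyncio.run(main())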
How Virgo is using DINOv2 to analyze endoscopy videos for precision medicine
An article that describes Virgo's innovative use of Meta’s open-source DINOv2 to create an AI model that analyzes endoscopy videos, particularly for inflammatory bowel disease (IBD).
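DINOv2 backbones are available through torch.hub, so frame-level embeddings like the ones such a system builds on take only a few lines. Treating each endoscopy frame as a standalone image and using the standard ImageNet normalization are our simplifying assumptions:

import torch
from PIL import Image
from torchvision import transforms

# Small DINOv2 backbone from torch.hub (weights download on first use).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # divisible by the model's 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

frame = preprocess(Image.open("frame.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    embedding = model(frame)  # (1, 384) CLS embedding for a downstream classifier
print(embedding.shape)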
Stock Market Forecasting with Differential Graph Transformer
An article about a novel graph transformer architecture, called the Differential Graph Transformer, that incorporates inter-stock relationships into stock price prediction.
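The paper's exact formulation isn't reproduced here, but as a hedged illustration, one way to combine differential attention with a stock graph is to compute two softmax attention maps restricted to related stocks and subtract one from the other to cancel common-mode noise. Everything below (shapes, the lam weighting, the toy adjacency) is assumed for the sketch:

import torch
import torch.nn.functional as F

def diff_graph_attention(x, adj, wq1, wk1, wq2, wk2, wv, lam=0.5):
    # x: (num_stocks, d) per-stock features; adj[i, j] > 0 where stocks are related
    mask = torch.zeros_like(adj).masked_fill(adj == 0, float("-inf"))
    scale = x.shape[-1] ** 0.5
    a1 = F.softmax((x @ wq1) @ (x @ wk1).T / scale + mask, dim=-1)
    a2 = F.softmax((x @ wq2) @ (x @ wk2).T / scale + mask, dim=-1)
    return (a1 - lam * a2) @ (x @ wv)  # subtract the second map to cancel shared noise

d = 16
x = torch.randn(5, d)
adj = torch.eye(5) + torch.diag(torch.ones(4), 1)  # toy relation graph
params = [torch.randn(d, d) / d ** 0.5 for _ in range(5)]
print(diff_graph_attention(x, adj, *params).shape)  # (5, 16)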
Libraries & Code
A foundation model for tabular data that outperforms traditional methods while being dramatically faster.
A platform for software development agents powered by AI.
Autonomous agents for everyone.
Papers & Publications
Transformer²: Self-adaptive LLMs
Abstract:
Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce Transformer², a novel self-adaptation framework that adapts LLMs for unseen tasks in real-time by selectively adjusting only the singular components of their weight matrices. During inference, Transformer² employs a two-pass mechanism: first, a dispatch system identifies the task properties, and then task-specific "expert" vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt. Our method outperforms ubiquitous approaches such as LoRA, with fewer parameters and greater efficiency. Transformer² demonstrates versatility across different LLM architectures and modalities, including vision-language tasks. Transformer² represents a significant leap forward, offering a scalable, efficient solution for enhancing the adaptability and task-specific performance of LLMs, paving the way for truly dynamic, self-organizing AI systems.
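Concretely, the "singular components" are the singular values of each weight matrix: an expert is a learned vector z that rescales them, and the dispatch system mixes expert vectors at inference time. A minimal sketch of that reconstruction step, with made-up mixing weights standing in for the dispatcher's output:

import torch

def adapt_weight(w, experts, alphas):
    # w: (m, n) frozen base weight; experts: learned (min(m, n),) scaling vectors
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    z = sum(a * z_k for a, z_k in zip(alphas, experts))  # dispatch-weighted expert mix
    return u @ torch.diag(s * z) @ vh  # only the singular values change

w = torch.randn(8, 4)
experts = [torch.ones(4), torch.rand(4)]  # e.g. a "math" and a "code" expert
adapted = adapt_weight(w, experts, alphas=[0.7, 0.3])
print(adapted.shape)  # (8, 4)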
Zero-Shot Mono-to-Binaural Speech Synthesis
Abstract:
We present ZeroBAS, a neural method to synthesize binaural audio from monaural audio recordings and positional information without training on any binaural data. To our knowledge, this is the first published zero-shot neural approach to mono-to-binaural audio synthesis. Specifically, we show that a parameter-free geometric time warping and amplitude scaling based on source location suffices to get an initial binaural synthesis that can be refined by iteratively applying a pretrained denoising vocoder. Furthermore, we find this leads to generalization across room conditions, which we measure by introducing a new dataset, TUT Mono-to-Binaural, to evaluate state-of-the-art monaural-to-binaural synthesis methods on unseen conditions. Our zero-shot method is perceptually on-par with the performance of supervised methods on the standard mono-to-binaural dataset, and even surpasses them on our out-of-distribution TUT Mono-to-Binaural dataset. Our results highlight the potential of pretrained generative audio models and zero-shot learning to unlock robust binaural audio synthesis.
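The parameter-free first stage is plain geometry: each ear's channel is the mono signal delayed by its distance to the source divided by the speed of sound, and attenuated by inverse distance. A sketch of that warping for a static source (the paper handles moving sources and then refines the result with a pretrained vocoder):

import numpy as np

def mono_to_binaural(mono, sr, src, ears, c=343.0):
    # mono: (T,) signal; src: (3,) source position; ears: (2, 3) ear positions, in meters
    t = np.arange(len(mono)) / sr
    out = np.zeros((2, len(mono)))
    for i, ear in enumerate(ears):
        d = np.linalg.norm(src - ear)
        # Geometric time warping: read the signal at t - d/c (propagation delay);
        # amplitude scaling: attenuate by inverse distance.
        out[i] = np.interp(t - d / c, t, mono, left=0.0) / max(d, 1e-3)
    return out

sr = 16000
mono = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # one second of a 440 Hz tone
ears = np.array([[-0.09, 0.0, 0.0], [0.09, 0.0, 0.0]])  # ears ~18 cm apart
print(mono_to_binaural(mono, sr, np.array([1.0, 0.5, 0.0]), ears).shape)  # (2, 16000)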