Deep Learning Weekly: Issue 388
Cohere's secure AI workspace called North, a paper on Clio: Privacy-Preserving Insights into Real-World AI Use, and many more!
This week in deep learning, we bring you Cohere's secure AI workspace called North, and a paper on Clio: Privacy-Preserving Insights into Real-World AI Use.
You may also enjoy Microsoft's Phi-4, Optimizing LLM Test-Time Compute Involves Solving a Meta-RL Problem, a paper on KG-TRICK: Unifying Textual and Relational Information Completion of Knowledge for Multilingual Knowledge Graphs, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing North: A secure AI workspace to get more done
Cohere launched the early access program for North, an all-in-one secure AI workspace platform that empowers employees to significantly improve the quality and speed of their work.
Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning
Microsoft introduced Phi-4, a 14B parameter state-of-the-art small language model (SLM) that excels at complex reasoning in areas such as math, in addition to conventional language processing.
Mistral AI unveils Codestral 25.01
Mistral AI unveiled Codestral 25.01, which features a more efficient architecture and an improved tokenizer than the original model, generating code 2 times faster.
Teaching AI to communicate sounds like humans do
Inspired by the human vocal tract, a new AI model can produce and understand vocal imitations of everyday sounds. The method could help build new sonic interfaces for entertainment and education.
Overhaul raises $55M for its AI-powered cargo management platform
Overhaul, a startup helping enterprises protect merchandise while it travels through their supply chains, has raised $55 million in funding.
MLOps & LLMOps
Introducing Agentic Document Workflows
A technical blog post introducing Agentic Document Workflows for applying agents to document processing.
How to Build AI Agents with LangGraph: A Step-by-Step Guide
An informative blog post about how to build AI agents using LangGraph and AWS.
Learning
Optimizing LLM Test-Time Compute Involves Solving a Meta-RL Problem
A blog post exploring the idea of optimizing test-time compute for LLMs as a meta-RL problem.
Can AI Models Show Us How People Learn? Impossible Languages Point a Way.
A detailed article about how linguists are using artificial languages and neural networks to explore how people learn.
Paradigm Shifts of Eval in the Age of LLMs
An in-depth blog post discussing paradigm shifts in evaluating LLM applications, including the need to benchmark differences and embrace human triage.
Integrating Ascend Backend with Torchtune through PyTorch Multi-Device Support
A blog post that introduces torchtune, and demonstrates how it can be used to fine-tune models with Ascend.
Libraries & Code
A developer-first world foundation model platform designed to help Physical AI developers build their Physical AI systems better and faster.
The open-source visual AI programming environment and TypeScript library
Papers & Publications
Clio: Privacy-Preserving Insights into Real-World AI Use
Abstract:
How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users' data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregated usage patterns across millions of conversations, without the need for human reviewers to read raw conversations. We validate this can be done with a high degree of accuracy and privacy by conducting extensive evaluations. We demonstrate Clio's usefulness in two broad ways. First, we share insights about how models are being used in the real world from one million Claude.ai Free and Pro conversations, ranging from providing advice on hairstyles to providing guidance on Git operations and concepts. We also identify the most common high-level use cases on Claude.ai (coding, writing, and research tasks) as well as patterns that differ across languages (e.g., conversations in Japanese discuss elder care and aging populations at higher-than-typical rates). Second, we use Clio to make our systems safer by identifying coordinated attempts to abuse our systems, monitoring for unknown unknowns during critical periods like launches of new capabilities or major world events, and improving our existing monitoring systems. We also discuss the limitations of our approach, as well as risks and ethical concerns. By enabling analysis of real-world AI usage, Clio provides a scalable platform for empirically grounded AI safety and governance.
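The abstract's core privacy mechanism — reducing each conversation to a model-written summary, then surfacing only aggregate patterns that cover enough conversations — can be illustrated with a minimal sketch. The function names and the suppression threshold below are illustrative assumptions, not Clio's actual implementation or API.

```python
from collections import Counter

# Illustrative sketch of Clio's aggregation idea (not Anthropic's actual code):
# each conversation is first reduced to a short model-generated topic label,
# and analysts only ever see counts for sufficiently large groups of labels.

MIN_CLUSTER_SIZE = 3  # hypothetical k-anonymity-style threshold

def aggregate_topics(topic_labels, min_size=MIN_CLUSTER_SIZE):
    """Return topic counts, suppressing any topic below the size threshold."""
    counts = Counter(topic_labels)
    return {topic: n for topic, n in counts.items() if n >= min_size}

# Toy labels standing in for model-written conversation summaries.
labels = ["coding help"] * 5 + ["writing advice"] * 4 + ["rare personal topic"]
print(aggregate_topics(labels))
# The rare topic is suppressed, so no single conversation is exposed.
```

The suppression step is what lets usage patterns be studied without any human reading raw conversations: small, potentially identifying clusters never leave the pipeline.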
TransPixar: Advancing Text-to-Video Generation with Transparency
Abstract:
Text-to-video generative models have made significant strides, enabling diverse applications in entertainment, advertising, and education. However, generating RGBA video, which includes alpha channels for transparency, remains a challenge due to limited datasets and the difficulty of adapting existing models. Alpha channels are crucial for visual effects (VFX), allowing transparent elements like smoke and reflections to blend seamlessly into scenes. We introduce TransPixar, a method to extend pretrained video models for RGBA generation while retaining the original RGB capabilities. TransPixar leverages a diffusion transformer (DiT) architecture, incorporating alpha-specific tokens and using LoRA-based fine-tuning to jointly generate RGB and alpha channels with high consistency. By optimizing attention mechanisms, TransPixar preserves the strengths of the original RGB model and achieves strong alignment between RGB and alpha channels despite limited training data. Our approach effectively generates diverse and consistent RGBA videos, advancing the possibilities for VFX and interactive content creation.
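The abstract's key architectural move — appending alpha-specific tokens to the RGB token sequence so one transformer pass attends over both — can be sketched in a few lines. Shapes, the embedding offset, and the single-head attention below are illustrative assumptions, not the paper's actual DiT configuration.

```python
import numpy as np

# Toy illustration of TransPixar's joint-token idea: alpha tokens are appended
# to the RGB token sequence so shared self-attention mixes the two modalities,
# encouraging RGB/alpha consistency. All sizes here are invented for clarity.

rng = np.random.default_rng(0)
n_rgb, n_alpha, d = 16, 16, 8            # hypothetical token counts / width

rgb_tokens = rng.standard_normal((n_rgb, d))
alpha_tokens = rng.standard_normal((n_alpha, d))
alpha_tokens += rng.standard_normal(d)   # alpha-specific embedding offset (assumed)

# One joint sequence processed by shared attention.
tokens = np.concatenate([rgb_tokens, alpha_tokens], axis=0)

# Simplified single-head self-attention over the joint sequence.
scores = tokens @ tokens.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ tokens

print(tokens.shape, out.shape)
```

In the paper this joint pass is realized inside a pretrained DiT, with LoRA adapters so the original RGB capability is preserved while the alpha channel is learned from limited data.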
KG-TRICK: Unifying Textual and Relational Information Completion of Knowledge for Multilingual Knowledge Graphs
Abstract:
Multilingual knowledge graphs (KGs) provide high-quality relational and textual information for various NLP applications, but they are often incomplete, especially in non-English languages. Previous research has shown that combining information from KGs in different languages aids either Knowledge Graph Completion (KGC), the task of predicting missing relations between entities, or Knowledge Graph Enhancement (KGE), the task of predicting missing textual information for entities. Although previous efforts have considered KGC and KGE as independent tasks, we hypothesize that they are interdependent and mutually beneficial. To this end, we introduce KG-TRICK, a novel sequence-to-sequence framework that unifies the tasks of textual and relational information completion for multilingual KGs. KG-TRICK demonstrates that: i) it is possible to unify the tasks of KGC and KGE into a single framework, and ii) combining textual information from multiple languages is beneficial to improve the completeness of a KG. As part of our contributions, we also introduce WikiKGE10++, the largest manually-curated benchmark for textual information completion of KGs, which features over 25,000 entities across 10 diverse languages.
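The unification the abstract describes — casting both relational completion (KGC) and textual completion (KGE) into one sequence-to-sequence format — can be sketched as a shared input/target schema. The prompt formats below are invented for illustration and are not the paper's actual serialization.

```python
# Hypothetical sketch of framing KGC and KGE as one text-to-text task,
# in the spirit of KG-TRICK (prompt formats here are assumed, not the paper's).

def kgc_example(head, relation, tail):
    """Relational completion: predict a missing tail entity."""
    return {"input": f"complete relation | {head} | {relation} | ?",
            "target": tail}

def kge_example(entity, language, description):
    """Textual completion: predict a missing description in a target language."""
    return {"input": f"complete text | {entity} | lang={language} | ?",
            "target": description}

# Both tasks share one schema, so a single seq2seq model can be trained on a
# mixture of the two and each task can benefit from the other's supervision.
examples = [
    kgc_example("Rome", "capital_of", "Italy"),
    kge_example("Rome", "it", "capitale d'Italia"),
]
for ex in examples:
    print(ex["input"], "->", ex["target"])
```

Because the two tasks now look identical to the model, textual evidence in one language can inform relation prediction and vice versa, which is the interdependence the paper hypothesizes.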