Deep Learning Weekly: Issue 372
Challenges and solutions for building and scaling Generative AI solutions using GenOps, Anthropic introduces Contextual Retrieval, a paper on Kolmogorov-Arnold Transformer, and many more!
This week in deep learning, we bring you Mistral AI's free API, a guide to building and scaling generative AI solutions with GenOps, Anthropic's Contextual Retrieval, and a paper on the Kolmogorov-Arnold Transformer.
You may also enjoy Microsoft Trustworthy AI, Choose Your Own Adventure Workflow (Human In The Loop), a paper on Composable Interventions for Language Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Mistral AI introduced a free API, improved pricing across the board, a new enterprise-grade Mistral Small, and free vision capabilities on le Chat.
Microsoft Trustworthy AI: Unlocking human potential starts with trust
Microsoft announced Correction, a capability in Microsoft Azure AI Content Safety’s Groundedness detection feature that helps fix hallucination issues in real time.
Alibaba releases 100+ open-source AI models and new text-to-video generator
Alibaba Cloud announced the release of more than 100 new open-source large language models as part of the Qwen 2.5 family.
Amazon introduces Amelia, an AI assistant for third-party sellers
Amazon is rolling out an AI tool designed to help third-party sellers quickly resolve issues with their accounts, as well as fetch sales and inventory data.
Contact center-as-a-service startup Ujet raises $76M to accelerate generative AI development
Ujet raised $76 million to further develop generative AI enhancements for customer support.
Google is adding the Gemini chatbot to Google Workspace, increasing access to its enterprise-grade AI app.
Intel launches Xeon 6 and Gaudi 3 AI chips to boost AI and HPC performance
Intel launched its Xeon 6 processor and Gaudi 3 AI accelerators to compete in the AI market.
MLOps & LLMOps
Learn how to build and scale Generative AI solutions with GenOps
A Google blog post on the challenges and solutions involved in building and scaling generative AI applications using GenOps, the next evolution of MLOps.
Choose Your Own Adventure Workflow (Human In The Loop)
A blog post about how to build an agentic "choose your own adventure" workflow with LlamaIndex.
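To make the pattern concrete, here is a framework-agnostic sketch of a human-in-the-loop step; the post builds this with LlamaIndex's workflow events, while the function names and prompt below are purely illustrative.

```python
# Framework-agnostic sketch of a human-in-the-loop agent step.
# The LlamaIndex post builds this with its workflow events; here the
# pattern is reduced to plain Python so the control flow is visible.

def propose_action(state: dict) -> str:
    """Stand-in for an LLM call that proposes the next step."""
    return f"Next step given {state!r}: summarize the findings."

def human_approval(proposal: str) -> bool:
    """Pause the workflow and let a person approve or reject."""
    answer = input(f"Agent proposes: {proposal}\nApprove? [y/N] ")
    return answer.strip().lower() == "y"

def run(state: dict) -> dict:
    proposal = propose_action(state)
    if human_approval(proposal):          # the "choose your own adventure" branch
        state["last_action"] = proposal   # proceed with the approved action
    else:
        state["last_action"] = "skipped"  # route to an alternative branch
    return state

print(run({"topic": "quarterly report"}))
```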
Optimize and deploy models with Optimum-Intel and OpenVINO GenAI
An article outlining how to optimize and deploy Hugging Face Transformers models using Optimum-Intel and OpenVINO GenAI for efficient AI inference.
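For a sense of the flow, here is a hedged sketch: export a model to OpenVINO IR with optimum-cli, then generate with openvino_genai's LLMPipeline. The model ID, output directory, and weight format are example choices; consult the article for the exact options.

```python
# Hypothetical end-to-end flow: export a Transformers model to OpenVINO IR
# with Optimum-Intel, then run it with the OpenVINO GenAI pipeline.
# First, on the command line (model ID and paths are examples):
#   optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
#       --weight-format int8 tinyllama_ov

import openvino_genai

# Load the exported model directory and target an Intel CPU.
pipe = openvino_genai.LLMPipeline("tinyllama_ov", "CPU")

# Generate text; max_new_tokens bounds the completion length.
print(pipe.generate("What is OpenVINO?", max_new_tokens=64))
```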
Learning
Introducing Contextual Retrieval (Anthropic)
A technical blog post describing Contextual Retrieval, a method by Anthropic to improve Retrieval-Augmented Generation by adding context to retrieved information chunks.
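The core preprocessing step can be sketched in a few lines: before indexing, each chunk is prefixed with a short, LLM-written note situating it in the full document. The llm function below is a placeholder for any completion API, and the prompt loosely paraphrases the one in Anthropic's post.

```python
# Minimal sketch of Contextual Retrieval's preprocessing step: before
# indexing, each chunk is prefixed with a short, LLM-written explanation
# of where it sits in the source document. `llm` is a placeholder for
# any completion API; the prompt paraphrases the one in Anthropic's post.

def llm(prompt: str) -> str:
    raise NotImplementedError("call your preferred completion API here")

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short context that situates this chunk within the overall
document, to improve search retrieval of the chunk. Answer only
with the context."""

def contextualize(document: str, chunks: list[str]) -> list[str]:
    contextualized = []
    for chunk in chunks:
        context = llm(CONTEXT_PROMPT.format(document=document, chunk=chunk))
        # The context + chunk pair is what gets embedded / BM25-indexed.
        contextualized.append(f"{context}\n{chunk}")
    return contextualized
```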
A technical blog post about the development of FineVideo, a dataset of 43k videos annotated with descriptions, scene splits, and question-answer pairs for training video AI models.
Evaluate LLMs using Evaluation Harness and Hugging Face TGI/vLLM
An article explaining how to evaluate large language models hosted on Hugging Face's Text Generation Inference (TGI) or vLLM using the Evaluation Harness.
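As a rough illustration, the harness can target an OpenAI-compatible endpoint served by TGI or vLLM through its local-completions model type; the endpoint URL, model name, and task below are examples, so check the article for the exact model_args.

```python
# Rough sketch: point the Evaluation Harness at an OpenAI-compatible
# endpoint served by TGI or vLLM. Endpoint URL and model name are
# examples; check the harness docs for the exact model_args it expects.
import lm_eval

results = lm_eval.simple_evaluate(
    model="local-completions",  # generic OpenAI-compatible client
    model_args=(
        "model=meta-llama/Llama-3.1-8B-Instruct,"
        "base_url=http://localhost:8000/v1/completions"
    ),
    tasks=["hellaswag"],
)
print(results["results"])
```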
Libraries & Code
EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
stanford-oval/storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
Papers & Publications
Kolmogorov-Arnold Transformer
Abstract:
Transformers stand as the cornerstone of modern deep learning. Traditionally, these models rely on multi-layer perceptron (MLP) layers to mix the information between channels. In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model. Integrating KANs into transformers, however, is no easy feat, especially when scaled up. Specifically, we identify three key challenges: (C1) Base function. The standard B-spline function used in KANs is not optimized for parallel computing on modern hardware, resulting in slower inference speeds. (C2) Parameter and Computation Inefficiency. KAN requires a unique function for each input-output pair, making the computation extremely large. (C3) Weight initialization. The initialization of weights in KANs is particularly challenging due to their learnable activation functions, which are critical for achieving convergence in deep neural networks. To overcome the aforementioned challenges, we propose three key solutions: (S1) Rational basis. We replace B-spline functions with rational functions to improve compatibility with modern GPUs. By implementing this in CUDA, we achieve faster computations. (S2) Group KAN. We share the activation weights through a group of neurons, to reduce the computational load without sacrificing performance. (S3) Variance-preserving initialization. We carefully initialize the activation weights to make sure that the activation variance is maintained across layers. With these designs, KAT scales effectively and readily outperforms traditional MLP-based transformers.
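To illustrate the rational-basis idea (S1) and grouped weight sharing (S2), here is a plain-PyTorch sketch of a grouped rational activation; it is not the paper's fused CUDA kernel, and the polynomial degrees and initialization are arbitrary choices.

```python
# Illustrative PyTorch module for the rational-activation idea (S1/S2):
# a learnable rational function P(x)/Q(x) whose coefficients are shared
# across groups of channels. This is a plain-Python sketch, not the
# paper's fused CUDA kernel, and the degrees/init are arbitrary choices.
import torch
import torch.nn as nn

class GroupedRationalActivation(nn.Module):
    def __init__(self, channels: int, groups: int = 8, p_deg: int = 5, q_deg: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        # One set of numerator/denominator coefficients per group.
        self.p = nn.Parameter(torch.randn(groups, p_deg + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(groups, q_deg) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels); split channels into groups.
        b, c = x.shape
        xg = x.view(b, self.groups, c // self.groups)
        num = sum(self.p[:, i].view(1, -1, 1) * xg**i for i in range(self.p.shape[1]))
        # Per-term abs keeps the denominator >= 1, avoiding poles.
        den = 1 + sum((self.q[:, i].view(1, -1, 1) * xg**(i + 1)).abs()
                      for i in range(self.q.shape[1]))
        return (num / den).view(b, c)

x = torch.randn(2, 64)
print(GroupedRationalActivation(64)(x).shape)  # torch.Size([2, 64])
```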
Composable Interventions for Language Models
Abstract:
Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely developing independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework to study the effects of using multiple interventions on the same language models, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories -- Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions.
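The framing is easy to picture: interventions are functions from model to model, and composition is ordered application. The toy compress and edit functions below are hypothetical stand-ins, not the paper's codebase, but they show why order of application matters.

```python
# Toy illustration of the paper's framing: interventions are functions
# model -> model, and "composing" them is just ordered application.
# compress/edit are hypothetical stand-ins, not the paper's codebase.
from functools import reduce
from typing import Callable

Intervention = Callable[[dict], dict]  # a model here is just a dict of weights

def compress(model: dict) -> dict:
    return {k: round(v, 1) for k, v in model.items()}  # crude "quantization"

def edit(model: dict) -> dict:
    return {**model, "fact_head": 0.42}  # overwrite one "fact"

def compose(*interventions: Intervention) -> Intervention:
    return lambda m: reduce(lambda acc, f: f(acc), interventions, m)

model = {"fact_head": 0.1337, "other": 0.98765}
# Order matters: compressing after editing re-rounds the edited weight,
# which is the kind of interaction the paper measures at scale.
print(compose(edit, compress)(model))   # edit first, then compress
print(compose(compress, edit)(model))   # compress first, then edit
```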
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Abstract:
The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation. Despite recent advancements in 3D generative models, existing methods still face challenges with optimization speed, geometric fidelity, and the lack of assets for physically based rendering (PBR). In this paper, we introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome these limitations. 3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material fields into a compact tensorial format, facilitating the modeling of high-resolution geometry with PBR assets. On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion. 3DTopia-XL learns to generate high-quality 3D assets from textual or visual inputs. We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-quality 3D assets with fine-grained textures and materials, efficiently bridging the quality gap between generative models and real-world applications.
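Reading only from the abstract, a PrimX-style layout might look like the speculative sketch below: each primitive carries a position, a scale, and a small voxel payload of shape, albedo, and material channels, flattened into one token per primitive for a Diffusion Transformer. All dimensions and field names are illustrative guesses.

```python
# Speculative sketch of a PrimX-style tensorial layout, inferred only
# from the abstract: each primitive carries a position, a scale, and a
# small voxel payload holding shape (SDF), albedo, and material values.
# Dimensions and channel choices are illustrative guesses.
import torch

N, R = 2048, 8            # number of primitives, payload voxel resolution
positions = torch.zeros(N, 3)              # primitive centers
scales = torch.ones(N, 1)                  # per-primitive extent
# Per-voxel channels: 1 SDF + 3 albedo + 2 material (e.g. roughness/metallic)
payload = torch.zeros(N, R, R, R, 6)

# Flattened per-primitive token, the kind of thing a Diffusion
# Transformer could denoise as a sequence of N tokens.
tokens = torch.cat([positions, scales, payload.flatten(1)], dim=1)
print(tokens.shape)  # torch.Size([2048, 3076])
```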