Deep Learning Weekly: Issue 355
OpenAI Board Forms Safety and Security Committee, From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey, For-profit AI Safety, and many more!
This week in deep learning, we bring you OpenAI Board Forms Safety and Security Committee, From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey, For-profit AI Safety, and a paper on MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning.
You may also enjoy Codestral, Understanding the Cost of Generative AI Models in Production, a paper on Emu Edit: Precise Image Editing via Recognition and Generation Tasks, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
OpenAI Board Forms Safety and Security Committee
The OpenAI Board formed a Safety and Security Committee led by directors Bret Taylor, Adam D’Angelo, Nicole Seligman, and Sam Altman.
Foundation Model Transparency Index
The Center for Research on Foundation Models (CRFM) released a comprehensive assessment of the transparency of foundation model developers.
Codestral
Mistral AI introduced Codestral, its first open-weight generative AI model designed for code generation tasks.
Looking for a specific action in a video? This AI-based method can find it for you
Researchers from MIT developed a technique that teaches machine learning models to identify specific actions in long videos.
Aporia introduces real-time guardrails for multimodal AI applications
Aporia, an observability startup, launched Guardrails for Multimodal AI Applications.
Vox Media and The Atlantic sign content deals with OpenAI
Two more media companies have signed licensing agreements with OpenAI, allowing their content to be used to train its AI models and to be surfaced inside ChatGPT.
MLOps & LLMOps
Understanding the Cost of Generative AI Models in Production
A post that aims to provide a more comprehensive understanding of the total cost of ownership (TCO) for Generative AI.
From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey
A blog post that discusses the evolution of Michelangelo, Uber’s centralized ML platform, from predictive machine learning to advanced deep learning models and ultimately to Generative AI.
Create a Blog Writer Multi-Agent System using Crewai and Ollama
A technical article that walks through how to create a multi-agent system for blog writing using Crewai and Ollama.
Learning
Prompting Fundamentals and How to Apply them Effectively
Eugene Yan discusses some prompting fundamentals to help you get the most out of LLMs.
The Role of Feature Stores in Fine-Tuning LLMs
A post that explains how feature stores can support fine-tuning, the process of taking a pre-trained LLM and adapting it to a specific task or domain.
For-profit AI Safety
A post that outlines for-profit AI safety opportunities.
How would you tokenize (or break down) a million digits of pi?
A visual exploration into how LLMs tokenize long sequences of numbers and other unusual sequences.
Property Graph Index: A Powerful New Way to Build Knowledge Graphs with LLMs
An article that highlights a new feature in LlamaIndex that expands knowledge graph capabilities to be more flexible and robust.
Libraries & Code
A high-performance RLHF framework built on Ray, DeepSpeed, and HF Transformers.
An imitation learning benchmark specifically targeted towards locomotion.
Papers & Publications
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Abstract:
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding non-parameterized operators that reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into the LLM, which allows our method to be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
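For intuition, here is a minimal PyTorch sketch of the idea described in the abstract: all trainable parameters live in one square matrix, with non-parameterized operators compressing the input and decompressing the output around it. The group-sum compression, repeat-based decompression, and the MoRALayerSketch name are illustrative assumptions, not the paper's exact operators.

```python
import torch
import torch.nn as nn

class MoRALayerSketch(nn.Module):
    """Sketch of MoRA's high-rank update: a single square trainable matrix
    sandwiched between non-parameterized compress/decompress operators.
    The operators below are illustrative stand-ins, not the paper's exact ones."""

    def __init__(self, d_model: int, r_hat: int):
        super().__init__()
        assert d_model % r_hat == 0, "for simplicity, assume r_hat divides d_model"
        self.d_model = d_model
        self.r_hat = r_hat
        # All trainable parameters sit in one r_hat x r_hat square matrix,
        # initialized to zero so the update starts as a no-op (as in LoRA's B).
        self.M = nn.Parameter(torch.zeros(r_hat, r_hat))

    def compress(self, x: torch.Tensor) -> torch.Tensor:
        # Non-parameterized: shrink the last dimension from d_model to r_hat by group-summing.
        return x.reshape(*x.shape[:-1], self.r_hat, self.d_model // self.r_hat).sum(-1)

    def decompress(self, h: torch.Tensor) -> torch.Tensor:
        # Non-parameterized: expand the last dimension from r_hat back to d_model by repetition.
        return h.repeat_interleave(self.d_model // self.r_hat, dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Delta added on top of the frozen pretrained weight's output.
        return self.decompress(self.compress(x) @ self.M)
```

Because the square matrix holds roughly the same parameter budget as a LoRA pair of low-rank matrices for a given rank, the update itself is no longer constrained to be low-rank.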
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Abstract:
Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To address this, our work presents a pioneering attempt to develop a unified MLLM with the MoE architecture, named Uni-MoE, which can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors for a unified multimodal representation. We also implement a sparse MoE architecture within the LLMs to enable efficient training and inference through modality-level data parallelism and expert-level model parallelism. To enhance multi-expert collaboration and generalization, we present a progressive training strategy: 1) cross-modality alignment using various connectors with different cross-modality data, 2) training modality-specific experts with cross-modality instruction data to activate experts' preferences, and 3) tuning the Uni-MoE framework utilizing Low-Rank Adaptation (LoRA) on mixed multimodal instruction data. We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets. The extensive experimental results demonstrate Uni-MoE's principal advantage of significantly reducing performance bias in handling mixed multimodal datasets, alongside improved multi-expert collaboration and generalization.
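As a rough illustration of the sparse MoE layer referenced in the abstract, here is a generic top-k token-routing feed-forward block in PyTorch. The class name, routing scheme, and expert count are assumptions for illustration only; Uni-MoE's actual design adds modality-specific encoders plus modality-level data parallelism and expert-level model parallelism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    """Generic top-k sparse Mixture-of-Experts feed-forward block; a sketch of
    the kind of layer Uni-MoE places inside the LLM, not its exact implementation."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> each token is routed to its top-k experts.
        logits = self.router(x)                              # (B, S, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # (B, S, top_k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e               # tokens assigned to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the selected experts run for each token, which is what keeps training and inference cost roughly flat as more experts (and modalities) are added.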
Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Abstract:
Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. To develop Emu Edit, we train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks, all of which are formulated as generative tasks. Additionally, to enhance Emu Edit’s multi-task learning abilities, we provide it with learned task embeddings which guide the generation process towards the correct edit type. Both of these elements are essential for Emu Edit’s outstanding performance. Furthermore, we show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples. This capability offers a significant advantage in scenarios where high-quality samples are scarce. Lastly, to facilitate a more rigorous and informed assessment of instructable image editing models, we release a new challenging and versatile benchmark that includes seven different image editing tasks.
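The learned task embeddings mentioned in the abstract can be pictured as an extra conditioning signal selected by task id. The sketch below, with a hypothetical TaskConditionedEditor module, shows one simple way such an embedding could be wired into the text conditioning; Emu Edit's actual integration into the diffusion model differs in detail.

```python
import torch
import torch.nn as nn

class TaskConditionedEditor(nn.Module):
    """Sketch of steering a multi-task editing model with a learned task
    embedding; hypothetical wiring, not Emu Edit's implementation."""

    def __init__(self, num_tasks: int, cond_dim: int):
        super().__init__()
        # One learned vector per editing task (region-based, free-form, CV task, ...).
        self.task_embeddings = nn.Embedding(num_tasks, cond_dim)

    def forward(self, text_cond: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # text_cond: (B, T, cond_dim) instruction embeddings from the text encoder.
        # task_id:   (B,) integer ids selecting the edit type.
        task_tok = self.task_embeddings(task_id).unsqueeze(1)   # (B, 1, cond_dim)
        # Append the task embedding as one extra conditioning token.
        return torch.cat([text_cond, task_tok], dim=1)          # (B, T+1, cond_dim)
```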