Deep Learning Weekly: Issue 355
OpenAI Board Forms Safety and Security Committee, From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey, For-profit AI Safety, and many more!
This week in deep learning, we bring you OpenAI Board Forms Safety and Security Committee, From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey, For-profit AI Safety, and a paper on MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning.
You may also enjoy Codestral, Understanding the Cost of Generative AI Models in Production, a paper on Emu Edit: Precise Image Editing via Recognition and Generation Tasks, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
OpenAI Board Forms Safety and Security Committee
The OpenAI Board formed a Safety and Security Committee led by directors Bret Taylor, Adam D’Angelo, Nicole Seligman, and Sam Altman.
Foundation Model Transparency Index
The Center for Research on Foundation Models (CRFM) released a comprehensive assessment of the transparency of foundation model developers.
Codestral
Mistral AI introduced Codestral, its first open-weight generative AI model designed for code generation tasks.
Looking for a specific action in a video? This AI-based method can find it for you
Researchers from MIT developed a technique that teaches machine learning models to identify specific actions in long videos.
Aporia introduces real-time guardrails for multimodal AI applications
Aporia, an observability startup, launched Guardrails for Multimodal AI Applications.
Vox Media and The Atlantic sign content deals with OpenAI
Two more media companies have signed licensing agreements with OpenAI, allowing their content to be used to train its AI models and to be surfaced inside ChatGPT.
MLOps & LLMOps
Understanding the Cost of Generative AI Models in Production
A post that aims to provide a more comprehensive understanding of the total cost of ownership (TCO) for Generative AI.
From Predictive to Generative – How Michelangelo Accelerates Uber’s AI Journey
A blog post that discusses the evolution of Michelangelo, Uber’s centralized ML platform, from predictive machine learning to advanced deep learning models and ultimately to Generative AI.
Create a Blog Writer Multi-Agent System using Crewai and Ollama
A technical article that walks through how to create a multi-agent system for blog writing using Crewai and Ollama.
Learning
Prompting Fundamentals and How to Apply them Effectively
Eugene Yan discusses some prompting fundamentals to help you get the most out of LLMs.
The Role of Feature Stores in Fine-Tuning LLMs
A post that explains how feature stores can support fine-tuning, the process of taking a pre-trained LLM and adapting it to a specific task or domain.
For-profit AI Safety
A post that outlines for-profit AI safety opportunities.
How would you tokenize (or break down) a million digits of pi?
A visual exploration into how LLMs tokenize long sequences of numbers and other unusual sequences.
Property Graph Index: A Powerful New Way to Build Knowledge Graphs with LLMs
An article that highlights a new feature in LlamaIndex that expands knowledge graph capabilities to be more flexible and robust.
Libraries & Code
A high-performance RLHF framework built on Ray, DeepSpeed, and HF Transformers.
An imitation learning benchmark specifically targeted towards locomotion.
Papers & Publications
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Abstract:
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding non-parameterized operators that reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into the LLM, which allows our method to be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
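For intuition, here is a minimal PyTorch sketch of the idea described in the abstract: all trainable parameters live in one square matrix, with non-parameterized operators compressing the input and decompressing the output around it. The group-sum compression, repeat-based decompression, and the MoRALayerSketch name are illustrative assumptions, not the paper's exact operators.

```python
import torch
import torch.nn as nn

class MoRALayerSketch(nn.Module):
    """Sketch of MoRA's high-rank update: a single square trainable matrix
    sandwiched between non-parameterized compress/decompress operators.
    The operators below are illustrative stand-ins, not the paper's exact ones."""

    def __init__(self, d_model: int, r_hat: int):
        super().__init__()
        assert d_model % r_hat == 0, "for simplicity, assume r_hat divides d_model"
        self.d_model = d_model
        self.r_hat = r_hat
        # All trainable parameters sit in one r_hat x r_hat square matrix,
        # initialized to zero so the update starts as a no-op (as in LoRA's B).
        self.M = nn.Parameter(torch.zeros(r_hat, r_hat))

    def compress(self, x: torch.Tensor) -> torch.Tensor:
        # Non-parameterized: shrink the last dimension from d_model to r_hat by group-summing.
        return x.reshape(*x.shape[:-1], self.r_hat, self.d_model // self.r_hat).sum(-1)

    def decompress(self, h: torch.Tensor) -> torch.Tensor:
        # Non-parameterized: expand the last dimension from r_hat back to d_model by repetition.
        return h.repeat_interleave(self.d_model // self.r_hat, dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Delta added on top of the frozen pretrained weight's output.
        return self.decompress(self.compress(x) @ self.M)
```

Because the square matrix holds roughly the same parameter budget as a LoRA pair of low-rank matrices for a given rank, the update itself is no longer constrained to be low-rank.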
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Abstract:
Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To address this, our work presents a pioneering attempt to develop a unified MLLM with the MoE architecture, named Uni-MoE, which can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors for a unified multimodal representation. We also implement a sparse MoE architecture within the LLMs to enable efficient training and inference through modality-level data parallelism and expert-level model parallelism. To enhance multi-expert collaboration and generalization, we present a progressive training strategy: 1) cross-modality alignment using various connectors with different cross-modality data, 2) training modality-specific experts with cross-modality instruction data to activate experts' preferences, and 3) tuning the Uni-MoE framework utilizing Low-Rank Adaptation (LoRA) on mixed multimodal instruction data. We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets. The extensive experimental results demonstrate Uni-MoE's principal advantage of significantly reducing performance bias in handling mixed multimodal datasets, alongside improved multi-expert collaboration and generalization.
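As a rough illustration of the sparse MoE layer referenced in the abstract, here is a generic top-k token-routing feed-forward block in PyTorch. The class name, routing scheme, and expert count are assumptions for illustration only; Uni-MoE's actual design adds modality-specific encoders plus modality-level data parallelism and expert-level model parallelism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    """Generic top-k sparse Mixture-of-Experts feed-forward block; a sketch of
    the kind of layer Uni-MoE places inside the LLM, not its exact implementation."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> each token is routed to its top-k experts.
        logits = self.router(x)                              # (B, S, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # (B, S, top_k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e               # tokens assigned to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the selected experts run for each token, which is what keeps training and inference cost roughly flat as more experts (and modalities) are added.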
Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Abstract:
Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. To develop Emu Edit, we train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks, all of which are formulated as generative tasks. Additionally, to enhance Emu Edit’s multi-task learning abilities, we provide it with learned task embeddings which guide the generation process towards the correct edit type. Both of these elements are essential for Emu Edit’s outstanding performance. Furthermore, we show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples. This capability offers a significant advantage in scenarios where high-quality samples are scarce. Lastly, to facilitate a more rigorous and informed assessment of instructable image editing models, we release a new challenging and versatile benchmark that includes seven different image editing tasks.
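The learned task embeddings mentioned in the abstract can be pictured as an extra conditioning signal selected by task id. The sketch below, with a hypothetical TaskConditionedEditor module, shows one simple way such an embedding could be wired into the text conditioning; Emu Edit's actual integration into the diffusion model differs in detail.

```python
import torch
import torch.nn as nn

class TaskConditionedEditor(nn.Module):
    """Sketch of steering a multi-task editing model with a learned task
    embedding; hypothetical wiring, not Emu Edit's implementation."""

    def __init__(self, num_tasks: int, cond_dim: int):
        super().__init__()
        # One learned vector per editing task (region-based, free-form, CV task, ...).
        self.task_embeddings = nn.Embedding(num_tasks, cond_dim)

    def forward(self, text_cond: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # text_cond: (B, T, cond_dim) instruction embeddings from the text encoder.
        # task_id:   (B,) integer ids selecting the edit type.
        task_tok = self.task_embeddings(task_id).unsqueeze(1)   # (B, 1, cond_dim)
        # Append the task embedding as one extra conditioning token.
        return torch.cat([text_cond, task_tok], dim=1)          # (B, T+1, cond_dim)
```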