Deep Learning Weekly: Issue #290
Meta's LLaMA, errors to avoid for distributed training, Imitation Learning Techniques and Inverse Q-Learning, a paper on Composer: Creative and Controllable Image Synthesis with Composable Conditions
This week in deep learning, we bring you Meta's LLaMA, errors to avoid for distributed training, Imitation Learning Techniques and Inverse Q-Learning, and a paper on Composer: Creative and Controllable Image Synthesis with Composable Conditions.
You may also enjoy The 2023 MAD Landscape, production-ready RecSys pipeline on the cloud, Content Moderation Patterns in the Industry, a paper on Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
To Teach Computers Math, Researchers Merge AI Approaches
Two papers suggest the shape of future AI design, where LLMs can learn to reason via mathematical thinking.
Legal tech startup Robin AI secures $10.5m
Robin AI, a generative AI startup, has secured $10.5m in a funding round led by investor Plural.
The 2023 MAD (Machine Learning, Artificial Intelligence & Data) Landscape
An annual state of the union post that contains the general landscape, market trends, infrastructure trends, and AI trends.
Introducing LLaMA: A foundational, 65-billion-parameter large language model
Meta publicly released LLaMA (Large Language Model Meta AI), a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI.
Snapchat unveils AI chatbot powered by OpenAI’s ChatGPT
Snap revealed the addition of a chatbot, powered by OpenAI’s ChatGPT, to its multimedia instant messaging app.
Artifact, The New Personalized News App, Is Now Open to Public
Launched by Instagram co-founders Kevin Systrom and Mike Krieger, the AI-powered personalized news reader called Artifact is now open to all.
Comet Releases MLOps Industry Report | 2023 Machine Learning Practitioner Survey
A survey of 500 US-based ML practitioners which covers common challenges, budgets, and resources.
NVIDIA Merlin meets the MLOps ecosystem: building a production-ready RecSys pipeline on cloud
A post that highlights a production-ready pipeline for deep learning recommendations using NVIDIA Merlin and Metaflow.
Distributed Training: Errors to Avoid
An article that covers ten of the most common errors in distributed model training.
Challenges of Feature Monitoring for Real-Time Machine Learning
A post that explores what feature monitoring for real-time machine learning entails and the common obstacles to be faced.
Credit Card Fraud Detection With Autoencoders
An article that leverages the power of autoencoders to address a key issue for banks and their customers: credit card fraud detection.
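The core idea behind autoencoder-based fraud detection is anomaly detection by reconstruction error: train the model on (mostly legitimate) transactions, then flag inputs it reconstructs poorly. A minimal numpy sketch of that idea, using a linear autoencoder and made-up synthetic data (the features, sizes, and threshold logic here are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" transactions: two strongly correlated features.
X = rng.normal(size=(500, 1)) @ np.array([[1.0, 1.0]]) + 0.05 * rng.normal(size=(500, 2))

# Linear autoencoder with a 1-dimensional bottleneck, trained by gradient descent.
W_enc = rng.normal(scale=0.1, size=(2, 1))
W_dec = rng.normal(scale=0.1, size=(1, 2))
lr = 0.1
for _ in range(2000):
    Z = X @ W_enc                      # encode
    X_hat = Z @ W_dec                  # decode
    err = X_hat - X
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

def reconstruction_error(x):
    # High reconstruction error = the input doesn't look like the training data.
    x_hat = (x @ W_enc) @ W_dec
    return float(np.mean((x_hat - x) ** 2))

normal_point = np.array([[1.0, 1.0]])   # follows the learned correlation
fraud_point = np.array([[1.0, -1.0]])   # breaks it, so it reconstructs badly
```

A real system would use a deeper nonlinear autoencoder and pick the flagging threshold from a validation set, but the scoring mechanism is the same: rank transactions by reconstruction error.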
Imitation Learning Techniques and Inverse Q-Learning
A post that covers several Imitation Learning techniques that address the shortcomings of hand-designed reward functions, along with a method called Inverse Q-Learning.
Content Moderation - Patterns in Industry
An article about five practical patterns followed by companies for content moderation tasks such as classification, anomaly detection, and search relevance.
Build a GNN-based real-time fraud detection solution using the Deep Graph Library without using external graph storage
This post presents an implementation of a fraud detection solution using the Relational Graph Convolutional Network (RGCN) model to predict the probability that a transaction is fraudulent.
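The RGCN layer the post builds on extends a graph convolution with one weight matrix per edge type, so a transaction node can aggregate "shares card with" and "shares device with" neighbours differently. A toy numpy sketch of that update rule, with made-up node counts, relations, and dimensions (the actual solution uses the Deep Graph Library):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_rel, d_in, d_out = 5, 2, 4, 3

# One adjacency matrix per relation type (e.g. "shares card", "shares device").
A = (rng.random((n_rel, n_nodes, n_nodes)) < 0.4).astype(float)

H = rng.normal(size=(n_nodes, d_in))            # input node features
W_rel = rng.normal(size=(n_rel, d_in, d_out))   # relation-specific weights
W_self = rng.normal(size=(d_in, d_out))         # self-loop weight

def rgcn_layer(H):
    out = H @ W_self                            # self-loop term
    for r in range(n_rel):
        deg = A[r].sum(axis=1, keepdims=True)
        norm = np.divide(A[r], deg, out=np.zeros_like(A[r]), where=deg > 0)
        out = out + norm @ H @ W_rel[r]         # mean over relation-r neighbours
    return np.maximum(out, 0.0)                 # ReLU

H1 = rgcn_layer(H)                              # updated node embeddings
```

Stacking a couple of these layers and feeding the transaction-node embeddings into a classifier head gives the fraud-probability output the post describes.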
Libraries & Code
A Hugging Face library for State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods.
Examples and guides for using the OpenAI API.
Guides, papers, lectures, and resources for prompt engineering.
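One of the PEFT methods in the Hugging Face library is LoRA, whose core trick is easy to see without the library itself: freeze the pretrained weight and learn only a low-rank additive update. A numpy sketch of just that idea (dimensions and rank are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4                       # layer dims and a small LoRA rank

W = rng.normal(size=(d, k))               # frozen pretrained weight, never updated
A = rng.normal(scale=0.01, size=(r, k))   # trainable low-rank factor
B = np.zeros((d, r))                      # zero init, so the update starts as a no-op

def forward(x):
    # Effective weight is W + B @ A; only A and B receive gradients.
    return x @ (W + B @ A).T

x = rng.normal(size=(1, k))

# Parameter savings: train r*(d+k) numbers instead of d*k.
trainable, full = r * (d + k), d * k
```

Because B starts at zero, fine-tuning begins exactly at the pretrained model's behaviour, and the trainable parameter count shrinks from d*k to r*(d+k), which is where the "parameter-efficient" in PEFT comes from.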
Papers & Publications
Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes
The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the learnings from these works, we re-examine previous design choices and find that with appropriate choices: ResNets, cross-entropy based distributional backups, and feature normalization, offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using up to 80 million parameter networks, finding that model performance scales favorably with capacity. In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches.
Composer: Creative and Controllable Image Synthesis with Composable Conditions
Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity. With compositionality as the core idea, we first decompose an image into representative factors, and then train a diffusion model with all these factors as the conditions to recompose the input. At the inference stage, the rich intermediate representations work as composable elements, leading to a huge design space (i.e., exponentially proportional to the number of decomposed factors) for customizable content creation. It is noteworthy that our approach, which we call Composer, supports various levels of conditions, such as text description as the global information, depth map and sketch as the local guidance, color histogram for low-level details, etc. Besides improving controllability, we confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
In recent years, large-scale models have demonstrated state-of-the-art performance across various domains. However, training such models requires various techniques to address the problem of limited computing power and memory on devices such as GPUs. Some commonly used techniques include pipeline parallelism, tensor parallelism, and activation checkpointing. While existing works have focused on finding efficient distributed execution plans (Zheng et al. 2022) and activation checkpoint scheduling (Herrmann et al. 2019, Beaumont et al. 2021), there has been no method proposed to optimize these two plans jointly. Moreover, ahead-of-time compilation relies heavily on accurate memory and computing overhead estimation, which is often time-consuming and misleading. Existing training systems and machine learning pipelines either physically execute each operand or estimate memory usage with a scaled input tensor. To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans. Additionally, we provide an easy-to-use symbolic profiler that generates memory and computing statistics for any PyTorch model with a minimal time cost. Our approach allows users to parallelize their model training on the given hardware with minimal code changes.