Deep Learning Weekly: Issue #289

An AI racing agent for Gran Turismo 7, an in-depth analysis of GPUs for Deep Learning in 2023, Towards Geometric Deep Learning, a paper on adding Conditional Control to Text-to-image Diffusion Models

Feb 22, 2023

Hey Folks,

This week in deep learning, we bring you Sony AI and Polyphony Digital's AI racing agent for Gran Turismo 7, an in-depth analysis of GPUs for Deep Learning in 2023, Towards Geometric Deep Learning, and a paper on adding Conditional Control to Text-to-image Diffusion Models.

You may also enjoy production-ready TensorFlow Decision Forests, a tutorial for Pipeline Parallelism, zero-shot image-to-text generation with BLIP-2, a paper on Symbolic Discovery of Optimization Algorithms, and more.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

How generative AI & ChatGPT will change business

A timely McKinsey blog that highlights and breaks down the possible use cases that ChatGPT has for businesses.

Conservation AI Detects Threats to Endangered Species

Conservation AI detects threats to pangolins, rhinos and more than 50 other species in real time, using the NVIDIA Jetson platform and Triton Inference Server.

The ‘breakthrough’ AI racing agent GT Sophy debuts for Gran Turismo 7

Sony AI and Polyphony Digital have released Gran Turismo Sophy, a hyper-realistic AI racing agent for Gran Turismo 7.

OpenAI launches initiative to improve ChatGPT after it went rogue

OpenAI is launching an initiative to provide more transparency about ChatGPT.

You.com debuts multimodal AI chatbot for search

Search startup You.com has debuted an artificial intelligence chatbot that can provide natural language answers to user queries, as well as surface related items such as code snippets and graphs.

GitHub launches Copilot for Business into general availability with AI upgrades

Microsoft’s GitHub unit today made Copilot for Business, its AI code generation service, generally available.

TensorFlow Decision Forests is production ready

TensorFlow announces the production-ready version of TensorFlow Decision Forests.

MLOps

An End-to-End Guide on Using Comet ML’s Model Versioning Feature

A two-part guide to using Comet ML for model registration and dealing with different model versions.

The Best GPUs for Deep Learning in 2023

An in-depth analysis of GPUs, including the Ampere series from NVIDIA, for deep learning.

Pipeline Parallelism - DeepSpeed

A tutorial that provides the first steps for DeepSpeed pipeline parallel training for an AlexNet model, as well as more advanced topics.

Learning

Graph Machine Learning Explainability with PyG

A step-by-step blog post that goes through the explainability module of PyG, shedding light on how each component of the framework works and what purpose it serves.

Constructing and Visualizing DataGrids in Kangas

A deep-dive into the DataGrid function in Kangas, an open-source computer vision tool for large-scale dataset exploration.

How Instacart Uses Embeddings to Improve Search Relevance

In this blog post, Instacart introduces ITEMS (the Instacart Transformer-based Embedding Model for Search), a deep learning model that fills in the gap by creating a dense, unified representation of search queries and products.

Towards Geometric Deep Learning

A comprehensive study on how Geometric Deep Learning ideas have emerged through history from ancient Greek geometry to Graph Neural Networks.

Zero-shot image-to-text generation with BLIP-2

This guide introduces BLIP-2 from Salesforce Research, a suite of state-of-the-art visual-language models now available in Hugging Face Transformers.

Libraries & Code

ScottfreeLLC/AlphaPy

A library for Automated Machine Learning with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost.

ritchieng/the-incredible-pytorch

A curated list of tutorials, papers, projects, communities and more relating to PyTorch.

torchsynth/torchsynth

A GPU-optional modular synthesizer in PyTorch, 16200x faster than realtime, for audio ML researchers.

Papers & Publications

Adding Conditional Control to Text-to-Image Diffusion Models

Abstract:

We present a neural network structure, ControlNet, to control pre-trained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.

Symbolic Discovery of Optimization Algorithms

Abstract:

We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, Lion (EvoLved Sign Momentum). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT. On vision-language contrastive learning, we achieve 88.3% zero-shot and 91.1% fine-tuning accuracy on ImageNet, surpassing the previous best results by 2% and 0.1%, respectively. On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x. For autoregressive, masked language modeling, and fine-tuning, Lion exhibits a similar or better performance compared to Adam. Our analysis of Lion reveals that its performance gain grows with the training batch size. It also requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function. Additionally, we examine the limitations of Lion and identify scenarios where its improvements are small or not statistically significant. The implementation of Lion is publicly available.
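For readers who want a feel for what "only keeps track of the momentum" and "same magnitude for each parameter" mean in practice, here is a minimal sketch of a Lion-style update in plain Python. It is not the authors' released implementation: the function name `lion_step`, the list-based parameter layout, and the default values for `lr`, `beta1`, `beta2`, and `weight_decay` are assumptions chosen for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion-style update over flat lists of floats (illustrative sketch).

    Returns the updated parameters and the updated momentum. The momentum
    is the only optimizer state carried between steps.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Blend momentum and gradient, then keep only the sign, so every
        # parameter moves by the same magnitude (lr) regardless of scale.
        direction = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay is applied alongside the signed step.
        new_params.append(p - lr * (direction + weight_decay * p))
        # The stored momentum uses a second, slower interpolation factor.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


# Usage: three parameters, zero initial momentum, a large lr for visibility.
params, momentum = lion_step(
    params=[1.0, -2.0, 0.5],
    grads=[0.3, -4.0, 0.0],
    momentum=[0.0, 0.0, 0.0],
    lr=0.1,
)
```

Note that the gradients 0.3 and -4.0 differ by more than an order of magnitude, yet both parameters move by exactly `lr`. That uniform step size is why the abstract says Lion needs a smaller learning rate than Adam.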

FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

Abstract:

We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distractor terms. We explore automatic evaluation metrics for FRMT and validate their correlation with expert human evaluation across both region-matched and mismatched rating scenarios. Finally, we present a number of baseline models for this task, and offer guidelines for how researchers can train, evaluate, and compare their own models.

A guest post by

Miko Planas
