Deep Learning Weekly: Issue #289
An AI racing agent for Gran Turismo 7, an in-depth analysis of GPUs for Deep Learning in 2023, Towards Geometric Deep Learning, a paper on adding Conditional Control to Text-to-image Diffusion Models
This week in deep learning, we bring you Sony AI and Polyphony Digital's AI racing agent for Gran Turismo 7, an in-depth analysis of GPUs for Deep Learning in 2023, Towards Geometric Deep Learning, and a paper on adding Conditional Control to Text-to-image Diffusion Models.
You may also enjoy production-ready TensorFlow Decision Forests, a tutorial for Pipeline Parallelism, zero-shot image-to-text generation with BLIP-2, a paper on Symbolic Discovery of Optimization Algorithms, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
A timely McKinsey blog that highlights and breaks down the possible use cases that ChatGPT has for businesses.
Conservation AI detects threats to pangolins, rhinos and more than 50 other species in real time, using the NVIDIA Jetson platform and Triton Inference Server.
Sony AI and Polyphony Digital have released Gran Turismo Sophy, a hyper-realistic AI racing agent for Gran Turismo 7.
OpenAI is launching an initiative to provide more transparency about ChatGPT.
Search startup You.com has debuted an artificial intelligence chatbot that can provide natural language answers to user queries, as well as surface related items such as code snippets and graphs.
Microsoft’s GitHub unit has made Copilot for Business, its AI code generation service, generally available.
TensorFlow announces the production-ready version of TensorFlow Decision Forests.
A two-part guide to using Comet ML for model registration and dealing with different model versions.
An in-depth analysis of GPUs, including the Ampere series from NVIDIA, for deep learning.
A tutorial that provides the first steps for DeepSpeed pipeline parallel training for an AlexNet model, as well as more advanced topics.
A step-by-step blog post that goes through the explainability module of PyG, shedding light on how each component of the framework works and what purpose it serves.
A deep-dive into the DataGrid function in Kangas, an open-source computer vision tool for large-scale dataset exploration.
In this blog post, Instacart introduces ITEMS (the Instacart Transformer-based Embedding Model for Search), a deep learning model that fills in the gap by creating a dense, unified representation of search queries and products.
A comprehensive study on how Geometric Deep Learning ideas have emerged through history from ancient Greek geometry to Graph Neural Networks.
This guide introduces BLIP-2 from Salesforce Research, which enables a suite of state-of-the-art visual-language models now available in Hugging Face Transformers.
Libraries & Code
A library for Automated Machine Learning with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost.
A curated list of tutorials, papers, projects, communities and more relating to PyTorch.
A GPU-optional modular synthesizer in PyTorch, 16200x faster than realtime, for audio ML researchers.
Papers & Publications
We present a neural network structure, ControlNet, to control pre-trained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal device. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.
We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, Lion (EvoLved Sign Momentum). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT. On vision-language contrastive learning, we achieve 88.3% zero-shot and 91.1% fine-tuning accuracy on ImageNet, surpassing the previous best results by 2% and 0.1%, respectively. On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x. For autoregressive, masked language modeling, and fine-tuning, Lion exhibits a similar or better performance compared to Adam. Our analysis of Lion reveals that its performance gain grows with the training batch size. It also requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function. Additionally, we examine the limitations of Lion and identify scenarios where its improvements are small or not statistically significant. The implementation of Lion is publicly available.
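For readers who want a feel for the update described in the Lion abstract above, here is a minimal sketch in plain Python. It is not the authors' released implementation: the function name `lion_step`, the list-of-floats representation, and the default hyperparameters are assumptions chosen for illustration. It shows the two properties the abstract highlights: only a momentum buffer is stored per parameter, and every parameter moves by the same magnitude because the step direction passes through a sign operation.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion-style update over flat lists of floats (illustrative sketch).

    All names and defaults here are assumptions for this example,
    not an official API.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Blend momentum and gradient, then keep only the sign, so the
        # step has the same magnitude (lr) for every parameter.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay is applied alongside the signed step.
        new_params.append(p - lr * (direction + weight_decay * p))
        # The momentum buffer is the only optimizer state that is kept.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


# Hypothetical toy values, purely to exercise the function.
params = [1.0, -2.0, 0.5]
grads = [0.3, -4.0, 0.0]
momentum = [0.0, 0.0, 0.0]
params, momentum = lion_step(params, grads, momentum, lr=0.1)
print(params, momentum)
```

Note that the gradients 0.3 and -4.0 differ by more than an order of magnitude, yet both parameters move by exactly `lr`. This is consistent with the abstract's remark that Lion needs a smaller learning rate than Adam.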
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distractor terms. We explore automatic evaluation metrics for FRMT and validate their correlation with expert human evaluation across both region-matched and mismatched rating scenarios. Finally, we present a number of baseline models for this task, and offer guidelines for how researchers can train, evaluate, and compare their own models.