Deep Learning Weekly: Issue #260
NVIDIA Canvas app launches in beta, Meta AI's library for differentiable nonlinear optimization, mixed precision training, a paper on generative multiplane images, and many more
Hey Folks,
This week in deep learning, we bring you the beta launch of the NVIDIA Canvas app, Meta AI's library for differentiable nonlinear optimization, a tutorial on mixed precision training, and a paper on generative multiplane images.
You may also enjoy the news that DALL-E is now available in beta, a tutorial on ML pipelines with Azure Machine Learning, a guide to visualizing your embeddings, a paper on multi-game decision transformers, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
NVIDIA Canvas App Launches in Beta
The NVIDIA Canvas app, now available as a free beta, brings the real-time painting tool GauGAN to anyone with an NVIDIA RTX GPU.
New 20+ pipeline operators for BQML
Google announces the release of over twenty new BigQuery and BigQuery ML (BQML) operators for Vertex AI Pipelines.
DALL-E Now Available in Beta
OpenAI is inviting 1 million people from its waitlist, over the coming weeks, to create with DALL-E, its AI image-generation system.
Lunit, Maker of FDA-Cleared AI for Cancer Analysis, Goes Public
South Korean startup Lunit, developer of two FDA-cleared AI models for healthcare, went public this week on the country’s Kosdaq stock market.
Harvard Developed AI Identifies the Shortest Path to Human Happiness
Deep Longevity, in collaboration with Harvard Medical School, presents a deep learning approach that offers superior personalization and identifies the shortest path toward a cluster of mental stability for any individual.
MLOps
Deploying TensorFlow Vision Models in Hugging Face with TF Serving
In this post, you'll see how to deploy a Vision Transformer model (for image classification) locally using TensorFlow Serving.
This post takes a closer look at the motivation behind having an automated process to track experiments with Amazon SageMaker Experiments and the native capabilities built into Amazon SageMaker Pipelines.
ML pipelines with Python SDK v2 (preview) - Azure Machine Learning
This tutorial uses Azure Machine Learning to create a production-ready machine learning project with the AzureML Python SDK v2 (preview).
Learning
Rethinking Thinking: How Do Attention Mechanisms Actually Work?
A comprehensive article synthesizing the attention mechanisms present in the brain, mathematics, and deep learning.
An evolutionary guide and evaluation of the different techniques for visualizing embeddings.
Parallel Programming for Training and Productionization of ML/AI Systems
This article goes in depth on the fundamentals of parallel processing, how it applies to training ML/AI systems, and how it helps bring those systems into production.
A deep learning tutorial covering mixed precision training in detail, including the hardware required to take advantage of it and the benefits it offers.
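For readers new to the topic, here is a minimal sketch of one idea commonly associated with mixed precision training: loss scaling. It is not taken from the tutorial above. It uses only the Python standard library to round values to half precision, and the gradient value and `LOSS_SCALE` are hypothetical numbers chosen for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical values, chosen only to illustrate underflow.
LOSS_SCALE = 1024.0
grad = 1e-8  # a small gradient, below the smallest positive fp16 value

# Stored directly in fp16, the gradient underflows to zero.
naive = to_fp16(grad)  # 0.0

# Scaling before the fp16 round trip keeps the value representable;
# dividing afterwards (in higher precision) recovers an approximation.
scaled = to_fp16(grad * LOSS_SCALE)
recovered = scaled / LOSS_SCALE

print(naive, recovered)
```

In a real framework the scale is applied to the loss before backpropagation and removed before the optimizer step; the sketch only shows why the scaling matters numerically.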
Image Augmentation with Keras Preprocessing Layers and tf.image
In this post, you will discover how to use Keras preprocessing layers, as well as the tf.image module in TensorFlow, for image augmentation.
Libraries & Code
CogVideo - a Hugging Face Space by THUDM
A Hugging Face Space for the limited implementation of CogVideo, a text-to-video model.
facebookresearch/theseus: A library for differentiable nonlinear optimization
Theseus is an efficient application-agnostic library for building custom nonlinear optimization layers in PyTorch to support constructing various problems in robotics and vision as end-to-end differentiable architectures.
A multi-view dataset of multiple identities performing a sequence of facial expressions.
Papers & Publications
DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection
Abstract:
Modern neural networks use building blocks such as convolutions that are equivariant to arbitrary 2D translations. However, these vanilla blocks are not equivariant to arbitrary 3D translations in the projective manifold. Even then, all monocular 3D detectors use vanilla blocks to obtain the 3D coordinates, a task for which the vanilla blocks are not designed. This paper takes the first step towards convolutions equivariant to arbitrary 3D translations in the projective manifold. Since the depth is the hardest to estimate for monocular detection, this paper proposes Depth EquiVarIAnt NeTwork (DEVIANT) built with existing scale equivariant steerable blocks. As a result, DEVIANT is equivariant to the depth translations in the projective manifold whereas vanilla networks are not. The additional depth equivariance forces the DEVIANT to learn consistent depth estimates, and therefore, DEVIANT achieves state-of-the-art monocular 3D detection results on KITTI and Waymo datasets in the image-only category and performs competitively to methods using extra information. Moreover, DEVIANT works better than vanilla networks in cross-dataset evaluation.
Generative Multiplane Images: Making a 2D GAN 3D-Aware
Abstract:
What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. We find that only two modifications are absolutely necessary: 1) a multiplane image style generator branch which produces a set of alpha maps conditioned on their depth; 2) a pose-conditioned discriminator. We refer to the generated output as a 'generative multiplane image' (GMPI) and emphasize that its renderings are not only high-quality but also guaranteed to be view-consistent, which makes GMPIs different from many prior works. Importantly, the number of alpha maps can be dynamically adjusted and can differ between training and inference, alleviating memory concerns and enabling fast training of GMPIs in less than half a day at a resolution of 1024². Our findings are consistent across three challenging and common high-resolution datasets, including FFHQ, AFHQv2, and MetFaces.
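To give a feel for how a stack of alpha maps becomes an image, here is a minimal sketch of front-to-back "over" compositing along a single pixel ray, the standard way multiplane images are blended. It is not the authors' code: the function name `composite_mpi` and the per-plane color inputs are assumptions made for illustration, and the warping of planes into a target view is left out entirely.

```python
def composite_mpi(colors, alphas):
    """Blend planes along one pixel ray, nearest plane first.

    colors: list of (r, g, b) tuples, one per plane.
    alphas: list of opacities in [0, 1], one per plane.
    Returns the blended color and the leftover transmittance.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer planes
    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha
        out = [o + weight * c for o, c in zip(out, color)]
        transmittance *= 1.0 - alpha
    return tuple(out), transmittance


# A half-transparent red plane in front of an opaque blue plane.
pixel, remaining = composite_mpi(
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    alphas=[0.5, 1.0],
)
print(pixel, remaining)
```

Because each plane only contributes a weight, the same loop works for any number of planes, which is consistent with the abstract's point that the number of alpha maps can differ between training and inference.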
Multi-Game Decision Transformers
Abstract:
A longstanding goal of the field of AI is a strategy for compiling diverse experience into a highly capable, generalist agent. In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets. Motivated by this progress, we investigate whether the same strategy can be used to produce generalist reinforcement learning agents. Specifically, we show that a single transformer-based model - with a single set of weights - trained purely offline can play a suite of up to 46 Atari games simultaneously at close-to-human performance. When trained and evaluated appropriately, we find that the same trends observed in language and vision hold, including scaling of performance with model size and rapid adaptation to new games via fine-tuning. We compare several approaches in this multi-game setting, such as online and offline RL methods and behavioral cloning, and find that our Multi-Game Decision Transformer models offer the best scalability and performance. We release the pre-trained models and code to encourage further research in this direction.