Deep Learning Weekly: Issue #259
Meta's Make-A-Scene, AI Infrastructure Ecosystem Report of 2022, dynamic adversarial data collection, a paper on efficient representation learning via adaptive context pooling, and many more
This week in deep learning, we bring you Meta's multimodal generative method with higher creative control called Make-A-Scene, AI Infrastructure Ecosystem Report of 2022, dynamic adversarial data collection, and a paper on efficient representation learning via adaptive context pooling.
You may also enjoy Google at ICML 2022, an MLOps pipeline tutorial for Binance time-series prediction, the technology behind BLOOM training, a paper on object tracking unification, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Greater creative control for AI image generation
Meta AI showcases Make-A-Scene, a multimodal generative method with greater creative control that demonstrates AI's potential to help anyone bring their imagination to life.
Google at ICML 2022
Google has a strong presence at this year's conference, with over 100 accepted publications and active involvement in a number of workshops and tutorials.
Reducing Bias and Improving Safety in DALL·E 2
OpenAI implemented a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population.
Microsoft launches simulator to train drone AI systems
Microsoft has launched a platform to train the artificial intelligence systems of autonomous aircraft.
Building MLOps Pipeline for Time Series Prediction
A comprehensive tutorial on a Binance time-series-based machine learning project and its MLOps pipeline.
Achieve enterprise-grade monitoring for your Amazon SageMaker models using Fiddler
A post showing how your MLOps team can improve productivity and reduce time to detect issues for your SageMaker models by integrating with the Fiddler Model Performance Management Platform.
AI Infrastructure Ecosystem Report of 2022
The AI Infrastructure Alliance's first annual AI Infrastructure Ecosystem report, highlighting stack maturity questions and related findings from across the ecosystem.
Feature Stores for MLOps Dummies
An article describing feature stores, and the top frameworks to use for deploying them.
Case Study: PathAI Uses PyTorch to Improve Patient Outcomes with AI-powered Pathology
A case study on the leading provider of AI-powered technology tools and services for pathology, and how they leverage image segmentation, graph neural networks, and multiple instance learning.
10 Open Source Machine Learning Libraries
A list of various machine learning libraries and how they’ve changed the machine learning landscape.
Fast Transformer Inference With Better Transformer
A technical tutorial showing how to use Better Transformer for production inference with torchtext.
How to train your model dynamically using adversarial data
A blog describing dynamic adversarial data collection, along with a basic code example.
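As a rough illustration of the idea (hypothetical toy data and classifier, not the blog's code), dynamic adversarial data collection alternates between harvesting examples that fool the current model and retraining on them:

```python
# Toy sketch of dynamic adversarial data collection: repeatedly collect
# examples the current model gets wrong, add them to the training set,
# and retrain. The "model" is a deliberately simple bag-of-words rule.

def train(dataset):
    """'Train' by remembering words that appear in positive but not negative texts."""
    pos, neg = set(), set()
    for text, label in dataset:
        (pos if label == 1 else neg).update(text.split())
    return pos - neg

def predict(model, text):
    return 1 if any(word in model for word in text.split()) else 0

# Seed data, plus a pool an annotator draws adversarial candidates from.
train_set = [("great movie", 1), ("boring plot", 0)]
candidate_pool = [("truly wonderful film", 1), ("what a great mess", 0),
                  ("awful acting", 0)]

for _ in range(3):                           # adversarial collection rounds
    model = train(train_set)
    # Keep only candidates the current model misclassifies ("model fooled").
    adversarial = [(t, y) for t, y in candidate_pool if predict(model, t) != y]
    if not adversarial:                      # model no longer fooled -> stop
        break
    train_set.extend(adversarial)            # fold them back into training

model = train(train_set)
```

After one round of adversarial collection, the retrained model stops keying on the spurious word "great" and handles both previously fooling examples.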
Why do Policy Gradient Methods work so well in Cooperative MARL?
BAIR presents a concrete analysis showing that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition can be problematic; by contrast, policy gradient methods with individual policies can converge to an optimal policy.
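To see why a multi-modal reward landscape defeats value decomposition, consider a hypothetical two-agent matrix game (an illustration, not an example from the BAIR post) where the team is rewarded only when the agents pick different actions. No VDN-style additive decomposition Q(a1, a2) ≈ q1[a1] + q2[a2] can represent this payoff:

```python
import numpy as np

# Hypothetical 2-agent cooperative game: reward 1 only when the agents
# choose different actions -> a multi-modal payoff with two optima,
# the joint actions (0, 1) and (1, 0).
payoff = np.array([[0.0, 1.0],
                   [1.0, 0.0]])   # payoff[a1, a2]

# Best additive fit q1[a1] + q2[a2]: one row per joint action,
# unknowns are [q1_0, q1_1, q2_0, q2_1].
A = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1]], dtype=float)
b = payoff.reshape(-1)
q, *_ = np.linalg.lstsq(A, b, rcond=None)

fitted = (A @ q).reshape(2, 2)
# The best additive fit is flat (every entry 0.5): the decomposed values
# cannot separate the two optimal joint actions from the two worst ones,
# so greedy per-agent action selection has no signal to coordinate on.
```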
The Technology Behind BLOOM Training
An article that sheds some light on the technology and engineering behind the training, both in terms of hardware and software, of the 176B parameter language model called BLOOM.
Libraries & Code
An open source no-code system for text annotation and building text classifiers.
TensorFlow Lattice is a library that implements constrained and interpretable lattice-based models. It is an implementation of Monotonic Calibrated Interpolated Look-Up Tables in TensorFlow.
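The core idea behind a monotonic calibrated look-up table can be sketched without the library (a conceptual sketch, not TensorFlow Lattice's API): store function values at a few keypoints, require them to be non-decreasing, and linearly interpolate between them:

```python
from bisect import bisect_right

def pwl_calibrate(x, keypoints, values):
    """Piecewise-linear calibration: interpolate `values` at sorted `keypoints`.
    With non-decreasing `values`, the calibrator is monotonic by construction."""
    assert all(a <= b for a, b in zip(values, values[1:])), "values must be non-decreasing"
    if x <= keypoints[0]:                    # clamp below the first keypoint
        return values[0]
    if x >= keypoints[-1]:                   # clamp above the last keypoint
        return values[-1]
    i = bisect_right(keypoints, x) - 1       # segment containing x
    t = (x - keypoints[i]) / (keypoints[i + 1] - keypoints[i])
    return values[i] + t * (values[i + 1] - values[i])

# Hypothetical calibrator mapping a raw score in [0, 10] to [0, 1].
kps, vals = [0.0, 5.0, 10.0], [0.0, 0.8, 1.0]
```

In TensorFlow Lattice the keypoint outputs are learned parameters with the monotonicity constraint enforced during training; here they are fixed by hand to show the interpolation.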
Papers & Publications
Towards Grand Unification of Object Tracking
We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. Due to the fragmented definitions of the object tracking problem itself, most existing trackers are developed to address a single task or a subset of tasks, and overspecialize on the characteristics of specific tasks. By contrast, Unicorn provides a unified solution, adopting the same input, backbone, embedding, and head across all tracking tasks. For the first time, we accomplish the grand unification of the tracking network architecture and learning paradigm. Unicorn performs on par with or better than its task-specific counterparts on eight tracking datasets, including LaSOT, TrackingNet, MOT17, BDD100K, DAVIS16-17, MOTS20, and BDD100K MOTS. We believe that Unicorn will serve as a solid step towards the general vision model.
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact, thus sustained, long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets.
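The consolidation idea can be illustrated with a toy sketch (a hypothetical data structure, not XMem's actual algorithm): keep a bounded working memory with usage counts, and when it overflows, promote frequently read entries to a long-term store instead of letting memory grow with video length:

```python
from collections import OrderedDict

class ToyMemory:
    """Toy two-store memory loosely inspired by XMem's working/long-term split
    (hypothetical sketch, not the paper's implementation)."""

    def __init__(self, capacity=4, usage_threshold=2):
        self.working = OrderedDict()     # key -> (feature, usage count)
        self.long_term = {}              # consolidated, compact store
        self.capacity = capacity
        self.usage_threshold = usage_threshold

    def read(self, key):
        if key in self.working:
            feat, uses = self.working[key]
            self.working[key] = (feat, uses + 1)   # track how often it's used
            return feat
        return self.long_term.get(key)

    def write(self, key, feature):
        self.working[key] = (feature, 0)
        if len(self.working) > self.capacity:      # bounded -> no memory explosion
            self.consolidate()

    def consolidate(self):
        """Promote well-used entries to long-term memory, then evict the oldest."""
        for key, (feat, uses) in list(self.working.items()):
            if uses >= self.usage_threshold:
                self.long_term[key] = feat
        oldest = next(iter(self.working))
        del self.working[oldest]

mem = ToyMemory(capacity=2, usage_threshold=2)
mem.write("f0", "feat0")
mem.read("f0"); mem.read("f0")   # f0 is actively used
mem.write("f1", "feat1")
mem.write("f2", "feat2")         # overflow -> f0 consolidated, then evicted
```

After the overflow, "f0" is gone from the working memory but still readable from the long-term store, which is the behavior the potentiation algorithm is designed to preserve at scale.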
Efficient Representation Learning via Adaptive Context Pooling
Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer. The pooling weights and support size are adaptively determined, allowing the pooled features to encode meaningful context with varying scale. We show that ContextPool makes attention models more expressive, achieving strong performance often with fewer layers and thus significantly reduced cost. Experiments validate that our ContextPool module, when plugged into transformer models, matches or surpasses state-of-the-art performance using less compute on several language and image benchmarks, outperforms recent works with learned context sizes or sparse attention patterns, and is also applicable to ConvNets for efficient feature learning.
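A minimal numpy sketch of the idea (a simplification: the paper learns per-token pooling weights and support sizes, here both are fixed and uniform): pool each token with its neighbors before computing standard dot-product attention:

```python
import numpy as np

def pool_neighbors(X, support=1):
    """Average each token's feature with its `support` neighbors on either side.
    Fixed uniform pooling; ContextPool learns per-token weights and support."""
    n, _ = X.shape
    pooled = np.empty_like(X)
    for i in range(n):
        lo, hi = max(0, i - support), min(n, i + support + 1)
        pooled[i] = X[lo:hi].mean(axis=0)
    return pooled

def attention(Q, K, V):
    """Standard scaled dot-product attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))          # 6 tokens, 8-dim features
Xp = pool_neighbors(X, support=1)    # pool local context first...
out = attention(Xp, Xp, Xp)          # ...then attend over pooled features
```

Because each attended feature now summarizes a neighborhood rather than a single token, attention operates at a coarser, content-adaptive granularity, which is what lets the full model match performance with fewer layers.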