Deep Learning Weekly: Issue #248
DeepMind's single visual language model called Flamingo, experiment tracking in KubeFlow pipelines, large model training using PyTorch FSDP, a paper on faster Text-to-Image Generation and more
This week in deep learning, we bring you DeepMind's single visual language model called Flamingo, experiment tracking in KubeFlow pipelines, large model training using PyTorch FSDP, and a paper on faster and better Text-to-Image Generation via Hierarchical Transformers.
You may also enjoy ETH Zurich's model that generated the first global vegetation height map, feature stores for real-time machine learning, deep learning in neuroimaging, a paper on dual networks, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Meta AI announced a long-term research initiative to better understand how the human brain processes language. In collaboration with Neurospin and INRIA, they are comparing how language models and the brain respond to the same sentences.
DeepMind introduces Flamingo, a single visual language model (VLM) that sets a new state of the art in few-shot learning on a wide range of open-ended multimodal tasks.
MIT and IBM Research have created a way to collect and inspect the explanations an AI gives for its decisions, thus allowing a quick analysis of its behavior.
MIT engineers have found a new way to model how waves break. The team used machine learning along with data from wave-tank experiments to tweak equations that have traditionally been used to predict wave behavior.
ETH Zurich’s EcoVision Lab researchers produced an interactive Global Canopy Height map. Using a new deep learning algorithm that processes publicly available satellite images, the study could help scientists identify areas of ecosystem degradation and deforestation.
An article introducing experiment-tracking tools, including TensorBoard, MLflow, and Neptune.ai, and describing how to use them with Kubeflow Pipelines, a popular framework that runs on Kubernetes.
An article highlighting how the choice of an online feature store and its architecture play important roles in determining how performant and cost-effective it is.
A repository for the two-part series of blog posts on the AWS Machine Learning Blog about secure multi-account deployment.
In this post, we look at best practices for deploying transformer models at scale on GPUs using Triton Inference Server on SageMaker.
Outerbounds released first-class support for Kubernetes as an alternative to AWS-native service integrations in Metaflow. Data scientists can scale out compute to Kubernetes clusters and schedule flows to be executed by Argo Workflows.
This three-part series dives into model interpretability, an important concern for both data scientists and stakeholders. Beyond understanding why interpretability matters, we also need to understand its different types and how to approach interpreting our models using different methods and techniques.
This article provides an informal introduction to unique aspects of neuroimaging data and how we can leverage these aspects with deep learning algorithms.
A technical post on using the Accelerate library to train large models with the latest features of PyTorch FullyShardedDataParallel (FSDP).
An article that discusses how to build transfer learning models in PyTorch with PyTorchCV.
Libraries & Code
TensorTrade is an open source Python framework for building, training, evaluating, and deploying robust trading algorithms using reinforcement learning.
A platform for building, training, and monitoring large scale deep learning applications.
Papers & Publications
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited due to the high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve the uncertainty property of a single network, based on a single, deterministic representation. By formalizing the uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning.
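To make change (1) from the abstract above more concrete, here is a minimal sketch of spectral normalization on a single weight matrix, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `norm_bound` parameter and its value of 0.95, and the choice of power iteration to estimate the largest singular value are all ours. The Gaussian process output layer from change (2) is not shown.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration.

    Assumes W is a non-empty, non-zero matrix (illustrative helper).
    """
    rng = random.Random(seed)
    Wt = [list(col) for col in zip(*W)]  # transpose of W
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    v_norm = l2_norm(v)
    v = [x / v_norm for x in v]
    for _ in range(n_iters):
        # Alternate u <- W v and v <- W^T u, renormalizing each time.
        u = matvec(W, v)
        u_norm = l2_norm(u)
        u = [x / u_norm for x in u]
        v = matvec(Wt, u)
        v_norm = l2_norm(v)
        v = [x / v_norm for x in v]
    return l2_norm(matvec(W, v))


def spectral_normalize(W, norm_bound=0.95, n_iters=50):
    """Rescale W only if its spectral norm exceeds norm_bound.

    norm_bound is a hypothetical hyperparameter name for this sketch.
    """
    sigma = spectral_norm(W, n_iters)
    if sigma <= norm_bound:
        return W
    scale = norm_bound / sigma
    return [[scale * w for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectral_normalize(W)
print(spectral_norm(W), spectral_norm(W_sn))
```

Bounding the largest singular value of each hidden layer limits how much that layer can stretch distances between inputs, which is one half of the bi-Lipschitz property the abstract refers to.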
We propose the integration of top-down and bottom-up approaches to exploit their strengths. Our top-down network estimates human joints from all persons instead of one in an image patch, making it robust to possible erroneous bounding boxes. Our bottom-up network incorporates human-detection based normalized heatmaps, allowing the network to be more robust in handling scale variations. Finally, the estimated 3D poses from the top-down and bottom-up networks are fed into our integration network for final 3D poses. To address the common gaps between training and testing data, we do optimization during the test time, by refining the estimated 3D human poses using high-order temporal constraint, re-projection loss, and bone length regularization. We also introduce a two-person pose discriminator that enforces natural two-person interactions. Finally, we apply a semi-supervised method to overcome the 3D ground-truth data scarcity.
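As a rough illustration of the bone length regularization mentioned in the abstract above, the sketch below penalizes variation in each bone's length across the frames of a pose sequence. The exact formulation used in the paper is not given here, so this variance-based penalty, the function names, and the toy skeleton are assumptions made for illustration only.

```python
import math


def bone_lengths(pose, bones):
    """Length of each bone in one frame.

    pose: list of (x, y, z) joint positions.
    bones: list of (parent_index, child_index) pairs (hypothetical skeleton).
    """
    return [math.dist(pose[a], pose[b]) for a, b in bones]


def bone_length_penalty(poses, bones):
    """Mean per-bone variance of length across frames.

    A rigid skeleton gives 0; bones that stretch or shrink over time
    give a positive penalty.
    """
    per_frame = [bone_lengths(pose, bones) for pose in poses]
    n_frames = len(per_frame)
    total = 0.0
    for k in range(len(bones)):
        lengths = [frame[k] for frame in per_frame]
        mean = sum(lengths) / n_frames
        total += sum((length - mean) ** 2 for length in lengths) / n_frames
    return total / len(bones)


# Toy two-joint skeleton with a single bone, observed over two frames.
bones = [(0, 1)]
rigid = [[(0, 0, 0), (1, 0, 0)], [(0, 1, 0), (1, 1, 0)]]
stretchy = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (3, 0, 0)]]
print(bone_length_penalty(rigid, bones), bone_length_penalty(stretchy, bones))
```

In a test-time refinement loop, a term like this would be added to the other objectives the abstract lists (the temporal constraint and re-projection loss) so that refined poses keep a physically consistent skeleton.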
The development of transformer-based text-to-image models is impeded by their slow generation and complexity for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel auto-regressive generation. We pre-train a 6B-parameter transformer with a simple and flexible self-supervised task, Cross-modal general language model (CogLM), and fine-tune it for fast super-resolution. The new text-to-image system, CogView2, shows very competitive generation compared to concurrent state-of-the-art DALL-E-2, and naturally supports interactive text-guided editing on images.