Deep Learning Weekly: Issue #234
Meta's AI Research SuperCluster, end-to-end feature stores using SageMaker, Macaw: a general QA model that challenges GPT-3, a paper on Language Models for Dialog Applications, and more
This week in deep learning, we bring you Meta's AI Research SuperCluster, a repository on end-to-end feature stores using SageMaker, Macaw: a general question answering model that challenges GPT-3 in terms of size and performance, and a paper on Language Models for Dialog Applications.
You may also enjoy NVIDIA's hybrid unsupervised neural rendering pipeline for 3D artists, a technical series on cloud-agnostic MLOps, a blog on supercharged searching on the Hugging Face Hub, a paper on Omnivore models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Meta announced the AI Research SuperCluster (RSC), which it believes is among the fastest AI supercomputers running today and expects to be the fastest in the world when fully built out in mid-2022.
Scientists at NVIDIA and Cornell University introduced a hybrid unsupervised neural rendering pipeline to represent large and complex scenes efficiently in voxel worlds. 3D artists need to build only the bare minimum, and the algorithm fills in the rest to produce a realistic world.
Scientists demonstrate that AI-based risk models, paired with AI-designed screening policies, can offer significant and equitable improvements to cancer screening.
An AI system known as DABUS (Device for the Autonomous Bootstrapping of Unified Sentience) was named one of the creators of a food container.
Amazon announced its first Amazon Style store, which is a new physical store concept infused with technology ranging from QR codes to artificial intelligence.
A company called Canary Speech is using deep learning to analyze short voice samples for signs of Alzheimer’s and other conditions.
A repository of different modules and notebooks on how to create a feature store using Amazon SageMaker.
A practical blog that takes the reader through the entire journey of MLOps, its containerized architecture, and the cloud-agnostic process flow.
A technical blog post on how data practitioners can run tests using assert, pandera, pytest, and hypothesis.
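As a flavor of the kinds of checks the post covers, here is a minimal, dependency-free sketch using plain assert statements (the post itself pairs these with pandera schemas, pytest, and hypothesis property tests; the records and rules below are hypothetical):

```python
# Minimal data-validation checks using plain assert statements.
# The records and validation rules here are made-up examples.

records = [
    {"user_id": 1, "age": 34, "country": "US"},
    {"user_id": 2, "age": 27, "country": "DE"},
]

def validate(rows):
    seen_ids = set()
    for row in rows:
        # Schema check: every expected column is present.
        assert {"user_id", "age", "country"} <= row.keys(), "missing column"
        # Range check: ages must be plausible.
        assert 0 <= row["age"] <= 120, f"bad age: {row['age']}"
        # Uniqueness check: the primary key must not repeat.
        assert row["user_id"] not in seen_ids, "duplicate user_id"
        seen_ids.add(row["user_id"])
    return True

validate(records)
```

Libraries like pandera express the same column and range constraints declaratively as a schema, while hypothesis generates adversarial inputs to probe checks like these automatically.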
A six-part SageMaker series on how to build the architectural components underpinning each of the machine learning (ML) life-cycle phases using a detailed music recommendation example.
An article discussing what bottlenecks are typically observed with recommender workloads in practice, and how they can be identified and alleviated using TensorFlow 2.7’s GPU implementations.
An article highlighting the new features added to huggingface_hub, which provides users with a friendly API to search for the models and datasets they want to use without leaving their Jupyter or Python interfaces.
A documented experiment that focuses on training with a custom PyTorch Perceptual Feature Loss, converting to ONNX, and experimenting with model architectures using PyTorch Lightning.
A short blog (with code) on classification-by-retrieval, which provides an easy way to create a classifier without computationally expensive training via backpropagation.
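The core idea of classification-by-retrieval can be sketched in plain Python: a classifier is just a labeled index of embeddings plus a nearest-neighbor lookup, with no gradient-based training. The toy 3-d vectors below are hypothetical stand-ins for real embedding-model outputs:

```python
import math

# Classification-by-retrieval sketch: classify a query by retrieving
# the most similar labeled embedding and returning its label.
# The 3-d "embeddings" below stand in for a real embedding model.

index = [
    ([0.9, 0.1, 0.0], "cat"),
    ([0.8, 0.2, 0.1], "cat"),
    ([0.0, 0.9, 0.4], "dog"),
    ([0.1, 0.8, 0.5], "dog"),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(query):
    # Nearest-neighbor lookup over the labeled index.
    return max(index, key=lambda item: cosine(query, item[0]))[1]

print(classify([0.85, 0.15, 0.05]))  # nearest neighbors are "cat" examples
```

Adding a new class is just adding labeled embeddings to the index, which is why no retraining via backpropagation is needed.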
Libraries & Code
Macaw is a ready-to-use model capable of general question answering, showing robustness outside the domains it was trained on. It challenges GPT-3's performance despite being an order of magnitude smaller.
An integrated large-scale model training system with efficient parallelization techniques.
Papers & Publications
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvement on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.
Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data. Instead, in this paper, we propose a single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters. Our 'Omnivore' model leverages the flexibility of transformer-based architectures and is trained jointly on classification tasks from different modalities. Omnivore is simple to train, uses off-the-shelf standard datasets, and performs at par or better than modality-specific models of the same size. A single Omnivore model obtains 86.0% on ImageNet, 84.1% on Kinetics, and 67.1% on SUN RGB-D. After finetuning, our models outperform prior work on a variety of vision tasks and generalize across modalities. Omnivore's shared visual representation naturally enables cross-modal recognition without access to correspondences between modalities. We hope our results motivate researchers to model visual modalities together.