Deep Learning Weekly: Issue #284
Microsoft Azure OpenAI service, Federated Learning on AWS with FedML, Large Transformer Inference Optimizations, a paper on VALL-E, and many more.
Hey Folks,
This week in deep learning, we bring you Microsoft Azure OpenAI service, Federated Learning on AWS with FedML, Large Transformer Inference Optimizations, and a paper on VALL-E: Neural Codec Language Models are Zero-shot Text to Speech Synthesizers.
You may also enjoy forecasting potential misuses of language models, autoscaling NVIDIA Riva for speech AI in production, PyTorch internals, a paper on Tracr: Compiled Transformers as a Laboratory for Interpretability, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Microsoft Azure OpenAI service now generally available, with ChatGPT on the way
Microsoft announced the general availability of Azure OpenAI Service, which allows businesses to power their apps with large-scale AI models, including GPT-3.5, DALL-E 2, and Codex.
HPE acquires AI startup Pachyderm
Hewlett Packard Enterprise is acquiring Pachyderm, a startup with a software platform designed to speed up artificial intelligence projects by automating data workflows.
NVIDIA, Evozyne Create Generative AI Model for Proteins
Using NVIDIA’s BioNeMo, startup Evozyne created two proteins with significant potential in healthcare and clean energy.
Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk
OpenAI researchers collaborated with Georgetown University’s Center for Security and Emerging Technology and the Stanford Internet Observatory to investigate how large language models might be misused for disinformation purposes.
MLOps
Data-Driven AI in Manufacturing: Casting Defect Identification using Active Learning
An article that shows how to build an active learning pipeline for casting defect identification.
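The article's own pipeline isn't reproduced here; as a rough illustration of the core loop, below is a minimal pool-based uncertainty-sampling sketch in scikit-learn on synthetic data (all data and names are stand-ins, not the article's code).

```python
# Minimal pool-based active learning sketch (uncertainty sampling).
# Illustrative only; the linked article's pipeline and tooling may differ.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for casting-defect features/labels (hypothetical data).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

labeled = np.zeros(len(X), dtype=bool)
labeled[np.random.default_rng(0).choice(len(X), size=20, replace=False)] = True

model = LogisticRegression(max_iter=1000)
for round_ in range(10):
    model.fit(X[labeled], y[labeled])
    # Score the unlabeled pool by predictive uncertainty (entropy).
    pool_idx = np.flatnonzero(~labeled)
    proba = model.predict_proba(X[pool_idx])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    # "Query" the most uncertain samples, i.e. send them for annotation.
    query = pool_idx[np.argsort(entropy)[-10:]]
    labeled[query] = True  # in practice, a human annotator labels these
    print(f"round {round_}: {labeled.sum()} labeled samples")
```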
Bag of Tricks for Optimizing Machine Learning Training Pipelines
Ntropy shares some of their techniques for speeding up training, improving the ML engineering experience, and keeping costs under control.
Autoscaling NVIDIA Riva Deployment with Kubernetes for Speech AI in Production
This post walks you through, step by step, how to deploy Riva servers at scale, using Kubernetes for autoscaling and Traefik for load balancing.
Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 1
A post that covers how you can deploy the open-source FedML framework on AWS.
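As background for what FedML orchestrates, here is a toy federated averaging (FedAvg) loop in plain NumPy; this is only a conceptual sketch and not the FedML API, which the post covers in the context of AWS deployment.

```python
# Toy federated averaging (FedAvg) sketch in plain NumPy -- background for the
# idea behind FedML-style training; this is not the FedML API.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """A few steps of local linear-regression SGD on one client's private data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

# Three clients, each with private data that never leaves the client.
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    y = X @ w_true + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

w_global = np.zeros(3)
for round_ in range(20):
    # Each client trains locally; only weight updates are sent to the server.
    local_weights = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    # The server aggregates by (here, unweighted) averaging.
    w_global = np.mean(local_weights, axis=0)

print(w_global)  # approaches w_true without ever pooling the raw data
```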
Learning
Large Transformer Model Inference Optimization
A comprehensive and mathematical article that looks into several approaches for making transformer inference more efficient.
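One staple technique from this family is key/value caching during autoregressive decoding. The toy NumPy sketch below illustrates the idea for a single attention head; shapes and names are illustrative and not tied to the article or any particular library.

```python
# Toy single-head attention decode loop with a key/value cache -- one of the
# standard inference optimizations discussed in articles like this one.
import numpy as np

d = 16
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x_t):
    """Attend from the newest token over all cached keys/values.
    Without the cache, every step would recompute K and V for the full
    prefix, making per-token cost grow with sequence length."""
    q = x_t @ Wq
    k_cache.append(x_t @ Wk)
    v_cache.append(x_t @ Wv)
    K = np.stack(k_cache)          # (t, d)
    V = np.stack(v_cache)          # (t, d)
    attn = softmax(q @ K.T / np.sqrt(d))
    return attn @ V                # context vector for this step

for t in range(8):
    x_t = rng.normal(size=d)       # stand-in for the current token embedding
    out = decode_step(x_t)
print(out.shape)  # (16,)
```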
PyTorch Internals
An in-depth article that goes through the underlying code of PyTorch.
Build a GitHub support bot with GPT3, LangChain, and Python
An article that shows how to build a GitHub support bot using Dagster, GPT3, and LangChain.
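The general pattern behind such bots is retrieve-then-prompt: find the most relevant issues or docs, then hand them to the model as context. The sketch below uses TF-IDF as a stand-in for embeddings and a hypothetical `call_llm` stub; the linked article builds the real thing with GPT-3, LangChain, and Dagster.

```python
# Minimal retrieve-then-prompt sketch of the pattern such support bots use.
# TF-IDF stands in for embeddings, and `call_llm` is a hypothetical stub.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "How to configure retries for a failed op in Dagster.",
    "Installing the project with pip and setting environment variables.",
    "Scheduling a job to run every night at midnight.",
]

question = "How do I make my job retry when an op fails?"

# Rank documents by similarity to the question and keep the top two.
vec = TfidfVectorizer().fit(docs + [question])
doc_m, q_m = vec.transform(docs), vec.transform([question])
top = cosine_similarity(q_m, doc_m)[0].argsort()[::-1][:2]
context = "\n".join(docs[i] for i in top)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder; in practice this would call a completion API.
    return "(model answer goes here)"

print(call_llm(prompt))
```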
Libraries & Code
TPOT
TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
AutoGluon
AutoGluon automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications.
Papers & Publications
VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Abstract:
We introduce a language modeling approach for text-to-speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech, hundreds of times larger than existing systems. VALL-E exhibits in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experimental results show that VALL-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity. In addition, we find that VALL-E can preserve the speaker's emotion and the acoustic environment of the acoustic prompt in synthesis.
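A rough sketch of the framing described above (TTS as conditional language modeling over discrete codec tokens) is shown below; the `next_token_distribution` stub is hypothetical and stands in for the paper's transformer over neural-codec codes, and the code rate is an assumed value.

```python
# Sketch of VALL-E's framing of TTS as conditional language modeling over
# discrete audio-codec tokens, as described in the abstract. The model here
# is a hypothetical stub, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)
CODEBOOK_SIZE = 1024

phonemes = [12, 7, 33, 5, 19]                               # toy phoneme IDs (text condition)
enrolled_prompt = rng.integers(0, CODEBOOK_SIZE, size=225)  # ~3 s at an assumed 75 codes/s

def next_token_distribution(condition, generated):
    """Stand-in for p(c_t | phonemes, acoustic prompt, c_<t)."""
    logits = rng.normal(size=CODEBOOK_SIZE)   # a real model would predict these
    e = np.exp(logits - logits.max())
    return e / e.sum()

generated = []
for _ in range(75):                           # ~1 s of new speech codes
    p = next_token_distribution((phonemes, enrolled_prompt), generated)
    generated.append(int(rng.choice(CODEBOOK_SIZE, p=p)))

# `generated` would then be decoded back to a waveform by the neural codec.
print(generated[:10])
```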
Tracr: Compiled Transformers as a Laboratory for Interpretability
Abstract:
Interpretability research aims to build tools for understanding machine learning (ML) models. However, such tools are inherently hard to evaluate because we do not have ground truth information about how ML models actually work. In this work, we propose to build transformer models manually as a testbed for interpretability research. We introduce Tracr, a "compiler" for translating human-readable programs into weights of a transformer model. Tracr takes code written in RASP, a domain-specific language (Weiss et al. 2021), and translates it into weights for a standard, decoder-only, GPT-like transformer architecture. We use Tracr to create a range of ground truth transformers that implement programs including computing token frequencies, sorting, and Dyck-n parenthesis checking, among others.
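To make the idea concrete, the NumPy sketch below hand-builds the kind of computation Tracr compiles into weights: counting token occurrences with a hard "select equal tokens, then aggregate" pattern. This is a conceptual illustration of RASP-style operations, not the Tracr API.

```python
# Hand-built illustration of a "ground truth" computation of the kind Tracr
# compiles into transformer weights: per-position token frequency via one hard
# attention pattern. Conceptual NumPy sketch only, not the Tracr API.
import numpy as np

tokens = list("abcabca")
n = len(tokens)

# Selector: position i attends to every position j holding the same token.
select_eq = np.array([[1.0 if tokens[j] == tokens[i] else 0.0
                       for j in range(n)] for i in range(n)])

# Selector width (how many positions were selected) = count of that token so far seen anywhere.
counts = select_eq.sum(axis=1)

# Dividing by sequence length gives the token frequency each position "knows".
frequencies = counts / n
for tok, c, f in zip(tokens, counts, frequencies):
    print(tok, int(c), round(float(f), 3))
```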
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
Abstract:
We identify and overcome two key obstacles in extending the success of BERT-style pre-training, or masked image modeling, to convolutional networks (convnets): (i) the convolution operation cannot handle irregular, randomly masked input images; (ii) the single-scale nature of BERT pre-training is inconsistent with the hierarchical structure of convnets. For (i), we treat unmasked pixels as sparse voxels of 3D point clouds and use sparse convolution to encode them. This is the first use of sparse convolution for 2D masked modeling. For (ii), we develop a hierarchical decoder to reconstruct images from multi-scale encoded features. Our method, called Sparse masKed modeling (SparK), is general: it can be used directly on any convolutional model without backbone modifications. We validate it on both classical (ResNet) and modern (ConvNeXt) models: on three downstream tasks, it surpasses both state-of-the-art contrastive learning and transformer-based masked modeling by similarly large margins (around +1.0%). Improvements on object detection and instance segmentation are more substantial (up to +3.5%), verifying the strong transferability of the learned features. We also observe favorable scaling behavior, with larger gains on larger models.
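A toy sketch of the masking step described in (i) follows: random patches are dropped and only the unmasked pixels are kept as a sparse set of coordinates and values, which SparK then encodes with sparse convolution. The patch size and mask ratio below are illustrative choices, not the paper's settings.

```python
# Toy sketch of the masking step from the abstract: drop random patches and
# keep only unmasked pixels, which SparK encodes with *sparse* convolution
# (a dense conv would see the mask holes as zeros and distort statistics).
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(224, 224, 3))
patch, mask_ratio = 32, 0.6                      # illustrative values

n = 224 // patch
keep = rng.random((n, n)) >= mask_ratio          # True = patch stays visible
mask = np.repeat(np.repeat(keep, patch, axis=0), patch, axis=1)  # pixel-level mask

masked = np.where(mask[..., None], image, 0.0)   # the "holes" a dense conv would smear
visible_coords = np.argwhere(mask)               # sparse "voxel" view of the image
visible_values = image[mask]                     # values at the visible coordinates

print(keep.mean(), visible_coords.shape, visible_values.shape)
```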