Deep Learning Weekly: Issue #273
Meta AI's AI-powered audio hypercompression, running multiple models with SageMaker's multi-model endpoints, Airbnb's Listing Embedding technique in Search Ranking, and more.
Hey Folks,
This week in deep learning, we bring you Meta AI's AI-powered audio hypercompression called Encodec, running multiple models on GPU with SageMaker's multi-model endpoints, Airbnb's Listing Embedding technique in Search Ranking, and a paper on a transformer that solves small tabular classification problems in a second.
You may also enjoy MIT's optics method for accelerating deep learning computations, the unreasonable effectiveness of data pipeline smoke tests, meta analysis and optimal stopping at Netflix, a paper on Poisson flow generative models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
A new method developed by MIT researchers uses optics to accelerate machine learning computations on smart speakers and other low-power connected devices.
Using AI to compress audio files for quick and easy sharing
Meta AI details progress that their Fundamental AI Research (FAIR) team has made in the area of AI-powered hypercompression of audio.
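For readers who want to try it, Meta has also open-sourced an encodec Python package alongside the post. The snippet below is a minimal sketch of compressing and reconstructing a waveform with the pretrained 24 kHz model, based on the package's published interface (model constructor, set_target_bandwidth, encode/decode); treat the exact calls as an assumption that may change across versions.

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz Encodec model and pick a target bitrate (kbps).
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

# Read an audio file and resample/remix it to the model's expected format.
wav, sr = torchaudio.load("speech.wav")  # placeholder path
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)  # add batch dimension: [B, C, T]

# Encode to discrete codes (the compressed representation), then reconstruct.
with torch.no_grad():
    frames = model.encode(wav)
    reconstruction = model.decode(frames)
```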
OpenAI reportedly in advanced talks to raise funding from Microsoft
OpenAI Inc. is reportedly seeking to raise a funding round from Microsoft Corp., which invested $1 billion in the artificial intelligence research group three years ago.
Federated Learning from Simulation to Production with NVIDIA FLARE
The release of NVIDIA’s FLARE 2.2, an open-source platform and SDK for Federated Learning, brings numerous updates that simplify the research and development workflow for researchers and data scientists.
Introducing Project Wisdom for Red Hat Ansible
IBM Research has built on its "AI for Code" effort to create Project Wisdom for Red Hat Ansible, making it easier to build automations for the hybrid cloud using plain English.
MLOps
The Unreasonable Effectiveness of Data Pipeline Smoke Tests
A post that covers a powerful technique for speeding up data pipeline development: the data pipeline smoke test.
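The idea is simply to run the entire pipeline end to end on a tiny, fast sample of data before committing to a full run, so wiring and schema errors surface in seconds. Below is a hypothetical sketch of such a test with pytest; the run_pipeline function and its columns are placeholders, not code from the post.

```python
import pandas as pd

def run_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the real pipeline: clean, transform, and aggregate."""
    df = df.dropna(subset=["user_id"])
    df["amount_usd"] = df["amount"].round(2)
    return df.groupby("user_id", as_index=False)["amount_usd"].sum()

def test_pipeline_smoke():
    """Run the whole pipeline on a few rows to catch wiring/schema errors fast."""
    sample = pd.DataFrame({"user_id": [1, 2, 2], "amount": [9.99, 0.0, 120.5]})
    out = run_pipeline(sample)

    # Sanity checks only: the goal is that every stage executed, not full correctness.
    assert not out.empty
    assert {"user_id", "amount_usd"} <= set(out.columns)
```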
Logging Recommendation System Visualizations in Comet
In this tutorial, learn about recommendation systems, build a system for recommending movies, and then log its visualizations in Comet.
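As a rough illustration of the logging step (not the tutorial's exact code), here is how a recommendation plot and metric might be sent to Comet with the comet_ml SDK; the project name, figure, and metric are placeholders.

```python
import matplotlib.pyplot as plt
from comet_ml import Experiment

# Placeholder project name; the Comet API key is read from the environment.
experiment = Experiment(project_name="movie-recommender")

# Example visualization: predicted rating distribution for recommended movies.
fig, ax = plt.subplots()
ax.hist([4.2, 3.8, 4.9, 4.5, 3.1], bins=5)
ax.set_title("Predicted ratings of top recommendations")

experiment.log_figure(figure_name="rating_distribution", figure=fig)
experiment.log_metric("precision_at_10", 0.42)
experiment.end()
```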
Modular Data Stack — Build a Data Platform with Prefect, dbt and Snowflake
A tutorial on how to solve common data platform challenges using Prefect, dbt, and Snowflake. In this post, you’ll learn about the desired outcome and how it can be implemented.
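To give a flavor of the orchestration layer, here is a simplified placeholder (not the post's implementation, which uses Prefect's dbt integration) in which a Prefect flow wraps dbt CLI commands as ordinary tasks, with the Snowflake connection coming from dbt's own profiles.

```python
import subprocess
from prefect import flow, task

@task(retries=2)
def dbt(command: str) -> None:
    """Run a dbt CLI command; connection details live in dbt profiles."""
    subprocess.run(["dbt", *command.split()], check=True)

@flow
def transform_warehouse():
    # Load seeds, build models, then test them - each step retried on failure.
    dbt("seed")
    dbt("run")
    dbt("test")

if __name__ == "__main__":
    transform_warehouse()
```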
Run multiple deep learning models on GPU with Amazon SageMaker multi-model endpoints
An article that covers how to run multiple deep learning models on GPU with SageMaker MMEs.
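For context, the key property of a multi-model endpoint is that one endpoint hosts many model artifacts and each request picks its model via TargetModel. The snippet below is a minimal sketch of invoking such an endpoint with boto3; the endpoint name, artifact name, and payload format are placeholders.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {"inputs": [[0.1, 0.7, 0.2]]}  # model-specific request format

# The same endpoint serves many models; TargetModel selects which artifact
# (an object under the endpoint's S3 model prefix) handles this request.
response = runtime.invoke_endpoint(
    EndpointName="my-gpu-mme-endpoint",   # placeholder
    TargetModel="resnet50_v1.tar.gz",     # placeholder artifact name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```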
Learning
Guide to Activation Functions in Neural Networks
This article discusses why activation functions are required in neural networks, surveys the common types, and explains how to pick the best one for your model.
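As a quick reference alongside the article, here is how the most common activation functions look in PyTorch (a generic illustration, not code from the article):

```python
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, steps=7)

activations = {
    "ReLU": nn.ReLU(),               # default choice for most hidden layers
    "LeakyReLU": nn.LeakyReLU(0.1),  # avoids dead neurons for negative inputs
    "Sigmoid": nn.Sigmoid(),         # squashes to (0, 1); used for binary outputs
    "Tanh": nn.Tanh(),               # zero-centered squashing to (-1, 1)
    "GELU": nn.GELU(),               # smooth ReLU variant common in transformers
}

for name, fn in activations.items():
    print(f"{name:10s}", fn(x))
```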
Listing Embeddings in Search Ranking
In this blog post, Airbnb describes a Listing Embedding technique they developed and deployed for the purpose of improving Similar Listing Recommendations and Real-Time Personalization in Search Ranking.
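The core idea is to treat a user's click session as a "sentence" of listing IDs and train skip-gram embeddings on those sessions. The sketch below illustrates that idea with gensim's Word2Vec on toy sessions; it is a simplification that omits Airbnb's modifications (booked-listing global context, adapted negative sampling, and so on).

```python
from gensim.models import Word2Vec

# Toy click sessions: each session is an ordered list of listing IDs.
sessions = [
    ["L12", "L7", "L7", "L88", "L3"],
    ["L88", "L3", "L41"],
    ["L7", "L12", "L88"],
]

# Skip-gram (sg=1) embeddings, as in the word2vec formulation Airbnb adapts.
model = Word2Vec(
    sentences=sessions,
    vector_size=32,   # embedding dimension (Airbnb uses larger vectors)
    window=5,
    sg=1,
    min_count=1,
    negative=5,
)

# Listings that co-occur in sessions end up close in the embedding space.
print(model.wv.most_similar("L88", topn=3))
```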
Improving Experimentation Efficiency at Netflix with Meta Analysis and Optimal Stopping
Netflix shares how they accelerate their experimentation workflow and product innovation using meta analysis and optimal stopping.
New number formats and basic computations emerge to speed up AI training.
Evaluating Language Model Bias with 🤗 Evaluate
In this blog post, Hugging Face presents a few examples of the new additions to the Evaluate library and how to use them, with a focus on evaluating causal language models.
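One of the additions highlighted in the post is a toxicity measurement. Below is a minimal sketch of using it; the model-generated continuations are mocked as plain strings here, and the exact load arguments are taken from the library's documented measurement interface.

```python
import evaluate

# Load the toxicity measurement added to the Evaluate library.
toxicity = evaluate.load("toxicity", module_type="measurement")

# In practice these would be completions sampled from a causal language model.
generations = [
    "The new neighbors seem friendly and helpful.",
    "Everyone from that city is awful.",
]

results = toxicity.compute(predictions=generations)
print(results["toxicity"])  # one toxicity score per generation
```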
Libraries & Code
timeseriesAI/tsai: Timeseries Deep Learning
An open-source deep learning package built on top of PyTorch & fastai, focused on state-of-the-art techniques for time series tasks such as classification, regression, and forecasting.
An open source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats.
jankrepl/deepdow: Portfolio optimization with deep learning.
A Python package connecting portfolio optimization and deep learning. Its goal is to facilitate research of networks that perform weight allocation in one forward pass.
Papers & Publications
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
Abstract:
We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. TabPFN is fully entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. This prior incorporates ideas from causal reasoning: It entails a large space of structural causal models with a preference for simple structures. On 30 datasets from the OpenML-CC18 suite, we show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with up to 70× speedup. This increases to a 3200× speedup when a GPU is available.
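The authors also released a tabpfn package with a scikit-learn style interface. Assuming that interface (the class name and constructor arguments below follow the project README and may differ across versions), usage looks roughly like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "fit" step just stores the data; prediction is a single forward pass
# over the (train, test) set - no gradient-based training on this dataset.
clf = TabPFNClassifier(device="cpu", N_ensemble_configurations=32)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, preds))
```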
Poisson Flow Generative Models
Abstract:
We propose a new "Poisson flow" generative model (PFGM) that maps a uniform distribution on a high-dimensional hemisphere into any data distribution. We interpret the data points as electrical charges on the z=0 hyperplane in a space augmented with an additional dimension z, generating a high-dimensional electric field (the gradient of the solution to Poisson equation). We prove that if these charges flow upward along electric field lines, their initial distribution in the z=0 plane transforms into a distribution on the hemisphere of radius r that becomes uniform in the r→∞ limit. To learn the bijective transformation, we estimate the normalized field in the augmented space. For sampling, we devise a backward ODE that is anchored by the physically meaningful additional dimension: the samples hit the unaugmented data manifold when the z reaches zero. Experimentally, PFGM achieves current state-of-the-art performance among the normalizing flow models on CIFAR-10, with an Inception score of 9.68 and a FID score of 2.35. It also performs on par with the state-of-the-art SDE approaches while offering 10× to 20× acceleration on image generation tasks. Additionally, PFGM appears more tolerant of estimation errors on a weaker network architecture and robust to the step size in the Euler method.
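To make the construction slightly more concrete, here is a compact, informal restatement of the field and sampling ODE described in the abstract, with data in R^D augmented by the extra coordinate z; notation and normalization are simplified relative to the paper.

```latex
% Augment each data point y in R^D to \tilde{y} = (y, 0) in R^{D+1}.
% The (unnormalized) Poisson field generated by the data distribution p(y) is
E(\tilde{x}) \;\propto\; \int \frac{\tilde{x} - \tilde{y}}{\lVert \tilde{x} - \tilde{y} \rVert^{\,D+1}} \, p(y)\, \mathrm{d}y .

% Sampling starts from the large hemisphere (z large) and follows the field
% backward; writing E = (E_x, E_z), the backward ODE anchored by z is
\frac{\mathrm{d}x}{\mathrm{d}z} \;=\; \frac{E_x(\tilde{x})}{E_z(\tilde{x})},
\qquad \text{integrated from } z = z_{\max} \text{ down to } z = 0 .
```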
Abstract:
Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation on methods for sharing parameters in isotropic networks (SPIN). We present a framework to formalize major weight sharing design decisions and perform a comprehensive empirical evaluation of this design space. Guided by our experimental results, we propose a weight sharing strategy to generate a family of models with better overall efficiency, in terms of FLOPs and parameters versus accuracy, compared to traditional scaling methods alone, for example compressing ConvMixer by 1.9x while improving accuracy on ImageNet. Finally, we perform a qualitative study to further understand the behavior of weight sharing in isotropic architectures.
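For intuition on what cross-layer weight sharing means in an isotropic network, here is a toy PyTorch example that reuses a small pool of blocks across several depth positions; it is a generic illustration of the idea, not the paper's SPIN framework or its sharing schedules.

```python
import torch
import torch.nn as nn

class SharedIsotropicNet(nn.Module):
    """Isotropic stack: every layer has the same shape, so blocks can share weights."""

    def __init__(self, dim: int = 256, depth: int = 8, groups: int = 2):
        super().__init__()
        # Only `groups` distinct blocks are created; each is reused depth/groups
        # times, cutting parameters roughly by a factor of depth/groups.
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(groups)]
        )
        self.schedule = [i * groups // depth for i in range(depth)]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for idx in self.schedule:
            x = x + self.blocks[idx](x)  # residual connection around the shared block
        return x

net = SharedIsotropicNet()
print(sum(p.numel() for p in net.parameters()), "parameters for depth", len(net.schedule))
```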