Deep Learning Weekly: Issue #273
Meta AI's AI-powered audio hypercompression, running multiple models with SageMaker's multi-model endpoints, Airbnb's Listing Embedding technique in Search Ranking, and more.
This week in deep learning, we bring you Meta AI's AI-powered audio hypercompression called EnCodec, running multiple models on GPU with SageMaker's multi-model endpoints, Airbnb's Listing Embedding technique in Search Ranking, and a paper on a transformer that solves tabular classification problems in a second.
You may also enjoy MIT's optics method for accelerating deep learning computations, the unreasonable effectiveness of data pipeline smoke tests, meta analysis and optimal stopping at Netflix, a paper on Poisson flow generative models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
A new method developed by MIT researchers uses optics to accelerate machine learning computations on smart speakers and other low-power connected devices.
Meta AI details progress that their Fundamental AI Research (FAIR) team has made in the area of AI-powered hypercompression of audio.
OpenAI Inc. is reportedly seeking to raise a funding round from Microsoft Corp., which invested $1 billion in the artificial intelligence research group three years ago.
The release of NVIDIA’s FLARE 2.2, an open-source platform and SDK for Federated Learning, brings numerous updates that simplify the research and development workflow for researchers and data scientists.
IBM Research has built on its "AI for Code" effort to create Project Wisdom for Red Hat, making it easier to build automations for the hybrid cloud, using plain English.
A post that covers a powerful technique for speeding up data pipeline development: the data pipeline smoke test.
In this tutorial, learn about recommendation systems, build a system for recommending movies, and then integrate it with Comet.
A tutorial on how to solve common data platform challenges using Prefect, dbt, and Snowflake. In this post, you’ll learn about the desired outcome and how it can be implemented.
An article that covers how to run multiple deep learning models on GPU with SageMaker MMEs.
This article discusses why activation functions are required in neural networks, the main types of activation functions, and how to pick the best one for your model.
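For a concrete feel for what these functions do (a minimal plain-Python sketch, not code from the article), here are three common activations and how they transform the same inputs:

```python
import math

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Tanh: squashes any real input into (-1, 1); zero-centered.
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  "
          f"sigmoid={sigmoid(x):.3f}  tanh={tanh(x):.3f}")
```

The choice matters because of these ranges and gradients: sigmoid and tanh saturate for large inputs, while ReLU keeps gradients alive for positive activations.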
In this blog post, Airbnb describes a Listing Embedding technique they developed and deployed for the purpose of improving Similar Listing Recommendations and Real-Time Personalization in Search Ranking.
Netflix shares how they accelerate their experimentation workflow and product innovation using meta analysis and optimal stopping.
New number formats and basic computations emerge to speed up AI training.
In this blog post, Hugging Face presents a few of the new additions to the Evaluate library and shows how to use them, focusing on the evaluation of causal language models.
Libraries & Code
An open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series tasks like classification, regression, forecasting, etc.
An open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats.
A Python package connecting portfolio optimization and deep learning. Its goal is to facilitate research of networks that perform weight allocation in one forward pass.
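As a toy illustration of "weight allocation in one forward pass" (a hypothetical sketch, not the package's actual API), a network head can emit one raw score per asset and a softmax can convert them into a valid long-only portfolio:

```python
import math

def softmax_allocate(scores):
    # Turn raw per-asset network scores into portfolio weights:
    # each weight is positive and they sum to 1 (long-only, fully invested).
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Higher score -> larger allocation.
weights = softmax_allocate([1.2, 0.3, -0.5])
print(weights)
print(sum(weights))  # sums to 1, up to floating point
```

A softmax head is one simple way to guarantee the allocation constraints by construction, rather than enforcing them with a separate optimization step.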
Papers & Publications
We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. TabPFN is fully entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. This prior incorporates ideas from causal reasoning: It entails a large space of structural causal models with a preference for simple structures. On 30 datasets from the OpenML-CC18 suite, we show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with up to 70× speedup. This increases to a 3200× speedup when a GPU is available.
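In Prior-Data Fitted Network terms, the "single forward pass" described above can be read as the network approximating the Bayesian posterior predictive under the synthetic prior it was trained on. A sketch of that relationship (notation is ours, not the paper's):

```latex
% The trained network q_\theta approximates posterior-predictive inference:
% given a training set D and a test point x, it directly outputs an
% approximation of the integral over latent hypotheses \phi from the prior.
q_\theta\big(y \mid x, D\big) \;\approx\; p\big(y \mid x, D\big)
  = \int p(y \mid x, \phi)\, p(\phi \mid D)\, d\phi
```

This is why no per-dataset training or hyperparameter tuning is needed: the integration is amortized into the weights at (offline) training time.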
We propose a new "Poisson flow" generative model (PFGM) that maps a uniform distribution on a high-dimensional hemisphere into any data distribution. We interpret the data points as electrical charges on the z=0 hyperplane in a space augmented with an additional dimension z, generating a high-dimensional electric field (the gradient of the solution to Poisson equation). We prove that if these charges flow upward along electric field lines, their initial distribution in the z=0 plane transforms into a distribution on the hemisphere of radius r that becomes uniform in the r→∞ limit. To learn the bijective transformation, we estimate the normalized field in the augmented space. For sampling, we devise a backward ODE that is anchored by the physically meaningful additional dimension: the samples hit the unaugmented data manifold when the z reaches zero. Experimentally, PFGM achieves current state-of-the-art performance among the normalizing flow models on CIFAR-10, with an Inception score of 9.68 and a FID score of 2.35. It also performs on par with the state-of-the-art SDE approaches while offering 10× to 20× acceleration on image generation tasks. Additionally, PFGM appears more tolerant of estimation errors on a weaker network architecture and robust to the step size in the Euler method.
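For intuition about the field driving PFGM, a point charge in the (N+1)-dimensional augmented space contributes a field along the gradient of the Poisson equation's Green's function. A sketch consistent with the abstract's description (our notation, not quoted from the paper):

```latex
% Field induced by the data distribution p(y), viewed as charges on the
% z = 0 hyperplane of the augmented space \tilde{x} = (x, z):
E(\tilde{x}) \;\propto\; \int
  \frac{\tilde{x} - \tilde{y}}{\|\tilde{x} - \tilde{y}\|^{N+1}}\, p(y)\, dy,
\qquad \tilde{y} = (y, 0)
```

Following these field lines outward carries the charges from the data plane to the asymptotically uniform hemisphere, and the backward ODE reverses that flow for sampling.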
Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation on methods for sharing parameters in isotropic networks (SPIN). We present a framework to formalize major weight sharing design decisions and perform a comprehensive empirical evaluation of this design space. Guided by our experimental results, we propose a weight sharing strategy to generate a family of models with better overall efficiency, in terms of FLOPs and parameters versus accuracy, compared to traditional scaling methods alone, for example compressing ConvMixer by 1.9x while improving accuracy on ImageNet. Finally, we perform a qualitative study to further understand the behavior of weight sharing in isotropic architectures.
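The core idea of cross-layer weight sharing can be sketched in a few lines of plain Python (a hypothetical illustration, not the SPIN code): reusing one block's parameters at every depth keeps the network's compute unchanged while shrinking its parameter count.

```python
# Minimal sketch of cross-layer weight sharing (hypothetical, not SPIN code).
# A "layer" here is just a 2x2 weight matrix applied to a 2-vector.

def apply_layer(w, x):
    # One linear layer: 2x2 matrix-vector product.
    return [w[0][0] * x[0] + w[0][1] * x[1],
            w[1][0] * x[0] + w[1][1] * x[1]]

def forward(layers, x):
    # Run the input through each layer in order (same FLOPs either way).
    for w in layers:
        x = apply_layer(w, x)
    return x

depth = 4
w = [[0.5, 0.1], [0.0, 0.5]]

# Unshared: 4 independent copies of the matrix -> 4 * 4 = 16 parameters.
unshared = [[row[:] for row in w] for _ in range(depth)]
# Shared: the SAME matrix object reused at every depth -> 4 parameters.
shared = [w] * depth

n_unshared = sum(len(l) * len(l[0]) for l in unshared)
n_shared = len({id(l) for l in shared}) * len(w) * len(w[0])

print(n_unshared, n_shared)  # parameter counts: 16 vs 4
```

Isotropic architectures make this easy because every block has the same shape, so one set of weights is a drop-in replacement at any depth.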