Deep Learning Weekly: Issue #283
The world's first AI-powered lawyer, the evolution and core challenges of self-serve feature platforms, Graph ML in 2023: The State of Affairs, & a paper on a generalist neural algorithmic learner.
Hey Folks,
This week in deep learning, we bring you the world's first AI-powered lawyer, the evolution and core challenges of self-serve feature platforms, Graph ML in 2023: The State of Affairs, and a paper on a generalist neural algorithmic learner.
You may also enjoy BioNTech acquires InstaDeep for £562M, tracking VAE experiments using Comet in Pythae, building visual search engines, a paper on curation in training for effective vision-language data, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
'Robot lawyer' powered by AI will help fight a speeding ticket as it takes its first case in court
The AI billed as “the world’s first robot lawyer” by the startup that created it, DoNotPay, will run on a smartphone and listen to court arguments in real-time before telling the defendant what to say via headphones.
Death of the narrator? Apple unveils suite of AI-voiced audiobooks
Apple has quietly launched a catalogue of books narrated by artificial intelligence in a move that may mark the beginning of the end for human narrators.
BioNTech acquires Tunis-born and U.K.-based AI startup InstaDeep for £562M
German biotech company BioNTech SE is set to acquire InstaDeep, a Tunis-born and U.K.-based artificial intelligence startup, for up to £562 million in its largest deal yet.
A new study by Jacobs University Bremen gGmbH introduced DeepLandforms, the first pre-release of a deep learning-based toolset for landform mapping.
MLOps
Cookiecutter MLOps – A production-focused ML project template
An article that reviews Shreya Shankar’s practical principles for MLOps, which comes with a cookie-cutter project template in the form of a DagsHub repo.
A technical yet succinct introduction to tracking experiments with Comet in Pythae, a library that gathers commonly used VAE models.
Bringing Machine Learning to Production at Ubisoft
An article that explores the journey of bringing machine learning from research and development to production at Ubisoft, a leading global video game company.
Self-serve feature platforms: architectures and APIs
A blog post that discusses the evolution of feature platforms and the core challenges of making these platforms self-serve for data scientists.
Learning
Graph ML in 2023: The State of Affairs
An article that reviews the 2022 achievements in Graph ML and hypothesizes about potential breakthroughs in 2023.
Building Visual Search Engines with Kuba Cieślik
A podcast-turned-article that comprehensively covers how to build visual search engines.
Unit Testing Machine Learning Code in Ludwig and PyTorch: Tests for Gradient Updates
An article that presents a mechanism for testing weight updates in Ludwig, how it improved code quality, and how the broader PyTorch community can use it.
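The testing pattern the article describes can be sketched generically: run one training step, then assert that every trainable parameter both received a gradient and actually changed. This is an illustrative PyTorch example under assumed model and data shapes, not Ludwig's actual test suite:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in model; any trainable module works the same way.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(16, 4), torch.randn(16, 1)

# Snapshot parameters before the step so updates can be verified.
before = [p.detach().clone() for p in model.parameters()]

loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

for prev, p in zip(before, model.parameters()):
    assert p.grad is not None, "parameter never received a gradient"
    assert not torch.equal(prev, p), "parameter was not updated"
print("all parameters updated")
```

A test like this catches frozen layers, detached graphs, and misconfigured optimizers cheaply, without asserting anything about model accuracy.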
A post that explores common design patterns for building ML applications on Amazon SageMaker.
Libraries & Code
nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend.
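As a rough illustration of the idea behind CNN-based audio frontends (this is not nnAudio's actual implementation), an STFT can be expressed as a fixed 1D convolution whose kernels are windowed sine/cosine basis functions, which is what lets a convolutional backend compute spectrograms on the GPU and backpropagate through them; all sizes below are arbitrary choices for the sketch:

```python
import math
import torch
import torch.nn.functional as F

n_fft, hop = 64, 16
n = torch.arange(n_fft, dtype=torch.float32)
window = torch.hann_window(n_fft)
freqs = torch.arange(n_fft // 2 + 1, dtype=torch.float32)

# Build conv kernels: one cosine and one (negated) sine filter per
# frequency bin, giving the real and imaginary DFT components.
angles = 2 * math.pi * freqs[:, None] * n[None, :] / n_fft
kernels = torch.cat([torch.cos(angles), -torch.sin(angles)]) * window
kernels = kernels.unsqueeze(1)  # (2 * bins, 1, n_fft)

# A pure tone at frequency bin 8, sliced into frames by the conv stride.
signal = torch.sin(2 * math.pi * 8 * torch.arange(256) / n_fft)
out = F.conv1d(signal[None, None, :], kernels, stride=hop)
real, imag = out.chunk(2, dim=1)
magnitude = (real ** 2 + imag ** 2).sqrt()  # (1, bins, frames) spectrogram
print(magnitude.shape)  # torch.Size([1, 33, 13])
```

Because the kernels are ordinary tensors, they could also be left trainable, which is one reason to put signal processing inside the network rather than in an offline preprocessing step.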
An easy-to-use NLP development and application toolkit in PyTorch, first released inside Alibaba in 2021.
Papers & Publications
A Generalist Neural Algorithmic Learner
Abstract:
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalizes out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.
CiT: Curation in Training for Effective Vision-Language Data
Abstract:
Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford. This paper trades generality for efficiency and presents Curation in Training (CiT), a simple and efficient vision-text learning algorithm that couples a data objective into training. CiT automatically yields quality data to speed-up contrastive image-text training and alleviates the need for an offline data filtering pipeline, allowing broad data sources (including raw image-text pairs from the web). CiT contains two loops: an outer loop curating the training data and an inner loop consuming the curated training data. The text encoder connects the two loops. Given metadata for tasks of interest, e.g., class names, and a large pool of image-text pairs, CiT alternatively selects relevant training data from the pool by measuring the similarity of their text embeddings and embeddings of the metadata. In our experiments, we observe that CiT can speed up training by over an order of magnitude, especially if the raw data size is large.
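The two-loop structure the abstract describes can be sketched roughly as follows. This is a hypothetical illustration with random stand-in embeddings, not the authors' code: the `curate` helper, the pool sizes, and the use of plain cosine similarity against metadata embeddings are all assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

dim = 16
pool_size = 100  # raw image-text pairs in the pool
num_meta = 5     # metadata entries, e.g. class names

# Stand-ins for encoder outputs; a real system would embed the pool's
# captions and the task metadata with the (evolving) text encoder.
text_emb = F.normalize(torch.randn(pool_size, dim), dim=-1)
meta_emb = F.normalize(torch.randn(num_meta, dim), dim=-1)

def curate(text_emb, meta_emb, k):
    """Outer loop: keep the k pairs whose text embedding is most
    similar to any metadata embedding."""
    sim = text_emb @ meta_emb.T       # (pool_size, num_meta) cosine sims
    best = sim.max(dim=1).values      # best metadata match per pair
    return best.topk(k).indices       # indices of the curated subset

curated = curate(text_emb, meta_emb, k=20)
print(curated.shape)  # torch.Size([20])
# The inner loop would now run contrastive image-text training on these
# 20 pairs, re-curating periodically as the text encoder improves.
```

Since the text encoder is shared between curation and training, each inner-loop update sharpens the similarity scores used by the next outer-loop pass.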
HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling
Abstract:
Volumetric scene representations enable photorealistic view synthesis for static scenes and form the basis of several existing 6-DoF video techniques. However, the volume rendering procedures that drive these representations necessitate careful trade-offs in terms of quality, rendering speed, and memory efficiency. In particular, existing methods fail to simultaneously achieve real-time performance, small memory footprint, and high-quality rendering for challenging real-world scenes. To address these issues, we present HyperReel -- a novel 6-DoF video representation. The two core components of HyperReel are: (1) a ray-conditioned sample prediction network that enables high-fidelity, high frame rate rendering at high resolutions and (2) a compact and memory efficient dynamic volume representation. Our 6-DoF video pipeline achieves the best performance compared to prior and contemporary approaches in terms of visual quality with small memory requirements, while also rendering at up to 18 frames-per-second at megapixel resolution without any custom CUDA code.