Deep Learning Weekly: Issue #283
The world's first AI-powered lawyer, the evolution and core challenges of self-serve feature platforms, GraphML in 2023: The State of Affairs, & a paper on a generalist neural algorithmic learner.
This week in deep learning, we bring you the world's first AI-powered lawyer, the evolution and core challenges of self-serve feature platforms, GraphML in 2023: The State of Affairs, and a paper on a generalist neural algorithmic learner.
You may also enjoy BioNTech acquires InstaDeep for £562M, tracking VAE experiments using Comet in Pythae, building visual search engines, a paper on curation in training for effective vision-language data, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
The AI, billed as “the world’s first robot lawyer” by DoNotPay, the startup that created it, will run on a smartphone and listen to court arguments in real time before telling the defendant what to say via headphones.
Apple has quietly launched a catalogue of books narrated by artificial intelligence in a move that may mark the beginning of the end for human narrators.
Germany-based biotech company BioNTech SE is set to acquire InstaDeep, a Tunis-born and U.K.-based artificial intelligence startup, for up to £562 million in its largest deal yet.
A new study by Jacobs University Bremen gGmbH reported DeepLandforms, the first pre-release of a landform mapping toolset that uses deep learning.
An article that reviews Shreya Shankar’s practical principles for MLOps, which comes with a cookie-cutter project template in the form of a DagsHub repo.
A technical yet succinct introduction to tracking experiments with Comet in Pythae, a library that gathers commonly used VAE models.
An article that explores the journey of bringing machine learning from research and development to production at Ubisoft, a leading global video game company.
A blog post that discusses the evolution of feature platforms and the core challenges of making these platforms self-serve for data scientists.
An article that reviews the 2022 achievements in GraphML and hypothesizes about potential breakthroughs in 2023.
A podcast-turned-article that comprehensively covers how to build visual search engines.
An article that describes a mechanism for testing neural network weight updates in Ludwig, how it improved code quality, and how the PyTorch community can use it.
A post that explores common design patterns for building ML applications on Amazon SageMaker.
Libraries & Code
nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend.
An easy-to-use NLP development and application toolkit in PyTorch, first released inside Alibaba in 2021.
Papers & Publications
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalizes out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.
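The multi-task recipe in the abstract — one shared processor, task-specific encoders and decoders — can be sketched in a few lines. The following is a hypothetical, minimal NumPy illustration; the hidden size, task names, and random weights are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared GNN "processor" whose weights are reused by every task;
# each task gets its own lightweight encoder and decoder (shapes are assumptions).
HIDDEN = 8
W_proc = rng.normal(size=(2 * HIDDEN, HIDDEN)) * 0.1  # shared message-passing weights

def processor_step(h, adj):
    # Mean-aggregate neighbour states, mix with node state, ReLU (shared across tasks).
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    msg = (adj @ h) / deg
    return np.maximum(np.concatenate([h, msg], axis=1) @ W_proc, 0.0)

encoders = {t: rng.normal(size=(1, HIDDEN)) * 0.1 for t in ["sorting", "path_finding"]}
decoders = {t: rng.normal(size=(HIDDEN, 1)) * 0.1 for t in ["sorting", "path_finding"]}

def run_task(task, node_feats, adj, steps=3):
    h = node_feats @ encoders[task]   # task-specific encoder
    for _ in range(steps):
        h = processor_step(h, adj)    # the SAME processor for every task
    return h @ decoders[task]         # task-specific decoder

# Toy 3-node path graph with scalar node features.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([[0.2], [0.5], [0.9]])
out_sort = run_task("sorting", x, adj)
out_path = run_task("path_finding", x, adj)
```

The point of the sketch is structural: every task routes through the same `processor_step` weights, so anything the processor learns while executing one algorithm is shared with all the others.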
Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford. This paper trades generality for efficiency and presents Curation in Training (CiT), a simple and efficient vision-text learning algorithm that couples a data objective into training. CiT automatically yields quality data to speed-up contrastive image-text training and alleviates the need for an offline data filtering pipeline, allowing broad data sources (including raw image-text pairs from the web). CiT contains two loops: an outer loop curating the training data and an inner loop consuming the curated training data. The text encoder connects the two loops. Given metadata for tasks of interest, e.g., class names, and a large pool of image-text pairs, CiT alternatively selects relevant training data from the pool by measuring the similarity of their text embeddings and embeddings of the metadata. In our experiments, we observe that CiT can speed up training by over an order of magnitude, especially if the raw data size is large.
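The two-loop structure CiT describes — an outer loop that curates data by text-embedding similarity to task metadata, and an inner loop that trains on the result — can be sketched for the curation half. This is a toy NumPy illustration with made-up 2-D embeddings and a hypothetical similarity threshold, not the paper's implementation:

```python
import numpy as np

def curate(pool_text_emb, metadata_emb, threshold=0.5):
    # Normalize embeddings so dot products are cosine similarities.
    p = pool_text_emb / np.linalg.norm(pool_text_emb, axis=1, keepdims=True)
    m = metadata_emb / np.linalg.norm(metadata_emb, axis=1, keepdims=True)
    sim = p @ m.T                 # (num_pool, num_metadata) similarity matrix
    best = sim.max(axis=1)        # each example's closest metadata entry
    return np.nonzero(best >= threshold)[0]  # indices of curated examples

# Toy pool: two examples aligned with the metadata, one unrelated.
metadata = np.array([[1.0, 0.0], [0.0, 1.0]])          # e.g. class-name embeddings
pool = np.array([[0.9, 0.1], [0.1, 0.95], [-1.0, -1.0]])
print(curate(pool, metadata, threshold=0.5))            # → [0 1]
```

In the full loop, the inner training step would consume only the curated indices, and, because the text encoder is shared between the loops, each training step also sharpens the next round of curation.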
Volumetric scene representations enable photorealistic view synthesis for static scenes and form the basis of several existing 6-DoF video techniques. However, the volume rendering procedures that drive these representations necessitate careful trade-offs in terms of quality, rendering speed, and memory efficiency. In particular, existing methods fail to simultaneously achieve real-time performance, small memory footprint, and high-quality rendering for challenging real-world scenes. To address these issues, we present HyperReel -- a novel 6-DoF video representation. The two core components of HyperReel are: (1) a ray-conditioned sample prediction network that enables high-fidelity, high frame rate rendering at high resolutions and (2) a compact and memory efficient dynamic volume representation. Our 6-DoF video pipeline achieves the best performance compared to prior and contemporary approaches in terms of visual quality with small memory requirements, while also rendering at up to 18 frames-per-second at megapixel resolution without any custom CUDA code.
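The first HyperReel component — a network that predicts where to place samples along each ray instead of sampling the volume densely — can be caricatured in NumPy. Everything here (layer sizes, the sigmoid mapping into the near/far range) is an assumption for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny "sample prediction network": maps a ray (origin + direction, 6 values)
# to K sample distances along the ray, replacing dense uniform sampling.
K, H = 4, 16
W1 = rng.normal(size=(6, H)) * 0.5
W2 = rng.normal(size=(H, K)) * 0.5

def predict_samples(origin, direction, near=0.1, far=4.0):
    ray = np.concatenate([origin, direction])
    h = np.tanh(ray @ W1)
    # Squash logits into (0, 1), then map into the [near, far] interval.
    t = 1.0 / (1.0 + np.exp(-(h @ W2)))
    return near + np.sort(t) * (far - near)  # sorted sample distances

o = np.array([0.0, 0.0, 0.0])
d = np.array([0.0, 0.0, 1.0])
ts = predict_samples(o, d)
points = o + ts[:, None] * d  # 3D sample points fed to the volume model
```

Because only K predicted points per ray are ever evaluated, a network like this is what lets the method trade dense volume rendering for real-time frame rates.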