Deep Learning Weekly: Issue #265
DeepMind's turtle facial recognition competition, reverse engineering the neural tangent kernel for architecture design, 8-bit matrix multiplication for transformers at scale, and more.
Hey Folks,
This week in deep learning, we bring you DeepMind's turtle facial recognition competition, reverse engineering the neural tangent kernel for architecture design, 8-bit matrix multiplication for transformers at scale, and a paper on audio-visual segmentation.
You may also enjoy Google's AI Test Kitchen, streaming with Kafka and BentoML, fast beam search decoding in PyTorch, a paper on perception using radiance fields, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Advancing conservation with AI-based facial recognition of turtles
DeepMind and Zindi just held a turtle facial recognition competition to improve turtle reidentification and support machine learning projects across Africa.
New AI Chip Twice As Energy Efficient As Alternatives
To tackle the energy demands of AI workloads, a team of researchers has developed a prototype compute-in-memory (CIM) chip that is twice as energy efficient as existing alternatives.
Google begins rolling out its AI Test Kitchen machine learning app
Google began rolling out its AI Test Kitchen app, which will enable users to interact with advanced neural networks developed by the search giant’s engineers.
The Animal Translators
Scientists are using machine learning to eavesdrop on naked mole rats, fruit bats, crows and whales, and to communicate back.
French tax officials use AI to spot 20,000 undeclared pools
French tax authorities, using AI software developed by Google and Capgemini, have found thousands of undeclared private swimming pools, landing the owners with bills totaling about €10m.
MLOps
Machine Learning Streaming with Kafka, Debezium, and BentoML
An exploration and tutorial on building a real-time price recommender system using tools such as BentoML, Postgres, Debezium, and Kafka.
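For a feel of the serving side, here is a minimal sketch of a BentoML 1.0 service like the one the tutorial builds. The model tag "price_model:latest" and the input features are hypothetical placeholders, and the Kafka/Debezium streaming pieces are not shown.

```python
import bentoml
from bentoml.io import JSON

# Hypothetical model previously saved to the BentoML model store.
model_runner = bentoml.sklearn.get("price_model:latest").to_runner()

svc = bentoml.Service("price_recommender", runners=[model_runner])

@svc.api(input=JSON(), output=JSON())
def predict(payload: dict) -> dict:
    # Feature names are placeholders; the article derives them from CDC events.
    features = [[payload["bedrooms"], payload["area"]]]
    price = model_runner.predict.run(features)
    return {"recommended_price": float(price[0])}
```

Served with `bentoml serve service:svc`, this endpoint would sit downstream of the Kafka consumer in the article's pipeline.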
Recommender Systems: Lessons From Building and Deployment
This article discusses practical considerations while building a recommender system, from dataset creation all the way to online MLOps.
Boosting AI Model Inference Performance on Azure Machine Learning
This post provides a step-by-step tutorial for boosting your AI inference performance on Azure Machine Learning using NVIDIA Triton Model Analyzer and ONNX Runtime OLive.
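Triton Model Analyzer and OLive are driven from the command line, but the end result is an optimized ONNX model served through ONNX Runtime. A minimal sketch of that inference side, with "model.onnx" as a placeholder path:

```python
import numpy as np
import onnxruntime as ort

# "model.onnx" is a placeholder; OLive's job is to find the best ONNX Runtime
# configuration (execution provider, session options) for a model like this.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = sess.run(None, {input_name: dummy})
print(outputs[0].shape)
```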
Human-in-the-Loop Machine Learning
A deep dive on Human-in-the-Loop ML, an iterative process that combines human and machine components.
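As a concrete illustration of the loop, here is a hedged sketch of uncertainty-based active learning, one common human-in-the-loop pattern; the human_label function is a hypothetical stand-in for a real annotator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def human_label(x):
    # Stand-in for the human annotator in the loop (hypothetical oracle).
    return int(x.sum() > 0)

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 5))
labeled_idx = list(range(10))
y = {i: human_label(X_pool[i]) for i in labeled_idx}

for _ in range(5):  # five human-in-the-loop iterations
    clf = LogisticRegression().fit(X_pool[labeled_idx], [y[i] for i in labeled_idx])
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)      # smallest values = least confident
    candidates = np.argsort(uncertainty)   # most uncertain first
    new = [i for i in candidates if i not in y][:10]
    for i in new:                          # route these to a human for labels
        y[i] = human_label(X_pool[i])
    labeled_idx += new
```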
Learning
Reverse engineering the NTK: towards first-principles architecture design
BAIR proposes a paradigm for architecture design using recent theoretical breakthroughs: first design a good kernel function, and then "reverse-engineer" it into a neural network.
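For context, the empirical neural tangent kernel of a network f with parameters θ is Θ(x1, x2) = ∇θf(x1)·∇θf(x2). The BAIR post works in the reverse direction (kernel to architecture), but a small PyTorch sketch of computing Θ for a toy network makes the object concrete:

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)

def grad_vector(x):
    # Flattened gradient of the scalar network output w.r.t. all parameters.
    net.zero_grad()
    net(x).sum().backward()  # output is a single scalar, sum() just reshapes
    return torch.cat([p.grad.flatten() for p in net.parameters()])

x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
ntk_value = grad_vector(x1) @ grad_vector(x2)  # Θ(x1, x2)
print(ntk_value.item())
```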
Fast Beam Search Decoding in PyTorch with TorchAudio and Flashlight Text
A tutorial on running fast, flexible CTC beam search decoding for speech recognition, with lexicon and language-model support, using TorchAudio and Flashlight Text.
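A hedged sketch of the decoder API the post covers; the lexicon and token files here are placeholders (the post uses pretrained LibriSpeech files downloadable through torchaudio), and the emission tensor is random rather than real acoustic-model output.

```python
import torch
from torchaudio.models.decoder import ctc_decoder

# "lexicon.txt" and "tokens.txt" are placeholder files standing in for the
# pretrained LibriSpeech assets used in the post.
decoder = ctc_decoder(
    lexicon="lexicon.txt",
    tokens="tokens.txt",
    lm=None,       # optionally a path to a KenLM language model
    beam_size=50,
    nbest=1,
)

# (batch, time, num_tokens) log-probabilities; random here for illustration.
emissions = torch.randn(1, 100, 29)
hypos = decoder(emissions)
print(" ".join(hypos[0][0].words))
```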
Accelerate Your Scripts with nvFuser
This tutorial demonstrates how you can accelerate your networks with nvFuser, a deep learning compiler for PyTorch.
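A minimal sketch of opting a TorchScript function into nvFuser on a CUDA machine; the GELU-style pointwise chain is just an illustrative fusion candidate, not taken from the tutorial.

```python
import torch

@torch.jit.script
def fused_gelu_bias(x, bias):
    # A chain of pointwise ops that nvFuser can fuse into one CUDA kernel.
    y = x + bias
    return y * 0.5 * (1.0 + torch.erf(y / 1.41421))  # 1.41421 ≈ sqrt(2)

x = torch.randn(1024, 1024, device="cuda")
bias = torch.randn(1024, device="cuda")

with torch.jit.fuser("fuser2"):  # "fuser2" selects nvFuser
    for _ in range(3):           # warm-up runs let the JIT profile and fuse
        out = fused_gelu_bias(x, bias)
```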
A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
A high-level overview of Int8 inference, outlining the difficulties of incorporating this quantization technology into the Hugging Face Transformers library.
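The headline feature of the post is loading any Hub checkpoint in 8-bit with a single flag. A short sketch, assuming bitsandbytes and accelerate are installed and a CUDA GPU is available:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-3b"  # any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(name)

# load_in_8bit=True quantizes the weights with LLM.int8() under the hood;
# device_map="auto" lets accelerate place the layers across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    name, device_map="auto", load_in_8bit=True
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```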
Explore Amazon SageMaker Data Wrangler capabilities with sample datasets
A technical walkthrough on the capabilities of Amazon SageMaker Data Wrangler and its flow.
Deploy a Language AI App Easily with Cohere and Streamlit
In this article, we’ll look at how we can quickly prototype a Startup Idea Generator web app with Cohere and Streamlit.
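A compact sketch of the pattern, assuming a Cohere API key; the prompt, model name, and widget layout are simplified stand-ins for the article's actual app.

```python
import cohere
import streamlit as st

st.title("Startup Idea Generator")
industry = st.text_input("Industry", "healthcare")

if st.button("Generate"):
    co = cohere.Client("YOUR_API_KEY")  # placeholder key
    prompt = f"Generate a startup idea for the {industry} industry:\n"
    response = co.generate(
        model="xlarge",  # model names may differ; see Cohere's docs
        prompt=prompt,
        max_tokens=50,
        temperature=0.9,
    )
    st.write(response.generations[0].text)
```

Run it with `streamlit run app.py` to get the interactive web app.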
Libraries & Code
Trane
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
Alpa
Alpa is a system for automatically parallelizing the training and serving of large-scale neural networks.
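A hedged sketch of Alpa's core entry point, @alpa.parallelize, on a toy JAX training step; it assumes a single-host setup with alpa and jax installed, and the linear-regression model is a placeholder.

```python
import jax
import jax.numpy as jnp
import alpa  # depending on the version, alpa.init() may be needed first

# alpa.parallelize automatically shards this step across available devices,
# searching over data, operator, and pipeline parallelism strategies.
@alpa.parallelize
def train_step(params, x, y):
    def loss_fn(p):
        pred = x @ p["w"] + p["b"]
        return jnp.mean((pred - y) ** 2)
    grads = jax.grad(loss_fn)(params)
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)

params = {"w": jnp.zeros((8, 1)), "b": jnp.zeros(1)}
x, y = jnp.ones((32, 8)), jnp.ones((32, 1))
params = train_step(params, x, y)
```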
Papers & Publications
PeRFception: Perception using Radiance Fields
Abstract:
The recent progress in implicit 3D representation, i.e., Neural Radiance Fields (NeRFs), has made accurate and photorealistic 3D reconstruction possible in a differentiable manner. This new representation can effectively convey the information of hundreds of high-resolution images in one compact format and allows photorealistic synthesis of novel views. In this work, using the variant of NeRF called Plenoxels, we create the first large-scale implicit representation datasets for perception tasks, called the PeRFception, which consists of two parts that incorporate both object-centric and scene-centric scans for classification and segmentation. It shows a significant memory compression rate (96.4%) from the original dataset, while containing both 2D and 3D information in a unified form. We construct the classification and segmentation models that directly take as input this implicit format and also propose a novel augmentation technique to avoid overfitting on backgrounds of images.
Audio-Visual Segmentation
Abstract:
We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on the AVSBench compare our approach to several existing methods from related tasks, demonstrating that the proposed method is promising for building a bridge between the audio and pixel-wise visual semantics.
Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments
Abstract:
We introduce a novel federated learning framework, FedD3, which reduces the overall communication volume and with that opens up the concept of federated learning to more application scenarios in network-constrained environments. It achieves this by leveraging local dataset distillation instead of traditional learning approaches (i) to significantly reduce communication volumes and (ii) to limit transfers to one-shot communication, rather than iterative multiway communication. Instead of sharing model updates, as in other federated learning approaches, FedD3 allows the connected clients to distill the local datasets independently, and then aggregates those decentralized distilled datasets (typically in the form of a few unrecognizable images, which are normally smaller than a model) across the network only once to form the final model. Our experimental results show that FedD3 significantly outperforms other federated learning frameworks in terms of needed communication volumes, while it provides the additional benefit of being able to balance the trade-off between accuracy and communication cost, depending on usage scenario or target dataset. For instance, for training an AlexNet model on a Non-IID CIFAR-10 dataset with 10 clients, FedD3 can either increase the accuracy by over 71% with a similar communication volume, or save 98% of communication volume while reaching the same accuracy, compared to other one-shot federated learning approaches.
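To make the one-shot protocol concrete, here is a hedged NumPy sketch of the communication pattern only; the per-class averaging below is a stub standing in for FedD3's actual dataset distillation step.

```python
import numpy as np

def distill_locally(X, y):
    # Stub for local dataset distillation: averages each class into one
    # synthetic example, whereas FedD3 runs a real distillation algorithm.
    classes = np.unique(y)
    Xd = np.stack([X[y == c].mean(axis=0) for c in classes])
    return Xd, classes

# Each client distills its local (possibly non-IID) data independently...
clients = [
    (np.random.randn(500, 32), np.random.randint(0, 10, 500)) for _ in range(10)
]
distilled = [distill_locally(X, y) for X, y in clients]

# ...and sends only the tiny distilled set to the server, exactly once.
X_server = np.concatenate([Xd for Xd, _ in distilled])
y_server = np.concatenate([yd for _, yd in distilled])
# The server then trains the final model on the aggregated distilled data.
print(X_server.shape, y_server.shape)
```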