Deep Learning Weekly: Issue #225
GPT-3’s public availability, a model that can diagnose glaucoma in a snap, TensorFlow Graph Neural Networks library, a paper on accelerating RNNs for gravitational wave experiments, and more
This week in deep learning, we bring you GPT-3's availability, a model that can diagnose glaucoma in 10 seconds, TensorFlow Graph Neural Networks library, and a paper on permutation-invariant neural networks for RL.
You may also enjoy on-device training (with a reference Colab) using TensorFlow Lite, a primer on ML interpretability, a tutorial for fine-tuning XLS-R for automatic speech recognition, a paper on accelerating recurrent neural networks for gravitational wave experiments, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
With the round, Comet plans to continue building out its enterprise ML platform that combines both experiment tracking and model production monitoring tools.
Developers in supported countries can now sign up and start experimenting with GPT-3.
A team of engineers and ophthalmologists in Australia has developed a novel approach using AI to diagnose glaucoma that can yield results in just 10 seconds.
Amazon Web Services Inc. has commissioned its first-ever major art piece, a site-specific sculpture powered by artificial intelligence and designed by artist and architect Suchi Reddy that will be the centerpiece of the Smithsonian’s “Futures” exhibit.
Google LLC’s cloud business will help Cohere Inc., an early-stage artificial intelligence startup, run natural language processing models in the cloud as part of a multiyear partnership.
NVIDIA-powered systems won four of five tests in MLPerf HPC 1.0, an industry benchmark for AI performance on scientific applications in high performance computing.
Two-i uses computer vision to help prevent deadly accidents in the oil and gas industry, one of the world’s most dangerous sectors.
Mobile & Edge
TensorFlow Lite now supports training your models on-device, in addition to running inference. The article includes a reference Colab and a short tutorial.
This tutorial demonstrates how to use Arm Streamline Performance Analyzer to profile an ML-based Android application.
A tinyML project that automatically detects a dog’s barking and then plays familiar voices to calm it down.
A quick guide on how to make a MobileNet-based garbage sorting device using a Raspberry Pi 4 Model B.
Learning
An announcement of, and tutorial for, TensorFlow Graph Neural Networks (GNNs), a library designed to make it easy to work with graph-structured data using TensorFlow.
A comprehensive, theory-oriented article covering the definition, motivation, landscape, limitations, and future of ML/DNN interpretability.
A comprehensive walkthrough on how to train SOTA models using TorchVision’s latest ResNet primitives.
A technical tutorial and detailed explanation of how XLS-R (a multilingual version of Wav2Vec2) can be fine-tuned for automatic speech recognition.
Libraries & Code
A self-supervised idiosyncratic pattern detection system that learns typical patterns that occur in the control structures of high-level programming languages, such as C/C++, by mining these patterns from open-source repositories.
An open-source framework for prompt-learning that supports loading pre-trained language models (PLMs) directly from Hugging Face Transformers.
A JAX-based package defining a coding framework for writing differentiable numerical simulators with arbitrary discretizations.
Papers & Publications
In complex systems, we often observe complex global behavior emerge from a collection of agents interacting with each other in their environment, with each individual agent acting only on locally available information, without knowing the full picture. Such systems have inspired development of artificial intelligence algorithms in areas such as swarm optimization and cellular automata. Motivated by the emergence of collective behavior from complex cellular systems, we build systems that feed each sensory input from the environment into distinct, but identical neural networks, each with no fixed relationship with one another. We show that these sensory networks can be trained to integrate information received locally, and through communication via an attention mechanism, can collectively produce a globally coherent policy. Moreover, the system can still perform its task even if the ordering of its inputs is randomly permuted several times during an episode. These permutation invariant systems also display useful robustness and generalization properties that are broadly applicable.
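To make the permutation-invariance claim above concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: it assumes scalar observations, a shared linear key/value map (the hypothetical parameters `w_key` and `w_value`), and a fixed list of query values that do not depend on the input. Because softmax attention sums over the observations, shuffling them should leave the pooled output unchanged up to floating-point rounding.

```python
import math
import random


def attention_pool(observations, queries, w_key, w_value):
    """Pool a set of scalar observations into one value per query.

    Every observation passes through the same key/value map (shared
    weights), and the softmax-weighted sum runs over all observations,
    so the result does not depend on their order.
    """
    pooled = []
    for q in queries:
        # Attention score of this query against every observation's key.
        scores = [q * (w_key * o) for o in observations]
        # Subtract the maximum for a numerically stable softmax.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Softmax-weighted sum of the observations' values.
        weighted = sum(w * (w_value * o) for w, o in zip(weights, observations))
        pooled.append(weighted / total)
    return pooled


if __name__ == "__main__":
    rng = random.Random(0)
    obs = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    shuffled = obs[:]
    rng.shuffle(shuffled)
    original = attention_pool(obs, [0.5, -1.0], w_key=1.3, w_value=0.7)
    permuted = attention_pool(shuffled, [0.5, -1.0], w_key=1.3, w_value=0.7)
    same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
               for a, b in zip(original, permuted))
    print("outputs match after shuffling:", same)
```

The pooled vector has one entry per query, so its size is independent of how many observations are fed in, which is one reason this style of pooling can feed a downstream policy regardless of input ordering.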
This paper presents novel reconfigurable architectures for reducing the latency of recurrent neural networks (RNNs) that are used for detecting gravitational waves. Gravitational interferometers such as the LIGO detectors capture cosmic events such as black hole mergers which happen at unknown times and of varying durations, producing time-series data. We have developed a new architecture capable of accelerating RNN inference for analyzing time-series data from LIGO detectors. This architecture is based on optimizing the initiation intervals (II) in a multi-layer LSTM (Long Short-Term Memory) network, by identifying appropriate reuse factors for each layer. A customizable template for this architecture has been designed, which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools. The proposed approach has been evaluated based on two LSTM models, targeting a ZYNQ 7045 FPGA and a U250 FPGA. Experimental results show that with balanced II, the number of DSPs can be reduced by up to 42% while achieving the same IIs. When compared to other FPGA-based LSTM designs, our design can achieve about 4.92 to 12.4 times lower latency.
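The idea of balancing initiation intervals can be illustrated with a toy cost model. The sketch below is an assumption-laden simplification, not the paper's architecture or template: it assumes each layer's II equals its reuse factor, that a layer needs roughly `ceil(multiplications / reuse_factor)` DSPs, and that the pipeline's overall II is set by the slowest layer. The layer sizes and reuse factors are made up for illustration.

```python
import math


def total_dsps(mults_per_layer, reuse_factors):
    """Toy estimate: each multiplier is shared `reuse` times within a layer."""
    return sum(math.ceil(m / r) for m, r in zip(mults_per_layer, reuse_factors))


def pipeline_ii(reuse_factors):
    """Toy estimate: the slowest layer sets the II of the whole pipeline."""
    return max(reuse_factors)


def balance(reuse_factors):
    """Raise every layer's reuse factor to match the bottleneck layer.

    Faster layers gain nothing from a smaller II than the bottleneck,
    so slowing them down saves DSPs without changing the pipeline II.
    """
    target = pipeline_ii(reuse_factors)
    return [target] * len(reuse_factors)


if __name__ == "__main__":
    mults = [4096, 2048, 1024]   # hypothetical multiplications per layer
    unbalanced = [1, 4, 2]       # hypothetical per-layer reuse factors
    balanced = balance(unbalanced)
    print("pipeline II:", pipeline_ii(unbalanced), "->", pipeline_ii(balanced))
    print("DSPs:", total_dsps(mults, unbalanced), "->", total_dsps(mults, balanced))
```

In this toy model the overall II stays the same while the DSP estimate drops, which mirrors the direction of the result reported in the abstract; the specific numbers here have no relation to the paper's measurements.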
Modern video-text retrieval frameworks basically consist of three parts: video encoder, text encoder and the similarity head. With the success on both visual and textual representation learning, transformer based encoders and fusion methods have also been adopted in the field of video-text retrieval. In this report, we present CLIP2TV, aiming at exploring where the critical elements lie in transformer based methods. To achieve this, we first revisit some recent works on multi-modal learning, then introduce some techniques into video-text retrieval, and finally evaluate them through extensive experiments in different configurations. Notably, CLIP2TV achieves 52.9@R1 on the MSR-VTT dataset, outperforming the previous SOTA result by 4.1%.
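The "R1" figure above is Recall@1: the percentage of text queries for which the correct video is ranked first. A minimal sketch of how such a metric can be computed from a similarity matrix follows. It assumes, purely for illustration, that `similarity[i][j]` is the score of text query `i` against video `j` and that the matching video for query `i` has index `i`; the scores in the demo are invented.

```python
def recall_at_k(similarity, k=1):
    """Return Recall@k as a percentage.

    similarity[i][j] is the score of text query i against video j.
    The ground-truth video for query i is assumed to be video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Video indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(similarity)


if __name__ == "__main__":
    scores = [
        [0.9, 0.1, 0.0],  # query 0: correct video ranked first
        [0.2, 0.3, 0.8],  # query 1: correct video ranked second
        [0.1, 0.2, 0.7],  # query 2: correct video ranked first
    ]
    print("R@1:", recall_at_k(scores, k=1))
    print("R@2:", recall_at_k(scores, k=2))
```

In a real pipeline the similarity matrix would come from the similarity head applied to the video and text encoder outputs; only the ranking step is shown here.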