Deep Learning Weekly: Issue #225

GPT-3’s public availability, a model that can diagnose glaucoma in a snap, TensorFlow Graph Neural Networks library, a paper on accelerating RNNs for gravitational wave experiments, and more

Hey folks,

This week in deep learning, we bring you GPT-3's public availability, a model that can diagnose glaucoma in 10 seconds, the TensorFlow Graph Neural Networks library, and a paper on permutation-invariant neural networks for RL.

You may also enjoy on-device training (with reference Colab) using TensorFlow Lite, a primer on ML interpretability, a tutorial for fine-tuning XLSR for automatic speech recognition, a paper on accelerating recurrent neural networks for gravitational wave experiments, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

Comet Raises $50 Million Series B to Accelerate ML Development for the Enterprise

With the round, Comet plans to continue building out its enterprise ML platform that combines both experiment tracking and model production monitoring tools.

OpenAI's API Now Available with No Waitlist

Developers in supported countries can now sign up and start experimenting with GPT-3.

New AI Test Diagnoses Glaucoma in Just 10 Seconds

A team of engineers and ophthalmologists in Australia has developed a novel approach using AI to diagnose glaucoma that can yield results in just 10 seconds.

AWS makes AI and machine learning tangible with first major art debut at Smithsonian

Amazon Web Services Inc. has commissioned its first-ever major art piece, a site-specific sculpture powered by artificial intelligence and designed by artist and architect Suchi Reddy that will be the centerpiece of the Smithsonian’s “Futures” exhibit.

Google partners with startup Cohere to run AI models on Cloud TPU chips

Google LLC’s cloud business will help Cohere Inc., an early-stage artificial intelligence startup, run natural language processing models in the cloud as part of a multiyear partnership.

MLPerf HPC Benchmarks Show the Power of HPC+AI

NVIDIA-powered systems won four of five tests in MLPerf HPC 1.0, an industry benchmark for AI performance on scientific applications in high performance computing.

In Pursuit of Smart City Vision, Startup Two-i Keeps an AI on Worker Safety

Two-i uses computer vision to help prevent deadly accidents in the oil and gas industry, one of the world’s most dangerous sectors.

Mobile & Edge

On-device training in TensorFlow Lite

TensorFlow Lite now supports training your models on-device, in addition to running inference. The article includes a reference Colab and a short tutorial.

Using Arm Streamline for profiling ML workloads

This tutorial demonstrates how to use Arm Streamline Performance Analyzer to profile an ML-based Android application.

This tinyML system helps soothe your dog's separation anxiety with sounds of your voice

A tinyML project that can automatically detect a dog’s barking sounds and subsequently play some familiar voices to calm it down.

AI Recycling Machine using oneM2M

A quick guide on how to make a MobileNet-based garbage sorting device using a Raspberry Pi 4 Model B.

Learning

Introducing TensorFlow Graph Neural Networks

An announcement and tutorial for TensorFlow Graph Neural Networks (GNNs), a library designed to make it easy to work with graph-structured data using TensorFlow.

A Primer on ML Interpretability & Explainability

A comprehensive, theoretical article on the definition, motivation, landscape, limitations, and future of ML/DNN interpretability.

How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives

A comprehensive walkthrough on how to train SOTA models using TorchVision’s latest ResNet primitives.

Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers

A technical tutorial and detailed explanation of how XLS-R (a multilingual version of Wav2Vec2) can be fine-tuned for automatic speech recognition.

Libraries & Code

IntelLabs/control-flag: A system to flag anomalous source code expressions by learning typical expressions from training data

A self-supervised idiosyncratic pattern detection system that learns typical patterns that occur in the control structures of high-level programming languages, such as C/C++, by mining these patterns from open-source repositories.


An open-source framework for prompt-learning that supports loading PLMs directly from Hugging Face transformers.

jaxdf - JAX-based Discretization Framework

A JAX-based package defining a coding framework for writing differentiable numerical simulators with arbitrary discretizations.

Papers & Publications

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning


In complex systems, we often observe complex global behavior emerge from a collection of agents interacting with each other in their environment, with each individual agent acting only on locally available information, without knowing the full picture. Such systems have inspired development of artificial intelligence algorithms in areas such as swarm optimization and cellular automata. Motivated by the emergence of collective behavior from complex cellular systems, we build systems that feed each sensory input from the environment into distinct, but identical neural networks, each with no fixed relationship with one another. We show that these sensory networks can be trained to integrate information received locally, and through communication via an attention mechanism, can collectively produce a globally coherent policy. Moreover, the system can still perform its task even if the ordering of its inputs is randomly permuted several times during an episode. These permutation invariant systems also display useful robustness and generalization properties that are broadly applicable.
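To make the permutation-invariance claim concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the `shared_encoder` function, its hand-picked weights, and the fixed `query` vector are all hypothetical stand-ins. The only idea it illustrates is that when every input passes through the same network and the results are pooled by attention with a query that does not depend on input order, reordering the inputs leaves the output unchanged.

```python
import math

def shared_encoder(x):
    """Hypothetical shared per-input network: one scalar observation -> (key, value)."""
    key = [math.tanh(0.5 * x), math.tanh(-1.0 * x + 0.3)]
    value = [x, x * x]
    return key, value

def attention_pool(observations, query=(1.0, -0.5)):
    """Pool per-input values with softmax attention against a fixed query."""
    encoded = [shared_encoder(x) for x in observations]
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in encoded]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    dim = len(encoded[0][1])
    # Weighted average of the values; a sum over inputs, so order does not matter.
    return [
        sum(w * value[d] for w, (_, value) in zip(weights, encoded)) / total
        for d in range(dim)
    ]

obs = [0.2, -1.3, 0.7, 2.1]
reordered = list(reversed(obs))  # any permutation of the same inputs

out_a = attention_pool(obs)
out_b = attention_pool(reordered)

# Compare with a tolerance: floating-point sums can differ slightly by order.
same = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(out_a, out_b))
print(same)
```

A policy that reads only the pooled vector would therefore see the same features however the sensors are shuffled, which is the property the abstract describes.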

Accelerating Recurrent Neural Networks for Gravitational Wave Experiments


This paper presents novel reconfigurable architectures for reducing the latency of recurrent neural networks (RNNs) that are used for detecting gravitational waves. Gravitational interferometers such as the LIGO detectors capture cosmic events such as black hole mergers which happen at unknown times and of varying durations, producing time-series data. We have developed a new architecture capable of accelerating RNN inference for analyzing time-series data from LIGO detectors. This architecture is based on optimizing the initiation intervals (II) in a multi-layer LSTM (Long Short-Term Memory) network, by identifying appropriate reuse factors for each layer. A customizable template for this architecture has been designed, which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools. The proposed approach has been evaluated based on two LSTM models, targeting a ZYNQ 7045 FPGA and a U250 FPGA. Experimental results show that with balanced II, the number of DSPs can be reduced up to 42% while achieving the same IIs. When compared to other FPGA-based LSTM designs, our design can achieve about 4.92 to 12.4 times lower latency.

CLIP2TV: An Empirical Study on Transformer-based Methods for Video-Text Retrieval


Modern video-text retrieval frameworks basically consist of three parts: video encoder, text encoder and the similarity head. With the success on both visual and textual representation learning, transformer based encoders and fusion methods have also been adopted in the field of video-text retrieval. In this report, we present CLIP2TV, aiming at exploring where the critical elements lie in transformer based methods. To achieve this, we first revisit some recent works on multi-modal learning, then introduce some techniques into video-text retrieval, finally evaluate them through extensive experiments in different configurations. Notably, CLIP2TV achieves 52.9@R1 on the MSR-VTT dataset, outperforming the previous SOTA result by 4.1%.
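For readers unfamiliar with the R@1 figure quoted above, the sketch below shows how recall at k is commonly computed from a text-to-video similarity matrix. It is an illustration, not CLIP2TV's evaluation code. The function name `recall_at_k`, the toy scores, and the assumption that text query i matches video i are ours.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose true match is among the top-k candidates.

    Assumes similarity[i][j] scores text query i against video j, and that
    the correct video for query i sits at index i (an assumption made here).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the true match = number of candidates scoring strictly higher.
        # Ties are resolved in favor of the true match in this simple sketch.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)

# Toy 3x3 similarity matrix: rows are text queries, columns are videos.
sim = [
    [0.9, 0.1, 0.3],  # true match (0.9) is ranked first
    [0.2, 0.4, 0.8],  # true match (0.4) is ranked second
    [0.1, 0.2, 0.7],  # true match (0.7) is ranked first
]

r1 = recall_at_k(sim, k=1)
r2 = recall_at_k(sim, k=2)
print(f"R@1 = {r1:.3f}, R@2 = {r2:.3f}")  # R@1 = 0.667, R@2 = 1.000
```

On this reading, a score of 52.9 at R@1 means the correct video ranked first for 52.9% of the text queries.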
