Deep Learning Weekly: Issue #197

PyTorchVideo, sports analytics with deep learning, the Atlas of AI book, GPU shortage, carbon emissions of large neural networks, and more

Hey folks,

This week in deep learning, we bring you the Bridge2AI initiative to propel biomedical research, Cerebras’ latest chip, the MLPerf benchmark, a new ML library written entirely in C++, and some well-written paper notes.

You may also enjoy DeepMind’s take on sports analytics, a deep learning library for video understanding, a paper on the carbon footprint of large neural networks, another one on the new TransGAN architecture, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

Nvidia warns the great GPU shortage will continue throughout 2021

Nvidia warns that demand for GPUs, driven by gamers, cryptominers and AI practitioners, will continue to outstrip supply for the rest of the year, particularly for its RTX 30-series GPUs.

This Researcher Says AI Is Neither Artificial nor Intelligent

In this interview, Kate Crawford, a researcher at Microsoft, presents her new book “Atlas of AI”, which argues that many applications and side effects of AI are in urgent need of regulation.

SEER: An important step toward AI that works well for everyone

SEER is a new high-performance computer vision system that can learn from any collection of digital images without requiring data curation and labeling.

Bridge to Artificial Intelligence (Bridge2AI)

The US National Institutes of Health (NIH) launches the Bridge2AI program to propel biomedical research forward by setting the stage for widespread adoption of artificial intelligence (AI) that tackles complex biomedical challenges.

Accelerating Drug Discovery Research with New AI Models: a look at the AstraZeneca Cerebras collaboration

AstraZeneca is working with Cerebras, a chip maker, on NLP models that can efficiently search the medical literature, a critical capability for advancing drug discovery.

Advancing sports analytics through AI research

DeepMind presents its vision of how AI can help in football analytics, using state-of-the-art techniques in computer vision, statistical learning and game theory.

Mobile & Edge

Cerebras Crams More Compute Into Second-Gen ‘Dinner Plate Sized’ Chip

Cerebras releases the WSE-2, a chip the size of a dinner plate designed for AI workloads in large-scale data centers, with 50 times more transistors than the largest GPU on the market.

AI industry’s performance benchmark, MLPerf, for the first time also measures the energy that machine learning consumes

The MLPerf benchmark compares how the main chips on the market perform on ML tasks. Its latest version was released as part of the MLCommons initiative.

New deep learning model brings image segmentation to edge devices

A new neural network architecture designed by AI researchers at DarwinAI and the University of Waterloo will make it possible to perform image segmentation on low-power computing devices.

Learning

Acceleration without Momentum

A technical post about the momentum term in the most commonly used optimizers for deep learning: the author shows that it is actually not always needed, and simple gradient descent can do the job.
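
For readers who want a reminder of what the momentum term actually is, here is a minimal toy sketch, not the post's own method or experiments. It runs plain gradient descent and the classic heavy-ball (momentum) update on a small quadratic. The objective, step size, momentum value and function names are all assumptions chosen for illustration.

```python
def f(x):
    # Toy objective (an assumption for illustration): an elongated quadratic bowl
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    # Gradient of the toy objective above
    return [x[0], 10.0 * x[1]]


def gradient_descent(x0, lr, steps, momentum=0.0):
    """Heavy-ball update; momentum=0.0 reduces to plain gradient descent."""
    x = list(x0)
    velocity = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        # The momentum term carries over a fraction of the previous step
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        x = [xi + v for xi, v in zip(x, velocity)]
    return x


plain = gradient_descent([1.0, 1.0], lr=0.1, steps=100)
heavy = gradient_descent([1.0, 1.0], lr=0.1, steps=100, momentum=0.5)
print("plain GD loss:  ", f(plain))
print("heavy-ball loss:", f(heavy))
```

On this toy problem both variants drive the loss close to zero. Whether momentum is needed in a given setting is exactly what the post discusses.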

Flashlight: Fast and flexible machine learning in C++

Flashlight is Facebook Research’s new open-source ML library, written entirely in C++, which makes it a powerful tool for doing research in high-performance computing environments.

Paper Notes by Vitaly Kurin

Vitaly Kurin, a PhD student at Oxford working on multitask graph-based reinforcement learning, shares his notes on the papers he has read.

Knowledge Intensive Reinforcement Learning

This course presents the latest techniques that use NLP methods to solve reinforcement learning tasks.

Libraries & Code

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch

This framework provides an easy method to compute vector representations for sentences, paragraphs and images, using transformer networks like BERT.

PyTorchVideo: A deep learning library for video understanding research

PyTorchVideo is a deep learning library built on PyTorch that provides the components needed to accelerate video understanding research.

Fastdebug: A helpful library for improving torch and fastai errors

Fastdebug is a library designed to improve quality of life when dealing with PyTorch and fastai errors, and it also includes some new sanity checks.

Papers & Publications

TransGAN: Two Transformers Can Make One Strong GAN


The recent explosive interest on transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. However, how further transformers can go - are they ready to take some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate TransGAN to notably benefit from data augmentations (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets. Specifically, our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets new state-of-the-art IS score of 10.10 and FID score of 25.32 on STL-10. It also reaches competitive 8.64 IS score and 11.89 FID score on Cifar-10, and 12.23 FID score on CelebA 64×64, respectively. We also conclude with a discussion of the current limitations and future potential of TransGAN.

Carbon Emissions and Large Neural Network Training


The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and finding greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained. Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems. Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X. These large factors also make retroactive estimates of energy cost difficult. To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e explicit when practical. We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.
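
To make the factors in this abstract concrete, here is a rough back-of-the-envelope sketch. It assumes a common decomposition: energy is training time times number of processors times average power per processor times the datacenter's power usage effectiveness (PUE), and CO2e is energy times the carbon intensity of the grid. The function names and every number below are hypothetical placeholders, not figures from the paper.

```python
def training_energy_kwh(hours, num_processors, avg_power_watts, pue):
    """Energy of a training run in kWh, including datacenter overhead (PUE)."""
    return hours * num_processors * avg_power_watts * pue / 1000.0


def co2e_tonnes(energy_kwh, kg_co2e_per_kwh):
    """Convert energy to tonnes of CO2-equivalent for a given grid intensity."""
    return energy_kwh * kg_co2e_per_kwh / 1000.0


# Made-up example values (assumptions for illustration only)
energy = training_energy_kwh(hours=100, num_processors=64,
                             avg_power_watts=300, pue=1.1)

# Same training run, two hypothetical grids with different carbon intensity
dirty_grid = co2e_tonnes(energy, kg_co2e_per_kwh=0.4)
clean_grid = co2e_tonnes(energy, kg_co2e_per_kwh=0.08)

print(f"energy: {energy:.0f} kWh")
print(f"CO2e: {dirty_grid:.3f} t vs {clean_grid:.3f} t")
```

Because the factors multiply, improvements in model, hardware, datacenter and location compound. In this sketch, moving the same run to a grid with one fifth of the carbon intensity cuts CO2e by a factor of five.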

MLP-Mixer: An all-MLP Architecture for Vision


Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well established CNNs and Transformers.
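
As a rough illustration of the two layer types described in the abstract, here is a simplified pure-Python sketch of a single Mixer-style block. It is not the authors' implementation: it keeps the skip connections but leaves out layer normalization, the weights are random, and all names and sizes are assumptions for illustration.

```python
import math
import random


def gelu(x):
    # Exact GELU activation
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mlp(x, w1, w2):
    # Two-layer perceptron applied independently to each row of x
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def mixer_block(x, token_w1, token_w2, chan_w1, chan_w2):
    """x has shape (patches, channels)."""
    # Token mixing: transpose so each row is one channel across all patches,
    # which lets the MLP mix spatial information
    y = add(x, transpose(mlp(transpose(x), token_w1, token_w2)))
    # Channel mixing: each row is one patch, so the MLP mixes per-location features
    return add(y, mlp(y, chan_w1, chan_w2))


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_patches, channels, hidden = 4, 3, 8
x = random_matrix(num_patches, channels, rng)
out = mixer_block(
    x,
    random_matrix(num_patches, hidden, rng), random_matrix(hidden, num_patches, rng),
    random_matrix(channels, hidden, rng), random_matrix(hidden, channels, rng),
)
print(len(out), "patches x", len(out[0]), "channels")
```

The block keeps the input shape, so several of them can be stacked. The only difference between the two mixing steps is which axis the MLP runs along.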