Deep Learning Weekly: Issue #197
PyTorchVideo, sports analytics with deep learning, the Atlas of AI book, GPU shortage, carbon emissions of large neural networks, and more
Grégoire Jauvion | May 12
This week in deep learning, we bring you the Bridge2AI initiative to propel biomedical research, Cerebras’ latest chip, MLPerf benchmark, a new ML library written entirely in C++, and some well-written paper notes.
You may also enjoy DeepMind’s take on sports analytics, a deep learning library for video understanding, a paper on the carbon footprint of large neural networks, another one on the new TransGAN architecture, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Nvidia warns that demand for GPUs, driven by gamers, cryptominers and AI practitioners, will continue to outstrip supply for the rest of the year, in particular for its RTX 30-series GPUs.
In this interview, Kate Crawford, a researcher at Microsoft, presents her new book “Atlas of AI”, which argues that many applications and side effects of AI are in urgent need of regulation.
SEER is a new high-performance computer vision system that can learn from any collection of digital images without requiring data curation and labeling.
The US National Institutes of Health launches the Bridge2AI program to propel biomedical research forward by setting the stage for widespread adoption of artificial intelligence (AI) that tackles complex biomedical challenges.
AstraZeneca is working with Cerebras, a chip maker, on NLP models that can efficiently search the medical literature, a critical capability for advancing drug discovery.
DeepMind presents its vision of how AI can help in football analytics, using state-of-the-art techniques in computer vision, statistical learning and game theory.
Mobile & Edge
Cerebras releases the WSE-2, a chip the size of a dinner plate designed for AI workloads in large-scale data centers, with 50 times more transistors than the largest GPU on the market.
The MLPerf benchmark compares how the main chips on the market perform on ML tasks. Its latest version has been released as part of the MLCommons initiative.
A new neural network architecture designed by AI researchers at DarwinAI and the University of Waterloo will make it possible to perform image segmentation on low-power computing devices.
A technical post about the momentum term in the most commonly used optimizers for deep learning: the author shows that it is actually not always needed, and simple gradient descent can do the job.
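To make the comparison concrete, here is a minimal sketch of the two update rules on a toy one-dimensional quadratic. It does not reproduce the post's argument: the objective, learning rate, step count and momentum coefficient are all assumptions chosen for illustration, and `gradient_descent` is a hypothetical helper, not an API from any library.

```python
def gradient_descent(grad, x0, lr, steps, momentum=0.0):
    """Minimize a 1-D function given its gradient.

    momentum=0.0 gives plain gradient descent; momentum > 0 gives the
    classical "heavy ball" update, which keeps a running velocity.
    """
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x = x + velocity
    return x


def grad_f(x):
    # Gradient of the toy objective f(x) = (x - 3)^2 (an assumption).
    return 2.0 * (x - 3.0)


# Same starting point, learning rate and budget; only the momentum differs.
x_plain = gradient_descent(grad_f, x0=0.0, lr=0.1, steps=200)
x_momentum = gradient_descent(grad_f, x0=0.0, lr=0.1, steps=200, momentum=0.9)
print(x_plain, x_momentum)
```

On this well-conditioned toy problem both variants end up close to the minimizer at 3, which is consistent with the idea that momentum is not always required. It says nothing about harder, ill-conditioned or stochastic settings.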
Flashlight is Facebook Research’s new open source ML library, written entirely in C++, which makes it a powerful tool for doing research in high-performance computing environments.
Vitaly Kurin, a PhD student working on Multitask Graph-Based Reinforcement Learning at Oxford, shares his notes on the papers he has read.
This course presents the latest techniques using NLP methods to solve reinforcement learning tasks.
Libraries & Code
This framework provides an easy method to compute vector representations for sentences, paragraphs and images, using transformer networks like BERT.
PyTorchVideo is a deep learning library built on PyTorch that provides the components needed to accelerate video understanding research.
Fastdebug is a library designed to improve quality of life when dealing with PyTorch and fastai errors, while also including some new sanity checks.
Papers & Publications
The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. However, how much further can transformers go: are they ready to take on some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate that TransGAN notably benefits from data augmentations (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets. Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets a new state-of-the-art IS score of 10.10 and FID score of 25.32 on STL-10. It also reaches a competitive 8.64 IS score and 11.89 FID score on Cifar-10, and a 12.23 FID score on CelebA 64×64. We also conclude with a discussion of the current limitations and future potential of TransGAN.
The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained. Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems. Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X. These large factors also make retroactive estimates of energy cost difficult. To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e explicit when practical. We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.
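As a rough illustration of the kind of accounting the abstract describes, the sketch below estimates training energy as hours × processors × average power × datacenter overhead (PUE), and emissions as energy × grid carbon intensity. This formulation and every number in it are assumptions made for the example, not figures quoted from the paper, and the function names are hypothetical.

```python
def training_energy_kwh(hours, num_processors, avg_power_watts, pue):
    """Energy drawn from the grid in kWh, including datacenter overhead (PUE)."""
    return hours * num_processors * avg_power_watts * pue / 1000.0


def co2e_tonnes(energy_kwh, kg_co2e_per_kwh):
    """Emissions in metric tonnes of CO2e for a given grid carbon intensity."""
    return energy_kwh * kg_co2e_per_kwh / 1000.0


# Hypothetical training run: 512 accelerators at 300 W average for 100 hours
# in a datacenter with a PUE of 1.1.
energy_kwh = training_energy_kwh(
    hours=100, num_processors=512, avg_power_watts=300, pue=1.1
)

# Two hypothetical grids: carbon-heavy (0.7 kg CO2e/kWh) and cleaner (0.1).
dirty_t = co2e_tonnes(energy_kwh, kg_co2e_per_kwh=0.7)
clean_t = co2e_tonnes(energy_kwh, kg_co2e_per_kwh=0.1)

print(f"energy: {energy_kwh:.0f} kWh")
print(f"CO2e: {dirty_t:.2f} t vs {clean_t:.2f} t")
```

With these made-up intensities the same run emits about seven times more CO2e on the carbon-heavy grid, which falls inside the ~5X-10X spread the abstract attributes to location alone.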
Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well established CNNs and Transformers.
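The two layer types can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the layer normalization (without learnable scale and shift), the skip connections, the GELU approximation, the initialization and all sizes are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis; learnable parameters omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron applied along the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2


def mixer_block(x, token_params, channel_params):
    """Apply one block to x of shape (num_patches, channels)."""
    # Across patches: transpose so the MLP mixes spatial information
    # separately for every channel.
    x = x + mlp(layer_norm(x).T, *token_params).T
    # Per patch: the MLP mixes the features of each location independently.
    return x + mlp(layer_norm(x), *channel_params)


def init_mlp(rng, dim, hidden):
    # Small random weights and zero biases (an arbitrary choice).
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))


rng = np.random.default_rng(0)
num_patches, channels = 16, 32
x = rng.normal(size=(num_patches, channels))
token_params = init_mlp(rng, num_patches, 64)
channel_params = init_mlp(rng, channels, 128)

out = mixer_block(x, token_params, channel_params)
print(out.shape)  # (16, 32)
```

The only difference between the two MLPs is the axis they act on, which is why a single transpose is enough to switch from mixing across patches to mixing per-location features.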