Deep Learning Weekly: Issue #201

OpenAI Fund for startups that use GPT-3, a multimodal model 10 times larger than GPT-3, gauging unconsciousness under general anesthesia, Fourier Transform replacements for self-attention, and more

Hey folks,

This week in deep learning, we bring you the OpenAI Startup Fund, BAAI's multimodal AI model that is 10 times larger than GPT-3, a hands-on TinyML virtual workshop for fitness solutions, and an article on data cascades in machine learning.

You may also enjoy deep learning algorithms for gauging unconsciousness under general anesthesia, TensorFlow's newly open-sourced collection of tools for decision forests, a paper on external attention, a paper on Fourier Transforms as replacements for self-attention, and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


OpenAI Startup Fund

The OpenAI Startup Fund is investing $100 million in a small number of early-stage startups working in impactful fields such as healthcare, climate change, and education.

Start of AI for Oncology lab, a new ICAI Lab

The new AI for Oncology lab brings together expertise in cancer research and AI technology.

AI Weekly: China’s massive multimodal model highlights AI research gap

Researchers at the Beijing Academy of Artificial Intelligence (BAAI) announced the release of Wu Dao 2.0, a multimodal AI model with 1.75 trillion parameters, 10 times more than OpenAI's 175-billion-parameter GPT-3.

Kaolin: Researchers Accelerate 3D Deep Learning with New Tools

NVIDIA Kaolin, an open-source PyTorch library of tools for visualizing and generating 3D datasets, is helping researchers accelerate 3D deep learning.

New algorithms show accuracy, reliability in gauging unconsciousness under general anesthesia

A research team at MIT has developed several algorithms, trained on 2-second snippets of EEG recordings, that reliably assess whether patients under general anesthesia are unconscious.

Microsoft's first GPT-3 product hints at the commercial future of OpenAI

Microsoft's first GPT-3 product, built with OpenAI, uses a fine-tuned model to turn natural-language descriptions into Power Fx formulas in its Power Apps low-code platform, hinting at a deeper commercial integration between the two companies.

Mobile & Edge

Muscle AI: Learn How to Build Disruptive Fitness Solutions that Change the Game

A hands-on embedded machine learning workshop for classifying different physical motions and body positions using the SparkFun MicroMod Machine Learning Carrier Board.

TSMC’s 5nm chip enhancements steer AI driving and 5G

Taiwan Semiconductor Manufacturing Company (TSMC) announces chip and process technology enhancements aimed at accelerating AI-enabled driver assistance, smartphones, and similar applications.

Using Embedded Machine Learning to Perform Smoke Detection

A brief demo of a TinyML smoke detection model based on sensor fusion, using the SparkFun Environmental Combo Breakout and humidity sensors.

How TensorFlow helps Edge Impulse make ML accessible to embedded engineers

A comprehensive article detailing the end-to-end integration of TensorFlow with the Edge Impulse platform, which helps both embedded engineers and machine learning experts understand the workflow.


The FLORES-101 data set: Helping build better translation systems around the world

Facebook open-sources FLORES-101, a first-of-its-kind, many-to-many evaluation dataset covering 101 languages from all over the world.

Data Cascades in Machine Learning

A high-level article that studies and validates data cascades: downstream effects of data issues that compound over time and result in technical debt.

Introducing TensorFlow Decision Forests

A detailed article showcasing the newly open-sourced TF-DF, a collection of production-ready state-of-the-art algorithms for training, serving and interpreting decision forest models.

Few-shot learning in practice: GPT-Neo and the Accelerated Inference API

A technical blog post explaining few-shot learning and exploring EleutherAI's GPT-Neo and Hugging Face's Accelerated Inference API.
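Few-shot learning with a generative model like GPT-Neo comes down to prompt construction: a handful of labeled demonstrations followed by an unlabeled query that the model is asked to complete. Below is a minimal sketch, assuming a sentiment task with "Tweet:"/"Sentiment:" labels and "###" as the separator between demonstrations (all illustrative choices, not taken from the post). `build_few_shot_prompt` and `build_payload` are hypothetical helpers; the payload mirrors the `inputs`/`parameters` JSON shape used by hosted text-generation endpoints, but check the Inference API documentation for the exact parameters GPT-Neo accepts. No network request is made.

```python
import json


def build_few_shot_prompt(examples, query, input_label="Tweet",
                          output_label="Sentiment", separator="###"):
    """Join labeled demonstrations, then append the unlabeled query.

    The model is expected to continue the text after the final output label.
    """
    shots = [f"{input_label}: {text}\n{output_label}: {label}"
             for text, label in examples]
    # The query gets the output label but no answer: that is what the model completes.
    shots.append(f"{input_label}: {query}\n{output_label}:")
    return f"\n{separator}\n".join(shots)


def build_payload(prompt, max_new_tokens=5, temperature=0.1):
    """Hypothetical JSON request body for a hosted text-generation endpoint."""
    return json.dumps({
        "inputs": prompt,
        # Short, low-temperature completions suit a classification-style task.
        "parameters": {"max_new_tokens": max_new_tokens,
                       "temperature": temperature},
    })


if __name__ == "__main__":
    demos = [("I loved the new update!", "Positive"),
             ("The app keeps crashing.", "Negative")]
    prompt = build_few_shot_prompt(demos, "Support replied within minutes.")
    print(prompt)
    print(build_payload(prompt))
```

The separator gives the model an unambiguous boundary between examples, which also makes it easy to truncate the generated text at the first separator it emits.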

Libraries & Code


PyTouch is a machine learning library from Facebook for tactile touch sensing.

aws/sagemaker-tensorflow-serving-container: A TensorFlow Serving solution for use in SageMaker.

SageMaker TensorFlow Serving Container is an open-source project that builds Docker images for running TensorFlow Serving on Amazon SageMaker.

DeepSpeed: a Deep Learning Optimization Library

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Papers & Publications

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks


Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This paper proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further incorporate the multi-head mechanism into external attention to provide an all-MLP architecture, external attention MLP (EAMLP), for image classification. Extensive experiments on image classification, object detection, semantic segmentation, instance segmentation, image generation, and point cloud analysis reveal that our method provides results comparable or superior to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
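To make the mechanism concrete, here is a minimal NumPy sketch of single-head external attention, assuming the feature map has already been flattened to N positions of dimension d, and using randomly initialized memories M_k and M_v (S slots each) in place of learned ones. The function names are illustrative, not the authors' code. The two matrix products stand in for the two bias-free linear layers, and the normalization follows our reading of the paper's double normalization: a softmax over positions, then an l1 normalization over memory slots. Because S is a small constant, the cost is O(N·S·d) rather than the O(N²·d) of self-attention.

```python
import numpy as np


def double_normalize(logits):
    """Normalize an (N, S) affinity matrix between N positions and S memory slots."""
    # Softmax over the position axis (axis 0), separately for each memory slot.
    shifted = logits - logits.max(axis=0, keepdims=True)  # subtract max for stability
    attn = np.exp(shifted)
    attn = attn / attn.sum(axis=0, keepdims=True)
    # l1-normalize over the memory axis (axis 1), so each position's weights sum to 1.
    return attn / attn.sum(axis=1, keepdims=True)


def external_attention(features, m_k, m_v):
    """features: (N, d); m_k, m_v: (S, d) memories shared across all samples."""
    attn = double_normalize(features @ m_k.T)  # first "linear layer": (N, S)
    return attn @ m_v                          # second "linear layer": (N, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d, S = 64, 32, 8  # illustrative sizes; S is much smaller than N
    x = rng.standard_normal((N, d))
    out = external_attention(x, rng.standard_normal((S, d)), rng.standard_normal((S, d)))
    print(out.shape)
```

Since M_k and M_v are shared parameters rather than functions of the current input, they can capture structure across the whole dataset, which is how the abstract's claim about correlations between samples arises.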

FNet: Mixing Tokens with Fourier Transforms


We show that Transformer encoder architectures can be massively sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens. These linear transformations, along with simple nonlinearities in feed-forward layers, are sufficient to model semantic relationships in several text classification tasks. Perhaps most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder with a standard, unparameterized Fourier Transform achieves 92% of the accuracy of BERT on the GLUE benchmark, but pre-trains and runs up to seven times faster on GPUs and twice as fast on TPUs. The resulting model, which we name FNet, scales very efficiently to long inputs, matching the accuracy of the most accurate "efficient" Transformers on the Long Range Arena benchmark, but training and running faster across all sequence lengths on GPUs and relatively shorter sequence lengths on TPUs. Finally, FNet has a light memory footprint and is particularly efficient at smaller model sizes: for a fixed speed and accuracy budget, small FNet models outperform Transformer counterparts.
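FNet's token mixing is a parameter-free 2D discrete Fourier Transform: a 1D DFT along the hidden dimension, then along the sequence dimension, keeping only the real part. Below is a minimal NumPy sketch of one encoder block, assuming a single unbatched sequence of shape (seq_len, hidden). As simplifications, the layer norm has no learned gain or bias and the feed-forward weights are passed in directly; the GELU uses the common tanh approximation. All function names are illustrative.

```python
import numpy as np


def fourier_mixing(x):
    """Real part of a DFT along the hidden axis, then along the sequence axis."""
    return np.real(np.fft.fft(np.fft.fft(x, axis=-1), axis=-2))


def layer_norm(x, eps=1e-6):
    # Simplified: normalizes each token vector, with no learned scale or shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def fnet_encoder_block(x, w1, b1, w2, b2):
    """x: (seq_len, hidden); w1: (hidden, d_ff); w2: (d_ff, hidden)."""
    # Mixing sublayer replaces self-attention and has no learned parameters.
    x = layer_norm(x + fourier_mixing(x))
    # Standard position-wise feed-forward sublayer with a residual connection.
    return layer_norm(x + gelu(x @ w1 + b1) @ w2 + b2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, hidden, d_ff = 16, 8, 32
    x = rng.standard_normal((seq_len, hidden))
    w1 = 0.1 * rng.standard_normal((hidden, d_ff))
    w2 = 0.1 * rng.standard_normal((d_ff, hidden))
    out = fnet_encoder_block(x, w1, np.zeros(d_ff), w2, np.zeros(hidden))
    print(out.shape)
```

The FFT runs in O(n log n) time in the sequence length and needs no attention matrix, which is consistent with the speed and memory advantages the abstract reports for long inputs.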