Deep Learning Weekly: Issue #223

A real-time accent translation company from Stanford, new research on deep evolutionary reinforcement learning, an AI moral textbook, anticipative video transformers, and more

Hey folks,

This week in deep learning, we bring you a real-time accent translation company from Stanford, new research on deep evolutionary reinforcement learning, an AI moral textbook, and a paper on anticipative video transformers.

You may also enjoy GPT-3's availability in the suite of Azure cloud tools, improved on-device ML on Pixel 6, feature extraction using Torch FX, a paper on a tactile soft sensor that leverages machine learning and magnetic sensing, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

This 6-Million-Dollar AI Changes Accents as You Speak

Stanford students Shawn Zhang, Maxim Serebryakov, and Andres Perez Soderi founded a company whose deep learning-based accent translation works in near real time and runs on a CPU.

AI is now learning to evolve like earthly lifeforms

In a new paper, AI researchers at Stanford University present a technique that uses a complex virtual environment and reinforcement learning to create agents that evolve in both their physical structure and their learning capacities.

Microsoft is giving businesses access to OpenAI’s powerful AI language model GPT-3

Microsoft is making an upgraded version of OpenAI’s GPT-3 available to business customers as part of its suite of Azure cloud tools.

Isomorphic Labs is Alphabet's play in AI drug discovery

Alphabet has established Isomorphic Labs to apply AI to drug discovery.

How AI is shaping Adobe's product strategy

Sensei, Adobe’s AI platform, is now integrated into all the products of its Creative Cloud suite, along with new and improved deep learning features.

The Linux Foundation announces new initiatives for AI, IoT, and cloud computing

The Linux Foundation, a nonprofit technology consortium that enables innovation through open-source software, announced the launch of initiatives focused on infrastructure abstraction, dataset accessibility, and more.

Mobile & Edge

Improved On-Device ML on Pixel 6, with Neural Architecture Search

A comprehensive article showcasing the improvements in on-device machine learning made possible by designing the ML models for Google Tensor’s TPU.

Building a board game app with TensorFlow: a new TensorFlow Lite reference app

An end-to-end tutorial demonstrating how to use TensorFlow core, TensorFlow Agents, and TensorFlow Lite to build a game agent that plays against a human user in a small board game app.

Scale Your Fleet of tinyML Solutions Using AWS IoT

A proof of concept demonstrating how to deploy a fleet of edge devices running a tinyML model with object detection capabilities.

AI-powered thermal camera for safe camping

An implementation of a TinyML model that can alert campers if an animal or human is approaching.

Learning

Strong AI Requires Autonomous Building of Composable Models

A theoretical blog post on the dynamic creation of composable models and how they act as scaffolding over which to optimize behavior.

Machines Learn Good From Commonsense Norm Bank

Scientists have developed a new moral textbook customized for machines that was built from varied sources. With it, they trained an AI named Delphi that was 92.1% accurate on moral judgments when vetted by people.

Feature Extraction in TorchVision using Torch FX

A technical tutorial on how to use the new TorchVision utility that lets people access intermediate transformations of an input during the forward pass of a PyTorch Module.

Large Language Models: A New Moore's Law?

A blog post on lighter-weight, more environmentally friendly alternatives to consider instead of ever-larger language models.

Imbalanced Classification Demystified

Comet data scientist Harpreet Sahota takes a deep dive into the key challenges of imbalanced classification problems and the approaches to solving them.

Libraries & Code


GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or “Neutral”.

Scenic: A Jax Library for Computer Vision Research and Beyond

Scenic is a codebase with a focus on research around attention-based models for computer vision.

WantWords: An open-source online reverse dictionary.

A BERT-based reverse dictionary that returns words semantically matching query descriptions.


Bagua is a deep learning training acceleration framework for PyTorch that supports distributed training, cached datasets, performance autotuning, and more.

Papers & Publications

Anticipative Video Transformer


We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions. We train the model jointly to predict the next action in a video sequence, while also learning frame feature encoders that are predictive of successive future frames' features. Compared to existing temporal aggregation strategies, AVT has the advantage of both maintaining the sequential progression of observed actions while still capturing long-range dependencies--both critical for the anticipation task. Through extensive experiments, we show that AVT obtains the best reported performance on four popular action anticipation benchmarks: EpicKitchens-55, EpicKitchens-100, EGTEA Gaze+, and 50-Salads; and it wins first place in the EpicKitchens-100 CVPR'21 challenge.

ReSkin: versatile, replaceable, lasting tactile skins


Soft sensors have continued growing interest in robotics, due to their ability to enable both passive conformal contact from the material properties and active contact data from the sensor properties. However, the same properties of conformal contact result in faster deterioration of soft sensors and larger variations in their response characteristics over time and across samples, inhibiting their ability to be long-lasting and replaceable. ReSkin is a tactile soft sensor that leverages machine learning and magnetic sensing to offer a low-cost, diverse and compact solution for long-term use. Magnetic sensing separates the electronic circuitry from the passive interface, making it easier to replace interfaces as they wear out while allowing for a wide variety of form factors. Machine learning allows us to learn sensor response models that are robust to variations across fabrication and time, and our self-supervised learning algorithm enables finer performance enhancement with small, inexpensive data collection procedures. We believe that ReSkin opens the door to more versatile, scalable and inexpensive tactile sensation modules than existing alternatives.

Training Verifiers to Solve Math Word Problems


State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, despite the conceptual simplicity of this problem distribution. To increase performance, we propose training verifiers to judge the correctness of model completions. At test time, we generate many candidate solutions and select the one ranked highest by the verifier. We demonstrate that verification significantly improves performance on GSM8K, and we provide strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.
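The test-time procedure described in the abstract (sample many candidate solutions, then keep the one the verifier ranks highest) can be sketched in a few lines. This is a toy illustration only: in the paper the verifier is a trained model, whereas `toy_verifier` below is a hypothetical stand-in that simply checks the arithmetic, and the candidate strings are made up for the example.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], verifier: Callable[[str], float]) -> str:
    """Return the candidate solution that the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=verifier)


def toy_verifier(solution: str) -> float:
    """Hypothetical stand-in for a learned verifier.

    Scores a string of the form "a + b = c" as 1.0 if the sum is right, else 0.0.
    """
    lhs, rhs = solution.split("=")
    a, b = (int(part) for part in lhs.split("+"))
    return 1.0 if a + b == int(rhs) else 0.0


# Made-up candidates standing in for sampled model completions.
candidates = ["2 + 3 = 6", "2 + 3 = 5", "2 + 3 = 4"]
best = select_best(candidates, toy_verifier)
print(best)
```

The selection step itself is independent of how the scores are produced, so swapping in a real scoring model only means replacing the `verifier` callable.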
