Deep Learning Weekly: Issue #223
A real-time accent translation company from Stanford, new research on deep evolutionary reinforcement learning, an AI moral textbook, anticipative video transformers, and more
This week in deep learning, we bring you a real-time accent translation company from Stanford, new research on deep evolutionary reinforcement learning, an AI moral textbook, and a paper on anticipative video transformers.
You may also enjoy GPT-3's availability in the suite of Azure cloud tools, improved on-device ML on Pixel 6, feature extraction using Torch FX, a paper on a tactile soft sensor that leverages machine learning and magnetic sensing, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Stanford students Shawn Zhang, Maxim Serebryakov, and Andres Perez Soderi founded a company whose deep learning-based accent translation runs in near real time on a CPU.
In a new paper, AI researchers at Stanford University present a technique that uses a complex virtual environment and reinforcement learning to create agents that evolve both in their physical structure and learning capacities.
Microsoft is making an upgraded version of OpenAI’s GPT-3 available to business customers as part of its suite of Azure cloud tools.
Alphabet establishes Isomorphic Labs to spearhead innovation in the field of drug discovery, supercharged by AI capabilities.
Sensei, Adobe’s AI platform, is now integrated into all the products of its Creative Cloud suite, along with new and improved deep learning features.
The Linux Foundation, a nonprofit technology consortium that enables innovation through open-source software, announced the launch of initiatives focused on infrastructure abstraction, dataset accessibility, and more.
Mobile & Edge
A comprehensive article showcasing the improvements in on-device machine learning made possible by designing the ML models for Google Tensor’s TPU.
An end-to-end tutorial demonstrating how to use TensorFlow Core, TensorFlow Agents, and TensorFlow Lite to build a game agent that plays against a human user in a small board game app.
A proof-of-concept to demonstrate how easily you can deploy a fleet of edge devices running a tinyML model with object detection capabilities.
An implementation of a TinyML model that can alert campers if an animal or human is approaching.
Learning
A theoretical blog revolving around the dynamic creation of composable models and how these act as scaffolding over which to optimize behavior.
Scientists have developed a new moral textbook customized for machines that was built from varied sources. With it, they trained an AI named Delphi that was 92.1% accurate on moral judgments when vetted by people.
A technical tutorial on how to use the new TorchVision utility that lets people access intermediate transformations of an input during the forward pass of a PyTorch Module.
A blog post on lightweight, more environmentally friendly alternatives to large language models.
Comet Data Scientist Harpreet Sahota takes a deep dive into the key challenges of imbalanced classification problems and approaches to solving them.
Libraries & Code
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or “Neutral”.
Scenic is a codebase with a focus on research around attention-based models for computer vision.
A BERT-based reverse dictionary that returns words semantically matching query descriptions.
Bagua is a deep learning training acceleration framework for PyTorch that supports distributed training, cached datasets, performance autotuning, and more.
Papers & Publications
We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions. We train the model jointly to predict the next action in a video sequence, while also learning frame feature encoders that are predictive of successive future frames' features. Compared to existing temporal aggregation strategies, AVT has the advantage of both maintaining the sequential progression of observed actions while still capturing long-range dependencies--both critical for the anticipation task. Through extensive experiments, we show that AVT obtains the best reported performance on four popular action anticipation benchmarks: EpicKitchens-55, EpicKitchens-100, EGTEA Gaze+, and 50-Salads; and it wins first place in the EpicKitchens-100 CVPR'21 challenge.
Soft sensors have continued growing interest in robotics, due to their ability to enable both passive conformal contact from the material properties and active contact data from the sensor properties. However, the same properties of conformal contact result in faster deterioration of soft sensors and larger variations in their response characteristics over time and across samples, inhibiting their ability to be long-lasting and replaceable. ReSkin is a tactile soft sensor that leverages machine learning and magnetic sensing to offer a low-cost, diverse and compact solution for long-term use. Magnetic sensing separates the electronic circuitry from the passive interface, making it easier to replace interfaces as they wear out while allowing for a wide variety of form factors. Machine learning allows us to learn sensor response models that are robust to variations across fabrication and time, and our self-supervised learning algorithm enables finer performance enhancement with small, inexpensive data collection procedures. We believe that ReSkin opens the door to more versatile, scalable and inexpensive tactile sensation modules than existing alternatives.
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, despite the conceptual simplicity of this problem distribution. To increase performance, we propose training verifiers to judge the correctness of model completions. At test time, we generate many candidate solutions and select the one ranked highest by the verifier. We demonstrate that verification significantly improves performance on GSM8K, and we provide strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.
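The test-time procedure described in the abstract above (sample many candidate solutions, then keep the one the verifier ranks highest) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `select_best_candidate`, `toy_verifier`, and the sample candidates are hypothetical names and data invented here. In the paper the verifier is a trained model; the stand-in below simply checks the arithmetic.

```python
from typing import Callable, Sequence


def select_best_candidate(
    candidates: Sequence[str],
    verifier_score: Callable[[str], float],
) -> str:
    """Return the candidate solution that the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    # max() keeps the first candidate among ties for the top score.
    return max(candidates, key=verifier_score)


def toy_verifier(solution: str) -> float:
    """Hypothetical stand-in for a trained verifier (an assumption).

    Scores 1.0 when a solution of the form "a + b = c" is arithmetically
    correct and 0.0 otherwise. A learned verifier would instead output
    an estimate of how likely the solution is to be correct.
    """
    expression, _, claimed = solution.partition("=")
    left, _, right = expression.partition("+")
    try:
        return 1.0 if int(left) + int(right) == int(claimed) else 0.0
    except ValueError:
        # Anything that does not parse as "a + b = c" gets the lowest score.
        return 0.0


# Hypothetical samples a generator might produce for "What is 48 + 24?"
candidates = ["48 + 24 = 62", "48 + 24 = 72", "48 + 24 = 74"]
best = select_best_candidate(candidates, toy_verifier)
print(best)
```

Keeping the scoring function as a plain callable means the same selection step works whether the score comes from a toy rule like this one or from a trained model.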