Deep Learning Weekly: Issue #203

Facebook’s fundamental theory on DNNs beyond the infinite-width limit, an award-winning CNN-based model called GarbageNet, a paper on Decision Transformers and more

Hey folks,

This week in deep learning, we bring you Amazon's Just Walk Out technology, a deep learning model that sorts trash exceptionally well, a tinyML tutorial that lets your edge device interpret your dog's mood, and Facebook's fundamental theory of deep neural networks beyond the infinite-width abstraction.

You may also enjoy a tool that lets you generate verses from your favorite rappers, a comprehensive tutorial on modeling and optimizing ML pipelines, a complete data augmentation library for audio, image, text and video, a paper on Decision Transformers and Sequence Modeling, and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


What Google's AI-designed chip tells us about the nature of intelligence

Google used a deep reinforcement learning technique to tackle the highly complex floorplanning of its next generation of Tensor Processing Units, highlighting the synergy between human and artificial intelligence.

Amazon opens its first full-sized grocery store with cashierless checkout technology

Amazon opens a grocery store equipped with its Just Walk Out technology, which uses sensors and machine learning to let consumers shop without waiting in a checkout line. 

New AI Proves to Be a Trash Sorter Extraordinaire

GarbageNet, a CNN-based model, uses a three-pronged approach to classify garbage, including items it has not encountered before, reaching an overall accuracy of 96.96%.

This AI lets you generate new verses from your favorite rappers

Uberduck is a new text-to-speech tool that can synthesize verses in the voices of Tupac, Jay-Z, Kanye West and other rappers and celebrities in a matter of seconds.

How AI complicates enterprise risk management

Leaders, partners and directors from highly regarded corporations and consulting firms discuss the dilemmas of controlling the behavior of large-scale artificial intelligence systems.

Fiddler Labs opens AI 'black box' to solve bias problem and enable compliance

Fiddler is an explainability platform for heterogeneous models that handles use cases ranging from retail to healthcare.

Mobile & Edge

Puppi: a tiny, portable, edge ML device ready to interpret a dog’s mood

A technical tutorial that uses an Arduino Nano 33 BLE Sense and Edge Impulse Studio to interpret a dog's mood from its vocalizations.

Hearing Substitution Using Haptic Feedback

A comprehensive article describing how machine learning, AWS Prediction and the haptic feedback of Neosensory's Buzz can help deaf parents connect with their kids.

MLPerf Tiny Inference Is a New Benchmarking Suite for tinyML Devices

MLCommons has released MLPerf Tiny Inference, a benchmark suite that reports performance on key tasks (such as anomaly detection) and enables comparisons of tinyML devices, systems and software.

Easier object detection on mobile with TensorFlow Lite

A blog post showing how to use TensorFlow Lite's latest offerings (the on-device ML learning pathway, EfficientDet-Lite, the Model Maker object detection API and the Metadata Writer API) to build a state-of-the-art object detector.


Are Self-Driving Cars Really Safer Than Human Drivers?

A comprehensive article offering a qualitative analysis of the safety of automated driving systems.

Advancing AI theory with a first-principles understanding of deep neural networks

An introduction to The Principles of Deep Learning Theory, a book that lays out an effective theory of DNNs beyond the infinite-width abstraction.

HuBERT: Speech representations for recognition & generation

A detailed blog post walking through the state-of-the-art results of Facebook’s HuBERT, a new approach for learning self-supervised speech representations.

Modeling Pipeline Optimization With scikit-learn

A step-by-step tutorial on building and optimizing machine learning pipelines with scikit-learn.
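As a rough illustration of the idea (not code from the tutorial), a scikit-learn Pipeline chains preprocessing and a model into a single estimator, and GridSearchCV then tunes the hyperparameters of any step through the "<step>__<param>" naming convention. The Iris dataset, the two steps and the grid values below are assumptions chosen for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy dataset standing in for real data (assumption for illustration).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The scaler is refit inside each CV fold, so validation folds never leak into it.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# refit=True (the default) retrains the best pipeline on all training data.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```

Because the whole pipeline is the estimator being searched, the same pattern extends to tuning preprocessing choices alongside model hyperparameters.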

Dashboards for Interpreting & Comparing Machine Learning Models

A technical article on using Interpret, an open-source Python library for machine learning interpretability, to create dashboards for interpreting and comparing machine learning models.

Libraries & Code

facebookresearch/AugLy: A data augmentations library for audio, image, text, and video

AugLy is a data augmentations library that currently supports four modalities (audio, image, text and video) and over 100 augmentations.

tusharsarkar3/XBNet: Boosted neural network for tabular data

XBNet, built on PyTorch, combines tree-based models with neural networks to create a robust architecture trained with a novel optimization technique.

gradio-app/gradio: Create UIs for prototyping your machine learning model in 3 minutes

Gradio is an open-source Python library that lets you create demos of your machine learning code, get feedback on model performance from users and debug your model interactively.

Papers & Publications

Decision Transformer: Reinforcement Learning via Sequence Modeling


We present a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
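To make the conditioning concrete, here is a minimal sketch (not the authors' code) of how a trajectory can be turned into the interleaved return, state, action sequence the abstract describes. As in the paper, the return each step is conditioned on is the return-to-go, the sum of rewards from that step to the end of the episode. The helper names and plain-list representation are hypothetical; a real model would embed each token and feed the sequence to a causally masked Transformer.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """R_t = r_t + r_{t+1} + ... + r_T, computed in one backward pass."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(states, actions, rewards) -> List[Tuple[str, object]]:
    """Build the (R_1, s_1, a_1, R_2, s_2, a_2, ...) token sequence."""
    tokens: List[Tuple[str, object]] = []
    for r, s, a in zip(returns_to_go(rewards), states, actions):
        tokens += [("return", r), ("state", s), ("action", a)]
    return tokens


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Toy 3-step trajectory (hypothetical data).
states = ["s0", "s1", "s2"]
actions = [0, 1, 0]
rewards = [1.0, 0.0, 2.0]

print(returns_to_go(rewards))  # [3.0, 2.0, 2.0]
tokens = interleave(states, actions, rewards)
mask = causal_mask(len(tokens))
```

At evaluation time, the paper instead prompts the model with a desired target return and decrements it by each reward received, which is what lets the model "generate future actions that achieve the desired return."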

XCiT: Cross-Covariance Image Transformers


Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e. words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic complexity in time and memory, hindering application to long sequences and high-resolution images. We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries. The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images. Our cross-covariance image transformer (XCiT) is built upon XCA. It combines the accuracy of conventional transformers with the scalability of convolutional architectures. We validate the effectiveness and generality of XCiT by reporting excellent results on multiple vision benchmarks, including image classification and self-supervised feature learning on ImageNet-1k, object detection and instance segmentation on COCO, and semantic segmentation on ADE20k.
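A minimal NumPy sketch of single-head cross-covariance attention, assuming queries, keys and values of shape (N, d) for N tokens and d channels. The attention map is the d × d cross-covariance between keys and queries rather than an N × N token map, so the cost grows linearly with N. The paper L2-normalizes keys and queries and learns the temperature; here the temperature tau is fixed, and learned projections, multiple heads and the function name are simplifications for illustration, not the XCiT implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_covariance_attention(q, k, v, tau=1.0):
    """Single-head XCA sketch; q, k, v have shape (N, d)."""
    # L2-normalize each channel (column) across the N tokens.
    q_hat = q / np.linalg.norm(q, axis=0, keepdims=True)
    k_hat = k / np.linalg.norm(k, axis=0, keepdims=True)
    # (d, N) @ (N, d) -> (d, d): cost O(N * d^2), linear in the number of tokens.
    logits = k_hat.T @ q_hat / tau
    # Softmax over key channels, so each output channel is a convex mix of value channels.
    attn = softmax(logits, axis=0)
    # (N, d) @ (d, d) -> (N, d): the same channel mixing is applied to every token.
    return v @ attn, attn


rng = np.random.default_rng(0)
N, d = 1024, 16  # many tokens, few channels
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
out, attn = cross_covariance_attention(q, k, v)
print(out.shape, attn.shape)  # (1024, 16) (16, 16)
```

Doubling N here only doubles the work in the two matrix products, whereas token self-attention would build an N × N map and quadruple it.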