Deep Learning Weekly: Issue #196

Unity gets into the synthetic data business, optimizations to better leverage CPUs for BERT, Facebook's new method to train Vision Transformers, Google’s DocAI platform hits GA, and more

Hey folks,

This week in deep learning, we bring you an animal emotion recognition system from Wageningen University, a bat sense algorithm that creates 3D pictures from audio data, and Waymo’s new CEO signals changes in self-driving.

You may also enjoy an adaptive framework for on-device recommendation, a blog series on BERT-like model inference on CPUs, evolving reinforcement learning algorithms, a paper on video generation using VQ-VAE and Transformers, and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


Waymo’s leadership shift spotlights self-driving car challenges

John Krafcik, Waymo’s CEO since 2015, has announced he will step down and be replaced by co-CEOs Tekedra Mawakana, formerly COO, and Dmitri Dolgov, formerly CTO, a change that highlights the challenges still facing the self-driving industry.

A scientist created emotion recognition AI for animals

A researcher at Wageningen University & Research published an article detailing an animal emotion recognition system that reaches an average accuracy of 85% across 9 different emotional states.

Google's Document AI service reaches general availability with more human input

Google’s DocAI platform, which has multiple NLP, OCR and Knowledge Graph-powered tools spanning loan applications and procurement, is now generally available on Google Cloud.

Unity launches synthetic datasets for computer vision AI

Unity is getting into the business of selling synthetic image datasets that it says can be used to create computer vision artificial intelligence models faster and at a much lower cost.

Vietnam's VinAI Adopts SuperPOD for Vingroup's AI Work

Vingroup, Vietnam’s largest conglomerate, is installing an NVIDIA DGX SuperPOD, the most powerful AI supercomputer in the region, to power its global research initiatives, which span autonomous vehicles, healthcare, and consumer services.

Mobile & Edge

Adaptive Framework for On-device Recommendation — The TensorFlow Blog

A technical article and step-by-step walkthrough introducing the TensorFlow Lite-based adaptive framework for on-device recommendation.

'Bat-sense' algorithm could be used to monitor people and property without cameras

A mobile machine learning algorithm developed at Glasgow University uses reflected echoes to produce 3D pictures of the surrounding environment.

Embedded Machine Learning Is Giving Industrial Machines the Brains They Always Wanted

A comprehensive article detailing the capabilities and functions of the Edge Impulse ecosystem and its effects on modern industrial use cases and general IoT.


Attention in the Human Brain and Its Applications in ML

An article explaining the neuroscientific motivations behind the attention mechanism used in most recent deep learning architectures.

Scaling up BERT-like model Inference on modern CPU - Part 1

The first part of a comprehensive blog series which covers most of the hardware and software optimizations to better leverage CPUs for BERT model inference.

Reconstructing thousands of particles in one go at the CERN LHC with TensorFlow

A comprehensive article showing TensorFlow’s role in high energy physics and reconstruction tasks.

Evolving Reinforcement Learning Algorithms

A technical blog about a method that learns generalizable RL algorithms by using a graph representation and applying optimization techniques from the AutoML community.

DINO and PAWS: Advancing the state of the art in computer vision

Working in collaboration with researchers at Inria, Facebook developed a new method, called DINO, to train Vision Transformers (ViT) with no supervision.

Libraries & Code

jina-ai/jina: An easier way to build neural search on the cloud

A cloud-native neural search framework for any kind of data.

AutoGluon: AutoML for Text, Image, and Tabular Data

A library that automates machine learning tasks for text, image and tabular data.

NorskRegnesentral/skweak: A software toolkit for weak supervision applied to NLP tasks

A Python-based software toolkit that addresses the lack of labelled data by using weak supervision.
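
For readers new to weak supervision, the idea can be sketched in a few lines of plain Python: several noisy labelling functions each vote on a token (or abstain), and the votes are merged into one label. This is not skweak's API. The function names, the label set, and the plain majority vote are all assumptions made for illustration, with majority voting swapped in as the simplest possible aggregation.

```python
from collections import Counter
from typing import Callable, List, Optional

# A labelling function returns a label, or None to abstain.
LabelFn = Callable[[str], Optional[str]]


def lf_capitalized(token: str) -> Optional[str]:
    # Hypothetical heuristic: capitalized tokens are probably entities.
    return "ENT" if token[:1].isupper() else None


def lf_known_org(token: str) -> Optional[str]:
    # Hypothetical gazetteer lookup.
    return "ENT" if token in {"Unity", "Waymo"} else None


def lf_stopword(token: str) -> Optional[str]:
    # Hypothetical heuristic: common function words are never entities.
    return "O" if token.lower() in {"the", "a", "is", "of"} else None


def aggregate(token: str, lfs: List[LabelFn], default: str = "O") -> str:
    # Collect the non-abstaining votes and keep the most common one.
    votes = [v for v in (lf(token) for lf in lfs) if v is not None]
    if not votes:
        return default
    return Counter(votes).most_common(1)[0][0]


lfs = [lf_capitalized, lf_known_org, lf_stopword]
labels = [aggregate(tok, lfs) for tok in ["Waymo", "is", "hiring"]]
print(labels)
```

The merged labels can then serve as training data for a conventional model, which is the general workflow weak supervision toolkits aim to support.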

Papers & Publications

VideoGPT: Video Generation using VQ-VAE and Transformers


We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos. VideoGPT uses VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings. Despite the simplicity in formulation and ease of training, our architecture is able to generate samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset, and generate high fidelity natural videos from UCF-101 and Tumblr GIF Dataset (TGIF). We hope our proposed architecture serves as a reproducible reference for a minimalistic implementation of transformer based video generation models.
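
The "discrete latent representations" in the abstract come from vector quantization: each continuous latent vector is replaced by the index of its nearest codebook entry, and those indices are the tokens a GPT-like model can then predict. The snippet below is a minimal sketch of that lookup only, not the authors' implementation. The codebook values, the 2-dimensional latents, and the helper names are hypothetical, and the real model learns its codebook and encodes video with 3D convolutions.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[List[float]], codebook: List[List[float]]) -> List[int]:
    # Map each latent vector to the index of its nearest codebook entry.
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: List[int], codebook: List[List[float]]) -> List[List[float]]:
    # Look the code vectors back up, as a decoder would before reconstruction.
    return [list(codebook[i]) for i in indices]


# Toy codebook with three entries (assumed values for illustration).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

tokens = quantize(latents, codebook)
print(tokens)  # [0, 1, 2]
```

An autoregressive model over `tokens` never sees raw pixels, which is what keeps the second stage of such a pipeline comparatively simple.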

MBRL-Lib: A Modular Library for Model-based Reinforcement Learning


Model-based reinforcement learning is a compelling framework for data-efficient learning of agents that interact with the world. This family of algorithms has many subcomponents that need to be carefully selected and tuned. As a result, the entry-bar for researchers to approach the field and to deploy it in real-world tasks can be daunting. In this paper, we present MBRL-Lib -- a machine learning library for model-based reinforcement learning in continuous state-action spaces based on PyTorch. MBRL-Lib is designed as a platform for both researchers, to easily develop, debug and compare new algorithms, and non-expert users, to lower the entry-bar of deploying state-of-the-art algorithms.