Deep Learning Weekly: Issue #196
Unity gets into the synthetic data business, optimizations to better leverage CPUs for BERT, Facebook's new method to train Vision Transformers, Google’s DocAI platform hits GA, and more
This week in deep learning, we bring you an animal emotion recognition system from Wageningen University, a bat sense algorithm that creates 3D pictures from audio data, and Waymo’s new CEO signals changes in self-driving.
You may also enjoy an adaptive framework for on-device recommendation, a blog series on BERT-like model inference on CPUs, evolving reinforcement learning algorithms, a paper on video generation using VQ-VAE and Transformers, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
John Krafcik, Waymo’s CEO since 2015, has announced he will step down and be replaced by co-CEOs Tekedra Mawakana, formerly COO, and Dmitri Dolgov, formerly CTO, signaling a few challenges in the industry.
A researcher at Wageningen University & Research published an article detailing an animal emotion recognition system optimized to have an average accuracy of 85% on 9 different emotional states.
Google’s DocAI platform, which has multiple NLP, OCR and Knowledge Graph-powered tools spanning loan applications and procurement, is now generally available on Google Cloud.
Unity is getting into the business of selling synthetic image datasets that it says can be used to create computer vision artificial intelligence models faster and at a much lower cost.
Vingroup, Vietnam’s largest conglomerate, is installing an NVIDIA DGX SuperPOD, the most powerful AI supercomputer in the region, to power its global research initiatives, which span autonomous vehicles, healthcare, and consumer services.
Mobile & Edge
A technical article and step-by-step walkthrough introducing the TensorFlow Lite-based adaptive framework for on-device recommendation.
A mobile machine learning algorithm developed at Glasgow University uses reflected echoes to produce 3D pictures of the surrounding environment.
A comprehensive article detailing the capabilities and functions of the Edge Impulse ecosystem and its effects on modern industrial use cases and general IoT.
An article explaining the neuroscientific motivations of the attention mechanism present in most of the recent deep learning architectures.
The first part of a comprehensive blog series which covers most of the hardware and software optimizations to better leverage CPUs for BERT model inference.
A comprehensive article displaying TensorFlow’s role in high energy physics and reconstruction tasks.
A technical blog about a method that learns generalizable RL algorithms by using a graph representation and applying optimization techniques from the AutoML community.
Working in collaboration with researchers at Inria, Facebook developed a new method, called DINO, to train Vision Transformers (ViT) with no supervision.
Libraries & Code
A cloud-native neural search framework for any kind of data.
A library that automates machine learning tasks for text, image and tabular data.
A Python-based software toolkit that provides a concrete solution to the lack of labelled data using weak supervision.
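For readers new to weak supervision, the idea can be sketched in a few lines: several noisy heuristics ("labeling functions") each vote on an example or abstain, and the votes are combined into a training label. The snippet below is only an illustration using a plain majority vote. The function names, the keyword rules, and the tie-breaking behavior are all assumptions made for this sketch, and it does not reproduce the toolkit's own API or its aggregation method.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical labeling functions for a toy sentiment task (1 = positive, 0 = negative).
def lf_contains_great(text: str) -> Optional[int]:
    return 1 if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text: str) -> Optional[int]:
    return 0 if "terrible" in text.lower() else ABSTAIN

def lf_exclamation(text: str) -> Optional[int]:
    return 1 if text.endswith("!") else ABSTAIN

LFS: List[Callable[[str], Optional[int]]] = [
    lf_contains_great,
    lf_contains_terrible,
    lf_exclamation,
]

def majority_vote(text: str, lfs: List[Callable[[str], Optional[int]]]) -> Optional[int]:
    """Combine labeling-function votes; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between the two most common labels
    return counts[0][0]

if __name__ == "__main__":
    for sentence in ["This movie was great!", "Terrible plot.", "It was okay."]:
        print(sentence, "->", majority_vote(sentence, LFS))
```

Real weak supervision systems typically go further than a majority vote, for example by estimating how accurate each labeling function is, but the voting-and-abstaining structure is the same.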
Papers & Publications
We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos. VideoGPT uses VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings. Despite the simplicity in formulation and ease of training, our architecture is able to generate samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset, and generate high fidelity natural videos from UCF-101 and Tumblr GIF Dataset (TGIF). We hope our proposed architecture serves as a reproducible reference for a minimalistic implementation of transformer based video generation models.
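The "discrete latent representations" in the abstract come from the vector-quantization step of a VQ-VAE: each continuous latent vector produced by the encoder is replaced by the index of its nearest entry in a learned codebook, and those indices are the tokens a GPT-like model can then predict one at a time. Below is a minimal pure-Python sketch of that lookup only. The tiny codebook, the 2-D latents, and the function names are made up for illustration, and the sketch omits the 3D convolutions, axial attention, and training losses described in the paper.

```python
from typing import List, Sequence

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]

def dequantize(indices: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look the indices back up to recover the quantized vectors."""
    return [list(codebook[i]) for i in indices]

# Hypothetical 3-entry codebook and three encoder outputs (illustrative values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

tokens = quantize(latents, codebook)        # discrete tokens for the prior model
quantized = dequantize(tokens, codebook)    # what a decoder would receive

if __name__ == "__main__":
    print(tokens)
    print(quantized)
```

In a full model the token grid spans time as well as space, which is why the abstract mentions spatio-temporal position encodings for the autoregressive prior.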
Model-based reinforcement learning is a compelling framework for data-efficient learning of agents that interact with the world. This family of algorithms has many subcomponents that need to be carefully selected and tuned. As a result, the entry-bar for researchers to approach the field and to deploy it in real-world tasks can be daunting. In this paper, we present MBRL-Lib -- a machine learning library for model-based reinforcement learning in continuous state-action spaces based on PyTorch. MBRL-Lib is designed as a platform for both researchers, to easily develop, debug and compare new algorithms, and non-expert users, to lower the entry-bar of deploying state-of-the-art algorithms.