Deep Learning Weekly Issue #145

Facebook's open-source chatbot, OpenAI's music generation neural net, ML on embedded systems, and more...

Hey folks,

This week in deep learning, we bring you highlights including MIT’s new wearable that lets you control drones with Jedi-like arm gestures, Facebook’s open-source 'Blender' chatbot, TensorFlow’s new runtime, Facebook’s role-playing game that teaches AI to complete tasks by reading descriptions, and OpenAI’s neural net that generates music.

You may also enjoy a look at machine learning on embedded systems, fine-tuning ResNet with Keras, TensorFlow, and deep learning, Google’s AI tool for processing Paycheck Protection Program loans, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Google releases AI tool for processing Paycheck Protection Program loans

Google’s new PPP Lending AI integrates existing document ingestion tools to help lenders speed up the processing of applications for coronavirus relief funding.

Facebook’s role-playing game teaches AI to complete tasks by reading descriptions

Facebook researchers propose a game-like language challenge — Read to Fight Monsters (RTFM) — where ML agents must complete tasks by reading descriptions of their environment.

TFRT: A new TensorFlow runtime

TensorFlow has open-sourced an early version of TFRT, its new runtime, which automates the process of optimizing graphs for different hardware and delivered a 28% speedup on GPUs in early tests.

Facebook releases its 'Blender' chatbot as an open-source project

Facebook has released “Blender”, its lifelike open-source chatbot. The largest version of the model has a massive 9.4 billion parameters, so it may not be immediately practical for your next project.

Ruha Benjamin on deep learning: Computational depth without sociological depth is ‘superficial learning’

Princeton University associate professor Ruha Benjamin urges engineers deploying AI models to look beyond datasets.

Jukebox: A Generative Model for Music

OpenAI introduces a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.

MIT presents AI frameworks that compress models and encourage agents to explore

MIT researchers investigate new ways to motivate software agents to explore their environments, as well as pruning algorithms that make AI apps run faster.
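For readers new to pruning, here is a minimal sketch of magnitude pruning, one of the simplest pruning schemes: zero out the weights with the smallest absolute values. It is a generic illustration, not the method from the MIT work, and the helper name `magnitude_prune` and the flat list of weights are our own assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Hypothetical helper (generic magnitude pruning, not the MIT method).

    Zeroes the `sparsity` fraction of weights with the smallest absolute
    value. `weights` is a flat list of floats; a new list is returned.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


print(magnitude_prune([0.9, -0.05, 0.3, -0.7, 0.01, 0.2], 0.5))
# [0.9, 0.0, 0.3, -0.7, 0.0, 0.0]
```

In practice, a pruned network is usually retrained afterward to recover the accuracy lost by removing weights.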

Mobile + Edge

Make deep learning models run fast on embedded hardware

Explore how to make models running on edge devices smaller and computationally cheaper.
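Quantization is one of the most common ways to shrink a model for embedded hardware. Below is a minimal sketch of 8-bit affine quantization, which stores each float as a byte plus a shared scale and zero point; the function names are hypothetical and not taken from the linked article or any framework.

```python
def quantize_uint8(values):
    """Hypothetical helper: affine (asymmetric) quantization to 0..255."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # keep 0.0 in range
    scale = (hi - lo) / 255 if hi > lo else 1.0             # avoid divide-by-zero
    zero_point = round(-lo / scale)                         # integer that maps to 0.0
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map the stored integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


q, scale, zp = quantize_uint8([-1.0, 0.0, 0.5, 1.0])
print(q, dequantize(q, scale, zp))
```

Each weight now takes one byte instead of four, at the cost of a rounding error of about half a quantization step.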

Go read this analysis of the new iPhone SE’s AI-powered depth camera system

An analysis of how the iPhone SE’s single camera uses machine learning to estimate depth from flat, 2D images.

Machine learning on embedded systems [Lecture]

Learn how to build models that run on embedded systems.

Samsung AI Uses WiFi Signals to Generate Consistent In-Home User Localization Data

A recent paper from Samsung researchers uses WiFi signals to build a submeter-level localization system that treats WiFi propagation characteristics as users’ location fingerprints.
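For context, the classic baseline for fingerprint-based localization is k-nearest neighbors over received signal strength (RSSI) readings collected at known positions. The sketch below shows that baseline, not the Samsung method, which relies on richer propagation characteristics; the helper name and the RSSI values are made up for illustration.

```python
import math


def knn_locate(fingerprints, query, k=3):
    """Hypothetical helper: classic RSSI k-NN fingerprinting baseline.

    fingerprints: list of (rssi_vector, (x, y)) pairs recorded at known spots.
    query: RSSI vector measured at the unknown location.
    Returns the average position of the k closest fingerprints.
    """
    nearest = sorted(fingerprints, key=lambda fp: math.dist(fp[0], query))[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return x, y


# Made-up readings from two access points (dBm) at four reference spots.
fps = [([-40, -70], (0.0, 0.0)), ([-42, -68], (1.0, 0.0)),
       ([-70, -40], (5.0, 5.0)), ([-68, -42], (5.0, 4.0))]
print(knn_locate(fps, [-41, -69], k=2))  # (0.5, 0.0)
```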

Amazon’s AI uses a microphone array to localize multiple speakers in a room

A group of Amazon researchers propose an AI-driven approach to multiple-source localization.

MIT’s new wearable lets you control drones with Jedi-like arm gestures

The Conduct-A-Bot system uses muscle and motion sensors to pilot robots.


Fine-tuning ResNet with Keras, TensorFlow, and Deep Learning

Learn how to fine-tune a pretrained ResNet model using Keras and TensorFlow.

Time Series Classification for Human Activity Recognition with LSTMs using TensorFlow 2 and Keras

Learn how to classify human activity from accelerometer data with LSTMs, using Keras and TensorFlow 2 in Python.
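LSTM classifiers for activity recognition typically consume fixed-length windows of sensor readings rather than the raw stream. Below is a sketch of that segmentation step; labeling each window with its most frequent activity is a common convention we assume here, and `make_windows` is a hypothetical helper, not code from the tutorial.

```python
from collections import Counter


def make_windows(samples, labels, window_size, step):
    """Hypothetical helper: cut a sensor stream into overlapping windows.

    samples: per-timestep readings, e.g. (x, y, z) accelerometer tuples.
    labels: the activity label for each timestep.
    Each window gets the most common label among its timesteps.
    """
    X, y = [], []
    for start in range(0, len(samples) - window_size + 1, step):
        end = start + window_size
        X.append(samples[start:end])
        y.append(Counter(labels[start:end]).most_common(1)[0][0])
    return X, y


X, y = make_windows(list(range(10)), ["walk"] * 6 + ["sit"] * 4,
                    window_size=4, step=2)
print(len(X), y[0], y[-1])  # 4 walk sit
```

Overlapping windows (a step smaller than the window size) give the model more training examples from the same recording.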

Conv2d: Finally Understand What Happens in the Forward Pass

A visual and mathematical explanation of the 2D convolution layer and its arguments.
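To make the forward pass concrete, here is a minimal single-channel Conv2d in plain Python. Like the convolution layers in common frameworks, it computes cross-correlation (the kernel is not flipped), and its output size follows floor((H + 2p - k) / s) + 1. It omits multiple channels, bias, and dilation, and the function name is our own.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D cross-correlation with zero padding and stride."""
    if padding:
        blank = [0] * (len(image[0]) + 2 * padding)
        image = ([blank] * padding
                 + [[0] * padding + row + [0] * padding for row in image]
                 + [blank] * padding)
    kh, kw = len(kernel), len(kernel[0])
    # Output size: floor((H + 2p - k) / s) + 1, padding already applied.
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of this window
            row.append(sum(image[r + a][c + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d(img, [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
print(conv2d(img, [[1]], stride=2))    # [[1, 3], [7, 9]]
```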

Difference Between Algorithms and Models in Machine Learning

In this post, you will discover the difference between machine learning “algorithms” and “models.”
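A tiny example of the distinction: the algorithm is the procedure that runs on training data (here, ordinary least squares for a straight line), and the model is what it leaves behind (here, just two numbers). The function names are illustrative.

```python
def fit_linear_regression(xs, ys):
    """The *algorithm*: ordinary least squares for y = w * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return {"w": w, "b": b}  # the *model*: only the learned parameters


def predict(model, x):
    """Using the model needs its parameters, not the training data."""
    return model["w"] * x + model["b"]


model = fit_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(model, predict(model, 10))  # {'w': 2.0, 'b': 1.0} 21.0
```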

Yet More Google Compute Cluster Trace Data

Google releases a new trace dataset for the month of May 2019 covering eight Google compute clusters.

Libraries & Code


Xiaomi’s micro version of its MACE framework for embedded ML.


Everything we actually know about the Apple Neural Engine (ANE)


An implementation of GhostNet for TensorFlow 2.1 (from the paper "GhostNet: More Features from Cheap Operations").
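GhostNet's central idea is that many feature maps in a convolutional layer are near-duplicates ("ghosts") that cheap operations can generate. A back-of-the-envelope operation count, following our reading of the paper's analysis, shows the savings: a primary convolution produces n/s intrinsic maps, and cheap d x d per-channel operations produce the rest. When d equals the primary kernel size k, the speed-up works out to s·c / (c + s - 1), which is close to s. The layer sizes below are arbitrary example values, and the counts cover multiply-accumulates only.

```python
def conv_flops(n_out, h_out, w_out, c_in, k):
    """Multiply-accumulates of a standard k x k convolution."""
    return n_out * h_out * w_out * c_in * k * k


def ghost_flops(n_out, h_out, w_out, c_in, k, s, d):
    """Ghost module: a primary conv makes n_out / s intrinsic maps, then
    cheap d x d per-channel ops make (s - 1) ghosts of each one."""
    m = n_out // s
    primary = conv_flops(m, h_out, w_out, c_in, k)
    cheap = (s - 1) * m * h_out * w_out * d * d
    return primary + cheap


# Arbitrary example layer: 64 -> 64 channels, 56 x 56 output, 3 x 3 kernels.
std = conv_flops(64, 56, 56, 64, 3)
ghost = ghost_flops(64, 56, 56, 64, 3, s=2, d=3)
print(std, ghost, std / ghost)  # ratio equals 2 * 64 / (64 + 1), about 1.97
```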

Papers & Publications

Politeness Transfer: A Tag and Generate Approach

Abstract: "This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning. We also provide a dataset of more than 1.39 million instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content. For politeness as well as five other transfer tasks, our model outperforms the state-of-the-art methods on automatic metrics for content preservation, with a comparable or better performance on style transfer accuracy. Additionally, our model surpasses existing methods on human evaluations for grammaticality, meaning preservation and transfer accuracy across all the six style transfer tasks."

MakeItTalk: Speaker-Aware Talking Head Animation

Abstract: "We present a method that generates expressive talking heads from a single facial image with audio as the only input. In contrast to previous approaches that attempt to learn direct mappings from audio to raw pixels or points for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking head dynamics. Another key component of our method is the prediction of facial landmarks reflecting speaker-aware dynamics. Based on this intermediate representation, our method is able to synthesize photorealistic videos of entire talking heads with full range of motion and also animate artistic paintings, sketches, 2D cartoon characters, Japanese mangas, stylized caricatures in a single unified framework. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking heads of significantly higher quality compared to prior state-of-the-art."

Consistent Video Depth Estimation

Abstract: "We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video. We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we use a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation. At test time, we fine-tune this network to satisfy the geometric constraints of a particular input video, while retaining its ability to synthesize plausible depth details in parts of the video that are less constrained. We show through quantitative validation that our method achieves higher accuracy and a higher degree of geometric consistency than previous monocular reconstruction methods. Visually, our results appear more stable. Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion. The improved quality of the reconstruction enables several applications, such as scene reconstruction and advanced video-based visual effects."

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

Abstract: "We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to regularize the value function. Existing model-free approaches, such as Soft Actor-Critic (SAC), are not able to train deep networks effectively from image pixels. However, the addition of our augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind control suite, surpassing model-based (Dreamer, PlaNet, and SLAC) methods and recently proposed contrastive learning (CURL). Our approach can be combined with any model-free reinforcement learning algorithm, requiring only minor modifications."
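The augmentation at the heart of this paper is, as we understand it, a random shift: pad each frame by a few pixels (4 in the paper's setup) by repeating its border pixels, then crop back to the original size at a random offset. The sketch below applies that idea to a single 2D frame stored as nested lists; the function name and signature are our own, and a real agent would apply it to batches of stacked image observations.

```python
import random


def random_shift(frame, pad=4, rng=random):
    """Hypothetical helper: replicate-pad a 2D frame by `pad` pixels, then
    randomly crop it back to its original size (a shift of up to `pad`)."""
    h, w = len(frame), len(frame[0])
    padded = []
    for i in range(-pad, h + pad):
        src = frame[min(max(i, 0), h - 1)]  # nearest row inside the frame
        padded.append([src[min(max(j, 0), w - 1)] for j in range(-pad, w + pad)])
    top = rng.randint(0, 2 * pad)   # randint includes both endpoints
    left = rng.randint(0, 2 * pad)
    return [row[left:left + w] for row in padded[top:top + h]]


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(random_shift(frame, pad=4, rng=random.Random(0)))
```

Because every crop is a slightly shifted view of the same observation, the value function is encouraged to give similar estimates for all of them, which is the regularizing effect the abstract describes.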

Curated by Derrick Mwiti