Deep Learning Weekly Issue #145
Facebook's open source chatbot, OpenAI's music generation neural net, ML on embedded systems, and more...
This week in deep learning we bring you MIT’s new wearable that lets you control drones with Jedi-like arm gestures, Facebook’s 'Blender' open-source chatbot, TensorFlow’s new runtime, Facebook’s role-playing game that teaches AI to complete tasks by reading descriptions, and OpenAI’s neural net that produces music.
You may also enjoy a look at machine learning on embedded systems, fine-tuning ResNet with Keras, TensorFlow, & deep learning, Google’s AI tool for processing Paycheck Protection Program loans, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Google’s new PPP Lending AI integrates existing document ingestion tools in an effort to help lenders expedite application processing for Coronavirus relief funding.
Facebook researchers propose a game-like language challenge — Read to Fight Monsters (RTFM) — where ML agents must complete tasks by reading descriptions of their environment.
TensorFlow open-sourced an early version of its new runtime, TFRT, which automates optimizing graphs for different hardware and delivers a reported 28% speed increase on GPUs.
Facebook releases its lifelike chatbot, “Blender”. The model contains a massive 9.4 billion parameters, so it may not be immediately useful for your next project.
Princeton University associate professor urges engineers deploying AI models to look beyond datasets.
OpenAI introduces a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.
MIT researchers investigate new ways to motivate software agents to explore their environments, along with pruning algorithms that make AI apps run faster.
Mobile + Edge
Explore how to make models running on edge devices smaller and computationally cheaper.
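One of the most common ways to make a model smaller for edge deployment is post-training quantization: storing weights as 8-bit integers plus a scale factor rather than 32-bit floats. A minimal numpy sketch of symmetric int8 quantization (illustrative only — frameworks like TensorFlow Lite automate this end to end):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> int8 + scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes / q.nbytes)  # 4.0 — int8 storage is 4x smaller
```

The rounding error per weight is at most half the scale, which is why quantization usually costs little accuracy relative to the 4x storage (and bandwidth) savings.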
iPhone SE’s AI-powered depth camera can estimate depth in flat images.
Machine learning on embedded systems [Lecture]
Learn how to build models that run on embedded systems.
A recent paper from researchers at Samsung uses WiFi signals to build a submeter-level localization system that employs WiFi propagation characteristics as users’ location fingerprints.
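The classic version of WiFi fingerprinting — not necessarily Samsung’s method, which uses richer propagation characteristics — matches a live signal-strength reading against an offline database of surveyed positions. A hedged numpy sketch with hypothetical RSSI values and positions:

```python
import numpy as np

# Hypothetical offline survey: RSSI fingerprints (dBm) from 3 access
# points, each measured at a known (x, y) position in meters.
fingerprints = np.array([
    [-40., -70., -60.],
    [-55., -50., -65.],
    [-70., -45., -50.],
    [-60., -65., -40.],
])
positions = np.array([
    [0.0, 0.0],
    [5.0, 0.0],
    [5.0, 5.0],
    [0.0, 5.0],
])

def locate(rssi, k=2):
    """Estimate position as the mean of the k nearest fingerprints
    in signal space (weighted k-NN is a common refinement)."""
    d = np.linalg.norm(fingerprints - rssi, axis=1)
    nearest = np.argsort(d)[:k]
    return positions[nearest].mean(axis=0)

print(locate(np.array([-42., -68., -61.]), k=1))  # -> [0. 0.]
```

Fingerprint databases sidestep the need to model indoor signal propagation explicitly, which is what makes them attractive for submeter localization.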
A group of Amazon researchers propose an AI-driven approach to multiple-source localization.
The Conduct-A-Bot system uses muscle and motion sensors to pilot robots.
Learn how to fine-tune ResNet using Keras, TensorFlow, and Deep Learning.
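The essence of fine-tuning is to keep a pretrained feature extractor frozen and train only a new classification head (optionally unfreezing top layers later). A framework-free numpy sketch of that idea, with a toy random projection standing in for the frozen backbone — the linked tutorial does the real thing with ResNet in Keras:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pretrained, frozen backbone; in the tutorial this
# role is played by ResNet50 with its weights locked.
W_backbone = rng.normal(size=(20, 8))

def features(x):
    return np.maximum(x @ W_backbone, 0.0)   # frozen: never updated

# Toy dataset whose labels are a function of the frozen features
X = rng.normal(size=(200, 20))
w_true = rng.normal(size=8)
y = (features(X) @ w_true > 0).astype(float)

# New trainable head: logistic regression, fit by gradient descent
w, b = np.zeros(8), 0.0
for _ in range(500):
    f = features(X)
    p = 1.0 / (1.0 + np.exp(-(f @ w + b)))   # sigmoid predictions
    w -= 0.1 * f.T @ (p - y) / len(y)        # only the head gets gradients
    b -= 0.1 * (p - y).mean()

acc = ((1.0 / (1.0 + np.exp(-(features(X) @ w + b))) > 0.5) == y).mean()
print(acc)
```

Because gradients never touch `W_backbone`, training is fast and needs far less data than learning the whole network from scratch — the same reason fine-tuning ResNet on a small dataset works.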
Learn how to classify human activity from accelerometer data with Keras and TensorFlow 2 in Python.
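Activity-recognition models typically consume fixed-length windows of raw accelerometer samples rather than single readings. A small numpy helper (window and step sizes are hypothetical) that segments a 3-axis stream into overlapping windows ready to feed a Keras model:

```python
import numpy as np

def make_windows(signal, window=50, step=25):
    """Slice a (T, 3) accelerometer stream into overlapping
    (n_windows, window, 3) segments (50% overlap here)."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

t = np.linspace(0, 10, 500)
# Fake 3-axis trace: walking-like oscillation plus noise
acc = np.stack([np.sin(5 * t), np.cos(5 * t),
                0.1 * np.random.randn(500)], axis=1)

X = make_windows(acc)   # one training sample per window
print(X.shape)          # (19, 50, 3)
```

Each window then gets a single activity label, turning a continuous sensor stream into a standard supervised-classification dataset.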
A visual and mathematical explanation of the 2D convolution layer and its arguments.
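The key Conv2D arguments — kernel size, stride, and padding — all reduce to one output-size formula: out = (in + 2*pad - kernel) // stride + 1. A direct single-channel numpy implementation (cross-correlation, as deep learning frameworks actually compute it):

```python
import numpy as np

def conv2d(x, kernel, stride=1, padding=0):
    """Single-channel 2D cross-correlation: the operation a Conv2D
    layer applies once per filter."""
    if padding:
        x = np.pad(x, padding)
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out

img = np.arange(25.0).reshape(5, 5)
edge = np.array([[1.0, -1.0]])        # tiny horizontal-difference kernel

print(conv2d(img, edge).shape)        # (5, 4): no padding shrinks width
print(conv2d(img, edge, stride=2).shape)  # (3, 2): stride downsamples
```

On this image, whose rows increase by 1 per column, every output value is exactly -1 — a quick sanity check that the sliding dot product is doing what the math says.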
In this post, you will discover the difference between machine learning “algorithms” and “models.”
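The distinction in one line: the algorithm is the fitting procedure, while the model is the artifact it outputs — learned parameters plus a prediction rule. A numpy illustration using ordinary least squares:

```python
import numpy as np

# Training data: y = 2x + 1 with a little noise
rng = np.random.default_rng(7)
X = np.linspace(0, 10, 50)
y = 2 * X + 1 + 0.01 * rng.normal(size=50)

# The ALGORITHM: ordinary least squares, a procedure run on data
A = np.column_stack([X, np.ones_like(X)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

# The MODEL: the learned parameters plus a prediction rule
def model(x):
    return coeffs[0] * x + coeffs[1]

print(coeffs)      # ~[2.0, 1.0]: slope and intercept recovered
print(model(100))  # the model alone makes predictions on new inputs
```

You run the algorithm once at training time; you ship and query the model. The same split holds for decision trees (CART vs. the tree) or neural nets (SGD vs. the trained weights).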
Google releases a new trace dataset for the month of May 2019 covering eight Google compute clusters.
Libraries & Code
Xiaomi’s micro version of their Mace framework for embedded ML.
Everything we actually know about the Apple Neural Engine (ANE).
An implementation of GhostNet for TensorFlow 2.1, from the paper "GhostNet: More Features from Cheap Operations".
Papers & Publications
Politeness Transfer: A Tag and Generate Approach
Abstract: "This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning. We also provide a dataset of more than 1.39 million instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content. For politeness as well as five other transfer tasks, our model outperforms the state-of-the-art methods on automatic metrics for content preservation, with a comparable or better performance on style transfer accuracy. Additionally, our model surpasses existing methods on human evaluations for grammaticality, meaning preservation and transfer accuracy across all the six style transfer tasks."
MakeItTalk: Speaker-Aware Talking Head Animation
Abstract: "We present a method that generates expressive talking heads from a single facial image with audio as the only input. In contrast to previous approaches that attempt to learn direct mappings from audio to raw pixels or points for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking head dynamics. Another key component of our method is the prediction of facial landmarks reflecting speaker-aware dynamics. Based on this intermediate representation, our method is able to synthesize photorealistic videos of entire talking heads with full range of motion and also animate artistic paintings, sketches, 2D cartoon characters, Japanese mangas, stylized caricatures in a single unified framework. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking heads of significantly higher quality compared to prior state-of-the-art."
Consistent Video Depth Estimation
Abstract: "We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video. We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we use a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation. At test time, we fine-tune this network to satisfy the geometric constraints of a particular input video, while retaining its ability to synthesize plausible depth details in parts of the video that are less constrained. We show through quantitative validation that our method achieves higher accuracy and a higher degree of geometric consistency than previous monocular reconstruction methods. Visually, our results appear more stable. Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion. The improved quality of the reconstruction enables several applications, such as scene reconstruction and advanced video-based visual effects."
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
Abstract: "We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to regularize the value function. Existing model-free approaches, such as Soft Actor-Critic (SAC), are not able to train deep networks effectively from image pixels. However, the addition of our augmentation method dramatically improves SAC's performance, enabling it to reach state-of-the-art performance on the DeepMind control suite, surpassing model-based (Dreamer, PlaNet, and SLAC) methods and recently proposed contrastive learning (CURL). Our approach can be combined with any model-free reinforcement learning algorithm, requiring only minor modifications."
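The input perturbation in question is simple: pad each image observation and take a random crop, so the value function sees slightly shifted views of the same state. A numpy sketch of that pad-and-random-crop ("random shift") transform, with typical but assumed sizes (84x84 observations, 4-pixel pad):

```python
import numpy as np

def random_shift(obs, pad=4, rng=None):
    """Pad an (H, W, C) observation by `pad` pixels (edge replication)
    and crop back to (H, W) at a random offset, yielding a slightly
    translated view of the same frame on every call."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = obs.shape
    padded = np.pad(obs, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

obs = np.random.rand(84, 84, 3)   # a typical pixel observation
aug1, aug2 = random_shift(obs), random_shift(obs)
print(aug1.shape)                 # (84, 84, 3): same shape, shifted content
```

Applying this independently to the observations used in the critic's value targets is the "minor modification" that regularizes value learning from pixels.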
Curated by Derrick Mwiti