Deep Learning Weekly Issue #154
A self-driving car demo with TFLite; SOTA voice separation; corporate deepfake use; working with GANs; and more
This week in deep learning we bring you a new, state-of-the-art voice separation model that distinguishes multiple speakers simultaneously, a robotic painter from Carnegie Mellon that learned human art techniques, and corporate use of deepfakes in training videos.
You may also enjoy Pixelopolis, a self-driving car demo from Google built with TF-Lite; a tutorial covering common GAN training problems and solutions; the SegFix method for refining segmentation prediction boundaries; research on evolving ML algorithms from scratch; and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Coronavirus restrictions make it harder and more expensive to shoot videos, so some companies are turning to synthetic media instead.
Carnegie Mellon’s robotic painter is a step toward AI that can learn art techniques by watching people
Can a robot painter learn from observing a human artist’s brushstrokes? That’s the question Carnegie Mellon University researchers set out to answer in a recent study (paper).
A new way to train AI systems could keep them safer from hackers
A GAN-like training method can make deep-learning-based image reconstruction systems less vulnerable to attacks (paper).
A new, state-of-the-art voice separation model that distinguishes multiple speakers simultaneously
Facebook AI introduced a new method to separate up to five voices speaking simultaneously into a single microphone.
Microsoft’s AI generates voices that sing in Chinese and English
Researchers at Zhejiang University and Microsoft claim they've developed DeepSinger, an AI system that can generate singing voices in multiple languages by training on data from music websites.
Mobile + Edge
Machine vision with low-cost camera modules
This article describes how to get a live image feed ready for computer vision applications using an Arduino Nano 33 BLE Sense and a low-cost VGA camera module.
Grounding Natural Language Instructions to Mobile UI Actions
In “Mapping Natural Language Instructions to Mobile UI Action Sequences”, published at ACL 2020, Google AI presents a first step toward automatic action sequence mapping, introducing three new datasets used to train deep learning models that ground natural language instructions in executable mobile UI actions.
Sharing Pixelopolis, a self-driving car demo from Google I/O built with TF-Lite
Pixelopolis is an interactive installation that showcases self-driving miniature cars powered by TensorFlow Lite.
Introducing cAInvas for TinyML devices
Deploy your favorite ML models to tiny devices.
TensorFlow 2 meets the Object Detection API
The TF Object Detection API (OD API) now officially supports TensorFlow 2.
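For anyone upgrading, inference with a detection model exported as a TF2 SavedModel (for example, from the TF2 model zoo) looks roughly like the sketch below; the model path and image file are hypothetical placeholders:

```python
# Minimal inference sketch for a TF2 Object Detection API SavedModel.
# The export directory and image path are hypothetical placeholders.
import numpy as np
import tensorflow as tf
from PIL import Image

detect_fn = tf.saved_model.load("exported_model/saved_model")

image = np.array(Image.open("street.jpg"))                    # (H, W, 3) uint8
input_tensor = tf.convert_to_tensor(image)[tf.newaxis, ...]   # add batch dimension

detections = detect_fn(input_tensor)
num = int(detections["num_detections"][0])
boxes = detections["detection_boxes"][0, :num].numpy()        # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0, :num].numpy()
classes = detections["detection_classes"][0, :num].numpy().astype(int)

for box, score, cls in zip(boxes, scores, classes):
    if score > 0.5:
        print(f"class {cls} at {box} (score {score:.2f})")
```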
AutoML-Zero: Evolving Code that Learns
Google AI recently demonstrated that it is possible to successfully evolve ML algorithms from scratch. Their approach, called AutoML-Zero, starts from empty programs and, using only basic mathematical operations as building blocks, applies evolutionary methods to automatically find the code for complete ML algorithms.
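For flavor, here is a toy, heavily simplified sketch of that kind of search: tiny register-machine programs built from basic math ops, improved by mutation and selection. Everything here (the op set, register layout, and fitness) is a hypothetical illustration, not the paper's system.

```python
# Toy evolutionary search over tiny programs, in the spirit of AutoML-Zero.
# A program is a list of (op, src1, src2, dst) instructions over 4 registers;
# register 0 holds the input and register 3 holds the output.
import random

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def run(program, x):
    regs = [x, 1.0, 0.0, 0.0]
    for op, s1, s2, dst in program:
        regs[dst] = OPS[op](regs[s1], regs[s2])
    return regs[3]

def random_instruction():
    return (random.choice(list(OPS)), random.randrange(4),
            random.randrange(4), random.randrange(1, 4))

def fitness(program, data):
    return -sum((run(program, x) - y) ** 2 for x, y in data)

def evolve(data, pop_size=100, steps=2000):
    population = [[random_instruction() for _ in range(4)] for _ in range(pop_size)]
    for _ in range(steps):
        # Tournament selection: mutate one instruction of a strong parent,
        # then retire the oldest individual (regularized evolution).
        parent = max(random.sample(population, 10), key=lambda p: fitness(p, data))
        child = list(parent)
        child[random.randrange(len(child))] = random_instruction()
        population.pop(0)
        population.append(child)
    return max(population, key=lambda p: fitness(p, data))

data = [(x, 2 * x + 1) for x in range(-5, 6)]  # target: y = 2x + 1
best = evolve(data)
print(best, fitness(best, data))
```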
What is going on with my GAN?
The challenges, solutions, and future of GANs.
Duality — A New Approach to Reinforcement Learning
In “Reinforcement Learning via Fenchel-Rockafellar Duality”, researchers at Google AI developed a new approach to RL that yields algorithms that are both useful in practice and mathematically principled. In other words, the proposed algorithms avoid the exceedingly rough approximations usually needed to translate mathematical foundations into practical implementations.
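For reference, the machinery named in the title is classical convex analysis; the Fenchel conjugate and the duality built on it are standard results (textbook background, not material from the paper; the equality below requires suitable regularity conditions):

```latex
% Fenchel conjugate of f (standard definition):
f^*(y) = \sup_{x} \left( \langle x, y \rangle - f(x) \right)
% Fenchel-Rockafellar duality relates a primal problem to its dual:
\min_{x} \; f(x) + g(Ax) \;=\; \max_{y} \; -f^*(A^{\top} y) - g^*(-y)
```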
Libraries & Code
A PyTorch implementation of the OCNet series and SegFix.
Ready-to-use OCR with support for 40+ languages, including Chinese, Japanese, Korean, and Thai.
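The description matches the EasyOCR project; assuming that is the library in question, basic usage is only a few lines (the image path is a placeholder):

```python
# Minimal usage sketch, assuming the library above is EasyOCR.
import easyocr

reader = easyocr.Reader(["ch_sim", "en"])    # load Simplified Chinese + English models
results = reader.readtext("sign.jpg")        # list of (bounding box, text, confidence)
for box, text, confidence in results:
    print(f"{text!r} ({confidence:.2f})")
```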
Papers & Publications
SegFix: Model-Agnostic Boundary Refinement for Segmentation
Abstract: We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model. Motivated by the empirical observation that the label predictions of interior pixels are more reliable, we propose to replace the originally unreliable predictions of boundary pixels by the predictions of interior pixels. Our approach processes only the input image through two steps: (i) localize the boundary pixels and (ii) identify the corresponding interior pixel for each boundary pixel. We build the correspondence by learning a direction away from the boundary pixel to an interior pixel. Our method requires no prior information of the segmentation models and achieves nearly real-time speed. We empirically verify that our SegFix consistently reduces the boundary errors for segmentation results generated from various state-of-the-art models on Cityscapes, ADE20K and GTA5. Code is available at: this https URL.
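As a rough illustration of the replacement step, here is a minimal sketch that assumes the boundary mask and per-pixel offsets are already given (in the paper they come from learned models); it simply relabels each boundary pixel with the label of the interior pixel its offset points to:

```python
# Hypothetical sketch of SegFix-style boundary refinement. The boundary mask
# and (dy, dx) offsets are assumed inputs; SegFix predicts them with a network.
import numpy as np

def refine(labels, boundary_mask, offsets):
    """labels: (H, W) int; boundary_mask: (H, W) bool; offsets: (H, W, 2) int."""
    h, w = labels.shape
    refined = labels.copy()
    for y, x in zip(*np.nonzero(boundary_mask)):
        dy, dx = offsets[y, x]
        iy = np.clip(y + dy, 0, h - 1)   # follow the learned direction
        ix = np.clip(x + dx, 0, w - 1)   # toward a more reliable interior pixel
        refined[y, x] = labels[iy, ix]
    return refined
```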
Long-term Human Motion Prediction with Scene Context
Abstract: Human movement is goal-directed and influenced by the spatial layout of the objects in the scene. To plan future human motion, it is crucial to perceive the environment: imagine how hard it is to navigate a new room with the lights off. Existing works on predicting human motion do not pay attention to the scene context and thus struggle in long-term prediction. In this work, we propose a novel three-stage framework that exploits scene context to tackle this task. Given a single scene image and 2D pose histories, our method first samples multiple human motion goals, then plans 3D human paths towards each goal, and finally predicts 3D human pose sequences following each path. For stable training and rigorous evaluation, we contribute a diverse synthetic dataset with clean annotations. In both synthetic and real datasets, our method shows consistent quantitative and qualitative improvements over existing methods.
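A hypothetical outline of that three-stage pipeline, with placeholder stage functions standing in for the paper's learned models (the shapes and stubs below are illustrative only):

```python
# Hypothetical outline of the three-stage framework; each stage is a dummy
# stand-in for a learned model so the control flow runs end to end.
import numpy as np

def sample_goals(scene_image, root_history, k):
    # Stage 1 (placeholder): sample k candidate 2D goal positions in the scene.
    return [np.random.rand(2) for _ in range(k)]

def plan_path(scene_image, root_history, goal, steps=30):
    # Stage 2 (placeholder): straight-line path from the last position to the goal.
    return np.linspace(root_history[-1], goal, num=steps)

def predict_poses(root_history, path, joints=17):
    # Stage 3 (placeholder): one dummy 3D pose per path step.
    return np.zeros((len(path), joints, 3))

def predict_motion(scene_image, root_history, num_goals=3):
    return [predict_poses(root_history, plan_path(scene_image, root_history, g))
            for g in sample_goals(scene_image, root_history, num_goals)]

futures = predict_motion(scene_image=None, root_history=np.zeros((10, 2)))
print(len(futures), futures[0].shape)  # 3 candidate futures, each (30, 17, 3)
```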