Deep Learning Weekly Issue #173
Facebook's battle with harmful content, TensorFlow for Mac, Language Interpretability tools from Google, & more
This week in deep learning we bring you an article about how AI is transforming medical imaging, how role-playing a dragon can teach an AI to manipulate and persuade, an article about Facebook’s improved AI that isn’t preventing harmful content from spreading, and a reinforcement learning library for automated stock trading.
You may also enjoy learning about interpretability in machine learning, how to colorize images in an iOS App using DeOldify and a Flask API, how to build a keyword spotting model with your own voice in 30K RAM, how very deep VAEs generalize Autoregressive Models and can outperform them on images, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Facebook claims it’s becoming better at detecting — and removing — objectionable content from its platform, despite the fact that misleading, untrue, and otherwise harmful posts continue to make their way into millions of users’ feeds.
In recent years there has been tremendous AI work in the field of medical imaging, mainly focused on cardiovascular, ophthalmology, neurology, and cancer detection.
A faster way to estimate uncertainty in AI-assisted decision-making could lead to safer outcomes.
Combining natural-language processing and reinforcement learning in a text-based adventure game shows machines how to use language as a tool.
The process used to build most of the machine-learning models we use today can't tell if they will work in the real world or not—and that’s a problem.
Mobile + Edge
Apple’s new Mac-optimized TensorFlow 2.4 fork lets you speed up training on Macs, resulting in up to 7x faster performance on platforms with the new M1 chip!
In collaboration with nonprofit organization Guiding Eyes for the Blind, Google today piloted an AI system called Project Guideline, designed to help blind and low-vision people run races independently with just a smartphone.
Build an API hosted on Colab with a free GPU that performs image colorization, and consume it with an iOS application.
This tutorial guides you through every step required to build a real TinyML model that responds to your voice.
This essay provides a broad overview of the sub-field of machine learning interpretability.
In this tutorial you will learn how to build image pairs for training siamese networks.
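For readers who want a feel for what "building image pairs" means before opening the tutorial, here is a minimal sketch. It is not the tutorial's code: the function name `make_pairs`, the toy file names, and the choice of one positive and one negative pair per image are all assumptions made for illustration. A siamese network is trained on pairs labeled 1 (same class) or 0 (different class), and the sketch only shows that pairing step, using the standard library.

```python
import random
from collections import defaultdict


def make_pairs(items, labels, seed=0):
    """Build one positive and one negative pair per item (hypothetical helper).

    Returns (pairs, pair_labels), where pair_labels[i] is 1 if both members
    of pairs[i] share a class and 0 otherwise.
    """
    rng = random.Random(seed)

    # Group item indices by class label so partners can be sampled quickly.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    classes = list(by_label)
    if len(classes) < 2:
        raise ValueError("need at least two classes to form negative pairs")

    pairs, pair_labels = [], []
    for idx, label in enumerate(labels):
        # Positive pair: a partner drawn from the same class
        # (this may be the item itself when the class is small).
        pos_idx = rng.choice(by_label[label])
        pairs.append((items[idx], items[pos_idx]))
        pair_labels.append(1)

        # Negative pair: a partner drawn from a different class.
        neg_label = rng.choice([c for c in classes if c != label])
        neg_idx = rng.choice(by_label[neg_label])
        pairs.append((items[idx], items[neg_idx]))
        pair_labels.append(0)

    return pairs, pair_labels


# Toy stand-ins for images: the character after "img_" encodes the class.
images = ["img_a0", "img_a1", "img_b0", "img_b1", "img_c0"]
labels = ["a", "a", "b", "b", "c"]
pairs, pair_labels = make_pairs(images, labels)
```

In a real pipeline the strings would be image arrays or file paths, and the pairs would be fed to the two branches of the network alongside `pair_labels`.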
The new Smart Scrolling feature for the Recorder app uses a lightweight, on-device ML model to automatically mark important sections in a transcript and surface representative keywords on the scrollbar to allow easy searching and navigation.
The Language Interpretability Tool from Google is an interactive platform to explore and better understand the behavior of NLP models using a number of approaches, from visualization to counterfactual generation and others.
Libraries & Code
Visual Studio Code extension to quickly generate docstrings for Python functions using AI (NLP) technology.
A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance.
Papers & Publications
Abstract: We present a hierarchical VAE that, for the first time, outperforms the PixelCNN in log-likelihood on all natural image benchmarks. We begin by observing that VAEs can actually implement autoregressive models, and other, more efficient generative models, if made sufficiently deep. Despite this, autoregressive models have traditionally outperformed VAEs. We test if insufficient depth explains the performance gap by scaling a VAE to greater stochastic depth than previously explored and evaluating it on CIFAR-10, ImageNet, and FFHQ. We find that, in comparison to the PixelCNN, these very deep VAEs achieve higher likelihoods, use fewer parameters, generate samples thousands of times faster, and are more easily applied to high-resolution images. We visualize the generative process and show the VAEs learn efficient hierarchical visual representations. We release our source code and models at this https URL.
Abstract: We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis, within a unified framework. It means that the model, once being trained, can be used to handle all these tasks. The existing task-specific methods mainly use 2D keypoints to estimate the human body structure. However, they only express the position information with no abilities to characterize the personalized shape of the person and model the limb rotations. In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape. It can not only model the joint location and rotation but also characterize the personalized body shape. To preserve the source information, such as texture, style, color, and face identity, we propose an Attentional Liquid Warping GAN with Attentional Liquid Warping Block (AttLWB) that propagates the source information in both image and feature spaces to the synthesized reference. Specifically, the source features are extracted by a denoising convolutional auto-encoder for characterizing the source identity well. Furthermore, our proposed method can support a more flexible warping from multiple sources. To further improve the generalization ability of the unseen source images, a one/few-shot adversarial learning is applied. In detail, it firstly trains a model in an extensive training set. Then, it finetunes the model by one/few-shot unseen image(s) in a self-supervised way to generate high-resolution (512 x 512 and 1024 x 1024) results. Also, we build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis. Extensive experiments demonstrate the effectiveness of our methods in terms of preserving face identity, shape consistency, and clothes details. All codes and dataset are available on this https URL.