Deep Learning Weekly Issue #179

OpenAI's 2 new models combining images & text, DeepMind mastering games w/ DL, the year of the deepfake, & more

Hey folks,

Happy new year, and welcome back to our first weekly roundup of 2021! This week in deep learning, we bring you DeepMind's MuZero, which masters games while learning how to play them; a look at how quantum computing relates to AI; a paper on machine translation of manga; and Chatroulette's comeback with the help of AI-based nudity censoring.

You may also enjoy learning about OpenAI’s two new models: DALL·E: Creating Images from Text and CLIP: Connecting Text and Images, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Industry

No rules, no problem: DeepMind's MuZero masters games while learning how to play them

DeepMind has made it a mission to show that not only can an AI truly become proficient at a game, it can do so without even being told the rules. Its newest AI agent, called MuZero, accomplishes this not just with visually simple games with complex strategies, like Go, Chess and Shogi, but with visually complex Atari games.

Chatroulette Is On the Rise Again—With Help From AI

The hottest app of early 2010 faded quickly when it was flooded with unwanted nudity. Smarter content moderation is helping to revive it.

Why 2020 was a pivotal, contradictory year for facial recognition

The racial justice movement pushed problems with the technology into public consciousness—but despite scandals and bans, its growth isn’t slowing.

A Clever Strategy to Distribute Covid Aid—With Satellite Data

The small nation of Togo used image analysis algorithms to target economic support for its most vulnerable residents.

The year deepfakes went mainstream

In 2020, AI-synthetic media started moving away from the darker corners of the internet.

Mobile + Edge

Google introduces Entity Extraction, Selfie Segmentation APIs to ML Kit

Google has introduced two new APIs to ML Kit: Entity Extraction and Selfie Segmentation.

The Next Wave Of Connectivity

“5G will change how we manage our compute loads,” explains StClair, “no matter where you are in the world, what 5G enables you to do is bring machine learning and AI right to the edge of the network.”

Swipeless Tinder Using iOS 14 Vision Hand Pose Estimation

This tutorial uses the power of Apple’s Vision Framework to detect hand gestures on iOS in order to create a gesture-based "swipe" interface.


Learning

When BERT Plays The Lottery, All Tickets Are Winning

Does the Lottery Ticket Hypothesis hold for BERT? Anna Rogers' experiments with structured and magnitude pruning reveal some good and some bad news.
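
For readers new to the term, here is a minimal sketch of what magnitude pruning means in general: zero out the weights with the smallest absolute values. This is not the article's experimental setup (which prunes BERT itself); the function name `magnitude_prune`, the flat list of weights, and the `sparsity` parameter are all hypothetical choices made for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.

    `sparsity` is the fraction of weights to remove (0.0 keeps everything,
    1.0 removes everything). Illustrative helper, not taken from the article.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    # Number of weights to zero out, rounded down.
    num_to_drop = int(len(weights) * sparsity)
    if num_to_drop == 0:
        return list(weights)

    # Rank positions by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:num_to_drop])

    # Keep surviving weights unchanged; pruned positions become 0.0.
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy weights, made up for this example.
weights = [0.5, -0.05, 0.8, 0.01, -0.3, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

Structured pruning, by contrast, removes whole components (for example entire attention heads) rather than individual weights.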

Knocking on Turing's door: Quantum Computing and Machine Learning

Quantum computing promises exponential compute power, but decoherence and cryogenic requirements have so far kept that power out of reach. The field is in a fascinating place; check out The Gradient’s latest piece by Ather Fawaz on the state of quantum computing and how it relates to AI.

End-to-End, Transferable Deep RL for Graph Optimization

In “Transferable Graph Optimizers for ML Compilers”, recently published as an oral paper at NeurIPS 2020, Google AI proposes an end-to-end, transferable deep reinforcement learning method for computational graph optimization.

DALL·E: Creating Images from Text

OpenAI trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.

CLIP: Connecting Text and Images

OpenAI is introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP (Contrastive Language–Image Pre-training) can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3.
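
To make "providing the names of the visual categories" a bit more concrete, here is a rough sketch of the zero-shot classification step, assuming the image and each category prompt have already been embedded into a shared vector space. The embeddings below are made-up toy vectors rather than CLIP outputs, and `zero_shot_classify`, the prompt strings, and the `temperature` value are hypothetical names and settings chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.01):
    """Pick the label whose text embedding is most similar to the image.

    `temperature` sharpens the softmax; 0.01 is an arbitrary assumption.
    """
    labels = list(label_embeddings)
    scores = [cosine(image_embedding, label_embeddings[name]) / temperature
              for name in labels]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Toy embeddings (assumed, not produced by any real model).
label_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

label, probs = zero_shot_classify(image_embedding, label_embeddings)
print(label)
```

Swapping in a different set of category names changes the classifier without any retraining, which is the sense in which the approach is "zero-shot".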


Libraries & Code

[GitHub] facebookresearch/deit

This repository contains PyTorch evaluation code, training code and pretrained models for DeiT (Data-Efficient Image Transformers).

[GitHub] openai/CLIP

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs.

Papers & Publications

Towards Fully Automated Manga Translation

Abstract: We tackle the problem of machine translation of manga, Japanese comics. Manga translation involves two important problems in machine translation: context-aware and multimodal translation. Since text and images are mixed up in an unstructured fashion in Manga, obtaining context from the image is essential for manga translation. However, it is still an open problem how to extract context from image and integrate into MT models. In addition, corpus and benchmarks to train and evaluate such model is currently unavailable. In this paper, we make the following four contributions that establishes the foundation of manga translation research. First, we propose multimodal context-aware translation framework. We are the first to incorporate context information obtained from manga image. It enables us to translate texts in speech bubbles that cannot be translated without using context information (e.g., texts in other speech bubbles, gender of speakers, etc.). Second, for training the model, we propose the approach to automatic corpus construction from pairs of original manga and their translations, by which large parallel corpus can be constructed without any manual labeling. Third, we created a new benchmark to evaluate manga translation. Finally, on top of our proposed methods, we devised a first comprehensive system for fully automated manga translation.

Evaluating Agents without Rewards

Abstract: Reinforcement learning has enabled agents to solve challenging tasks in unknown environments. However, manually crafting reward functions can be time consuming, expensive, and error prone to human error. Competing objectives have been proposed for agents to learn without external supervision, but it has been unclear how well they reflect task rewards or human behavior. To accelerate the development of intrinsic objectives, we retrospectively compute potential objectives on pre-collected datasets of agent behavior, rather than optimizing them online, and compare them by analyzing their correlations. We study input entropy, information gain, and empowerment across seven agents, three Atari games, and the 3D game Minecraft. We find that all three intrinsic objectives correlate more strongly with a human behavior similarity metric than with task reward. Moreover, input entropy and information gain correlate more strongly with human similarity than task reward does, suggesting the use of intrinsic objectives for designing agents that behave similarly to human players.