Deep Learning Weekly Issue #159

GPT-3 blog reaches the top of Hacker News, on-device body pose tracking, AI-accelerated MRIs, and more

Hey folks,

This week in deep learning we bring you Google's open-source Language Interpretability Tool for evaluating natural language models, AI-accelerated MRIs from FastMRI that are diagnostically interchangeable with traditional MRIs, even faster TensorFlow Lite mobile GPU inference with OpenCL, and on-device, real-time body pose tracking with MediaPipe BlazePose.

You may also enjoy this minimal PyTorch implementation of the OpenAI GPT model and learning algorithm, the Keras example implementation of CycleGAN for image-to-image translation, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


FastMRI breakthrough shows AI-accelerated MRIs interchangeable with traditional MRIs

In a rigorous new clinical study, radiologists found that fastMRI’s AI-generated images, created with about a quarter of the data from the scanning machine, were diagnostically interchangeable with traditional MRIs.

A college kid created a fake, AI-generated blog. It reached #1 on Hacker News.

“It was super easy actually,” he says, “which was the scary part.”

AI-enabled future crimes ranked: Deepfakes, spearphishing, and more

A study explores the range and risk of possible attacks, from military robots and autonomous attack drones to AI-assisted stalking. Here are the top 5.

Google open-sources LIT, a toolset for evaluating natural language models

Google-affiliated researchers released the Language Interpretability Tool (LIT), an open source, framework-agnostic platform and API for visualizing, understanding, and auditing natural language processing models.

Deepfake video app Reface is just getting started on shapeshifting selfie culture

Selfie culture has a fancy new digital looking glass: Reface (previously Doublicat) is an app that uses AI-powered deepfake technology to let users try on another face/form for size.

Mobile + Edge

Google, Harvard, and EdX Team Up to Offer TinyML Training

A new certification program aims to foster more development in a segment of machine learning that would run on small devices at the edge.

Even Faster Mobile GPU Inference with OpenCL — The TensorFlow Blog

This post announces the official launch of TFLite’s OpenCL-based mobile GPU inference engine for Android, which offers up to ~2x speedup over the existing OpenGL backend.

Chip startup Blaize debuts AI modules for machine learning at the edge

Venture-backed chip startup Blaize Inc. has introduced three new hardware modules for running artificial intelligence models at edge locations such as factories.

On-device, Real-time Body Pose Tracking with MediaPipe BlazePose

In contrast to current pose models based on the standard COCO topology, BlazePose accurately localizes more keypoints, making it uniquely suited for fitness applications.


A Simulation Suite for Tackling Applied Reinforcement Learning Challenges

The Real-World RL suite is a set of simulated tasks inspired by applied reinforcement learning challenges. Its goal is to enable fast algorithmic iteration for both researchers and practitioners, without having to run slow, expensive experiments on real systems.

REALM: Integrating Retrieval into Language Representation Models

Google AI researchers developed a novel paradigm for language model pre-training, which augments a language representation model with a knowledge retriever, allowing models to retrieve textual world knowledge explicitly from raw text documents, instead of memorizing all the knowledge in the model parameters.
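To make the retrieve-then-predict idea concrete, here is a toy sketch in plain Python. It is not REALM's implementation: the embedding vectors, the per-document answer probabilities, and the function name `retrieve_then_predict` are all made-up stand-ins for a learned retriever and reader. It only shows the general shape, which is to score documents by inner product with the query, turn the top scores into retrieval probabilities, and average the reader's predictions under those probabilities.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieve_then_predict(query_vec, doc_vecs, answer_probs_given_doc, top_k=2):
    """Marginalize answer probabilities over the top-k retrieved documents.

    answer_probs_given_doc[i] maps answer -> p(answer | query, doc i).
    In this sketch it is a hard-coded table standing in for a reader model.
    """
    # Relevance score of each document: inner product with the query embedding.
    scores = [dot(query_vec, d) for d in doc_vecs]
    # Keep only the k highest-scoring documents.
    top = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Retrieval distribution p(doc | query), restricted to the top-k documents.
    retrieval_probs = softmax([scores[i] for i in top])
    # p(answer | query) = sum over docs of p(doc | query) * p(answer | query, doc).
    marginal = {}
    for p_doc, i in zip(retrieval_probs, top):
        for answer, p_answer in answer_probs_given_doc[i].items():
            marginal[answer] = marginal.get(answer, 0.0) + p_doc * p_answer
    return marginal


# Hypothetical toy data: three documents, two candidate answers.
docs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
reader = [
    {"Paris": 0.9, "Lyon": 0.1},
    {"Paris": 0.2, "Lyon": 0.8},
    {"Paris": 0.5, "Lyon": 0.5},
]
prediction = retrieve_then_predict([1.0, 0.0], docs, reader, top_k=2)
```

The point of the design is that knowledge lives in the documents rather than in the model weights, so in principle the document collection can be swapped or extended without changing the prediction code.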

Keras + KerasTuner best practices

This notebook presents how to use KerasTuner to find a high-performing model in just a few lines of code.


[GitHub] Wangjing1551/LogoDet-3K-Dataset

This dataset contains 158,652 images with ~200,000 manually annotated logo objects across 3,000 logo categories.

Libraries & Code

[GitHub] karpathy/minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training.

[GitHub] pair-code/lit

The Language Interpretability Tool: Interactively analyze NLP models for model understanding in an extensible and framework-agnostic interface.

Keras - CycleGAN

Official Keras CycleGAN implementation.

Papers & Publications

Fast reinforcement learning with generalized policy updates

Abstract: The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized versions of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
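For readers who want to see the "reduce to linear regression" step in code, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the tabular `q_values[i][j][state][action]` layout, and the toy numbers are all hypothetical. It fits weights so that the new task's reward is approximately a weighted sum of old rewards, then reuses the old policies' value estimates to pick an action (taking the best action over all previously learned policies).

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular.
    """
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_reward_weights(prev_rewards, new_rewards):
    """Least-squares w with sum_j w[j] * prev_rewards[t][j] ~= new_rewards[t].

    prev_rewards[t][j] is old task j's reward on sampled transition t.
    """
    k = len(prev_rewards[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in prev_rewards) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(prev_rewards, new_rewards))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def gpi_action(q_values, w, state):
    """Pick an action for the new task from old policies' value tables.

    q_values[i][j][state][action] is the value of old policy i under old
    reward j. By linearity, policy i's value on the new task is the
    w-weighted sum over j.
    """
    n_actions = len(q_values[0][0][state])

    def new_task_value(i, a):
        return sum(w[j] * q_values[i][j][state][a] for j in range(len(w)))

    # Best action according to the best old policy at this state.
    return max(range(n_actions),
               key=lambda a: max(new_task_value(i, a) for i in range(len(q_values))))


# Hypothetical toy data: the new reward equals 2 * r1 - 1 * r2 on four transitions.
weights = fit_reward_weights(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]],
    [2.0, -1.0, 1.0, 3.0],
)
```

Only the regression needs data from the new task here; the value tables come from tasks already solved, which is where the data savings described in the abstract would come from.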

Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players

Abstract: We present a system that converts annotated broadcast video of tennis matches into interactively controllable video sprites that behave and appear like professional tennis players. Our approach is based on controllable video textures, and utilizes domain knowledge of the cyclic structure of tennis rallies to place clip transitions and accept control inputs at key decision-making moments of point play. Most importantly, we use points from the video collection to model a player's court positioning and shot selection decisions during points. We use these behavioral models to select video clips that reflect actions the real-life player is likely to take in a given match play situation, yielding sprites that behave realistically at the macro level of full points, not just individual tennis motions. Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences such as the creation of matchups between players that have not competed in real life, or interactive control of players in the Wimbledon final. According to expert tennis players, the rallies generated using our approach are significantly more realistic in terms of player behavior than video sprite methods that only consider the quality of motion transitions during video synthesis.