Deep Learning Weekly Issue #126

AI assistant preferences, Pinterest's Lens, a SimCity-based gym, on-device training with Core ML 3, and more...

Hey folks,

This week in deep learning we bring you a study on AI assistant preferences from Apple, a new real-time strategy gym from Facebook, a request for comments on AI patents from the USPTO, and a new conditional transformer architecture from Salesforce.

You may also enjoy a guide to on-device training with Core ML 3, an interview with François Chollet, a SimCity-based AI gym, and new research from Google using BERT to understand videos.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Apple study suggests chattier users prefer chattier AI assistants [VentureBeat]

It turns out that people love talking to themselves.

Keras 2.3.0 released

Keras 2.3.0 is the first release to support a TensorFlow 2.0 backend and the last major multi-backend release. The maintainers now recommend that users switch to tf.keras for future work.

Pinterest’s Lens can now recognize 2.5 billion home and fashion objects [VentureBeat]

Visual search at a massive scale.

Coming Soon to a Battlefield: Robots That Can Kill [The Atlantic]

Tomorrow’s wars will be faster, more high-tech, and less human than ever before. Welcome to a new era of machine-driven warfare.

Request for Comments on Patenting Artificial Intelligence Inventions

The US Patent and Trademark Office is seeking comments on how to handle the growing number of AI-related patent applications.

[Facebook] Teaching AI to plan using language in a new open-source strategy game

Facebook open sources a real-time strategy AI gym.


On-device training with Core ML – part 3

The third installment of a deep dive into on-device model training with Core ML 3, Apple’s most recent release.

Research Guide for Video Frame Interpolation with Deep Learning

Nearly everything you need to know in 2019.

Learning Cross-Modal Temporal Representations from Unlabeled Videos

Google creates VideoBERT, using transformers to learn representations for videos.

The world’s most freakishly realistic text-generating A.I. just got gamified

GPT-2 is used to generate a unique, text-based game each time a user plays.

[Video] François Chollet: Keras, Deep Learning, and the Progress of AI | Artificial Intelligence Podcast

François Chollet, original creator of Keras, sits down with Lex Fridman.

[Video] How Backpropagation Works

A short explainer video on backpropagation.


Google: Coached Conversational Preference Elicitation

Google announces a new dataset of conversations where an assistant elicits movie preferences from a user.

Libraries & Code

[GitHub] p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch

PyTorch implementations of deep reinforcement learning algorithms and environments.

[GitHub] smearle/gym-city

An interface to Micropolis (the open-source SimCity) for city-building agents, packaged as an OpenAI Gym environment.

Papers & Publications

Cross View Fusion for 3D Human Pose Estimation

Abstract: We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into CNN to jointly estimate 2D poses for multiple views. Consequently, the 2D pose estimation for each view already benefits from other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses. It gradually improves the accuracy of 3D pose with affordable computational cost….

CTRL: A Conditional Transformer Language Model for Controllable Generation

Abstract: Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at…
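The conditioning mechanism described in the abstract can be illustrated with a minimal sketch: a control code is simply prepended to the prompt before generation, steering the model toward a style or domain. The code names and the `generate` stub below are illustrative assumptions, not CTRL's actual vocabulary or API.

```python
# Toy sketch of control-code conditioning: a code token is prepended
# to the raw prompt so the model can steer style and content.
# The code names and generate() stub are illustrative assumptions,
# not CTRL's real control-code vocabulary or inference API.

CONTROL_CODES = {"reviews": "Reviews", "wiki": "Wikipedia", "links": "Links"}

def build_prompt(control_code: str, text: str) -> str:
    """Prepend the chosen control code to the raw prompt."""
    code = CONTROL_CODES[control_code]
    return f"{code} {text}"

def generate(prompt: str) -> str:
    # Stand-in for a real language-model call; a real system would
    # feed `prompt` to the transformer and sample a continuation.
    return prompt + " ..."

prompt = build_prompt("reviews", "This vacuum cleaner")
print(prompt)  # Reviews This vacuum cleaner
print(generate(prompt))
```

Because the codes co-occur naturally with the training text, the same mechanism can be run in reverse for source attribution: score a sequence under each code and see which makes it most likely.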