Deep Learning Weekly Issue #126
AI assistant preferences, Pinterest's Lens, a SimCity-based gym, on-device training with Core ML 3, and more...
Hey folks,
This week in deep learning we bring you a study on AI assistant preferences from Apple, a new real-time strategy gym from Facebook, a request for comments on AI patents from the USPTO, and a new conditional transformer architecture from Salesforce.
You may also enjoy a guide to on-device training with Core ML 3, an interview with François Chollet, a SimCity-based AI gym, and new research from Google using BERT to understand videos.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Apple study suggests chattier users prefer chattier AI assistants [VentureBeat]
It turns out that people love talking to themselves.
Keras releases its final major version
The final major release of Keras brings support for a TF 2.0 backend, and the maintainers now recommend that users switch to tf.keras for future work.
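For most code, the switch is mainly an import change. A minimal sketch of the recommended tf.keras usage (the model itself is an arbitrary toy):

```python
# Instead of "from keras import ...", build models directly from tf.keras.
import tensorflow as tf  # TensorFlow 2.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```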
Pinterest’s Lens can now recognize 2.5 billion home and fashion objects [VentureBeat]
Visual search at a massive scale.
Coming Soon to a Battlefield: Robots That Can Kill [The Atlantic]
Tomorrow’s wars will be faster, more high-tech, and less human than ever before. Welcome to a new era of machine-driven warfare.
Request for Comments on Patenting Artificial Intelligence Inventions
The US Patent and Trademark Office is seeking public comments on how it should handle the growing number of AI-related patent applications.
[Facebook] Teaching AI to plan using language in a new open-source strategy game
Facebook open-sources a real-time strategy AI gym.
Learning
On-device training with Core ML – part 3
The third installment of a deep dive into on-device model training with Apple’s most recent release of Core ML.
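For context, making a model trainable on-device typically starts with marking layers as updatable via coremltools 3 before the model ships. A rough sketch of that step; the file and layer names here are placeholders, not taken from the linked article:

```python
# Hypothetical sketch: convert an existing .mlmodel into an updatable model
# for Core ML 3 on-device training. "Model.mlmodel", "dense_1", "dense_2",
# and "output" are made-up names.
import coremltools
from coremltools.models.neural_network import NeuralNetworkBuilder, SgdParams

spec = coremltools.utils.load_spec("Model.mlmodel")
builder = NeuralNetworkBuilder(spec=spec)

builder.make_updatable(["dense_1", "dense_2"])  # only these layers train on-device
builder.set_categorical_cross_entropy_loss(name="loss", input="output")
builder.set_sgd_optimizer(SgdParams(lr=0.01, batch=8))
builder.set_epochs(10)

coremltools.utils.save_spec(builder.spec, "UpdatableModel.mlmodel")
```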
Research Guide for Video Frame Interpolation with Deep Learning
Nearly everything you need to know in 2019.
Learning Cross-Modal Temporal Representations from Unlabeled Videos
Google introduces VideoBERT, which applies BERT-style transformers to learn cross-modal temporal representations from unlabeled videos.
The world’s most freakishly realistic text-generating A.I. just got gamified
GPT-2 is used to generate a unique, text-based game each time a user plays.
[Video] An interview with François Chollet
François Chollet, original creator of Keras, sits down with Lex Fridman.
[Video] How Backpropagation Works
A short explainer video on backpropagation.
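For readers who want more than the intuition, backpropagation fits in a few lines of numpy: run the network forward, then push gradients back through each layer with the chain rule. A self-contained toy sketch (network and data are arbitrary):

```python
# Backpropagation in a nutshell: a two-layer network trained with plain numpy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy binary targets

W1 = rng.normal(size=(3, 8)) * 0.1
W2 = rng.normal(size=(8, 1)) * 0.1

for step in range(500):
    # Forward pass
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))       # sigmoid output

    # Backward pass: chain rule, from the loss back to each weight matrix
    grad_logits = (p - y) / len(X)            # d(cross-entropy)/d(logits)
    grad_W2 = h.T @ grad_logits
    grad_h = grad_logits @ W2.T
    grad_W1 = X.T @ (grad_h * (1 - h**2))     # tanh'(z) = 1 - tanh(z)^2

    # Gradient descent update
    W1 -= 0.5 * grad_W1
    W2 -= 0.5 * grad_W2
```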
Datasets
Google: Coached Conversational Preference Elicitation
Google announces a new dataset of conversations where an assistant elicits movie preferences from a user.
Libraries & Code
[GitHub] p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch
PyTorch implementations of deep reinforcement learning algorithms and environments.
An interface to Micropolis (the open-source SimCity) for city-building agents, packaged as an OpenAI Gym environment.
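Because it exposes the standard Gym interface, an agent drives it like any other environment. A minimal loop (the environment id below is a guess; check the repo for the real registration name):

```python
# Schematic Gym interaction loop; "MicropolisEnv-v0" is a hypothetical id.
import gym

env = gym.make("MicropolisEnv-v0")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()         # stand-in for a trained agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print("episode reward:", total_reward)
```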
Papers & Publications
Cross View Fusion for 3D Human Pose Estimation
Abstract: We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into CNN to jointly estimate 2D poses for multiple views. Consequently, the 2D pose estimation for each view already benefits from other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses. It gradually improves the accuracy of 3D pose with affordable computational cost….
CTRL: A Conditional Transformer Language Model for Controllable Generation
Abstract: Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at github.com/salesforce/ctrl.
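In use, control amounts to prepending a control code to the prompt. A schematic sketch, assuming the Hugging Face transformers port of CTRL rather than the official Salesforce code (whose interface may differ):

```python
# Hedged sketch of control-code-conditioned generation with CTRL,
# assuming the Hugging Face transformers port.
from transformers import CTRLLMHeadModel, CTRLTokenizer

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLLMHeadModel.from_pretrained("ctrl")

# "Links" is one of CTRL's control codes; swapping it for e.g. "Reviews"
# or "Wikipedia" steers the style and domain of the continuation.
prompt = "Links In a shocking finding, scientists discovered"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(input_ids, max_length=60, repetition_penalty=1.2)
print(tokenizer.decode(output[0]))
```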