Deep Learning Weekly Issue #126
AI assistant preferences, Pinterest's Lens, a SimCity-based gym, on-device training with Core ML 3, and more...
This week in deep learning we bring you a study on AI assistant preferences from Apple, a new real-time strategy gym from Facebook, a request for comments on AI patents from the USPTO, and a new conditional transformer architecture from Salesforce.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
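An Apple study finds that people prefer voice assistants that mirror their own conversational style. It turns out that people love talking to themselves.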
The final major release of multi-backend Keras adds support for TensorFlow 2.0 as a backend. The maintainers now recommend that users switch to tf.keras for future work.
How Pinterest's Lens delivers visual search at massive scale.
Tomorrow’s wars will be faster, more high-tech, and less human than ever before. Welcome to a new era of machine-driven warfare.
The US Patent and Trademark Office (USPTO) is seeking public comments on how to handle the growing number of AI-related patents.
Facebook open-sources a real-time strategy gym for AI research.
Part three of a deep dive into on-device model training with Core ML 3, Apple's latest release.
Nearly everything you need to know in 2019.
Google introduces VideoBERT, which uses transformers to learn joint representations of video and language.
GPT-2 generates a unique text-based game each time a user plays.
François Chollet, original creator of Keras, sits down with Lex Fridman.
A short explainer video on backpropagation.
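For readers who want to pair the video with code, here is a minimal sketch of backpropagation (not taken from the video) on a hypothetical toy network with one sigmoid hidden unit and a squared-error loss. All function names and values are illustrative assumptions. The backward pass applies the chain rule from the loss back to each weight, and a central finite-difference check confirms the analytic gradients.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1: float, w2: float, x: float):
    # Toy network (hypothetical): x -> z = w1*x -> h = sigmoid(z) -> y_hat = w2*h
    z = w1 * x
    h = sigmoid(z)
    y_hat = w2 * h
    return z, h, y_hat


def loss(w1: float, w2: float, x: float, y: float) -> float:
    # Squared-error loss L = 0.5 * (y_hat - y)^2
    _, _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(w1: float, w2: float, x: float, y: float):
    """Return (dL/dw1, dL/dw2) by applying the chain rule backward."""
    z, h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y            # dL/dy_hat
    d_w2 = d_yhat * h             # y_hat = w2 * h  =>  dy_hat/dw2 = h
    d_h = d_yhat * w2             # dy_hat/dh = w2
    d_z = d_h * h * (1.0 - h)     # sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x                # z = w1 * x  =>  dz/dw1 = x
    return d_w1, d_w2


def numeric_grad(f, w: float, eps: float = 1e-6) -> float:
    # Central finite difference, used only to sanity-check backprop
    return (f(w + eps) - f(w - eps)) / (2 * eps)


if __name__ == "__main__":
    w1, w2, x, y = 0.5, -1.2, 2.0, 1.0
    g1, g2 = backprop(w1, w2, x, y)
    n1 = numeric_grad(lambda w: loss(w, w2, x, y), w1)
    n2 = numeric_grad(lambda w: loss(w1, w, x, y), w2)
    print(f"dL/dw1 analytic={g1:.6f} numeric={n1:.6f}")
    print(f"dL/dw2 analytic={g2:.6f} numeric={n2:.6f}")
```

Checking analytic gradients against finite differences like this is a common way to catch mistakes in a hand-written backward pass.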
Google announces a new dataset of conversations where an assistant elicits movie preferences from a user.
Libraries & Code
PyTorch implementations of deep reinforcement learning algorithms and environments.
An interface to Micropolis, the open-source release of the original SimCity, for training city-building agents, packaged as an OpenAI Gym environment.
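To illustrate what "packaged as a Gym environment" means, here is a hypothetical stand-in written as a sketch. It does not wrap Micropolis and does not import gym. It only mirrors the classic Gym interface, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. The grid, zone types, and population-style reward are all invented for illustration.

```python
import random
from typing import Iterator, List, Tuple


class ToyCityEnv:
    """Hypothetical toy city builder mirroring the classic Gym reset/step API.

    This is NOT the Micropolis environment: the grid, zones and reward are made up.
    """

    EMPTY, RESIDENTIAL, ROAD = 0, 1, 2

    def __init__(self, size: int = 4, max_steps: int = 20):
        self.size = size
        self.max_steps = max_steps
        self.grid: List[int] = []
        self.steps = 0

    def reset(self) -> List[int]:
        # Start every episode with an empty map
        self.grid = [self.EMPTY] * (self.size * self.size)
        self.steps = 0
        return list(self.grid)

    def step(self, action: Tuple[int, int]):
        tile, zone = action  # which tile to build on, and what to build there
        self.grid[tile] = zone
        self.steps += 1
        reward = self._population()
        done = self.steps >= self.max_steps
        return list(self.grid), reward, done, {}

    def _neighbors(self, i: int) -> Iterator[int]:
        # Orthogonally adjacent tiles inside the grid
        r, c = divmod(i, self.size)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < self.size and 0 <= nc < self.size:
                yield nr * self.size + nc

    def _population(self) -> int:
        # Toy reward: residential tiles count only if a road is adjacent
        return sum(
            1
            for i, z in enumerate(self.grid)
            if z == self.RESIDENTIAL
            and any(self.grid[n] == self.ROAD for n in self._neighbors(i))
        )


if __name__ == "__main__":
    rng = random.Random(0)
    env = ToyCityEnv()
    obs = env.reset()
    done, total = False, 0
    while not done:
        # Random agent: pick a tile and a zone type
        action = (rng.randrange(len(obs)), rng.choice([env.RESIDENTIAL, env.ROAD]))
        obs, reward, done, info = env.step(action)
        total += reward
    print("episode return:", total)
```

Exposing a simulator through this fixed reset/step contract is what lets off-the-shelf RL agents train on it without custom glue code.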
Papers & Publications
Abstract: We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into CNN to jointly estimate 2D poses for multiple views. Consequently, the 2D pose estimation for each view already benefits from other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses. It gradually improves the accuracy of 3D pose with affordable computational cost….
Abstract: Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at github.com/salesforce/ctrl.
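The source-attribution idea in the abstract can be read through Bayes' rule. As a sketch, assuming a prior p(c) over control codes c (for example, a uniform one) and the model's conditional likelihood of a sequence x = (x_1, …, x_n), the most likely training source is the control code that maximizes the posterior:

```latex
p_\theta(x \mid c) = \prod_{i=1}^{n} p_\theta(x_i \mid x_{<i}, c),
\qquad
p(c \mid x) \propto p_\theta(x \mid c)\, p(c),
\qquad
\hat{c} = \arg\max_{c}\; p_\theta(x \mid c)\, p(c)
```

Under a uniform prior, this reduces to picking the control code under which the model assigns the sequence the highest likelihood.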