Deep Learning Weekly Issue #114
Detecting Photoshops, TensorFlow Text, Facebook's photo-realistic simulator, PizzaGAN and more...
This week in deep learning, we bring you a Photoshop detector from Adobe, beauty try-on in YouTube, text tools in TensorFlow, and a car pose API.
You may also be interested in a new speech synthesis model from Google, weight agnostic neural networks, photo-realistic simulation environments from Facebook, and a GAN-based pizza maker.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Adobe is now building tools to detect whether an image has been manipulated with its own software.
YouTube viewers can use augmented reality and face tracking to try on virtual makeup and follow along with other creators.
New tools for manipulating text and building NLP models with TensorFlow.
Google applies evolution-based neural architecture search to find better transformer architectures for solving sequence-to-sequence problems.
A 2D pose estimation model for tracking cars.
Results from Capacitron, Google’s latest iteration on the Tacotron speech synthesis model, are very impressive.
A new technique that searches for network architectures that can solve tasks without weight training, instead of training the weights of a fixed architecture.
Generating images by compositing layers created by GANs.
An informative deep dive into practical techniques for improving the performance of self-driving RC cars.
Training a model to detect a cat trying to bring prey into the house.
Libraries & Code
A very early alpha for the Keras hyperparameter tuning project is up.
Facebook open sources tools to create photo-realistic environments that can be used to train agents via reinforcement learning.
A PyTorch implementation of "Robust Universal Neural Vocoding".
Papers & Publications
Abstract: In this paper, we propose Text2Scene, a model that generates various forms of compositional scene representations from natural language descriptions. Unlike recent works, our method does NOT use Generative Adversarial Networks (GANs). Text2Scene instead learns to sequentially generate objects and their attributes (location, size, appearance, etc) at every time step by attending to different parts of the input text and the current status of the generated scene. We show that under minor modifications, the proposed framework can handle the generation of different forms of scene representations, including cartoon-like scenes, object layouts corresponding to real images, and synthetic images….
Abstract: … We describe an unsupervised version of capsule networks, in which a neural encoder, which looks at all of the parts, is used to infer the presence and poses of object capsules. The encoder is trained by backpropagating through a decoder, which predicts the pose of each already discovered part using a mixture of pose predictions. The parts are discovered directly from an image, in a similar manner, by using a neural encoder, which infers parts and their affine transformations. The corresponding decoder models each image pixel as a mixture of predictions made by affine-transformed parts. We learn object- and their part-capsules on unlabeled data, and then cluster the vectors of presences of object capsules…