| Learning the Depths of Moving People by Watching Frozen People |
Abstract: We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving....In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene. Since the people are stationary, training data can be created from these videos using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scenes, and shows clear improvement over state-of-the-art monocular depth prediction methods…
Fashion++: Minimal Edits for Outfit Improvement
Abstract: Minimal outfit edits suggest minor changes to an existing outfit in order to improve its fashionability. For example, changes might entail (left) removing an accessory; (middle) changing to a blouse with higher neckline; (right) tucking in a shirt. Our model consists of a deep image generation neural network that learns to synthesize clothing conditioned on learned per-garment encodings. The latent encodings are explicitly factorized according to shape and texture, thereby allowing direct edits for both fit/presentation and color/patterns/material, respectively. We show how to bootstrap Web photos to automatically train a fashionability model, and develop an activation maximization-style approach to transform the input image into its more fashionable self….