Deep Learning Weekly: Issue #221
Facebook’s long-term project on egocentric perception, a paper on a state-of-the-art 3D-aware image synthesis model, an analysis of aligned StyleGAN models, and more
This week in deep learning, we bring you Facebook's long-term project on egocentric perception, reflections on foundation models, a PyTorch-based library for semi-supervised learning, and a paper on a state-of-the-art 3D-aware image synthesis model.
You may also enjoy Google’s new algorithm for computing representative points within the framework of differential privacy, a graphical image annotation tool, a paper on non-deep networks, an analysis of aligned StyleGAN models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Facebook AI is announcing Ego4D, an ambitious long-term project aimed at solving research challenges in egocentric perception.
Smart image analysis algorithms, fed by cameras carried by drones and ground vehicles, are helping power companies prevent forest fires.
Moveworks announces support for French, Italian, German, Spanish, and Portuguese languages on its platform.
Amazon Web Services Inc. announced the general availability of AWS Panorama, an appliance that companies can deploy at locations such as factories to run computer vision software on-premises.
Earlier this month, Mailchimp released Content Optimizer, a new product that uses artificial intelligence to help improve the performance of email marketing campaigns.
Mobile & Edge
A TinyML implementation that is used to monitor a performer’s moves with a Nano 33 BLE Sense, triggering the appropriate AV experience based on the recognized movement.
Hear companies in a GTC panel talk about pioneering paths to improve operational efficiency with intelligent video analytics.
Squats Counter can count the number of squats performed by an individual using the onboard accelerometer readings and a TinyML model.
Learning
A comprehensive collection of reflections on foundation models, their trajectory, and underlying philosophies.
A summary of Google’s paper on a key failure mode that is especially prevalent in modern ML systems.
A podcast on researchers using GPU-based deep learning algorithms to categorize sherds — tiny fragments of ancient pottery.
A blog showcasing Google’s new algorithm for computing representative points (cluster centers) within the framework of differential privacy.
An article from Hugging Face highlighting the current relationship between machine learning and software engineering, and what practitioners should expect in the future.
Libraries & Code
An all-in-one toolkit based on PyTorch for semi-supervised learning (SSL).
A graphical image annotation tool that is written in Python and uses Qt for its graphical interface.
Papers & Publications
The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses. The recently proposed NeRF-based GANs made great progress towards 3D-aware generators, but they are unable to generate high-quality images yet. This paper presents CIPS-3D, a style-based, 3D-aware generator that is composed of a shallow NeRF network and a deep implicit neural representation (INR) network. The generator synthesizes each pixel value independently without any spatial convolution or upsampling operation. In addition, we diagnose the problem of mirror symmetry that implies a suboptimal solution and solve it by introducing an auxiliary discriminator. Trained on raw, single-view images, CIPS-3D sets new records for 3D-aware image synthesis with an impressive FID of 6.97 for images at the 256×256 resolution on FFHQ. We also demonstrate several interesting directions for CIPS-3D such as transfer learning and 3D-aware face stylization.
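To make the "each pixel independently" idea in the abstract above concrete, here is a minimal sketch of a coordinate-based (INR-style) renderer. It is not the CIPS-3D architecture: the NeRF stage and all 3D handling are omitted, the network is a tiny randomly initialized MLP, and `make_mlp`, `pixel_value`, `render`, and the scalar `style` parameter are hypothetical names chosen for illustration.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Build a tiny MLP with random weights (illustrative, untrained)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def pixel_value(layers, x, y, style):
    """Map one (x, y) coordinate to an RGB triple in (0, 1).

    `style` is a hypothetical scalar that modulates the hidden activations;
    real style-based generators use learned, per-layer modulation.
    """
    h = [x, y]
    last = len(layers) - 1
    for idx, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if idx < last:
            h = [math.sin(style * v) for v in h]
    return [1.0 / (1.0 + math.exp(-v)) for v in h]


def render(layers, height, width, style):
    """Evaluate the MLP once per pixel: no convolution, no upsampling."""
    image = []
    for i in range(height):
        row = []
        for j in range(width):
            # Pixel centers mapped to the [-1, 1] range.
            x = 2 * (j + 0.5) / width - 1
            y = 2 * (i + 0.5) / height - 1
            row.append(pixel_value(layers, x, y, style))
        image.append(row)
    return image


layers = make_mlp([2, 8, 3])
image = render(layers, height=4, width=4, style=1.5)
```

Because every pixel comes from its own function call, the same `layers` can be rendered at any resolution by changing `height` and `width`.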
Depth is the hallmark of deep neural networks. But more depth means more sequential computation and higher latency. This begs the question -- is it possible to build high-performing "non-deep" neural networks? We show that it is. To do so, we use parallel subnetworks instead of stacking one layer after another. This helps effectively reduce depth while maintaining high performance. By utilizing parallel substructures, we show, for the first time, that a network with a depth of just 12 can achieve top-1 accuracy over 80% on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. We also show that a network with a low-depth (12) backbone can achieve an AP of 48% on MS-COCO. We analyze the scaling rules for our design and show how to increase performance without changing the network's depth. Finally, we provide a proof of concept for how non-deep networks could be used to build low-latency recognition systems.
In this paper, we perform an in-depth study of the properties and applications of aligned generative models. We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model's latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.
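The "fine-tuning and inversion" recipe for image translation can be sketched with toy generators. The sketch assumes the common approach of inverting an image into the parent's latent space and decoding the same latent with the child. The generators here are elementwise linear maps so that inversion has a closed form; real StyleGAN inversion is an optimization or encoder problem. Every name and number below is hypothetical.

```python
def make_generator(weights, bias):
    """Toy generator: each output value is a linear function of one latent value."""
    def generate(latent):
        return [w * z + bias for w, z in zip(weights, latent)]
    return generate


def invert(weights, bias, image):
    """Recover the latent that produces `image` (closed form for this toy model)."""
    return [(x - bias) / w for w, x in zip(weights, image)]


def fine_tune(weights, bias, weight_shift, bias_shift):
    """Stand-in for fine-tuning: the child is a small perturbation of the parent."""
    return [w + weight_shift for w in weights], bias + bias_shift


def translate(image, parent_params, child_params):
    """Invert in the parent domain, then decode the same latent with the child."""
    latent = invert(*parent_params, image)
    return make_generator(*child_params)(latent)


parent_params = ([2.0, 4.0], 1.0)
child_params = fine_tune(*parent_params, weight_shift=0.5, bias_shift=-1.0)

parent_image = make_generator(*parent_params)([1.0, 2.0])
child_image = translate(parent_image, parent_params, child_params)
```

The point of the sketch is only that the latent code is shared: because the child starts from the parent's weights, the same latent is expected to carry comparable meaning in both domains.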