Deep Learning Weekly: Issue #221
Facebook’s long-term project on egocentric perception, a paper on a state-of-the-art 3D-aware image synthesis model, an analysis of aligned StyleGAN models, and more
This week in deep learning, we bring you Facebook's long-term project on egocentric perception, reflections on foundation models, a PyTorch-based library for semi-supervised learning, and a paper on a state-of-the-art 3D-aware image synthesis model.
You may also enjoy Google’s new algorithm for computing representative points within the framework of differential privacy, a graphical image annotation tool, a paper on non-deep networks, an analysis of aligned StyleGAN models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Facebook AI is announcing Ego4D, an ambitious long-term project aimed at solving research challenges in egocentric perception.
Smart image analysis algorithms, fed by cameras carried by drones and ground vehicles, are helping power companies prevent forest fires.
Moveworks announces support for French, Italian, German, Spanish, and Portuguese on its platform.
Amazon Web Services Inc. announced the general availability of AWS Panorama, an appliance that companies can deploy at locations such as factories to run computer vision software on-premises.
Earlier this month, Mailchimp released Content Optimizer, a new product that uses artificial intelligence to help improve the performance of email marketing campaigns.
Mobile & Edge
A TinyML implementation that monitors a performer’s moves with a Nano 33 BLE Sense and triggers the appropriate AV experience based on the recognized movement.
A GTC panel in which companies discuss pioneering paths to improving operational efficiency with intelligent video analytics.
Squats Counter can count the number of squats performed by an individual using the onboard accelerometer readings and a TinyML model.
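Squats Counter classifies windows of raw accelerometer readings with a TinyML model; as a rough intuition for the signal involved, here is a minimal pure-Python sketch that counts reps with simple hysteresis thresholding instead of a learned model (the `high` and `low` thresholds, in g, are illustrative values, not taken from the project):

```python
def count_squats(accel_z, high=1.3, low=0.8):
    """Count squat reps from a vertical-acceleration trace (in g).

    A rep is one dip below `low` (descent) followed by a spike above
    `high` (rising back up). Hysteresis prevents double-counting noise.
    """
    reps, down = 0, False
    for a in accel_z:
        if not down and a < low:
            down = True          # descent detected
        elif down and a > high:
            reps += 1            # rise back up completes one rep
            down = False
    return reps
```

A learned classifier replaces these hand-tuned thresholds with a model that generalizes across users and sensor placements.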
Learning
A comprehensive collection of reflections on foundation models, their trajectory, and underlying philosophies.
A summary of Google’s paper on a key failure mode that is especially prevalent in modern ML systems.
A podcast on researchers using GPU-based deep learning algorithms to categorize sherds — tiny fragments of ancient pottery.
A blog showcasing Google’s new algorithm for computing representative points (cluster centers) within the framework of differential privacy.
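Google's algorithm is considerably more involved, but the core idea behind private cluster centers, bounding each individual's contribution and adding calibrated Laplace noise to the aggregate, can be illustrated with a toy 1-D sketch (the function names, the clipping bound, and the even epsilon split are our own illustrative choices, not Google's method):

```python
import math
import random

def laplace(scale):
    """Draw one sample from a zero-mean Laplace distribution (inverse CDF)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_cluster_center(points, epsilon, clip=1.0):
    """Differentially private estimate of a cluster's center (1-D sketch).

    Each point is clipped to [-clip, clip] so one individual's contribution
    to the sum is bounded, then Laplace noise calibrated to that sensitivity
    is added to both the sum and the count before dividing.
    """
    clipped = [max(-clip, min(clip, p)) for p in points]
    noisy_sum = sum(clipped) + laplace(2.0 * clip / epsilon)  # half the budget
    noisy_count = len(clipped) + laplace(2.0 / epsilon)       # other half
    return noisy_sum / max(noisy_count, 1.0)
```

With a generous privacy budget the noisy estimate tracks the true mean closely; tightening epsilon trades accuracy for privacy.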
An article from Hugging Face highlighting the current relationship between machine learning and software engineering, and what practitioners should expect in the future.
Libraries & Code
An all-in-one toolkit based on PyTorch for semi-supervised learning (SSL).
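Toolkits like this typically implement methods such as pseudo-labeling; as a self-contained illustration of that idea (not the library's API), here is one round of confidence-thresholded pseudo-labeling using a toy 1-D nearest-centroid classifier, with the confidence measure being our own simple stand-in:

```python
def centroid(points):
    return sum(points) / len(points)

def pseudo_label(labeled, unlabeled, threshold=0.8):
    """One round of pseudo-labeling with a nearest-centroid classifier (1-D).

    `labeled` maps label -> list of values. Unlabeled points whose
    prediction is confident enough are moved into the labeled set with
    their predicted label; the rest stay unlabeled for a later round.
    """
    cents = {lab: centroid(vals) for lab, vals in labeled.items()}
    still_unlabeled = []
    for x in unlabeled:
        dists = sorted((abs(x - c), lab) for lab, c in cents.items())
        (d1, lab1), (d2, _) = dists[0], dists[1]
        # Crude confidence: relative gap between the two closest centroids.
        conf = 1.0 - d1 / (d2 + 1e-9)
        if conf >= threshold:
            labeled[lab1].append(x)
        else:
            still_unlabeled.append(x)
    return labeled, still_unlabeled
```

Real SSL methods replace the centroid classifier with the network being trained and iterate, so confident predictions gradually expand the effective training set.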
A graphical image annotation tool that is written in Python and uses Qt for its graphical interface.
Papers & Publications
The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses. The recently proposed NeRF-based GANs made great progress towards 3D-aware generators, but they are unable to generate high-quality images yet. This paper presents CIPS-3D, a style-based, 3D-aware generator that is composed of a shallow NeRF network and a deep implicit neural representation (INR) network. The generator synthesizes each pixel value independently without any spatial convolution or upsampling operation. In addition, we diagnose the problem of mirror symmetry that implies a suboptimal solution and solve it by introducing an auxiliary discriminator. Trained on raw, single-view images, CIPS-3D sets new records for 3D-aware image synthesis with an impressive FID of 6.97 for images at the 256×256 resolution on FFHQ. We also demonstrate several interesting directions for CIPS-3D such as transfer learning and 3D-aware face stylization.
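The abstract's key architectural point is that every pixel is synthesized independently by an implicit network from its coordinate, with no spatial convolution or upsampling. A toy coordinate-MLP sketch of that pattern (the shapes, weights, and activation are illustrative, not the paper's architecture):

```python
import math

def inr_pixel(x, y, weights):
    """Evaluate a tiny coordinate MLP at one pixel.

    No convolution: the value depends only on this pixel's (x, y)
    coordinate, so every pixel can be computed independently.
    """
    h = [math.tanh(w0 * x + w1 * y + b) for (w0, w1, b) in weights["hidden"]]
    return sum(w * hi for w, hi in zip(weights["out"], h))

def render(width, height, weights):
    # Synthesize the image pixel by pixel from normalized coordinates.
    return [[inr_pixel(i / width, j / height, weights)
             for i in range(width)] for j in range(height)]
```

Because pixels are independent, the image can be rendered at any resolution or in arbitrary patches, which is part of what makes INR-style generators attractive.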
Depth is the hallmark of deep neural networks. But more depth means more sequential computation and higher latency. This begs the question -- is it possible to build high-performing "non-deep" neural networks? We show that it is. To do so, we use parallel subnetworks instead of stacking one layer after another. This helps effectively reduce depth while maintaining high performance. By utilizing parallel substructures, we show, for the first time, that a network with a depth of just 12 can achieve top-1 accuracy over 80% on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. We also show that a network with a low-depth (12) backbone can achieve an AP of 48% on MS-COCO. We analyze the scaling rules for our design and show how to increase performance without changing the network's depth. Finally, we provide a proof of concept for how non-deep networks could be used to build low-latency recognition systems.
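The core idea, replacing stacked depth with parallel shallow subnetworks whose outputs are fused, can be sketched in a few lines (scalar toy networks for clarity, not the paper's actual blocks):

```python
def branch(x, weights):
    """One shallow subnetwork: a short stack of affine + ReLU steps."""
    for w, b in weights:
        x = max(0.0, w * x + b)  # ReLU(w * x + b)
    return x

def non_deep_forward(x, branches, fuse_weights):
    """Run shallow branches side by side and fuse their outputs.

    The branches are independent, so on parallel hardware the
    sequential depth is that of a single branch, not their sum.
    """
    outs = [branch(x, bw) for bw in branches]
    return sum(w * o for w, o in zip(fuse_weights, outs))
```

A deep network would chain all the layers, making latency proportional to total depth; here latency is bounded by the deepest single branch plus the fusion step.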
In this paper, we perform an in-depth study of the properties and applications of aligned generative models. We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model's latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.
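The translation recipe the abstract describes, inverting an image with the parent model and decoding the same latent with the fine-tuned child, amounts to a two-step pipeline. A schematic sketch, where both function arguments are hypothetical stand-ins for real inversion and generation routines:

```python
def translate(image, invert_parent, generate_child):
    """Cross-domain translation with aligned generators.

    Because the child's latent space stays semantically aligned with the
    parent's after fine-tuning, a latent recovered in the parent domain
    can be decoded directly by the child generator.
    """
    w = invert_parent(image)   # image -> latent, in the parent domain
    return generate_child(w)   # SAME latent -> image, in the child domain
```

The toy stand-ins below only demonstrate the data flow; in practice inversion is an optimization or encoder pass and generation is a StyleGAN forward pass.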