Deep Learning Weekly: Issue #241
DeepMind's neural network that can restore damaged inscriptions, guided text generation with constrained beam search using Hugging Face Transformers, Google's data generation pipeline, and more.
This week in deep learning, we bring you Ithaca, DeepMind's neural network that can restore damaged inscriptions, guided text generation with constrained beam search using Hugging Face Transformers, Google's data generation pipeline for creating semi-realistic synthetic multi-object videos, and a paper on StyleNeRF.
You may also enjoy a deep AI-based synthesizer called Neurorack, how to secure S3 access for isolated SageMaker instances, recommender systems using TensorFlow Ranking, a paper on Shift-Robust GNNs, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
DeepMind introduces Ithaca, the first deep neural network that can restore the missing text of damaged inscriptions, identify their original location, and help establish the date they were created.
Developers from the Artificial Creative Intelligence and Data Science (ACIDS) group based at the IRCAM Laboratory combine the power of deep generative models and the compactness of a Eurorack machine.
With PyTorch 1.11, there will be native support for Fully Sharded Data Parallel (FSDP), currently available as a prototype feature.
You.com launched a search app built in collaboration with OpenAI that generates snippets — or even documents — of text when given a prompt.
Google AI proposes a solution to weak label issues, like temporal noise, that uses a simple learning framework to conduct effective pre-training on untrimmed videos.
An article describing how a combination of open-source technologies and cloud providers was used to lower training costs by more than 8x while reducing model training time by 4x.
A post that demonstrates how to securely launch notebook instances in a private subnet of an Amazon Virtual Private Cloud, and to securely connect to Amazon S3 using VPC endpoints.
Andrew Ng’s presentation to the Future of Data-Centric AI virtual conference, where he discussed some practical tips for responsible data-centric AI development.
An exploratory article discussing why the AI market is trending more modular.
This blog describes running Ray and Ludwig on cloud Kubernetes clusters, using Nodeless K8s as a smart cluster provisioner to add right-sized GPU resources to the K8s cluster.
A deep dive into self-supervised learning, an evolving machine learning technique poised to solve the challenges posed by over-dependence on labeled data.
A technical post that quickly goes over what the new constrained beam search feature can do for you and then goes into deeper details about how it works under the hood.
This article discusses how TensorFlow Ranking can be used to build a recommendation system based on the learning-to-rank concept.
Libraries & Code
A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
Ludwig is a data-centric deep learning framework that allows users to train and test deep learning models by specifying a declarative configuration that matches the schema of the data. It is built on top of PyTorch.
Papers & Publications
Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: If we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data. Given an off-the-shelf image generator without any access to its training data, we train representations from the samples output by this generator. We compare several representation learning methods that can be applied to this setting, using the latent space of the generator to generate multiple “views” of the same semantic content. We show that, for contrastive methods, this multiview data can naturally be used to identify positive pairs (nearby in latent space) and negative pairs (far apart in latent space). We find that the resulting representations rival or even outperform those learned directly from real data, but that good performance requires care in the sampling strategy applied and the training method. Generative models can be viewed as a compressed and organized copy of a dataset, and we envision a future where more and more “model zoos” proliferate while datasets become increasingly unwieldy, missing, or private. This paper suggests several techniques for dealing with visual representation learning in such a future.
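For readers who want a feel for the pairing idea in the abstract above, here is a minimal sketch, not the paper's implementation. It assumes a standard normal latent prior, builds a positive by adding small Gaussian noise to an anchor latent, and treats independently sampled latents as negatives. As a loud simplification, the latents themselves stand in for encoder embeddings; in the actual setting each latent would be passed through the black-box generator and then an image encoder. The function names, `sigma`, and `temperature` are all hypothetical choices made for illustration.

```python
import math
import random


def sample_latent(dim, rng):
    """Draw one latent vector from an assumed standard normal prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def nearby_view(z, sigma, rng):
    """Perturb a latent with small Gaussian noise to get a nearby 'view'."""
    return [zi + sigma * rng.gauss(0.0, 1.0) for zi in z]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dim = 64

anchor = sample_latent(dim, rng)
positive = nearby_view(anchor, sigma=0.1, rng=rng)       # nearby in latent space
negatives = [sample_latent(dim, rng) for _ in range(8)]  # far apart, on average

# Placeholder: embeddings are the latents themselves (assumption, see above).
loss = info_nce(anchor, positive, negatives)
print(f"InfoNCE loss for one anchor: {loss:.4f}")
```

The loss is low here because the positive is almost parallel to the anchor while independent high-dimensional Gaussian samples are close to orthogonal. The abstract notes that good performance depends on the sampling strategy, which in this sketch corresponds to how `sigma` and the negatives are chosen.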
We propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images. Existing approaches either cannot synthesize high-resolution images with fine details or yield noticeable 3D-inconsistent artifacts. In addition, many of them lack control over style attributes and explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF) into a style-based generator to tackle the aforementioned challenges, i.e., improving rendering efficiency and 3D consistency for high-resolution image generation. We perform volume rendering only to produce a low-resolution feature map and progressively apply upsampling in 2D to address the first issue. To mitigate the inconsistencies caused by 2D upsampling, we propose multiple designs, including a better upsampler and a new regularization loss. With these designs, StyleNeRF can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality. StyleNeRF also enables control of camera poses and different levels of styles, which can generalize to unseen views. It also supports challenging tasks, including zoom-in and-out, style mixing, inversion, and semantic editing.
There has been a recent surge of interest in designing Graph Neural Networks (GNNs) for semi-supervised learning tasks. Unfortunately this work has assumed that the nodes labeled for use in training were selected uniformly at random (i.e. are an IID sample). However in many real world scenarios gathering labels for graph nodes is both expensive and inherently biased – so this assumption can not be met. GNNs can suffer poor generalization when this occurs, by overfitting to superfluous regularities present in the training data. In this work we present a method, Shift-Robust GNN (SR-GNN), designed to account for distributional differences between biased training data and a graph’s true inference distribution. SR-GNN adapts GNN models to the presence of distributional shift between the nodes labeled for training and the rest of the dataset. We illustrate the effectiveness of SR-GNN in a variety of experiments with biased training datasets on common GNN benchmark datasets for semi-supervised learning, where we see that SRGNN outperforms other GNN baselines in accuracy, addressing at least ∼40% of the negative effects introduced by biased training data. On the largest dataset we consider, ogb-arxiv, we observe a 2% absolute improvement over the baseline and are able to mitigate 30% of the negative effects from training data bias.