Deep Learning Weekly Issue #141
NeurIPS 2020, AI in Enterprises, Transformers and GNNs, generating 3D meshes, replacing camera ISPs and more...
This week in deep learning, we bring you changes to NeurIPS for 2020, stats on PyTorch vs. TensorFlow usage, a report on AI adoption in enterprises, a new moonshot from Alphabet using AI to monitor the ocean, a look at using TensorFlow.js for makeup try-on in the browser, and a new face-enhancing app.
You may also enjoy research into replacing camera ISPs with deep learning models, links between transformers and graph neural networks, a review of important DL research in the 2010s, TPU support in PyTorch Lightning, a method for generating scene layouts from a single 2D image, autoregressive models for generating 3D meshes, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
The creator of the popular “YOLO” object detection architecture has stepped away from computer vision research, citing concerns about the field’s military applications and privacy implications.
Changes to this year’s NeurIPS conference.
A thread by François Chollet with statistics on the use of PyTorch and TensorFlow / Keras across academia and industry.
An industry report from the MIT Sloan Management Review quantifying the state of AI projects in enterprises.
A new project from X, Alphabet’s moonshot factory, looks to AI to help preserve the oceans.
Mobile + Edge
Interesting research into replacing bespoke image signal processing pipelines in smartphones with deep learning models.
A great use case for real-time, on-device inference.
While not technically an edge-deployed model, the Remini Face Enhancer app is the latest GAN craze. This Twitter thread shows surprising results on extremely low-resolution images.
Training and implementing BERT on iOS using Swift, Flask, and Hugging Face’s Transformers Python package
A very interesting post detailing conceptual and mathematical links between Transformer models and Graph Neural Networks.
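To make the Transformer/GNN connection concrete, here is a minimal, dependency-free sketch (our own illustration, not code from the linked post) of single-head scaled dot-product attention written as one round of message passing over a fully connected graph of tokens: each token is a node that receives a message from every other node, weighted by a softmax over edge scores. The function names are hypothetical, and real Transformers add learned query/key/value projections, multiple heads, residual connections and layer normalization.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_as_message_passing(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Scaled dot-product attention viewed as message passing on a complete graph.

    Node i scores every edge (i, j) as q_i . k_j / sqrt(d), normalizes the scores
    over its neighbors with a softmax, and aggregates the neighbors' value vectors
    (the "messages") as a weighted sum.
    """
    d = len(keys[0])
    width = len(values[0])
    updated = []
    for q in queries:  # one iteration per receiving node i
        scores = [dot(q, k) / math.sqrt(d) for k in keys]  # one score per edge
        weights = softmax(scores)  # attention weights over all neighbors j
        message = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]
        updated.append(message)
    return updated
```

When all keys are identical, every edge gets the same weight and each node simply averages its neighbors' values, which is exactly a mean-aggregation GNN layer on a complete graph; distinct keys turn that into a learned, data-dependent weighting.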
A review article that summarizes 40 papers using various BERT architectures.
A gentle introduction to self-supervised learning with a new dataset challenge.
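Self-supervised learning derives training labels from the data itself. As one widely used illustration (not necessarily the approach covered in the linked introduction), a rotation-prediction pretext task turns each unlabeled image into four labeled examples, and a model trained to predict the rotation has to learn something about image content along the way. The sketch below treats an image as a nested list of pixel values; the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotation_pretext_examples(img: Image) -> List[Tuple[Image, int]]:
    """Turn one unlabeled image into four (input, label) pairs.

    Label k means the input was rotated by k * 90 degrees clockwise, so the
    labels come for free from the transformation, not from human annotation.
    """
    examples = []
    current = img
    for k in range(4):
        examples.append((current, k))
        current = rotate90(current)
    return examples
```

A classifier trained on these pairs never sees a human label; its learned features can then be reused for a downstream task.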
Professor Schmidhuber’s comprehensive round-up of important ML research in the 2010s.
Benchmarks and cost-efficiency analysis of popular deep learning GPUs.
Generating top-down maps from front-facing cameras.
An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization
Libraries & Code
Code for the paper: CNN-generated images are surprisingly easy to spot... for now
The lightweight PyTorch wrapper for ML researchers. Scale your models. Write less boilerplate. Now with TPU support.
A collection of optimizers for PyTorch
A Collection of Variational Autoencoders (VAE) in PyTorch.
Applying PyTorch models to frames captured from a PS4.
A dataset of facial expressions collected from Japanese artwork
Papers & Publications
Abstract: Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.
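The abstract describes predicting mesh vertices and then faces as sequences. As a rough illustration of what "modelling the mesh directly" can mean, here is a sketch of flattening a triangle mesh into two integer token sequences that an autoregressive model could predict one token at a time. The quantization bit depth, the (z, y, x) sort order, the token offsets and the STOP/NEW_FACE values are all our own assumptions for illustration, not the paper's exact scheme, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical special tokens for this sketch (not the paper's vocabulary).
STOP = 0      # ends a sequence
NEW_FACE = 1  # separates faces in the face sequence


def quantize(coord: float, bits: int = 8, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map a coordinate in [lo, hi] to an integer bin in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    clipped = min(max(coord, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def serialize_mesh(
    vertices: Sequence[Tuple[float, float, float]],
    faces: Sequence[Sequence[int]],
    bits: int = 8,
) -> Tuple[List[int], List[int]]:
    """Flatten a mesh into a vertex token sequence and a face token sequence."""
    # Quantize, then sort vertices by (z, y, x) so the ordering is canonical.
    quantized = [tuple(quantize(c, bits) for c in v) for v in vertices]
    order = sorted(
        range(len(vertices)),
        key=lambda i: (quantized[i][2], quantized[i][1], quantized[i][0]),
    )
    new_index = {old: new for new, old in enumerate(order)}

    # Vertex sequence: z, y, x per vertex, shifted by 1 so 0 is free for STOP.
    vertex_tokens: List[int] = []
    for i in order:
        x, y, z = quantized[i]
        vertex_tokens += [z + 1, y + 1, x + 1]
    vertex_tokens.append(STOP)

    # Re-index faces, rotate each to start at its lowest index (this keeps the
    # cyclic winding order), then sort faces for a canonical ordering.
    canonical = []
    for face in faces:
        idx = [new_index[v] for v in face]
        k = idx.index(min(idx))
        canonical.append(idx[k:] + idx[:k])
    canonical.sort()

    # Face sequence: indices shifted by 2 to avoid the special tokens,
    # with NEW_FACE between faces and STOP at the very end.
    face_tokens: List[int] = []
    for face in canonical:
        face_tokens += [v + 2 for v in face]
        face_tokens.append(NEW_FACE)
    if face_tokens:
        face_tokens[-1] = STOP  # the last separator becomes end-of-sequence
    else:
        face_tokens.append(STOP)
    return vertex_tokens, face_tokens
```

Because faces point at vertex indices, a model built this way would generate the vertex sequence first and then condition the face sequence on it, which matches the vertices-then-faces order the abstract describes.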
Abstract: We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters representations of each speaker and then estimates each source signal conditioned on the inferred representations. The model is trained on the raw waveform to jointly perform the two tasks. Our model infers a set of speaker representations through clustering, which addresses the fundamental permutation problem of speech separation. Moreover, the sequence-wide speaker representations provide a more robust separation of long, challenging sequences, compared to previous approaches. We show that Wavesplit outperforms the previous state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberated (WHAMR!) conditions. As an additional contribution, we further improve our model by introducing online data augmentation for separation.
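The "permutation problem" mentioned above arises because a separation model's output channels have no inherent correspondence to particular speakers, so a loss that compares outputs to references in a fixed order can penalize a perfectly good separation. A common baseline remedy, sketched below, is permutation-invariant training (PIT): score every assignment of outputs to references and keep the best one. Note that this is not Wavesplit's approach, which per the abstract addresses the problem by clustering speaker representations; the sketch only illustrates the problem being solved. The function names are hypothetical, and enumerating all n! permutations is only practical for a handful of speakers.

```python
from itertools import permutations
from typing import Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def permutation_invariant_loss(
    estimates: Sequence[Sequence[float]],
    references: Sequence[Sequence[float]],
) -> Tuple[float, Tuple[int, ...]]:
    """Return the lowest average MSE over all output-to-reference assignments,
    together with the permutation that achieves it.

    perm[i] is the index of the reference source matched to estimate i.
    """
    n = len(references)
    best_loss, best_perm = float("inf"), tuple(range(n))
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

For example, if the model emits the two speakers in swapped order, a fixed-order MSE reports a large error while the permutation-invariant loss is zero with permutation (1, 0).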
Abstract: Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. We find that, within this framework, deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations. Despite this behavior, pre-training with blurred inputs or an auxiliary self-supervised task can approximate the changes in supervised networks, suggesting that these changes are not inherently label-dependent, though labels significantly accelerate this process. Together, these results help to elucidate the network changes occurring during this pivotal initial period of learning.
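The abstract's finding that networks "are not robust to reinitializing with random weights while maintaining signs" refers to a perturbation applied to a partially trained network. Below is a minimal sketch of that kind of perturbation for a flat list of weights: every magnitude is redrawn at random while each weight keeps its sign. The zero-mean Gaussian and the `std` parameter are our assumptions (a faithful experiment would presumably draw from the layer's original initialization distribution), and the function name is hypothetical.

```python
import random
from typing import List, Optional


def sign_preserving_reinit(
    weights: List[float], std: float, rng: Optional[random.Random] = None
) -> List[float]:
    """Replace each weight's magnitude with a fresh draw of |N(0, std^2)|,
    keeping the weight's original sign (zero weights are treated as positive)."""
    if rng is None:
        rng = random.Random()
    out = []
    for w in weights:
        magnitude = abs(rng.gauss(0.0, std))
        out.append(magnitude if w >= 0 else -magnitude)
    return out
```

Comparing the accuracy a network reaches after training from such perturbed weights against training from the unperturbed early-phase weights is one way to probe how much information the signs alone carry.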