Deep Learning Weekly Issue #134
TensorFlow 2.1.0 release candidate, an AI keyboard from Amazon, a replacement for Batch Norm, and more...
This week in deep learning we bring you a new TensorFlow release candidate, a guide to stemming AI misinformation, new face bounding box annotations for COCO images, a soccer reinforcement learning environment from Google, and an alternative to batch normalization.
For some reason, a large number of music-related projects were released this week. Amazon announced DeepComposer, an AI-powered piano; Project Magenta released a browser-based drum machine; Facebook open-sourced Demucs, a music source separation system; and a new project uses electronic music to generate StyleGAN evolutions.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
You can now buy an Amazon-branded MIDI keyboard, play a short riff, and use a deep learning model to generate a song that will be uploaded to SoundCloud.
The latest version consolidates CPU and GPU packages and brings experimental TPU support to Keras.
Advice for researchers, readers, and journalists on how to present and digest work without feeding dangerous hype cycles.
Project Magenta from Google adds a browser-based drum machine to its list of instruments.
The final installment of a fantastic series exploring Core ML 3’s model personalization features.
Musical input controls the evolution of a custom StyleGAN.
A reinforcement learning environment for football (soccer) including a Game Server that lets ML agents compete online.
Adds "face" bounding boxes to the COCO images dataset.
Libraries & Code
Code for the paper Music Source Separation in the Waveform Domain.
An open-source implementation of agents, algorithms, and environments related to the paper Optimizing Agent Behavior over Long Time Scales by Transporting Value.
Papers & Publications
Abstract: Sparse neural networks have been shown to be more parameter and compute efficient compared to dense networks and in some cases are used to decrease wall clock inference times. There is a large body of work on training dense networks to yield sparse networks for inference. This limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. Importantly, by adjusting the topology it can start from any initialization - not just "lucky" ones. We demonstrate state-of-the-art sparse training results with ResNet-50, MobileNet v1 and MobileNet v2 on the ImageNet-2012 dataset, WideResNets on the CIFAR-10 dataset and RNNs on the WikiText-103 dataset. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static.
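To make the idea of updating topology "by using parameter magnitudes and infrequent gradient calculations" more concrete, here is a rough sketch of one drop-and-grow step on a flat list of weights. It is our simplified reading of the abstract, not the authors' code: the function name `update_topology`, the `fraction` parameter, and the choice to start regrown connections at zero are all assumptions made for illustration.

```python
def update_topology(weights, mask, grads, fraction=0.3):
    """One hypothetical drop-and-grow step that keeps the active count fixed.

    weights, grads: flat lists of floats; mask: flat list of bools
    (True = connection is active). All names here are illustrative.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]

    # Never drop more connections than we are able to grow back.
    k = min(int(fraction * len(active)), len(inactive))

    # Drop the k active connections with the smallest magnitude.
    drop = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow the k previously inactive connections with the largest gradient magnitude.
    grow = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:k]

    new_weights, new_mask = list(weights), list(mask)
    for i in drop:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in grow:
        new_mask[i] = True
        new_weights[i] = 0.0  # assumption: new connections start at zero
    return new_weights, new_mask


weights = [0.5, -0.1, 0.0, 0.0]
mask = [True, True, False, False]
grads = [0.0, 0.0, 0.2, -0.9]
new_weights, new_mask = update_topology(weights, mask, grads, fraction=0.5)
```

Because every step drops and grows the same number of connections, the parameter count stays constant, which matches the fixed-budget property the abstract emphasizes. A dense gradient is only needed on the steps where this update runs.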
Abstract: Batch Normalization (BN) is a highly successful and widely used batch dependent training method. Its use of mini-batch statistics to normalize the activations introduces dependence between samples, which can hurt the training if the mini-batch size is too small, or if the samples are correlated. Several alternatives, such as Batch Renormalization and Group Normalization (GN), have been proposed to address these issues. However, they either do not match the performance of BN for large batches, or still exhibit degradation in performance for smaller batches, or introduce artificial constraints on the model architecture. In this paper we propose the Filter Response Normalization (FRN) layer, a novel combination of a normalization and an activation function, that can be used as a drop-in replacement for other normalizations and activations. Our method operates on each activation map of each batch sample independently, eliminating the dependency on other batch samples or channels of the same sample. Our method outperforms BN and all alternatives in a variety of settings for all batch sizes. FRN layer performs ≈0.7−1.0% better on top-1 validation accuracy than BN with large mini-batch sizes on Imagenet classification on InceptionV3 and ResnetV2-50 architectures. Further, it performs >1% better than GN on the same problem in the small mini-batch size regime. For object detection problem on COCO dataset, FRN layer outperforms all other methods by at least 0.3−0.5% in all batch size regimes.
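For readers who want a feel for what "a novel combination of a normalization and an activation function" might look like, here is a minimal sketch applied to a single activation map. The abstract does not give the formula, so this follows our understanding of the paper: divide by the root of the map's mean squared value, apply a scale and shift, then clamp from below with a threshold. The function name `frn_tlu` and the default values are assumptions; in the paper the scale, shift, and threshold are learned per channel.

```python
import math


def frn_tlu(activation_map, gamma=1.0, beta=0.0, tau=0.0, eps=1e-6):
    """Normalize one activation map (a 2D list of floats) using only its own values.

    gamma, beta, tau stand in for per-channel learned parameters; the
    defaults here are illustrative, not values from the paper.
    """
    values = [v for row in activation_map for v in row]
    # Mean of squared activations over the spatial extent of this one map.
    nu2 = sum(v * v for v in values) / len(values)
    scale = gamma / math.sqrt(nu2 + eps)
    # Thresholded activation: max(y, tau) instead of a plain ReLU.
    return [[max(v * scale + beta, tau) for v in row] for row in activation_map]


out = frn_tlu([[3.0, -3.0], [3.0, -3.0]])
```

Nothing in the function looks at other samples in the batch or at other channels, which is the independence property the abstract credits for FRN's behavior at small batch sizes.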