Deep Learning Weekly Issue #117
The Batch Norm Patent, AI cameras, multi-task learning, training sparse networks, and more
|Jameson Toole||Jul 17, 2019|
This week in deep learning we bring you on-device people detection from Wyze, the batch norm patent, DeepMind’s Starcraft AI on the public ladder, and a look at where facial recognition datasets come from.
You may also enjoy Andrej Karpathy’s talk on multi-task learning, the latest MLPerf results, a new method for training sparse networks from scratch, a simple TensorFlow implementation of StyleGAN, and a large repository of ML algorithms written in NumPy.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Wyze has partnered with XNOR to add on-device AI capabilities to their cameras.
Betaworks put out a new blog post on the future of generated media. It was later revealed that the entire post was generated by a GPT-2 model.
Is this the start of the AI patent wars?
Blizzard has announced that DeepMind’s StarCraft AI will begin playing games on the public ladder.
You can now buy a Kinect sensor (video + depth) for use in your next deep learning project.
The New York Times takes a deep dive into where facial recognition training data comes from.
Tesla’s Andrej Karpathy speaks at ICML about multi-task learning in the real world.
Pairing deep learning with data from multiple sensors.
Google’s TPU v3 tops the MLPerf results for most models.
Very promising results training small, sparse networks in one shot without iterative pruning.
Training a neural network to control lights through dance.
Creating 3D adversarial objects that fool LiDAR-based perception models in autonomous vehicles.
An OCR dataset containing ~20,000 images of scientific papers taken from mobile phones.
Libraries & Code
A simple, intuitive TensorFlow implementation of “A Style-Based Generator Architecture for Generative Adversarial Networks”.
A large collection of ML algorithms implemented in NumPy.
A PyTorch implementation of “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models”.
Papers & Publications
Abstract: This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and therefore significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern are based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time. This memory layer allows us to tackle very large scale language modeling tasks… In particular, we found that a memory augmented model with only 12 layers outperforms a baseline transformer model with 24 layers, while being twice as fast at inference time…
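The product-key trick the abstract mentions can be sketched briefly: instead of scoring a query against all n² keys, the query is split in half, each half is scored against a small codebook of n sub-keys, and the top-k full keys are recovered from the top-k of each half. Names and shapes below are illustrative assumptions, not the paper's actual API:

```python
import numpy as np

def product_key_topk(query, subkeys1, subkeys2, k):
    """Illustrative sketch of product-key nearest-neighbor search.

    query: (d,) vector, split into two halves.
    subkeys1, subkeys2: (n, d/2) sub-key codebooks; the full key set
    is their Cartesian product (n * n keys), but we only ever score
    the k * k candidates built from each half's top-k.
    """
    d = query.shape[0]
    q1, q2 = query[: d // 2], query[d // 2:]
    s1 = subkeys1 @ q1                 # (n,) scores for the first half
    s2 = subkeys2 @ q2                 # (n,) scores for the second half
    top1 = np.argsort(-s1)[:k]         # top-k sub-key indices per half
    top2 = np.argsort(-s2)[:k]
    # Scores of the k * k candidate product keys (sum of half-scores).
    cand = s1[top1][:, None] + s2[top2][None, :]
    flat = np.argsort(-cand.ravel())[:k]
    rows, cols = np.unravel_index(flat, (k, k))
    # Flat ids of the selected keys in the implicit n * n key set.
    indices = top1[rows] * subkeys2.shape[0] + top2[cols]
    return indices, cand[rows, cols]
```

Because the true top-k pairs by summed score must use sub-keys from each half's own top-k, this candidate search is exact while scoring only O(n + k²) keys rather than n².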
Abstract: We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context… Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views (e.g., the presence of certain objects or the occurrence of certain events). Following our proposed approach, we develop a model which learns image representations that significantly outperform prior methods on the tasks we consider. Most notably, using self-supervised learning, our model learns representations which achieve 68.1% accuracy on ImageNet using standard linear evaluation…
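A common way to maximize a lower bound on mutual information between two views is a contrastive (InfoNCE-style) objective: features of two views of the same image should score higher against each other than against features of other images in the batch. The sketch below is a generic illustration of that idea, not the paper's exact loss:

```python
import numpy as np

def info_nce_loss(feats_a, feats_b, temperature=0.1):
    """Contrastive (InfoNCE-style) lower bound on mutual information.

    feats_a[i] and feats_b[i] are features from two views of the same
    image (the positive pair); every other row in the batch acts as a
    negative. Minimizing this loss pushes matching views together.
    """
    # L2-normalize so pairwise scores are cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                  # (batch, batch)
    # Softmax cross-entropy with the matching view as the target class.
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

With a large batch, this loss lower-bounds the mutual information between the two views' features up to a log(batch size) term, which is why scaling the number of negatives tends to help.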