Deep Learning Weekly Issue #117

The batch norm patent, AI cameras, multi-task learning, training sparse networks, and more

Hey folks,

This week in deep learning we bring you on-device people detection from Wyze, the batch norm patent, DeepMind’s StarCraft II AI on the public ladder, and a look at where facial recognition datasets come from.

You may also enjoy Andrej Karpathy’s talk on multi-task learning, the latest MLPerf results, a new method for training sparse networks from scratch, a simple TensorFlow implementation of StyleGAN, and a large repository of ML algorithms written in NumPy.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Wyze adds AI-powered people detection to its $20 Wyze Cam line [The Verge]

Wyze has partnered with Xnor.ai to add on-device AI capabilities to its cameras.

Investing in Synthetic Media [Betaworks]

Betaworks put out a new blog post on the future of generated media. It was later revealed the entire post was generated by a GPT-2 model.

Google awarded a patent for Batch Normalization

Is this the start of the AI patent wars?

DeepMind research on Ladder

Blizzard has announced that DeepMind’s StarCraft II AI will begin playing games on the public ladder.

Microsoft’s $399 Azure Kinect AI camera is now shipping in the US and China [TechCrunch]

You can now buy a Kinect sensor (video + depth) for use in your next deep learning project.

Facial Recognition Tech Is Growing Stronger, Thanks to Your Face [New York Times]

The New York Times takes a deep dive into where facial recognition training data comes from.


Multi-task Learning in the Wilderness

Tesla’s Andrej Karpathy speaks at ICML about multi-task learning in the real world.

ActiveStereoNet: The first deep learning solution for active stereo systems

Pairing deep learning with data from multiple sensors.

MLPerf Training v0.6 Results

Google’s TPU v3 posts top MLPerf results for most models.

Sparse Networks from Scratch: Faster Training without Losing Performance

Very promising results from training sparse networks from scratch, keeping them sparse throughout training instead of pruning a dense network iteratively.
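One way to keep a network sparse from the start is to periodically prune weak weights and regrow new connections elsewhere, holding the number of active weights fixed. Below is a simplified, hypothetical single-layer version of such a prune-and-regrow step: it prunes the smallest-magnitude active weights and regrows the inactive positions with the largest momentum magnitude. The linked work's sparse momentum method also uses momentum to guide regrowth, but its full procedure (including how the weight budget is shared between layers) is not reproduced here. `prune_and_regrow` and `prune_rate` are illustrative names, and plain Python lists stand in for tensors.

```python
def prune_and_regrow(weights, mask, momentum, prune_rate=0.5):
    """Hypothetical single-layer prune/regrow step (a sketch, not the authors' code).

    weights, momentum: lists of floats; mask: list of 0/1 flags marking active weights.
    Returns new (weights, mask) with the same number of active weights.
    """
    active = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(active) * prune_rate)

    # Prune: deactivate the active weights with the smallest absolute value.
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_weights = list(weights)
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = 0
        new_weights[i] = 0.0

    # Regrow: activate the inactive positions whose momentum magnitude is largest.
    # (Candidates may include positions that were pruned a moment ago.)
    inactive = [i for i, m in enumerate(new_mask) if not m]
    to_grow = sorted(inactive, key=lambda i: abs(momentum[i]), reverse=True)[:n_prune]
    for i in to_grow:
        new_mask[i] = 1
        new_weights[i] = 0.0  # regrown connections start from zero in this sketch

    return new_weights, new_mask


if __name__ == "__main__":
    w, m = prune_and_regrow(
        [0.9, -0.05, 0.4, 0.0, 0.0, 0.0],
        [1, 1, 1, 0, 0, 0],
        [0.1, 0.2, 0.1, 0.0, 0.7, 0.3],
    )
    print(w, m)
```

Because the same number of positions is pruned and then regrown, the count of active weights stays constant from step to step.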

Building Dab and T-Pose Controlled Lights

Training a neural network to control lights through dance.


Creating 3D adversarial objects to fool LiDAR-based models in autonomous vehicles.


Brno Mobile OCR Dataset

An OCR dataset containing ~20,000 images of scientific papers taken with mobile phones.

Libraries & Code

[Github] taki0112/StyleGAN-TensorFlow

Simple & Intuitive TensorFlow implementation of "A Style-Based Generator Architecture for Generative Adversarial Networks"

[Github] ddbourgin/numpy-ml

A whole bunch of ML algorithms implemented in NumPy

[Github] vincent-thevenin/Realistic-Neural-Talking-Head-Models

A PyTorch implementation of “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models”.

Papers & Publications

Large Memory Layers with Product Keys

Abstract: This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and therefore significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time. This memory layer allows us to tackle very large scale language modeling tasks....In particular, we found that a memory augmented model with only 12 layers outperforms a baseline transformer model with 24 layers, while being twice faster at inference time….
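The abstract's claim that product keys allow fast and exact nearest-neighbor search can be illustrated with a small sketch. Assumptions: keys are scored by dot product, the query is split into two halves, and each product key is the concatenation of one sub-key from each of two sets, so n1 × n2 keys are searched while only n1 + n2 sub-key scores are computed per query. The function names are hypothetical, plain lists stand in for tensors, and the value lookup that follows the key search is omitted.

```python
import heapq
import itertools


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def product_key_topk(query, subkeys1, subkeys2, k):
    """Return the k best (score, index) pairs over all product keys.

    Key (i, j) is subkeys1[i] + subkeys2[j] (list concatenation) and has
    flat index i * len(subkeys2) + j.
    """
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    # Score each half of the query against its own small sub-key set.
    top1 = heapq.nlargest(k, [(dot(q1, c), i) for i, c in enumerate(subkeys1)])
    top2 = heapq.nlargest(k, [(dot(q2, c), j) for j, c in enumerate(subkeys2)])
    # score(i, j) = score1(i) + score2(j), so the global top-k must lie among
    # the k x k combinations of the two per-half top-k lists.
    n2 = len(subkeys2)
    candidates = [(a + b, i * n2 + j) for (a, i), (b, j) in itertools.product(top1, top2)]
    return heapq.nlargest(k, candidates)


def brute_force_topk(query, subkeys1, subkeys2, k):
    """Reference: score every concatenated key explicitly."""
    n2 = len(subkeys2)
    scores = [(dot(query, c1 + c2), i * n2 + j)
              for i, c1 in enumerate(subkeys1)
              for j, c2 in enumerate(subkeys2)]
    return heapq.nlargest(k, scores)


if __name__ == "__main__":
    sub1 = [[1, 0], [0, 1], [-1, 2]]
    sub2 = [[2, 1], [0, -1], [1, 1], [3, 0]]
    q = [1, 2, 0, 1]
    print(product_key_topk(q, sub1, sub2, 2))
    print(brute_force_topk(q, sub1, sub2, 2))
```

Since any key in the global top-k must use a sub-key from each half's top-k, searching only the k × k candidates loses nothing; `brute_force_topk` is included just to check that on small inputs.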

Learning Representations by Maximizing Mutual Information Across Views

Abstract: We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context.... Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views -- e.g., presence of certain objects or occurrence of certain events. Following our proposed approach, we develop a model which learns image representations that significantly outperform prior methods on the tasks we consider. Most notably, using self-supervised learning, our model learns representations which achieve 68.1% accuracy on ImageNet using standard linear evaluation…
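Mutual information between the features of two views is commonly maximized through a contrastive lower bound such as InfoNCE. The excerpt above does not spell out the exact objective, so the following is an assumption: a minimal InfoNCE-style loss in which, for a batch of N contexts, the features of the two views of the same context form the positive pair and all other pairings in the batch act as negatives. Under the usual assumptions, log(N) minus this loss lower-bounds the mutual information. `info_nce_loss` and its `temperature` parameter are illustrative, not taken from the paper.

```python
import math


def info_nce_loss(feats_a, feats_b, temperature=1.0):
    """Mean InfoNCE loss for paired features (a sketch under stated assumptions).

    feats_a[i] and feats_b[i] are feature vectors of two views of context i.
    """
    n = len(feats_a)
    total = 0.0
    for i in range(n):
        # Similarity of view-A feature i to every view-B feature in the batch.
        logits = [sum(x * y for x, y in zip(feats_a[i], fb)) / temperature for fb in feats_b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair (i, i) as the correct class.
        total += log_z - logits[i]
    return total / n


if __name__ == "__main__":
    views_a = [[10.0, 0.0], [0.0, 10.0]]
    views_b = [[10.0, 0.0], [0.0, 10.0]]
    loss = info_nce_loss(views_a, views_b)
    print(loss, math.log(len(views_a)) - loss)  # loss and the implied MI bound
```

When the two views' features are indistinguishable across contexts the loss equals log(N) and the bound is zero; as matching pairs become easy to pick out, the loss falls toward zero and the bound approaches log(N), which is why such estimators benefit from large batches.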