Deep Learning Weekly Issue #156
AI plays Super Mario, masks vs facial recognition, a technical overview of SnapML, and more
This week in deep learning we bring you Shortcuts: How Neural Networks Love to Cheat, why Facebook hacks its own AI programs, and the code for a Proximal Policy Optimization algorithm an AI used to learn how to play Super Mario Bros.
For news related to facial recognition, check out this model that protects BLM protestors from facial recognition, this NIST study that finds that masks defeat most facial recognition algorithms, and this article about the decision to ban facial recognition in New York schools for two years.
You may also enjoy this article about how Facebook intends to look for racial bias in its algorithms, this applied computer vision project that tracks a tennis serve, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Attackers increasingly try to confuse and bypass machine-learning systems. So the companies that deploy them are getting creative.
Facebook says it will look for racial bias in its algorithms
Facebook says it is setting up new internal teams to look for racial bias in the algorithms that drive its main social network and Instagram.
This AI uses emoji to protect BLM protestors from facial recognition
The system slaps a BLM fist emoji on the faces of protestors.
NIST study finds that masks defeat most facial recognition algorithms
In a report published today by the National Institute of Standards and Technology (NIST), researchers attempted to evaluate the performance of facial recognition algorithms on faces partially covered by protective masks.
New York legislature votes to halt facial recognition tech in schools for two years
The state of New York voted this week to pause for two years any implementation of facial recognition technology in schools.
Mobile + Edge
See how integration of the XNNPACK library with TensorFlow Lite improves neural network inference performance by 2.3X on average.
Typewise taps $1M to build an offline next word prediction engine
Swiss keyboard startup Typewise has bagged a $1 million seed round to build out a typo-busting, ‘privacy-safe’ next word prediction engine designed to run entirely offline. No cloud connectivity, no data mining risk is the basic idea.
Alexa will soon be able to launch Android and iOS apps using voice commands
Amazon is working on a new feature for its Alexa voice assistant that will let the software launch Android and iOS apps using voice commands, a first for Amazon’s assistant and a bold expansion of its strategy to position Alexa as a platform-agnostic alternative to Apple’s Siri and Google Assistant.
Exploring SnapML: A Technical Overview
Slight disclaimer--this one is from me. I’ve been experimenting with SnapML, Snap’s new framework for working with custom neural networks inside Lens Studio, so I wanted to share my initial impressions from a technical perspective.
Shortcuts: How Neural Networks Love to Cheat
In this article, the authors of Shortcut Learning in Deep Neural Networks dive into the idea of “shortcut learning” and how many difficulties in deep learning can be seen as symptoms of this underlying problem.
Trouble with your tennis serves and penalty kicks? There’s an AI for that
In this article, the author walks through a real-life applied computer vision project in which she models and analyzes her tennis serve.
Announcing ScaNN: Efficient Vector Similarity Search
Google AI researchers recently open-sourced ScaNN (Scalable Nearest Neighbors), a method for efficient vector similarity search at scale.
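For readers new to the problem, here is a minimal sketch of what vector similarity search computes. It is an exact brute-force baseline in plain Python, not ScaNN's method or API; the function names and toy vectors are made up for illustration. Libraries like ScaNN aim to return approximately the same top results without scoring every vector.

```python
import heapq


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_inner_product(query, database, k):
    """Return (index, score) pairs for the k vectors with the largest inner product.

    Hypothetical helper: scores every database vector, so cost grows
    linearly with the database size.
    """
    scored = ((dot(query, vec), idx) for idx, vec in enumerate(database))
    best = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in best]


# Toy data, assumed for illustration only.
database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
result = top_k_inner_product([1.0, 0.2], database, k=2)
print(result)
```

The linear scan is fine for a few thousand vectors; the cost of scoring everything is what motivates approximate methods at larger scale.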
Working with preprocessing layers in Keras
This guide gives an overview of how to leverage preprocessing layers to create end-to-end models in Keras.
Meta Learning Framework for TensorFlow 2.0. Dataset download instructions are in the readme.
Libraries & Code
Proximal Policy Optimization (PPO) algorithm for Super Mario Bros.
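As background, the core of PPO is its clipped surrogate objective. The sketch below is not taken from the linked repo; it is a plain-Python illustration of the standard formula, and the function name, the argument names, and the default clip value of 0.2 are assumptions made for this example.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    ratios: new_policy_prob / old_policy_prob for each sampled action.
    advantages: advantage estimate for each sampled action.
    clip_eps: how far the ratio may move from 1 before it is clipped.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)


# Toy batch, assumed for illustration only.
value = ppo_clipped_objective([1.5, 0.5], [1.0, -1.0])
print(value)
```

Taking the minimum removes the incentive to push the new policy far from the old one in a single update, which is the property that makes PPO comparatively stable to train.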
This is the official repo for the ECCV 2020 paper "Whole-Body Human Pose Estimation in the Wild." It contains the COCO-WholeBody annotations proposed in the paper.
Papers & Publications
Abstract: Recently developed deep learning models are able to learn to segment scenes into component objects without supervision. This opens many new and exciting avenues of research, allowing agents to take objects (or entities) as inputs, rather than pixels. Unfortunately, while these models provide excellent segmentation of a single frame, they do not keep track of how objects segmented at one time-step correspond (or align) to those at a later time-step. The alignment (or correspondence) problem has impeded progress towards using object representations in downstream tasks. In this paper we take steps towards solving the alignment problem, presenting the AlignNet, an unsupervised alignment module.
CrossTransformers: spatially-aware few-shot transfer
Abstract: Given new tasks with very little data--such as new classes in a classification problem or a domain shift in the input--performance of modern vision systems degrades remarkably quickly. In this work, we illustrate how the neural network representations which underpin modern vision systems are subject to supervision collapse, whereby they lose any information that is not necessary for performing the training task, including information that may be necessary for transfer to new tasks or domains. We then propose two methods to mitigate this problem. First, we employ self-supervised learning to encourage general-purpose features that transfer better. Second, we propose a novel Transformer based neural network architecture called CrossTransformers, which can take a small number of labeled images and an unlabeled query, find coarse spatial correspondence between the query and the labeled images, and then infer class membership by computing distances between spatially-corresponding features. The result is a classifier that is more robust to task and domain shift, which we demonstrate via state-of-the-art performance on Meta-Dataset, a recent dataset for evaluating transfer from ImageNet to many other vision datasets.