Deep Learning Weekly Issue #180

Intel's new facial recognition tech, a new Python library for NLP, a robot dog with a neural net brain, and more

Hey folks,

This week in deep learning we bring you AI models from Microsoft and Google that already surpass human performance on the SuperGLUE language benchmark, Intel's addition of facial recognition tech to its RealSense portfolio, research that shows that machine learning models still struggle to detect hate speech, and this method rooted in neuroscience to protect against adversarial attacks.

You may also enjoy Ecco, a Python library for explaining Natural Language Processing models using interactive visualizations, this robot dog with an ensemble of neural networks for a brain, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


AI models from Microsoft and Google already surpass human performance on the SuperGLUE language benchmark

When SuperGLUE was introduced, there was a nearly 20-point gap between the best-performing model and human performance, but now Microsoft’s DeBERTa and Google’s T5 + Meena have surpassed the human baselines.

Watch a Robot Dog Learn How to Deftly Fend Off a Human

Kick over this robot and it’ll quickly right itself—not because someone told it how, but because it taught itself to overcome the embarrassment.

Five ways to make AI a greater force for good in 2021

There's more attention on AI’s influence than ever before. Let's make it count.

New York City Proposes Regulating Algorithms Used in Hiring

A bill would require firms to disclose when they use software to assess candidates, and vendors would have to ensure that their tech doesn’t discriminate.

Researchers find machine learning models still struggle to detect hate speech

Researchers have developed HateCheck, a benchmark for hate speech detection models. Testing near-state-of-the-art detection models on HateCheck revealed “critical weaknesses” in these models.

Mobile + Edge

Intel adds facial recognition tech to its RealSense portfolio

Intel Corp. announced new facial recognition hardware and software that can be used in retail stores to facilitate payments or other settings to provide access to restricted areas.

Stanford researchers design accelerator chip that speeds up AI inferencing

Researchers at Stanford have developed hardware that can run AI tasks quickly and energy-efficiently by harnessing purpose-built chips.

Ambarella unveils 8K AI vision processor for car, drone, and robot cameras

Ambarella has unveiled the CV5 AI Vision Processor to handle 8K image processing for car, consumer, drone, and robot cameras.

Syntiant SoC Runs Multiple AI Models for Under 1mW

AI accelerator startup Syntiant’s second-generation product for edge AI at ultra-low power is based on a new in-house designed processor core. It can run multiple neural networks simultaneously with a power budget of 1mW for always-on battery-powered devices such as smartphones.


Is neuroscience the key to protecting AI from adversarial attacks?

One interesting method to protect deep learning systems against adversarial attacks is to apply findings in neuroscience to close the gap between neural networks and the mammalian vision system.

Soft Labeling: How to apply soft labeling to help address noisy labels

This post walks through how to use soft labeling in fastai and demonstrates how it helps with noisy labels to improve training and your metrics.
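
For readers new to the idea, here is a minimal sketch of one common form of soft labeling, label smoothing, in plain Python. It is an illustration under our own assumptions, not the fastai code from the post: the function names (`smooth_labels`, `soft_cross_entropy`) and the `epsilon` value are hypothetical. The point is that the target puts most, but not all, of its weight on the annotated class, so a single wrong label pulls the model less hard.

```python
import math

def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Turn a hard class index into a soft target distribution."""
    # Spread epsilon evenly over all classes, then give the rest to the labeled class.
    target = [epsilon / num_classes] * num_classes
    target[class_index] += 1.0 - epsilon
    return target

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and the model's softmax output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

logits = [0.5, 0.1, 2.0, -1.0]  # made-up model outputs for a 4-class problem
hard = smooth_labels(class_index=2, num_classes=4, epsilon=0.0)  # ordinary one-hot target
soft = smooth_labels(class_index=2, num_classes=4, epsilon=0.1)

hard_loss = soft_cross_entropy(logits, hard)
soft_loss = soft_cross_entropy(logits, soft)
print(soft, hard_loss, soft_loss)
```

With `epsilon=0.0` the function reduces to a regular one-hot target, which makes it easy to compare the two losses on the same predictions.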

Researchers design AI that can infer whole floor plans from short video clips

Facebook, the University of Texas at Austin, and Carnegie Mellon University are exploring an AI technique that leverages visuals and audio to reconstruct a floor plan from a short video clip.

Columbia University Model Learns Predictability From Unlabelled Video

Researchers propose a novel framework and hierarchical predictive model that learns to identify what is predictable from unlabelled video.


[GitHub] RUCAIBox/TextBox

TextBox is an open-source library for building text generation systems.

[GitHub] jalammar/ecco

Ecco is a python library for explaining Natural Language Processing models using interactive visualizations.

Papers & Publications

Where2Act: From Pixels to Actions for Articulated 3D Objects

Abstract: One of the fundamental goals of visual perception is to allow agents to meaningfully interact with their environment. In this paper, we take a step towards that long-term goal -- we extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts. For example, given a drawer, our network predicts that applying a pulling force on the handle opens the drawer. We propose, discuss, and evaluate novel network architectures that given image and depth data, predict the set of actions possible at each pixel, and the regions over articulated parts that are likely to move under the force. We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation (SAPIEN) and generalizes across categories. But more importantly, our learned models even transfer to real-world data. Check the project website for the code and data release.

AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Abstract: Neural networks are often over-parameterized and hence benefit from aggressive regularization. Conventional regularization methods, such as Dropout or weight decay, do not leverage the structures of the network's inputs and hidden states. As a result, these conventional methods are less effective than methods that leverage the structures, such as SpatialDropout and DropBlock, which randomly drop the values at certain contiguous areas in the hidden states and set them to zero. Although the locations of dropout areas are random, the patterns of SpatialDropout and DropBlock are manually designed and fixed. Here we propose to learn the dropout patterns. In our method, a controller learns to generate a dropout pattern at every channel and layer of a target network, such as a ConvNet or a Transformer. The target network is then trained with the dropout pattern, and its resulting validation performance is used as a signal for the controller to learn from. We show that this method works well for both image recognition on CIFAR-10 and ImageNet, as well as language modeling on Penn Treebank and WikiText-2. The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014. Our code will be available.
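
To make "dropping contiguous areas" concrete, here is a small sketch of a fixed, hand-designed block mask of the kind the abstract contrasts with learned patterns. It is a simplified illustration in plain Python, not the AutoDropout controller and not an exact reimplementation of DropBlock; the helper names (`contiguous_drop_mask`, `apply_mask`) and the 6x6 toy feature map are our own assumptions.

```python
import random

def contiguous_drop_mask(height, width, block_size, rng):
    """Return a height x width mask of 1s with one block_size x block_size square set to 0."""
    # Pick the top-left corner so the whole block fits inside the feature map.
    top = rng.randrange(height - block_size + 1)
    left = rng.randrange(width - block_size + 1)
    mask = [[1] * width for _ in range(height)]
    for r in range(top, top + block_size):
        for c in range(left, left + block_size):
            mask[r][c] = 0
    return mask

def apply_mask(feature_map, mask):
    """Zero the masked positions and rescale the survivors by total / kept."""
    kept = sum(sum(row) for row in mask)
    total = len(mask) * len(mask[0])
    scale = total / kept
    return [
        [value * m * scale for value, m in zip(feature_row, mask_row)]
        for feature_row, mask_row in zip(feature_map, mask)
    ]

rng = random.Random(0)  # seeded so the example is repeatable
mask = contiguous_drop_mask(height=6, width=6, block_size=2, rng=rng)
features = [[1.0] * 6 for _ in range(6)]  # toy single-channel feature map
dropped = apply_mask(features, mask)

for row in mask:
    print(row)
```

Here only the block's location is random while its shape is fixed by hand, which is the design choice the paper proposes to replace with a learned pattern.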