Deep Learning Weekly Issue #171

Interview with Geoffrey Hinton, Google's new Document AI platform, GANs for mobile devices, and more

Hey folks,

This week in deep learning we bring you an interview with AI pioneer Geoff Hinton, OutSense’s AI that looks for life-threatening diseases in your poop, and Google Cloud's new Document AI Platform.

You may also enjoy learning about GANs for mobile devices, performing super resolution with OpenCV in both images and real-time video streams, Cornell & Facebook AI's simplified graph learning approach that outperforms SOTA GNNs, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


AI pioneer Geoff Hinton: “Deep learning is going to be able to do everything”

Thirty years ago, Hinton’s belief in neural networks was contrarian. Now it’s hard to find anyone who disagrees, he says.

Google Cloud debuts new Document AI Platform

The latest addition to Google LLC’s public cloud is the Document AI Platform, which enables enterprises to extract information contained in digital and printed documents automatically using machine learning.

This could lead to the next big breakthrough in common sense AI

Researchers are teaching giant language models how to "see" to help them understand the world.

OutSense’s AI looks for life-threatening diseases in your poop

You can learn a lot by analyzing a person’s bodily excretions, from their income and diet to — more importantly — their health. Now a fledgling Israeli startup wants to help prevent life-threatening diseases by passively monitoring human waste, circumventing the need to collect physical samples.

Algorithmia debuts a monitoring tool to prevent drift in machine learning models

Artificial intelligence operations and management software provider Algorithmia Inc. is taking on the chore of machine learning model performance monitoring with a new tool announced today that it says provides greater visibility into algorithm inference metrics.

Mobile + Edge

Apple Adds New AR-Enhanced ‘People Detection’ Accessibility Feature To iOS 14.2 Developer Beta

Apple has included a new accessibility feature in today’s release of the iOS 14.2 beta called People Detection. The software, which is actually a subset of the Magnifier app introduced with iOS 10, uses augmented reality and machine learning to detect where humans and objects are in space.

New signal processing performance metrics in Edge Impulse Studio

To help developers choose better parameters for their signal processing blocks, and to show whether a model will fit the latency and memory constraints that their application has, Edge Impulse introduced real-time performance metrics for all processing blocks in the Edge Impulse Studio.

Generative Adversarial Networks (GANs) for Mobile Devices



Adapting on the Fly to Test Time Distribution Shift

In this post, the author discusses methods for addressing covariate shift, including the approach from their recent paper, Adaptive Risk Minimization: A Meta-Learning Approach for Tackling Group Shift.

OpenCV Super Resolution with Deep Learning

By the end of this tutorial, you’ll be able to perform super resolution with OpenCV in both images and real-time video streams.

Amazon Alexa AI’s ‘Language Model Is All You Need’ Explores NLU as QA

An Amazon Alexa AI paper asks whether NLU problems could be mapped to question-answering (QA) problems using transfer learning.

Cornell & Facebook AI Simplified Graph Learning Approach Outperforms SOTA GNNs

Cornell and Facebook AI's high-accuracy “Correct and Smooth” graph learning method is fast to train and outperforms big Graph Neural Network (GNN) models.

Libraries & Code

[GitHub] InterDigitalInc/CompressAI

A PyTorch library and evaluation platform for end-to-end compression research.

[GitHub] RUCAIBox/RecBole

A unified, comprehensive and efficient recommendation library.


[GitHub] google-research-datasets/Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata including camera poses, sparse point-clouds and planes.

Papers & Publications

CapWAP: Captioning with a Purpose

Abstract: The traditional image captioning task uses generic reference captions to provide textual information about images. Different user populations, however, will care about different visual aspects of images. In this paper, we propose a new task, Captioning with a Purpose (CapWAP). Our goal is to develop systems that can be tailored to be useful for the information needs of an intended population, rather than merely provide generic information about an image. In this task, we use question-answer (QA) pairs---a natural expression of information need---from users, instead of reference captions, for both training and post-inference evaluation. We show that it is possible to use reinforcement learning to directly optimize for the intended information need, by rewarding outputs that allow a question answering model to provide correct answers to sampled user questions. We convert several visual question answering datasets into CapWAP datasets, and demonstrate that under a variety of scenarios our purposeful captioning system learns to anticipate and fulfill specific information needs better than its generic counterparts, as measured by QA performance on user questions from unseen images, when using the caption alone as context.

Intriguing Properties of Contrastive Losses

Abstract: Contrastive loss and its variants have become very popular recently for learning visual representations without supervision. In this work, we first generalize the standard contrastive loss based on cross entropy to a broader family of losses that share an abstract form of L_alignment + λL_distribution, where hidden representations are encouraged to (1) be aligned under some transformations/augmentations, and (2) match a prior distribution of high entropy. We show that various instantiations of the generalized loss perform similarly under the presence of a multi-layer nonlinear projection head, and the temperature scaling (τ) widely used in the standard contrastive loss is (within a range) inversely related to the weighting (λ) between the two loss terms. We then study an intriguing phenomenon of feature suppression among competing features shared across augmented views, such as “color distribution” vs “object class”. We construct datasets with explicit and controllable competing features, and show that, for contrastive learning, a few bits of easy-to-learn shared features could suppress, and even fully prevent, the learning of other sets of competing features. Interestingly, this characteristic is much less detrimental in autoencoders based on a reconstruction loss. Existing contrastive learning methods critically rely on data augmentation to favor certain sets of features than others, while one may wish that a network would learn all competing features as much as its capacity allows.
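
To make the "alignment + distribution" form in the abstract concrete, here is a minimal sketch in plain Python showing that a cross-entropy contrastive loss splits exactly into two such terms. It is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, negatives are taken only from the other view (a simplification of the usual two-directional loss), and both terms are left scaled by 1/τ rather than written with an explicit λ.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Project onto the unit sphere so dot products are cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def logsumexp(xs):
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def contrastive_terms(views_a, views_b, tau=0.5):
    """Return (alignment, distribution), averaged over anchors.

    views_a[i] and views_b[i] are assumed to be two augmented views of
    example i; every other row of views_b acts as a negative.
    """
    za = [normalize(v) for v in views_a]
    zb = [normalize(v) for v in views_b]
    n = len(za)
    alignment = 0.0
    distribution = 0.0
    for i in range(n):
        logits = [dot(za[i], zb[k]) / tau for k in range(n)]
        alignment += -logits[i]            # pulls the positive pair together
        distribution += logsumexp(logits)  # pushes the anchor away from all candidates
    return alignment / n, distribution / n


def cross_entropy_contrastive(views_a, views_b, tau=0.5):
    """The same loss, written as softmax cross entropy over similarities."""
    za = [normalize(v) for v in views_a]
    zb = [normalize(v) for v in views_b]
    n = len(za)
    loss = 0.0
    for i in range(n):
        exps = [math.exp(dot(za[i], zb[k]) / tau) for k in range(n)]
        loss += -math.log(exps[i] / sum(exps))
    return loss / n


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [[0.9, 0.1], [0.2, 1.0], [1.0, 0.8]]
    align, dist = contrastive_terms(a, b)
    print(align + dist, cross_entropy_contrastive(a, b))
```

The two printed numbers should agree up to floating-point error, since -log softmax equals the negative positive-pair logit plus the log-sum-exp over all logits. Reweighting the second term against the first is the knob the abstract relates to the temperature τ.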