Deep Learning Weekly Issue #121

A (possible) Apple Acquisition, a PyTorch hackathon, converting TensorFlow Models to PyTorch, and more...

Hey folks,

This week in deep learning we bring a new PyTorch release, a (possible) acquisition by Apple, over $61,000 in hackathon prizes from Facebook, and a new project from DeepMind to track wildlife in Tanzania.

You may also enjoy a new method for learning temporal characteristics in videos, a guide to converting TensorFlow models to PyTorch, a visual explanation of feedforward and backpropagation, a new long-tail segmentation dataset from Facebook, a GAN that generates SVG art, and more.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Apple May Have Acquired AI Visual Search Startup Fashwell

Though with less fanfare than some big players, Apple continues to add deep learning talent.

Google Maps AR walking directions arrive on iOS and Android

After what feels like years of seeing demos, Google is finally bringing an AR experience to Maps.

PyTorch 1.2 is now available.

The latest release brings improvements to TorchScript, a new Transformer module, and additional support for ONNX.
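The new Transformer module can be used as a standalone sequence-to-sequence building block. A minimal sketch (the layer sizes below are arbitrary illustration values, not defaults):

```python
import torch
import torch.nn as nn

# nn.Transformer, added in PyTorch 1.2, bundles the encoder-decoder
# architecture from "Attention Is All You Need" into a single module.
# Inputs follow the (seq_len, batch, d_model) shape convention.
model = nn.Transformer(d_model=32, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)

src = torch.rand(10, 2, 32)  # source sequence: 10 tokens, batch of 2
tgt = torch.rand(7, 2, 32)   # target sequence: 7 tokens, batch of 2
out = model(src, tgt)        # output matches the target shape

print(out.shape)  # torch.Size([7, 2, 32])
```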

The Global PyTorch Hackathon begins.

Facebook announces over $61,000 in prizes as part of the Global PyTorch Hackathon. Submissions are due September 16th.

The Democratic Party deepfaked its own chairman to highlight 2020 concerns

Politicians in the US are taking the implications of deepfake technology seriously.

DeepMind uses AI to track Serengeti wildlife with photos

DeepMind partners with conservationists to track animals in Tanzania's Serengeti National Park.


Video Understanding Using Temporal Cycle-Consistency Learning

New research from Google uses cycle-consistency to learn temporal characteristics of videos, enabling action recognition and even sound transfer between clips.

On-device training with Core ML – part 2

Part two of an in-depth look at on-device model training with Apple’s Core ML framework.

From TensorFlow to PyTorch

The team behind the popular PyTorch-Transformers repo provides a guide for converting TensorFlow models to PyTorch.

Neural Networks: Feedforward and Backpropagation Explained & Optimization

A primer on the math behind forward propagation and backpropagation, including some great visuals.
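The core idea the primer covers can be shown in a few lines: for a single sigmoid neuron with squared-error loss, backpropagation is just the chain rule applied to the forward pass. A minimal sketch (the toy values here are illustrative, not from the article):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, y, w, b):
    # Forward pass: prediction and loss
    z = w * x + b
    y_hat = sigmoid(z)
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass (chain rule):
    #   dL/dy_hat = y_hat - y
    #   dy_hat/dz = y_hat * (1 - y_hat)   (sigmoid derivative)
    #   dz/dw = x,  dz/db = 1
    dloss_dz = (y_hat - y) * y_hat * (1 - y_hat)
    grad_w = dloss_dz * x
    grad_b = dloss_dz
    return loss, grad_w, grad_b

# One gradient-descent step should reduce the loss
w, b, lr = 0.5, 0.0, 0.1
loss, gw, gb = forward_backward(x=1.0, y=1.0, w=w, b=b)
w, b = w - lr * gw, b - lr * gb
```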

The Illustrated GPT-2 (Visualizing Transformer Language Models)

A very nice visual walkthrough of the latest generation of Transformer architectures.


LVIS: A Dataset for Large Vocabulary Instance Segmentation

Facebook announces a new dataset for long-tail instance segmentation: over 1,200 object categories and over 700,000 instances so far.

Libraries & Code

[GitHub] artBoffin/GAN-XML-Fixer

Creating SVG art by training NLP models to produce valid-ish vector graphics.

[GitHub] ShenYujun/InterFaceGAN

InterFaceGAN: Interpreting the Latent Space of GANs for Semantic Face Editing.

Papers & Publications

Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection

Abstract: …In this paper, we propose an interaction mechanism between a teacher and two students to generate more reliable pseudo labels for unlabeled data, which are beneficial to semi-supervised facial landmark detection. Specifically, the two students are instantiated as dual detectors. The teacher learns to judge the quality of the pseudo labels generated by the students and filter out unqualified samples before the retraining stage. In this way, the student detectors get feedback from their teacher and are retrained by premium data generated by itself. Since the two students are trained by different samples, a combination of their predictions will be more robust as the final prediction compared to either prediction…

SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition

Abstract: …We introduce SpatialSense, a dataset specializing in spatial relation recognition which captures a broad spectrum of such challenges, allowing for proper benchmarking of computer vision techniques. SpatialSense is constructed through adversarial crowdsourcing, in which human annotators are tasked with finding spatial relations that are difficult to predict using simple cues such as 2D spatial configuration or language priors. Adversarial crowdsourcing significantly reduces dataset bias and samples more interesting relations in the long tail compared to existing datasets. On SpatialSense, state-of-the-art recognition models perform comparably to simple baselines, suggesting that they rely on straightforward cues instead of fully reasoning about this complex task…