Deep Learning Weekly Issue #121
A (possible) Apple acquisition, a PyTorch hackathon, converting TensorFlow models to PyTorch, and more...
This week in deep learning we bring you a new PyTorch release, a possible acquisition by Apple, over $61,000 in hackathon prizes from Facebook, and a new project from DeepMind to track wildlife in Tanzania.
You may also enjoy a new method for learning temporal characteristics in videos, a guide to converting TensorFlow models to PyTorch, a visual explanation of feedforward and backpropagation, a new long-tail segmentation dataset from Facebook, SVG art generated by NLP models, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Though with less fanfare than some big players, Apple continues to add deep learning talent.
After what feels like years of seeing demos, Google is finally bringing an AR experience to Maps.
The latest release brings improvements to TorchScript, a new Transformer module, and additional support for ONNX.
Facebook announces over $61,000 in prizes as part of the Global PyTorch Hackathon. Submissions are due September 16th.
Politicians in the US are taking the implications of DeepFake technologies seriously.
DeepMind partners with conservationists to track animals in Tanzania's Serengeti National Park.
New research from Google learns the temporal characteristics of videos to recognize actions and even transfer sound.
On-device training with Core ML – part 2
Part two of an in-depth look at on-device model training with Apple’s Core ML framework.
The team behind the popular PyTorch-Transformers repo provides a guide for converting TensorFlow models to PyTorch.
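One common gotcha when porting weights between the two frameworks is layout: TensorFlow stores a dense layer's kernel as (in_features, out_features), while PyTorch's nn.Linear stores its weight as (out_features, in_features), so the kernel must be transposed. A minimal dependency-free sketch of that transposition (the helper name is our own, not from the guide):

```python
# Minimal sketch: porting a TensorFlow-style dense kernel to PyTorch's
# nn.Linear layout. TF stores dense kernels as (in_features, out_features);
# PyTorch nn.Linear weights are (out_features, in_features), so transpose.

def transpose_kernel(tf_kernel):
    """Transpose a 2-D kernel given as a nested list of rows."""
    rows = len(tf_kernel)
    cols = len(tf_kernel[0])
    return [[tf_kernel[r][c] for r in range(rows)] for c in range(cols)]

# A toy (in=3, out=2) TF-style kernel...
tf_kernel = [[1, 2],
             [3, 4],
             [5, 6]]

# ...becomes a (out=2, in=3) PyTorch-style weight matrix.
torch_weight = transpose_kernel(tf_kernel)
print(torch_weight)  # [[1, 3, 5], [2, 4, 6]]
```

The same idea extends to convolutions, where TF's HWIO kernel layout must be permuted to PyTorch's OIHW.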
A primer on the math behind forward and backpropagation including some great visuals.
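The core loop the primer illustrates can be sketched in a few lines of plain Python: a forward pass through one sigmoid neuron, a backward pass via the chain rule, and a gradient-descent update. This toy example is ours, not taken from the primer:

```python
import math

# Toy sketch: one sigmoid neuron trained on a single example with squared
# error, showing a forward pass, a backward pass (chain rule), and one
# gradient-descent update per call.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(w, b, x, y, lr=0.5):
    # Forward pass: z = w*x + b, a = sigmoid(z), loss = (a - y)^2
    z = w * x + b
    a = sigmoid(z)
    loss = (a - y) ** 2

    # Backward pass via the chain rule:
    # dloss/da = 2*(a - y);  da/dz = a*(1 - a);  dz/dw = x;  dz/db = 1
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)
    grad_w = dloss_da * da_dz * x
    grad_b = dloss_da * da_dz

    # Gradient-descent update.
    return w - lr * grad_w, b - lr * grad_b, loss

w, b = 0.0, 0.0
for _ in range(100):
    w, b, loss = step(w, b, x=1.0, y=1.0)
print(loss)  # the loss shrinks toward 0 as training proceeds
```

A full network repeats exactly this pattern layer by layer, with the chain rule threading gradients backward through each one.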
Very nice visualization of the latest generation of Transformer architectures.
Facebook announces a new dataset for long-tail instance segmentation. Over 1200 object types and over 700,000 instances so far.
Libraries & Code
Creating SVG art by training NLP models to produce valid-ish vector graphics.
InterFaceGAN: Interpreting the Latent Space of GANs for Semantic Face Editing.
Papers & Publications
Abstract: …In this paper, we propose an interaction mechanism between a teacher and two students to generate more reliable pseudo labels for unlabeled data, which are beneficial to semi-supervised facial landmark detection. Specifically, the two students are instantiated as dual detectors. The teacher learns to judge the quality of the pseudo labels generated by the students and filters out unqualified samples before the retraining stage. In this way, the student detectors get feedback from their teacher and are retrained on premium data generated by themselves. Since the two students are trained on different samples, a combination of their predictions is more robust as the final prediction than either prediction alone…
Abstract: …We introduce SpatialSense, a dataset specializing in spatial relation recognition which captures a broad spectrum of such challenges, allowing for proper benchmarking of computer vision techniques. SpatialSense is constructed through adversarial crowdsourcing, in which human annotators are tasked with finding spatial relations that are difficult to predict using simple cues such as 2D spatial configuration or language priors. Adversarial crowdsourcing significantly reduces dataset bias and samples more interesting relations in the long tail compared to existing datasets. On SpatialSense, state-of-the-art recognition models perform comparably to simple baselines, suggesting that they rely on straightforward cues instead of fully reasoning about this complex task…