Deep Learning Weekly: Issue #188

SoTA image generation models, PyTorch 1.8, Facebook's new architecture for object detection on mobile, and more

Hey folks,

This week in deep learning, we bring you neurons in OpenAI's CLIP that respond to the same concept whether presented literally, symbolically, or conceptually; a compilation of state-of-the-art image-generating models; the release of PyTorch 1.8; and Facebook’s Detectron2Go for training and deploying efficient deep learning object detection models on mobile devices.

You may also enjoy this Hugging Face blog post that summarizes and compares Long-Range Attention Transformer models, this library for generating synthetic computer vision datasets with Blender, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


PyTorch 1.8 Release, including Compiler and Distributed Training updates, and New Mobile Tutorials

Highlights include updates to the compiler, code optimization, frontend APIs for scientific computing, large-scale training with pipeline and model parallelism, and mobile tutorials.

Algorithm helps artificial intelligence systems dodge “adversarial” inputs

This method builds on gaming techniques to help autonomous vehicles navigate in the real world, where signals may be imperfect.

How to poison the data that Big Tech uses to surveil you

Algorithms are meaningless without good data. The public can exploit that to demand change.

We first need to understand how the brain works if we want true AI

Neuroscientist and tech entrepreneur Jeff Hawkins claims he’s figured out how intelligence works—and he wants every AI lab in the world to know about it. 

Tom Cruise deepfake creator says public shouldn’t be worried about ‘one-click fakes’

Weeks of work and a top impersonator were needed to make the viral clips.

Why you shouldn't expect Tesla's 'Full Self Driving' to come out of beta any time soon

Tesla’s Full Self Driving only works under certain circumstances to perform specific tasks related to driving: it cannot safely perform an end-to-end traversal that requires it to navigate city streets, highways, and parking lots in unknown territory.

Mobile + Edge

D2Go brings Detectron2 to mobile

The Mobile Vision team at Facebook Reality Labs (FRL) is expanding on Detectron2 with the introduction of Detectron2Go (D2Go), a new, state-of-the-art extension for training and deploying efficient deep learning object detection models on mobile devices and hardware.

Build a Cat-or-Dog Classification Flutter App with TensorFlow Lite

Using a pre-trained classification TensorFlow Lite model to build an ML-powered Flutter app.

ESP32 Tensorflow micro speech with the external microphone

This tutorial covers how to use TensorFlow micro speech on an ESP32 with an external I2S microphone.

Accelerating Neural Networks on Mobile and Web with Sparse Inference

Google AI released new sparsity features in the XNNPACK acceleration library that is powering TFLite. Sparse inference improves efficiency without degrading quality in applications like Google Meet's background effects.
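For readers who want an intuition for why sparsity saves work, here is a minimal sketch in plain Python. It stores a weight matrix in compressed sparse row (CSR) form and multiplies it by a vector while touching only the nonzero entries. This is an illustration only, not XNNPACK's actual kernel or data layout; the function names `to_csr` and `csr_matvec` and the example values are made up for this sketch.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # nonzero value
                indices.append(col)   # column of that value
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


def csr_matvec(data, indices, indptr, x):
    """Multiply a CSR matrix by a dense vector, skipping all zeros."""
    result = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        result.append(sum((data[k] * x[indices[k]] for k in range(start, end)), 0.0))
    return result


# Hypothetical 3x4 weight matrix in which 9 of 12 entries are zero.
weights = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
]
x = [1.0, 2.0, 3.0, 4.0]

data, indices, indptr = to_csr(weights)
y = csr_matvec(data, indices, indptr, x)
print(y)
```

The dense product would perform 12 multiplications here, while the CSR version performs 3, one per stored nonzero.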


OpenAI's Multimodal Neurons in Artificial Neural Networks

OpenAI discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.

Hugging Face Reads, Feb. 2021 - Long-range Transformers

This blog post summarizes and compares four approaches to long-range attention in Transformers: custom attention patterns (Longformer), recurrence (Compressive Transformer), low-rank approximations (Linformer), and kernel approximations (Performer).
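To make the first of those ideas concrete, here is a rough sketch of a sliding-window attention pattern in plain Python, where each position attends only to neighbors within a fixed distance. It is a simplified illustration, not Longformer's implementation (which also includes global attention and runs on tensors); the function name `sliding_window_attention` and the `window` parameter are assumptions made for this example.

```python
import math


def sliding_window_attention(queries, keys, values, window):
    """Scaled dot-product attention restricted to a local window.

    Position i attends only to positions j with |i - j| <= window, so the
    number of scores grows as O(n * window) instead of O(n^2).
    """
    n = len(queries)
    dim = len(queries[0])
    outputs = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbors = range(lo, hi)
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in neighbors
        ]
        # Numerically stable softmax over the local window only.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[j][d] for w, j in zip(weights, neighbors))
            for d in range(len(values[0]))
        ])
    return outputs


# Toy sequence of 4 positions with 2-dimensional vectors (made-up numbers).
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
local = sliding_window_attention(seq, seq, seq, window=1)
full = sliding_window_attention(seq, seq, seq, window=len(seq))
```

With `window=0` each position sees only itself and the output equals `values`; with a window at least as wide as the sequence, the result matches ordinary full attention.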

State-of-the-Art Image Generative Models

In this blog post, the author aggregated some of the SotA image generative models released recently, with short summaries, visualizations, and comments.

AI Dubs Over Subs? Translating and Dubbing Videos with AI

In this post, the authors show how to use machine learning to transcribe, translate, and voice-act videos from one language to another.

PAIRED: A New Multi-agent Approach for Adversarial Environment Generation

Check out PAIRED, a new multi-agent approach for Reinforcement Learning that tunes the difficulty of a simulated environment in order to create an automatic curriculum of increasingly challenging training tasks.


[GitHub] ZumoLabs/zpy

Synthetic data in Blender.

[GitHub] mit-han-lab/anycost-gan

[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing.


Do Transformer Modifications Transfer Across Implementations and Applications?

Abstract: The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance. Furthermore, most of the Transformer variants we found beneficial were either developed in the same codebase that we used or are relatively minor changes. We conjecture that performance improvements may strongly depend on implementation details and correspondingly make some recommendations for improving the generality of experimental results.

Self-supervised Pretraining of Visual Features in the Wild

Abstract: Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a control environment, that is the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore if self-supervision lives to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real world setting. Interestingly, we also observe that self-supervised models are good few-shot learners achieving 77.9% top-1 with access to only 10% of ImageNet.