Deep Learning Weekly: Issue #188
SoTA image generation models, PyTorch 1.8, Facebook's new architecture for object detection on mobile, and more
This week in deep learning, we bring you neurons in OpenAI's CLIP that respond to the same concept whether it is presented literally, symbolically, or conceptually; a compilation of state-of-the-art image generation models; the release of PyTorch 1.8; and Facebook’s Detectron2Go for training and deploying efficient deep learning object detection models on mobile devices.
You may also enjoy this Hugging Face blog post that summarizes and compares Long-Range Attention Transformer models, this library for generating synthetic computer vision datasets with Blender, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Highlights of the PyTorch 1.8 release include updates to the compiler, code optimization, frontend APIs for scientific computing, large-scale training for pipeline and model parallelism, and Mobile tutorials.
This method builds on gaming techniques to help autonomous vehicles navigate in the real world, where signals may be imperfect.
Algorithms are meaningless without good data. The public can exploit that to demand change.
Neuroscientist and tech entrepreneur Jeff Hawkins claims he’s figured out how intelligence works—and he wants every AI lab in the world to know about it.
Weeks of work and a top impersonator were needed to make the viral clips.
Tesla’s Full Self Driving only works under certain circumstances to perform specific tasks related to driving: it cannot safely perform an end-to-end traversal that requires it to navigate city streets, highways, and parking lots in unknown territory.
Mobile + Edge
The Mobile Vision team at Facebook Reality Labs (FRL) is expanding on Detectron2 with the introduction of Detectron2Go (D2Go), a new, state-of-the-art extension for training and deploying efficient deep learning object detection models on mobile devices and hardware.
Using a pre-trained classification TensorFlow Lite model to build an ML-powered Flutter app.
This tutorial covers how to run the TensorFlow micro speech example on an ESP32 with an external I2S microphone.
Google AI released new sparsity features in the XNNPACK acceleration library that is powering TFLite. Sparse inference improves efficiency without degrading quality in applications like Google Meet's background effects.
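To make the efficiency claim concrete, here is a minimal sketch of why sparse inference saves work: zero weights are dropped up front, so the multiply-accumulate loop only touches the weights that remain. This is plain illustrative Python, not XNNPACK or TFLite code; the helper names `to_sparse_rows` and `sparse_matvec` are hypothetical, and the storage layout is an assumption chosen for readability.

```python
# Illustrative only: hypothetical helpers, not the XNNPACK or TFLite API.

def to_sparse_rows(dense):
    """Keep only the nonzero weights of each row as (column, weight) pairs."""
    return [[(j, w) for j, w in enumerate(row) if w != 0] for row in dense]


def sparse_matvec(sparse_rows, x):
    """Multiply a sparse weight matrix by vector x, skipping zero weights."""
    return [sum(w * x[j] for j, w in row) for row in sparse_rows]


def count_multiplies(sparse_rows):
    """Number of multiply-accumulate operations the sparse product performs."""
    return sum(len(row) for row in sparse_rows)


if __name__ == "__main__":
    weights = [
        [0, 2, 0],
        [1, 0, 3],
    ]
    x = [1, 2, 3]
    rows = to_sparse_rows(weights)
    print(sparse_matvec(rows, x))   # same result as the dense product
    print(count_multiplies(rows))   # 3 multiplies instead of 6
```

The result matches the dense product, but the work scales with the number of nonzero weights rather than with the full matrix size.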
OpenAI discovered neurons in CLIP that respond to the same concept whether presented literally, symbolically, or conceptually. This may explain CLIP’s accuracy in classifying surprising visual renditions of concepts, and is also an important step toward understanding the associations and biases that CLIP and similar models learn.
This blog post summarizes and compares Long-Range Attention in four select Transformer architectures: custom attention patterns (with Longformer), recurrence (with Compressive Transformer), low-rank approximations (with Linformer), and kernel approximations (with Performer).
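As a rough illustration of the first family, custom attention patterns, the sketch below builds a local sliding-window attention mask in which each token attends only to nearby positions. It is a simplified assumption of the general idea, not Longformer's implementation (which also combines local windows with other patterns); the function names and the `window` parameter are hypothetical.

```python
# Illustrative only: a simplified local-attention mask, not Longformer's code.

def sliding_window_mask(seq_len, window):
    """Return a seq_len x seq_len mask; True means query i may attend to key j.

    Each position sees itself plus window // 2 neighbors on each side.
    """
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]


def count_attended_pairs(mask):
    """Count the query-key pairs the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    seq_len = 8
    mask = sliding_window_mask(seq_len, window=2)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print(count_attended_pairs(mask), "of", seq_len * seq_len, "pairs attended")
```

Full attention scores every one of the seq_len squared pairs, while the windowed mask allows a number of pairs that grows linearly with sequence length for a fixed window, which is the motivation for this family of methods.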
In this blog post, the author collects some recently released SotA image generation models, with short summaries, visualizations, and comments.
In this post, the author shows how to use machine learning to transcribe, translate, and voice-act videos from one language to another.
Check out PAIRED, a new multi-agent approach for Reinforcement Learning that tunes the difficulty of a simulated environment in order to create an automatic curriculum of increasingly challenging training tasks.
Synthetic data in Blender.
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing.
Abstract: The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, we comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, we find that most modifications do not meaningfully improve performance. Furthermore, most of the Transformer variants we found beneficial were either developed in the same codebase that we used or are relatively minor changes. We conjecture that performance improvements may strongly depend on implementation details and correspondingly make some recommendations for improving the generality of experimental results.
Abstract: Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a control environment, that is the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore if self-supervision lives to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real world setting. Interestingly, we also observe that self-supervised models are good few-shot learners achieving 77.9% top-1 with access to only 10% of ImageNet.