Deep Learning Weekly Issue #184
Replicating human DNA with GANs, deep learning in the search for dark matter, AR's pandemic boost, and more
This week in deep learning we bring you fake human DNA sequences made by a GAN, Google's new AI-powered heart and breathing monitors for Pixel phones, deep residual neural networks used to find potential gravitational lenses in the search for dark matter, and the code that powers Google's Colorization Transformer.
You may also enjoy these papers titled baller2vec: A Multi-Entity Transformer For Multi-Agent Spatiotemporal Modeling, DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
This is how we lost control of our faces
The largest ever study of facial-recognition data shows how much the rise of deep learning has fueled a loss of privacy.
AI detects 1,210 potential gravitational lenses in the search for dark matter
The lenses were spotted by deep residual neural networks.
Machine-learning model helps determine protein structures
Use of a new neural network architecture reveals many possible conformations that a protein may take.
This human genome does not exist: Researchers taught an AI to generate fake DNA
A team of researchers from Estonia has developed a machine learning system capable of generating unique genome sequences.
55 Researchers From 44 Institutions Propose GEM, a ‘Living Benchmark’ for NLG
To track progress in natural language generation (NLG) models, 55 researchers from more than 40 prestigious institutions have proposed GEM (Generation, Evaluation, and Metrics), a "living benchmark" NLG evaluation environment.
Mobile + Edge
Google announces new AI-powered heart and breathing monitors for Pixel phones
The features use computer vision and sensors to make the measurements.
Augmented Reality Gets Pandemic Boost
Whether to fix cars or apply makeup, companies hamstrung by the pandemic found immediate uses for the distanced expertise AR technology can provide.
RSL10 Smart Shot Camera from ON Semiconductor Enables Event Triggered Imaging with AI
Ultra-low-power platform based on RSL10 SIP and ARX3A0 brings automatic image recognition to the IoT.
Optimizing a low-cost camera for machine vision
In this deep dive article, performance optimization specialist Larry Bank (a.k.a. The Performance Whisperer) takes a look at the work he did for the Arduino team on the latest version of the Arduino_OV767x library.
Hugging Face on PyTorch / XLA TPUs: Faster and cheaper training
This blog post provides an overview of changes made in the Hugging Face library, what the PyTorch / XLA library does, an example to get you started training your favorite transformers on Cloud TPUs, and some performance benchmarks.
Georgia Tech & Facebook Tensor Train Approach Achieves 112x Size Reduction in DL Recommendation Models
A new study by the Georgia Institute of Technology and Facebook AI introduces TT-Rec, a way to drastically compress the size of memory-intensive Deep Learning Recommendation Models (DLRM) and make them easier to deploy at scale.
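For a rough sense of why a tensor-train (TT) factorization shrinks an embedding table, the sketch below counts parameters for a full table and for a TT version of it. It assumes the standard TT-matrix layout, where core k has shape (r_k, m_k, n_k, r_{k+1}). The table size, factorization, and ranks are hypothetical numbers chosen for illustration, not the configuration or results reported for TT-Rec.

```python
from math import prod
from typing import Sequence


def full_param_count(row_factors: Sequence[int], col_factors: Sequence[int]) -> int:
    """Parameters in a dense table with prod(row_factors) rows and prod(col_factors) columns."""
    return prod(row_factors) * prod(col_factors)


def tt_param_count(
    row_factors: Sequence[int], col_factors: Sequence[int], ranks: Sequence[int]
) -> int:
    """Parameters in a TT-matrix whose k-th core has shape
    (ranks[k], row_factors[k], col_factors[k], ranks[k + 1])."""
    if len(row_factors) != len(col_factors):
        raise ValueError("row and column factor lists must have the same length")
    if len(ranks) != len(row_factors) + 1:
        raise ValueError("ranks needs one more entry than there are cores")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must both be 1")
    return sum(
        ranks[k] * row_factors[k] * col_factors[k] * ranks[k + 1]
        for k in range(len(row_factors))
    )


# Hypothetical example: 1,000,000 rows x 64 columns, split into three cores.
ROWS = (100, 100, 100)   # 100 * 100 * 100 = 1,000,000 embedding rows
COLS = (4, 4, 4)         # 4 * 4 * 4 = 64 embedding dimensions
RANKS = (1, 16, 16, 1)   # assumed TT ranks

dense = full_param_count(ROWS, COLS)
compressed = tt_param_count(ROWS, COLS, RANKS)
print(f"dense: {dense}, tensor-train: {compressed}, ratio: {dense / compressed:.1f}x")
```

The ratio depends heavily on the chosen ranks and factorization: lower ranks shrink the model further but leave less capacity to represent the original table.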
TracIn — A Simple Method to Estimate Training Data Influence
Researchers at Google AI developed TracIn, a simple, scalable approach to understanding the influence of training data on ML models by tracing the changes in prediction as individual training examples are visited during training.
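To make the idea concrete, here is a minimal sketch of a checkpoint-based influence score in the spirit of TracIn, as we understand it: at each saved checkpoint, take the dot product of the loss gradient at a training example and the loss gradient at a test example, scale it by the learning rate, and sum over checkpoints. The toy linear model, squared loss, and all function names below are assumptions made for illustration, not Google's implementation.

```python
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (features, target)


def grad_squared_loss(w: Sequence[float], example: Example) -> List[float]:
    """Gradient of 0.5 * (w . x - y)^2 with respect to w."""
    x, y = example
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def train_with_checkpoints(
    data: Sequence[Example], lr: float, epochs: int
) -> List[List[float]]:
    """Plain SGD on a linear model; saves the initial weights and the weights after each epoch."""
    w = [0.0] * len(data[0][0])
    checkpoints = [list(w)]
    for _ in range(epochs):
        for example in data:
            g = grad_squared_loss(w, example)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
        checkpoints.append(list(w))
    return checkpoints


def tracin_influence(
    checkpoints: Sequence[Sequence[float]],
    lr: float,
    train_example: Example,
    test_example: Example,
) -> float:
    """Sum over checkpoints of lr * <grad loss(train), grad loss(test)>."""
    total = 0.0
    for w in checkpoints:
        g_train = grad_squared_loss(w, train_example)
        g_test = grad_squared_loss(w, test_example)
        total += lr * sum(a * b for a, b in zip(g_train, g_test))
    return total
```

In this toy setup, a positive score means that gradient steps on the training example tend to reduce the test example's loss, and a negative score means they tend to increase it. A training example whose features are orthogonal to the test example's features scores zero.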
Staying Abreast of Changing Language: DeepMind Explores Temporal Generalization in Language Models
In a bid to solve the temporal generalization problem of modern language models, a team of DeepMind researchers proposes developing adaptive language models that remain up to date in our ever-changing world.
Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning
Google AI introduces the World Models Library, an open-source, platform-agnostic suite of tasks and tools for examining world model design and performance in visual model-based reinforcement learning.
Source code accompanying the paper Colorization Transformer to be presented at ICLR 2021. Work by Manoj Kumar, Dirk Weissenborn and Nal Kalchbrenner.
Joint detection and tracking model named DEFT, or “Detection Embeddings for Tracking.”
Papers & Publications
baller2vec: A Multi-Entity Transformer For Multi-Agent Spatiotemporal Modeling
Abstract: Multi-agent spatiotemporal modeling is a challenging task from both an algorithmic design and computational complexity perspective. Recent work has explored the efficacy of traditional deep sequential models in this domain, but these architectures are slow and cumbersome to train, particularly as model size increases. Further, prior attempts to model interactions between agents across time have limitations, such as imposing an order on the agents, or making assumptions about their relationships. In this paper, we introduce baller2vec, a multi-entity generalization of the standard Transformer that, with minimal assumptions, can simultaneously and efficiently integrate information across entities and time. We test the effectiveness of baller2vec for multi-agent spatiotemporal modeling by training it to perform two different basketball-related tasks: (1) simultaneously forecasting the trajectories of all players on the court and (2) forecasting the trajectory of the ball. Not only does baller2vec learn to perform these tasks well, it also appears to "understand" the game of basketball, encoding idiosyncratic qualities of players in its embeddings, and performing basketball-relevant functions with its attention heads.
DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech
Abstract: With the number of smart devices increasing, the demand for on-device text-to-speech (TTS) is growing rapidly. In recent years, many prominent end-to-end TTS methods have been proposed, and they have greatly improved the quality of synthesized speech. However, to ensure speech quality, most TTS systems depend on large and complex neural network models, which are hard to deploy on-device. In this paper, a small-footprint, fast, stable network for on-device TTS, named DeviceTTS, is proposed. DeviceTTS uses a duration predictor as a bridge between the encoder and decoder to avoid the word skipping and repeating seen in Tacotron. Model size is a key factor for on-device TTS, so DeviceTTS uses the Deep Feedforward Sequential Memory Network (DFSMN) as its basic component. Moreover, to speed up inference, a mix-resolution decoder is proposed to balance inference speed and speech quality. Experiments are conducted with the WORLD and LPCNet vocoders. Finally, with only 1.4 million model parameters and 0.099 GFLOPS, DeviceTTS achieves performance comparable to Tacotron and FastSpeech. As far as we know, DeviceTTS can meet the needs of most devices in practical applications.
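The abstract does not spell out how the duration predictor bridges the encoder and decoder. One common pattern, used in FastSpeech-style models, is length regulation: each encoder state is repeated for its predicted number of output frames, so every input token is covered exactly once and in order. The sketch below illustrates that pattern only. The function name and the hard-coded durations are hypothetical, and DeviceTTS itself may work differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_by_duration(encoder_states: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Repeat each encoder state by its predicted duration (in decoder frames).

    Because the alignment is fixed by the durations, no token can be skipped
    and none can be attended to twice out of order.
    """
    if len(encoder_states) != len(durations):
        raise ValueError("need exactly one duration per encoder state")
    frames: List[T] = []
    for state, duration in zip(encoder_states, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        frames.extend([state] * duration)
    return frames


# Hypothetical usage: three phoneme states with assumed predicted durations.
phoneme_states = ["HH", "AY", "IY"]
predicted_durations = [2, 1, 3]
decoder_inputs = expand_by_duration(phoneme_states, predicted_durations)
print(decoder_inputs)
```

In a real model the states would be vectors and the durations would come from a trained predictor. Strings stand in here so the expansion is easy to read.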