Deep Learning Weekly Issue #181

NASA's AI for Mars crater detection, applying NLP to viral genetics, sign language recognition techniques, & more

Hey folks,

This week in deep learning, we bring you NASA's AI for detecting fresh craters on Mars, a visual history of model interpretation for image recognition, the application of NLP algorithms to genetic information in viruses, and an article on detecting COVID with edge AI and wearable biosensors.

You may also enjoy exploring sign language recognition techniques with ML, Google's Pr-VIPE (a new approach to pose perception that uses probabilistic embeddings), and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

NASA Is Training an AI to Detect Fresh Craters on Mars

An algorithm discovered dozens of Martian craters. It’s a promising remote method for exploring our solar system and understanding planetary history.

AIs that read sentences are now catching coronavirus mutations

NLP algorithms designed for words and sentences can also be used to interpret genetic changes in viruses—speeding up lab work to spot new variants.

Jumbled-up sentences show that AIs still don’t really understand language

They also reveal an easy way to make them better.

Stanford AI scholar Fei-Fei Li writes about humility in tech

Stanford professor Fei-Fei Li says a culture of transparency, openness, and respect will lead to breakthroughs that help society.

Google launches suite of AI-powered solutions for retailers

Google announced the launch of Product Discovery Solutions for Retail, a suite of services designed to enhance retailers’ ecommerce capabilities and help them deliver personalized customer experiences.

Mobile + Edge

Wearables Provide Speedy COVID Screening

Using edge AI and wearable biosensors, an app detects COVID within two minutes—even in asymptomatic patients.

Simple Audio Recognition on a Raspberry Pi using Machine Learning (I2S, TensorFlow Lite)

This post shows you how to adapt the official TensorFlow simple audio recognition example to use live audio data from an I2S microphone on a Raspberry Pi.

Neural Network Quantization: Research Review 2020

A roundup of the latest advances and insights for neural network compression with quantization.

Learning

A Visual History of Interpretation for Image Recognition

In this piece, the authors give an overview of the interpretation methods invented for image recognition, discuss their tradeoffs, and provide examples and code you can try yourself using Gradio.

Recognizing Pose Similarity in Images and Videos

Pr-VIPE is a new approach to pose perception that uses view-invariant probabilistic embeddings to handle the ambiguity that arises when 3D poses are projected into 2D. The model is simple and compact, and can be trained in ~1 day on CPUs.

Max Planck Institute & Facebook Model Performs Human Re-Rendering From a Single Image

In a new paper, researchers from the Max Planck Institute for Informatics and Facebook Reality Labs propose an end-to-end trainable method that enables re-rendering of humans from a single image.

ToTTo: A Controlled Table-to-Text Generation Dataset

ToTTo from Google AI is an open domain table-to-text generation dataset, a challenging benchmark for high-precision text generation, and a controlled text generation task that can be used to assess model “hallucination”.

Exploring Sign Language Recognition techniques with Machine Learning

Understanding Indian Sign Language recognition and its techniques with a focus on the state-of-the-art hierarchical neural network approach.

Code

[GitHub] VITA-Group/TENAS

[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang.

[GitHub] PaddlePaddle/PaddleSeg

End-to-end image segmentation kit based on PaddlePaddle.

Papers & Publications

VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency

Abstract: We introduce a new approach for audio-visual speech separation. Given a video, the goal is to extract the speech associated with a face in spite of simultaneous background sounds and/or other human speakers. Whereas existing methods focus on learning the alignment between the speaker's lip movements and the sounds they generate, we propose to leverage the speaker's face appearance as an additional prior to isolate the corresponding vocal qualities they are likely to produce. Our approach jointly learns audio-visual speech separation and cross-modal speaker embeddings from unlabeled video. It yields state-of-the-art results on five benchmark datasets for audio-visual speech separation and enhancement, and generalizes well to challenging real-world videos of diverse scenarios. Video results and code are linked from the paper.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Abstract: In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageous numbers of parameters -- but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs and training instability -- we address these with the Switch Transformer. We simplify the MoE routing algorithm and design intuitive improved models with reduced communication and computational costs. Our proposed training techniques help wrangle the instabilities and we show large sparse models may be trained, for the first time, with lower precision (bfloat16) formats. We design models based off T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources. These improvements extend into multilingual settings where we measure gains over the mT5-Base version across all 101 languages. Finally, we advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.
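For readers who want a feel for the routing idea in the abstract above, here is a minimal sketch, not the authors' implementation. It assumes the simplified routing means sending each token to its single highest-probability expert and scaling that expert's output by the router probability. The names `switch_route`, `router_weights`, and `experts`, and all the toy values, are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(token, router_weights, experts):
    """Route one token to a single expert (assumed top-1 routing)."""
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Only the chosen expert runs, so compute per token stays constant
    # no matter how many experts (and parameters) exist.
    output = [probs[best] * y for y in experts[best](token)]
    return best, probs[best], output


# Toy setup (hypothetical values): two "experts" over 2-dimensional tokens.
experts = [
    lambda t: [2.0 * x for x in t],  # expert 0 doubles each feature
    lambda t: [x + 1.0 for x in t],  # expert 1 adds one to each feature
]
router_weights = [[1.0, 0.0], [0.0, 1.0]]

expert_id, gate, out = switch_route([3.0, 1.0], router_weights, experts)
print(expert_id, round(gate, 3), [round(v, 3) for v in out])
```

Adding more entries to `experts` grows the parameter count, but each token still touches exactly one expert, which is the "sparsely-activated" property the abstract describes. A real model would also need batching, expert capacity limits, and load balancing, all omitted here.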