Deep Learning Weekly Issue #160
The AI in Duolingo's app, building a Snap Lens with a custom ML model, another Apple AI acquisition, and more
This week in deep learning, we bring you this article on how Duolingo uses AI in every part of its app, this tool that lets users edit GANs with simple copy-and-paste commands, this AI vs. human virtual dogfight, and Apple's secret acquisition of Camerai.
You may also enjoy this post I wrote about building a Snapchat lens that adds color to a face mask, Google AI's progress towards better understanding deep learning on non-synthetic noisy labels, Adobe's AI-powered Character Animator features, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
How Duolingo uses AI in every part of its app
This article is a close look at how Duolingo uses AI in all aspects of the app, including the AI behind Stories, Smart Tips, podcasts, reports, and even notifications.
Adobe launches AI-powered Character Animator features in beta
Several new Adobe Character Animator features are powered by Sensei, Adobe’s cross-platform machine learning technology, and leverage algorithms to generate animation from recorded speech and align mouth movements for speaking parts.
Apple secretly acquired Israeli photography startup Camerai
The AR and computer vision company was purchased a year and a half ago, an example of the tech giant’s strategy of silently buying up small tech companies.
A Dogfight Renews Concerns About AI's Lethal Potential
Alphabet's DeepMind pioneered reinforcement learning. A California company used it to create an algorithm that defeated an F-16 pilot in a simulation.
Rewriting the rules of machine-generated art
An artificial intelligence tool lets users edit generative adversarial network models with simple copy-and-paste commands.
Mobile + Edge
Building a Custom Face Mask Snapchat Lens with Fritz AI and Lens Studio
Disclaimer: I wrote this one. In this post, I explain the end-to-end process of making a Snapchat Lens that uses a segmentation model trained with Fritz AI Studio to add color to face masks.
Hoka One One opens virtual pop-up shop with Snapchat AR
Hoka One One, the maker of athletic apparel owned by Deckers Brands, is offering a mobile shopping experience on Snapchat that includes an AR pop-up store and virtual try-on of running shoes.
Run image classification in a containerized environment with BalenaCloud
In this tutorial, Edge Impulse created an image classification system for Raspberry Pi users that can distinguish between different types of shoes: flip flops, sneakers, or running shoes.
Deep Learning on MCUs is the Future of Edge Computing
Today, processors capable of delivering many TOPS are not required to perform ML. In an increasing number of cases, the latest microcontrollers, some with embedded ML accelerators, can bring ML to edge devices.
This article covers the basics of one-shot learning from a computer vision perspective.
Understanding Deep Learning on Controlled Noisy Labels
To better understand the impact of noisy labels on Machine Learning model training, Google AI is announcing MentorMix, a new method to mitigate the impact of noisy labels, as well as a benchmark and dataset on real-world label noise.
Language-Agnostic BERT Sentence Embedding
Introducing LaBSE, a multilingual BERT model for the generation of cross-lingual sentence embeddings that exhibits exceptional performance for both high- and low-resource languages.
Tackling Open Challenges in Offline Reinforcement Learning
To address the challenges unique to offline Reinforcement Learning, Google AI is releasing an open-source benchmark, D4RL, as well as a simple and effective offline RL algorithm, called conservative Q-learning (CQL).
Libraries & Code
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supported languages include English, Korean, and Chinese).
Relatively simple 3D monocular object detection pipeline written in TensorFlow 2.
Recently, contrastive learning approaches have significantly advanced the SoTA of unsupervised (visual) representation learning. This repo contains a PyTorch implementation of a set of (improved) SoTA methods using the same training and evaluation pipeline.
Papers & Publications
CosyPose: Consistent multi-view multi-object 6D pose estimation
Abstract: We introduce an approach for recovering the 6D pose of multiple known objects in a scene captured by a set of input images with unknown camera viewpoints. First, we present a single-view single-object 6D pose estimation method, which we use to generate 6D object pose hypotheses. Second, we develop a robust method for matching individual 6D object pose hypotheses across different input images in order to jointly estimate camera viewpoints and 6D poses of all objects in a single consistent scene. Our approach explicitly handles object symmetries, does not require depth measurements, is robust to missing or incorrect object hypotheses, and automatically recovers the number of objects in the scene. Third, we develop a method for global scene refinement given multiple object hypotheses and their correspondences across views. This is achieved by solving an object-level bundle adjustment problem that refines the poses of cameras and objects to minimize the reprojection error in all views. We demonstrate that the proposed method, dubbed CosyPose, outperforms current state-of-the-art results for single-view and multi-view 6D object pose estimation by a large margin on two challenging benchmarks: the YCB-Video and T-LESS datasets. Code and pre-trained models are available on the project webpage.
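For readers unfamiliar with the term, here is a minimal sketch of what "reprojection error" means for a single pinhole camera. It is a generic illustration, not the CosyPose objective, which works at the object level and handles symmetries. The function names `project` and `reprojection_error`, the intrinsics `K`, and the sample points are all assumptions made for this example.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points into pixels with a pinhole camera (K, R, t)."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uvw = cam @ K.T                    # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def reprojection_error(points_3d, observed_2d, K, R, t):
    """Sum of squared pixel distances between projected and observed points."""
    diff = project(points_3d, K, R, t) - observed_2d
    return float(np.sum(diff ** 2))

# Hypothetical camera and scene, chosen only for illustration.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t_true = np.array([0.0, 0.0, 5.0])
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])

observed = project(points, K, R, t_true)

# A correct pose gives zero error; a slightly wrong pose gives a positive one.
err_true = reprojection_error(points, observed, K, R, t_true)
err_off = reprojection_error(points, observed, K, R, np.array([0.1, 0.0, 5.0]))
```

Bundle adjustment, in general, searches over camera (and here object) poses for the values that make this kind of error as small as possible across all views.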
Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems
Abstract: Robust speaker recognition, including in the presence of malicious attacks, is becoming increasingly important and essential, especially due to the proliferation of several smart speakers and personal agents that interact with an individual's voice commands to perform diverse, and even sensitive tasks. Adversarial attack is a recently revived domain which is shown to be effective in breaking deep neural network-based classifiers, specifically, by forcing them to change their posterior distribution by only perturbing the input samples by a very small amount. Although significant progress in this realm has been made in the computer vision domain, advances within speaker recognition are still limited. The present expository paper considers several state-of-the-art adversarial attacks to a deep speaker recognition system, employing strong defense methods as countermeasures, and reporting on several ablation studies to obtain a comprehensive understanding of the problem. The experiments show that the speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to even 0%. The study also compares the performances of the employed defense methods in detail, and finds adversarial training based on Projected Gradient Descent (PGD) to be the best defense method in our setting. We hope that the experiments presented in this paper provide baselines that can be useful for the research community interested in further studying adversarial robustness of speaker recognition systems.
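To make the idea of a PGD attack concrete, here is a minimal sketch on a toy logistic-regression classifier. It is not the paper's setup: the model, the weights `w` and `b`, the input `x`, and the values of `eps`, `step`, and `iters` are all assumptions chosen for illustration. The loop takes small signed-gradient steps that increase the loss, then projects the input back into an L-infinity ball of radius `eps` around the original.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(np.dot(w, x) + b)
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w
    return loss, grad_x

def pgd_attack(x, y, w, b, eps=0.1, step=0.02, iters=20):
    """Projected Gradient Descent attack under an L-infinity budget eps."""
    x_adv = x.copy()
    for _ in range(iters):
        _, g = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + step * np.sign(g)         # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv

# Hypothetical toy model and input, for illustration only.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.2, -0.1, 0.3])
y = 1.0

x_adv = pgd_attack(x, y, w, b)
clean_loss, _ = loss_and_grad(x, y, w, b)
adv_loss, _ = loss_and_grad(x_adv, y, w, b)
```

Adversarial training, in its usual form, generates perturbed inputs like `x_adv` during training and fits the model on them so that small perturbations stop changing its predictions.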