Deep Learning Weekly Issue #122

Turi Create 5.7, human annotators, “fear” in Rekognition, pop music generators, and more...

Hey folks,

This week in deep learning we bring you an update to Apple’s Turi Create framework, PyTorch-Transformers’ implementation of RoBERTa, Amazon’s addition of “fear” to Rekognition, and a New York Times look at the humans behind data annotation.

You may also enjoy creating pop music with transformers, an improvement to the Adam optimizer, an on-device hand pose model, a TensorFlow implementation of U-GAT-IT, and a neat pair of AI-powered glasses.

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


Amazon says it can detect fear on your face

Amazon adds “fear” to the list of emotions detected by its Rekognition API, sparking concerns over accuracy and bias given that law enforcement agencies use the service.

Apple releases Turi Create 5.7

The latest update to the high-level training tool includes beta support for one-shot object detection.

PyTorch-Transformers 1.1.0 Released

The latest release of PyTorch-Transformers brings support for Facebook’s RoBERTa model.

Nvidia breaks records in training and inference for real-time conversational AI

Nvidia trains BERT-Large in 53 minutes and, just because, an 8.3 billion parameter GPT-2-style language model.

A.I. Is Learning From Humans. Many Humans.

The New York Times provides an in-depth profile of the thousands of human labelers who create the datasets companies use to train their models.


Creating a Pop Music Generator with the Transformer

Applying the Transformer architecture to generate sequences of musical notation for virtual instruments to play.

SignalTrain: Profiling Audio Compressors with Deep Neural Networks

Paper and code for training neural networks to emulate the behavior of audio dynamic range compressors.

New State of the Art AI Optimizer: Rectified Adam (RAdam). Improve your AI accuracy instantly versus Adam, and why it works.

A look at a promising variant of the popular Adam optimizer that improves accuracy and makes training less sensitive to the choice of learning rate.
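To make the idea concrete, here is a minimal scalar sketch of the RAdam update as we understand it from the RAdam paper (Liu et al., 2019). It keeps Adam's moment estimates, but while the approximated simple-moving-average length rho_t is too small (rho_t <= 4) it skips the adaptive step and falls back to plain momentum. After that it scales the adaptive step by a rectification term r_t that starts small and grows toward 1. The names `rho` and `radam_minimize` and the toy quadratic are our own illustration, not the article's or any library's API. Treat the exact threshold as an assumption, since implementations may choose the cutoff slightly differently.

```python
import math
from typing import Callable


def rho(t: int, beta2: float) -> float:
    """Approximated SMA length rho_t at step t (hypothetical helper name)."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    return rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)


def radam_minimize(
    grad: Callable[[float], float],
    x0: float,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    steps: int = 1000,
) -> float:
    """Minimize a 1-D function with a scalar RAdam-style update, given its gradient."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        # Exponential moving averages of the gradient and squared gradient, as in Adam.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1**t)  # bias-corrected first moment
        rho_t = rho(t, beta2)
        if rho_t > 4.0:
            # Variance of the adaptive learning rate is tractable: rectify it.
            v_hat = math.sqrt(v / (1.0 - beta2**t))
            r_t = math.sqrt(
                (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
            )
            x -= lr * r_t * m_hat / (v_hat + eps)
        else:
            # Earliest steps: un-adapted momentum SGD, no second-moment scaling.
            x -= lr * m_hat
    return x
```

For example, `radam_minimize(lambda x: 2 * (x - 3.0), x0=0.0, lr=0.1)` drives x toward 3, the minimum of (x - 3)^2. The small r_t in early steps is what damps the large, poorly estimated adaptive steps that motivate the warmup heuristic in plain Adam.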

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

A new technique from Google decreases error rates on speaker diarization by a factor of 10.

A 2019 Guide to Deep Learning-Based Image Compression

An overview of DL-based compression techniques.

On-Device, Real-Time Hand Tracking with MediaPipe

Google releases a hand pose tracking system powered by TensorFlow Lite and its new MediaPipe framework.

Libraries & Code

[GitHub] taki0112/UGATIT

The official TensorFlow implementation of U-GAT-IT, a new GAN architecture for unsupervised image-to-image translation.

[GitHub] nickbild/shaides

Effect change in your surroundings by wearing these AI-enabled glasses.

[GitHub] taokong/FoveaBox

FoveaBox: Beyond Anchor-based Object Detector

Papers & Publications

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

Abstract: … We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. Our model leverages the structure of SQL queries to significantly reduce the output space of generated queries. Moreover, we use rewards from in-the-loop query execution over the database to learn a policy to generate unordered parts of the query, which we show are less suitable for optimization via cross entropy loss. In addition, we will publish WikiSQL, a dataset of 80,654 hand-annotated examples of questions and SQL queries distributed across 24,241 tables from Wikipedia. This dataset is required to train our model and is an order of magnitude larger than comparable datasets…
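The abstract's key idea is to reward a generated query by executing it rather than comparing it token by token. That can be sketched with Python's built-in `sqlite3` module. The three reward levels (-2 for a query that fails to execute, -1 for a valid query with the wrong result, +1 for the correct result) mirror the scheme we understand the paper to use. The `players` table, `make_demo_db`, and `execution_reward` are hypothetical illustrations, and the policy network that would produce the queries is left out entirely.

```python
import sqlite3
from collections import Counter


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny hypothetical table to execute queries against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO players VALUES (?, ?, ?)",
        [("Ann", "Lakers", 25), ("Bob", "Lakers", 12), ("Cy", "Celtics", 30)],
    )
    return conn


def execution_reward(conn: sqlite3.Connection, generated_sql: str, gold_sql: str) -> float:
    """Score a generated query by executing it and comparing its rows with the gold query's."""
    try:
        predicted = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return -2.0  # not a valid SQL query (syntax or schema error)
    expected = conn.execute(gold_sql).fetchall()
    # Compare as multisets so that row order does not matter.
    return 1.0 if Counter(predicted) == Counter(expected) else -1.0
```

Because the reward compares execution results, a query whose WHERE conditions appear in a different order from the gold query still earns +1. Token-level cross entropy would penalize exactly this kind of harmless reordering.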

Explicit Shape Encoding for Real-Time Instance Segmentation

Abstract: In this paper, we propose a novel top-down instance segmentation framework based on explicit shape encoding, named ESE-Seg. It largely reduces the computational consumption of instance segmentation by explicitly decoding multiple object shapes with tensor operations, and thus performs instance segmentation at almost the same speed as object detection. ESE-Seg is based on a novel shape signature, the Inner-center Radius (IR), Chebyshev polynomial fitting, and strong modern object detectors. ESE-Seg with YOLOv3 outperforms Mask R-CNN on Pascal VOC 2012 at mAP^r@0.5 while running 7 times faster.
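ESE-Seg describes an object contour as a radius-versus-angle signature measured from an inner center. It then compresses that signature into a handful of Chebyshev coefficients that a detector can regress. The sketch below shows only the compression step, under our own assumptions. The contour is given as an analytic radius function standing in for rays cast from the inner center. The coefficients come from Chebyshev interpolation at Chebyshev nodes, which may differ from the fitting procedure the paper actually uses. `encode_shape` and `decode_radius` are hypothetical names.

```python
import math
from typing import Callable, List


def encode_shape(radius: Callable[[float], float], n_coeffs: int) -> List[float]:
    """Compress a radius-vs-angle signature into n_coeffs Chebyshev coefficients."""
    nodes = [math.pi * (j + 0.5) / n_coeffs for j in range(n_coeffs)]
    # Map each Chebyshev node x = cos(node) in [-1, 1] to an angle in [0, 2*pi].
    samples = [radius(math.pi * (math.cos(node) + 1.0)) for node in nodes]
    coeffs = []
    for k in range(n_coeffs):
        # Discrete orthogonality of T_k at the nodes gives each coefficient directly.
        c = 2.0 / n_coeffs * sum(s * math.cos(k * node) for s, node in zip(samples, nodes))
        coeffs.append(c / 2.0 if k == 0 else c)
    return coeffs


def decode_radius(coeffs: List[float], angle: float) -> float:
    """Evaluate the Chebyshev series at an angle in [0, 2*pi]."""
    x = angle / math.pi - 1.0  # inverse of the angle mapping used in encode_shape
    x = max(-1.0, min(1.0, x))  # guard acos against rounding just outside [-1, 1]
    theta = math.acos(x)
    # T_k(x) = cos(k * arccos(x))
    return sum(c * math.cos(k * theta) for k, c in enumerate(coeffs))
```

A smooth three-lobed shape such as r(θ) = 1 + 0.3 cos(3θ) is reconstructed closely from 24 coefficients. More irregular contours would need more coefficients or a different fit.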