Deep Learning Weekly Issue #106
Face recognition, AI chips from Tesla, faster GPUs on Colab, RNNs for text summarization, painting with CNNs
Hi folks,
This week in deep learning we bring you a simple (and creepy) facial recognition system, new AI chips from Tesla, OpenAI Dota results, and faster T4 GPUs on Google Colab that you can use to train your own GPT-2 text generator.
From there you can read up on 2D pose estimation, using RNNs for text summarization, and a new technique from Amazon that improves speech recognition.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
We Built an ‘Unbelievable’ (but Legal) Facial Recognition Machine [The New York Times]
The team behind The New York Times's Privacy Project describes applying facial recognition models to a public camera feed over Bryant Park, positively identifying at least one person from just nine hours of footage.
How to Train Your OpenAI Five [OpenAI]
OpenAI's reinforcement learning agents won 99.4% of more than 7,000 Dota 2 games in a global online tournament held over the weekend.
Facebook's AI extracts playable characters from real-world videos [VentureBeat]
New research from Facebook extracts objects from live video, generates new video frames of the object based on user input, and seamlessly blends these new frames back into the original video stream.
Tesla’s new self-driving chip is here [The Verge]
Tesla held its autonomy event this week and promised fleets of self-driving robotaxis as early as next year (we've heard that before). The cars are powered by a new AI chip capable of processing images at a staggering 2,300 frames per second.
A neural network for summarizing scientific papers [MIT News]
A novel RNN architecture termed the Rotational Unit of Memory (RUM) produces better summaries of long, complicated text than conventional recurrent models.
Google makes new Tesla T4 GPUs available in Colab notebooks
Users of Google's Colab notebooks have noticed that new NVIDIA Tesla T4 GPUs are available in place of the older K80s.
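If you want to see which GPU your session landed on, a quick check (assuming a GPU runtime is selected in Colab) is to query the driver from a notebook cell:

```python
# Inside a Colab cell: report the attached GPU's name and memory.
!nvidia-smi --query-gpu=name,memory.total --format=csv

# Or from Python via TensorFlow, which comes preinstalled on Colab:
import tensorflow as tf
print(tf.test.gpu_device_name())  # e.g. '/device:GPU:0' when a GPU is attached
```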
Learning
A 2019 guide to Human Pose Estimation with Deep Learning [nanonets.com]
A great summary of the 2D pose estimation literature.
Taming RNNs for better summarization [abigailsee.com]
A nice explainer on the challenges of abstractive summarization with RNNs and how pointer-generator networks address them.
Using Wake Word Acoustics to Filter Out Background Speech Improves Speech Recognition by 15% [amazon.com]
A new technique from the Alexa team improves speech recognition accuracy by comparing the acoustic properties of the wake-word audio with those of the query audio, filtering out speech that doesn't match the speaker who triggered the device.
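Amazon's approach is built into the recognizer itself, but the core idea can be illustrated with a hypothetical sketch: embed the wake-word audio as a speaker "anchor," score each query frame against it, and drop dissimilar frames (embed here is an assumed speaker-embedding function, not Amazon's model):

```python
import numpy as np

def filter_background_speech(wake_feats, query_feats, embed, threshold=0.5):
    """Keep only query frames acoustically similar to the wake-word speaker.

    wake_feats:  (T_w, D) acoustic features of the wake word (the "anchor").
    query_feats: (T_q, D) acoustic features of the query utterance.
    embed:       placeholder function mapping feature frames to embeddings.
    """
    anchor = embed(wake_feats).mean(axis=0)                  # average anchor embedding
    anchor /= np.linalg.norm(anchor)
    frames = embed(query_feats)                              # (T_q, E) per-frame embeddings
    frames /= np.linalg.norm(frames, axis=1, keepdims=True)
    similarity = frames @ anchor                             # cosine similarity per frame
    return query_feats[similarity >= threshold]              # drop background frames
```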
Libraries & Code
[Github] minimaxir/gpt-2-simple: Python package for retraining GPT-2 text generation models on Google Colab (for free).
A great reason to try out the new Tesla T4 GPUs.
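Per the repo's README, fine-tuning comes down to a few lines (a sketch; my_corpus.txt is a placeholder for your own training text, and argument names may have changed since, so check the repo):

```python
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="117M")        # fetch the small released GPT-2 model

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              "my_corpus.txt",               # placeholder: your plain-text corpus
              model_name="117M",
              steps=1000)                    # number of fine-tuning steps

gpt2.generate(sess, length=200, temperature=0.7, prefix="Deep learning")
```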
[Github] mosheman5/DNP: Audio Denoising with Deep Network Priors
This repository provides a PyTorch implementation of "Audio Denoising with Deep Network Priors."
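In the spirit of deep-prior methods generally (not necessarily this repo's exact procedure), denoising can work by fitting a randomly initialized network to the noisy signal and stopping early, since such networks tend to fit structured signal before noise; a minimal sketch:

```python
import torch
import torch.nn as nn

def deep_prior_denoise(noisy_audio, steps=500, lr=1e-3):
    """Denoise a 1-D waveform by fitting an untrained conv net to it.

    Early stopping is the prior: structure gets fit before noise.
    noisy_audio: 1-D float tensor of samples.
    """
    noisy = noisy_audio.view(1, 1, -1)                 # (batch, channel, time)
    net = nn.Sequential(
        nn.Conv1d(1, 32, 9, padding=4), nn.ReLU(),
        nn.Conv1d(32, 32, 9, padding=4), nn.ReLU(),
        nn.Conv1d(32, 1, 9, padding=4),
    )
    z = torch.randn_like(noisy)                        # fixed random input
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):                             # stop early; don't fit the noise
        opt.zero_grad()
        loss = ((net(z) - noisy) ** 2).mean()
        loss.backward()
        opt.step()
    return net(z).detach().view(-1)
```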
[Github] reiinakano/neural-painters: Neural painters
A generative model for brushstrokes, learned from a real painting program that is non-differentiable and non-deterministic.
Papers & Publications
CenterNet: Keypoint Triplets for Object Detection
Abstract (excerpt): …This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules named cascade corner pooling and center pooling, which play the roles of enriching information collected by both top-left and bottom-right corners and providing more recognizable information at the central regions, respectively. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, which outperforms all existing one-stage detectors by at least 4.9%…
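Center pooling, as described in the abstract, gives each candidate center keypoint the maximum response along its full row and column, so the center can gather evidence from across the whole object; a naive dense version (the paper builds it more efficiently from directional corner-pooling operations) might look like:

```python
import torch

def center_pooling(fmap: torch.Tensor) -> torch.Tensor:
    """Naive center pooling over a feature map of shape (B, C, H, W).

    Each location's output is the max response along its entire row
    plus the max along its entire column, letting a center keypoint
    see evidence across the object's horizontal and vertical extent.
    """
    row_max = fmap.max(dim=3, keepdim=True).values  # (B, C, H, 1): max over width
    col_max = fmap.max(dim=2, keepdim=True).values  # (B, C, 1, W): max over height
    return row_max + col_max                        # broadcasts back to (B, C, H, W)
```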
TextCaps: Handwritten Character Recognition with Very Small Datasets
Abstract (excerpt): Many localized languages struggle to reap the benefits of recent advancements in character recognition systems due to the lack of substantial amount of labeled training data. This is due to the difficulty in generating large amounts of labeled data for such languages and inability of deep learning techniques to properly learn from small number of training samples. We solve this problem by introducing a technique of generating new training samples from the existing samples, with realistic augmentations which reflect actual variations that are present in human hand writing, by adding random controlled noise to their corresponding instantiation parameters…
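The augmentation the abstract describes amounts to decoding perturbed capsule instantiation vectors back into images; a hypothetical sketch (decoder and the noise scale are placeholders, not the paper's exact reconstruction network or settings):

```python
import torch

def augment_sample(instantiation_params, decoder, noise_scale=0.05):
    """Generate a new training image from one real sample's capsule encoding.

    instantiation_params: (num_capsules, dim) tensor produced by the capsule
                          network for a real image.
    decoder:              placeholder reconstruction network mapping capsule
                          vectors back to an image.
    """
    # Controlled random noise on the instantiation parameters yields
    # realistic variations (e.g. stroke thickness, slant, deformation).
    noise = noise_scale * torch.randn_like(instantiation_params)
    return decoder(instantiation_params + noise)
```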