Deep Learning Weekly Issue #106
Face recognition, AI chips from Tesla, faster GPUs on Colab, RNNs for text summarization, painting with CNNs
This week in deep learning we bring you a simple (and creepy) facial recognition system, new AI chips from Tesla, OpenAI Dota results, and faster T4 GPUs on Google Colab that you can use to train your own GPT-2 text generator.
From there you can read up on 2D pose estimation, using RNNs for text summarization, and a new technique from Amazon that improves speech recognition.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
The team behind the Privacy Project at the New York Times discusses their recent work applying facial recognition models to public camera feeds from Bryant Park to positively identify at least one person in just 9 hours of data collection.
OpenAI's reinforcement learning agents won 99.4% of over 7000 Dota games in a global internet tournament hosted over the weekend.
New research from Facebook extracts objects from live video, generates new video frames of the object based on user inputs, then seamlessly blends these new frames back into the original video stream.
Tesla held its autonomy event this week and promised fleets of self-driving robotaxis as early as next year (we've heard that before). The cars are powered by a new AI chip capable of processing images at a staggering 2300 frames per second.
A novel RNN architecture termed Rotational Unit of Memory (RUM) produces much better summaries of long, complicated text.
Users of Google's Colab notebooks have noticed that new NVIDIA Tesla T4 GPUs are available in place of the older K80s.
A great summary of the 2D pose estimation literature.
A nice explainer on how RNNs can be used to summarize text.
Using Wake Word Acoustics to Filter Out Background Speech Improves Speech Recognition by 15% [amazon.com]
A new technique from the Alexa team improves speech recognition by comparing acoustic properties of the triggering audio with that of the query audio.
Libraries & Code
[Github] minimaxir/gpt-2-simple: Python package for retraining GPT-2 text generation models on Google Colab (for free).
A great reason to try out the new Tesla T4 GPUs.
This repository provides a PyTorch implementation of "Audio Denoising with Deep Network Priors"
A generative model for brushstrokes learned from a real non-differentiable and non-deterministic painting program.
Papers & Publications
Abstract: ....This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules named cascade corner pooling and center pooling, which play the roles of enriching information collected by both top-left and bottom-right corners and providing more recognizable information at the central regions, respectively. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, which outperforms all existing one-stage detectors by at least 4.9%....
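To make the "triplet rather than a pair" idea concrete, here is a minimal sketch of how a center keypoint might be used to check a box proposed by a corner pair. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `n` parameter, and the rule "keep a box only if a center keypoint lands in its middle 1/n region" are hypothetical simplifications, and class labels and scores are ignored.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right corners
Point = Tuple[float, float]              # (x, y) location of a predicted center keypoint


def central_region(box: Box, n: int = 3) -> Box:
    """Return the middle 1/n portion of a box (n is an assumed, illustrative parameter)."""
    x1, y1, x2, y2 = box
    # Margin trimmed from each side so that the remaining region is 1/n of the width/height.
    margin_x = (x2 - x1) * (n - 1) / (2 * n)
    margin_y = (y2 - y1) * (n - 1) / (2 * n)
    return (x1 + margin_x, y1 + margin_y, x2 - margin_x, y2 - margin_y)


def filter_by_center(boxes: List[Box], centers: List[Point], n: int = 3) -> List[Box]:
    """Keep only boxes whose central region contains at least one center keypoint."""
    kept = []
    for box in boxes:
        cx1, cy1, cx2, cy2 = central_region(box, n)
        if any(cx1 <= x <= cx2 and cy1 <= y <= cy2 for x, y in centers):
            kept.append(box)
    return kept


if __name__ == "__main__":
    # Two corner-pair proposals, but only the first has a center keypoint inside it.
    proposals = [(0, 0, 90, 90), (100, 100, 160, 160)]
    center_keypoints = [(45, 45)]
    print(filter_by_center(proposals, center_keypoints))
```

In this toy setup, a corner pair that happens to form a box around nothing is dropped because no center keypoint backs it up, which is one intuition for how a third keypoint could improve precision.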
Abstract: Many localized languages struggle to reap the benefits of recent advancements in character recognition systems due to the lack of substantial amount of labeled training data. This is due to the difficulty in generating large amounts of labeled data for such languages and inability of deep learning techniques to properly learn from small number of training samples. We solve this problem by introducing a technique of generating new training samples from the existing samples, with realistic augmentations which reflect actual variations that are present in human hand writing, by adding random controlled noise to their corresponding instantiation parameters...