Deep Learning Weekly Issue #146

Google's reading companion, AI pixel-art, AR + AI cut and paste, guitar effect emulation, and more...

Hey folks,

This week in deep learning, we bring you a new app from Google that uses audio recognition to help kids with reading, a state-of-the-art depth model from Facebook, a federated learning project to detect brain tumors from Intel, a partnership between Hailo and Foxconn for AI-specific chips, and an impressive DIY AI stethoscope for under a dollar.

You may also enjoy a podcast discussing the promise of TinyML, an incredible AR + AI project that lets you paste objects from the real world into documents on your laptop, a simple but effective pruning technique from MIT, audio style transfer, research that suggests state-of-the-art models aren’t all that different after all, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


New Google Lens features to help you be more productive at home

Google Lens now lets users take pictures of handwritten notes and copy the text directly to their laptops.

Google launches ‘Read Along,’ a free app that helps young children practice reading

A new app from Google uses audio recognition to help kids with reading skills.

Facebook’s 3D Photos feature now simulates depth for any image

Facebook unveiled a new model that estimates high-quality, consistent depth maps from 2D images.

Intel partners with Penn Medicine to develop brain tumor classifier with federated learning

Research institutions are using federated learning to train tumor classifiers on disparate data sources while preserving privacy.
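
For readers new to the idea, here is a minimal sketch of one common federated learning scheme, federated averaging, on a toy linear model. The blurb above does not say which algorithm the Intel and Penn Medicine project uses, so treat this as an illustration only. The names `local_update`, `federated_average`, and `federated_round` are hypothetical, and the "sites" are tiny made-up datasets. The point it shows: each site trains on its own data, and only model weights are shared and combined.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient steps on one site's private data (squared error, linear model)."""
    w = list(weights)  # copy so the caller's weights are not modified
    n = len(data)
    for _ in range(epochs):
        grads = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grads[j] += 2 * err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grads)]
    return w


def federated_average(site_weights, site_sizes):
    """Combine per-site weights, weighting each site by its number of examples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def federated_round(global_weights, sites):
    """One round: every site trains locally, then the server averages the results."""
    updates = [local_update(global_weights, data) for data in sites]
    sizes = [len(data) for data in sites]
    return federated_average(updates, sizes)


# Hypothetical toy data: two sites, both following y = 2 * x.
site_a = [([1.0], 2.0), ([2.0], 4.0)]
site_b = [([1.0], 2.0), ([3.0], 6.0), ([-1.0], -2.0)]

global_w = [0.0]
for _ in range(20):
    global_w = federated_round(global_w, [site_a, site_b])
print(global_w)
```

Raw examples never leave `site_a` or `site_b` in this sketch; only weights do. Real deployments typically add further protections on top of this, which the sketch does not attempt.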

Hailo, Foxconn, and Socionext partner for AI processing, video analytics at the edge

Chipmaker Hailo has partnered with Foxconn to produce a new SoC for applying AI to video on edge devices.

Mobile + Edge

Why TinyML will be huge

A new podcast explores ML on embedded devices with Pete Warden.

A Tale of Model Quantization in TF Lite

A nice exploration of performance versus speed / size tradeoffs (or lack thereof) with TensorFlow Lite.

An All-Neural On-Device Speech Recognizer

Google’s on-device speech recognition model now boasts higher accuracy than the company’s own production server-side models.

Digital Stethoscope AI

How to build a digital AI stethoscope with $1 worth of equipment.


A foolproof way to shrink deep learning models

New pruning technique from researchers at MIT is both simple and effective. Train, prune, retrain, and repeat.
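
The train, prune, retrain, repeat loop can be sketched on a toy problem. This is a rough illustration that assumes magnitude-based pruning of a small linear model; it does not reproduce the MIT researchers' retraining schedule or code. The function names (`train`, `prune`, `iterative_prune`) and the synthetic data are hypothetical.

```python
import random


def train(weights, mask, data, lr=0.1, epochs=300):
    """Gradient descent on squared error; weights with mask 0 are held at zero."""
    n = len(data)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in data:
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for j, xj in enumerate(x):
                grads[j] += 2 * err * xj / n
        weights = [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]
    return weights


def prune(weights, mask, fraction):
    """Remove the given fraction of surviving weights with the smallest magnitude."""
    alive = [j for j, m in enumerate(mask) if m == 1]
    k = int(len(alive) * fraction)
    new_mask = list(mask)
    for j in sorted(alive, key=lambda j: abs(weights[j]))[:k]:
        new_mask[j] = 0
    return new_mask


def iterative_prune(data, n_features, rounds=2, fraction=0.5, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    mask = [1] * n_features
    weights = train(weights, mask, data)               # train
    for _ in range(rounds):                            # repeat
        mask = prune(weights, mask, fraction)          # prune
        weights = [w * m for w, m in zip(weights, mask)]
        weights = train(weights, mask, data)           # retrain
    return weights, mask


# Hypothetical data: only the first two of eight features matter.
rng = random.Random(0)
data = []
for _ in range(100):
    x = [rng.uniform(-1, 1) for _ in range(8)]
    data.append((x, 3 * x[0] - 2 * x[1]))

weights, mask = iterative_prune(data, n_features=8)
print(mask)
print([round(w, 3) for w in weights])
```

Pruning a little at a time and retraining in between gives the surviving weights a chance to compensate for the ones that were removed, which is the intuition behind repeating the cycle rather than pruning once.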

DDSP: Differentiable Digital Signal Processing

Impressive audio style transfer project.

Deep Learning for Guitar Effect Emulation

Training neural networks to emulate guitar pedal effects.

New EfficientNet checkpoints

New pre-trained EfficientNet checkpoints are now available on TensorFlow Hub.

Facebook’s AutoScale decides if AI inference runs on your phone or in the cloud

New research from Facebook details a system capable of choosing which platform to run model inference on based on available resources and network latency.


Convert selfies to pixel art.

Libraries & Code

[GitHub] cyrildiagne/ar-cutpaste

Incredible demonstration using AI and AR to cut salient objects out of images and beam them directly into documents on your laptop.

[GitHub] yemount/pose-animator

Fun mashup of browser-based pose estimation with TFJS and real-time SVG animation.

Papers & Publications

A Metric Learning Reality Check

Abstract: Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods. In this paper, we take a closer look at the field to see if this is actually true. We find flaws in the experimental setup of these papers, and propose a new way to evaluate metric learning algorithms. Finally, we present experimental results that show that the improvements over time have been marginal at best.

Big Transfer (BiT): General Visual Representation Learning

Abstract: Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.