Deep Learning Weekly Issue #182

Building a GPT-3 product, combining GANs+federated learning for health data, detecting emotion on mobile, & more

Hey folks,

This week in deep learning we bring you three mysteries in deep learning (ensembles, knowledge distillation, and self-distillation), what it takes to create a GPT-3 product, Microsoft researchers' combination of GANs and federated learning for anonymous data sharing among health care providers, and research suggesting that deep learning doesn't need to be a black box.

You may also enjoy learning how to perform custom object detection in the browser using TensorFlow.js, this tutorial on detecting facial emotions using TFLite, this technical review of optimization basics, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


What it takes to create a GPT-3 product

Developments so far show that those who stand to benefit the most from GPT-3 are companies that already wield much of the power in AI, not the ones who want to start from scratch.

Microsoft researchers tap AI for anonymous data sharing for health care providers

Researchers at Microsoft and the University of British Columbia developed a framework called Federated Learning with a Centralized Adversary (FELICIA) that combines GANs with federated learning, enabling stakeholders such as medical centers to collaborate and improve their models through privacy-preserving, distributed data sharing.
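FELICIA's centralized-adversary GAN setup is detailed in the paper, but the federated-learning backbone it builds on can be sketched in a few lines: each site trains on its own data, and only model weights (never patient records) leave the site to be averaged by a server. The `local_update` function below is a hypothetical stand-in for a site's training step, not FELICIA's actual method.

```python
# Minimal federated-averaging sketch (the backbone FELICIA builds on).
# Each "site" holds private data; only weight vectors leave the site.

def local_update(weights, site_gradient, lr=0.1):
    """Hypothetical one-step local training at a single site."""
    return [w - lr * g for w, g in zip(weights, site_gradient)]

def federated_average(site_weights):
    """Server averages the sites' weights; raw data never moves."""
    n = len(site_weights)
    return [sum(ws) / n for ws in zip(*site_weights)]

# Two sites start from the same global model and train locally.
global_model = [0.0, 0.0]
site_a = local_update(global_model, site_gradient=[1.0, -2.0])
site_b = local_update(global_model, site_gradient=[3.0, 2.0])

# The server combines them into the next global model.
new_global = federated_average([site_a, site_b])
```

FELICIA adds a centralized adversary (a shared GAN discriminator) on top of this loop so that sites can jointly improve synthetic-data generators while keeping real data local.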

Google is investigating another top AI ethicist

The investigation follows the dismissal of Timnit Gebru late last year.

Using artificial intelligence to manage extreme weather events

Can combining deep learning with social network analysis make social media contributions about extreme weather events a useful tool for crisis managers, first responders, and government scientists?

These virtual robot arms get smarter by training each other

By playing a game in which one tries to outsmart the other, OpenAI’s bots can learn to solve a wide range of problems without retraining.

Mobile + Edge

Custom object detection in the browser using TensorFlow.js

In this post, the author builds an end-to-end solution with TensorFlow: training a custom object-detection model in Python, then deploying it to production and running real-time inference in the browser through TensorFlow.js.

Detect Facial Emotions on Mobile and IoT Devices Using TensorFlow Lite

Training a Keras VGG16 model for facial emotion recognition and deploying it to low-power devices with TensorFlow Lite.

Edge computing is about to solve the IoT’s biggest problems

Low latency and decentralised servers will boost the power and spread of Internet of Things devices.


3 deep learning mysteries: Ensemble, knowledge- and self-distillation

Do neural networks trained from different random initializations actually learn very different functions?
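The article analyzes why these tricks work; as a refresher on what vanilla knowledge distillation actually computes (the standard recipe from Hinton et al., not the article's new analysis), a student is trained to match a teacher's temperature-softened output distribution:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher T flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    (as in Hinton et al.) so gradients keep a comparable magnitude."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

teacher = [4.0, 1.0, -2.0]
student = [2.0, 1.5, 0.0]
loss = distillation_loss(student, teacher)  # >= 0; zero iff outputs match
```

The high temperature exposes the teacher's "dark knowledge" (relative probabilities among wrong classes), which the hard labels alone would hide.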

Synced Tradition and Machine Learning Series | Part 2: Optimization Basics

This is the second in a special Synced series of introductory articles on traditionally theoretical fields of studies and their impact on modern-day machine learning.

Stabilizing Live Speech Translation in Google Translate

Learn how masking and biasing enable high accuracy with low erasure and minimal lag in the live transcription feature in the Google Translate app.
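The masking idea can be sketched simply: hide the last k words of the live hypothesis, so the displayed text only grows where the model has had time to stabilize, trading a little lag for far less erasure. The word-level `masked_display` function below is a simplified, hypothetical version of the approach, not Google's implementation.

```python
def masked_display(hypothesis, k=2):
    """Show the hypothesis minus its last k words, which are the
    most likely to be revised as more audio arrives."""
    words = hypothesis.split()
    return " ".join(words[:len(words) - k]) if len(words) > k else ""

# Successive live hypotheses; the tail keeps changing.
stream = [
    "the quick brown",
    "the quick brown fox jumps",
    "the quick brown fox jumped over",
]
shown = [masked_display(h, k=2) for h in stream]
# Each displayed string extends the previous one, so the
# reader never sees text get erased and rewritten.
```

Larger k means less flicker (lower erasure) but more lag before words appear, which is exactly the trade-off the post measures.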

Deep learning doesn’t need to be a black box

In a paper published in the peer-reviewed journal Nature Machine Intelligence, scientists at Duke University propose “concept whitening,” a technique that can help steer neural networks toward learning specific concepts without sacrificing performance.
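Concept whitening itself adds a trained rotation that aligns latent axes with human-interpretable concepts (see the paper), but its namesake operation is classical whitening: decorrelating a layer's activations so every axis has unit variance. A minimal numpy sketch of ZCA whitening, as an illustration rather than the paper's module:

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA-whiten the rows of x: the output has ~identity covariance."""
    x = x - x.mean(axis=0)               # center each feature
    cov = x.T @ x / (len(x) - 1)         # sample feature covariance
    vals, vecs = np.linalg.eigh(cov)     # symmetric eigendecomposition
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return x @ w                         # decorrelated, unit-variance axes

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated features
white = zca_whiten(acts)
# np.cov(white.T) is now approximately the 4x4 identity matrix
```

Concept whitening then rotates this whitened space so that individual axes line up with predefined concepts, which is what makes the latent representation inspectable without hurting accuracy.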


[GitHub] naver-ai/relabel_imagenet

Official PyTorch implementation of Re-labeling ImageNet

[GitHub] landmark-detection

Four landmark detection algorithms, implemented in PyTorch.


A new open data set for multilingual speech research

Facebook AI is releasing Multilingual LibriSpeech (MLS), a large-scale, open source data set designed to help advance research in automatic speech recognition (ASR).

Papers & Publications

Persistent Anti-Muslim Bias in Large Language Models

Abstract: It has been observed that large-scale language models capture undesirable societal biases, e.g. relating to race and gender; yet religious bias has been relatively unexplored. We demonstrate that GPT-3, a state-of-the-art contextual language model, captures persistent Muslim-violence bias. We probe GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation, to understand this anti-Muslim bias, demonstrating that it appears consistently and creatively in different uses of the model and that it is severe even compared to biases about other religious groups. For instance, "Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is mapped to "money" in 5% of test cases. We quantify the positive distraction needed to overcome this bias with adversarial text prompts, and find that use of the most positive 6 adjectives reduces violent completions for "Muslims" from 66% to 20%, but which is still higher than for other religious groups.

CheXtransfer: Performance and Parameter Efficiency of ImageNet Models for Chest X-Ray Interpretation

Abstract: Deep learning methods for chest X-ray interpretation typically rely on pretrained models developed for ImageNet. This paradigm assumes that better ImageNet architectures perform better on chest X-ray tasks and that ImageNet-pretrained weights provide a performance boost over random initialization. In this work, we compare the transfer performance and parameter efficiency of 16 popular convolutional architectures on a large chest X-ray dataset (CheXpert) to investigate these assumptions. First, we find no relationship between ImageNet performance and CheXpert performance for both models without pretraining and models with pretraining. Second, we find that, for models without pretraining, the choice of model family influences performance more than size within a family for medical imaging tasks. Third, we observe that ImageNet pretraining yields a statistically significant boost in performance across architectures, with a higher boost for smaller architectures. Fourth, we examine whether ImageNet architectures are unnecessarily large for CheXpert by truncating final blocks from pretrained models, and find that we can make models 3.25x more parameter-efficient on average without a statistically significant drop in performance. Our work contributes new experimental evidence about the relation of ImageNet to chest x-ray interpretation performance.