Deep Learning Weekly: Issue #213

Forecasting Arctic ice conditions with AI, PnG BERT and Non-Attentive Tacotron for voice recreation, uses for Graph Neural Networks, a paper on image restoration using Swin Transformers, and more

Hey folks,

This week in deep learning, we bring you an AI tool that forecasts Arctic ice conditions, a new text-to-speech model that merges PnG BERT and Non-Attentive Tacotron for voice recreation, a paper on Pixel Difference Networks for edge detection, and a paper on foundation models.

You may also enjoy Paige's AI-powered tech that diagnoses cancer using tissue samples, 3D pose detection using TensorFlow.js and GHUM, a repository of best practices on recommender systems, a paper on image restoration using Swin Transformers, and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


AI tool predicts Arctic sea ice loss caused by climate change

A research team led by the British Antarctic Survey (BAS) and The Alan Turing Institute has built IceNet, an AI tool that forecasts Arctic sea ice conditions.

Recreating Natural Voices for People with Speech Impairments

Google briefly describes the new text-to-speech synthesis model that merges PnG BERT and Non-Attentive Tacotron (NAT). This was recently used to recreate the voice of a former NFL player for Lou Gehrig Day.

Paige's AI Diagnostic Tech Is Revolutionizing Cancer Diagnosis

Paige uses deep learning to help pathologists make faster, more accurate cancer diagnoses from images of tissue samples. 

Without Code for DeepMind’s Protein AI, This Lab Wrote Its Own

In June, a full month before the publication of DeepMind’s manuscript, a team led by David Baker, director of the Institute for Protein Design at the University of Washington, released their own model for protein structure prediction.

Demetria Launches AI-based Agtech Solution to Boost the Growth of High Value Coffee

The first AI-powered taste and quality intelligence SaaS startup for the coffee supply chain unveils an application that identifies the successful reproduction of high value coffee seedlings.

[...]'s Signals platform lets businesses assess viability of new technologies

A platform that tracks and auto-analyzes more than 10 million emerging technologies in real time.

Mobile & Edge

Pixel Difference Networks for Efficient Edge Detection

A paper that proposes a simple, lightweight yet effective architecture named Pixel Difference Network (PiDiNet) for efficient edge detection.
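For readers who want a feel for the idea, here is a minimal pure-Python sketch of a "central" pixel difference convolution: instead of weighting raw intensities, the kernel weights the difference between each neighbor and the center pixel. The function name `central_pixel_difference_conv`, the 3x3 window, and the list-of-lists image format are assumptions made for illustration; this is not the paper's code and it ignores the architecture around the operator.

```python
from typing import List

Image = List[List[float]]


def central_pixel_difference_conv(img: Image, kernel: Image) -> Image:
    """Apply a 3x3 kernel to (neighbor - center) differences, no padding.

    Illustrative only: the output shrinks by one pixel on each side.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Weight the intensity difference, not the raw intensity.
                    acc += kernel[dy + 1][dx + 1] * (img[y + dy][x + dx] - center)
            out[y - 1][x - 1] = acc
    return out


ones = [[1.0] * 3 for _ in range(3)]
flat = [[5.0] * 4 for _ in range(3)]
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]

flat_response = central_pixel_difference_conv(flat, ones)
step_response = central_pixel_difference_conv(step, ones)
```

With an all-ones kernel, the flat patch produces a zero response for any intensity, while the step edge produces a nonzero response on both sides of the boundary, which is the property that makes difference-based operators attractive for edge detection.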

Battery Life Cycle Predictor Powered by Edge Impulse

A TinyML model, built with Edge Impulse, that predicts a lithium-ion battery's life cycle in less time.

uTensor: TinyML Inference Library

An extremely lightweight machine learning inference framework built on TensorFlow and optimized for Arm targets. It consists of a runtime library and an offline tool that handles most of the model translation work.


3D Pose Detection with MediaPipe BlazePose GHUM and TensorFlow.js

TensorFlow’s technical tutorial on pose detection based on a statistical 3D human body model called GHUM.

Optimizing Elastic Deep Learning in GPU Clusters with AdaptDL for PyTorch

An introduction to AdaptDL, a resource-adaptive deep learning training and scheduling framework.

Applications of Graph Neural Networks

A visual article that describes the different applications and use cases of Graph Neural Networks.

Principles of Good ML System Design

A comprehensive blog post highlighting the necessary questions and checklists for an end-to-end ML solution.

Libraries & Code

microsoft/recommenders: Best Practices on Recommendation Systems

A repository that contains examples and best practices for building recommendation systems, provided as Jupyter notebooks.

esimov/caire: Content aware image resize library

A content-aware image resizing library based on the paper "Seam Carving for Content-Aware Image Resizing".
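If you have not met seam carving before, the core loop is small enough to sketch. The Python below is a toy illustration, not caire's API (caire is a separate library with its own interface): it assumes a grayscale image stored as a list of rows, uses a simple gradient-magnitude energy, and all function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def energy_map(img: Image) -> Image:
    """Gradient-magnitude energy |dx| + |dy|, with borders clamped."""
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            energy[y][x] = abs(dx) + abs(dy)
    return energy


def find_vertical_seam(energy: Image) -> List[int]:
    """Return one column index per row for the lowest-energy connected seam."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Dynamic programming: each cell adds the cheapest of its upper neighbors.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(img: Image, seam: List[int]) -> Image:
    """Drop one pixel per row, producing a new image one column narrower."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


def carve_width(img: Image, columns: int) -> Image:
    """Narrow the image by repeatedly removing the lowest-energy seam."""
    for _ in range(columns):
        img = remove_vertical_seam(img, find_vertical_seam(energy_map(img)))
    return img


sample = [[0, 0, 9, 0], [0, 0, 9, 0], [0, 0, 9, 0]]
narrower = carve_width(sample, 2)
```

The energy is recomputed after every removal, which keeps the sketch short at the cost of speed; production implementations typically use stronger energy functions and avoid the full recomputation.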

aleju/imgaug: Image augmentation for machine learning experiments.

A Python library that helps you augment images for your machine learning projects.

Papers & Publications

On the Opportunities and Risks of Foundation Models


AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

SwinIR: Image Restoration Using Swin Transformer


Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45dB, while the total number of parameters can be reduced by up to 67%.
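
A note for reading the dB figures: the abstract does not name the metric, but gains like these are commonly reported as PSNR in image restoration, so the sketch below assumes PSNR. It shows how PSNR follows from mean squared error and how a dB gain maps to an MSE ratio. The function names and the 8-bit peak value of 255 are assumptions for illustration, not details from the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE of the better model divided by MSE of the baseline for a PSNR gain."""
    return 10.0 ** (-gain_db / 10.0)


reference = [100.0, 100.0, 100.0, 100.0]
close = [101.0, 99.0, 100.0, 100.0]
far = [110.0, 90.0, 100.0, 100.0]

closer_is_better = psnr(reference, close) > psnr(reference, far)
ratio = mse_ratio_for_gain(0.45)
```

Under that PSNR assumption, a 0.45 dB gain corresponds to roughly 10% lower mean squared error, which gives a sense of the scale of the reported improvements.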