Deep Learning Weekly: Issue #213
Forecasting Arctic ice conditions with AI, PnG BERT and Non-Attentive Tacotron for voice recreation, uses for Graph Neural Networks, a paper on image restoration using Swin Transformers, and more
This week in deep learning, we bring you an AI tool that forecasts Arctic ice conditions, a new text-to-speech model that merges PnG BERT and Non-Attentive Tacotron for voice recreation, a paper on Pixel Difference Networks for edge detection, and a paper on foundation models.
You may also enjoy Paige's AI-powered tech that diagnoses cancer using tissue samples, 3D pose detection using TensorFlow.js and GHUM, a repository of best practices on recommender systems, a paper on image restoration using Swin Transformers, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
A research team led by British Antarctic Survey (BAS) and The Alan Turing Institute has built IceNet, an AI tool that forecasts Arctic sea ice conditions.
Google briefly describes the new text-to-speech synthesis model that merges PnG BERT and Non-Attentive Tacotron (NAT). This was recently used to recreate the voice of a former NFL player for Lou Gehrig Day.
Paige uses deep learning to help pathologists make faster, more accurate cancer diagnoses from images of tissue samples.
In June, a full month before the publication of DeepMind’s manuscript, a team led by David Baker, director of the Institute for Protein Design at the University of Washington, released their own model for protein structure prediction.
The first AI-powered taste and quality intelligence SaaS startup for the coffee supply chain unveils an application that identifies the successful reproduction of high-value coffee seedlings.
A platform that tracks and auto-analyzes more than 10 million emerging technologies in real time.
Mobile & Edge
A paper that proposes a simple, lightweight yet effective architecture named Pixel Difference Network (PiDiNet) for efficient edge detection.
A TinyML model, built with Edge Impulse, that predicts a lithium-ion battery's life cycle in a shorter time.
An extremely lightweight machine learning inference framework built on TensorFlow and optimized for Arm targets. It consists of a runtime library and an offline tool that handles most of the model translation work.
TensorFlow’s technical tutorial on pose detection based on a statistical 3D human body model called GHUM.
An introduction to AdaptDL, a resource-adaptive deep learning training and scheduling framework.
A visual article that describes the different applications and cases of Graph Neural Networks.
A comprehensive blog highlighting the necessary questions and checklists for an end-to-end ML solution.
Libraries & Code
A repository that contains examples and best practices for building recommendation systems, provided as Jupyter notebooks.
A content-aware image resizing library based on the paper "Seam Carving for Content-Aware Image Resizing".
A Python library that helps you augment images for your machine learning projects.
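For readers new to seam carving, the technique behind the image resizing library above, here is a minimal sketch in pure Python. It is not the library's code: the function names, the simple gradient energy, and the list-of-lists grayscale image format are all assumptions made for illustration. The idea is to score each pixel by local contrast, use dynamic programming to find the connected top-to-bottom path (seam) with the lowest total score, and delete it.

```python
def energy(img):
    """Gradient-magnitude energy |dx| + |dy|, with borders clamped (an assumed, simple choice)."""
    h, w = len(img), len(img[0])
    e = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            e[y][x] = abs(right - left) + abs(down - up)
    return e


def find_vertical_seam(e):
    """Return one column index per row for the minimum-energy connected seam."""
    h, w = len(e), len(e[0])
    # cost[y][x] = cheapest total energy of any seam ending at (y, x)
    cost = [row[:] for row in e]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(img, seam):
    """Delete one pixel per row, shrinking the width by one."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


def carve_width(img, n):
    """Remove n vertical seams, recomputing the energy after each removal."""
    for _ in range(n):
        img = remove_vertical_seam(img, find_vertical_seam(energy(img)))
    return img
```

Calling `carve_width(img, 2)` on a 4x5 grayscale grid returns a 4x3 grid. Production libraries typically work on color images and use faster array operations, so treat this only as a way to see the dynamic-programming step.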
Papers & Publications
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45dB, while the total number of parameters can be reduced by up to 67%.
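The abstract reports its gains in dB without naming the metric. In image restoration this is usually peak signal-to-noise ratio (PSNR), so the sketch below assumes that reading. The function name `psnr`, the list-of-lists image format, and the 8-bit peak value of 255 are illustrative assumptions, not taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, res_row in zip(reference, restored)
        for a, b in zip(ref_row, res_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, small dB differences are meaningful: under this definition, a 0.45 dB improvement corresponds to roughly 10% lower mean squared error.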