Deep Learning Weekly Issue #174
AlphaFold solves protein folding with AI, building mobile ML pipelines, shrinking huge language models, and more
This week in deep learning we bring you AlphaFold, a solution to the 50-year-old grand challenge of protein folding in biology; an article about using image recognition to solve problems in the animal kingdom; and Amazon's Panorama, a device that adds machine learning technology to any camera.
You may also enjoy a tutorial on Siamese networks with Keras, TensorFlow, and Deep Learning; Facebook's NeuralProphet, a time series forecasting model; and more!
Research groups across the world are turning to image recognition AI, the same kind used for human facial recognition and identification, to solve problems in the animal kingdom.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
AlphaFold: a solution to a 50-year-old grand challenge in biology
In a major scientific breakthrough, the latest version of AlphaFold has been recognized as a solution to one of biology's grand challenges: the “protein folding problem”.
Shrinking massive neural networks used to model language
A new approach could lower computing costs and increase accessibility to state-of-the-art natural language processing.
DoNotPay, the AI lawyer for your inbox, now lets you report businesses for tax fraud
Plus, browser integration is on the way.
We should hold animals accountable for their crimes. Here's how AI can help
You have the right to remain silent. Anything you roar, bark, chirp, meow, or growl can and will be used against you in a court of law.
Nvidia launches Monai framework to speed up AI healthcare training
Nvidia Corp. is stepping up its efforts in healthcare with today’s launch of its Medical Open Network for AI, or Monai, an open-source framework that’s used to train artificial intelligence-powered models for medical imaging.
Mobile + Edge
Next Generation Machine Learning and Deep Learning Infrastructure
In this post, the author outlines the key considerations in building a mobile-focused ML pipeline end to end.
AWS announces Panorama, a device that adds machine learning technology to any camera
AWS has launched a new hardware device, the AWS Panorama Appliance, which, alongside the AWS Panorama SDK, will transform existing on-premises cameras into computer-vision-enabled, super-powered surveillance devices.
Creating an Image Classifier Model
Train a machine learning model to classify images, and add it to your Core ML app.
Human Pose Estimation in Android using Fritz AI
Add pose estimation capabilities to your Android applications.
Siamese networks with Keras, TensorFlow, and Deep Learning
In this tutorial you will learn how to implement and train siamese networks using Keras, TensorFlow, and Deep Learning.
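For readers new to the idea, here is a minimal sketch of what makes a network "siamese": both inputs of a pair pass through the same encoder weights, and the loss depends on the distance between the two embeddings. It is written in plain Python rather than Keras, the linear `embed` function and the contrastive loss with `margin=1.0` are illustrative assumptions, and the tutorial itself may use a different architecture and loss.

```python
import math


def embed(x, weights):
    """Shared encoder (hypothetical, linear): both inputs use the SAME weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    d = euclidean_distance(emb_a, emb_b)
    if same:
        return d ** 2
    return max(margin - d, 0.0) ** 2


def pair_loss(x1, x2, same, weights, margin=1.0):
    """Loss for one input pair, encoding both sides with the shared weights."""
    return contrastive_loss(embed(x1, weights), embed(x2, weights), same, margin)
```

In a real setup the encoder would be a trained convolutional network and the loss would be averaged over many labelled pairs; only the weight sharing and the distance-based objective carry over from this sketch.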
University of Alberta U^2-Net: Generating Realistic Pencil Portraits Using Salient Object Detection
Researchers at the University of Alberta recently proposed U^2-Net, a novel deep network architecture that achieves very competitive performance in salient object detection.
USC & Amazon ‘SLADE’ Self-Training Framework Uses Unlabelled Data to Improve Information Retrieval
The SeLf-trAining framework for Distance mEtric learning (SLADE) combines self-supervised learning and distance metric learning methods to improve information retrieval performance.
Do We Really Need Green Screens for High-Quality Real-Time Human Matting?
Researchers from the City University of Hong Kong and SenseTime propose a lightweight matting objective decomposition network (MODNet) that can smoothly process real-time human matting from a single input image with diverse and dynamic backgrounds.
Libraries & Code
NeuralProphet - a Neural Network based Time-Series model.
A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
Papers & Publications
Layered Neural Rendering for Retiming People in Video
Abstract: We present a method for retiming people in an ordinary, natural video---manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video altogether. We achieve these effects computationally via a dedicated learning-based layered video representation, where each frame in the video is decomposed into separate RGBA layers, representing the appearance of different people in the video. A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate---e.g., shadows, reflections, and motion of loose clothing. The layers can be individually retimed and recombined into a new video, allowing us to achieve realistic, high-quality renderings of retiming effects for real-world videos depicting complex actions and involving multiple individuals, including dancing, trampoline jumping, or group running.
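The "retime and recombine" step described in the abstract can be pictured with a small sketch. It assumes the RGBA layers have already been produced (the learned decomposition, which is the paper's actual contribution, is not shown) and blends them with the standard back-to-front "over" operator. The function names, the per-layer integer `offsets`, and the flattened-pixel representation are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # (r, g, b, alpha), each in [0, 1]
Frame = List[Pixel]                        # a flattened image, one entry per pixel
Layer = List[Frame]                        # one RGBA frame per time step


def composite_over(frames: List[Frame]) -> List[Tuple[float, ...]]:
    """Blend RGBA frames back to front with the standard "over" operator."""
    out = [(0.0, 0.0, 0.0)] * len(frames[0])
    for frame in frames:  # the first frame is the farthest layer
        out = [
            tuple(a * c + (1.0 - a) * o for c, o in zip((r, g, b), prev))
            for (r, g, b, a), prev in zip(frame, out)
        ]
    return out


def retime(layers: List[Layer], offsets: List[int], t: int) -> List[Tuple[float, ...]]:
    """Render output frame t, reading layer i at time t + offsets[i] (clamped)."""
    picked = []
    for layer, offset in zip(layers, offsets):
        src = min(max(t + offset, 0), len(layer) - 1)
        picked.append(layer[src])
    return composite_over(picked)
```

With this setup, a constant offset shifts one person in time relative to the others, reading the same source frame for every `t` "freezes" them, and leaving a layer out of `layers` erases that person along with whatever shadows or reflections were assigned to the layer.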
Navigating the GAN Parameter Space for Semantic Image Editing
Abstract: Generative Adversarial Networks (GANs) are currently an indispensable tool for visual editing, being a standard component of image-to-image translation and image restoration pipelines. Furthermore, GANs are especially useful for controllable generation since their latent spaces contain a wide range of interpretable directions, well suited for semantic editing operations. By gradually changing latent codes along these directions, one can produce impressive visual effects, unattainable without GANs. In this paper, we significantly expand the range of visual effects achievable with the state-of-the-art models, like StyleGAN2. In contrast to existing works, which mostly operate by latent codes, we discover interpretable directions in the space of the generator parameters. By several simple methods, we explore this space and demonstrate that it also contains a plethora of interpretable directions, which are an excellent source of non-trivial semantic manipulations. The discovered manipulations cannot be achieved by transforming the latent codes and can be used to edit both synthetic and real images. We release our code and models and hope they will serve as a handy tool for further efforts on GAN-based image editing.
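To make the contrast between editing latent codes and editing generator parameters concrete, here is a toy sketch. It uses a tiny linear "generator" as a stand-in (StyleGAN2 is nothing like this), and the directions are picked by hand rather than discovered, so it illustrates only why a shift in parameter space can reach outputs that no latent shift can; it is not the paper's method.

```python
def generate(weights, z):
    """Toy linear 'generator': output = weights @ z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def shift_latent(z, direction, step):
    """Edit in latent space: move the code z along a direction."""
    return [zi + step * di for zi, di in zip(z, direction)]


def shift_weights(weights, direction, step):
    """Edit in parameter space: move the generator weights along a direction."""
    return [
        [w + step * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weights, direction)
    ]


# With these weights the third output component is 0 for every latent code.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
z = [1.0, 2.0]

latent_edit = generate(weights, shift_latent(z, [1.0, 0.0], 0.5))
param_edit = generate(
    shift_weights(weights, [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]], 0.5), z
)
```

Here `latent_edit` still has a zero third component, as every output of the unmodified generator does, while `param_edit` does not. The toy mirrors the abstract's claim that some manipulations are out of reach for latent-code transformations.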