Deep Learning Weekly Issue #176
6 ways AI can help save the planet, optical pre-processing for computer vision, Snap's new AR fund, and more
This week in deep learning we bring you 6 ways AI can help save the planet, the far-reaching impact of Dr. Timnit Gebru, an AI-powered tool that recovers text from pixelized screenshots, and how ApisProtect uses AI and IoT to protect bee hives.
You may also enjoy learning about how optical pre-processing can make computer vision more robust and energy efficient, slimmable GANs, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
6 ways AI can help save the planet
From facial recognition technology that monitors brown bear populations to intelligent robots that sort recycling, these initiatives are having a positive impact on the environment.
The Far-Reaching Impact of Dr. Timnit Gebru
Dr. Timnit Gebru's contributions range from circuit design at Apple to computer vision research at Stanford to her global leadership in AI ethics.
Optical pre-processing makes computer vision more robust and energy efficient
Hybrid neural network can reconstruct Arabic or Japanese characters that it hasn’t seen before.
Deep reinforcement-learning architecture combines pre-learned skills to create new sets of skills on the fly
A team of researchers from the University of Edinburgh and Zhejiang University has developed a way to combine deep neural networks (DNNs) to create a new type of system with a new kind of learning ability.
AI explainability specialist Truera closes $12M round
Truera Inc., a startup working to give enterprises better insight into how their artificial intelligence models make decisions, today said that it has closed a $12 million funding round led by Wing VC.
Mobile + Edge
MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device
MediaPipe Holistic provides a unified topology for a groundbreaking 540+ keypoints (33 pose, 21 per-hand and 468 facial landmarks) and achieves near real-time performance on mobile devices.
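The "540+" figure follows from the landmark counts quoted above once both hands are counted. A quick arithmetic check, as a small Python sketch (the variable names are illustrative and are not MediaPipe API identifiers):

```python
# Landmark counts as quoted in the item above; the names are illustrative only.
POSE_LANDMARKS = 33
HAND_LANDMARKS = 21   # per hand
FACE_LANDMARKS = 468
NUM_HANDS = 2         # assumption: "per-hand" means the count applies to each of two hands

total_keypoints = POSE_LANDMARKS + NUM_HANDS * HAND_LANDMARKS + FACE_LANDMARKS
print(total_keypoints)  # 543
```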
AI Algorithms Are Slimming Down to Fit in Your Fridge
Artificial intelligence programs typically are power guzzlers. New research shows how to generate computer vision from a simple, low-power chip.
Save the bees, save the world: How ApisProtect uses AI and IoT to protect hives
ApisProtect announced its entry into the US market, where it will provide its AI-powered hive monitoring system to beekeepers and farmers.
Snap announces $3.5M fund directed toward AR Lens creation
Snap announced a new 2021 fund of $3.5 million that will be directed toward supporting Snapchat Lens creators and developers who are using the company’s Lens Studio tool to explore the use of AR technologies.
Portrait Light: Enhancing Portrait Lighting with Machine Learning
Google released Portrait Light, a new post-capture feature for the Pixel Camera and Google Photos apps that adds a simulated directional light source to portraits.
Detecting Sounds with Deep Learning
How to convert audio to images and analyze it with ResNeSt
Word2Vec: Out of the Black Box
Word2Vec says Best is to Worst as Catan is to Monopoly - but how does it know?
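Analogies like this are commonly answered with vector arithmetic on word embeddings: take the offset from the first word to the second, add it to the third, and look for the nearest remaining word. The sketch below assumes that vector-offset approach and uses tiny hand-picked 2-D vectors purely for illustration. They are not taken from a trained Word2Vec model, and the linked article may explain the result differently.

```python
import math

# Toy, hand-picked embeddings (NOT from a trained model).
# Axis 0 loosely means "quality", axis 1 loosely means "is a board game".
TOY_VECTORS = {
    "best": [1.0, 0.0],
    "worst": [-1.0, 0.0],
    "good": [0.8, 0.0],
    "catan": [0.9, 1.0],
    "monopoly": [-0.9, 1.0],
    "chess": [0.5, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, as is usual for analogy queries.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

answer = analogy("best", "worst", "catan", TOY_VECTORS)
print(answer)  # monopoly
```

With real embeddings the same query would run over the whole vocabulary, and the answer depends entirely on the corpus the model was trained on.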
MIT CSAIL Uses Deep Generative Model StyleGAN2 to Deliver SOTA Image Reconstruction Results
CSAIL researchers propose a framework for image reconstruction tasks using the state-of-the-art generative model StyleGAN2.
Libraries & Code
This is an archive of the code used to produce the dataset and results available in the INLG 2020 paper "RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation".
Papers & Publications
Slimmable Generative Adversarial Networks
Abstract: Generative adversarial networks (GANs) have achieved remarkable progress in recent years, but the continuously growing scale of models makes them challenging to deploy widely in practical applications. In particular, for real-time tasks, different devices require models of different sizes due to varying computing power. In this paper, we introduce slimmable GANs (SlimGANs), which can flexibly switch the width (channels of layers) of the generator to accommodate various quality-efficiency trade-offs at runtime. Specifically, we leverage multiple partial parameter-shared discriminators to train the slimmable generator. To facilitate the consistency between generators of different widths, we present a stepwise inplace distillation technique that encourages narrow generators to learn from wide ones. As for class-conditional generation, we propose a sliceable conditional batch normalization that incorporates the label information into different widths. Our methods are validated, both quantitatively and qualitatively, by extensive experiments and a detailed ablation study.
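To make "switching the width at runtime" concrete, here is a minimal sketch of a single dense layer whose narrower versions reuse a prefix of the full layer's parameters. It illustrates only the general width-slicing idea under that assumption. It is not the paper's generator, and it leaves out the shared discriminators, the stepwise inplace distillation, and the sliceable conditional batch normalization. The function name `slim_linear` and the example weights are hypothetical.

```python
import math

def slim_linear(x, weight, bias, width_ratio):
    """Dense layer that evaluates only the first fraction of its output units.

    weight is a list of rows (one row per output unit). Narrower widths reuse
    the first rows of the same matrix, so parameters are shared across widths.
    If x is shorter than a row, zip() uses only the row's leading columns,
    which lets a narrow layer feed the next narrow layer.
    """
    n_out = max(1, math.ceil(width_ratio * len(weight)))
    return [
        sum(w * xi for w, xi in zip(weight[j], x)) + bias[j]
        for j in range(n_out)
    ]

# Hypothetical parameters: 3 inputs, 4 output units at full width.
W = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [3.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]
b = [0.0, 1.0, 0.0, -1.0]
x = [1.0, 2.0, 3.0]

full = slim_linear(x, W, b, 1.0)   # all 4 units
half = slim_linear(x, W, b, 0.5)   # first 2 units only
print(full)
print(half)
```

Because the half-width output is a prefix of the full-width output, one set of weights serves every width, which is what allows a device to pick its own quality-efficiency trade-off.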
Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation
Abstract: Visual object tracking aims to precisely estimate the bounding box for the given target, which is a challenging problem due to factors such as deformation and occlusion. Many recent trackers adopt the multiple-stage tracking strategy to improve the quality of bounding box estimation. These methods first coarsely locate the target and then refine the initial prediction in the following stages. However, existing approaches still suffer from limited precision, and the coupling of different stages severely restricts the method's transferability. This work proposes a novel, flexible, and accurate refinement module called Alpha-Refine, which can significantly improve the base trackers' prediction quality. By exploring a series of design options, we conclude that the key to successful refinement is extracting and maintaining detailed spatial information as much as possible. Following this principle, Alpha-Refine adopts a pixel-wise correlation, a corner prediction head, and an auxiliary mask head as the core components. We apply Alpha-Refine to six famous base trackers to verify our method's effectiveness: DiMPsuper, DiMP50, ATOM, SiamRPN++, RT-MDNet, and ECO. Comprehensive experiments on TrackingNet, LaSOT, GOT-10K, and VOT2020 benchmarks show that our approach significantly improves the base tracker's performance with little extra latency. Code and a pretrained model are available at this https URL.