Deep Learning Weekly: Issue #187
Come work with us! We're accepting applications for an Associate Editor. Plus, detecting defects in manufactured products with DL, debugging neural networks, and a "Holistic" pipeline for on-device ML
Hey folks,
Before we jump into this week’s newsletter, we have a favor to ask…come work with us! Deep Learning Weekly is seeking an Associate Editor. This is a part-time/contract position, perfect for someone who wants to share their passion for deep learning with the world. If you’re interested and think you’d be a good fit, we’d love to hear from you.
Now, back to the best news from the week in deep learning…
This week in deep learning we bring you Amazon's computer vision service to detect defects in manufactured products, Microsoft’s Azure Percept platform for bringing more of its Azure AI services to the edge, MIT’s student-run Driverless organization that partners with industry collaborators to develop and test autonomous technologies in real-world racing scenarios, and some practical tips for building and debugging neural networks.
You may also enjoy the official PyTorch package for the discrete VAE used for OpenAI’s DALL·E, this PyTorch library with implementations of representative GANs for conditional/unconditional image generation, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
New AI ‘Deep Nostalgia’ brings old photos, including very old ones, to life
It seems like a nice idea in theory but it’s a tiny bit creepy as well.
Amazon launches computer vision service to detect defects in manufactured products
Amazon recently announced the general availability of Amazon Lookout for Vision, a cloud service that analyzes images using computer vision to spot product or process defects and anomalies in manufactured goods.
Driving on the cutting edge of autonomous vehicle tech
Leveraging research done on campus, student-run MIT Driverless partners with industry collaborators to develop and test autonomous technologies in real-world racing scenarios.
As China Rises, the US Builds Toward a Bigger Role in AI
After decades of staying out of industrial policy, a Pentagon-appointed commission recommends more spending on research and support for US chip makers.
Why a YouTube Chat About Chess Got Flagged for Hate Speech
AI programs that analyze language have difficulty gauging context. Words such as “black,” “white,” and “attack” can have different meanings.
An AI is training counselors to deal with teens in crisis
The Trevor Project, America’s hotline for LGBT youth, is turning to a GPT-2-powered chatbot to help troubled teenagers—but it’s setting strict limits.
Mobile + Edge
Microsoft launches Azure Percept, its new hardware and software platform to bring AI to the edge
Microsoft recently announced Azure Percept, its new hardware and software platform for bringing more of its Azure AI services to the edge.
How edge computing can help save the environment
One expert says that caching content at the edge, closer to its users, can cut carbon emissions.
Simultaneously detecting face, hand motion, and pose in real-time on mobile devices
A look at Holistic Tracking — a new ML solution in MediaPipe that optimally integrates three widely-used machine vision capabilities.
Learning
Lyra: A New Very Low-Bitrate Codec for Speech Compression
Lyra is a new high-quality, very low-bitrate speech codec that leverages advances in ML to make voice communication possible even on the slowest networks.
Yoshua Bengio Team Proposes Causal Learning to Solve the ML Model Generalization Problem
A group of researchers works to bring together the causality and machine learning research programs, delineate the implications of causality for machine learning, and propose critical areas for future research.
Simple considerations for simple people building fancy neural networks
In this post, the author highlights a few steps of their mental process when it comes to building and debugging neural networks.
A research team from Facebook AI has proposed a Unified Transformer (UniT) encoder-decoder model that jointly trains on multiple tasks across different modalities and achieves strong performance on seven tasks with a unified set of model parameters.
A team from Microsoft and Université de Montréal proposes a new mathematical framework that uses measure theory and integral operators to achieve the goal of quantifying the regularity of the attention operation.
Code
[GitHub] openai/DALL-E
This is the official PyTorch package for the discrete VAE used for DALL·E.
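The core trick in a discrete VAE like DALL·E’s is that the encoder turns an image into a grid of integer token ids drawn from a learned codebook, which the decoder then maps back to continuous features. Here’s a toy sketch of that quantization step — the shapes, codebook, and function names are all illustrative, not the package’s actual API:

```python
import numpy as np

# Toy discrete-VAE quantization: an "encoder" scores each spatial
# position against K codebook vectors, turning the image into a grid
# of integer token ids; a "decoder" maps tokens back to vectors.
# All names and sizes here are illustrative, not the DALL-E API.

rng = np.random.default_rng(0)
K, D = 8, 4                  # codebook size, embedding dim (tiny for demo)
codebook = rng.normal(size=(K, D))

def encode(features):
    """features: (H, W, D) -> integer token grid of shape (H, W)."""
    logits = features @ codebook.T     # score each position vs. every code
    return logits.argmax(axis=-1)      # pick the best-scoring token id

def decode(tokens):
    """tokens: (H, W) -> reconstructed embeddings of shape (H, W, D)."""
    return codebook[tokens]            # simple codebook lookup

features = rng.normal(size=(4, 4, D))
tokens = encode(features)
recon = decode(tokens)
print(tokens.shape, recon.shape)       # (4, 4) (4, 4, 4)
```

The real model trains the codebook and uses a relaxation (gumbel-softmax) so the argmax is differentiable, but the encode-to-tokens / decode-from-tokens round trip is the shape of the idea.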
[GitHub] POSTECH-CVLab/PyTorch-StudioGAN
StudioGAN is a PyTorch library providing implementations of representative Generative Adversarial Networks (GANs) for conditional/unconditional image generation.
Papers
Zero-Shot Text-to-Image Generation
Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
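The “single stream of data” idea can be sketched in a few lines: text tokens and image tokens are merged into one sequence over a shared vocabulary, and the transformer is trained to predict each position from everything before it. A minimal sketch in plain Python — the vocabulary sizes and offset scheme are hypothetical stand-ins for the paper’s BPE and dVAE vocabularies:

```python
# Sketch of DALL-E-style single-stream modeling: text tokens and image
# tokens share one sequence, and an autoregressive model predicts
# position i from positions 0..i-1. Vocab sizes are illustrative.

TEXT_VOCAB = 100             # pretend text (BPE) vocabulary size
IMAGE_VOCAB = 50             # pretend image (dVAE codebook) size

def to_single_stream(text_tokens, image_tokens):
    """Offset image ids past the text vocabulary so the two modalities
    occupy disjoint ranges of one shared vocabulary."""
    return list(text_tokens) + [t + TEXT_VOCAB for t in image_tokens]

def training_pairs(stream):
    """Autoregressive targets: each prefix predicts the next token."""
    return [(stream[:i], stream[i]) for i in range(1, len(stream))]

stream = to_single_stream([5, 17, 3], [12, 12, 48])
pairs = training_pairs(stream)
print(stream)      # [5, 17, 3, 112, 112, 148]
print(pairs[3])    # ([5, 17, 3, 112], 112)
```

At sampling time the same model, conditioned on just the text prefix, generates the image tokens one by one, which the dVAE decoder then renders into pixels.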
How to represent part-whole hierarchies in a neural network
Abstract: This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
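The “islands of identical vectors” idea can be made concrete with a toy grid: locations whose embeddings (nearly) agree are grouped by flood fill, and each resulting island plays the role of one node in the part-whole parse tree. A hedged sketch — the grid contents, connectivity, and agreement threshold are all invented for illustration, not taken from the paper:

```python
import numpy as np

# Toy version of GLOM's "islands of identical vectors": group
# 4-connected grid cells whose embeddings nearly agree; each island
# stands for one node of the part-whole parse.

def count_islands(grid, tol=1e-6):
    """grid: (H, W, D) embeddings -> number of islands (flood fill)."""
    H, W, _ = grid.shape
    seen = np.zeros((H, W), dtype=bool)
    islands = 0
    for i in range(H):
        for j in range(W):
            if seen[i, j]:
                continue
            islands += 1
            stack = [(i, j)]               # flood-fill one island
            seen[i, j] = True
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
                    if (0 <= ny < H and 0 <= nx < W and not seen[ny, nx]
                            and np.linalg.norm(grid[ny, nx] - grid[y, x]) < tol):
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return islands

# Two islands: the left half agrees on one vector, the right on another.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
grid = np.stack([np.stack([a, a, b, b]) for _ in range(3)])  # (3, 4, 2)
print(count_islands(grid))   # 2
```

In GLOM the vectors are not literally identical but are pushed toward agreement by iterative attention, so the interesting structure is how big, coherent islands emerge at higher levels of the hierarchy.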