Deep Learning Weekly: Issue #199
Google's next-gen language model, a deep learning mobile system that guides blind runners, IBM’s 14-million-sample dataset for programming tasks, Facebook’s unsupervised version of wav2vec, and more
This week in deep learning, we bring you Google's AI advancements in conversational models and hardware, an AI startup that manages and analyzes video content, IBM's 14-million-sample dataset for code automation and programming tasks, and Facebook's unsupervised speech recognition model based on GANs.
You may also enjoy a deep learning mobile system that guides blind runners, the new TensorFlow.js pose detection API, an error analysis library, a paper on unified networks for multiple tasks, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Google announces a next-generation conversational language model called LaMDA, a search model called MUM that’s a thousand times more powerful than its BERT-based counterpart, and the fourth generation of its TPU chips.
Netra, co-founded by Shashi Kant, uses artificial intelligence to help companies search, sort, manage and analyze video content.
IBM’s AI research division released a 14-million-sample dataset called Project CodeNet for code automation and other programming tasks.
Zheshang Fund Management launched the Zheshang Intelligent Industry Preferred Hybrid Fund, which has gained 68.34% since its launch, after China Asset Management announced its partnership with Toronto-based AI company Boosted.ai.
Chinese autonomous vehicle startup Pony.ai has a permit to test its driverless cars without human safety drivers behind the wheel on specified streets in Fremont, Milpitas and Irvine.
Mobile & Edge
Guideline is an upcoming mobile system, developed in partnership with Guiding Eyes for the Blind, that uses a three-stage, custom-trained MobileNet to guide runners through a variety of environments.
NVIDIA just announced an expansion of the NVIDIA EGX platform and ecosystem, improving edge compute efficiency, among other updates.
A freely programmable smart camera designed for computer vision tasks offers support for TensorFlow Lite and AutoML Vision Edge.
A compilation of Jetson demonstrations, repositories and examples on accelerated AI applications for open-source robotics frameworks including ROS and ROS 2.
Facebook developed wav2vec Unsupervised (wav2vec-U), a GAN-based model for building speech recognition systems that require no transcribed data.
An introductory article that illustrates the advantages of using Neptune’s experiment tracking platform over the classic spreadsheet technique.
A brief article showcasing the new TensorFlow.js pose-detection API along with a tutorial on how to get started using it.
Google explores converting knowledge graphs to synthetic natural language sentences to augment existing pre-training corpora via a verbalization pipeline.
Libraries & Code
Spektral is a Python library for graph deep learning, based on the Keras API and TensorFlow 2.
microsoft/responsible-ai-widgets: This project provides responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis, as well as foundational building blocks that they rely on.
DVC is an open-source tool for data science and machine learning projects that aims to replace spreadsheet and document sharing tools.
Papers & Publications
People “understand” the world via vision, hearing, tactile, and also the past experience. Human experience can be learned through normal learning (we call it explicit knowledge), or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously will be encoded and stored in the brain. Using these abundant experiences as a huge database, human beings can effectively process data, even if they were unseen beforehand. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just like the human brain can learn knowledge from normal learning as well as subconsciousness learning. The unified network can generate a unified representation to simultaneously serve various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a convolutional neural network. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. We further analyze the implicit representation learnt from the proposed unified network, and it shows great capability on catching the physical meaning of different tasks.
Recently, the power of unconditional image synthesis has significantly advanced through the use of Generative Adversarial Networks (GANs). The task of inverting an image into its corresponding latent code of the trained GAN is of utmost importance as it allows for the manipulation of real images, leveraging the rich semantics learned by the network. Recognizing the limitations of current inversion approaches, in this work we present a novel inversion scheme that extends current encoder-based inversion methods by introducing an iterative refinement mechanism. Instead of directly predicting the latent code of a given real image using a single pass, the encoder is tasked with predicting a residual with respect to the current estimate of the inverted latent code in a self-correcting manner. Our residual-based encoder, named ReStyle, attains improved accuracy compared to current state-of-the-art encoder-based methods with a negligible increase in inference time. We analyze the behavior of ReStyle to gain valuable insights into its iterative nature. We then evaluate the performance of our residual encoder and analyze its robustness compared to optimization-based inversion and state-of-the-art encoders.
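For readers who want a feel for the residual idea in the abstract above, here is a minimal toy sketch in plain Python. It is not the ReStyle implementation: `toy_generator`, `toy_encoder`, and `invert` are hypothetical names, the "generator" is a simple linear map rather than a GAN, and the "encoder" is a hand-written rule (assumed here to see both the target image and the current reconstruction) rather than a trained network. It only illustrates how repeatedly predicting a residual with respect to the current estimate can close the gap that a single pass leaves.

```python
def toy_generator(w):
    """Hypothetical stand-in for a trained generator: maps a latent code to an 'image'."""
    return [2.0 * wi + 1.0 for wi in w]


def toy_encoder(image, current_recon):
    """Hypothetical imperfect encoder.

    The exact latent correction for this toy generator would be 0.5 * (x - y);
    this encoder deliberately predicts only 0.4 * (x - y), i.e. 80% of it,
    so a single pass cannot recover the latent code exactly.
    """
    return [0.4 * (x - y) for x, y in zip(image, current_recon)]


def invert(image, w_init, steps):
    """Iteratively refine a latent estimate by adding predicted residuals."""
    w = list(w_init)
    for _ in range(steps):
        recon = toy_generator(w)           # reconstruct from the current estimate
        delta = toy_encoder(image, recon)  # predict a residual, not the full code
        w = [wi + di for wi, di in zip(w, delta)]
    return w


def max_error(w, target):
    """Largest absolute difference between an estimate and the true latent."""
    return max(abs(a - b) for a, b in zip(w, target))
```

Calling `invert(toy_generator([0.5, -1.0, 2.0]), [0.0, 0.0, 0.0], steps=5)` and comparing against a single step with `max_error` shows the error shrinking with each pass; in this toy setup each step removes a fixed fraction of the remaining error, which is a property of the hand-written encoder and not a claim about the paper's results.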
Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or long inference time due to their heavy computational burden on the convolution of high-resolution feature maps. In this paper, we focus on speeding-up the high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we reveal that the attribute transformations, such as illumination and color manipulation, relate more to the low-frequency component, while the content details can be adaptively refined on high-frequency components. We consequently propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks, where we design a lightweight network for translating the low-frequency component with reduced resolution and a progressive masking strategy to efficiently refine the high-frequency ones. Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details. Extensive experimental results on various tasks demonstrate that the proposed method can translate 4K images in real-time using one normal GPU while achieving comparable transformation performance against existing methods.
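The Laplacian pyramid decomposition and reconstruction mentioned in the abstract above can be sketched in a few lines of plain Python. This is a simplified illustration, not the LPTN code: it assumes a grayscale image stored as nested lists whose sides are divisible by 2 for each level, and it swaps in 2x2 averaging and pixel repetition where a real pyramid would typically use a blur kernel. All function names are hypothetical.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block (simple stand-in for blur + subsample)."""
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def upsample(img):
    """Double resolution by repeating each pixel into a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a, b):
    """Elementwise a - b for two equally sized images."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Elementwise a + b for two equally sized images."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def decompose(img, levels):
    """Split an image into high-frequency residuals (finest first) and a low-frequency base."""
    highs = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        highs.append(subtract(current, upsample(low)))  # detail lost by downsampling
        current = low
    return highs, current


def reconstruct(highs, low):
    """Invert decompose(): upsample the base and add residuals back, coarsest first."""
    current = low
    for high in reversed(highs):
        current = add(upsample(current), high)
    return current
```

As a usage example, `highs, low = decompose(image, 2)` on a 4x4 image yields two residual layers and a 1x1 base, and `reconstruct(highs, low)` returns the original image up to floating-point rounding. Editing only `low` (for example, adding a constant to brighten it) and then reconstructing changes the overall brightness while the residual layers, which carry the fine detail, are left untouched. That loosely mirrors the abstract's point that attribute changes can be applied cheaply at reduced resolution.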