Deep Learning Weekly: Issue #199

Google's next-gen language model, a deep learning mobile system that guides blind runners, IBM’s 14-million-sample dataset for programming tasks, Facebook’s unsupervised version of wav2vec, and more

Hey folks,

This week in deep learning, we bring you Google's AI advancements in conversational models and hardware, an AI startup that manages and analyzes video content, IBM's 14-million-sample dataset for code automation and programming tasks, and Facebook's unsupervised speech recognition model based on GANs.

You may also enjoy a deep learning mobile system that guides blind runners, the new TensorFlow.js pose-detection API, an error analysis library, a paper on unified networks for multiple tasks, and more!

As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.

Until next week!


Google shows off advances in conversational AI, search and TPU chips

Google announces a next-generation conversational language model called LaMDA; a search model called MUM, which is a thousand times more powerful than its BERT-based counterpart; and the fourth generation of its TPU chips.

Netra: Improving the way videos are organized

Netra, co-founded by Shashi Kant, uses artificial intelligence to help companies search, sort, manage and analyze video content.

Can we teach AI how to code? Welcome to IBM's Project CodeNet

IBM’s AI research division released a 14-million-sample dataset called Project CodeNet for code automation and other programming tasks.

China fund managers rely on AI to manage trading data and pick stocks

Zheshang Fund Management launched the Zheshang Intelligent Industry Preferred Hybrid Fund, which has gained 68.34% since its launch, after China Asset Management announced its partnership with a Toronto-based AI company.

Chinese startup gets approval to test driverless vehicles in California

A Chinese autonomous vehicle startup has received a permit to test its driverless cars, without human safety drivers behind the wheel, on specified streets in Fremont, Milpitas, and Irvine.

Mobile & Edge

Project Guideline: Enabling Those with Low Vision to Run Independently

Guideline is an upcoming mobile system, developed in partnership with Guiding Eyes for the Blind, that uses a three-stage, custom-trained MobileNet to guide runners through a variety of environments.

Accelerating Edge Computing with a Smarter Network

NVIDIA just announced an expansion of the NVIDIA EGX platform and ecosystem, aimed at improving edge computing efficiency, among other gains.

Imago VisionAI camera supports TensorFlow Lite and AutoML Vision Edge

A freely programmable smart camera designed for computer vision tasks offers support for TensorFlow Lite and AutoML Vision Edge.

Accelerating AI Modules for ROS and ROS 2 on NVIDIA Jetson Platform

A compilation of Jetson demonstrations, repositories and examples on accelerated AI applications for open-source robotics frameworks including ROS and ROS 2.


wav2vec Unsupervised: Speech recognition without supervision

Facebook developed wav2vec Unsupervised (wav2vec-U), a GAN-based model for building speech recognition systems that require no transcribed data.

Switching from Spreadsheets to Neptune and How It Pushed My Model Building Process to the Next Level

An introductory article that illustrates the advantages of using Neptune’s experiment tracking platform over the classic spreadsheet technique.

High Fidelity Pose Tracking with MediaPipe BlazePose and TensorFlow.js

A brief article showcasing the new TensorFlow.js pose-detection API along with a tutorial on how to get started using it.

KELM: Integrating Knowledge Graphs with Language Model Pre-training Corpora

Google explores converting knowledge graphs to synthetic natural language sentences to augment existing pre-training corpora via a verbalization pipeline.
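KELM performs verbalization with a trained text-to-text model; as a purely illustrative sketch of the underlying idea (turning knowledge-graph triples into sentences that can be mixed into a pre-training corpus), a toy template-based verbalizer might look like the following. The `TEMPLATES` table and `verbalize` function are hypothetical names, not part of KELM:

```python
# Toy verbalizer: map knowledge-graph triples to natural-language
# sentences. KELM uses a seq2seq model trained on aligned triples and
# text; templates here only illustrate the input/output shape.
TEMPLATES = {
    "capital_of": "{s} is the capital of {o}.",
    "born_in": "{s} was born in {o}.",
}

def verbalize(subject, relation, obj):
    """Render one (subject, relation, object) triple as a sentence."""
    return TEMPLATES[relation].format(s=subject, o=obj)

# Verbalized triples can then be appended to an existing text corpus.
corpus = [verbalize(*triple) for triple in [
    ("Paris", "capital_of", "France"),
    ("Ada Lovelace", "born_in", "London"),
]]
```

A learned verbalizer replaces the template lookup with a generative model, which is what lets KELM handle arbitrary relations and full entity subgraphs.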

Libraries & Code

danielegrattarola/spektral: Graph Neural Networks with Keras and Tensorflow 2.

Spektral is a Python library for graph deep learning, based on the Keras API and TensorFlow 2.

microsoft/responsible-ai-widgets: This project provides responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis, as well as foundational building blocks that they rely on.

A library that provides responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis.

iterative/dvc: Data Version Control | Git for Data & Models

DVC is an open-source tool for data science and machine learning projects that aims to replace spreadsheet and document sharing tools.

Papers & Publications

You Only Learn One Representation: Unified Network for Multiple Tasks


People "understand" the world through vision, hearing, touch, and past experience. Human experience can be learned through normal learning (we call it explicit knowledge) or subconsciously (we call it implicit knowledge). These experiences learned through normal learning or subconsciously are encoded and stored in the brain. Using these abundant experiences as a huge database, human beings can effectively process data, even data that was unseen beforehand. In this paper, we propose a unified network to encode implicit knowledge and explicit knowledge together, just as the human brain can learn knowledge from normal learning as well as subconscious learning. The unified network generates a unified representation to simultaneously serve various tasks, allowing kernel space alignment, prediction refinement, and multi-task learning to be performed in a convolutional neural network. The results demonstrate that when implicit knowledge is introduced into the neural network, it benefits the performance of all tasks. We further analyze the implicit representation learnt from the proposed unified network, which shows great capability in capturing the physical meaning of different tasks.

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement


Recently, the power of unconditional image synthesis has significantly advanced through the use of Generative Adversarial Networks (GANs). The task of inverting an image into its corresponding latent code of the trained GAN is of utmost importance as it allows for the manipulation of real images, leveraging the rich semantics learned by the network. Recognizing the limitations of current inversion approaches, in this work we present a novel inversion scheme that extends current encoder-based inversion methods by introducing an iterative refinement mechanism. Instead of directly predicting the latent code of a given real image using a single pass, the encoder is tasked with predicting a residual with respect to the current estimate of the inverted latent code in a self-correcting manner. Our residual-based encoder, named ReStyle, attains improved accuracy compared to current state-of-the-art encoder-based methods with a negligible increase in inference time. We analyze the behavior of ReStyle to gain valuable insights into its iterative nature. We then evaluate the performance of our residual encoder and analyze its robustness compared to optimization-based inversion and state-of-the-art encoders.
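The self-correcting loop at the core of ReStyle can be illustrated with a toy linear stand-in: a fixed linear "generator" maps a latent code to an "image", and an "encoder" predicts a residual latent update from the current reconstruction error. This is a minimal sketch, assuming a least-squares step in place of ReStyle's learned encoder; `generator`, `encoder`, and the dimensions are illustrative, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a fixed linear map from an 8-dim latent code to a
# 32-dim "image". (StyleGAN is nonlinear; this only sketches the loop.)
A = rng.normal(size=(32, 8))

def generator(w):
    return A @ w

# Toy "encoder": predicts a *residual* latent update from the mismatch
# between the target and the current reconstruction. ReStyle learns this
# with a network; here a least-squares step plays that role.
A_pinv = np.linalg.pinv(A)

def encoder(x, x_hat):
    return A_pinv @ (x - x_hat)

x = generator(rng.normal(size=8))  # a "real" image to invert
w = np.zeros(8)                    # initial latent estimate

for _ in range(5):                 # iterative refinement, as in ReStyle
    w = w + encoder(x, generator(w))
```

The key design point the sketch preserves is that each pass predicts a residual with respect to the current latent estimate rather than the latent code itself, so errors left by one pass are corrected by the next.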

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network


Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or suffer from long inference times due to the heavy computational burden of convolving high-resolution feature maps. In this paper, we focus on speeding up high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we reveal that attribute transformations, such as illumination and color manipulation, relate more to the low-frequency component, while content details can be adaptively refined on high-frequency components. We consequently propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks, where we design a lightweight network for translating the low-frequency component at reduced resolution and a progressive masking strategy to efficiently refine the high-frequency ones. Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves image details. Extensive experimental results on various tasks demonstrate that the proposed method can translate 4K images in real time on a normal GPU while achieving transformation performance comparable to existing methods.
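The closed-form decomposition and reconstruction the abstract refers to can be shown in a minimal NumPy sketch. Note the simplifying assumptions: 2×2 average pooling stands in for Gaussian downsampling and nearest-neighbour repetition for upsampling, and the function names are illustrative, not from the paper's code:

```python
import numpy as np

def build_laplacian_pyramid(img, levels):
    """Decompose a square 2D image into high-frequency residuals plus a
    low-frequency base. Each residual is the difference between a level
    and the upsampled version of its downsampled self."""
    pyramid = []
    current = img
    for _ in range(levels):
        h, w = current.shape
        down = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        up = down.repeat(2, axis=0).repeat(2, axis=1)
        pyramid.append(current - up)   # high-frequency residual
        current = down
    pyramid.append(current)            # low-frequency base
    return pyramid

def reconstruct(pyramid):
    """Closed-form inverse: upsample and add residuals back, coarse to fine.
    Exact by construction, whatever the down/upsampling operators are."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = residual + current.repeat(2, axis=0).repeat(2, axis=1)
    return current

img = np.random.default_rng(0).random((64, 64))
pyr = build_laplacian_pyramid(img, levels=3)
assert np.allclose(reconstruct(pyr), img)  # lossless round trip
```

LPTN exploits exactly this structure: the small low-frequency base is cheap to translate with a network, while the full-resolution residuals only need lightweight masking, which is why the method stays real-time at 4K.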