Deep Learning Weekly Issue #185

TensorFlow 3D, federated learning for reducing carbon emissions, one-shot music style transfer, and more

Hey folks,

This week in deep learning we bring you the release of TensorFlow 3D, a set of training and evaluation pipelines for state-of-the-art 3D segmentation and object detection, MIT researchers’ new hardware and software system that streamlines state-of-the-art sentence analysis, a chess program that learns from human error, and a study that shows that federated learning can lead to reduced carbon emissions.

You may also enjoy learning about one-shot music style transfer, unsupervised semantic segmentation and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!


A language learning system that pays attention — more efficiently than ever before

MIT researchers’ new hardware and software system streamlines state-of-the-art sentence analysis.

A New Artificial Intelligence Makes Mistakes—on Purpose

A chess program that learns from human error might be better at working with people or negotiating with them.

Auditors are testing hiring algorithms for bias, but big questions remain

AI audits may overlook certain types of bias, and they don’t necessarily verify that a hiring tool picks the best candidates for a job.

Study shows that federated learning can lead to reduced carbon emissions

In a newly published paper, researchers explore whether federated learning, which trains a shared model across a number of machines while keeping data on-device, can lead to lower carbon emissions than traditional centralized training.
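The aggregation step at the heart of most federated learning systems is a data-size-weighted average of client model parameters (the FedAvg rule). A minimal numpy sketch, with hypothetical clients and toy one-layer models not taken from the paper:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate client model parameters by a data-size-weighted average
    (the FedAvg rule). Each element of client_weights is a list of
    parameter arrays from one client; raw data never leaves the clients."""
    total = sum(client_sizes)
    averaged = []
    # Walk the layers in lockstep across clients and blend them.
    for layer in zip(*client_weights):
        averaged.append(
            sum(w * (n / total) for w, n in zip(layer, client_sizes))
        )
    return averaged

# Two hypothetical clients with single-layer models and unequal data sizes.
clients = [[np.array([1.0, 3.0])], [np.array([3.0, 5.0])]]
sizes = [1, 3]
global_weights = federated_average(clients, sizes)  # -> [array([2.5, 4.5])]
```

Only these aggregated parameters travel over the network each round, which is one reason the study can compare its energy footprint against shipping raw data to a data center.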

Chip startup NeuReality launches from stealth to make AI inference more efficient

NeuReality Ltd., a startup working to develop more efficient artificial intelligence chips, today exited stealth mode and disclosed on the occasion that it has raised $8 million in seed funding.

Mobile + Edge

Full-body Deepfakes, 3D Human Filters and more: New, enhanced AR features around the corner for Snapchat?

After Snap’s purchase of startup Ariel AI, let’s take a look at what lies in store for Snapchat’s AR offerings.

Future of compute will be big, small, smart and way out on the edge

Compute companies are thinking big about expanding the capabilities of the cloud, but that will also mean putting more processing power into smaller devices, applying artificial intelligence and machine learning at scale, and deploying workable, efficient solutions at the edge.

Edge Computing Brings the Cloud Closer to the Data for Agencies

NOAA, USDA and other agencies that collect mass amounts of data in the field rely on edge computing for fast analysis.


Rearranging the Visual World

Can models learn rearrangement tasks more efficiently by reasoning directly over 3D spatial structure instead of relying on object-centric representations? Check out Transporter Nets, an open-source framework for sample-efficient robot manipulation, along with related benchmark tasks.

Stanford University Deep Evolutionary RL Framework Demonstrates Embodied Intelligence via Learning and Evolution

Stanford researchers’ DERL (Deep Evolutionary Reinforcement Learning) is a novel computational framework that enables AI agents to evolve morphologies and learn challenging locomotion and manipulation tasks in complex environments using only low-level egocentric sensory information.

DeepMind & UCL’s Alchemy Is a ‘Best-of-Both-Worlds’ 3D Video Game for Meta-RL

A research team from DeepMind and University College London has introduced Alchemy, an open-source benchmark for meta-RL research.

3D Scene Understanding with TensorFlow 3D

Google AI announced the release of TensorFlow 3D, a set of training and evaluation pipelines for state-of-the-art 3D semantic segmentation, object detection and instance segmentation, with support for distributed training.


[GitHub] jayleicn/ClipBERT

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning for image-text and video-text tasks.

[GitHub] deepmind/deepmind-research/tree/master/nfnets

This repository contains code for the ICLR 2021 paper "Characterizing signal propagation to close the performance gap in unnormalized ResNets."

Papers & Publications

Self-Supervised VQ-VAE For One-Shot Music Style Transfer

Abstract: Neural style transfer, which allows applying the artistic style of one image to another, became one of the most widely showcased computer vision applications shortly after its introduction. In contrast, related tasks in the music audio domain remained, until recently, largely untackled. While several style conversion methods tailored to musical signals have been proposed, most lack the 'one-shot' capability of classical image style transfer algorithms. On the other hand, the results of existing one-shot audio style transfer methods on musical inputs are not as compelling. In this work, we are specifically interested in the problem of one-shot timbre transfer. We present a novel method for this task, based on an extension of the vector-quantized variational autoencoder (VQ-VAE), along with a simple self-supervised learning strategy designed to obtain disentangled representations of timbre and pitch. We evaluate the method using a set of objective metrics and show that it is able to outperform selected baselines.
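The discretization step that gives a VQ-VAE its name is simple: each encoder output vector is snapped to its nearest entry in a learned codebook. A minimal numpy sketch of that lookup (the codebook and latent vectors below are toy values, not from the paper):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector z[i] to its nearest codebook entry, the
    discrete-bottleneck step at the heart of a VQ-VAE."""
    # Squared L2 distance between every latent vector and every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices

# Toy 2-entry codebook and two encoder outputs.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.2], [0.9, 1.2]])
quantized, idx = vector_quantize(z, codebook)  # idx -> [0, 1]
```

In the paper's setting, one such quantized stream would carry pitch content while a separate continuous pathway carries timbre, and the self-supervised objective encourages the two to stay disentangled.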

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

Abstract: Being able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. In this paper, we make a first attempt to tackle the problem on datasets that have been traditionally utilized for the supervised case. To achieve this, we introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings. This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering. Additionally, we argue for the importance of having a prior that contains information about objects, or their parts, and discuss several possibilities to obtain such a prior in an unsupervised manner. Extensive experimental evaluation shows that the proposed method comes with key advantages over existing works. First, the learned pixel embeddings can be directly clustered in semantic groups using K-Means. Second, the method can serve as an effective unsupervised pre-training for the semantic segmentation task. In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU. The code is available at this https URL.
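The abstract's first claimed advantage — that the learned pixel embeddings cluster directly into semantic groups with K-Means — can be illustrated with a few lines of numpy. The embeddings below are synthetic stand-ins for what the paper's contrastive objective would produce, and the tiny K-Means implementation is a generic sketch, not the authors' code:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal K-Means: cluster D-dim pixel embeddings into k groups."""
    # Deterministic farthest-point initialization to spread the centers out.
    centers = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign each embedding to its nearest center, then recompute means.
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Synthetic "pixel embeddings": two well-separated semantic groups of 8 pixels.
rng = np.random.default_rng(1)
emb = np.concatenate([rng.normal(0.0, 0.1, (8, 3)),
                      rng.normal(5.0, 0.1, (8, 3))])
labels = kmeans(emb, k=2)
```

If the contrastive training has done its job, pixels of the same object land in the same cluster exactly as in this toy case, with no labels needed at clustering time.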