Deep Learning Weekly Issue #157

Creative GPT-3 applications, working with custom NNs in SnapML, SOTA QA from Google, MLPerf benchmarks, & more

Hey folks,

This week in deep learning we bring you this analysis of last week's “Online Platforms and Market Power, Part 6: Examining the Dominance of Amazon, Apple, Facebook, and Google” hearing, Google AI's Big Bird model that achieved SOTA results in question answering and summarization, and this tool to cloak your photos against facial recognition.

You may also enjoy this real-time HDR+ image prediction model, this exploration of creative GPT-3 applications, this GAN trained to generate abstract art, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

Nvidia sets 16 new performance records in latest MLPerf AI benchmarks

Nvidia Corp. said the latest MLPerf benchmark test results prove that its latest platforms deliver the world’s fastest artificial intelligence training performance among all commercially available systems.

Facebook passes PyTorch for Windows development to Microsoft

Facebook announced that Microsoft will take ownership of the development and maintenance of the PyTorch build for Windows.

Cloak your photos with this AI privacy tool to fool facial recognition

Privacy tool Fawkes makes your selfies less like yourself.

Antitrust experts weigh in on breaking up Amazon, Apple, Facebook, and Google

VentureBeat spoke with Gary Reback and other antitrust experts about what stood out during the House Judiciary Committee hearing “Online Platforms and Market Power” last week and what should happen next.

Google’s TF-Coder tool automates machine learning model design

Researchers at Google Brain developed an automated tool for programming in machine learning frameworks like TensorFlow. They say it achieves better-than-human performance on some challenging development tasks.

Mobile + Edge

Syntiant raises $35 million for AI speech-processing edge chips

Syntiant, a startup developing AI edge hardware for voice and sensor solutions, closed a $35 million round.

Live HDR+ and Dual Exposure Controls on Pixel 4 and 4a

Researchers at Google AI developed a machine learning algorithm called HDRnet, a deep neural network that approximates the HDR+ look of an image in real time.

Exploring SnapML: Working with Custom Neural Networks in Lens Studio

Slight disclaimer: this one is from me. As a follow-up to my technical overview of Lens Studio from last week, I wrote about some of the challenges I faced when working with a custom segmentation model.

New Era Farming with TensorFlow on Lowest Power Consumption

This project demonstrates how to use TensorFlow and an Artemis module to build a device that can predict or detect pests, mosquitos, trees being cut, fires, greenhouse adaptation, and plant growth.

Learning

Introducing Learned Motion Matching

Motion Matching is a simple yet powerful way of animating characters in video games. Ubisoft developed a method called Learned Motion Matching which leverages neural networks to perform the task in a more efficient way.

Introducing the Model Card Toolkit for Easier Model Transparency Reporting

Google AI announced the Model Card Toolkit, a collection of tools that support ML developers in compiling the information that goes into a Model Card. Model Cards provide a structured framework for reporting on ML model provenance, usage, and ethics-informed evaluation and give a detailed overview of a model’s suggested uses and limitations.

GPT-3: Creative Potential of NLP

A look at OpenAI's new ML milestone in action.

MachineRay: Using AI to Create Abstract Art

In this post, the author explains how to train a GAN with public domain paintings to generate abstract art.

Datasets

[GitHub] PrincetonLIPS/SketchGraphs

A dataset of 15 million CAD sketches with geometric constraint graphs.

Libraries & Code

[GitHub] JialeCao001/SipMask

This is the official implementation of "SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation (ECCV2020)" built on the open-source mmdetection and maskrcnn-benchmark.

[GitHub] vwxyzjn/cleanrl

High-quality single-file implementations of deep reinforcement learning algorithms with research-friendly features.

Papers & Publications

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

Abstract: We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment. Notably, our method runs on datasets without any scene- or object-level 3D supervision. Our key insight is that considering humans and objects jointly gives rise to "3D common sense" constraints that can be used to resolve ambiguity. In particular, we introduce a scale loss that learns the distribution of object size from data; an occlusion-aware silhouette re-projection loss to optimize object pose; and a human-object interaction loss to capture the spatial layout of objects with which humans interact. We empirically validate that our constraints dramatically reduce the space of likely 3D spatial configurations. We demonstrate our approach on challenging, in-the-wild images of humans interacting with large objects (such as bicycles, motorcycles, and surfboards) and handheld objects (such as laptops, tennis rackets, and skateboards). We quantify the ability of our approach to recover human-object arrangements and outline remaining challenges in this domain.

Big Bird: Transformers for Longer Sequences

Abstract: Transformer-based models, such as BERT, have been among the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS) that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.
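
For readers who want a feel for why sparse attention scales linearly, here is a minimal sketch of a BigBird-style attention mask. It is an illustration only, not the authors' implementation: the function names, parameters, and default values are hypothetical. The sliding-window and random components are assumptions drawn from the paper rather than from the abstract above, which only mentions the global tokens.

```python
import random


def sparse_attention_mask(seq_len, window=3, num_global=1, num_random=2, seed=0):
    """Build a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when query token i may attend to key token j.
    """
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    half = window // 2

    for i in range(seq_len):
        # Sliding window: each token attends to its nearest neighbours.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
        # Random attention: a few randomly chosen keys per query.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True

    # Global tokens (e.g. CLS) attend to everything and are attended to by all.
    for g in range(min(num_global, seq_len)):
        for j in range(seq_len):
            mask[g][j] = True
            mask[j][g] = True

    return mask


def count_attended(mask):
    """Number of (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    for n in (64, 128, 256):
        allowed = count_attended(sparse_attention_mask(n))
        print(f"seq_len={n}: {allowed} allowed pairs vs {n * n} for full attention")
```

Each non-global row allows at most window + num_random + num_global keys, so the number of allowed pairs grows roughly linearly with sequence length, while full attention grows quadratically.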