Deep Learning Weekly: Issue #210
The AI management platform for mixing models from different vendors, GAN Sketching, new NLP datasets for contextual disfluencies and temporal commonsense reasoning, and more
Hey folks,
This week in deep learning, we bring you an AI management platform for mixing models from different vendors, a new book on synthetic data for deep learning, an LB-CNN optimization framework for TinyML applications, and a paper on GAN sketching.
You may also enjoy two new NLP datasets for contextual disfluencies and temporal reasoning, different ways to make histopathology image models more robust to domain shifts, cattle management with edge AI, a paper on video contrastive learning with global context, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Eden AI launches platform to unify ML APIs
Eden AI launches an AI Management platform that allows companies to mix and match models from different vendors to suit their use case.
Two New Datasets for Conversational NLP: TimeDial and Disfl-QA
Google introduces two new conversational NLP datasets for temporal commonsense reasoning in dialog and contextual disfluencies.
Synthesis AI Announces New Book “Synthetic Data for Deep Learning”
Synthesis AI, a pioneer in synthetic data technologies, announces a newly published book called “Synthetic Data for Deep Learning,” written by Head of AI Sergey Nikolenko.
TikTok brain drain as head of AI lab leaves ByteDance?
The head of an AI lab at TikTok parent ByteDance has left the private sector to join UC Santa Barbara, the latest in a string of Chinese executives and scientists to resign.
OpenAI Releases an Improved Codex via API in Private Beta
OpenAI just released an improved version of Codex, the AI system that translates natural language to code and powers GitHub Copilot, through their API in private beta.
These Algorithms Look at X-Rays—and Somehow Detect Your Race
In tests on five types of imagery used in radiology research, new algorithms were able to identify a patient’s race from medical scans alone.
Mobile & Edge
Plainsight Enhances Cattle Management with Edge AI
Plainsight, a San Francisco-based startup, performs precision livestock counting and health monitoring using their vision AI platform and NVIDIA GPUs.
A Novel LB-CNN Optimization Framework for TinyML
A paper on a novel LB-CNN optimization framework for TinyML applications such as face recognition and other industrial processes.
On-demand Session: Edge Impulse's New AutoML Tool for Embedded Machine Learning
Sign up for a hands-on session that introduces the EON Tuner, a new AutoML tool available to all Edge Impulse developers.
One weird trick to shrink convolutional networks for TinyML
Pete Warden briefly shares a neat trick to shrink convolutional networks by removing MaxPool or AveragePool.
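The essence of the trick (as commonly applied; this sketch is illustrative, not Warden's exact recipe) is to fold the downsampling done by a 2×2 pool into the preceding convolution's stride, removing a layer and, crucially for TinyML, the full-resolution intermediate activation buffer. A back-of-the-envelope check of the output sizes in pure Python:

```python
def conv_out(size, kernel, stride, pad):
    # standard convolution output-size formula
    return (size + 2 * pad - kernel) // stride + 1

# Original: stride-1 conv followed by a 2x2 MaxPool with stride 2.
h = conv_out(32, kernel=3, stride=1, pad=1)         # 32 -> 32
h_pooled = conv_out(h, kernel=2, stride=2, pad=0)   # 32 -> 16

# Trick: drop the pool and give the conv stride 2 instead.
h_strided = conv_out(32, kernel=3, stride=2, pad=1)  # 32 -> 16

assert h_pooled == h_strided == 16

# The TinyML win: the full-resolution intermediate activation
# (channels * 32 * 32 int8 values) never has to be materialized.
channels = 8
saved_bytes = channels * 32 * 32
print(f"intermediate activation avoided: {saved_bytes} bytes")
```

The spatial result is identical; accuracy can shift slightly because a strided conv is not an exact substitute for pooling, which is why the post frames it as a trick worth measuring rather than a free lunch.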
Learning
Machine Learning Won't Solve Natural Language Understanding
A comprehensive article arguing that the current “data-driven” approach to Natural Language Understanding must be rethought.
Quality Control in Manufacturing with the NVIDIA Transfer Learning Toolkit
A blog showcasing how the NVIDIA Transfer Learning Toolkit (TLT) and a pretrained model can be used to quickly build more accurate quality control in the manufacturing process.
5 Ways to Make Histopathology Image Models More Robust to Domain Shifts
An article exploring a variety of approaches such as stain normalization, color augmentation, adversarial domain adaptation, model adaptation, and fine-tuning to make models more robust to domain shifts.
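Of those approaches, color augmentation is the simplest to sketch: randomly perturb per-channel color statistics at training time so a model stops latching onto one scanner's stain appearance. A minimal NumPy version (the gain and offset ranges are made-up illustration, not any paper's recipe):

```python
import numpy as np

def color_jitter(img, rng, scale=0.1, shift=0.05):
    # img: float H x W x 3 in [0, 1]; apply a random multiplicative
    # gain and additive offset independently per color channel.
    gain = 1.0 + rng.uniform(-scale, scale, size=(1, 1, 3))
    offset = rng.uniform(-shift, shift, size=(1, 1, 3))
    return np.clip(img * gain + offset, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3), dtype=np.float32)  # stand-in for a tissue patch
aug = color_jitter(img, rng)
assert aug.shape == img.shape
```

Each training epoch sees a slightly different color distribution, which is the cheapest of the five techniques; stain normalization and adversarial adaptation attack the same problem with more machinery.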
Machine Learning Frameworks Interoperability: Memory Layouts and Memory Pools
A technical blog that describes the different aspects of efficient machine learning framework interoperability such as distinct memory layouts and data transfer bottlenecks.
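The memory-layout issue is concrete: two frameworks can disagree on whether the same buffer is row-major (C order) or column-major (Fortran order), forcing a copy at the boundary. NumPy makes the distinction easy to inspect (a small illustration, not the blog's own code):

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)  # C order (row-major)
f = np.asfortranarray(a)                           # same values, column-major

# Identical contents, different strides (bytes to step per axis).
assert (a == f).all()
assert a.strides == (16, 4)   # rows contiguous: 4 floats * 4 bytes each
assert f.strides == (4, 12)   # columns contiguous

# A consumer that assumes C order forces an explicit (potentially
# expensive) copy -- exactly the interoperability cost the blog covers.
c_again = np.ascontiguousarray(f)
assert c_again.flags["C_CONTIGUOUS"]
```

Zero-copy exchange standards like DLPack exist precisely so frameworks can hand over such buffers, strides and all, without the copy.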
Libraries & Code
netron: Visualizer for neural network, deep learning, and machine learning models
A viewer for neural networks, deep learning and machine learning models.
pytorch-tutorial: PyTorch Tutorial for Deep Learning Researchers
Tutorial code for deep learning researchers to learn PyTorch.
text-to-text-transfer-transformer: Library Code for the Paper
A library that serves primarily as code for reproducing the experiments in the paper “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”.
Papers & Publications
Sketch Your Own GAN
Abstract:
Can a user create a deep generative model by sketching a single example? Traditionally, creating a GAN model has required the collection of a large-scale dataset of exemplars and specialized knowledge in deep learning. In contrast, sketching is possibly the most universally accessible way to convey a visual concept. In this work, we present a method, GAN Sketching, for rewriting GANs with one or more sketches, to make GAN training easier for novice users. In particular, we change the weights of an original GAN model according to user sketches. We encourage the model's output to match the user sketches through a cross-domain adversarial loss. Furthermore, we explore different regularization methods to preserve the original model's diversity and image quality. Experiments have shown that our method can mold GANs to match shapes and poses specified by sketches while maintaining realism and diversity. Finally, we demonstrate a few applications of the resulting GAN, including latent space interpolation and image editing.
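The objective described above combines a cross-domain adversarial loss on sketches with an image-space term that preserves the original model's realism. A toy NumPy sketch of how such a weighted objective is assembled (the discriminator scores and the weight `lam` are invented for illustration, not the paper's values):

```python
import numpy as np

def g_adv_loss(d_scores):
    # Non-saturating generator adversarial loss: -E[log D(G(z))].
    return float(-np.log(d_scores + 1e-8).mean())

# Hypothetical discriminator outputs in (0, 1):
sketch_scores = np.array([0.4, 0.7])  # cross-domain D on sketches of G's outputs
image_scores = np.array([0.8, 0.9])   # image-space D guarding realism/diversity

lam = 0.7  # assumed regularization weight
total = g_adv_loss(sketch_scores) + lam * g_adv_loss(image_scores)
print(f"total generator loss: {total:.3f}")
```

Only the weights of the pretrained generator are updated against this combined loss; the regularization term is what keeps a handful of sketches from collapsing the model's diversity.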
Video Contrastive Learning with Global Context
Abstract:
Contrastive learning has revolutionized the field of self-supervised image representation learning and has recently been adapted to the video domain. One of the greatest advantages of contrastive learning is that it allows us to flexibly define powerful loss objectives as long as we can find a reasonable way to formulate positive and negative samples to contrast. However, existing approaches rely heavily on short-range spatiotemporal salience to form clip-level contrastive signals, thus limiting themselves from using global context. In this paper, we propose a new video-level contrastive learning method based on segments to formulate positive pairs. Our formulation is able to capture the global context in a video, and is thus robust to temporal content change. We also incorporate a temporal order regularization term to enforce the inherent sequential structure of videos. Extensive experiments show that our video-level contrastive learning framework (VCLR) outperforms previous state-of-the-art methods on five video datasets for downstream action classification, action localization, and video retrieval.
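The key move is forming positives at the video level: clips drawn from different segments of the same video share global context, while clips from other videos act as negatives. A toy NumPy sketch of an InfoNCE-style loss over such pairs (dimensions, temperature, and the synthetic embeddings are illustrative, not VCLR's actual configuration):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    # anchors/positives: N x D embeddings; row i of each comes from two
    # different segments of the same video (a positive pair), and all
    # other rows in the batch serve as negatives.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # N x N cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_prob).mean())      # cross-entropy on the diagonal

rng = np.random.default_rng(0)
z1 = rng.standard_normal((8, 16))              # embeddings of segment-1 clips
z2 = z1 + 0.1 * rng.standard_normal((8, 16))   # segment-2 clips, shared content
loss = info_nce(z1, z2)
print(f"video-level contrastive loss: {loss:.3f}")
```

Because positives span segments rather than neighboring frames, the loss rewards features that survive temporal content change; the paper's separate temporal-order term then enforces that segment ordering itself is predictable.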
Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents
Abstract:
Document digitization is essential for the digital transformation of our societies, yet a crucial step in the process, Optical Character Recognition (OCR), is still not perfect. Even commercial OCR systems can produce questionable output depending on the fidelity of the scanned documents. In this paper, we demonstrate an effective framework for mitigating OCR errors for any downstream NLP task, using Named Entity Recognition (NER) as an example. We first address the data scarcity problem for model training by constructing a document synthesis pipeline, generating realistic but degraded data with NER labels. We measure the NER accuracy drop at various degradation levels and show that a text restoration model, trained on the degraded data, significantly closes the NER accuracy gaps caused by OCR errors, including on an out-of-domain dataset. For the benefit of the community, we have made the document synthesis pipeline available as an open-source project.
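A document synthesis pipeline of this kind boils down to corrupting clean, labeled text with OCR-style noise while keeping the clean text as the training label. A toy character-level corrupter (the confusion pairs and rates here are invented for illustration, not taken from the paper):

```python
import random

# Invented OCR-style confusions between visually similar glyphs.
CONFUSIONS = {"l": "1", "O": "0", "rn": "m", "S": "5"}

def degrade(text, rate=0.15, seed=0):
    # Randomly apply glyph confusions and character drops to mimic
    # scanner noise; the original text remains the ground-truth label.
    rng = random.Random(seed)
    out = text
    for src, dst in CONFUSIONS.items():
        if rng.random() < rate * 2:
            out = out.replace(src, dst)
    # Occasionally drop a letter outright.
    chars = [c for c in out if not (c.isalpha() and rng.random() < rate / 3)]
    return "".join(chars)

clean = "Shell Oil reported earnings in London"
noisy = degrade(clean)
print((clean, noisy))
```

Pairs like `(noisy, clean)` can then train a text restoration model; the paper's result is that restoring the text before running NER recovers most of the accuracy lost to OCR errors.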