Deep Learning Weekly: Issue #210
The AI management platform for mixing models from different vendors, GAN Sketching, new NLP datasets for contextual disfluencies and temporal commonsense reasoning, and more
This week in deep learning, we bring you an AI management platform for mixing models from different vendors, a new book on synthetic data for deep learning, an LB-CNN optimization framework for TinyML applications, and a paper on GAN sketching.
You may also enjoy two new NLP datasets for contextual disfluencies and temporal reasoning, different ways to make histopathology image models more robust to domain shifts, cattle management with edge AI, a paper on video contrastive learning with global context, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
Eden AI launches an AI Management platform that allows companies to mix and match models from different vendors to suit their use case.
Google introduces two new conversational NLP datasets for temporal commonsense reasoning in dialog and contextual disfluencies.
Synthesis AI, a pioneer in synthetic data technologies, announces a newly published book called “Synthetic Data for Deep Learning” written by Head of AI, Sergey Nikolenko.
The head of an AI lab at the company behind TikTok has left the private sector to join UC Santa Barbara, joining a string of CEOs and scientists who have recently resigned from Chinese tech companies.
OpenAI just released an improved version of Codex, the AI system that translates natural language to code and powers GitHub Copilot, through their API in private beta.
Tests on five types of imagery used in radiology research show that new algorithms can infer a patient's race from medical scans.
Mobile & Edge
Plainsight, a San Francisco-based startup, performs precision livestock counting and health monitoring using their vision AI platform and NVIDIA GPUs.
A paper on a novel LB-CNN optimization framework for TinyML applications such as face recognition and other industrial processes.
Sign up for a hands-on session that introduces the EON Tuner, a new AutoML tool available to all Edge Impulse developers.
Pete Warden briefly shares a neat trick to shrink convolutional networks by removing MaxPool or AveragePool layers.
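The idea behind dropping a pooling layer can be sketched in a few lines of NumPy (the shapes, the 3x3 kernel, and the helper names here are illustrative assumptions, not code from the post): a stride-2 convolution produces the same output resolution as a stride-1 convolution followed by 2x2 pooling, so the pooling layer and its full-resolution intermediate activation can be removed.

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Valid 2-D convolution (single channel) with a given stride."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)
    return out

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
k = rng.standard_normal((3, 3))

pooled = maxpool2x2(conv2d(x, k, stride=1))  # conv followed by pooling
strided = conv2d(x, k, stride=2)             # strided conv alone
```

The two outputs differ numerically, since max pooling is not a convolution; the point is that the spatial resolution matches, so a retrained strided convolution can stand in for the conv-plus-pool pair while skipping the full-resolution intermediate buffer.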
A comprehensive article arguing for rethinking the current "data-driven" approach to Natural Language Understanding.
A blog showcasing how the NVIDIA Transfer Learning Toolkit (TLT) and a pretrained model can be used to quickly build more accurate quality control in the manufacturing process.
An article exploring a variety of approaches such as stain normalization, color augmentation, adversarial domain adaptation, model adaptation, and fine-tuning to make models more robust to domain shifts.
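As a toy illustration of the color-augmentation idea from that list (the function name, jitter ranges, and image shape are assumptions for this sketch, not taken from the article), a per-channel gain-and-offset jitter in NumPy looks like:

```python
import numpy as np

def color_jitter(img, rng, scale=0.1, shift=0.05):
    """Randomly scale and shift each RGB channel independently.

    img: float array in [0, 1] with shape (H, W, 3).
    """
    gain = rng.uniform(1 - scale, 1 + scale, size=3)   # per-channel gain
    offset = rng.uniform(-shift, shift, size=3)        # per-channel offset
    return np.clip(img * gain + offset, 0.0, 1.0)

rng = np.random.default_rng(42)
img = rng.uniform(size=(64, 64, 3))  # stand-in for a histopathology tile
aug = color_jitter(img, rng)
```

Training on such perturbed tiles encourages the model to ignore stain-intensity differences between labs rather than memorize one scanner's color profile.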
A technical blog that describes the different aspects of efficient machine learning framework interoperability such as distinct memory layouts and data transfer bottlenecks.
Libraries & Code
A viewer for neural networks, deep learning and machine learning models.
Tutorial code for deep learning researchers to learn PyTorch.
A library that serves primarily as code for reproducing the experiments in the paper “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”.
Papers & Publications
Can a user create a deep generative model by sketching a single example? Traditionally, creating a GAN model has required the collection of a large-scale dataset of exemplars and specialized knowledge in deep learning. In contrast, sketching is possibly the most universally accessible way to convey a visual concept. In this work, we present a method, GAN Sketching, for rewriting GANs with one or more sketches, to make GAN training easier for novice users. In particular, we change the weights of an original GAN model according to user sketches. We encourage the model's output to match the user sketches through a cross-domain adversarial loss. Furthermore, we explore different regularization methods to preserve the original model's diversity and image quality. Experiments have shown that our method can mold GANs to match shapes and poses specified by sketches while maintaining realism and diversity. Finally, we demonstrate a few applications of the resulting GAN, including latent space interpolation and image editing.
Contrastive learning has revolutionized the field of self-supervised image representation learning, and has recently been adapted to the video domain. One of the greatest advantages of contrastive learning is that it allows us to flexibly define powerful loss objectives as long as we can find a reasonable way to formulate positive and negative samples to contrast. However, existing approaches rely heavily on short-range spatiotemporal salience to form clip-level contrastive signals, thus limiting themselves from using global context. In this paper, we propose a new video-level contrastive learning method based on segments to formulate positive pairs. Our formulation is able to capture global context in a video, and is thus robust to temporal content change. We also incorporate a temporal order regularization term to enforce the inherent sequential structure of videos. Extensive experiments show that our video-level contrastive learning framework (VCLR) outperforms the previous state of the art on five video datasets for downstream action classification, action localization, and video retrieval.
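This is not the paper's code, but the InfoNCE objective that video-level contrastive methods like VCLR build on can be sketched in NumPy (the batch size, embedding dimension, and temperature are illustrative assumptions): each anchor segment embedding is pulled toward its positive (another view of the same video) and pushed away from all other embeddings in the batch.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor's positive is the same-index row in
    `positives`; every other row in the batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # cross-entropy on the diagonal

rng = np.random.default_rng(0)
z1 = rng.standard_normal((8, 16))                 # embeddings of one view
z2 = z1 + 0.01 * rng.standard_normal((8, 16))     # embeddings of a second view
loss_matched = info_nce(z1, z2)
loss_random = info_nce(z1, rng.standard_normal((8, 16)))
```

When the two views agree (as they should for segments of the same video), the loss is much lower than for unrelated embeddings, which is the training signal being exploited.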
Document digitization is essential for the digital transformation of our societies, yet a crucial step in the process, Optical Character Recognition (OCR), is still not perfect. Even commercial OCR systems can produce questionable output depending on the fidelity of the scanned documents. In this paper, we demonstrate an effective framework for mitigating OCR errors for any downstream NLP task, using Named Entity Recognition (NER) as an example. We first address the data scarcity problem for model training by constructing a document synthesis pipeline, generating realistic but degraded data with NER labels. We measure the NER accuracy drop at various degradation levels and show that a text restoration model, trained on the degraded data, significantly closes the NER accuracy gaps caused by OCR errors, including on an out-of-domain dataset. For the benefit of the community, we have made the document synthesis pipeline available as an open-source project.
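The degradation side of such a synthesis pipeline can be sketched at the character level (the confusion table, rates, and helper name are illustrative assumptions; the paper degrades rendered document images rather than raw strings): OCR-style noise amounts to randomly substituting visually confusable characters and dropping others.

```python
import random

CONFUSIONS = {"l": "1", "O": "0", "e": "c", "S": "5"}  # illustrative table

def degrade(text, rng, sub_rate=0.1, drop_rate=0.05):
    """Simulate OCR noise on a string with the given error rates."""
    out = []
    for ch in text:
        r = rng.random()
        if r < drop_rate:
            continue                        # simulate a missed character
        if r < drop_rate + sub_rate and ch in CONFUSIONS:
            out.append(CONFUSIONS[ch])      # simulate a misread character
        else:
            out.append(ch)
    return "".join(out)

rng = random.Random(0)
clean = "Steve Jobs founded Apple in California."
noisy = degrade(clean, rng)
```

Pairing `clean` (with its NER labels) and `noisy` yields exactly the kind of supervised data the paper uses to train a restoration model and to measure accuracy drop at each degradation level.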