Deep Learning Weekly: Issue #198
An alternative to GPT-3, the final projects of the OpenAI scholars, Intel’s photorealism enhancement model for an elevated GTA V experience, a game theory reformulation of PCA, and more
This week in deep learning, we bring you a free alternative to GPT-3, a discrete choice and neural network model that enhances travel behavior research, the final projects of the OpenAI scholars and a game theory reformulation of PCA.
You may also enjoy Intel's photorealism enhancement model that makes GTA V look more realistic, USPS's adoption of edge AI and Triton for item tracking, a paper on diffusion models with classifier guidance, a paper on an efficient anchor-free object detector, and more!
As always, happy reading and hacking. If you have something you think should be in next week’s issue, find us on Twitter: @dl_weekly.
Until next week!
EleutherAI released two GPT-style language models, GPT-Neo 1.3B and GPT-Neo 2.7B, trained on an 825 GB dataset using Google’s TPU Research Cloud. GPT-Neo 2.7B outperforms GPT-3 Ada, its closest competitor in terms of parameter size.
Researchers from the Singapore-MIT Alliance for Research and Technology have created a framework known as TB-ResNet, which combines discrete choice models and deep neural networks to improve travel behavior research.
Intel researchers Stephan R. Richter, Hassan Abu Alhaija, and Vladlen Koltun created a photorealism enhancement model that uses the Cityscapes dataset to make GTA V look realistic at interactive rates.
MIT researchers have created PClean, an automatic, domain-specific system for cleaning data tables that builds on Bayesian probability and recent progress in probabilistic programming.
IBM unveiled AI-powered enterprise products such as Mono2Micro, for streamlining cloud app migration, and Watson Orchestrate, which automates work in business tools from Salesforce, SAP, and Workday.
The European Commission recently published a proposal for regulations and risk management to govern artificial intelligence use in the European Union.
Mobile & Edge
A comprehensive tutorial for creating a TinyML application for the Arduino Nano 33 BLE Sense that recognizes different boxing punches in real time using gyroscope and accelerometer sensor data.
A USPS architect, along with half a dozen NVIDIA architects, designed the Edge Computing Infrastructure Program (ECIP), a distributed edge AI system meant for large-scale image analysis and other deep learning tasks.
A fully centralized, low-power industrial control unit that enables a wide range of predictive maintenance and AI use cases.
A short article detailing a pneumonia detection solution using Edge Impulse Studio and balenaCloud on a Raspberry Pi.
DeepMind presents a reformulation of Principal Component Analysis, an eigenvalue problem, as a competitive multi-agent game called EigenGame.
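To make the game framing concrete, here is a minimal pure-Python sketch of the EigenGame idea, based on the player utility as we understand it from DeepMind's paper. Player i is rewarded for capturing variance along its unit vector v_i and penalized for aligning with any earlier player j < i; each step takes that gradient, projects it onto the unit sphere, and renormalizes. Assumptions of this sketch: a small symmetric matrix M standing in for the data covariance X^T X, players updated one after another with a fixed step size, and helper names (eigengame, matvec, dot, normalize) that are ours, not DeepMind's released code.

```python
import math
import random


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[c] * v[c] for c in range(len(v))) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def eigengame(M, k, steps=2000, lr=0.05, seed=0):
    """Approximate the top-k eigenvectors of a symmetric PSD matrix M.

    Player i maximizes
        u_i = <v_i, M v_i> - sum_{j<i} <v_i, M v_j>^2 / <v_j, M v_j>
    by Riemannian gradient ascent on the unit sphere. Updating players
    sequentially within each sweep is a simplifying assumption here.
    """
    rng = random.Random(seed)
    d = len(M)
    V = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(k)]
    for _ in range(steps):
        for i in range(k):
            vi = V[i]
            # Reward gradient 2 M v_i: capture variance along v_i.
            grad = [2.0 * x for x in matvec(M, vi)]
            # Penalty gradient from each parent j < i: discourage alignment.
            for j in range(i):
                Mvj = matvec(M, V[j])
                coef = 2.0 * dot(vi, Mvj) / dot(V[j], Mvj)
                grad = [g - coef * m for g, m in zip(grad, Mvj)]
            # Project onto the sphere's tangent space at v_i, step, renormalize.
            radial = dot(grad, vi)
            grad = [g - radial * x for g, x in zip(grad, vi)]
            V[i] = normalize([x + lr * g for x, g in zip(vi, grad)])
    return V
```

For example, `eigengame([[2.0, 1.0], [1.0, 2.0]], 2)` should end up close to ±(1, 1)/√2 for the first player and ±(1, −1)/√2 for the second, the matrix's eigenvectors in order of decreasing eigenvalue. Each player only needs its parents' vectors, which is what makes the formulation amenable to parallel, decentralized updates.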
A technical proposal for Reflective Human-Centered Explainable AI (HCXAI), a sociotechnically informed mindset grounded in critical AI studies and HCI.
A comprehensive blog post on a new dual-encoder architecture trained with a contrastive loss on relatively noisy vision-language datasets such as Conceptual Captions.
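The heart of such a dual encoder is a symmetric contrastive (InfoNCE-style) loss: each image embedding should score highest against its own caption's embedding among all captions in the batch, and each caption should score highest against its own image. Below is a minimal pure-Python sketch under our own assumptions (cosine similarity, a fixed temperature of 0.1, hypothetical function names). It is not the blog post's code; real training uses batched tensors, and some setups learn the temperature instead of fixing it.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def logsumexp(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE over a batch: (image_i, text_i) is the positive pair,
    and every other image/text pairing in the batch acts as a negative."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: softmax cross-entropy over each row, target = diagonal.
    i2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text-to-image: softmax cross-entropy over each column, target = diagonal.
    t2i = sum(logsumexp([logits[j][i] for j in range(n)]) - logits[i][i]
              for i in range(n)) / n
    return (i2t + t2i) / 2
```

With matched pairs the loss approaches zero, while shuffling the captions so that each image's best match is the wrong caption drives it up; noisy pairs in a large batch therefore mostly cost a little extra loss rather than breaking training.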
OpenAI Scholars showcase their final projects exploring topics like AI safety, contrastive learning, generative modeling, scaling laws, auto-encoding multi-objective tasks, test time compute, NLP segmentation strategies, and summarization from human feedback.
Libraries & Code
A data orchestrator for machine learning, analytics, and ETL.
A number of tutorial notebooks for various case studies, exercises, and project files that illustrate parts of the SageMaker ML workflow and deployment.
Papers & Publications
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512.
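For reference, the classifier-guided sampling step described in this abstract can be written as follows, using the paper's notation as we recall it: at each reverse step, the diffusion model's predicted mean μ_θ(x_t) is shifted by the gradient of a noisy-image classifier's log-probability for the target class y, weighted by the model's covariance Σ_θ(x_t) and a guidance scale s.

```latex
x_{t-1} \sim \mathcal{N}\!\left(\mu_\theta(x_t) + s\,\Sigma_\theta(x_t)\,\nabla_{x_t}\log p_\phi(y \mid x_t),\; \Sigma_\theta(x_t)\right)
```

Raising s pushes samples toward inputs the classifier labels confidently as y, which is the diversity-for-fidelity trade-off the abstract refers to.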
We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients. We simulate a real-world scenario where each client only has access to a few noisy recordings from a limited and disjoint number of speakers (hence non-IID). Each client trains their model in isolation using mixture invariant training while periodically providing updates to a central server. Our experiments show that our approach achieves competitive enhancement performance compared to IID training on a single device and that we can further facilitate the convergence speed and the overall performance using transfer learning on the server-side. Moreover, we show that we can effectively combine updates from clients trained locally with supervised and unsupervised losses. We also release a new dataset LibriFSD50K and its creation recipe in order to facilitate FL research for source separation problems.
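Mixture invariant training (MixIT), which each client uses here, needs no clean references: two noisy mixtures are summed, the model separates the sum into M estimated sources, and the loss is computed for the best assignment of each estimated source to one of the two original mixtures. This pure-Python sketch shows only that loss, under our own assumptions: mean squared error instead of the SNR-based loss used in practice, brute-force search over all 2^M assignments, and hypothetical function names. It does not cover FEDENHANCE's federated updates.

```python
import itertools


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mixit_loss(mix1, mix2, est_sources, loss=mse):
    """Best-assignment remixing loss: each estimated source goes to exactly
    one of the two reference mixtures; return the lowest total loss."""
    n = len(mix1)
    best = float("inf")
    # assignment[m] in {0, 1} says which mixture source m is added back into.
    for assignment in itertools.product((0, 1), repeat=len(est_sources)):
        remix = [[0.0] * n, [0.0] * n]
        for src, target in zip(est_sources, assignment):
            for t in range(n):
                remix[target][t] += src[t]
        total = loss(mix1, remix[0]) + loss(mix2, remix[1])
        best = min(best, total)
    return best
```

Because the search is exponential in M, it is only practical for the small number of output sources such models typically use.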
Object detection is a basic but challenging task in computer vision that plays a key role in a variety of industrial applications. However, object detectors based on deep learning usually require more storage and longer inference time, which seriously limits their practicality. A trade-off between effectiveness and efficiency is therefore necessary in practical scenarios. Because they are free of the constraints of predefined anchors, anchor-free detectors can achieve acceptable accuracy and inference speed simultaneously. In this paper, we start from an anchor-free detector called TTFNet, modify its structure, and introduce multiple existing tricks to build effective server and mobile solutions, respectively. Since all experiments in this paper are conducted with PaddlePaddle, we call the model PAFNet (Paddle Anchor Free Network). On the server side, PAFNet achieves a better balance between effectiveness (42.2% mAP) and efficiency (67.15 FPS) on a single V100 GPU. On the mobile side, PAFNet-lite achieves better accuracy (23.9% mAP) at 26.00 ms on a Kirin 990 ARM CPU, outperforming existing state-of-the-art anchor-free detectors by significant margins.
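Anchor-free detectors in the TTFNet family predict a per-class heatmap of object centers instead of matching boxes against predefined anchors. As we recall from the TTFNet paper, the training target is an elliptical Gaussian per object with σ_x = αw/6, σ_y = αh/6 and α = 0.54, so wider or taller boxes get wider or taller peaks; treat those constants, the box format, and the function name as assumptions. This pure-Python sketch builds that target for a single class and is not PAFNet's or PaddlePaddle's code.

```python
import math


def ttf_heatmap(height, width, boxes, alpha=0.54):
    """Heatmap target for one class from boxes (x1, y1, x2, y2) given in
    heatmap (output-stride) coordinates."""
    heat = [[0.0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Integer box center: the peak of this object's Gaussian (value 1.0).
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        # Spread scales with box width and height (guard against zero size).
        sx = max(alpha * (x2 - x1) / 6, 1e-6)
        sy = max(alpha * (y2 - y1) / 6, 1e-6)
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - cx) ** 2) / (2 * sx ** 2)
                             - ((y - cy) ** 2) / (2 * sy ** 2))
                # Overlapping objects: keep the element-wise maximum.
                heat[y][x] = max(heat[y][x], g)
    return heat
```

At inference, local maxima of the predicted heatmap mark object centers, and a separate regression head supplies box extents, so no anchor matching is involved.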