Deep Learning Weekly: Issue #227
MIT's light-field networks for efficient 3D scenes, Amazon SageMaker Canvas for no-code model creation, continuous adaptation for ML systems, & a paper on frequency effects on syntactic rule learning
Austin here, from Comet—the official sponsor of DLW. Before we jump into this week’s content from around the deep learning world, I wanted to quickly share info about an upcoming event our team is putting on that I think might be of interest.
Tomorrow (Thursday, December 9th), we’ll be sitting down with ML leaders from Uber AI, The RealReal, and WorkFusion to explore the challenges, opportunities, and approaches to developing and deploying ML at scale in the enterprise.
The event is free to attend, and will be available on-demand for those who register but are unable to attend.
Now, back to our regularly-scheduled programming!
This week in deep learning, we bring you MIT's light-field networks for efficient 3D scenes, Amazon SageMaker Canvas for no-code model creation, continuous adaptation for ML systems, and a paper on frequency effects on syntactic rule learning in transformers.
You may also enjoy a clever pruning technique, the femtojoule promise of analog AI, quantization using TensorFlow 2, a paper on masked-attention mask transformers for universal image segmentation, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
MIT researchers’ light-field networks can reconstruct a light field and generate a 3D scene from an image about 15,000 times faster than other methods.
MIT researchers present PARP, an efficient technique for pruning speech-recognition models while still boosting accuracy.
Amazon announces the general availability of Amazon SageMaker Canvas, a new visual, no-code capability that allows business analysts to build ML models and generate accurate predictions without writing code or requiring ML expertise.
GauGAN2 combines segmentation mapping, inpainting and text-to-image generation in a single model, making it a powerful tool to create photorealistic art with a mix of words and drawings.
A team of scientists from Google Research, the Alan Turing Institute, and Cambridge University recently unveiled a new state-of-the-art (SOTA) multimodal transformer.
Dream is an accessible app which lets anyone create AI-powered paintings by simply typing a brief description of what they want to see, much like DALL-E and VQGAN+CLIP.
Mobile & Edge
Meta AI researchers have pushed the future of conversational voice assistants forward with two new works that significantly reduce latency and provide a framework for on-device processing.
A technical article highlighting different ways of quantizing Keras models using TensorFlow 2.
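To give a sense of what quantization does, here is a minimal, library-free sketch of affine int8 quantization, the scheme most post-training quantization tools (including TensorFlow's) are built on. This is an illustration of the math, not the TensorFlow API; real toolchains apply it per-tensor or per-channel automatically.

```python
# Schematic of affine (asymmetric) int8 quantization: map floats onto
# 256 integer levels via a scale and zero point, then map back.
# Illustrative only -- not the TensorFlow 2 quantization API.

def quantize(values, num_bits=8):
    """Map floats onto signed int levels with a scale and zero point."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) if hi != lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.5, -0.2, 0.0, 0.7, 1.5]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
# Each recovered value is within one quantization step of the original.
```

The storage win is the point: each float32 weight becomes a single int8 code, plus one shared scale and zero point per tensor.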
Another technical article on model compression—this time with a pre-trained CLIP model using the ONNX Runtime.
A tutorial on how to make a TinyML-based cloud classifier on an Arduino Portenta H7.
Using nonvolatile memory devices and two fundamental physical laws of electrical engineering, simple circuits can implement a version of deep learning's most basic calculations that requires mere thousandths of a trillionth of a joule.
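The two physical laws in question are Ohm's law (each device passes current I = G · V, conductance times voltage) and Kirchhoff's current law (currents merging on a wire simply add), which together make a resistive crossbar column compute a multiply-accumulate for free. A schematic numeric model, with purely illustrative values not taken from the article:

```python
# Schematic model of an analog multiply-accumulate: Ohm's law gives each
# device a current I = G * V, and Kirchhoff's current law sums those
# currents on a shared output wire. Conductances play the role of
# weights, voltages the role of activations. Values are illustrative.

def analog_mac(conductances, voltages):
    """Total current at the output node of one crossbar column."""
    return sum(g * v for g, v in zip(conductances, voltages))

weights = [0.5, -1.0, 2.0]       # encoded as (differential) conductances
activations = [1.0, 0.5, 0.25]   # encoded as input voltages
result = analog_mac(weights, activations)  # 0.5 - 0.5 + 0.5 = 0.5
```

Because the physics performs the sum, the energy cost per multiply-accumulate can drop to femtojoules, which is the article's central claim.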
A blog post that presents a project implementing a workflow that combines batch prediction and model evaluation for continuous evaluation retraining.
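In outline, such a workflow scores each batch of predictions once ground-truth labels arrive and triggers retraining when quality degrades. The sketch below uses hypothetical function names and thresholds to show the control flow, not the blog post's actual implementation:

```python
# Outline of a continuous-evaluation retraining loop: score each batch of
# predictions against labels, and retrain when the metric drops below a
# threshold. Names and the threshold are hypothetical placeholders.

ACCURACY_THRESHOLD = 0.90

def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def continuous_evaluation(batches, retrain):
    """batches yields (predictions, labels); retrain() refits the model."""
    retrain_count = 0
    for predictions, labels in batches:
        if accuracy(predictions, labels) < ACCURACY_THRESHOLD:
            retrain()
            retrain_count += 1
    return retrain_count

# Example: the second batch falls below threshold and triggers retraining.
batches = [([1, 1, 0, 0], [1, 1, 0, 0]),   # accuracy 1.0
           ([1, 0, 0, 1], [1, 1, 1, 1])]   # accuracy 0.5
retrains = continuous_evaluation(batches, retrain=lambda: None)
```

In production the same loop would be scheduled as a pipeline, with the evaluation step gating whether the retraining step runs at all.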
A technical post that shows how NVIDIA TAO and NVIDIA DeepStream are used to fast-track AI application development by taking an action recognition model, fine-tuning it with custom data, and deploying it for inference.
A technical walkthrough on using the recently released Keras preprocessing layers to build a simple sentiment classification model with the IMDB movie review dataset.
Libraries & Code
A standalone Python library designed to enable federated learning amongst different parties, using locally held, protected data for client-side training.
A neural network based time-series model, inspired by Facebook Prophet and AR-Net, built on PyTorch.
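The AR-Net side of that design boils down to predicting the next value as a learned weighted sum of recent observations. A toy AR(1) sketch in plain Python shows the idea; the library itself learns these coefficients (plus trend and seasonality components) with PyTorch, and this is not its API:

```python
# Minimal sketch of the autoregressive (AR) idea behind AR-Net-style
# models: predict x[t+1] as w * x[t], with w fit by least squares.
# Toy data and a single lag; illustrative only.

def fit_ar1(series):
    """Least-squares coefficient for x[t+1] ~= w * x[t]."""
    num = sum(series[t] * series[t + 1] for t in range(len(series) - 1))
    den = sum(x * x for x in series[:-1])
    return num / den

series = [1.0, 0.5, 0.25, 0.125, 0.0625]  # each value is half the last
w = fit_ar1(series)          # recovers w = 0.5 exactly on this series
forecast = w * series[-1]    # next-step prediction: 0.03125
```

Replacing the single coefficient with a small neural network over p lags is what turns this classical AR model into AR-Net.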
Paddle Lite is an updated version of Paddle-Mobile, an open-source deep learning framework designed to make it easy to perform inference on mobile, embedded, and IoT devices.
Papers & Publications
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field.
Pre-trained language models perform well on a variety of linguistic tasks that require symbolic reasoning, raising the question of whether such models implicitly represent abstract symbols and rules. We investigate this question using the case study of BERT's performance on English subject-verb agreement. Unlike prior work, we train multiple instances of BERT from scratch, allowing us to perform a series of controlled interventions at pre-training time. We show that BERT often generalizes well to subject-verb pairs that never occurred in training, suggesting a degree of rule-governed behavior. We also find, however, that performance is heavily influenced by word frequency, with experiments showing that both the absolute frequency of a verb form, as well as the frequency relative to the alternate inflection, are causally implicated in the predictions BERT makes at inference time. Closer analysis of these frequency effects reveals that BERT's behavior is consistent with a system that correctly applies the SVA rule in general but struggles to overcome strong training priors and to estimate agreement features (singular vs. plural) on infrequent lexical items.
Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).