Deep Learning Weekly: Issue #231
Adaptive Resonance Theory, MLOps pipelines with Microsoft Azure, JoJoGAN Quickstart Colab Notebook, a paper on dataset distillation with infinitely wide convolutional networks, and more!
Welcome to 2022’s first issue of Deep Learning Weekly! We hope you enjoyed our 2021 roundup, and we’re really excited to jump into the new year with you.
One quick note before we get to this week’s roundup—we’re making a small change to the newsletter that we believe is more responsive to the direction the industry is moving in.
Specifically, as ML teams increasingly work to more effectively and reliably deploy models into production, we’re seeing a surge of content and educational material centered on “MLOps”—i.e. processes, tools, workflows, team structures, and more that are centered on productizing and operationalizing ML. With that in mind, we’re replacing the “Mobile & Edge” section of DLW with an “MLOps” section.
As always, we’d love to hear your feedback, whether it’s about this change or any general thoughts and content ideas.
With that, let’s jump right in!
This week in deep learning, we bring you Adaptive Resonance Theory, MLOps pipelines with Microsoft Azure, a JoJoGAN Quickstart Colab Notebook, and a paper on dataset distillation with infinitely wide convolutional networks.
You may also enjoy a winning TinyML-based wildfire detection solution, an article on ML model registries, active learning using AutoNLP and Prodigy, a paper on vision transformers for small-size datasets, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
IEEE Fellow Stephen Grossberg argues for an entirely different alternative to deep learning: Adaptive Resonance Theory.
Pratyush Mallick, the brains behind the winning entry for the Arm DevSummit Developer Competition, explains how he created his wildfire detection solution.
To explore how both our ears and our environment influence pitch perception, McDermott, Saddler, and Research Assistant Ray Gonzalez built a deep neural network and combined it with an existing model of the mammalian ear.
RPA provider Automation Anywhere Inc. announced that it’s acquiring FortressIQ Inc., an AI process discovery platform.
Edge Impulse looks back at some embedded ML projects created by their 30K+ developers.
A technical blog explaining how to build an end-to-end CI/CD pipeline for your ML workflows by leveraging Microsoft’s Azure Machine Learning platform.
An article demonstrating how you can set up and run pipelines using Comet with GitLab’s DevOps platform to streamline your ML workflow.
A comprehensive blog explaining what a model registry is, where it fits in the MLOps stack, its key functionalities, and how to set one up.
Neoway developed an in-house Feature Store to make data scientists’ work easier during model development, either to discover datasets available for generating features or to create new feature datasets to be used on the models.
A post outlining the solutions for (1) online prediction and (2) continual learning, with step-by-step use cases, considerations, and technologies required for each level.
A comprehensive tutorial on how to use AutoNLP and Prodigy to build an active learning pipeline.
A technical blog post reviewing the new multi-weight support prototype of TorchVision, showcasing its features and highlighting key differences with the existing one.
A quickstart Colab notebook on how to use JoJoGAN.
In interviews with AI experts, IEEE Spectrum has uncovered six real-world AI worst-case scenarios that are far more mundane than those depicted in the movies.
An interactive demo of a few music transcription models created by Google's Magenta team.
Libraries & Code
Ecco is a Python library for exploring and explaining Natural Language Processing models using interactive visualizations.
SynapseML (previously MMLSpark) is an open-source library that simplifies the creation of scalable machine learning pipelines.
Papers & Publications
The effectiveness of machine learning algorithms arises from being able to extract useful features from large amounts of data. As model and dataset sizes increase, dataset distillation methods that compress large datasets into significantly smaller yet highly performant ones will become valuable in terms of training efficiency and useful feature extraction. To that end, we apply a novel distributed kernel-based meta-learning framework to achieve state-of-the-art results for dataset distillation using infinitely wide convolutional neural networks. For instance, using only 10 datapoints (0.02% of original dataset), we obtain over 64% test accuracy on CIFAR-10 image classification task, a dramatic improvement over the previous best test accuracy of 40%. Our state-of-the-art results extend across many other settings for MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN. Furthermore, we perform some preliminary analyses of our distilled datasets to shed light on how they differ from naturally occurring data.
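For readers who want a feel for what a kernel-based distillation objective can look like, here is a minimal sketch. It is not the paper's implementation. It assumes one common formulation: fit kernel ridge regression on a tiny "support" (distilled) set and score it by its error on the full "target" set. The RBF kernel is a stand-in for an infinitely wide convolutional network kernel, and all names (`rbf_kernel`, `solve`, `distillation_loss`) and the toy 1-D data are hypothetical. A real method would optimize the support set to lower this loss; the sketch only evaluates it for two hand-picked candidates.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel between two points (assumed stand-in for an infinite-width network kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def distillation_loss(support_x, support_y, target_x, target_y, ridge=1e-6):
    """Mean squared error on the target set of kernel ridge regression fit on the support set."""
    K_ss = [[rbf_kernel(a, b) + (ridge if i == j else 0.0)
             for j, b in enumerate(support_x)]
            for i, a in enumerate(support_x)]
    alpha = solve(K_ss, support_y)
    loss = 0.0
    for x, y in zip(target_x, target_y):
        pred = sum(rbf_kernel(x, s) * a for s, a in zip(support_x, alpha))
        loss += (pred - y) ** 2
    return loss / len(target_x)

# Toy 1-D "dataset": negative inputs are labeled -1, positive inputs +1.
target_x = [(-2.0,), (-1.5,), (-1.0,), (1.0,), (1.5,), (2.0,)]
target_y = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

# Two candidate 2-point distilled sets: one well placed, one poorly placed.
good_loss = distillation_loss([(-1.5,), (1.5,)], [-1.0, 1.0], target_x, target_y)
bad_loss = distillation_loss([(-0.1,), (0.1,)], [-1.0, 1.0], target_x, target_y)

# The abstract's "0.02%" matches 10 images out of CIFAR-10's 50,000 training images.
distilled_percent = 10 / 50_000 * 100
```

The well-placed support set gives a much lower loss than the poorly placed one, which is the signal a distillation procedure would follow when adjusting the distilled points.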
Recently, the Vision Transformer (ViT), which applied the transformer structure to the image classification task, has outperformed convolutional neural networks. However, the high performance of the ViT results from pre-training using a large-size dataset such as JFT-300M, and its dependence on a large dataset is interpreted as due to low locality inductive bias. This paper proposes Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA), which effectively solve the lack of locality inductive bias and enable it to learn from scratch even on small-size datasets. Moreover, SPT and LSA are generic and effective add-on modules that are easily applicable to various ViTs. Experimental results show that when both SPT and LSA were applied to the ViTs, the performance improved by an average of 2.96% in Tiny-ImageNet, which is a representative small-size dataset. Especially, Swin Transformer achieved an overwhelming performance improvement of 4.08% thanks to the proposed SPT and LSA.
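The abstract does not spell out how LSA works, so the sketch below rests on our reading of the paper rather than on the text above: we assume LSA combines diagonal masking (a token does not attend to itself) with a learnable softmax temperature. The function name and the toy inputs are hypothetical, and the temperature is passed in as a plain number instead of being learned.

```python
import math

def locality_self_attention_weights(queries, keys, temperature):
    """Attention weights with the diagonal masked out and scores divided by a temperature.

    Assumes at least two tokens, so every row has one or more unmasked entries.
    """
    n = len(queries)
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            if i == j:
                # Diagonal masking: a token is not allowed to attend to itself.
                scores.append(float("-inf"))
            else:
                dot = sum(q * k for q, k in zip(queries[i], keys[j]))
                scores.append(dot / temperature)
        # Numerically stable softmax over the row; masked entries become 0.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
soft = locality_self_attention_weights(tokens, tokens, temperature=1.0)
sharp = locality_self_attention_weights(tokens, tokens, temperature=0.5)
```

Lowering the temperature sharpens each row's distribution over the other tokens, which is one way an attention layer can be pushed toward a few strongly related patches.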
Recently, Transformers have shown promising performance in various vision tasks. To reduce the quadratic computation complexity caused by the global self-attention, various methods constrain the range of attention within a local region to improve its efficiency. Consequently, their receptive fields in a single attention layer are not large enough, resulting in insufficient context modeling. To address this issue, we propose a Pale-Shaped self-Attention (PS-Attention), which performs self-attention within a pale-shaped region. Compared to the global self-attention, PS-Attention can reduce the computation and memory costs significantly. Meanwhile, it can capture richer contextual information under the similar computation complexity with previous local self-attention mechanisms. Based on the PS-Attention, we develop a general Vision Transformer backbone with a hierarchical architecture, named Pale Transformer, which achieves 83.4%, 84.3%, and 84.9% Top-1 accuracy with the model size of 22M, 48M, and 85M respectively for 224 ImageNet-1K classification, outperforming the previous Vision Transformer backbones. For downstream tasks, our Pale Transformer backbone performs better than the recent state-of-the-art CSWin Transformer by a large margin on ADE20K semantic segmentation and COCO object detection & instance segmentation.
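To make the "quadratic computation complexity" point concrete, the sketch below counts query-key pairs for global attention versus a generic non-overlapping window scheme. It does not model the pale-shaped region itself. The 56x56 token grid and the 7x7 window are hypothetical values chosen for illustration, and the function names are our own.

```python
def global_attention_pairs(height, width):
    """Every token attends to every token: N * N pairs for N = height * width."""
    n = height * width
    return n * n

def windowed_attention_pairs(height, width, window):
    """Each token attends only to the window * window tokens in its own window."""
    if height % window or width % window:
        raise ValueError("grid must divide evenly into windows")
    n = height * width
    return n * window * window

# Hypothetical setting: a 56x56 grid of tokens with 7x7 local windows.
global_pairs = global_attention_pairs(56, 56)        # 3136 * 3136
local_pairs = windowed_attention_pairs(56, 56, 7)    # 3136 * 49
savings = global_pairs // local_pairs                # 64
```

In this setting the local scheme needs 64 times fewer query-key pairs, but each token only sees 49 neighbors in a single layer. That limited receptive field is the trade-off the abstract says PS-Attention aims to improve.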