Deep Learning Weekly: Issue #236
DeepMind’s AlphaCode, deploying machine learning models on AWS Lambda, a fast DataFrames library implemented with Rust, interleaved transformers for volumetric segmentation, and more
This week in deep learning, we bring you reconfigurable AI hardware, a technical guide on deploying machine learning models on AWS Lambda, a fast DataFrames library implemented with Rust using Apache Arrow Columnar Format, and a paper on interleaved transformers for volumetric segmentation.
You may also enjoy biomarker-optimized CNNs for detecting dead cells, competitive programming with AlphaCode, temporal knowledge graphs, a paper on video restoration transformers, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Reconfigurable AI Device Shows Brainlike Promise
An adaptable new device can transform into all the key electric components needed for artificial intelligence hardware, for potential use in robotics and autonomous systems, a new study finds.
AI can now spot dead cells 100 times faster than people
A team of researchers developed a new technology called biomarker-optimized CNNs, or BO-CNNs, to identify cells that have died at a much faster rate compared to humans.
Apple reportedly acquires UK startup AI Music
Apple Inc. has reportedly acquired AI Music, a startup that uses artificial intelligence to create tailor-made music.
Get Started on NLP and Conversational AI with NVIDIA DLI Courses
To get developers started with some quick examples in a cloud GPU-accelerated environment, NVIDIA Deep Learning Institute (DLI) is offering three fast, free, self-paced courses.
This AI is set to help stop illegal fishing
As part of the xView3 challenge, members from across AI2 came together to combine satellite-based synthetic aperture radar (SAR) data with AI to identify vessels suspected of engaging in illegal, unreported, and unregulated (IUU) fishing.
The Best Vertex ML Metadata Alternatives
An article discussing Vertex AI and its alternatives, along with how these are used as metadata stores.
Serverless Deployment of Machine Learning Models on AWS Lambda
A technical guide on deploying a machine learning model as a Lambda function, the serverless compute offering from AWS.
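The core pattern in such deployments is a handler that loads the model once per container (at cold start) and reuses it across invocations. Below is a minimal sketch of that pattern; the stub model and request shape are illustrative assumptions, not the guide's actual code — a real function would load a serialized model artifact from the deployment package or S3.

```python
import json

# Hypothetical stand-in for a real model artifact; loaded lazily so the
# expensive load happens once per Lambda container, not per request.
_model = None

def _load_model():
    """Load the model on first use (cold start), then reuse it."""
    global _model
    if _model is None:
        # In a real function: _model = joblib.load("model.joblib")
        _model = lambda features: sum(features) / len(features)
    return _model

def handler(event, context=None):
    """Lambda entry point: parse the JSON body, predict, return JSON."""
    model = _load_model()
    features = json.loads(event["body"])["features"]
    prediction = model(features)
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Keeping the model in a module-level variable is what makes warm invocations fast: only the first request in a container pays the load cost.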
The Principles of Data-Centric AI Development
Alex Ratner’s presentation on the principles of data-centric artificial intelligence and where it is headed.
What is MLOps and Various MLOps Tools (Part 2)
A blog highlighting different MLOps tools, from proprietary to open-source, and their accompanying features.
A new resource to help measure fairness in AI speech recognition systems
With the addition of transcriptions to the data set containing videos of unscripted conversations, Meta AI is expanding the Casual Conversations dataset's utility to a new domain: automatic speech recognition.
Competitive programming with AlphaCode
As part of DeepMind’s mission to solve intelligence, they created a system called AlphaCode that writes computer programs at a competitive level.
TinyML Is Going Down the Drain: TinySewer Is a Low-Power Sewer Faults Detection System
Huy Mai automated the pipe inspection process and designed a low-power, vision-based device called TinySewer to take some of the load off of inspectors.
Automatic Speech Recognition work on large files with Wav2Vec2 in HuggingFace Transformers
A post explaining how to exploit the specifics of the Connectionist Temporal Classification (CTC) architecture to achieve high-quality automatic speech recognition on long audio files.
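The key idea behind long-file ASR with CTC is that audio can be split into overlapping chunks and the per-frame outputs merged afterwards, dropping the overlapping "stride" at each edge so every frame is predicted with enough left and right context. The sketch below shows only that chunk-and-merge bookkeeping on a plain list of samples (no model involved); the function names are my own, not the post's.

```python
def chunk_with_stride(samples, chunk_len, stride):
    """Split a long sequence into overlapping chunks.

    Each interior chunk overlaps its neighbors by `stride` samples per
    side; those strided edges are context only and are trimmed on merge.
    Returns (chunk, left_trim, right_trim) tuples.
    """
    step = chunk_len - 2 * stride
    chunks = []
    start = 0
    while start < len(samples):
        chunk = samples[start:start + chunk_len]
        left = stride if start > 0 else 0
        right = stride if start + chunk_len < len(samples) else 0
        chunks.append((chunk, left, right))
        if start + chunk_len >= len(samples):
            break
        start += step
    return chunks

def merge_chunks(chunks):
    """Concatenate chunks after trimming the strided edges, so the
    merged output covers every position exactly once."""
    merged = []
    for chunk, left, right in chunks:
        end = len(chunk) - right if right else len(chunk)
        merged.extend(chunk[left:end])
    return merged
```

In the real pipeline the same trimming is applied to CTC logits rather than raw samples, which works because CTC predicts frames independently.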
All you need to know about temporal knowledge graphs
An article that discusses in detail the popular works and applications of temporal knowledge graphs.
Libraries & Code
unifyai/ivy: The Unified Machine Learning Framework
The unified machine learning framework, enabling framework-agnostic functions, layers, and libraries
pola-rs/polars: Fast multi-threaded DataFrame library in Rust | Python | Node.js
Polars is a blazingly fast DataFrames library implemented in Rust, using the Apache Arrow Columnar Format as its memory model.
Bluefog-Lib/bluefog: Distributed and decentralized training framework for PyTorch over graph
BlueFog is a high-performance distributed training framework built with decentralized optimization algorithms.
Papers & Publications
Toward Training at ImageNet Scale with Differential Privacy
Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy.
We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better for DP. Combined, the methods we discuss let us train a ResNet-18 with differential privacy to 47.9% accuracy and privacy parameters ε=10, δ=10⁻⁶, a significant improvement over "naive" DP-SGD training of ImageNet models but a far cry from the 75% accuracy that can be obtained by the same network without privacy.
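For readers unfamiliar with the DP-SGD training the abstract refers to, its core update is simple: clip each example's gradient to a fixed norm, sum, add calibrated Gaussian noise, and step. Here is a minimal NumPy sketch of one such update; the function and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on flat parameters.

    Each row of `per_example_grads` is one example's gradient. Rows are
    clipped to L2 norm `clip_norm`, summed, perturbed with Gaussian noise
    of std `noise_multiplier * clip_norm`, averaged, then applied.
    """
    n = per_example_grads.shape[0]
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale                     # per-example clipping
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / n          # noisy aggregate
    return params - lr * noisy_mean
```

The per-example clipping is what makes this so much slower than ordinary SGD at ImageNet scale, which is the cost the paper's methods aim to reduce.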
nnFormer: Interleaved Transformer for Volumetric Segmentation
Transformer, the model of choice for natural language processing, has drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising to help convolutional neural networks overcome their inherent shortcomings of spatial inductive bias. However, most of the recently proposed transformer-based segmentation approaches simply treat transformers as assisted modules to help encode global context into convolutional representations. To address this issue, we introduce nnFormer, a 3D transformer for volumetric medical image segmentation. nnFormer not only exploits the combination of interleaved convolution and self-attention operations, but also introduces local and global volume-based self-attention mechanisms to learn volume representations. Moreover, nnFormer proposes to use skip attention to replace the traditional concatenation/summation operations in the skip connections of U-Net-like architectures. Experiments show that nnFormer outperforms previous transformer-based counterparts by large margins on three public datasets. Compared to nnUNet, nnFormer produces significantly lower HD95 and comparable DSC results. Furthermore, we show that nnFormer and nnUNet are highly complementary to each other in model ensembling.
VRT: A Video Restoration Transformer
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Different from single-image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this by exploiting a sliding-window strategy or a recurrent architecture, which either is restricted to frame-by-frame restoration or lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self-attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment, and feature fusion, while self-attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping is used to further fuse information from neighboring frames by parallel feature warping. Experimental results on three tasks, including video super-resolution, video deblurring, and video denoising, demonstrate that VRT outperforms the state-of-the-art methods by large margins (up to 2.16 dB) on nine benchmark datasets.