Deep Learning Weekly: Issue #236
DeepMind’s AlphaCode, deploying machine learning models on AWS Lambda, a fast DataFrames library implemented with Rust, interleaved transformers for volumetric segmentation, and more
This week in deep learning, we bring you reconfigurable AI hardware, a technical guide on deploying machine learning models on AWS Lambda, a fast DataFrames library implemented with Rust using Apache Arrow Columnar Format, and a paper on interleaved transformers for volumetric segmentation.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
A new study finds that an adaptable new device can transform into all of the key electrical components needed for artificial intelligence hardware, with potential uses in robotics and autonomous systems.
A team of researchers developed a new technology called biomarker-optimized CNNs, or BO-CNNs, to identify dead cells much faster than human experts can.
Apple Inc. has reportedly acquired AI Music, a startup that uses artificial intelligence to create tailor-made music.
To get developers started with some quick examples in a cloud GPU-accelerated environment, NVIDIA Deep Learning Institute (DLI) is offering three fast, free, self-paced courses.
As part of the xView3 challenge, members across AI2 came together to harness satellite-based synthetic aperture radar (SAR) data and combine it with AI to identify vessels suspected of engaging in illegal, unreported, and unregulated (IUU) fishing.
An article discussing Vertex AI and its alternatives, along with how these are used as metadata stores.
A technical guide on how to deploy a machine learning model as a lambda function, the serverless offering by AWS.
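The core pattern the guide covers can be sketched in a few lines. This is a minimal, illustrative handler, not the guide's actual code: `load_model` is a hypothetical stand-in for deserializing a real model (e.g. from S3 or the deployment package), and the toy model simply sums its inputs.

```python
import json

def load_model():
    # Hypothetical stand-in for loading a serialized ML model.
    # A real handler would deserialize a model artifact here.
    return lambda features: sum(features)  # toy "model": sums its inputs

# Loaded once at cold start; warm invocations reuse it, which is why
# model loading belongs at module scope rather than inside the handler.
MODEL = load_model()

def lambda_handler(event, context):
    """Entry point AWS Lambda invokes; `event` carries the request payload."""
    features = json.loads(event["body"])["features"]
    prediction = MODEL(features)
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

When fronted by API Gateway, the request body arrives as a JSON string under `event["body"]`, which is why the handler parses it before reading the features.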
Alex Ratner’s presentation on the principles of data-centric artificial intelligence and where it is headed.
A blog highlighting different MLOps tools, from proprietary to open-source, and their accompanying features.
With the addition of transcriptions to the dataset of videos of unscripted conversations, Meta AI is expanding the Casual Conversations dataset's utility to a new domain: automatic speech recognition.
As part of its mission to solve intelligence, DeepMind created a system called AlphaCode that writes computer programs at a competitive level.
Huy Mai automated the pipe-inspection process by designing TinySewer, a low-power, vision-based device that takes some of the load off of inspectors.
A post explaining how to use the specifics of the Connectionist Temporal Classification (CTC) architecture to achieve high-quality automatic speech recognition.
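The defining trick of CTC decoding is easy to show concretely: given the best-path label sequence, merge consecutive repeats, then drop the blank token. A minimal pure-Python sketch (not from the post; label IDs are arbitrary, with 0 as the blank):

```python
def ctc_greedy_decode(path, blank=0):
    """Collapse a CTC best-path sequence: merge consecutive repeated
    labels, then remove blank tokens. Blanks between identical labels
    are what let CTC emit true repeats (e.g. the double 'l' in 'hello')."""
    decoded = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Path [1, 1, 0, 2, 2, 0, 2]: the repeats of 1 and 2 merge, blanks (0)
# are dropped, and the blank between the 2s preserves the second 2.
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2]))  # → [1, 2, 2]
```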
An article that discusses in detail the popular works and applications of temporal knowledge graphs.
Libraries & Code
A unified machine learning framework, enabling framework-agnostic functions, layers, and libraries.
Polars is a blazingly fast DataFrames library implemented in Rust, using the Apache Arrow columnar format as its memory model.
BlueFog is a high-performance distributed training framework built with decentralized optimization algorithms.
Papers & Publications
Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy.
We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better for DP. Combined, the methods we discuss let us train a ResNet-18 with differential privacy to 47.9% accuracy and privacy parameters ϵ=10, δ=10⁻⁶, a significant improvement over "naive" DP-SGD training of ImageNet models but a far cry from the 75% accuracy that can be obtained by the same network without privacy.
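The DP-SGD recipe the abstract refers to has two core mechanics: clip each per-example gradient to a fixed L2 norm, then add Gaussian noise calibrated to that clipping bound before applying the averaged update. A pure-Python sketch of one step, with gradients as plain lists and illustrative default hyperparameters (not the paper's values or code):

```python
import math
import random

def clip_gradient(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(per_example_grads, max_norm=1.0, noise_multiplier=1.1,
                rng=random.Random(0)):
    """Average clipped per-example gradients, then add Gaussian noise
    whose scale is proportional to noise_multiplier * max_norm."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm / n
    return [a + rng.gauss(0.0, sigma) for a in avg]
```

The clipping bounds each example's influence on the update, and the noise hides the remainder; both add overhead and hurt accuracy, which is the accuracy/privacy trade-off the paper explores at ImageNet scale.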
Transformer, the model of choice for natural language processing, has drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising to help atypical convolutional neural networks overcome their inherent shortcomings of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as assisted modules to help encode global context into convolutional representations. To address this issue, we introduce nnFormer, a 3D transformer for volumetric medical image segmentation. nnFormer not only exploits the combination of interleaved convolution and self-attention operations, but also introduces local and global volume-based self-attention mechanisms to learn volume representations. Moreover, nnFormer proposes to use skip attention to replace the traditional concatenation/summation operations in skip connections in U-Net-like architectures. Experiments show that nnFormer outperforms previous transformer-based counterparts by large margins on three public datasets. Compared to nnUNet, nnFormer produces significantly lower HD95 and comparable DSC results. Furthermore, we show that nnFormer and nnUNet are highly complementary to each other in model ensembling.
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Different from single image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this by exploiting a sliding window strategy or a recurrent architecture, which is either restricted to frame-by-frame restoration or lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping further fuses information from neighboring frames via parallel feature warping. Experimental results on three tasks, including video super-resolution, video deblurring and video denoising, demonstrate that VRT outperforms the state-of-the-art methods by large margins (up to 2.16dB) on nine benchmark datasets.
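The cross-clip shifting idea in TMSA is worth seeing concretely: attention runs within fixed-size clips, and rotating the frame sequence by half a clip on alternate layers makes the new clips straddle the previous layer's boundaries. A pure-Python illustration of the partitioning (frames stand in for feature maps; this is a schematic of the idea, not the paper's implementation):

```python
def partition_into_clips(frames, clip_size, layer_index):
    """Split a frame sequence into fixed-size clips. On odd layers the
    sequence is rotated by half a clip so that clip boundaries fall in
    the middle of the previous layer's clips, letting information flow
    across clips over successive layers."""
    shift = (clip_size // 2) if layer_index % 2 else 0
    rotated = frames[shift:] + frames[:shift]
    return [rotated[i:i + clip_size] for i in range(0, len(rotated), clip_size)]

frames = list(range(8))
print(partition_into_clips(frames, 2, 0))  # → [[0, 1], [2, 3], [4, 5], [6, 7]]
print(partition_into_clips(frames, 2, 1))  # → [[1, 2], [3, 4], [5, 6], [7, 0]]
```

Frames 1 and 2 never share a clip at layer 0 but do at layer 1, which is how stacking alternating layers builds long-range temporal connectivity.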