Deep Learning Weekly: Issue #251
Meta's new advances in speech recognition, Graph-based Neural Structured Learning in TFX, IBM's analog hardware acceleration kit, a paper on lossless acceleration for seq2seq generation and more.
This week in deep learning, we bring you Meta's new advances in speech recognition, Graph-based Neural Structured Learning in TFX, IBM's analog hardware acceleration kit, and a paper on lossless acceleration for seq2seq generation with aggressive decoding.
You may also enjoy Accelerated PyTorch training on Mac, Netflix's offline ML fact store, efficient table pre-training with TAPEX, a paper on unified keyframe propagation models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
New advances in speech recognition to power AR experiences and more
In this blog post, Meta highlights new speech recognition research, including some of the papers to be presented at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) this month.
Introducing Accelerated PyTorch Training on Mac
In collaboration with the Metal engineering team at Apple, the PyTorch team announced support for GPU-accelerated PyTorch training on Mac.
AI on the Ball: Startup Shoots Computer Vision to the Soccer Pitch
The CEO of Track160 is democratizing sports analytics, bringing computer vision to underserved amateur clubs and community teams.
With $1.5M in seed funding, Humanitas aims to help enterprises handle algorithmic bias
Humanitas Technologies, an AI startup building a data pool designed to fight algorithmic bias for diversity and inclusion, announced that it has raised $1.5 million in seed funding.
Building MLOps Pipeline for NLP: Machine Translation Task
A technical article that discusses in detail how to build an MLOps pipeline for machine translation using TensorFlow, Neptune.ai, GitHub Actions, Docker, Kubernetes, and Cloud Build.
Getting Started with Comet REST API
An article providing an introduction and a practical example for Comet’s REST API.
Netflix's Offline ML Fact Store
A post that focuses on Netflix's large volume of high-quality data stored in Axion — their fact store, which is leveraged to compute ML features offline.
Graph-based Neural Structured Learning in TFX
A tutorial that describes graph regularization from the Neural Structured Learning framework and demonstrates an end-to-end workflow for sentiment classification in a TFX pipeline.
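At its core, graph regularization augments the ordinary supervised loss with a penalty that pulls the embeddings of graph-neighboring examples together. A minimal pure-Python sketch of that objective (the function name, squared-distance choice, and weighting are illustrative, not the Neural Structured Learning API):

```python
def graph_regularized_loss(task_losses, embeddings, edges, alpha=0.1):
    """Toy graph-regularized objective.

    task_losses: per-example supervised losses
    embeddings:  dict mapping example id -> embedding vector
    edges:       (node_i, node_j, weight) triples from the similarity graph
    alpha:       strength of the neighbor-similarity penalty
    """
    # supervised part: mean per-example loss
    supervised = sum(task_losses) / len(task_losses)

    def sq_dist(u, v):
        # squared Euclidean distance between two embeddings
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # neighbor part: penalize embedding distance across graph edges
    neighbor = sum(w * sq_dist(embeddings[i], embeddings[j]) for i, j, w in edges)
    return supervised + alpha * neighbor
```

During training, minimizing this combined loss nudges connected examples toward similar representations, which is what lets unlabeled neighbors regularize the classifier.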
Step-by-Step Guide to Building a Machine Learning Application with RAPIDS
An article that walks through each step to build an ML service, using a toolkit for end-to-end data science and analytics pipelines, entirely on the GPU.
Efficient Table Pre-training without Real Data: An Introduction to TAPEX
Hugging Face introduces TAPEX, a table pre-training approach whose corpus is automatically synthesized by sampling SQL queries and their execution results.
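The key trick is that the pre-training corpus needs no real data: pair sampled SQL queries with the answers a SQL engine returns for them. A toy sketch of that idea using Python's built-in sqlite3 (the table schema, query templates, and function name are made up for illustration):

```python
import random
import sqlite3

def synthesize_corpus(rows, seed=0):
    """Sample templated SQL queries over a toy table and record
    (query, execution result) pairs, TAPEX-style."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (city TEXT, population INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

    # hypothetical query templates; a real pipeline samples far richer SQL
    templates = [
        "SELECT city FROM t WHERE population > {n}",
        "SELECT COUNT(*) FROM t WHERE population > {n}",
        "SELECT MAX(population) FROM t",
    ]
    rng = random.Random(seed)
    corpus = []
    for tpl in templates:
        sql = tpl.format(n=rng.choice([100, 500, 1000]))
        result = conn.execute(sql).fetchall()  # the SQL engine is the "label"
        corpus.append((sql, result))           # (input query, target answer)
    return corpus
```

Each pair then trains the model to act as a neural SQL executor over the serialized table, before fine-tuning on natural-language table QA.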
An Introduction to Q-Learning Part 2
A tutorial that discusses Q-Learning, and describes how to implement an RL agent using two environments: Frozen Lake v1 and an autonomous taxi.
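The heart of tabular Q-learning is a single update rule: nudge Q(s, a) toward the observed reward plus the discounted value of the best next action. A self-contained sketch on a toy chain environment standing in for Frozen Lake or the taxi (environment, hyperparameters, and names are illustrative):

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning on a toy chain: actions 0 (left) / 1 (right),
    reward 1.0 for reaching the rightmost state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection: explore sometimes, else exploit
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update toward the one-step bootstrapped target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, acting greedily with respect to Q (always "right" here) recovers the optimal policy; the same loop, with a larger table, drives the Frozen Lake and taxi agents in the tutorial.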
A short and friendly video by IBM explaining the advantages of foundation models.
Libraries & Code
MindsEye Lite - a Hugging Face Space by multimodalart
A Gradio demo that runs multiple text-to-image models in one place.
gradio-app/gradio: Create UIs for your machine learning model in Python in 3 minutes
Gradio is an open-source Python library that is used to build machine learning and data science demos and web applications.
IBM/aihwkit: IBM Analog Hardware Acceleration Kit
IBM Analog Hardware Acceleration Kit is an open source Python toolkit for exploring and using the capabilities of in-memory computing devices in the context of artificial intelligence.
Papers & Publications
Lossless Acceleration for seq2seq Generation with Aggressive Decoding
We study lossless acceleration for seq2seq generation with a novel decoding algorithm -- Aggressive Decoding. Unlike previous efforts (e.g., non-autoregressive decoding) that speed up seq2seq generation at the cost of quality, our approach aims to yield generation identical to (or better than) autoregressive decoding with a significant speedup, achieved through the cooperation of aggressive decoding and verification, both of which are efficient thanks to parallel computing.
We propose two Aggressive Decoding paradigms for two kinds of seq2seq tasks: 1) For seq2seq tasks whose inputs and outputs are highly similar (e.g., Grammatical Error Correction), we propose Input-guided Aggressive Decoding (IAD), which aggressively copies the input sentence as drafted decoded tokens to verify in parallel; 2) For other general seq2seq tasks (e.g., Machine Translation), we propose Generalized Aggressive Decoding (GAD), which first employs an additional non-autoregressive model for aggressive decoding and then verifies the drafted tokens in parallel in an autoregressive manner.
We test Aggressive Decoding on the most popular 6-layer Transformer model on GPU in multiple seq2seq tasks: 1) For IAD, we show that it can introduce a 7x-9x speedup for the Transformer in Grammatical Error Correction and Text Simplification tasks with results identical to greedy decoding; 2) For GAD, we observe a 3x-5x speedup with identical or even better quality in two important seq2seq tasks: Machine Translation and Abstractive Summarization. Moreover, Aggressive Decoding can benefit even more from stronger computing devices that are better at parallel computing. Given the lossless quality as well as the significant and promising speedup, we believe Aggressive Decoding may evolve into a de facto standard for efficient and lossless seq2seq generation in the near future.
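The IAD idea — copy the input as a draft, verify every drafted token against the model in one parallel pass, keep the agreeing prefix, and patch in the model's token at the first disagreement — can be sketched in a few lines of Python. The toy "model" below (a fixed correction target that only looks at prefix length) stands in for a real autoregressive Transformer; all names are illustrative:

```python
EOS = "<eos>"

def input_guided_aggressive_decoding(source, next_token):
    """Decode by aggressively copying the input as a draft, then checking
    all draft tokens against the model at once (parallel on real hardware)."""
    out = []
    while True:
        draft = source[len(out):]  # copy the remaining input as drafted tokens
        # verification: the model's prediction at every draft position
        preds = [next_token(out + draft[:i]) for i in range(len(draft) + 1)]
        k = 0
        while k < len(draft) and preds[k] == draft[k]:
            k += 1  # accept draft tokens up to the first disagreement
        out = out + draft[:k] + [preds[k]]  # verified prefix + model's fix
        if preds[k] == EOS:
            return out[:-1]

# toy stand-in for an autoregressive model correcting "I has a apple";
# a real model would condition on the full prefix, not just its length
TARGET = ["I", "have", "an", "apple"]

def toy_next_token(prefix):
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else EOS
```

Because every accepted token matches what greedy decoding would have produced, the output is lossless by construction; the speedup comes from verifying long agreeing spans in a single model call instead of one call per token.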
Towards Unified Keyframe Propagation Models
Many video editing tasks such as rotoscoping or object removal require the propagation of context across frames. While transformers and other attention-based approaches that aggregate features globally have demonstrated great success at propagating object masks from keyframes to the whole video, they struggle to propagate high-frequency details, such as textures, faithfully. We hypothesize that this is due to an inherent bias of global attention towards low-frequency features. To overcome this limitation, we present a two-stream approach, where high-frequency features interact locally and low-frequency features interact globally. The global interaction stream remains robust in difficult situations such as large camera motions, where explicit alignment fails. The local interaction stream propagates high-frequency details through deformable feature aggregation and, informed by the global interaction stream, learns to detect and correct errors of the deformation field. We evaluate our two-stream approach for inpainting tasks, where experiments show that it improves both the propagation of features within a single frame as required for image inpainting, as well as their propagation from keyframes to target frames. Applied to video inpainting, our approach leads to 44% and 26% improvements in FID and LPIPS scores.