Deep Learning Weekly: Issue #253
Photonic neural networks that can classify images in less than 570 picoseconds, AI ushering a new scientific revolution, deploying transformers on the Apple Neural Engine, and more.
This week in deep learning, we bring you photonic neural networks that can classify images in less than 570 picoseconds, AI ushering in a new scientific revolution, deploying transformers on the Apple Neural Engine, and a paper on modern Hopfield networks for tabular data.
You may also enjoy Andrew Ng's LandingEdge, automated testing on machine learning projects, test suites for validating ML models and data, a paper on FlashAttention, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Machine Learning Helps Banks, Buyers Finalize Real Estate Transactions
Real estate technology company Doma helps speed the closing of home purchases with machine learning models trained on NVIDIA GPUs.
Photonic Chip Performs Image Recognition at the Speed of Light
In a new study, researchers have developed a photonic deep neural network that can directly analyze images without the need for a clock, sensor, or large memory modules. It can classify an image in less than 570 picoseconds, which is comparable to a single clock cycle in state-of-the-art microchips.
Andrew Ng's Landing AI aims to help manufacturers deploy AI vision systems
Landing AI announces LandingEdge, which customers can use to deploy deep learning-based vision inspection on their production floor.
Imaging sensor startup Vayyar lands $108M to fuel expansion
Vayyar, a company developing radar-imaging sensor technologies, today announced that it raised $108 million in a Series E round led by Koch Disruptive Technologies.
Building a Machine Learning Pipeline with DBT
An article demonstrating how to use a tool like DBT to develop a data pipeline that performs feature engineering, trains a model, and makes predictions, all without moving data out of the database.
Best Practices for Deploying Language Models
A joint recommendation from Cohere, OpenAI, and AI21 of several key principles to help providers of large language models mitigate the risks of this technology and realize its full promise to augment human capabilities.
Use Serverless Inference to reduce testing costs in your MLOps pipelines
In this post, you’ll see how to use SageMaker Serverless Inference to reduce cost when you deploy an ML model as part of the testing phase of your MLOps pipeline.
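As a hedged sketch of the idea, a serverless endpoint is configured through SageMaker's endpoint-config API by supplying a `ServerlessConfig` instead of instance counts; the model and config names below are illustrative placeholders, not values from the post.

```python
# Sketch: building the request for a SageMaker serverless endpoint config.
# Model/config names and sizing here are placeholders, not from the post.

def build_serverless_endpoint_config(model_name: str,
                                     config_name: str,
                                     memory_mb: int = 2048,
                                     max_concurrency: int = 5) -> dict:
    """Build kwargs for sagemaker_client.create_endpoint_config()."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # ServerlessConfig replaces InstanceType/InitialInstanceCount:
            # capacity scales to zero between invocations, so a testing
            # stage only pays for the inference time it actually uses.
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }

kwargs = build_serverless_endpoint_config("my-test-model", "my-test-config")
# With boto3 installed and AWS credentials configured, you would then call:
# boto3.client("sagemaker").create_endpoint_config(**kwargs)
print(kwargs["ProductionVariants"][0]["ServerlessConfig"])
```

Because the endpoint bills per invocation rather than per running instance, this pattern suits the short, bursty traffic of an MLOps testing phase.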
Automated Testing in Machine Learning Projects
In this article, we will try to understand the different categories of automated testing and how each can improve ML projects.
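To make the categories concrete, here is a minimal, self-contained sketch of common ML test types applied to a toy sentiment "model"; the model and word lists are stand-ins invented for illustration, not code from the article.

```python
# Illustrative automated tests for a toy sentiment model (a stand-in,
# not the article's code): unit, invariance, directional, and edge-case.

POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "terrible"}

def score(text: str) -> float:
    """Toy model: fraction of positive minus negative words, in [-1, 1]."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

# Unit test: a single known input/output pair.
assert score("good") == 1.0

# Invariance test: casing should not change the prediction.
assert score("GREAT movie") == score("great movie")

# Directional expectation test: adding a negative word must not raise the score.
assert score("good bad") <= score("good")

# Edge-case test: degenerate input must not crash.
assert score("") == 0.0

print("all behavioral checks passed")
```

In practice these assertions would live in a pytest suite and run in CI against the real model, alongside data-validation and pipeline-integration tests.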
AI is Ushering In a New Scientific Revolution
A comprehensive blog on how AI is ushering in a new scientific revolution by making remarkable breakthroughs in a number of fields, unlocking new approaches to science, and accelerating the pace of discovery and innovation.
Deploying Transformers on the Apple Neural Engine
An article providing generalizable guidance to developers on optimizing their models for Apple Neural Engine execution.
Create Your Own Friends With A GAN
An article briefly describing the high-level intuition behind GANs, along with a technical guide on building a small demo around a pre-trained CryptoPunks GAN.
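The adversarial intuition behind GANs can be sketched with one manual training step on a 1D toy problem; everything below (the linear generator, logistic discriminator, and sample values) is an invented illustration of the generic GAN objective, not code from the article.

```python
# One adversarial step of a toy 1D GAN, with gradients worked by hand.
# Generator G(z) = a*z + b; discriminator D(x) = sigmoid(w*x + c).
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
z, x_real = 0.5, 3.0     # one noise sample and one "real" sample
lr = 0.1

x_fake = a * z + b       # the generator's forgery

# --- Discriminator step: push D(x_real) up and D(x_fake) down. ---
d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
before = d_real
# Gradients of  -log D(x_real) - log(1 - D(x_fake))  w.r.t. (w, c):
gw = -(1 - d_real) * x_real + d_fake * x_fake
gc = -(1 - d_real) + d_fake
w, c = w - lr * gw, c - lr * gc

# --- Generator step: fool the updated critic by pushing D(G(z)) up. ---
d_fake = sigmoid(w * x_fake + c)
# Gradient of -log D(G(z)) w.r.t. G's output, chained to (a, b):
g_out = -(1 - d_fake) * w
a, b = a - lr * g_out * z, b - lr * g_out

print("D(real) before/after D step:", before, sigmoid(w * x_real + c))
```

After the discriminator update, D rates the real sample higher than before; after the generator update, G's output shifts toward the real data. Looping these two steps is the whole GAN training game.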
Libraries & Code
deepchecks/deepchecks: Test Suites for Validating ML Models & Data
Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort.
Quantus is an easy-to-use yet comprehensive toolbox for quantitative evaluation of neural network explanations — including 25+ different metrics.
A dataset for implicit hate speech detection.
Papers & Publications
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation. Their application to video generation still faces many challenges: the potentially huge computation cost makes training from scratch unaffordable, and the scarcity and weak relevance of text-video datasets hinder models from understanding complex movement semantics. In this work, we present CogVideo, a 9B-parameter transformer trained by inheriting a pretrained text-to-image model, CogView2. We also propose a multi-frame-rate hierarchical training strategy to better align text and video clips. As (probably) the first open-source large-scale pretrained text-to-video model, CogVideo outperforms all publicly available models by a large margin in machine and human evaluations.
Hopular: Modern Hopfield Networks for Tabular Data
While Deep Learning excels on the unstructured data encountered in vision and natural language processing, it has failed to meet expectations on tabular data. For tabular data, Support Vector Machines (SVMs), Random Forests, and Gradient Boosting are the best performing techniques, with Gradient Boosting in the lead. Recently, we saw a surge of Deep Learning methods that were tailored to tabular data but still underperform compared to Gradient Boosting on small-sized datasets. We suggest "Hopular," a novel Deep Learning architecture for medium- and small-sized datasets, where each layer is equipped with continuous modern Hopfield networks. The modern Hopfield networks use stored data to identify feature-feature, feature-target, and sample-sample dependencies. Hopular's novelty is that every layer can directly access the original input as well as the whole training set via stored data in the Hopfield networks. Therefore, Hopular can step-wise update its current model and the resulting prediction at every layer like standard iterative learning algorithms. In experiments on small-sized tabular datasets with less than 1,000 samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in particular several Deep Learning methods. In experiments on medium-sized tabular data with about 10,000 samples, Hopular outperforms XGBoost, CatBoost, LightGBM, and a state-of-the-art Deep Learning method designed for tabular data. Thus, Hopular is a strong alternative to these methods on tabular data.
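The continuous modern Hopfield update that Hopular's layers build on is ξ' = X softmax(β Xᵀξ): a query is compared against all stored patterns and replaced by their softmax-weighted average. A minimal sketch, with invented two-dimensional patterns purely for illustration:

```python
# One continuous modern-Hopfield retrieval step: xi' = X softmax(beta X^T xi).
# The stored patterns and query below are toy values for illustration.
import math

def softmax(scores, beta):
    exps = [math.exp(beta * s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_retrieve(stored, query, beta=8.0):
    """Single update step: return the softmax-weighted mix of stored
    patterns, weighted by their dot-product similarity to the query."""
    scores = [sum(s_i * q_i for s_i, q_i in zip(s, query)) for s in stored]
    weights = softmax(scores, beta)
    dim = len(query)
    return [sum(w_k * s[i] for w_k, s in zip(weights, stored))
            for i in range(dim)]

patterns = [[1.0, 0.0], [0.0, 1.0]]        # two stored patterns
noisy = [0.9, 0.1]                         # corrupted copy of the first
restored = hopfield_retrieve(patterns, noisy)
print(restored)                            # close to [1.0, 0.0]
```

With a large β the softmax concentrates on the nearest stored pattern, so one step effectively retrieves it; this is how a Hopular layer looks up feature and sample dependencies in the stored training set.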
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3× speedup on GPT-2 (seq. length 1K), and 2.4× speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).
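The tiling trick at the heart of FlashAttention can be sketched in plain Python: stream K/V in blocks and maintain a running max, normalizer, and accumulator (an "online softmax"), so the full score row is never materialized. This is a didactic sketch of the idea only; the toy vectors are invented, and the real kernel's gains come from keeping each block in GPU SRAM rather than HBM.

```python
# Didactic sketch of FlashAttention-style tiled attention for one query row.
import math

def attention_row(q, K, V):
    """Naive attention for one query: softmax(q . K) applied to V."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dim = len(V[0])
    return [sum(e * v[i] for e, v in zip(exps, V)) / z for i in range(dim)]

def tiled_attention_row(q, K, V, block=2):
    """Same result, but K/V are streamed in blocks with an online softmax,
    so no full score row is ever stored (on a GPU, each block would live
    in on-chip SRAM, avoiding reads/writes of the N x N score matrix)."""
    m = -math.inf            # running max of scores seen so far
    z = 0.0                  # running softmax normalizer
    acc = [0.0] * len(V[0])  # running unnormalized output
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in Kb]
        m_new = max(m, max(scores))
        corr = math.exp(m - m_new)   # rescale previously accumulated terms
        z = z * corr + sum(math.exp(s - m_new) for s in scores)
        acc = [a * corr + sum(math.exp(s - m_new) * v[i]
                              for s, v in zip(scores, Vb))
               for i, a in enumerate(acc)]
        m = m_new
    return [a / z for a in acc]

q = [0.3, -0.5]
K = [[1.0, 0.2], [-0.4, 0.9], [0.7, 0.7], [0.0, -1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0]]
naive = attention_row(q, K, V)
tiled = tiled_attention_row(q, K, V)
print(naive, tiled)   # identical up to floating-point rounding
```

Because the rescaling by `corr` keeps the running sums consistent as new blocks raise the max, the tiled result is exact, not approximate, which is the paper's key point: IO-awareness without sacrificing exactness.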