Deep Learning Weekly: Issue #242
12 graphs that explain the state of AI, model deployment strategies, integrating Neo4j with PyTorch Geometric to create recommendations, and more
This week in deep learning, we bring you 12 graphs that explain the state of AI, model deployment strategies, integrating Neo4j with PyTorch Geometric to create recommendations, and a paper on multimodal augmentation of generative models through adapter-based fine-tuning.
You may also enjoy GPT-3's new edit and insert capabilities, setting up an MLOps environment on Google Cloud, customizable sticky cells for Jupyter Notebooks, a paper on end-to-end perception networks, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Twelve graphs with summaries drawn from the Stanford Institute for Human-Centered Artificial Intelligence’s AI Index, a 190-page report covering R&D, technical performance, ethics, policy, education, and the economy.
OpenAI released new versions of GPT-3 and Codex which can edit or insert content into existing text, rather than just completing existing text.
Princeton researchers developed a technique they named DataMUX wherein a neural network can analyze multiple data feeds simultaneously as one mixed clump of information.
Results show that a contrastive representation learning model trained using only synthetic data is able to learn visual representations that rival or even outperform those learned from real data.
Partnering with NVIDIA and the International Chamber of Commerce, Photon Commerce guides payment processors, neobanks, and credit card fintechs on how to train and invent the world’s most intelligent AI for payments, invoices, and commerce.
This in-depth article walks through common model deployment strategies and how to choose the one best suited to your application.
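One strategy articles like this typically cover is the canary release, in which a small fraction of traffic is routed to the new model while the rest stays on the stable one. The sketch below is illustrative only; the router, the stub "models", and the 20% split are assumptions of ours, not the article's code:

```python
import random

def make_canary_router(stable_model, canary_model, canary_fraction=0.1, rng=None):
    """Route roughly `canary_fraction` of requests to the canary model."""
    rng = rng or random.Random()

    def route(request):
        model = canary_model if rng.random() < canary_fraction else stable_model
        return model(request)

    return route

# Toy stand-ins for deployed models: each just tags its input.
stable = lambda x: f"stable:{x}"
canary = lambda x: f"canary:{x}"

router = make_canary_router(stable, canary, canary_fraction=0.2,
                            rng=random.Random(0))
results = [router("req") for _ in range(1000)]
canary_share = sum(r.startswith("canary") for r in results) / len(results)
```

In practice the routing usually lives in a load balancer or serving layer rather than application code, and the canary fraction is ramped up as monitoring confirms the new model behaves well.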
This reference guide outlines the architecture of a machine learning operations (MLOps) environment on Google Cloud.
A guide to using the Artifacts class in Comet ML to automatically log and track your experiments.
In this end-to-end tutorial, you will learn how to speed up BERT inference for text classification with Hugging Face Transformers, Amazon SageMaker, and AWS Inferentia.
A post that provides an update on the current state of JumpStart, a multi-faceted product that helps get you started with ML on SageMaker, and guides you through the usage flow of the JumpStart API with an example use case.
The main goal of this post is to show you how to convert a Neo4j graph into a heterogeneous PyTorch Geometric graph using the MovieLens dataset.
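As a rough illustration of the conversion step (not the tutorial's actual code; the toy rows, ids, and Cypher query are assumptions of ours), the core work is re-indexing Neo4j node ids into contiguous integers per node type and assembling a COO edge index, which is the layout PyTorch Geometric's `HeteroData` expects:

```python
import numpy as np

# Toy rows standing in for the output of a Cypher query such as
# MATCH (u:User)-[r:RATED]->(m:Movie) RETURN u.id, m.id, r.rating
rated = [("u1", "m1", 4.0), ("u1", "m2", 3.5), ("u2", "m1", 5.0)]

# Re-index string ids to contiguous integers, one mapping per node type,
# since a heterogeneous graph keeps separate node stores for users and movies.
user_idx = {u: i for i, u in enumerate(sorted({r[0] for r in rated}))}
movie_idx = {m: i for i, m in enumerate(sorted({r[1] for r in rated}))}

# COO edge index of shape [2, num_edges] -- the layout that would populate
# data["user", "rates", "movie"].edge_index in a PyG HeteroData object.
edge_index = np.array([[user_idx[u] for u, m, _ in rated],
                       [movie_idx[m] for u, m, _ in rated]])
edge_rating = np.array([r for _, _, r in rated])
```

With tensors in this shape, wrapping them in `HeteroData` and feeding them to a heterogeneous GNN is mostly bookkeeping.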
An article that summarizes a few ways to run different deep learning models using three inference methods on Jetson Nano.
Andrej Karpathy’s attempt to reproduce Yann LeCun’s paper “Backpropagation Applied to Handwritten Zip Code Recognition.”
A deep dive into what embeddings are, how they work, and how they are often operationalized in real-world systems.
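To make the idea concrete, here is a minimal sketch of the most common operationalization, nearest-neighbor lookup by cosine similarity over an embedding table. The four-word vocabulary and hand-picked vectors are made up for illustration; real systems learn these vectors with models like word2vec or a trained embedding layer:

```python
import numpy as np

# Tiny illustrative embedding table: 4 items, 3 dimensions.
vocab = ["king", "queen", "apple", "orange"]
E = np.array([[0.90, 0.80, 0.10],
              [0.85, 0.82, 0.12],
              [0.10, 0.20, 0.90],
              [0.12, 0.15, 0.88]])

def nearest(word, k=1):
    """Return the k items most cosine-similar to `word`, excluding itself."""
    v = E[vocab.index(word)]
    sims = E @ v / (np.linalg.norm(E, axis=1) * np.linalg.norm(v))
    order = [i for i in np.argsort(-sims) if vocab[i] != word]
    return [vocab[i] for i in order[:k]]
```

At production scale the exhaustive dot product is replaced by an approximate nearest-neighbor index, but the geometry is the same.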
A guide to setting up your data science project for success, including a look at the Harvard and CRISP-DM methods.
A visual and beginner-friendly guide to SimCLR, a simple framework for contrastive learning of visual representations.
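At the heart of SimCLR is the NT-Xent loss, which pulls two augmented views of the same image together and pushes every other pair in the batch apart. A minimal NumPy version, assuming the two inputs are the batch's paired view embeddings (the framework itself computes these with a backbone and projection head):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss on two
    batches of paired view embeddings, as used in SimCLR."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    n = z.shape[0]
    sim = z @ z.T / tau                                # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                     # mask self-similarity
    b = n // 2
    # The positive for row i is its other augmented view: i+b or i-b.
    pos = np.concatenate([np.arange(b, n), np.arange(0, b)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), pos].mean()
```

When the two views of each image map to identical embeddings, the loss is low; when positives are mismatched, it rises, which is exactly the gradient signal that shapes the representation.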
Libraries & Code
A top-like tool for monitoring GPUs across a cluster.
Break the linear presentation of Jupyter Notebooks with sticky cells. With multiple floating cells, users can create a fully-fledged interactive dashboard.
TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Papers & Publications
End-to-end networks have become increasingly important in multi-tasking. One prominent example of this is the growing significance of driving perception systems in autonomous driving. This paper systematically studies an end-to-end perception network for multi-tasking and proposes several key optimizations to improve accuracy. First, the paper proposes an efficient segmentation head and box/class prediction networks based on a weighted bidirectional feature network. Second, the paper proposes automatically customized anchors for each level in the weighted bidirectional feature network. Third, the paper proposes an efficient training loss function and training strategy to balance and optimize the network. Based on these optimizations, the authors developed HybridNets, an end-to-end perception network that performs traffic object detection, drivable area segmentation, and lane detection simultaneously, achieving better accuracy than prior art. In particular, HybridNets achieves 77.3 mean Average Precision on the Berkeley DeepDrive dataset and 31.6 mean Intersection over Union on lane detection, with 12.83 million parameters and 15.6 billion floating-point operations. In addition, it can perform visual perception tasks in real time, making it a practical and accurate solution to the multi-tasking problem.
Large-scale pre-training is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pre-training objectives. We present MAGMA, a simple method for augmenting generative language models with additional modalities using adapter-based fine-tuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pre-training is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pre-training. MAGMA outperforms Frozen on open-ended generative tasks, achieving state-of-the-art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pre-training on 0.2% of the number of samples used to train SimVLM.
We present TensoRF, a novel approach to model and reconstruct radiance fields. Unlike NeRF, which purely uses MLPs, we model the radiance field of a scene as a 4D tensor, which represents a 3D voxel grid with per-voxel multi-channel features. Our central idea is to factorize the 4D scene tensor into multiple compact low-rank tensor components. We demonstrate that applying traditional CP decomposition (which factorizes tensors into rank-one components with compact vectors) in our framework leads to improvements over vanilla NeRF. To further boost performance, we introduce a novel vector-matrix (VM) decomposition that relaxes the low-rank constraints for two modes of a tensor and factorizes tensors into compact vector and matrix factors. Beyond superior rendering quality, our models with CP and VM decompositions lead to a significantly lower memory footprint in comparison to previous and concurrent works that directly optimize per-voxel features. Experimentally, we demonstrate that TensoRF with CP decomposition achieves fast reconstruction (<30 min) with better rendering quality and even a smaller model size (<4 MB) compared to NeRF. Moreover, TensoRF with VM decomposition further boosts rendering quality and outperforms previous state-of-the-art methods, while reducing the reconstruction time (<10 min) and retaining a compact model size (<75 MB).
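The CP decomposition the paper builds on can be sketched in a few lines: a rank-R tensor is stored as R compact vectors per mode and densified only when needed, which is where the memory savings come from. The rank, grid size, and random factors below are illustrative, not the paper's settings or code:

```python
import numpy as np

rng = np.random.default_rng(0)
R, X, Y, Z = 4, 16, 16, 16                 # rank and toy grid resolution

# CP factors: one vector per mode for each rank-one component, mirroring
# the idea of storing compact vectors instead of a dense voxel grid.
u = rng.standard_normal((R, X))
v = rng.standard_normal((R, Y))
w = rng.standard_normal((R, Z))

# Dense reconstruction: T = sum_r u_r (outer) v_r (outer) w_r
T = np.einsum('rx,ry,rz->xyz', u, v, w)

dense_params = X * Y * Z                   # 4096 values per feature channel
cp_params = R * (X + Y + Z)                # 192 values: ~21x smaller here
```

The savings grow with resolution (linear in grid side length for CP factors versus cubic for the dense grid), which is why the factored models fit in a few megabytes.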