

Deep Learning Weekly: Issue #303
Democratic Inputs to AI, Creating Edge Machine Learning Experiments, Object Detection and Segmentation With Text Prompt, a Paper on Foundational Models for Reasoning on Charts, and many more!
This week in deep learning, we bring you Democratic Inputs to AI, Create Edge Machine Learning Experiments with Edge Impulse and W&B, Lang Segment Anything – Object Detection and Segmentation With Text Prompt, and a paper on MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering.
You may also enjoy JPMorgan's IndexGPT, a deep learning framework called tinygrad, AI Canon | Andreessen Horowitz, a paper on QLoRA: Efficient Finetuning of Quantized LLMs, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
JPMorgan is developing a ChatGPT-like A.I. service that gives investment advice
JPMorgan Chase is developing a ChatGPT-like software service that leans on a disruptive form of artificial intelligence to select investments for customers.
Using AI, scientists find a drug that could combat drug-resistant infections
Using an artificial intelligence algorithm, researchers at MIT and McMaster University have identified a new antibiotic that can kill a type of bacteria (Acinetobacter baumannii) that is responsible for many drug-resistant infections.
Nvidia launches GH200 Superchip to accelerate generative AI workloads
Nvidia announced that its most powerful AI chip yet, the GH200 Grace Hopper Superchip, is now in full production.
Democratic Inputs to AI
OpenAI is launching a program to award ten $100,000 grants to fund experiments in setting up a democratic process for deciding what rules AI systems should follow.
The National AI Research and Development Strategic Plan includes relevant text from the 2016 and 2019 national AI R&D strategic plans, along with updates prepared in 2023.
MLOps
DagsHub Integrates with Colab: Train ML Models With ZERO MLOps
DagsHub users can now open notebooks in a Colab environment directly from DagsHub (free GPU included) and also version and commit them back using Git or DVC.
Large Language Models: A complete guide
A comprehensive guide to training, optimizing, and unlocking the power of natural language processors
Optimizing Stable Diffusion for Intel CPUs with NNCF and 🤗 Optimum
A blogpost that proposes a workflow that substantially reduces the latency of Stable Diffusion models when running on resource-constrained hardware such as CPU.
Automate Your ML Pipeline: Combining Airflow, DVC, and CML for a Seamless Batch Scoring Experience
A tutorial that guides you through the process of setting up an end-to-end experimentation, training, and production infrastructure for batch scoring applications.
Learning
Evolve GAT — A dynamic graph attention model
A technical article about a novel architecture that combines the GAT model and EvolveGCN to create a dynamic graph attention model.
AI Canon | Andreessen Horowitz
A collection of resources that include gentle introductions to transformers and latent diffusion models; technical learning resources; and references to landmark research results.
Lang Segment Anything – Object Detection and Segmentation With Text Prompt
A technical blog on how to leverage the Segment Anything Model from Meta AI using PyTorch Lightning.
Extractive Question Answering With HuggingFace Using PyTorch and W&B
This article explores extractive question answering using HuggingFace Transformers, PyTorch, and W&B. Learn how to build a SOTA question-answering model.
Libraries & Code
tinygrad: a deep learning framework that sits between PyTorch and karpathy/micrograd.
An easy-to-use LLMOps platform designed to empower more people to create sustainable, AI-native applications.
Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks.
Papers & Publications
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Abstract:
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling.
We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.
QLoRA: Efficient Finetuning of Quantized LLMs
Abstract:
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights, (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimizers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.
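To get a feel for why 4-bit storage and double quantization matter at the 65B scale, here is a rough back-of-the-envelope sketch of the weight storage arithmetic. The abstract does not state block sizes or constant widths, so the defaults below (blocks of 64 weights, 32-bit constants, 8-bit second-level constants in groups of 256) are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Illustrative storage arithmetic for block-wise quantized weights.
# The block sizes and constant widths are assumptions, not values from the abstract.

def bits_per_param(weight_bits=4, block_size=64, const_bits=32,
                   double_quant=False, dq_const_bits=8, dq_block_size=256):
    """Average stored bits per parameter, including quantization constants."""
    if double_quant:
        # First-level constants are quantized to dq_const_bits each, and one
        # const_bits second-level constant covers dq_block_size of them.
        overhead = (dq_const_bits / block_size
                    + const_bits / (block_size * dq_block_size))
    else:
        # One const_bits constant is stored per block of block_size weights.
        overhead = const_bits / block_size
    return weight_bits + overhead


def weight_gigabytes(n_params, bits):
    """Storage in GB (10**9 bytes) for n_params weights at `bits` bits each."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    n_params = 65e9  # the 65B parameter scale mentioned in the abstract
    settings = [
        ("16-bit weights", 16),
        ("4-bit, plain constants", bits_per_param()),
        ("4-bit, double quantization", bits_per_param(double_quant=True)),
    ]
    for label, bits in settings:
        print(f"{label}: {bits:.3f} bits/param, "
              f"{weight_gigabytes(n_params, bits):.1f} GB")
```

This counts only the frozen base weights. Adapter parameters, activations, gradients, and optimizer state add to the total, so the numbers are a lower bound on what a finetuning run would need.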
Towards Automated Circuit Discovery for Mechanistic Interpretability
Abstract:
Recent work in mechanistic interpretability has reverse-engineered nontrivial behaviors of transformer models. These contributions required considerable effort and researcher intuition, which makes it difficult to apply the same methods to understand the complex behavior that current models display. At their core, however, the workflow for these discoveries is surprisingly similar. Researchers create a data set and metric that elicit the desired model behavior, subdivide the network into appropriate abstract units, replace activations of those units to identify which are involved in the behavior, and then interpret the functions that these units implement. By varying the data set, metric, and units under investigation, researchers can understand the functionality of each neural network region and the circuits they compose. This work proposes a novel algorithm, Automatic Circuit DisCovery (ACDC), to automate the identification of the important units in the network. Given a model's computational graph, ACDC finds subgraphs that explain a behavior of the model. ACDC was able to reproduce a previously identified circuit for Python docstrings in a small transformer, identifying 6/7 important attention heads that compose up to 3 layers deep, while including 91% fewer connections.