Deep Learning Weekly: Issue #308
Meta's AI system cards for transparency, Real-time ML Foundations at Lyft, A Local Code Generator Tool Using Microsoft’s Guidance Library, a paper on Theorem Proving with Retrieval-Augmented Language
This week in deep learning, we bring you Meta's AI system cards for transparency, real-time machine learning foundations at Lyft, Building Oasis: a Local Code Generator Tool Using Microsoft’s Guidance Library, and a paper on LeanDojo: Theorem Proving with Retrieval-Augmented Language Models.
You may also enjoy Unity's new AI products for real-time 3D content, Dealing with Train-serve Skew in Real-time ML Models, Using LLMs to Extract Structured Data, a paper on A Simple and Effective Pruning Approach for Large Language Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Inflection lands $1.3B investment to build more ‘personal’ AI
Inflection AI, an AI startup aiming to create “personal AI for everyone,” has closed a $1.3 billion funding round led by Microsoft, Reid Hoffman, Bill Gates, Eric Schmidt and Nvidia.
Introducing 22 system cards that explain how AI powers experiences on Facebook and Instagram
Meta AI shares 22 system cards that contain information and actionable insights everyone can use to understand and customize specific AI-powered experiences in their products.
Waterwave Could Quench AIs' Thirst for GPU Memory
Researchers describe a novel approach, called Waterwave, to increase the efficiency of training multiple AI models simultaneously and efficiently on the same GPU.
Unity launches Sentis and Muse AI platforms for real-time 3D creation
Unity revealed two new AI-based products for creators to enhance their real-time 3D (RT3D) content.
MLOps
Dealing with Train-serve Skew in Real-time ML Models: A Short Guide
A short guide explaining how to avoid and mitigate the impacts of train-serve skew in realtime ML models.
Building Real-time Machine Learning Foundations at Lyft
A blog post that highlights what the Lyft team has done to enable the hundreds of ML developers within the company to efficiently develop and enhance models with streaming data.
How to Deploy an AI Model in Python with PyTriton
A post about how to use NVIDIA Triton Inference Server to serve models within your Python code and environment using the new PyTriton interface.
Optimizing LibTorch-based inference engine memory usage and thread-pooling
A blog post that shows how to optimize a LibTorch-based inference engine to maximize throughput by reducing memory usage and optimizing the thread-pooling strategy.
Architecting the Edge for AI and ML
An article that explores the elements needed for an ideal edge ML architecture and highlights a breakdown of four edge paradigms.
Learning
Debugging Image Classifiers With Confusion Matrices
An article on how to intuitively explore model predictions on specific images over time.
Using LLMs to Extract Structured Data: OpenAI Function Calling in Action
An article on harnessing GPT-4 and ChatGPT for efficient processing of unstructured documents.
An article that thoroughly describes how a docstring generator tool was built using Microsoft’s Guidance and open source models.
Making a web app generator with open ML models
A tutorial that presents a direct approach to AI web content generation by streaming and rendering the content all in one go.
Interactively fine-tune Falcon-40B and other LLMs on Amazon SageMaker Studio notebooks using QLoRA
A tutorial that shows how to fine-tune Falcon-40B using a single ml.g5.12xlarge instance (4 A10G GPUs) on Amazon SageMaker Studio.
Libraries & Code
OpenChat: Less is More for Open-source Models
OpenLLMs is a series of open-source language models fine-tuned on a small, yet diverse and high-quality dataset of multi-round conversations.
Easily migrate your codebase from one framework or language to another.
This toolbox aims to unify audio generation model evaluation for easier comparison.
Papers & Publications
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Abstract:
Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, data, and large compute requirements. This has created substantial barriers to research on machine learning methods for theorem proving. This paper removes these barriers by introducing LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks. LeanDojo extracts data from Lean and enables interaction with the proof environment programmatically. It contains fine-grained annotations of premises in proofs, providing valuable data for premise selection: a key bottleneck in theorem proving. Using this data, we develop ReProver (Retrieval-Augmented Prover): the first LLM-based prover that is augmented with retrieval for selecting premises from a vast math library. It is inexpensive and needs only one GPU week of training. Our retriever leverages LeanDojo's program analysis capability to identify accessible premises and hard negative examples, which makes retrieval much more effective. Furthermore, we construct a new benchmark consisting of 96,962 theorems and proofs extracted from Lean's math library. It features challenging data split requiring the prover to generalize to theorems relying on novel premises that are never used in training. We use this benchmark for training and evaluation, and experimental results demonstrate the effectiveness of ReProver over non-retrieval baselines and GPT-4. We thus provide the first set of open-source LLM-based theorem provers without any proprietary datasets and release it under a permissive MIT license to facilitate further research.
Prefix Conditioning Unifies Language and Label Supervision
Abstract:
Image-classification datasets have been used to pretrain image recognition models. Recently, web-scale image-caption datasets have emerged as a source of powerful pretraining alternative. Image-caption datasets are more ``open-domain'', containing a wider variety of scene types and vocabulary words than traditional classification datasets, and models trained on these datasets have demonstrated strong performance on few- and zero-shot recognition tasks. When naively unifying image-classification and -caption dataset, we show that such dataset biases negatively affect pre-training by reducing the generalizability of learned representations and thus jeopardizing zero-shot performance since the unification can tailor the model for the classification dataset, making it vulnerable to the distribution shift from the dataset. In this work, we address the problem by disentangling the dataset bias using prefix tokens that inform a language encoder of the type of the input dataset (e.g., image-classification or caption) at training time. This approach allows the language encoder to share the knowledge from two datasets as well as switch the mode of feature extraction, i.e., image-classification dataset or image-caption dataset tailored mode, where we use image-caption mode in the zero-shot evaluation. Our method is generic and can be easily integrated into existing VL pre-training objectives such as CLIP or UniCL. In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
A Simple and Effective Pruning Approach for Large Language Models
Abstract:
As their size increases, Large Languages Models (LLMs) are natural candidates for network pruning methods: approaches that drop a subset of network weights while striving to preserve performance. Existing methods, however, require either retraining, which is rarely affordable for billion-scale LLMs, or solving a weight reconstruction problem reliant on second-order information, which may also be computationally expensive. In this paper, we introduce a novel, straightforward yet effective pruning method, termed Wanda (Pruning by Weights and activations), designed to induce sparsity in pretrained LLMs. Motivated by the recent observation of emergent large magnitude features in LLMs, our approach prune weights with the smallest magnitudes multiplied by the corresponding input activations, on a per-output basis. Notably, Wanda requires no retraining or weight update, and the pruned LLM can be used as is. We conduct a thorough evaluation of our method on LLaMA across various language benchmarks. Wanda significantly outperforms the established baseline of magnitude pruning and competes favorably against recent methods involving intensive weight update.



Another excellent issue. Thanks Miko!