Deep Learning Weekly: Issue #319
Bard Extension, PagedAttention for Efficient LLM Serving, Optimizing LLMs from a Dataset Perspective, a paper on Studying Large Language Model Generalization with Influence Functions, and many more!
This week in deep learning, we bring you Google's Bard Extension, PagedAttention for Efficient LLM Serving, Optimizing LLMs from a Dataset Perspective, and a paper on Studying Large Language Model Generalization with Influence Functions.
You may also enjoy Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion, Explainability in Machine Learning Systems, Training Tiny Llamas for Fun, a paper on Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Google’s Bard chatbot can now find answers in your Gmail, Docs, Drive
Google’s Bard AI chatbot is no longer limited to pulling answers from just the web — it can now scan your Gmail, Docs, and Drive to help you find the information you’re looking for.
MIT scholars awarded seed grants to probe the social implications of generative AI
27 selected proposals represent a sweeping array of perspectives for exploring the transformative potential of generative AI, in both positive and negative directions for society.
Adept open-sourced Persimmon-8B, a language model with fewer than 10 billion parameters that matches the performance of Llama 2 while offering four times the context size.
Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion
Stability AI introduced Stable Audio, a latent diffusion model architecture for audio conditioned on text metadata as well as audio file duration and start time.
Multi-AI collaboration helps reasoning and factual accuracy in large language models
Researchers have multiple AI models collaborate and debate to improve their reasoning, advancing the performance of LLMs while increasing accountability and factual accuracy.
OpenAI launches a red teaming network to make its models more robust
OpenAI launched the OpenAI Red Teaming Network, a contracted group of experts to help inform the company’s AI model risk assessment and mitigation strategies.
MLOps & LLMOps
Efficient Memory Management for Large Language Model Serving with PagedAttention
Researchers build vLLM, an LLM serving system that achieves near-zero waste in KV cache memory and flexible sharing of KV cache within and across requests to further reduce memory usage.
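For context, here is a minimal sketch of serving with the vLLM library (the model checkpoint and sampling settings are illustrative, not the paper's experimental setup):

```python
from vllm import LLM, SamplingParams

# Load a model; vLLM manages the KV cache in fixed-size blocks (PagedAttention)
# rather than one contiguous buffer per request.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "Explain KV cache fragmentation in one sentence.",
    "Why does paging help LLM serving throughput?",
]

# Requests are batched continuously; KV cache memory for each sequence is
# allocated block by block as tokens are generated, so little is wasted.
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```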
LLM Monitoring and Observability — A Summary of Techniques and Approaches for Responsible AI
A summary of research conducted on existing methods for large language model monitoring, including metrics, evaluation datasets, and more.
Explainability in AI and Machine Learning Systems: An Overview
A guide that explores various explainability techniques and tools facilitating explainability operations.
Accelerating Vector Search: Fine-Tuning GPU Index Algorithms
An article that discusses how to accelerate vector search using GPU-based algorithms, paired with end-to-end Python demonstrations.
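The summary above doesn't name the article's exact stack; as one common option, here is a minimal FAISS-based sketch of moving an index to the GPU and running a batched search:

```python
import numpy as np
import faiss  # requires the GPU build of FAISS

d, nb, nq, k = 128, 100_000, 5, 10          # dim, database size, queries, neighbors
xb = np.random.random((nb, d)).astype("float32")
xq = np.random.random((nq, d)).astype("float32")

res = faiss.StandardGpuResources()          # GPU memory / stream resources
cpu_index = faiss.IndexFlatL2(d)            # exact L2 index built on CPU
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # copy to GPU 0

gpu_index.add(xb)                           # add database vectors
distances, ids = gpu_index.search(xq, k)    # batched k-NN search on the GPU
print(ids.shape)                            # (5, 10)
```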
Learning
Create a Self-Moderated Comment System with Llama 2 and LangChain
An article on how to create a self-moderated comment response system, using two Llama 2 models chained with LangChain.
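A rough sketch of the two-step pattern: one model drafts a reply, a second model acts as moderator. The `generate` helper below is hypothetical and stands in for whichever Llama 2 endpoint or LangChain chain you wire up; the article's actual prompts and chains will differ.

```python
def generate(prompt: str) -> str:
    """Hypothetical helper: call a Llama 2 model (e.g. via a LangChain chain) and return text."""
    raise NotImplementedError

def answer_comment(comment: str) -> str:
    # Step 1: a first model drafts a reply to the user comment.
    draft = generate(f"Reply politely and factually to this comment:\n{comment}")

    # Step 2: a second model acts as moderator and judges the draft.
    verdict = generate(
        "You are a content moderator. Answer only YES or NO: "
        f"is the following reply safe and respectful?\n{draft}"
    )

    # Only publish replies the moderator approves.
    return draft if verdict.strip().upper().startswith("YES") else "[reply withheld by moderator]"
```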
Optimizing LLMs From a Dataset Perspective
An article highlighting strategies that involve modifying, utilizing, or manipulating the datasets for instruction-based finetuning rather than altering the model architecture or training algorithms.
Training Tiny Llamas for Fun—and Science
An article that explores how the softmax implementation can impact model performance, using Karpathy's Tiny Llama implementation.
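As a reminder of why the softmax detail matters, here is a small NumPy comparison of a naive softmax versus the numerically stable max-subtracted form (illustrative only, not the article's code):

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)                # overflows for large logits
    return e / e.sum()

def softmax_stable(x):
    e = np.exp(x - x.max())      # shift by the max; mathematically identical result
    return e / e.sum()

logits = np.array([1000.0, 1001.0, 1002.0], dtype=np.float32)
print(softmax_naive(logits))     # [nan nan nan] -- exp overflows to inf
print(softmax_stable(logits))    # approximately [0.09 0.24 0.67]
```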
Fine-tune Falcon 180B with QLoRA and Flash Attention on Amazon SageMaker
A tutorial on how to fine-tune Falcon 180B using QLoRA with Flash Attention on Amazon SageMaker.
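For orientation, a minimal sketch of the QLoRA setup with `transformers`, `bitsandbytes`, and `peft`; the model ID and hyperparameters are illustrative, and the SageMaker orchestration from the tutorial is omitted:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "tiiuae/falcon-180B"  # illustrative; any causal LM follows the same pattern

# 4-bit NF4 quantization (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    # Flash Attention is enabled via a from_pretrained flag whose name depends on
    # your transformers version; see the tutorial for the exact argument.
)

# Low-rank adapters on the attention projections (the "LoRA" in QLoRA)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```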
Libraries & Code
An open-source framework for autonomous language agents.
A versatile framework that streamlines the process of creating custom multi-agent environments for large language models.
An open-source Python library for generating synthetic yet realistic schemas and knowledge graphs (KGs) based on user-specified parameters.
Papers & Publications
Studying Large Language Model Generalization with Influence Functions
Abstract:
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
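For readers new to the idea, the classical influence-function formulation (not this paper's EK-FAC variant) estimates the effect of up-weighting a training example $z$ on the loss at a query $z_q$ as:

```latex
\mathcal{I}(z, z_q) = -\,\nabla_\theta L(z_q, \hat{\theta})^{\top} \, H_{\hat{\theta}}^{-1} \, \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta})
```

The inverse-Hessian-vector product applied to the query gradient is the IHVP the abstract refers to; EK-FAC replaces the Hessian with a Kronecker-factored approximation so that this inverse can be applied at LLM scale.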
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement
Abstract:
We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-based models and performing large-scale study of distillation with state-of-the-art models with various data augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks (e.g., segmentation and detection). As an example, the accuracy of ResNet-50 improves by 1.7% on the ImageNet validation set, 3.5% on ImageNetV2, and 10.0% on ImageNet-R. Expected Calibration Error (ECE) on the ImageNet validation set is also reduced by 9.9%. Using this backbone with Mask-RCNN for object detection on MS-COCO, the mean average precision improves by 0.8%. We reach similar gains for MobileNets, ViTs, and Swin-Transformers. For MobileNetV3 and Swin-Tiny, we observe significant improvements on ImageNet-R/A/C of up to 20% improved robustness. Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+, reach up to 3.4% improved accuracy.
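To make the reinforcement idea concrete, here is a toy sketch of the distillation side: teacher probabilities are computed once and stored alongside the (augmented) samples, so any student can reuse them. The tensors and loss weighting below are hypothetical, not the paper's released pipeline.

```python
import torch
import torch.nn.functional as F

def reinforced_loss(student_logits, stored_teacher_probs, labels, alpha=0.5, T=1.0):
    """Mix standard cross-entropy with a distillation term against teacher
    probabilities that were precomputed once and saved with the dataset."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        stored_teacher_probs,            # loaded with each reinforced sample
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kd

# toy usage with random tensors
logits = torch.randn(8, 100)                               # student outputs, 100 classes
teacher_probs = torch.softmax(torch.randn(8, 100), dim=-1) # stands in for stored targets
labels = torch.randint(0, 100, (8,))
print(reinforced_loss(logits, teacher_probs, labels).item())
```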
Nougat: Neural Optical Understanding for Academic Documents
Abstract:
Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition.
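A minimal sketch of running the released checkpoint through the Hugging Face `transformers` integration, assuming the `facebook/nougat-base` checkpoint and a page already rasterized to an image (the authors also ship their own inference tooling):

```python
from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

# A single PDF page rendered to an image (rasterization not shown here).
page = Image.open("page_1.png").convert("RGB")
pixel_values = processor(images=page, return_tensors="pt").pixel_values

with torch.no_grad():
    generated = model.generate(pixel_values, max_new_tokens=512)

markup = processor.batch_decode(generated, skip_special_tokens=True)[0]
print(markup)  # markup text recovered from the page image
```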