Deep Learning Weekly: Issue #295

Pause Giant AI Experiments: An Open Letter, distributed hyperparameter tuning on Vertex AI, Cyborgism, a paper on LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.

Apr 07, 2023

This week in deep learning, we bring you Pause Giant AI Experiments: An Open Letter, distributed hyperparameter tuning on Vertex AI, Cyborgism, and a paper on LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.

You may also enjoy Meta AI's artificial visual cortex, a general-purpose framework for evaluating machine learning models, Inverse Physics-Informed Neural Networks, a paper on When Fairness Naturally Emerges From Deep Ensembling, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

Strengthening trust in machine-learning models

Associate Professor Tamara Broderick and colleagues build a “taxonomy of trust” to identify where confidence in the results of a data analysis might break down.

Pause Giant AI Experiments: An Open Letter

An open letter calling on all AI labs to immediately pause, for at least 6 months, the training of AI systems more powerful than GPT-4. Signatories include Stuart Russell, Elon Musk, and Yoshua Bengio.

Kangas 2.0: Exploratory Data Analysis for Computer Vision

Comet releases version 2.0 of Kangas, an open-source platform for exploring, analyzing, and visualizing multimedia data.

A method for designing neural networks optimally suited for certain tasks

MIT researchers have found that neural networks can be designed so they minimize the probability of misclassifying a data input.

Goldman Sachs report says AI could put 300 million jobs at risk

Goldman Sachs economists released a report that suggests generative artificial intelligence will significantly disrupt the global labor market, automating about 300 million jobs over the next decade.

Accelerated PyTorch 2 Transformers

The PyTorch 2.0 release includes a new high-performance implementation of the PyTorch Transformer API with the goal of making training and deployment of state-of-the-art Transformer models affordable.
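For readers new to the API: the operation at the heart of this work is scaled dot-product attention, which PyTorch 2.0 exposes as `torch.nn.functional.scaled_dot_product_attention`. Ignoring masking and dropout, it computes the standard expression below, with queries Q, keys K, values V, and key dimension d_k (standard notation, not taken from the post):

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Depending on the inputs and hardware, PyTorch can dispatch this single call to fused kernels such as FlashAttention or memory-efficient attention rather than computing each step separately.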

Robots that learn from videos of human activities and simulated interactions

Meta AI announces two major advancements toward general-purpose embodied AI agents capable of performing challenging sensorimotor skills.

MLOps

Distributed Hyperparameter Tuning in Vertex AI Pipeline

A technical blog on integrating distributed hyperparameter tuning into the Vertex AI pipeline.

Monitoring your time series model in Comet

A step-by-step tutorial on how to use Comet to monitor a time-series forecasting model. It carries out some exploratory data analysis (EDA) on the dataset and then logs the visualizations to the Comet experimentation platform.

Monitoring NLP models in production: a tutorial on detecting drift in text data

A tutorial that dives into a specific example to explore issues affecting the performance of NLP models in production and shows how to monitor them using Evidently AI.

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

A Machine Learning Engineer at Brainly walks you through the end-to-end Machine Learning Operations process in the Visual Search team.

Accelerating Stable Diffusion Inference on Intel CPUs

A post that shows you different techniques to accelerate Stable Diffusion models on Sapphire Rapids CPUs.

Learning

Inverse Physics-Informed Neural Net

An article that mathematically and practically describes how an inverse physics-informed neural network (PINN) produces responses that adhere to the relationship described by a differential equation.
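The inverse-problem idea can be illustrated without the neural network itself. The sketch below assumes a toy ODE, du/dt = -k·u with unknown k (our choice, not the article's example), samples a known solution, and recovers k by minimizing the squared ODE residual. Because the residual is linear in k, the minimizer has a closed form here; an actual inverse PINN would instead fit a network u_θ(t) to the observations and learn k jointly by gradient descent on a data loss plus this residual loss, with derivatives from autograd rather than finite differences.

```python
import math

def estimate_decay_rate(ts, us):
    """Estimate k in du/dt = -k*u by minimizing the squared ODE residual.

    Toy stand-in for an inverse PINN (assumed ODE): du/dt is approximated
    with central finite differences instead of differentiating a network.
    """
    num = 0.0
    den = 0.0
    for i in range(1, len(ts) - 1):
        # Central-difference approximation of du/dt at t_i
        dudt = (us[i + 1] - us[i - 1]) / (ts[i + 1] - ts[i - 1])
        # Residual r_i(k) = dudt + k * u_i; setting d/dk sum(r_i^2) = 0
        # gives k = -sum(dudt * u_i) / sum(u_i^2)
        num += dudt * us[i]
        den += us[i] * us[i]
    return -num / den

# Synthetic observations from u(t) = exp(-2 t), so the true k is 2
h = 0.01
ts = [i * h for i in range(201)]
us = [math.exp(-2.0 * t) for t in ts]
k_hat = estimate_decay_rate(ts, us)
print(f"estimated k = {k_hat:.4f}")
```

The small remaining error comes from the finite-difference approximation and shrinks as the sampling step h decreases.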

Converting Tabular Dataset to Graph Dataset with PyTorch Geometric

A technical blog that walks through how to convert a tabular dataset into a graph dataset using PyTorch Geometric.
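One common way to build such a graph is sketched below under our own assumptions: the toy table, the `city` and `age` columns, and the rule "connect rows that share a value" are hypothetical and not necessarily the blog's approach. Each row becomes a node, and edges are stored in the coordinate (COO) layout that PyTorch Geometric uses for `edge_index`, a 2 x num_edges array of source and target node indices.

```python
from collections import defaultdict
from itertools import permutations

def rows_to_edge_index(rows, key):
    """Connect every pair of rows that share the same value in column `key`.

    Returns edge_index as two parallel lists [sources, targets], i.e. the
    2 x num_edges COO layout. Both directions are added because PyTorch
    Geometric stores undirected graphs as pairs of directed edges.
    """
    groups = defaultdict(list)
    for node_id, row in enumerate(rows):
        groups[row[key]].append(node_id)
    sources, targets = [], []
    for members in groups.values():
        for src, dst in permutations(members, 2):
            sources.append(src)
            targets.append(dst)
    return [sources, targets]

# Hypothetical tabular data: one dict per row
rows = [
    {"user": "a", "city": "Paris", "age": 31},
    {"user": "b", "city": "Lyon", "age": 25},
    {"user": "c", "city": "Paris", "age": 42},
]
edge_index = rows_to_edge_index(rows, "city")
# Node feature matrix built from the numeric column(s), one row per node
x = [[float(row["age"])] for row in rows]
print(edge_index, x)
```

With PyTorch Geometric installed, these lists could then be wrapped as `Data(x=torch.tensor(x), edge_index=torch.tensor(edge_index))`. Grouping rows by key first avoids comparing every pair of rows, although a very common value still produces a dense clique of edges.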

Cyborgism - LessWrong

A proposal for a strategy for safely accelerating alignment research by building human-in-the-loop systems that empower human agency.

Bootstrapping Object Detection Model Training with 3D Synthetic Data

This post shows you how to train a model to detect fruit packed in a crate using synthetic data generated with NVIDIA Omniverse Replicator, an SDK that programmatically generates physically accurate 3D synthetic data.

Libraries & Code

unionai-oss/pandera

A data validation library for scientists, engineers, and analysts seeking correctness.

apartresearch/interpretability-starter

A collection of useful starter templates, tools for investigating model activations, and introductory resources.

zeno-ml/zeno

A general-purpose framework for evaluating machine learning models.

Papers & Publications

When Fairness Naturally Emerges From Deep Ensembling

Abstract:

Ensembling independent deep neural networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform larger single models. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, even with a simple homogenous ensemble -- all the individual models share the same training set, architecture, and design choices -- we find compelling and powerful gains in worst-k and minority group performance, i.e. fairness naturally emerges from ensembling. We show that the gains in performance from ensembling for the minority group continue for far longer than for the majority group as more models are added. Our work establishes that simple DNN ensembles can be a powerful tool for alleviating disparate impact from DNN classifiers, thus curbing algorithmic harm. We also explore why this is the case. We find that even in homogeneous ensembles, varying the sources of stochasticity through parameter initialization, mini-batch sampling, and the data-augmentation realizations, results in different fairness outcomes.
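A minimal sketch of the two ingredients in this abstract, under our own assumptions: a homogeneous ensemble that averages the class probabilities of its members, and a per-group accuracy report whose minimum is the worst-group accuracy. The probabilities, labels, and group names below are made-up toy values for illustration, not results from the paper.

```python
from collections import defaultdict

def ensemble_predict(member_probs):
    """Average class-probability vectors across members, then take the argmax.

    member_probs[m][i] is model m's probability vector for example i.
    """
    n_models = len(member_probs)
    preds = []
    for i in range(len(member_probs[0])):
        n_classes = len(member_probs[0][i])
        avg = [sum(member_probs[m][i][c] for m in range(n_models)) / n_models
               for c in range(n_classes)]
        preds.append(avg.index(max(avg)))
    return preds

def group_accuracies(preds, labels, groups):
    """Accuracy per subgroup; min(...values()) is the worst-group accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}

# Toy data: 3 models, 4 examples, 2 classes (illustrative values only)
member_probs = [
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.4, 0.6]],
    [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7], [0.7, 0.3]],
    [[0.7, 0.3], [0.4, 0.6], [0.4, 0.6], [0.6, 0.4]],
]
labels = [0, 1, 1, 0]
groups = ["majority", "majority", "minority", "minority"]

preds = ensemble_predict(member_probs)
accs = group_accuracies(preds, labels, groups)
print(preds, accs, "worst-group:", min(accs.values()))
```

Passing a single-member list such as `[member_probs[0]]` to `ensemble_predict` gives that model's own predictions, so the same report can compare individual members against the ensemble.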

Privileged Bases in the Transformer Residual Stream

Abstract:

Our mathematical theories of the Transformer architecture suggest that individual coordinates in the residual stream should have no special significance (that is, the basis directions should be in some sense "arbitrary" and no more likely to encode information than random directions). Recent work has shown that this observation is false in practice. We investigate this phenomenon and provisionally conclude that the per-dimension normalizers in the Adam optimizer are to blame for the effect.

We explore two other obvious sources of basis dependency in a Transformer: Layer normalization, and finite-precision floating-point calculations. We confidently rule these out as being the source of the observed basis-alignment.
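The intuition for blaming Adam can be shown in two dimensions. An SGD step is proportional to the gradient, so rotating the gradient simply rotates the step; Adam divides each coordinate by its own running RMS, so its step depends on the chosen basis. The sketch below is our own illustration rather than the paper's experiment: it compares the first bias-corrected Adam step (assuming the common defaults β1 = 0.9, β2 = 0.999, ε = 1e-8) with an SGD step, for a gradient before and after a 45° rotation.

```python
import math

def sgd_step(grad, lr=0.1):
    return [-lr * g for g in grad]

def adam_first_step(grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam update from zero-initialized moments, with bias correction."""
    step = []
    for g in grad:
        m_hat = ((1 - beta1) * g) / (1 - beta1)      # equals g after correction
        v_hat = ((1 - beta2) * g * g) / (1 - beta2)  # equals g**2 after correction
        # Per-coordinate normalizer: this is what breaks rotation symmetry
        step.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return step

def rotate(vec, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [c * vec[0] - s * vec[1], s * vec[0] + c * vec[1]]

def equivariance_gap(step_fn, grad, theta):
    """How far step(rotate(g)) is from rotate(step(g)); 0 means basis-independent."""
    a = step_fn(rotate(grad, theta))
    b = rotate(step_fn(grad), theta)
    return max(abs(x - y) for x, y in zip(a, b))

grad = [1.0, 0.0]
theta = math.pi / 4
for name, fn in [("SGD", sgd_step), ("Adam", adam_first_step)]:
    print(f"{name}: gap = {equivariance_gap(fn, grad, theta):.4f}")
```

For SGD the gap is zero up to rounding, while Adam's first step behaves like a per-coordinate sign update, so the rotated gradient produces a step of a different length and direction than the rotated step.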

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Abstract:

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA.
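A minimal single-query sketch of zero-gated attention, assuming the formulation in which softmax is applied separately to the adaption-prompt scores and to the ordinary token scores, and only the prompt branch is scaled by a learnable gate initialized to zero. The function names, toy vectors, and plain scalar gate are our own simplifications (the released code may, for example, use per-head gates or pass the gate through a nonlinearity). With the gate at zero, the output reduces exactly to ordinary attention over the original tokens, which is why fine-tuning starts from the frozen model's behavior.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(q, keys):
    """Scaled dot-product scores for one query, normalized by softmax."""
    scale = 1.0 / math.sqrt(len(q))
    return softmax([scale * sum(x * y for x, y in zip(q, k)) for k in keys])

def weighted_sum(weights, values):
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def zero_gated_attention(q, prompt_keys, prompt_values, token_keys, token_values, gate):
    """Prompt and token branches get separate softmaxes; only the prompt
    branch is multiplied by `gate`, which is 0 at the start of fine-tuning."""
    prompt_w = [gate * w for w in attention_weights(q, prompt_keys)]
    token_w = attention_weights(q, token_keys)
    return weighted_sum(prompt_w + token_w, prompt_values + token_values)

# Toy 2-d example (hypothetical values)
q = [1.0, 0.5]
prompt_keys, prompt_values = [[0.3, -1.0]], [[5.0, 5.0]]
token_keys, token_values = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]

at_init = zero_gated_attention(q, prompt_keys, prompt_values,
                               token_keys, token_values, gate=0.0)
plain = weighted_sum(attention_weights(q, token_keys), token_values)
print(at_init, plain)
```

As the gate grows during training, the prompt branch contributes progressively more to the output without disturbing the softmax over the original tokens.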

A guest post by Miko Planas
