Deep Learning Weekly : Issue #295
Pause Giant AI Experiments: An Open Letter, distributed hyperparameter tuning on Vertex AI, Cyborgism, a paper on LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
This week in deep learning, we bring you Pause Giant AI Experiments: An Open Letter, distributed hyperparameter tuning on Vertex AI, Cyborgism, and a paper on LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
You may also enjoy Meta AI's artificial visual cortex, a general-purpose framework for evaluating machine learning models, Inverse Physics-Informed Neural Networks, a paper on When Fairness Naturally Emerges From Deep Ensembling, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Associate Professor Tamara Broderick and colleagues build a “taxonomy of trust” to identify where confidence in the results of a data analysis might break down.
An open letter calling on all AI labs to immediately pause, for at least 6 months, the training of AI systems more powerful than GPT-4 (signed by Stuart Russell, Elon Musk, Yoshua Bengio, etc.).
Comet releases version 2.0 of Kangas, an open-source platform for exploring, analyzing, and visualizing multi-media data.
MIT researchers have found that neural networks can be designed so they minimize the probability of misclassifying a data input.
Goldman Sachs economists released a report that suggests generative artificial intelligence will significantly disrupt the global labor market, automating about 300 million jobs over the next decade.
The PyTorch 2.0 release includes a new high-performance implementation of the PyTorch Transformer API with the goal of making training and deployment of state-of-the-art Transformer models affordable.
Meta AI announces two major advancements toward general-purpose embodied AI agents capable of performing challenging sensorimotor skills.
A technical blog on integrating distributed hyperparameter tuning into the Vertex AI pipeline.
In this step-by-step tutorial, you will learn how to use Comet to monitor a time-series forecasting model. We will carry out some EDA on the dataset, and then log the visualizations to the Comet platform.
A tutorial that dives into a specific example to explore issues affecting the performance of NLP models in production and show how to monitor them using Evidently AI.
A Machine Learning Engineer at Brainly walks you through the end-to-end Machine Learning Operations process in the Visual Search team.
A post that shows you different techniques to accelerate Stable Diffusion models on Sapphire Rapids CPUs.
An article that mathematically and practically describes how an inverse physics-informed neural network (PINN) produces responses that adhere to the relationship described by a differential equation.
A technical blog that walks through how to convert a tabular dataset into a graph dataset using PyTorch Geometric.
A proposed strategy for safely accelerating alignment research by setting up human-in-the-loop systems that empower human agency.
This post shows you how to use a model to detect fruits packaged in a crate using synthetic data generated from NVIDIA Omniverse Replicator, an SDK that programmatically generates physically accurate 3D synthetic data.
Libraries & Code
A data validation library for scientists, engineers, and analysts seeking correctness.
A list of resources that contains an array of useful starter templates, tools to investigate model activations, and a number of introductory resources.
A general-purpose framework for evaluating machine learning models.
Papers & Publications
Ensembling independent deep neural networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform larger single models. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, even with a simple homogeneous ensemble -- all the individual models share the same training set, architecture, and design choices -- we find compelling and powerful gains in worst-k and minority group performance, i.e. fairness naturally emerges from ensembling. We show that the gains in performance from ensembling for the minority group continue for far longer than for the majority group as more models are added. Our work establishes that simple DNN ensembles can be a powerful tool for alleviating disparate impact from DNN classifiers, thus curbing algorithmic harm. We also explore why this is the case. We find that even in homogeneous ensembles, varying the sources of stochasticity through parameter initialization, mini-batch sampling, and the data-augmentation realizations, results in different fairness outcomes.
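To make the metric in this abstract concrete, here is a minimal sketch of probability-averaging ensembling and worst-group accuracy. The function names, the two-group split, and the hand-made toy probabilities are all illustrative assumptions, not the paper's code or data. The numbers are contrived so that each member errs on a different minority example, and they say nothing about how often this happens in practice.

```python
from statistics import mean

def ensemble_probs(member_probs):
    """Average per-example class probabilities across ensemble members."""
    n_members = len(member_probs)
    n_examples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    return [
        [sum(m[i][c] for m in member_probs) / n_members for c in range(n_classes)]
        for i in range(n_examples)
    ]

def group_accuracies(probs, labels, groups):
    """Accuracy per subgroup, using the argmax class as the prediction."""
    hits = {}
    for p, y, g in zip(probs, labels, groups):
        pred = max(range(len(p)), key=p.__getitem__)
        hits.setdefault(g, []).append(1.0 if pred == y else 0.0)
    return {g: mean(v) for g, v in hits.items()}

def worst_group_accuracy(probs, labels, groups):
    """Lowest accuracy over all subgroups."""
    return min(group_accuracies(probs, labels, groups).values())

# Hypothetical toy data: four examples, two classes, two groups.
labels = [0, 1, 0, 1]
groups = ["majority", "majority", "minority", "minority"]

# Three hypothetical members with the same setup but different mistakes.
member_a = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
member_b = [[0.8, 0.2], [0.3, 0.7], [0.7, 0.3], [0.55, 0.45]]
member_c = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4], [0.55, 0.45]]
members = [member_a, member_b, member_c]

member_worst = [worst_group_accuracy(m, labels, groups) for m in members]
ensemble_worst = worst_group_accuracy(ensemble_probs(members), labels, groups)
print("worst-group accuracy per member:", member_worst)
print("worst-group accuracy of ensemble:", ensemble_worst)
```

Averaging probabilities rather than taking a majority vote is one common choice; the abstract does not say which aggregation rule the authors use.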
Our mathematical theories of the Transformer architecture suggest that individual coordinates in the residual stream should have no special significance (that is, the basis directions should be in some sense "arbitrary" and no more likely to encode information than random directions). Recent work has shown that this observation is false in practice. We investigate this phenomenon and provisionally conclude that the per-dimension normalizers in the Adam optimizer are to blame for the effect.
We explore two other obvious sources of basis dependency in a Transformer: Layer normalization, and finite-precision floating-point calculations. We confidently rule these out as being the source of the observed basis-alignment.
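One way to see why per-dimension normalizers could privilege the coordinate basis is to check whether an optimizer step commutes with a rotation of the gradient. The sketch below is a two-dimensional toy, not the authors' experiment. It uses the standard Adam update for a single first step from zero-initialized moments, and the helper names are assumptions. A plain SGD step rotates along with the gradient, while the Adam step does not, because each coordinate is divided by its own running scale.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def sgd_step(grad, lr=0.1):
    """Plain gradient-descent update: the same scale for every coordinate."""
    return [-lr * g for g in grad]

def adam_first_step(grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam update from zero-initialized moments, with bias correction."""
    m = [(1 - beta1) * g for g in grad]
    v = [(1 - beta2) * g * g for g in grad]
    m_hat = [mi / (1 - beta1) for mi in m]
    v_hat = [vi / (1 - beta2) for vi in v]
    # Each coordinate is normalized by its own second-moment estimate.
    return [-lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

theta = math.pi / 4
grad = [1.0, 0.0]

# Compare "rotate the gradient, then step" with "step, then rotate the update".
sgd_gap = max_abs_diff(sgd_step(rotate(grad, theta)), rotate(sgd_step(grad), theta))
adam_gap = max_abs_diff(
    adam_first_step(rotate(grad, theta)), rotate(adam_first_step(grad), theta)
)
print("SGD gap:", sgd_gap)
print("Adam gap:", adam_gap)
```

The SGD gap is zero up to floating-point rounding, while the Adam gap is clearly nonzero, so the Adam update depends on which basis the gradient is expressed in.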
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA.
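The zero-gating idea can be sketched for a single query vector as follows. This is a simplified reading of the abstract, not the authors' implementation: the function names are hypothetical, and the choice to softmax the prompt scores and the token scores separately before scaling the prompt part by a gate is an assumption about the design. With the gate at zero, the output is exactly what the frozen model would have produced. Because the gate is learnable, training can then gradually let the adaption prompts contribute.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, values):
    """Combine value vectors using the given attention weights."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def gated_attention(token_scores, token_values, prompt_scores, prompt_values, gate):
    """Attention output for one query, with a gate on the adaption prompts.

    Assumption: prompt and token scores are normalized independently, and
    only the prompt weights are multiplied by the gate.
    """
    token_w = softmax(token_scores)
    prompt_w = [gate * w for w in softmax(prompt_scores)]
    token_out = weighted_sum(token_w, token_values)
    prompt_out = weighted_sum(prompt_w, prompt_values)
    return [t + p for t, p in zip(token_out, prompt_out)]

# Hypothetical toy inputs: three text tokens, two adaption prompts, 2D values.
token_scores = [0.2, 1.0, -0.5]
token_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompt_scores = [0.3, -0.1]
prompt_values = [[2.0, 2.0], [-1.0, 3.0]]

frozen_out = weighted_sum(softmax(token_scores), token_values)
at_init = gated_attention(token_scores, token_values, prompt_scores, prompt_values, gate=0.0)
after_training = gated_attention(token_scores, token_values, prompt_scores, prompt_values, gate=0.5)

print("frozen model output:  ", frozen_out)
print("gate = 0.0 (at init): ", at_init)
print("gate = 0.5 (example): ", after_training)
```

The gate value of 0.5 only stands in for "some learned value"; the abstract does not report what the gates converge to.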