Deep Learning Weekly: Issue #294
LiGO technique for accelerating training, Architecting Edge ML Systems, a paper on Sparks of Artificial General Intelligence: Early experiments with GPT-4, and many more.
This week in deep learning, we bring you the LiGO technique for accelerating training, Architecting Edge ML Systems, A Short Chronology Of Deep Learning For Tabular Data, and a paper on Sparks of Artificial General Intelligence: Early experiments with GPT-4.
You may also enjoy NVIDIA Modulus is now open-source, Deploying Large NLP Models: Infrastructure Cost Optimization, Illustrating Reinforcement Learning from Human Feedback (RLHF), a paper on APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
New LiGO technique accelerates training of large machine-learning models, reducing the monetary and environmental cost of developing AI applications.
Cerebras Systems announced that it has trained and released seven GPT-based large language models for generative AI, making them available to the wider research community.
NVIDIA has open-sourced its state-of-the-art physics-ML platform that blends physics with deep learning training data to build high-fidelity, parameterized surrogate models with near-real-time latency.
TensorFlow 2.12 and Keras 2.12 have been released! Highlights of this release include the new Keras model saving and exporting format, the keras.utils.FeatureSpace utility, SavedModel fingerprinting, and more.
Google announced a partnership with Replit, the creator of a popular coding platform used by more than 20 million developers.
An article that explains how to deploy machine learning models at the edge using MicroK8s, Seldon and Istio.
A complete guide to building a deep learning project with PyTorch, tracking an experiment with Comet ML, and deploying an app with Gradio on Hugging Face.
An article that explains how to train 175B parameter language models at 1000 GPU scale with Alpa and Ray.
This article aims to provide some strategies, tips, and tricks you can apply to optimize your infrastructure while deploying large NLP models.
This article will show you how to easily deploy large language models with hundreds of billions of parameters like BLOOM on Habana Gaudi 2 using Hugging Face Optimum Habana.
This post examines how to architect edge ML systems for flexibility, scalability, and efficiency.
A comprehensive and visual article that goes through how ChatGPT works and how it is able to generate human-like responses.
A blog post that breaks down the training process and illustrates Reinforcement Learning from Human Feedback.
A list of relevant papers, along with links and short summaries, on deep learning for tabular data.
This tutorial will show how to leverage Hugging Face to federate the training of language models over multiple clients using Flower.
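For readers new to federated learning, the core server-side step can be sketched in a few lines. The example below is a minimal, library-free illustration of federated averaging (FedAvg), a common aggregation strategy; it does not use the Flower or Hugging Face APIs, and the function name `fed_avg` and the toy numbers are assumptions made for illustration, not taken from the tutorial.

```python
# Minimal FedAvg sketch: the server averages client parameters,
# weighting each client by the number of local training examples.
# `fed_avg` is a hypothetical helper, not part of Flower.

def fed_avg(client_weights, client_sizes):
    """Return the example-weighted average of flat parameter lists."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two toy clients, each holding a two-parameter "model".
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]  # local dataset sizes

global_weights = fed_avg(clients, sizes)
print(global_weights)
```

In a real setup each client would train locally between rounds and send back updated parameters; only the aggregation step is shown here.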
Libraries & Code
A project that provides a bridge between GPT-4 and a headless Chromium browser, allowing you to automate actions simply by describing them to the program.
Lazy Predict helps build a lot of basic models without much code and helps understand which models work better without any parameter tuning.
Papers & Publications
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.
Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, for LLMs beyond 100 billion parameters, existing methods cannot maintain accuracy or do not run efficiently on hardware. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. Based on the fact that weights are easy to quantize while activations are not, SmoothQuant smooths the activation outliers by offline migrating the quantization difficulty from activations to weights with a mathematically equivalent transformation. SmoothQuant enables an INT8 quantization of both weights and activations for all the matrix multiplications in LLMs, including OPT-175B, BLOOM-176B, GLM-130B, and MT-NLG 530B. SmoothQuant has better hardware efficiency than existing techniques. We demonstrate up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy. We integrate SmoothQuant into FasterTransformer, a state-of-the-art LLM serving framework, and achieve faster inference speed with half the number of GPUs compared to FP16, enabling the serving of a 530B LLM within a single node. Our work offers a turn-key solution that reduces hardware costs and democratizes LLMs.
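The "mathematically equivalent transformation" in the SmoothQuant abstract can be illustrated with a small sketch. Assuming the per-channel smoothing factor described in the paper, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), activations are divided by s and weights are multiplied by s, so the product is unchanged while activation outliers shrink. The pure-Python code below is only an illustration of that idea on toy matrices: the helper names are hypothetical, alpha = 0.5 is an assumed setting, and no actual INT8 quantization is performed.

```python
# Sketch of the SmoothQuant smoothing step (no quantization kernels).
# Helper names are illustrative and are not the authors' API.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def smoothing_scales(x, w, alpha=0.5):
    """Per-input-channel scale: max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha).

    Assumes every channel has a nonzero activation and weight maximum.
    """
    act_max = [max(abs(v) for v in col) for col in zip(*x)]
    wgt_max = [max(abs(v) for v in row) for row in w]
    return [a ** alpha / m ** (1 - alpha) for a, m in zip(act_max, wgt_max)]

def smooth(x, w, alpha=0.5):
    """Return (X / s, s * W), which has the same product as (X, W)."""
    s = smoothing_scales(x, w, alpha)
    x_s = [[v / sj for v, sj in zip(row, s)] for row in x]
    w_s = [[v * sj for v in row] for row, sj in zip(w, s)]
    return x_s, w_s

def abs_max(m):
    """Largest absolute value in a matrix, a proxy for quantization range."""
    return max(abs(v) for row in m for v in row)

# Toy activations (2 tokens x 3 channels); channel 1 carries an outlier.
X = [[1.0, 64.0, -2.0],
     [0.5, -48.0, 1.0]]
# Toy weights (3 input channels x 2 output channels).
W = [[0.5, -1.0],
     [0.25, 0.5],
     [1.0, -0.5]]

X_s, W_s = smooth(X, W, alpha=0.5)
Y = matmul(X, W)        # original output
Y_s = matmul(X_s, W_s)  # output after smoothing

print("activation range before/after:", abs_max(X), abs_max(X_s))
print("weight range before/after:", abs_max(W), abs_max(W_s))
```

The activation range drops while the weight range grows, which is the "migration" of quantization difficulty from activations to weights; because the scales depend only on calibration statistics, they can be computed offline.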
Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets. In this work, we ask whether it is possible to achieve similar results with substantially less training time and data. We achieve this by taking advantage of existing pretrained unimodal encoders and careful curation of alignment data relevant to the downstream task of interest. We study a natural approach to aligning existing encoders via small auxiliary functions, and we find that this method is competitive with (or outperforms) state of the art in many settings while being less prone to overfitting, less costly to train, and more robust to distribution shift. With a properly chosen alignment distribution, our method surpasses prior state of the art for ImageNet zero-shot classification on public data while using two orders of magnitude less time and data and training 77% fewer parameters.