Deep Learning Weekly: Issue #301
Claude's 100K Context Window, Performance Bottlenecks in Deploying LLMs, YOLO-NAS, a paper on "Larger language models do in-context learning differently," and many more!
This week in deep learning, we bring you Anthropic's expansion of Claude's context window to 100K tokens, Performance Bottlenecks in Deploying LLMs - A Primer for ML Researchers, YOLO-NAS, and a paper on "Larger language models do in-context learning differently."
You may also enjoy Stable Animation SDK, Why Automatic Augmentation Matters, GPT-4's Maze Navigation: A Deep Dive into ReAct Agent and LLM's Thoughts, a paper on "Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision," and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Anthropic | Introducing 100K Context Windows
Anthropic expanded Claude’s context window from 9K to 100K tokens, corresponding to around 75,000 words.
OpenAI CEO Sam Altman testifies before Congress on AI risks
Sam Altman urged lawmakers to regulate AI during a Senate panel hearing, describing the technology’s current boom as a potential “printing press moment” but one that required safeguards.
AI models fail to reproduce human judgements about rule violations
Researchers from MIT have found that machine learning models trained to mimic human decision-making often suggest harsher judgements than humans would.
Goldman Sachs created an A.I.-powered social media startup for corporate use
Goldman Sachs, known more for its Wall Street bankers than its technology, has just spun out an AI-powered networking platform from its internal incubator.
Stability AI releases Stable Animation SDK, a powerful text-to-animation tool for developers
A tool designed for artists and developers to use the most advanced Stable Diffusion models to generate stunning animations.
MLOps
Performance bottlenecks in deploying LLMs—a primer for ML researchers
The first post in a series to help researchers understand the systems-level design choices involved in deploying LLMs.
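The systems framing invites a quick empirical check. Below is a rough sketch (not from the post) of probing latency versus throughput with Hugging Face transformers; gpt2 stands in for a real deployment model, and the prompt and batch sizes are arbitrary placeholders.

```python
# Illustrative latency/throughput probe for LLM inference (gpt2 as a stand-in model).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in the model you actually deploy
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

prompt = "Deep learning weekly digest:"
for batch_size in (1, 4, 16):  # larger batches trade single-request latency for throughput
    inputs = tokenizer([prompt] * batch_size, return_tensors="pt").to(device)
    start = time.perf_counter()
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    elapsed = time.perf_counter() - start
    new_tokens = (out.shape[-1] - inputs["input_ids"].shape[-1]) * batch_size
    print(f"batch={batch_size:2d}  latency={elapsed:.2f}s  throughput={new_tokens / elapsed:.1f} tok/s")
```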
Debugging Image Classifiers with Confusion Matrices
When dealing with computer vision tasks like classification, detection, segmentation, and generation, visualizing your outputs is essential to understanding how your model is behaving and why.
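As a concrete illustration (not taken from the article), here is a minimal scikit-learn confusion matrix for a toy classifier; the class names and labels are made up.

```python
# Minimal confusion-matrix check for a classifier's predictions (illustrative labels).
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

class_names = ["cat", "dog", "bird"]          # hypothetical classes
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])   # model predictions

cm = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predicted classes; off-diagonal cells
# show exactly which pairs of classes the model mixes up.
ConfusionMatrixDisplay(cm, display_labels=class_names).plot()
plt.show()
```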
Chip Huyen’s collection of MLOps materials from introductory to advanced.
Why Automatic Augmentation Matters
A post on how to implement and use GPU-accelerated automatic augmentation to train a model with NVIDIA DALI, using conditional execution.
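As a rough sketch of what that can look like, assuming a recent NVIDIA DALI release that ships the auto_aug module and conditional execution (the dataset path, image size, and pipeline arguments below are placeholders):

```python
# Sketch of a GPU-accelerated DALI pipeline with AutoAugment (paths and sizes are placeholders).
from nvidia.dali import pipeline_def, fn
from nvidia.dali.auto_aug import auto_augment

@pipeline_def(enable_conditionals=True)  # automatic augmentation relies on conditional execution
def train_pipeline(data_dir):
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")              # decode on the GPU
    images = fn.resize(images, size=[224, 224])
    images = auto_augment.auto_augment(images, shape=[224, 224])   # policy applied per sample
    return images, labels

pipe = train_pipeline(data_dir="/path/to/train/images",
                      batch_size=64, num_threads=4, device_id=0)
pipe.build()
images, labels = pipe.run()
```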
Log and Visualize Tabular Data Using Comet Data Panels
A guide to quickly logging tabular data and visualizing it in Comet with the new built-in Data Panel tool.
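A minimal sketch of what logging a table can look like with the comet_ml SDK, assuming an API key is already configured; the project name and metrics are placeholders.

```python
# Log a small tabular dataset to Comet so it can be explored in a Data Panel.
import pandas as pd
from comet_ml import Experiment

experiment = Experiment(project_name="data-panel-demo")  # hypothetical project name

df = pd.DataFrame({
    "epoch": [1, 2, 3],
    "train_loss": [0.91, 0.55, 0.42],
    "val_loss": [0.97, 0.63, 0.51],
})

# log_table takes a filename plus tabular data (e.g., a pandas DataFrame).
experiment.log_table("metrics.csv", tabular_data=df)
experiment.end()
```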
Accelerating Large Language Models with Mixed-Precision Techniques
An article that explores how leveraging lower-precision formats can boost training and inference speed by up to 3x without compromising model accuracy.
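For intuition, here is a minimal mixed-precision training step using PyTorch's autocast with bfloat16; it is an illustrative sketch, not the article's code, and the tiny linear model stands in for a real LLM.

```python
# Minimal mixed-precision training step with PyTorch autocast (bfloat16).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 10).to(device)            # stand-in for a transformer block
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 512, device=device)
y = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
# Matmuls run in bfloat16 inside the autocast region; parameters stay in fp32.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```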
Learning
AI Scientists: Safe and Useful AI? - Yoshua Bengio
Yoshua Bengio discusses a potential path to build immensely useful AI systems that completely avoid the issue of AI alignment.
GPT-4’s Maze Navigation: A Deep Dive into ReAct Agent and LLM’s Thoughts
A comprehensive article that delves into GPT-4’s memorization-based navigation techniques using a maze generator and LangChain.
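For flavor, here is a hedged sketch of a ReAct-style agent using the LangChain APIs as they looked in mid-2023; the maze "move" tool is a hypothetical stand-in for the article's environment.

```python
# Sketch of a ReAct-style agent (LangChain APIs circa mid-2023; the maze tool is hypothetical).
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

def move(direction: str) -> str:
    """Hypothetical environment hook: apply a move and describe what the agent now sees."""
    return f"You moved {direction}. There is a wall to the north and open space to the east."

tools = [Tool(name="move", func=move,
              description="Move one step; input is one of north/south/east/west.")]

llm = ChatOpenAI(model_name="gpt-4", temperature=0)
agent = initialize_agent(tools, llm,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                         verbose=True)  # prints the Thought/Action/Observation loop
agent.run("You are in a maze. Find the exit by moving step by step.")
```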
Introducing YOLO-NAS: A New State-of-the-Art for Object Detection
A deep dive into, and technical demonstration of, the new state-of-the-art object detection architecture called YOLO-NAS.
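A minimal sketch of trying a pretrained YOLO-NAS checkpoint, assuming the super-gradients package; the model variant and image URL below are illustrative.

```python
# Run a pretrained YOLO-NAS model with super-gradients (model name and image URL are placeholders).
from super_gradients.training import models

model = models.get("yolo_nas_l", pretrained_weights="coco")  # large variant, COCO weights
predictions = model.predict("https://example.com/street_scene.jpg", conf=0.5)
predictions.show()  # draw boxes, class labels, and confidences on the image
```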
An article on creating and training a reinforcement learning agent to play a game in a simulated environment using the Proximal Policy Optimization (PPO) algorithm.
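As a minimal sketch (not the article's code), here is a PPO training loop with Stable-Baselines3 and Gymnasium, using CartPole as a stand-in for the game environment.

```python
# Minimal PPO training loop with Stable-Baselines3 and Gymnasium (CartPole as a stand-in game).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)  # collect rollouts and run clipped policy-gradient updates

# Roll out the trained policy for one episode.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```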
Libraries & Code
A lightweight SDK enabling integration of Large Language Models (LLMs) with conventional programming languages.
An open-source framework to evaluate, test and monitor ML models in production.
A list of open LLMs available for commercial use.
Papers & Publications
Larger language models do in-context learning differently
Abstract:
We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups (ICL with flipped labels and ICL with semantically-unrelated labels) across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.
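A toy illustration (not from the paper) of the flipped-label setup: the exemplar labels deliberately contradict the usual sentiment priors, and the question is whether a model follows the in-context mapping or falls back on its prior.

```python
# Toy flipped-label in-context prompt: exemplar labels contradict the usual sentiment priors.
exemplars = [
    ("The movie was fantastic and I loved every minute.", "negative"),    # flipped
    ("Terrible plot, I walked out halfway through.", "positive"),         # flipped
    ("One of the best performances I have seen this year.", "negative"),  # flipped
]
query = "The soundtrack was dull and the acting was worse."

prompt = "\n".join(f"Review: {text}\nLabel: {label}" for text, label in exemplars)
prompt += f"\nReview: {query}\nLabel:"
print(prompt)
# A model that follows the in-context mapping should answer "positive" here;
# one that falls back on semantic priors from pretraining will answer "negative".
```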
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Abstract:
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of principles application) to produce helpful, ethical, and reliable responses to user's queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly without the principle set and the demonstrations anymore; and finally, we offer a refinement step to address the issues of overly-brief or indirect responses. Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings.
AutoFocusFormer: Image Segmentation off the Grid
Abstract:
Real world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while other areas are scattered with many small objects. Yet, the commonly used successive grid downsampling strategy in convolutional deep networks treats all areas equally. Hence, small objects are represented in very few spatial locations, leading to worse results in tasks such as segmentation. Intuitively, retaining more pixels representing small objects during downsampling helps to preserve important information. To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task. Since adaptive downsampling generates a set of pixels irregularly distributed on the image plane, we abandon the classic grid structure. Instead, we develop a novel point-based local attention block, facilitated by a balanced clustering module and a learnable neighborhood merging module, which yields representations for our point-based versions of state-of-the-art segmentation heads. Experiments show that our AutoFocusFormer (AFF) improves significantly over baseline models of similar sizes.