Deep Learning Weekly : Issue #297
Meta open-sources DINOv2, Building LLM applications with Chip Huyen, Exploring Creativity in Large Language Models From GPT-2 to GPT-4, a paper on Consistency Models, and many more!
This week in deep learning, we bring you Meta open-sources DINOv2, Building LLM applications with Chip Huyen, Exploring Creativity in Large Language Models From GPT-2 to GPT-4, and a paper on Consistency Models.
You may also enjoy EU calls for tighter controls on AI, How to Build An Experiment Tracking Tool, Understanding Parameter-Efficient Fine-tuning of Large Language Models, a paper on Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
A New Approach to Computation Reimagines Artificial Intelligence
A new approach to computation reimagines artificial intelligence by imbuing enormous vectors with semantic meaning.
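The piece is about hyperdimensional (vector-symbolic) computing. As a rough illustration of how enormous random vectors can carry structured meaning, here is a minimal sketch of binding and bundling bipolar hypervectors; the 10,000-dimension figure and the specific operators are conventional choices in this field, not details taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # very high-dimensional vectors; random ones are nearly orthogonal

def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding (element-wise multiply): associates two concepts; the result is dissimilar to both."""
    return a * b

def bundle(*vs):
    """Bundling (majority sum): superimposes items; the result stays similar to each input."""
    return np.sign(np.sum(vs, axis=0))

def similarity(a, b):
    return float(a @ b) / D  # cosine-like similarity for bipolar vectors

# Encode "color=red, shape=circle" as one composite vector
color, red = random_hv(), random_hv()
shape, circle = random_hv(), random_hv()
record = bundle(bind(color, red), bind(shape, circle))

# Binding the record with a role vector recovers an approximation of the stored filler
print(similarity(bind(record, color), red))     # high (around 0.5)
print(similarity(bind(record, color), circle))  # near 0
```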
Elon Musk founds new AI company called X.AI
Elon Musk has created a new company dedicated to artificial intelligence — and it’s called X.AI.
European lawmakers call for tighter controls on powerful general-purpose AI
A group of 12 European Parliament members has called on the European Union to create a new set of rules aimed at regulating a wider range of artificial intelligence tools.
DINOv2: State-of-the-art computer vision models with self-supervised learning
Meta AI open-sources DINOv2, the first method for training computer vision models that uses self-supervised learning to achieve results that match or surpass the standard approach used in the field.
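If you want to try the released checkpoints, the repository exposes them through torch.hub. A minimal feature-extraction sketch follows, assuming the `dinov2_vits14` entry point and 14-pixel patch size described in the DINOv2 README; verify both against the current release.

```python
import torch

# Load a pretrained DINOv2 ViT-S/14 backbone from the official repo via torch.hub
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Dummy batch: image sides should be multiples of the 14-pixel patch size
images = torch.randn(2, 3, 224, 224)

with torch.no_grad():
    features = model(images)  # per-image embeddings usable for k-NN, linear probes, retrieval

print(features.shape)  # e.g. (2, 384) for the ViT-S/14 variant
```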
Generative AI startup Kubiya debuts ChatGPT for DevOps
Kubiya, a startup that bills itself as “ChatGPT for DevOps,” announced the official launch of what it says is the first AI assistant for engineering platforms and knowledge management.
AI startup LangChain taps Sequoia to lead funding round at a valuation of at least $200 million
Just a week after announcing a $10 million seed investment from Benchmark, AI darling LangChain has scored even more capital from yet another top-tier VC.
MLOps
Building LLM applications for production
A post that discusses the key challenges of productionizing LLM applications and how to address them, demonstrates how to compose multiple tasks with control flows, and covers some promising use cases from companies.
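One recurring theme in the post is composing several LLM calls with ordinary control flow (routing, retries, fallbacks). The sketch below illustrates that pattern in a library-agnostic way; `call_llm` and the SQL-routing scenario are hypothetical stand-ins, not code from the article.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion client; plug in your own."""
    raise NotImplementedError("connect your LLM client here")

def classify_intent(question: str) -> str:
    """First task: route the request to a downstream task."""
    out = call_llm(f"Classify the intent as 'sql' or 'chat'. Question: {question}")
    return "sql" if "sql" in out.lower() else "chat"

def answer(question: str) -> str:
    intent = classify_intent(question)
    if intent == "sql":                       # branch on the first task's output
        for _ in range(3):                    # retry until the generated SQL passes a cheap check
            query = call_llm(f"Write a single SQL SELECT statement answering: {question}")
            if query.strip().lower().startswith("select"):
                return query
        return "Sorry, I could not produce a valid query."  # fallback after retries
    return call_llm(question)                 # default branch: answer directly
```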
Evaluate your Team’s ML Maturity
As companies continue to invest more in machine learning, it is important for organizations to take a step back and evaluate their internal ML processes. Teams that succeed exhibit high levels of maturity in model reproducibility, debugging, visibility, and monitoring.
Using Counterfactual Logit Pairing with Keras | Responsible AI Toolkit
This notebook shows you how to train a text classifier to identify offensive content and use Counterfactual Logit Pairing (CLP) to avoid having identity terms unfairly skew what is classified as offensive.
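The notebook itself relies on the TensorFlow Model Remediation library; the sketch below only illustrates the core CLP idea, an extra loss term penalizing logit differences between an example and its identity-swapped counterfactual, using hypothetical class and argument names rather than the library's API.

```python
import tensorflow as tf

def clp_penalty(original_logits, counterfactual_logits):
    """CLP term: keep logits close when only identity terms in the text change."""
    return tf.reduce_mean(tf.abs(original_logits - counterfactual_logits))

class CLPClassifier(tf.keras.Model):
    """Hypothetical wrapper that adds a CLP term to an ordinary classification loss."""
    def __init__(self, base_model, clp_weight=1.0):
        super().__init__()
        self.base_model = base_model      # any text classifier producing logits
        self.clp_weight = clp_weight
        self.bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

    def train_step(self, data):
        (texts, counterfactual_texts), labels = data
        with tf.GradientTape() as tape:
            logits = self.base_model(texts, training=True)
            cf_logits = self.base_model(counterfactual_texts, training=True)
            loss = self.bce(labels, logits) + self.clp_weight * clp_penalty(logits, cf_logits)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}
```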
Monitoring Machine Learning Applications
An article on an easy-to-implement prioritization approach that you can use with either your own backend monitoring tools or a vendor monitoring tool.
Learning
Understanding Parameter-Efficient Fine-tuning of Large Language Models
An article that explains the broad concept of finetuning and discusses popular parameter-efficient alternatives such as prefix tuning and adapters.
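As a concrete taste of one technique the article covers, here is a minimal bottleneck adapter in PyTorch: a small trainable module added on top of a frozen pretrained layer. The hidden size, bottleneck width, and placement are illustrative assumptions, not any particular library's implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual connection."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

# Only the adapter's few parameters are trained; the pretrained backbone stays frozen.
hidden_dim = 768                          # assumed hidden size of the frozen transformer
adapter = Adapter(hidden_dim)
x = torch.randn(2, 16, hidden_dim)        # (batch, sequence, hidden)
print(adapter(x).shape)                   # torch.Size([2, 16, 768])
```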
Exploring Creativity in Large Language Models: From GPT-2 to GPT-4
An analysis of the performance of GPT models from 2019 to 2023 on tests that measure two kinds of creativity: convergent and divergent.
Grounding Large Language Models in a Cognitive Foundation: How to Build Someone We Can Talk To
A theoretical article on grounding large language models in a cognitive foundation.
Straggler Mitigation On PyTorch DDP By Hierarchical SGD
A comprehensive article that explains why stragglers slow down PyTorch DDP training and demonstrates how to mitigate them with hierarchical SGD.
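The post centers on PyTorch's hierarchical SGD support; as a rough sketch of the underlying idea, synchronizing less often so that a slow worker stalls the group less, here is a generic local-SGD loop written with plain torch.distributed rather than the hierarchical API from the article.

```python
import torch
import torch.distributed as dist

def train_local_sgd(model, optimizer, data_loader, sync_every=8):
    """Local SGD: each rank takes several local steps, then parameters are averaged.
    Fewer synchronization points means stragglers block the group less often;
    hierarchical SGD extends this by averaging within fast sub-groups more frequently."""
    world_size = dist.get_world_size()
    for step, (x, y) in enumerate(data_loader):
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

        if (step + 1) % sync_every == 0:          # periodic synchronization point
            for p in model.parameters():
                dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
                p.data /= world_size              # average parameters across ranks
```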
Libraries & Code
New ways of breaking app-integrated LLMs
A project meant to give everyone access to a great chat-based large language model.
Assemble, configure, and deploy autonomous AI Agents in your browser.
Papers & Publications
Consistency Models
Abstract:
Diffusion models have made significant breakthroughs in image, audio, and video generation, but they depend on an iterative generation process that causes slow sampling speed and caps their potential for real-time applications. To overcome this limitation, we propose consistency models, a new family of generative models that achieve high sample quality without adversarial training. They support fast one-step generation by design, while still allowing for few-step sampling to trade compute for sample quality. They also support zero-shot data editing, like image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either as a way to distill pre-trained diffusion models, or as standalone generative models. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step generation. For example, we achieve the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained as standalone generative models, consistency models also outperform single-step, non-adversarial generative models on standard benchmarks like CIFAR-10, ImageNet 64x64 and LSUN 256x256.
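For intuition, one-step generation with a trained consistency model is a single evaluation of the learned consistency function at the maximum noise level, and few-step sampling alternates denoising with re-noising. The sketch below assumes a hypothetical `consistency_fn(x, t)` and simplifies the paper's multistep procedure; it is not the authors' released code.

```python
import torch

def consistency_fn(x_t: torch.Tensor, t: float) -> torch.Tensor:
    """Hypothetical trained consistency function f(x_t, t) -> x_0."""
    raise NotImplementedError("load a trained consistency model here")

T_MAX = 80.0  # maximum noise scale on the diffusion trajectory (illustrative value)

def sample_one_step(shape):
    """One-step generation: draw pure noise at t = T_MAX and map it straight to data."""
    x_T = torch.randn(shape) * T_MAX
    return consistency_fn(x_T, T_MAX)

def sample_few_steps(shape, timesteps=(80.0, 20.0, 5.0)):
    """Few-step sampling: alternately denoise and re-inject noise to trade compute for quality."""
    x = torch.randn(shape) * timesteps[0]
    x0 = consistency_fn(x, timesteps[0])
    for t in timesteps[1:]:
        x = x0 + torch.randn(shape) * t      # perturb the current estimate back to noise level t
        x0 = consistency_fn(x, t)
    return x0
```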
Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications
Abstract:
We present a new algorithm for automatically bounding the Taylor remainder series. In the special case of a scalar function f: ℝ → ℝ, our algorithm takes as input a reference point x0, trust region [a, b], and integer k ≥ 1, and returns an interval I such that f(x) − ∑_{i=0}^{k−1} (1/i!) f^{(i)}(x0) (x − x0)^i ∈ I (x − x0)^k for all x ∈ [a, b]. As in automatic differentiation, the function f is provided to the algorithm in symbolic form, and must be composed of known atomic functions.
At a high level, our algorithm has two steps. First, for a variety of commonly-used elementary functions (e.g., exp, log), we derive sharp polynomial upper and lower bounds on the Taylor remainder series. We then recursively combine the bounds for the elementary functions using an interval arithmetic variant of Taylor-mode automatic differentiation. Our algorithm can make efficient use of machine learning hardware accelerators, and we provide an open source implementation in JAX.
We then turn our attention to applications. Most notably, we use our new machinery to create the first universal majorization-minimization optimization algorithms: algorithms that iteratively minimize an arbitrary loss using a majorizer that is derived automatically, rather than by hand. Applied to machine learning, this leads to architecture-specific optimizers for training deep networks that converge from any starting point, without hyperparameter tuning. Our experiments show that for some optimization problems, these hyperparameter-free optimizers outperform tuned versions of gradient descent, Adam, and AdaGrad. We also show that our automatically-derived bounds can be used for verified global optimization and numerical integration, and to prove sharper versions of Jensen's inequality.
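To make the scalar setup concrete: by Taylor's theorem with the Lagrange remainder, any interval containing f^(k)(ξ)/k! for ξ in [a, b] satisfies the stated containment. The sketch below computes such an interval for f = exp, whose derivatives are monotone; the paper's algorithm derives sharper bounds automatically for general compositions, so this is only a hand-worked special case.

```python
import math

def exp_taylor_remainder_interval(a: float, b: float, k: int):
    """Interval I with f(x) - sum_{i<k} f^(i)(x0)(x - x0)^i / i!  in  I * (x - x0)^k on [a, b],
    specialized to f = exp (every derivative of exp is exp, and exp is increasing)."""
    return (math.exp(a) / math.factorial(k), math.exp(b) / math.factorial(k))

# Quick numeric check at a sample point
a, b, x0, k = -1.0, 1.0, 0.0, 3
lo, hi = exp_taylor_remainder_interval(a, b, k)
x = 0.7
taylor = sum(x**i / math.factorial(i) for i in range(k))   # degree-(k-1) Taylor polynomial at x0 = 0
remainder_ratio = (math.exp(x) - taylor) / (x - x0)**k
assert lo <= remainder_ratio <= hi
print(lo, remainder_ratio, hi)
```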
Segment Everything Everywhere All at Once
Abstract:
Despite the growing demand for interactive AI systems, there have been few comprehensive studies on human-AI interaction in visual understanding, e.g. segmentation. Inspired by the development of prompt-based universal interfaces for LLMs, this paper presents SEEM, a promptable, interactive model for Segmenting Everything Everywhere all at once in an image. SEEM has four desiderata: i) Versatility: by introducing a versatile prompting engine for different types of prompts, including points, boxes, scribbles, masks, texts, and referred regions of another image; ii) Compositionality: by learning a joint visual-semantic space for visual and textual prompts to compose queries on the fly for inference, as shown in Fig. 1; iii) Interactivity: by incorporating learnable memory prompts to retain dialog history information via mask-guided cross-attention; and iv) Semantic-awareness: by using a text encoder to encode text queries and mask labels for open-vocabulary segmentation.