Deep Learning Weekly: Issue 346
LLMs use a surprisingly simple mechanism to retrieve some stored knowledge, Binary and Scalar Embedding Quantization for Fast Retrieval, Explainability of the Hyperparameters, and many more!
This week in deep learning, we bring you Large language models use a surprisingly simple mechanism to retrieve some stored knowledge, Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval, Explainability of the features? No! Of the hyperparameters, and a paper on Evolutionary Optimization of Model Merging Recipes.
You may also enjoy Using AI to expand global access to reliable flood forecasts, 7 Methods to Secure LLM Apps from Prompt Injections and Jailbreaks, a paper on FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Large language models use a surprisingly simple mechanism to retrieve some stored knowledge
Researchers from MIT and others found that complex large language models use a simple mechanism to retrieve stored knowledge when they respond to a user prompt.
0G Labs raises $35M for modular blockchain storage for decentralized AI
0G Labs, a Web3 and blockchain firm, announced that they raised $35 million to build a modular distributed ledger blockchain network aimed at large-scale storage for decentralized AI systems.
Using AI to expand global access to reliable flood forecasts
Google demonstrates how machine learning technologies can significantly improve global-scale flood forecasting relative to the current state-of-the-art for countries where flood-related data is scarce.
‘Totally Surreal’: OpenAI shares first short films created with new AI tool Sora
OpenAI has unveiled the first short films created using its new video AI tool Sora, with filmmakers describing the creations as “totally surreal”.
Activeloop raises $11M to grow its specialized tensor database for AI training and inference
Activeloop, creator of a database platform designed for AI workloads, said that it has closed on an $11 million early-stage funding round.
MLOps & LLMOps
Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
An article that discusses and demonstrates how embeddings can be quantized in real-world retrieval scenarios.
7 Methods to Secure LLM Apps from Prompt Injections and Jailbreaks
An article that provides practical strategies to protect language model applications against prompt injections and jailbreaks.
Supporting Diverse ML Systems at Netflix
An article discussing Netflix’s Machine Learning Platform and Metaflow, highlighting their ecosystem of tools, integrations, and ML projects.
How To Set Up a SQL Router Query Engine for Effective Text-To-SQL
A tutorial that covers how to build a query engine, record trace data, and more.
CI/CD for Machine Learning in 2024: Best Practices to Build, Test, and Deploy
An article that explores the best practices for CI/CD in ML projects.
Learning
Explainability of the features? No! Of the hyperparameters.
An article that provides a tangible method for explaining how hyperparameters influence model performance.
An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin
No more isolated scripts or Notebooks! Learn production ML by building and deploying an end-to-end production-grade LLM system.
Data Quality Error Detection powered by LLMs
An article about cleaning and identifying errors from tabular datasets using LLMs.
Evaluating Large Language Model systems: Metrics, challenges, and best practices
An article discussing the metrics, challenges, and best practices for evaluating LLMs.
What I learned from looking at 900 most popular open source AI tools
Chip Huyen shares her analysis from looking at 900 open source AI tools.
Libraries & Code
A multi-agent framework designed to facilitate generalist video generation tasks, leveraging a collaborative approach with multiple visual agents.
An easy-to-use Python framework to generate adversarial jailbreak prompts.
An open-source visual programming environment for battle-testing prompts to LLMs.
Papers & Publications
Evolutionary Optimization of Model Merging Recipes
Abstract:
We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Abstract:
The remarkable efficacy of text-to-image diffusion models has motivated extensive exploration of their potential application in video domains. Zero-shot methods seek to extend image diffusion models to videos without necessitating model training. Recent methods mainly focus on incorporating inter-frame correspondence into attention mechanisms. However, the soft constraint imposed on determining where to attend to valid features can sometimes be insufficient, resulting in temporal inconsistency. In this paper, we introduce FRESCO, intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint. This enhancement ensures a more consistent transformation of semantically similar content across frames. Beyond mere attention guidance, our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video, significantly improving the visual coherence of the resulting translated videos. Extensive experiments demonstrate the effectiveness of our proposed framework in producing high-quality, coherent videos, marking a notable improvement over existing zero-shot methods.
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Abstract:
We present T-Rex2, a highly practical model for open-set object detection. Previous open-set object detection methods relying on text prompts effectively encapsulate the abstract concept of common objects, but struggle with rare or complex object representation due to data scarcity and descriptive limitations. Conversely, visual prompts excel in depicting novel objects through concrete visual examples, but fall short in conveying the abstract concept of objects as effectively as text prompts. Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning. T-Rex2 accepts inputs in diverse formats, including text prompts, visual prompts, and the combination of both, so that it can handle different scenarios by switching between the two prompt modalities. Comprehensive experiments demonstrate that T-Rex2 exhibits remarkable zero-shot object detection capabilities across a wide spectrum of scenarios. We show that text prompts and visual prompts can benefit from each other within the synergy, which is essential to cover massive and complicated real-world scenarios and pave the way towards generic object detection.