Deep Learning Weekly: Issue 357
Microsoft's Aurora, Applying LLMs to Recommendation Experiences, a paper on Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
This week in deep learning, we bring you Microsoft's Aurora, Applying LLMs to Recommendation Experiences, and a paper on Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models.
You may also enjoy Elon Musk's xAI plans multibillion-dollar supercomputer in Memphis, Monitoring LLM Security with Langfuse, a paper on gzip Predicts Data-dependent Scaling Laws, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing Aurora: The first large-scale foundation model of the atmosphere
Microsoft introduces Aurora, a cutting-edge AI foundation model that can extract valuable insights from vast amounts of atmospheric data.
Elon Musk's xAI plans multibillion-dollar supercomputer in Memphis
Elon Musk’s xAI is planning to build a multibillion-dollar supercomputer in Memphis as part of his effort to step up competition with rivals such as OpenAI and Google.
An introduction to the pretrained and instruction-tuned Qwen2 models of various sizes that have improved language support, extended context lengths, and more.
Salesforce to open new AI center in London as part of $4 billion UK investment
Salesforce is launching an AI center in London, demonstrating confidence in the UK as a prominent global technology hub.
MLOps & LLMOps
Applying LLMs to Recommendation Experiences
Eugene Yan discusses the challenges of building and deploying LLM-powered recommendation experiences at consumer scale.
Monitoring LLM Security with Langfuse
An overview of how you can use security tools in conjunction with Langfuse to monitor and protect against common security risks.
Advanced RAG: Corrective Retrieval Augmented Generation (CRAG) with LangGraph
A blog post that highlights Corrective Retrieval-Augmented Generation (CRAG), a method that aims to self-correct retriever results and enhance document utilization for generation.
How to Build An End-to-End Machine Learning Pipeline in 2024
A comprehensive article on building an end-to-end ML pipeline and streamlining your ML workflows in 2024, from data ingestion to model deployment and performance monitoring.
Learning
Deep Dive into Anthropic’s Sparse Autoencoders by Hand
A visual deep dive into Anthropic’s Sparse Autoencoders for LLM interpretability.
AI in software engineering at Google: Progress and the path ahead
Google presents its newest AI-powered improvements within the context of the continuing transformation of Google’s internal software development tools.
Harnessing analytics and AI to shape the future of mobility retail
McKinsey’s article discussing how AI and advanced analytics can shape the future of mobility retail, especially as electric vehicles become more prominent.
Libraries & Code
A self-organizing file system with Llama 3.
A data platform for serving AI/ML applications.
Papers & Publications
gzip Predicts Data-dependent Scaling Laws
Abstract:
Past work has established scaling laws that predict the performance of a neural language model (LM) as a function of its parameter count and the number of tokens it's trained on, enabling optimal allocation of a fixed compute budget. Are these scaling laws agnostic to training data as some prior work suggests? We generate training datasets of varying complexities by modulating the syntactic properties of a PCFG, finding that 1) scaling laws are sensitive to differences in data complexity and that 2) gzip, a compression algorithm, is an effective predictor of how data complexity impacts scaling properties. We propose a new data-dependent scaling law for LMs that accounts for the training data's gzip-compressibility; its compute-optimal frontier increases in dataset size preference (over parameter count preference) as training data becomes harder to compress.
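The abstract's key measurement, how compressible a corpus is under gzip, can be reproduced with the standard library alone. A minimal sketch, where the two corpora are invented stand-ins for the paper's PCFG-generated datasets of varying syntactic complexity:

```python
import gzip
import random
import string

def gzip_compressibility(text: str) -> float:
    """Ratio of gzip-compressed size to raw size (lower = easier to compress)."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)

# Highly repetitive text stands in for "simple" data;
# seeded random text stands in for "complex", hard-to-compress data.
repetitive = "the cat sat on the mat. " * 200
random.seed(0)
varied = "".join(random.choice(string.printable) for _ in range(4800))

print(f"repetitive: {gzip_compressibility(repetitive):.3f}")  # small ratio
print(f"varied:     {gzip_compressibility(varied):.3f}")      # larger ratio
```

Under the paper's proposed law, corpora with a higher ratio (harder to compress) shift the compute-optimal frontier toward more data relative to parameters.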
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Abstract:
We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing the accuracy, efficiency, and robustness of large language models (LLMs). Specifically, we propose a meta-buffer to store a series of informative high-level thoughts, namely thought-templates, distilled from the problem-solving processes across various tasks. Then, for each problem, we retrieve a relevant thought-template and adaptively instantiate it with specific reasoning structures to conduct efficient reasoning. To guarantee scalability and stability, we further propose a buffer-manager to dynamically update the meta-buffer, thus enhancing its capacity as more tasks are solved. We conduct extensive experiments on 10 challenging reasoning-intensive tasks, and achieve significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes, and 51% on Checkmate-in-One. Further analysis demonstrates the superior generalization ability and model robustness of our BoT, while requiring only 12% of the cost of multi-query prompting methods (e.g., tree/graph of thoughts) on average. Notably, we find that our Llama3-8B+BoT has the potential to surpass the Llama3-70B model.
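The retrieve-and-instantiate loop at the heart of the meta-buffer can be caricatured in a few lines: store high-level thought-templates keyed by task type, pick the closest match for a new problem, and fill it in. Everything below (the template contents, the `difflib`-based retrieval) is an illustrative toy, not the paper's implementation:

```python
from difflib import SequenceMatcher

# Toy meta-buffer mapping a task description to a high-level thought-template.
# Keys and templates are invented examples, not the paper's distilled thoughts.
meta_buffer = {
    "arithmetic puzzle": "List the numbers, try operator combinations, check the target.",
    "chess problem": "Enumerate forcing moves, verify each leads to checkmate.",
}

def retrieve_template(problem: str) -> str:
    """Return the stored template whose key best matches the problem description."""
    key = max(meta_buffer, key=lambda k: SequenceMatcher(None, k, problem).ratio())
    return meta_buffer[key]

def instantiate(problem: str) -> str:
    """Adaptively instantiate the retrieved template for this specific problem."""
    return f"Problem: {problem}\nPlan: {retrieve_template(problem)}"

print(instantiate("arithmetic puzzle: reach 24 from 4, 4, 6, 8"))
```

The paper's buffer-manager then updates the meta-buffer with new templates distilled from solved problems, which this sketch omits.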
Matching Anything by Segmenting Anything
Abstract:
The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings. We propose MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. Leveraging the rich object segmentation from the Segment Anything Model (SAM), MASA learns instance-level correspondence through exhaustive data transformations. We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection. We further design a universal MASA adapter which can work in tandem with foundational segmentation or detection models and enable them to track any detected objects. Those combinations present strong zero-shot tracking ability in complex domains. Extensive tests on multiple challenging MOT and MOTS benchmarks indicate that the proposed method, using only unlabeled static images, achieves even better performance than state-of-the-art methods trained with fully annotated in-domain video sequences, in zero-shot association.