Deep Learning Weekly: Issue 352
OpenAI partners with Stack Overflow, Graph-based Metadata Filtering to Improve Vector Search in RAG Applications, The New Kolmogorov-Arnold Network (KAN), and many more!
This week in deep learning, we bring you OpenAI partners with Stack Overflow to make models better at coding, Graph-based Metadata Filtering to Improve Vector Search in RAG Applications, A Simplified Explanation Of The New Kolmogorov-Arnold Network (KAN) from MIT, and a paper on Better & Faster Large Language Models via Multi-token Prediction.
You may also enjoy Microsoft reportedly developing MAI-1 AI model with 500B parameters, Streamlining knowledge work with LlamaIndex, Fireworks and MongoDB, Transformers Represent Belief State Geometry in their Residual Stream, a paper on OpenELM: An Efficient Language Model Family with Open Training and Inference Framework, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing the Claude Team plan and iOS app
Anthropic announced two updates for Claude: a new Team plan and an iOS app.
OpenAI partners with Stack Overflow to make models better at coding
OpenAI has announced that it is teaming up with Stack Overflow to improve its models’ ability to handle programming-related tasks.
Microsoft reportedly developing MAI-1 AI model with 500B parameters
Microsoft is developing a large language model with about 500 billion parameters.
Espresso AI emerges from stealth with $11M to tackle the cloud cost crisis
Espresso AI has emerged from stealth with over $11 million in seed funding to tackle one of the biggest challenges in enterprise computing today: reining in runaway cloud costs.
EU AI Act Regulation Compliance with Comet
On March 13, 2024, the European Parliament passed the EU AI Act to establish a common regulatory and legal framework for AI.
MLOps & LLMOps
Graph-based Metadata Filtering to Improve Vector Search in RAG Applications
A guide for optimizing vector retrieval with advanced graph-based metadata techniques using LangChain and Neo4j.
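As a rough illustration of the idea (not code from the guide), the sketch below post-filters vector search hits with a graph hop in Cypher using the official Neo4j Python driver; the index name, node labels, and properties are assumptions made for the example, and the guide itself goes further into pre-filtering strategies with LangChain.

```python
# Minimal sketch: combine a vector similarity query with a graph-based metadata filter
# in Neo4j. The index name ("chunk_embeddings"), labels, and properties are illustrative
# assumptions, not taken from the linked guide.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
CALL db.index.vector.queryNodes('chunk_embeddings', $k, $query_embedding)
YIELD node, score
MATCH (node)-[:FROM_DOCUMENT]->(d:Document)
WHERE d.year >= $min_year          // graph-based metadata filter
RETURN node.text AS text, score
ORDER BY score DESC
"""

def filtered_search(query_embedding, k=10, min_year=2023):
    """Return top-k chunks whose parent document satisfies the metadata filter."""
    records, _, _ = driver.execute_query(
        CYPHER, k=k, query_embedding=query_embedding, min_year=min_year
    )
    return [(r["text"], r["score"]) for r in records]
```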
Streamlining knowledge work with LlamaIndex, Fireworks and MongoDB
An article that discusses how Team CLAB built LlamaWorksDB, an AI-powered doc wizard that consolidates information from multiple sponsors into a single chatbot interface.
Deploy open LLMs with vLLM on Hugging Face Inference Endpoints
A tutorial that shows you how to deploy open LLMs with vLLM on Hugging Face Inference Endpoints.
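For a flavor of what generation with vLLM looks like (the endpoint deployment itself is configured on the Hugging Face side, as the tutorial walks through), here is a minimal offline sketch using vLLM's Python API; the model id is an illustrative assumption, not necessarily the one used in the tutorial.

```python
# Minimal local sketch of generating with an open LLM via vLLM's offline API.
# The model id below is an illustrative choice, not one prescribed by the tutorial.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # any HF-hosted open model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what vLLM's PagedAttention does."], params)
for out in outputs:
    print(out.outputs[0].text)
```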
Learning
A Simplified Explanation Of The New Kolmogorov-Arnold Network (KAN) from MIT
An article that briefly describes the differences between the new Kolmogorov-Arnold Network and the Multi-layer Perceptron.
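As a toy illustration of the difference (not the article's or the paper's implementation), the sketch below replaces an MLP's fixed activation-after-linear-map with a small learnable univariate function on every input-output edge; the real KAN uses B-spline bases plus a base activation, so treat this purely as a conceptual sketch.

```python
# Conceptual sketch contrasting a KAN-style layer with an MLP layer: instead of a fixed
# nonlinearity applied after a linear map, each input-output edge carries its own
# learnable univariate function (here a coarse polynomial basis, purely for illustration).
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, degree: int = 3):
        super().__init__()
        # One set of polynomial coefficients per (input, output) edge.
        self.coeffs = nn.Parameter(torch.randn(out_dim, in_dim, degree + 1) * 0.1)
        self.degree = degree

    def forward(self, x):
        # x: (batch, in_dim) -> powers: (batch, in_dim, degree + 1)
        powers = torch.stack([x ** k for k in range(self.degree + 1)], dim=-1)
        # Apply each edge's univariate polynomial, then sum over inputs (no extra weights).
        return torch.einsum("bip,oip->bo", powers, self.coeffs)

layer = ToyKANLayer(4, 2)
print(layer(torch.randn(8, 4)).shape)  # torch.Size([8, 2])
```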
Deep Dive into LlaMA 3 by Hand
An article that explores the nuances of the architecture, pre-training data, and instruction fine-tuning behind Llama 3.
Transformers Represent Belief State Geometry in their Residual Stream
A blog post that discusses how LLMs synchronize with an internal model of the world as they move through the context window, and how this synchronization can be formalized using Computational Mechanics.
A Hitchhiker’s Guide to Speculative Decoding
A guide to speculative decoding, and a demonstration on how it can coexist with other optimization techniques.
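One concrete way to try the idea, assuming you are using Hugging Face transformers rather than the guide's exact setup, is assisted generation: a small draft model proposes several tokens at a time and the larger target model verifies them in a single forward pass. The model ids below are illustrative assumptions.

```python
# Minimal sketch of speculative (assisted) decoding with Hugging Face transformers.
# Model ids are illustrative; draft and target must share a tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "facebook/opt-1.3b"   # larger target model (assumed)
draft_id = "facebook/opt-125m"    # smaller draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```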
Libraries & Code
LangSmith helps your team debug, evaluate, and monitor your language models and intelligent agents.
Open-source libraries and APIs for building custom data preprocessing pipelines for labeling, training, and production machine learning.
A CLI that writes your git commit messages for you with AI.
Papers & Publications
Better & Faster Large Language Models via Multi-token Prediction
Abstract:
Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared model trunk. Considering multi-token prediction as an auxiliary training task, we measure improved downstream capabilities with no overhead in training time for both code and natural language models. The method is increasingly useful for larger model sizes, and keeps its appeal when training for multiple epochs. Gains are especially pronounced on generative benchmarks like coding, where our models consistently outperform strong baselines by several percentage points. Our 13B parameter model solves 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models. Experiments on small algorithmic tasks demonstrate that multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities. As an additional benefit, models trained with 4-token prediction are up to 3 times faster at inference, even with large batch sizes.
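A minimal sketch of the core idea, under the assumption of a generic PyTorch trunk rather than the paper's exact architecture or loss weighting: n independent linear heads sit on a shared trunk, and head k predicts the token k positions ahead.

```python
# Minimal sketch of multi-token prediction: n independent linear heads share one trunk
# and each predicts the token 1, 2, ..., n positions ahead. The trunk is a stand-in;
# the paper's exact architecture and loss weighting differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.n_future = n_future
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden, targets):
        # hidden:  (batch, seq_len, d_model) from the shared trunk
        # targets: (batch, seq_len) token ids
        loss = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(hidden[:, :-k, :])          # predict token k steps ahead
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets[:, k:].reshape(-1),
            )
        return loss / self.n_future
```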
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Abstract:
The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens.
Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors.
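To make the layer-wise scaling idea concrete, here is an illustrative sketch in which attention heads and FFN width grow linearly with depth rather than being uniform across layers; the ranges are invented for the example and are not OpenELM's published configuration.

```python
# Illustrative sketch of layer-wise scaling: attention heads and FFN width grow linearly
# with depth instead of being uniform. The ranges below are made up for illustration and
# are not OpenELM's published configuration.
def layerwise_config(num_layers=16, heads_range=(8, 16), ffn_mult_range=(2.0, 4.0)):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)   # 0.0 at the first layer, 1.0 at the last
        heads = round(heads_range[0] + t * (heads_range[1] - heads_range[0]))
        ffn_mult = ffn_mult_range[0] + t * (ffn_mult_range[1] - ffn_mult_range[0])
        configs.append({"layer": i, "num_heads": heads, "ffn_multiplier": round(ffn_mult, 2)})
    return configs

for cfg in layerwise_config():
    print(cfg)
```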
SATO: Stable Text-to-Motion Framework
Abstract:
Is the text-to-motion model robust? Recent advancements in text-to-motion models primarily stem from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research has uncovered a significant issue with the text-to-motion model: its predictions often exhibit inconsistent outputs, resulting in vastly different or even incorrect poses when presented with semantically similar or identical text inputs. In this paper, we undertake an analysis to elucidate the underlying causes of this instability, establishing a clear link between the unpredictability of model outputs and the erratic attention patterns of the text encoder module. Consequently, we introduce a formal framework aimed at addressing this issue, which we term the Stable Text-to-Motion Framework (SATO). SATO consists of three modules, dedicated to stable attention, stable prediction, and balancing accuracy against robustness. We present a methodology for constructing a SATO that satisfies the stability of attention and prediction. To verify the stability of the model, we introduce a new textual synonym perturbation dataset based on HumanML3D and KIT-ML. Results show that SATO is significantly more stable against synonyms and other slight perturbations while maintaining high accuracy.
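As a purely hypothetical sketch of the stability check the paper motivates, the snippet below compares a text-to-motion model's outputs for a prompt and a synonym-perturbed paraphrase; generate_motion is a placeholder, not the SATO model or its evaluation code.

```python
# Hypothetical sketch of a synonym-perturbation stability check. `generate_motion` is a
# placeholder standing in for any model that maps text to a (frames, joints, 3) pose array.
import numpy as np

def generate_motion(text: str) -> np.ndarray:
    # Placeholder: a real text-to-motion model would be called here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal((60, 22, 3))

def stability_gap(prompt: str, paraphrase: str) -> float:
    """Mean per-joint distance between motions from semantically equivalent prompts."""
    a, b = generate_motion(prompt), generate_motion(paraphrase)
    return float(np.linalg.norm(a - b, axis=-1).mean())

print(stability_gap("a person walks forward", "a person strolls ahead"))
```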