Deep Learning Weekly: Issue 354
Generative AI to Answer Complex Questions in Physics, Using LlamaIndex and llamafile to Build a Local Research Assistant, Recreating PyTorch from Scratch, and more!
This week in deep learning, we bring you Scientists use generative AI to answer complex questions in physics, Using LlamaIndex and llamafile to build a local, private research assistant, Recreating PyTorch from Scratch, and a paper on Chameleon: Mixed-Modal Early-Fusion Foundation Models.
You may also enjoy Anthropic taps Instagram co-founder as product chief, Building an Observable arXiv RAG Chatbot with LangChain, Chainlit, and Literal AI, a paper on Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Scientists use generative AI to answer complex questions in physics
Researchers used generative AI to develop a physics-informed technique to classify phase transitions in materials or physical systems that is much more efficient than existing machine-learning approaches.
Anthropic taps Instagram co-founder as product chief
AI startup Anthropic has appointed Instagram co-founder Mike Krieger as its chief product officer.
Gemini 1.5 Pro updates, 1.5 Flash debut and 2 new Gemma models
Google announced updates to Gemini 1.5 Pro, introduced 1.5 Flash, rolled out new developer features and added two new Gemma models.
New models added to the Phi-3 family, available on Microsoft Azure
Microsoft introduced Phi-3-vision, a multimodal model, and added Phi-3-small and Phi-3-medium models on Microsoft Azure for generative AI applications.
MLOps & LLMOps
How To Organize Continuous Delivery of ML/AI Systems: a 10-Stage Maturity Model
An article that outlines ten stages of operational maturity for deploying ML/AI systems to production.
Building an Observable arXiv RAG Chatbot with LangChain, Chainlit, and Literal AI
A tutorial on building a semantic paper engine using RAG with LangChain, Chainlit copilot apps, and Literal AI observability.
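To give a flavor of the pattern, here is a hedged sketch of a Chainlit chat handler backed by a LangChain retrieval chain; the index path, model name, and retriever settings are illustrative assumptions rather than the tutorial's exact setup, and the Literal AI instrumentation is omitted.

```python
# Hedged sketch: a Chainlit chat handler backed by a LangChain retrieval chain.
# Assumes OPENAI_API_KEY is set; the index path, model name, and retriever
# settings are illustrative, not the tutorial's exact configuration.
import chainlit as cl
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


@cl.on_chat_start
async def start():
    # Load a previously built index of arXiv paper chunks (hypothetical path).
    store = FAISS.load_local(
        "arxiv_index", OpenAIEmbeddings(), allow_dangerous_deserialization=True
    )
    chain = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model="gpt-3.5-turbo"),
        retriever=store.as_retriever(search_kwargs={"k": 4}),
    )
    cl.user_session.set("chain", chain)


@cl.on_message
async def on_message(message: cl.Message):
    chain = cl.user_session.get("chain")
    result = await chain.ainvoke({"query": message.content})
    await cl.Message(content=result["result"]).send()
```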
Monitor and trace your Haystack pipelines with Langfuse
A post that introduces Langfuse and demonstrates how to trace an end-to-end request to a Haystack pipeline.
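A rough sketch of what the integration looks like, assuming the langfuse-haystack package, Langfuse API keys, and an OpenAI key are available; the component names and question below are illustrative.

```python
# Hedged sketch: tracing a Haystack 2.x pipeline with Langfuse. Assumes the
# langfuse-haystack integration is installed and LANGFUSE_SECRET_KEY,
# LANGFUSE_PUBLIC_KEY, and OPENAI_API_KEY are set in the environment.
import os

os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"  # also trace inputs/outputs

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.connectors.langfuse import LangfuseConnector

pipeline = Pipeline()
pipeline.add_component("tracer", LangfuseConnector("Demo pipeline"))
pipeline.add_component("prompt", PromptBuilder(template="Answer briefly: {{ question }}"))
pipeline.add_component("llm", OpenAIGenerator())
pipeline.connect("prompt", "llm")

result = pipeline.run({"prompt": {"question": "What does Langfuse do?"}})
print(result["llm"]["replies"][0])
print(result["tracer"]["trace_url"])  # link to the full trace in the Langfuse UI
```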
Using LlamaIndex and llamafile to build a local, private research assistant
A blog post that discusses how to set up a llamafile to run a local Large Language Model on your computer, and how to use it as the LLM and embedding backend in LlamaIndex for a local, private, RAG-based research assistant.
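A minimal sketch of that setup, assuming a llamafile server is already running locally on port 8080 and the LlamaIndex llamafile integration packages (llama-index-llms-llamafile, llama-index-embeddings-llamafile) are installed; the folder path and query are illustrative.

```python
# Hedged sketch of a local RAG setup with LlamaIndex and a llamafile server.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.llamafile import LlamafileEmbedding
from llama_index.llms.llamafile import Llamafile

# Point both the LLM and the embedding model at the local llamafile server.
Settings.embed_model = LlamafileEmbedding(base_url="http://localhost:8080")
Settings.llm = Llamafile(base_url="http://localhost:8080", temperature=0, seed=0)

# Index a folder of local documents (illustrative path).
documents = SimpleDirectoryReader("./papers").load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask questions against the local index; nothing leaves your machine.
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the main contribution of these papers."))
```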
Learning
Recreating PyTorch from Scratch (with GPU Support and Automatic Differentiation)
An article on building your own deep learning framework based on C/C++, CUDA, and Python, with GPU support and automatic differentiation.
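The heart of such a framework is reverse-mode automatic differentiation. The pure-Python sketch below shows the core idea on scalars (the article itself goes much further, with C/C++ and CUDA tensor kernels):

```python
import math


class Value:
    """A scalar node in a dynamically built computation graph."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward():  # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward():  # product rule
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward():  # d(tanh x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then propagate gradients in reverse.
        order, visited = [], set()

        def build(node):
            if node not in visited:
                visited.add(node)
                for parent in node._parents:
                    build(parent)
                order.append(node)

        build(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# d/dx tanh(2x + 3) at x = 0.5 is (1 - tanh(4)^2) * 2, roughly 0.0027.
x = Value(0.5)
loss = (x * 2 + 3).tanh()
loss.backward()
print(x.grad)
```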
Large Scale Transformer model training with Tensor Parallel (TP)
A tutorial that demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.
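As a hedged illustration of the Tensor Parallel half (the tutorial also composes TP with FSDP across hosts), the sketch below shards a toy feed-forward block across a one-dimensional device mesh; it assumes a recent PyTorch (2.3+) and a multi-GPU launch via torchrun, and the module and layer names are illustrative.

```python
import os

import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class FeedForward(nn.Module):
    def __init__(self, dim=1024, hidden=4096):
        super().__init__()
        self.up = nn.Linear(dim, hidden)
        self.act = nn.ReLU()
        self.down = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.down(self.act(self.up(x)))


torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
world_size = int(os.environ["WORLD_SIZE"])
mesh = init_device_mesh("cuda", (world_size,))  # 1-D mesh: every rank does TP

model = FeedForward().cuda()
# Shard the first linear column-wise and the second row-wise so the intermediate
# activation stays sharded and only one all-reduce is needed per forward pass.
model = parallelize_module(
    model, mesh, {"up": ColwiseParallel(), "down": RowwiseParallel()}
)

x = torch.randn(8, 1024, device="cuda")
out = model(x)  # each rank holds a shard of the weights; the output is replicated
```

Run it with something like `torchrun --nproc_per_node=4 tp_demo.py`.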
AI Apps in a Flash with Gradio's Reload Mode
A post on how to build a functional AI application quickly with Gradio's reload mode.
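The gist, in a hedged sketch: write an ordinary Gradio script and launch it in reload mode so edits appear immediately. The echo function below is a placeholder for a real model call.

```python
# app.py -- a minimal stand-in app for experimenting with reload mode.
import gradio as gr


def reply(message: str) -> str:
    # Swap in your model or API call here.
    return f"You said: {message}"


with gr.Blocks() as demo:
    gr.Markdown("## Quick prototype")
    box = gr.Textbox(label="Prompt")
    out = gr.Textbox(label="Response")
    box.submit(reply, inputs=box, outputs=out)

if __name__ == "__main__":
    demo.launch()
```

Launching it with `gradio app.py` (instead of `python app.py`) watches the file and reloads the demo on every save.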
Libraries & Code
A multi-agent platform designed to help developers build multi-agent applications with large-scale models.
An open-source toolkit for LLM watermarking.
Fast and customizable framework for automatic ML model creation (AutoML).
Papers & Publications
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Abstract:
We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed-modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in unified modeling of full multimodal documents.
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Abstract:
We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as constraint satisfaction problems and use this framework to investigate how the LLM interacts internally with factual constraints. We find a strong positive relationship between the LLM's attention to constraint tokens and the factual accuracy of generations. We curate a suite of 10 datasets containing over 40,000 prompts to study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method that probes attention patterns, which can predict factual errors and fine-grained constraint satisfaction and allows early error identification. The approach and findings take another step towards using the mechanistic understanding of LLMs to enhance their reliability.
Contrastive Preference Learning: Learning from Human Feedback without RL
Abstract:
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to reward, but recent work suggests that they instead follow the regret under the user's optimal policy. Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase. Because of these optimization challenges, contemporary RLHF methods restrict themselves to contextual bandit settings (e.g., as in large language models) or limit observation dimensionality (e.g., state-based robotics). We overcome these limitations by introducing a new family of algorithms for optimizing behavior from human feedback using the regret-based model of human preferences. Using the principle of maximum entropy, we derive Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions, circumventing the need for RL. CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs. This enables CPL to elegantly scale to high-dimensional and sequential RLHF problems while being simpler than prior methods.
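For intuition, here is a hedged sketch of a contrastive preference objective in the spirit described by the abstract: score each segment by its discounted sum of policy log-probabilities (a stand-in for advantage under the maximum-entropy framing) and train the policy so preferred segments score higher. Shapes, alpha, and gamma are illustrative, and the paper's exact CPL(lambda) variant differs in details.

```python
import torch
import torch.nn.functional as F


def segment_score(log_probs: torch.Tensor, alpha: float = 0.1, gamma: float = 1.0):
    """log_probs: (batch, horizon) log pi(a_t | s_t) along one segment."""
    discounts = gamma ** torch.arange(
        log_probs.shape[1], dtype=log_probs.dtype, device=log_probs.device
    )
    return alpha * (log_probs * discounts).sum(dim=1)  # (batch,)


def cpl_loss(logp_preferred: torch.Tensor, logp_rejected: torch.Tensor) -> torch.Tensor:
    """Contrastive loss over a batch of (preferred, rejected) segment pairs."""
    s_pos = segment_score(logp_preferred)
    s_neg = segment_score(logp_rejected)
    # -log sigmoid(s_pos - s_neg): push the preferred segment's score above the other's.
    return -F.logsigmoid(s_pos - s_neg).mean()
```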