Deep Learning Weekly: Issue 337
DeepMind’s Olympiad-level AI system for geometry, Full stack transformer language models with reinforcement learning, Sampling for Text Generation, a paper on Exphormer: Sparse Transformers for Graphs
This week in deep learning, we bring you AlphaGeometry: An Olympiad-level AI system for geometry, Full stack transformer language models with reinforcement learning, Sampling for Text Generation, and a paper on Exphormer: Sparse Transformers for Graphs.
You may also enjoy Theory Suggests Chatbots Can Understand Text, Query Augmentation for Next-Level Search using LlamaIndex, Running Local LLMs and VLMs on the Raspberry Pi, a paper on InstantID: Zero-shot Identity-Preserving Generation in Seconds, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
AlphaGeometry: An Olympiad-level AI system for geometry
DeepMind introduces AlphaGeometry, an AI system that solves complex geometry problems at a level approaching a human Olympiad gold medalist, a breakthrough in AI performance.
AI voice startup ElevenLabs lands $80M round, launches marketplace of cloned voices
ElevenLabs, an AI voice startup, has raised $80 million in a series B round of funding, growing its valuation ten-fold to $1.1 billion.
OpenAI CEO Sam Altman is still chasing billions to build AI chips
Bloomberg reports that Sam Altman is in talks to raise money for a ‘global’ network of chip fabrication plants to build hardware for AI.
Theory Suggests Chatbots Can Understand Text
A theory developed by Sanjeev Arora of Princeton University and Anirudh Goyal, a research scientist at Google DeepMind, suggests that the largest of today’s LLMs are not stochastic parrots.
Rethinking AI's impact: MIT CSAIL study reveals economic limits to job automation
A new study from MIT CSAIL, MIT Sloan, The Productivity Institute, and IBM’s Institute for Business Value critically examines the economic practicality of using AI to automate tasks in the workplace.
MLOps & LLMOps
Advanced RAG: Query Augmentation for Next-Level Search using LlamaIndex🦙
A technical article on augmenting user queries to improve retrieval quality in RAG pipelines built with LlamaIndex.
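As a generic illustration of the idea (not the article’s LlamaIndex code), the sketch below asks an LLM to rewrite a user query into several alternative phrasings, each of which can then be run against the index and the results merged; the model name and prompt are placeholders.

```python
# Generic sketch of query augmentation: generate alternative phrasings of a query
# with an LLM, then run each against your retriever and merge the results.
# Requires OPENAI_API_KEY in the environment; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def augment_query(query: str, n: int = 3) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Rewrite the user's search query "
             f"into {n} alternative phrasings, one per line."},
            {"role": "user", "content": query},
        ],
    )
    rewrites = response.choices[0].message.content.strip().splitlines()
    return [query] + [r.strip() for r in rewrites if r.strip()]

for q in augment_query("How does sparse attention scale to large graphs?"):
    print(q)  # each variant would be sent to the retriever, results deduplicated
```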
Host the Whisper Model on Amazon SageMaker: exploring inference options
A technical article that explores and compares two methods for hosting Whisper on Amazon SageMaker.
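A minimal sketch of one such option, deploying Whisper from the Hugging Face Hub to a SageMaker real-time endpoint; the DLC version strings and instance type are placeholders that must match containers available in your account, and the invocation/serialization step is omitted.

```python
# Minimal sketch: deploy Whisper to a SageMaker real-time endpoint via the
# Hugging Face Deep Learning Container. Version strings and instance type are
# placeholders; request/response serialization for audio is not shown.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker

hub_config = {
    "HF_MODEL_ID": "openai/whisper-large-v2",
    "HF_TASK": "automatic-speech-recognition",
}

whisper_model = HuggingFaceModel(
    env=hub_config,
    role=role,
    transformers_version="4.26",  # placeholder: pick a supported DLC combination
    pytorch_version="1.13",
    py_version="py39",
)

predictor = whisper_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)
```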
Accelerating Generative AI with PyTorch IV: Seamless M4T, fast
A post that focuses on speeding up FAIR’s Seamless M4T-v2 model, achieving a 2.7x end-to-end inference speedup with no loss of accuracy by using CUDA Graphs and native PyTorch optimizations.
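As a rough illustration of the general technique, here is a minimal sketch of PyTorch’s torch.compile with mode="reduce-overhead", which captures CUDA Graphs on GPU builds; the toy module stands in for the Seamless M4T-v2 model and is not the post’s code.

```python
# Minimal sketch: reduce CPU launch overhead with torch.compile's
# "reduce-overhead" mode, which captures CUDA Graphs under the hood on GPU.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
).to(device).eval()

compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 512, device=device)
with torch.no_grad():
    for _ in range(3):   # warm-up iterations trigger compilation / graph capture
        _ = compiled(x)
    out = compiled(x)    # later calls replay the captured graph
print(out.shape)
```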
Learning
Sampling for Text Generation
Chip Huyen’s deep dive into how models generate responses, a process known as sampling (or decoding).
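To make the terminology concrete, here is a minimal NumPy sketch of the decoding strategies the post covers (temperature scaling, top-k, and top-p/nucleus sampling); the function and parameter names are ours, not Chip Huyen’s.

```python
# Minimal sketch of common decoding strategies over a single logits vector.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)

    if top_k is not None:
        # Keep only the k highest-scoring tokens.
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_p is not None:
        # Keep the smallest set of tokens whose cumulative probability >= top_p.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(rng.choice(len(probs), p=probs))

print(sample_next_token([2.0, 1.0, 0.2, -1.0], temperature=0.7, top_k=3, top_p=0.9))
```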
Logging YOLOPandas 🐼 with Comet-LLM
Learn how to log YOLOPandas prompts with comet-llm, track the number of tokens used and the estimated cost in USD ($), and log your metadata.
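A minimal sketch of what such logging can look like with comet-llm, assuming a COMET_API_KEY is configured; the project name and metadata keys for token counts and cost are illustrative, not a fixed schema.

```python
# Minimal sketch of logging a prompt/response pair with comet-llm.
# Assumes COMET_API_KEY is set in the environment (or passed to comet_llm.init).
import comet_llm

comet_llm.init(project="yolopandas-logging")  # project name is arbitrary here

comet_llm.log_prompt(
    prompt="Plot the average price per product category.",
    output="df.groupby('category')['price'].mean().plot(kind='bar')",
    metadata={
        "model": "gpt-3.5-turbo",
        "prompt_tokens": 42,            # illustrative values
        "completion_tokens": 17,
        "estimated_cost_usd": 0.0001,
    },
    duration=1.3,  # seconds
)
```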
Running Local LLMs and VLMs on the Raspberry Pi
An article on how to run models like Phi-2, Mistral, and LLaVA on the Raspberry Pi using Ollama.
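For context, Ollama exposes a local REST API once a model has been pulled; the snippet below is a minimal sketch of querying it from Python and is not taken from the article.

```python
# Minimal sketch: query a model served by Ollama via its local REST API.
# Assumes `ollama pull phi` has been run and the server listens on the default port.
import json
import urllib.request

payload = json.dumps({
    "model": "phi",                  # Phi-2; "mistral" or "llava" also work if pulled
    "prompt": "Explain what a Raspberry Pi is in one sentence.",
    "stream": False,                 # return a single JSON object instead of a stream
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    body = json.loads(response.read())

print(body["response"])
```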
Leverage KeyBERT, HDBSCAN and Zephyr-7B-Beta to Build a Knowledge Graph
A blog that explores integrating simple keyword extraction using KeyBERT and employing UMAP for dimensionality reduction, coupled with HDBSCAN for clustering.
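A minimal sketch of that pipeline, assuming keybert, sentence-transformers, umap-learn, and hdbscan are installed; the knowledge-graph construction with Zephyr-7B-Beta is omitted and the parameters are illustrative.

```python
# Minimal sketch of the pipeline: KeyBERT keywords, UMAP projection, HDBSCAN clusters.
import hdbscan
import umap
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer

docs = [
    "Transformers use self-attention to model token interactions.",
    "Graph neural networks pass messages along edges.",
    "Expander graphs give sparse yet well-connected attention patterns.",
]

# 1) Keyword extraction per document.
kw_model = KeyBERT()
keywords = [kw_model.extract_keywords(doc, top_n=3) for doc in docs]

# 2) Embed documents, then reduce dimensionality with UMAP.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(docs)
reduced = umap.UMAP(n_components=2, n_neighbors=2, random_state=42).fit_transform(embeddings)

# 3) Cluster the reduced embeddings with HDBSCAN (-1 marks noise points).
labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(reduced)

print(keywords)
print(labels)
```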
How to Fine-Tune LLMs in 2024 with Hugging Face
A blog that walks you through how to fine-tune open LLMs using Hugging Face TRL, Transformers & datasets in 2024.
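A minimal sketch of supervised fine-tuning with TRL’s SFTTrainer; the argument names follow early-2024 TRL releases and may differ in newer versions, and the model and dataset below are small placeholders rather than the blog’s setup.

```python
# Minimal sketch of supervised fine-tuning with TRL's SFTTrainer
# (argument names follow early-2024 TRL releases and may differ in newer versions).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # small model for illustration
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("imdb", split="train[:1%]")  # any dataset with a "text" column

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="sft-output",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```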
Make LLM Fine-tuning 2x faster with Unsloth
An article about making LLM fine-tuning faster using a lightweight library called Unsloth.
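A rough sketch of loading a 4-bit model with Unsloth and attaching LoRA adapters, based on its documented FastLanguageModel interface as we recall it; verify the exact arguments against the installed version.

```python
# Minimal sketch of loading a pre-quantized model with Unsloth and adding LoRA
# adapters. API names follow Unsloth's FastLanguageModel interface; verify them
# against the installed version before relying on this.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # pre-quantized checkpoint from Unsloth
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                       # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# The resulting model can then be passed to a trainer such as TRL's SFTTrainer.
```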
Libraries & Code
Full stack transformer language models with reinforcement learning.
An interactive streamlit tool to support the building of RAG applications by visualizing document chunks and the queries in the embedding space.
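To illustrate the underlying idea rather than the tool itself, here is a minimal Streamlit sketch that embeds a few chunks and a query, projects them to 2D with UMAP, and plots them; the model and parameter choices are arbitrary.

```python
# Minimal sketch of the idea: embed chunks and a query, project to 2D with UMAP,
# and plot them in a Streamlit app. Run with: streamlit run rag_viz.py
import pandas as pd
import streamlit as st
import umap
from sentence_transformers import SentenceTransformer

chunks = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Python is a popular programming language.",
]
query = st.text_input("Query", "Where is the Eiffel Tower?")

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(chunks + [query])

# Project chunk and query embeddings into 2D for plotting.
coords = umap.UMAP(n_components=2, n_neighbors=2, random_state=0).fit_transform(embeddings)

df = pd.DataFrame(coords, columns=["x", "y"])
df["kind"] = ["chunk"] * len(chunks) + ["query"]
df["text"] = chunks + [query]

st.scatter_chart(df, x="x", y="y", color="kind")
st.dataframe(df)
```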
Papers & Publications
Exphormer: Sparse Transformers for Graphs
Abstract:
Graph transformers have emerged as a promising architecture for a variety of graph learning and representation tasks. Despite their successes, though, it remains challenging to scale graph transformers to large graphs while maintaining accuracy competitive with message-passing networks. In this paper, we introduce Exphormer, a framework for building powerful and scalable graph transformers. Exphormer consists of a sparse attention mechanism based on two mechanisms: virtual global nodes and expander graphs, whose mathematical characteristics, such as spectral expansion, pseudorandomness, and sparsity, yield graph transformers with complexity only linear in the size of the graph, while allowing us to prove desirable theoretical properties of the resulting transformer models. We show that incorporating Exphormer into the recently-proposed GraphGPS framework produces models with competitive empirical results on a wide variety of graph datasets, including state-of-the-art results on three datasets. We also show that Exphormer can scale to datasets on larger graphs than shown in previous graph transformer architectures.
InstantID: Zero-shot Identity-Preserving Generation in Seconds
Abstract:
There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin.
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
Abstract:
Code generation problems differ from common natural language problems - they require matching the exact syntax of the target language, identifying happy paths and edge cases, paying attention to numerous small details in the problem spec, and addressing other code-specific issues and requirements. Hence, many of the optimizations and tricks that have been successful in natural language generation may not be effective for code tasks. In this work, we propose a new approach to code generation by LLMs, which we call AlphaCodium - a test-based, multi-stage, code-oriented iterative flow, that improves the performances of LLMs on code problems. We tested AlphaCodium on a challenging code generation dataset called CodeContests, which includes competitive programming problems from platforms such as Codeforces. The proposed flow consistently and significantly improves results. On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Many of the principles and best practices acquired in this work, we believe, are broadly applicable to general code generation tasks.