Deep Learning Weekly: Issue 331
DeepMind's Gemini, A Guide on 12 Tuning Strategies for Production-Ready RAG Applications, Mixture of Experts Explained, a paper on Fine-tuning Language Models for Factuality, and many more!
This week in deep learning, we bring you DeepMind's Gemini, A Guide on 12 Tuning Strategies for Production-Ready RAG Applications, Mixture of Experts Explained, and a paper on Fine-tuning Language Models for Factuality.
You may also enjoy Microsoft's Phi-2, Deploy Mixtral 8x7B on Amazon SageMaker, a paper on Sparsity-Preserving Differentially Private Training of Large Embedding Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing Gemini: Google’s most capable AI model yet
Google DeepMind unveiled Gemini, its most capable multimodal model to date.
Phi-2: The surprising power of small language models
Microsoft released Phi-2, a 2.7 billion-parameter model with outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base models with fewer than 13 billion parameters.
Essential AI emerges from stealth with backing from Google, Nvidia and AMD
San Francisco-based Essential AI, a startup looking to reinvent how people work with the power of AI, emerged from stealth with $56.5 million in a Series A funding round.
MIT engineers develop a way to determine how the surfaces of materials behave
MIT researchers devised an ML-based method to investigate how materials behave at their surfaces. The approach could help in developing compounds or alloys for use as catalysts, semiconductors, or battery components.
Introducing Scale’s Automotive Foundation Model
Scale introduced the Automotive Foundation Model (AFM-1): a new model trained on diverse, large-scale street scene data that is capable of handling multiple vision tasks.
Announcing Purple Llama: Towards open trust and safety in the new world of generative AI
Meta announced the launch of Purple Llama — an umbrella project that, over time, will bring together tools and evaluations to help the community build responsibly with open generative AI models.
MLOps & LLMOps
A Guide on 12 Tuning Strategies for Production-Ready RAG Applications
A blog post on how to improve the performance of your Retrieval-Augmented Generation (RAG) pipeline with certain “hyperparameters” and tuning strategies.
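Among the tuning strategies the post covers are the chunking parameters used when splitting documents before indexing. The splitter below is a minimal illustrative sketch (the `chunk_text` helper is hypothetical, not from the post or any particular library), showing the two knobs most often tuned: chunk size and overlap.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into fixed-size character chunks with overlap.

    Larger chunks keep more context per retrieved passage; smaller chunks
    make retrieval more precise. Overlap reduces the chance that a relevant
    sentence is cut in half at a chunk boundary.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "word " * 200  # 1000 characters of dummy text
chunks = chunk_text(doc, chunk_size=200, overlap=50)
print(len(chunks), len(chunks[0]))
```

Because retrieval quality is sensitive to both knobs, they are worth sweeping like any other hyperparameter rather than left at a library default.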
Deploy Mixtral 8x7B on Amazon SageMaker
A blog post on how to deploy Mixtral 8x7B using Amazon SageMaker.
Minimize real-time inference latency by using Amazon SageMaker routing strategies
A post that discusses the SageMaker least outstanding requests (LOR) routing strategy and how it can minimize latency for certain types of real-time inference workloads.
Learning
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
A Colab notebook for generating visual anagrams and other multi-view optical illusions using diffusion models.
LangChain Document Loaders for Web Data
An article that dives into how different document loaders in LangChain impact a Retrieval Augmented Generation (RAG) system.
Mixture of Experts Explained
A comprehensive article on the building blocks of MoEs, the training process, and the tradeoffs to consider when serving them for inference.
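The core MoE building block is sparse routing: a gate scores every expert, only the top-k run, and their outputs are combined with renormalized gate weights. Here is a pure-Python miniature (toy scalar experts and hand-picked gate scores, not from the article; real MoE layers do this per token with learned matrices).

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the top-k experts by gate score."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Four toy "experts", each just a scalar function.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
y = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 0.5, -1.0], k=2)
print(y)
```

Only k of the experts execute per input, which is exactly the compute-vs-capacity tradeoff the article discusses for serving.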
Libraries & Code
A production-ready reinforcement learning AI agent library from the Applied Reinforcement Learning team at Meta.
LLMCompiler is a framework that enables efficient and effective orchestration of parallel function calling with LLMs.
AIoT-MLSys-Lab/Efficient-LLMs-Survey
Efficient Large Language Models: A Survey
Papers & Publications
Fine-tuning Language Models for Factuality
Abstract:
The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is a time-consuming process, making human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply a large model's confidence scores. Second, the direct preference optimization algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, generated either through existing retrieval systems or our novel retrieval-free approach, significantly improves the factuality (percent of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe 58% and 40% reduction in factual error rate when generating biographies and answering medical questions, respectively.
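The training signal the abstract leans on is the direct preference optimization (DPO) loss over pairs of ranked responses. Below is a numeric sketch of that loss for one (preferred, dispreferred) pair; the log-probabilities are made up for illustration, not taken from the paper.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy has not moved from the reference model, the margin is 0
# and the loss is -log(0.5) = log 2.
base = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the preferred response's likelihood relative to the reference
# lowers the loss.
better = dpo_loss(-8.0, -12.0, -10.0, -12.0)
print(base, better)
```

Since the preference pairs here come from automatic factuality rankings rather than human labels, minimizing this loss pushes the model toward the more factual response in each pair.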
Sparsity-Preserving Differentially Private Training of Large Embedding Models
Abstract:
As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models. Our algorithms achieve substantial reductions (10^6×) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.
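The densification problem the abstract describes is easy to see in miniature: an embedding-table gradient touches only the rows used by the batch, but the Gaussian noise naive DP-SGD adds for privacy is dense, so the noised gradient becomes nonzero everywhere. The toy example below is illustrative only (arbitrary vocabulary size and noise scale); DP-FEST and DP-AdaFEST are designed to avoid exactly this densification.

```python
import random

random.seed(0)
vocab_size = 10_000
touched_rows = {3, 17, 512}  # rows updated by this batch

# Sparse gradient: nonzero only on the touched rows.
grad = [1.0 if i in touched_rows else 0.0 for i in range(vocab_size)]
sparse_nonzeros = sum(g != 0.0 for g in grad)

# Naive DP-SGD adds independent Gaussian noise to EVERY coordinate,
# so essentially every entry of the noised gradient is nonzero.
noisy = [g + random.gauss(0.0, 0.1) for g in grad]
dense_nonzeros = sum(g != 0.0 for g in noisy)

print(sparse_nonzeros, dense_nonzeros)
```

A sparse update that touched 3 rows now requires writing the full table, which is the training-efficiency cost the paper's algorithms remove.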
Sequential Modeling Enables Scalable Learning for Large Vision Models
Abstract:
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.
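Once everything is flattened into a "visual sentence" of tokens, the training objective in the abstract is plain next-token prediction under cross-entropy, the same recipe as language modeling. A toy sketch with made-up logits (a real model would produce them, and the vocabulary would be visual tokens):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def next_token_loss(logits_per_step, targets):
    """Average cross-entropy of predicting targets[t] from step t's logits."""
    total = 0.0
    for logits, target in zip(logits_per_step, targets):
        total += -math.log(softmax(logits)[target])
    return total / len(targets)

# A "visual sentence" of 3 tokens over a 4-token vocabulary.
logits = [[2.0, 0.1, 0.1, 0.1], [0.1, 2.0, 0.1, 0.1], [0.1, 0.1, 0.1, 2.0]]
loss = next_token_loss(logits, targets=[0, 1, 3])
print(loss)
```

The point of the paper is that this single scalar objective, applied to 420 billion visual tokens, is enough to train a scalable vision model with no linguistic data at all.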