Deep Learning Weekly: Issue 331
DeepMind's Gemini, A Guide on 12 Tuning Strategies for Production-Ready RAG Applications, Mixture of Experts Explained, a paper on Fine-tuning Language Models for Factuality, and many more!
This week in deep learning, we bring you DeepMind's Gemini, A Guide on 12 Tuning Strategies for Production-Ready RAG Applications, Mixture of Experts Explained, and a paper on Fine-tuning Language Models for Factuality.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Google DeepMind unveiled Gemini, its most capable multimodal model to date.
Microsoft released Phi-2, a 2.7 billion-parameter model with outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base models with less than 13 billion parameters.
San Francisco-based Essential AI, a startup looking to reinvent how people work with the power of AI, emerged from stealth with $56.5 million in a Series A funding round.
MIT researchers devised an ML-based method to investigate how materials behave at their surfaces. The approach could help in developing compounds or alloys for use as catalysts, semiconductors, or battery components.
Scale introduced the Automotive Foundation Model (AFM-1): a new model trained on diverse, large-scale street scene data that is capable of handling multiple vision tasks.
Meta announced the launch of Purple Llama — an umbrella project that, over time, will bring together tools and evaluations to help the community build responsibly with open generative AI models.
MLOps & LLMOps
A blog post on how to improve the performance of your Retrieval-Augmented Generation (RAG) pipeline with certain “hyperparameters” and tuning strategies.
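Two of the most commonly tuned RAG "hyperparameters" are chunk size and chunk overlap, which control how source documents are split before indexing. A minimal, hypothetical sketch (the function name and defaults are illustrative, not from the post):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping character chunks for a RAG index."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)
            if text[i:i + chunk_size]]

doc = "x" * 120
chunks = chunk_text(doc, chunk_size=50, overlap=10)
print(len(chunks))  # number of chunks produced for this document
```

Larger chunks preserve more context per retrieved passage but dilute the embedding; more overlap reduces the chance of splitting an answer across chunk boundaries at the cost of a bigger index.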
A blog post on how to deploy Mixtral-8x7B using Amazon SageMaker.
A post that discusses the SageMaker least outstanding requests (LOR) routing strategy and how it can minimize latency for certain types of real-time inference workloads.
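The core idea of least outstanding requests routing is simple: send each new request to the replica with the fewest in-flight requests, rather than round-robin. A toy sketch (names hypothetical; this is not the SageMaker implementation):

```python
def route(outstanding: dict[str, int]) -> str:
    """Return the instance id with the fewest outstanding requests."""
    return min(outstanding, key=outstanding.get)

# Current in-flight request counts per replica (hypothetical fleet).
fleet = {"instance-a": 3, "instance-b": 1, "instance-c": 2}
target = route(fleet)
fleet[target] += 1  # the chosen replica now carries one more in-flight request
print(target)
```

Because long-running generations keep a replica "busy" for many seconds, this strategy avoids piling new requests onto an already saturated instance, which is why it helps latency for LLM-style workloads with highly variable response times.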
A Colab notebook for generating visual anagrams and other multi-view optical illusions using diffusion models.
An article that dives into how different document loaders in LangChain impact a Retrieval Augmented Generation (RAG) system.
A comprehensive article on the building blocks of MoEs, the training process, and the tradeoffs to consider when serving them for inference.
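The central building block of an MoE layer is top-k gating: a router scores every expert, only the top-k experts process the token, and their outputs are combined with softmax-normalized weights. A minimal sketch with toy scalar experts (real MoEs use neural-network experts and learned routers; all names here are illustrative):

```python
import math

def top2_moe(x: float, router_scores: list[float], experts: list) -> float:
    """Combine the outputs of the two highest-scoring experts."""
    # Select the indices of the two largest router scores.
    top2 = sorted(range(len(router_scores)),
                  key=lambda i: router_scores[i], reverse=True)[:2]
    # Softmax over only the selected scores gives the mixing weights.
    exps = [math.exp(router_scores[i]) for i in top2]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted combination of the selected experts' outputs.
    return sum(w * experts[i](x) for w, i in zip(weights, top2))

experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x]
y = top2_moe(3.0, router_scores=[0.1, 2.0, -1.0], experts=experts)
```

This is the key inference trade-off the article discusses: all expert weights must sit in memory, but each token only pays the compute cost of k experts.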
Libraries & Code
A production-ready reinforcement learning AI agent library from the Applied Reinforcement Learning team at Meta.
LLMCompiler is a framework that enables an efficient and effective orchestration of parallel function calling with LLMs.
Efficient Large Language Models: A Survey
Papers & Publications
Fine-tuning Language Models for Factuality

The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is a time-consuming process, making human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply a large model's confidence scores. Second, the direct preference optimization algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, generated either through existing retrieval systems or our novel retrieval-free approach, significantly improves the factuality (percent of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe 58% and 40% reduction in factual error rate when generating biographies and answering medical questions, respectively.
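For context, the direct preference optimization (DPO) objective that the abstract refers to is the standard formulation from Rafailov et al. (reproduced here for reference, not taken from this paper). Given a prompt $x$ with a preferred response $y_w$ and a dispreferred response $y_l$, it fine-tunes the policy $\pi_\theta$ against a frozen reference $\pi_{\mathrm{ref}}$:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}
  \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]
```

Here the preference pairs $(y_w, y_l)$ are ranked automatically by the factuality scorers described in the abstract, so no human labels are needed.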
Sparsity-Preserving Differentially Private Training of Large Embedding Models

As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models. Our algorithms achieve substantial reductions (10⁶×) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.
Sequential Modeling Enables Scalable Learning for Large Vision Models

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.
Thanks for reading Deep Learning Weekly! Subscribe for free to receive new posts and support my work.