Deep Learning Weekly: Issue 361
OpenAI's CriticGPT, Levels of Complexity: RAG Applications, Extrinsic Hallucinations in LLMs, a paper on Efficient Portrait Animation with Stitching and Retargeting Control, and many more!
This week in deep learning, we bring you OpenAI's CriticGPT, Levels of Complexity: RAG Applications, Extrinsic Hallucinations in LLMs, and a paper on LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control.
You may also enjoy Meta open-sources new ‘multi-token prediction’ language models, Assessing ASR performance with meaning preservation, a paper on Self-Play Preference Optimization for Language Model Alignment, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
OpenAI shares that they have begun integrating CriticGPT, a model that writes critiques of ChatGPT responses to help human trainers spot mistakes, into their RLHF labeling pipelines.
MIT researchers introduce generative AI for databases
MIT researchers introduce a new generative AI system for databases that enables people to perform complicated statistical analyses on tabular data.
Meta open-sources new ‘multi-token prediction’ language models
Meta has open-sourced four language models that implement an emerging machine learning approach known as multi-token prediction.
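The core idea is to train a language model to predict several future tokens at once from a shared trunk, one head per offset. Below is a minimal, hedged sketch of that idea in PyTorch; the head structure, dimensions, and loss weighting are illustrative and not Meta's released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """Toy multi-token prediction: one linear head per future offset,
    all reading the same trunk hidden states (illustrative only)."""
    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(n_future)]
        )

    def forward(self, trunk_states: torch.Tensor) -> list[torch.Tensor]:
        # trunk_states: (batch, seq_len, hidden_dim) from any decoder trunk
        return [head(trunk_states) for head in self.heads]

def multi_token_loss(logits_per_head: list[torch.Tensor], token_ids: torch.Tensor) -> torch.Tensor:
    # Head i is trained to predict the token i+1 positions ahead.
    losses = []
    for i, logits in enumerate(logits_per_head):
        shift = i + 1
        pred = logits[:, :-shift, :].reshape(-1, logits.size(-1))
        target = token_ids[:, shift:].reshape(-1)
        losses.append(F.cross_entropy(pred, target))
    return torch.stack(losses).mean()
```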
Announcing Hacker Cup AI Track at NeurIPS 2024
The PyTorch team, in partnership with Meta Hacker Cup and Microsoft Research, announced the Hacker Cup AI Track at NeurIPS 2024.
Runway eyes $450M in new funding amid ongoing interest in AI startups
Runway AI, a video generation startup, is reportedly in talks to raise $450 million.
Google Cloud TPUs made available to Hugging Face users
AI builders are now able to accelerate their applications with Google Cloud TPUs on Hugging Face Inference Endpoints and Spaces.
MLOps & LLMOps
Multi AI Agent Systems 101: Automating Routine Tasks in Data Source Management
An article that explores how to automate analyst tasks using a multi-agent framework called CrewAI.
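For readers new to CrewAI, a minimal crew looks roughly like the sketch below. The roles, goals, and tasks are placeholders rather than the article's actual agents, and an LLM is assumed to be configured via the usual environment variables (e.g. an OpenAI API key).

```python
from crewai import Agent, Task, Crew

# Hypothetical analyst duo; roles, goals, and task text are illustrative placeholders.
researcher = Agent(
    role="Data source researcher",
    goal="Find and summarize documentation for a given data source",
    backstory="You catalogue data sources for an analytics team.",
)
writer = Agent(
    role="Documentation writer",
    goal="Turn research notes into a short wiki entry",
    backstory="You maintain the team's internal data wiki.",
)

research_task = Task(
    description="Collect key facts about the 'orders' table: owner, freshness, known caveats.",
    expected_output="A bullet list of facts",
    agent=researcher,
)
writing_task = Task(
    description="Write a one-paragraph wiki entry from the research notes.",
    expected_output="A short paragraph",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task])
result = crew.kickoff()
print(result)
```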
Levels of Complexity: RAG Applications
A comprehensive guide to understanding and implementing RAG applications across different levels of complexity.
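At the lowest level of complexity, RAG is just embed, retrieve, and prompt. The sketch below uses a toy hashed bag-of-words embedding purely so it runs end to end; in practice you would swap in a real embedding model and send the assembled prompt to an LLM.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashed bag-of-words embedding; a stand-in for a real embedding model."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def build_index(docs: list[str]) -> np.ndarray:
    vectors = np.stack([embed(d) for d in docs])
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-9, None)

def build_prompt(question: str, docs: list[str], index: np.ndarray, k: int = 3) -> str:
    q = embed(question)
    q = q / max(np.linalg.norm(q), 1e-9)
    scores = index @ q                      # cosine similarity against every chunk
    top = np.argsort(scores)[::-1][:k]      # indices of the k most similar chunks
    context = "\n\n".join(docs[i] for i in top)
    # Send this prompt to the LLM of your choice.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["The warehouse refreshes nightly at 2am UTC.",
        "Orders data is owned by the payments team.",
        "Dashboards are rebuilt every Monday."]
print(build_prompt("Who owns the orders data?", docs, build_index(docs), k=2))
```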
Deploy Multilingual LLMs with NVIDIA NIM
A post that explores two regionally fine-tuned LoRA adapters and deploys them with a Llama 3 NIM to improve performance in Chinese and Hindi.
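NIM microservices expose an OpenAI-compatible endpoint, and with multi-LoRA serving the adapter is typically selected via the model field of the request. The URL and adapter name below are placeholders for your own deployment, not identifiers from the post.

```python
from openai import OpenAI

# Placeholder endpoint and adapter name; substitute your own NIM deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3-8b-instruct-hindi-lora",   # hypothetical LoRA adapter name
    messages=[{"role": "user", "content": "नमस्ते, आप कैसे हैं?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```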
Learning
Extrinsic Hallucinations in LLMs
Lilian Weng’s comprehensive post covering the causes of extrinsic hallucinations, along with detection techniques and mitigation measures.
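One widely used detection family in this area is sampling-based consistency checking: sample the model several times and flag statements that the other samples fail to support. The sketch below uses crude token overlap as the support signal so it runs standalone; a real checker would use an NLI model or an LLM judge instead.

```python
def support_score(claim: str, sample: str) -> float:
    """Crude support signal: fraction of claim tokens that appear in the sample."""
    claim_toks = set(claim.lower().split())
    sample_toks = set(sample.lower().split())
    return len(claim_toks & sample_toks) / max(len(claim_toks), 1)

def flag_unsupported(answer_sentences: list[str], extra_samples: list[str], threshold: float = 0.5):
    """Sentences that independently sampled answers rarely support are hallucination candidates."""
    flagged = []
    for sent in answer_sentences:
        avg = sum(support_score(sent, s) for s in extra_samples) / max(len(extra_samples), 1)
        if avg < threshold:
            flagged.append((sent, round(avg, 2)))
    return flagged

answer = ["The bridge opened in 1937.",
          "It was designed by a committee of twelve architects."]
samples = ["The bridge opened in 1937 after four years of construction.",
           "Opened in 1937, the bridge was designed by Joseph Strauss."]
print(flag_unsupported(answer, samples))  # flags the unsupported second sentence
```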
Assessing ASR performance with meaning preservation
Google reports progress on using LLMs to assess meaning preservation in ASR transcripts, proposing it as an alternative metric to word error rate (WER), especially for low-resource scenarios and atypical speech.
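As a reference point, WER is the word-level edit distance between hypothesis and reference divided by the reference length, which is exactly why it penalizes harmless paraphrases as heavily as meaning-changing errors; the proposed alternative instead asks an LLM whether the transcript preserves the speaker's meaning. A minimal WER implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("turn the lights off", "turn lights of"))  # 0.5
```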
How to Interview and Hire ML/AI Engineers
Eugene Yan shares a few practical tips about interviewing candidates for machine learning and AI roles.
Meta introduces a new state-of-the-art, fast pipeline for text-to-3D asset generation called Meta 3D Gen.
Libraries & Code
Semantic cache for LLMs. Fully integrated with LangChain and LlamaIndex.
An open-source framework for autonomous language agents.
A Python-based UI framework that allows you to rapidly build web apps like demos and internal apps.
Papers & Publications
Evolutionary Optimization of Model Merging Recipes
Abstract:
We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
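In parameter space, merging boils down to choosing mixing coefficients for the source models' weights, and the evolutionary part searches those coefficients against a benchmark score. The sketch below is a deliberately simplified stand-in: a plain random-mutation loop over per-model weights, whereas the paper uses stronger evolutionary algorithms (e.g. CMA-ES) and also searches the data-flow space; the fitness function is left to the caller.

```python
import random

def merge_state_dicts(state_dicts, weights):
    """Parameter-space merge: per-tensor weighted average across source models."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

def evolve_merge(state_dicts, fitness_fn, generations=20, pop_size=8, sigma=0.1):
    """Toy evolutionary search over mixing coefficients.
    fitness_fn(merged_state_dict) -> float should score the merged model on a benchmark."""
    n = len(state_dicts)
    best = [1.0 / n] * n
    best_score = fitness_fn(merge_state_dicts(state_dicts, best))
    for _ in range(generations):
        for _ in range(pop_size):
            cand = [max(0.0, w + random.gauss(0.0, sigma)) for w in best]
            total = sum(cand) or 1.0
            cand = [w / total for w in cand]           # keep coefficients normalized
            score = fitness_fn(merge_state_dicts(state_dicts, cand))
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```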
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
Abstract:
Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes and meticulously propose a stitching and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed remarkably reaches 12.8ms on an RTX 4090 GPU with PyTorch.
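The "small MLP" the abstract mentions is a lightweight module that nudges the implicit keypoints given a control signal. The sketch below is only a toy rendering of that idea; the keypoint count, conditioning inputs, and layer sizes are assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class RetargetingMLP(nn.Module):
    """Toy retargeting module: a small MLP predicts offsets for implicit 3D keypoints
    from a conditioning signal (e.g. eye/lip openness). Sizes are illustrative."""
    def __init__(self, n_keypoints: int = 21, cond_dim: int = 2, hidden: int = 128):
        super().__init__()
        in_dim = n_keypoints * 3 + cond_dim   # flattened keypoints + condition
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_keypoints * 3),
        )

    def forward(self, keypoints: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, n_keypoints, 3), cond: (batch, cond_dim)
        x = torch.cat([keypoints.flatten(1), cond], dim=-1)
        offsets = self.net(x).view_as(keypoints)
        return keypoints + offsets            # retargeted keypoints
```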
Self-Play Preference Optimization for Language Model Alignment
Abstract:
Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed Self-Play Preference Optimization (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys a theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise loss such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Starting from a stronger base model Llama-3-8B-Instruct, we are able to achieve a length-controlled win rate of 38.77%. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models.
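Concretely, my reading of the update (an assumption that goes beyond what the abstract states; consult the paper for the exact form and constants) is that each response's log policy ratio is regressed toward a target proportional to its estimated win probability minus one half, which is what pushes chosen responses up and rejected ones down without a symmetric pairwise loss. A hedged sketch:

```python
import torch

def sppo_style_loss(logp_theta: torch.Tensor,
                    logp_prev: torch.Tensor,
                    win_prob: torch.Tensor,
                    eta: float = 1.0) -> torch.Tensor:
    """Hedged sketch of an SPPO-style per-response objective (assumed form, not
    verbatim from the paper): regress the log policy ratio onto eta * (win_prob - 0.5).

    logp_theta: summed log-prob of the response under the current policy
    logp_prev:  summed log-prob under the previous-iteration policy
    win_prob:   estimated probability (e.g. from PairRM) that the response beats
                an average response drawn from the previous policy
    """
    target = eta * (win_prob - 0.5)
    return ((logp_theta - logp_prev - target) ** 2).mean()
```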