Deep Learning Weekly: Issue #311
Meta AI’s Llama 2, LLM training pipelines with Langchain, Airbyte, and Dagster, The Psychology of ChatGPT, a paper on Personalizing Diffusion Models with Iterative Feedback, and many more!
This week in deep learning, we bring you Meta AI makes Llama 2 available for research and commercial use, LLM training pipelines with Langchain, Airbyte, and Dagster, The Psychology of ChatGPT, and a paper on FABRIC: Personalizing Diffusion Models with Iterative Feedback.
You may also enjoy Cohere releases Coral, AI assistant designed for enterprise business use, Optimized Deep Learning Pipelines: Protobufs, Blockchain in the age of LLMs, a paper on LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Meta and Microsoft Introduce the Next Generation of Llama
Meta AI makes Llama 2, its next-generation open-source large language model, available for research and commercial use.
Pinecone leads 'explosion' in vector databases for generative AI
Pinecone, a vector database provider, has raised $138 million and is valued at $750 million.
Cohere releases Coral, AI assistant designed for enterprise business use
Cohere released Coral, a “knowledge assistant” designed specifically for enterprise business use.
The Biden-Harris Administration has secured voluntary commitments from leading AI companies to help move toward safe, secure, and transparent development of AI technology.
Nvidia's DGX Cloud platform now available, offering instant access to generative AI infrastructure
Nvidia announced broad availability of Nvidia DGX Cloud, giving companies access to thousands of GPUs on Oracle Cloud Infrastructure and its own cloud-based servers.
MLOps
Fine-Tuning Yolov8 for Image Segmentation
In this article, you will explore image segmentation, the limitations of segmentation models, and the process of fine-tuning YOLOv8 for the task.
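For a flavor of what the fine-tuning step looks like, here is a minimal sketch using the Ultralytics API; the checkpoint, dataset YAML, and hyperparameters are illustrative placeholders, not values from the article.

```python
# Minimal sketch of fine-tuning a YOLOv8 segmentation model with the
# Ultralytics API; dataset path and hyperparameters are illustrative only.
from ultralytics import YOLO

# Start from a pretrained segmentation checkpoint.
model = YOLO("yolov8n-seg.pt")

# Fine-tune on a custom dataset described by a YOLO-format data YAML.
model.train(data="custom-seg.yaml", epochs=50, imgsz=640)

# Evaluate on the validation split and run inference on a sample image.
metrics = model.val()
results = model("sample.jpg")
```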
Optimized Deep Learning Pipelines: Protobufs
An article that explores how we can optimize our deep learning pipelines using TensorFlow’s TFRecords.
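For context, this is roughly what a TFRecord round trip looks like; the feature names, values, and file paths below are made up for illustration.

```python
# Minimal TFRecord round trip; feature names and values are illustrative.
import tensorflow as tf

def make_example(image_bytes: bytes, label: int) -> tf.train.Example:
    # Wrap raw data in protobuf Feature messages.
    return tf.train.Example(features=tf.train.Features(feature={
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

# Write serialized examples to disk.
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    writer.write(make_example(b"\x00\x01", 0).SerializeToString())

# Read them back as a tf.data pipeline.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}
dataset = (tf.data.TFRecordDataset("train.tfrecord")
           .map(lambda x: tf.io.parse_single_example(x, feature_spec)))
```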
LLM training pipelines with Langchain, Airbyte, and Dagster
A tutorial that shows how to combine Langchain, Airbyte, and Dagster for maintainable and scalable LLM training pipelines.
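As a rough sketch of how the pieces can fit together (not the tutorial's exact code): assume an Airbyte sync has already landed raw documents as a JSONL file, and a Dagster asset then chunks and embeds them with LangChain. Paths, model choices, and chunk sizes are our placeholders.

```python
# Illustrative Dagster asset that turns documents already synced by Airbyte
# (assumed to be a JSONL file on disk) into a FAISS vector store via LangChain.
import json
from dagster import asset
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

@asset
def document_index() -> None:
    # Load raw records written by an upstream Airbyte sync (placeholder path).
    texts = [json.loads(line)["text"] for line in open("airbyte_output.jsonl")]

    # Chunk documents so they fit the embedding model's context window.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.create_documents(texts)

    # Embed chunks and persist a FAISS index for downstream LLM use.
    index = FAISS.from_documents(chunks, OpenAIEmbeddings())
    index.save_local("document_index")
```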
Launching Data Engine – Manage & Improve Your Unstructured Datasets
DagsHub unveils Data Engine – a toolset built to help ML teams handle unstructured data, iterate on it quickly and reliably, and use it to build better models for production.
An article that explores the effect of distributed training on convergence and how to use Amazon SageMaker Automatic Model Tuning to fine-tune model hyperparameters.
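For reference, here is a bare-bones sketch of launching a SageMaker Automatic Model Tuning job; the estimator, objective metric, and hyperparameter ranges are placeholders rather than the article's settings.

```python
# Illustrative SageMaker Automatic Model Tuning setup; the estimator,
# metric, and hyperparameter ranges are placeholders.
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

estimator = PyTorch(
    entry_point="train.py",         # your training script
    role="SageMakerExecutionRole",  # IAM role with SageMaker permissions
    instance_count=2,               # distributed training across instances
    instance_type="ml.g5.xlarge",
    framework_version="2.0",
    py_version="py310",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    hyperparameter_ranges={
        "lr": ContinuousParameter(1e-5, 1e-2),
        "batch-size": IntegerParameter(32, 256),
    },
    metric_definitions=[{"Name": "validation:loss",
                         "Regex": "val_loss=([0-9\\.]+)"}],
    max_jobs=20,
    max_parallel_jobs=4,
)

tuner.fit({"training": "s3://my-bucket/train/"})
```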
Learning
Blockchain in the age of LLMs
An article that shows how LLMs can be a massive UX shortcut for blockchain, thanks to their adaptability, on-chain transparency, and flexible intent matching.
Curate an instruction dataset for supervised fine-tuning
An article on how you can build an instruction dataset for your fine-tuning projects by cleaning a public dataset using Argilla’s Feedback Task.
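As a rough sketch of the workflow (the exact Argilla client API may differ; the names below follow the 1.x FeedbackDataset client, and the dataset, credentials, and question set are our assumptions):

```python
# Rough sketch: push a public instruction dataset into an Argilla Feedback
# Task for manual curation. API names follow Argilla 1.x and may differ in
# your version; credentials and dataset choice are placeholders.
import argilla as rg
from datasets import load_dataset

rg.init(api_url="http://localhost:6900", api_key="admin.apikey")  # placeholder credentials

# Define what annotators see (fields) and what they answer (questions).
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="instruction"), rg.TextField(name="response")],
    questions=[
        rg.RatingQuestion(name="quality", values=[1, 2, 3, 4, 5]),
        rg.TextQuestion(name="corrected_response"),
    ],
)

# Load a public instruction dataset and turn rows into feedback records.
source = load_dataset("databricks/databricks-dolly-15k", split="train")
records = [
    rg.FeedbackRecord(fields={"instruction": row["instruction"], "response": row["response"]})
    for row in source.select(range(100))
]
dataset.add_records(records)
dataset.push_to_argilla(name="instruction-curation")
```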
A Deep Dive Into Guidance’s Source Code
A comprehensive article on how to guide Large Language Models.
The Psychology of ChatGPT
A blog post by a product researcher at Microsoft who explores the algorithmic biases of ChatGPT.
Libraries & Code
yaodongC/awesome-instruction-dataset
A collection of open-source instruction tuning datasets to train (text and multi-modal) chat-based LLMs.
ShortGPT is a powerful framework for automating content creation. It simplifies video creation, footage sourcing, voiceover synthesis, and editing tasks.
TypeChat is a library that makes it easy to build natural language interfaces using types.
Papers & Publications
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Abstract:
LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on human and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performances on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.
FABRIC: Personalizing Diffusion Models with Iterative Feedback
Abstract:
In an era where visual content generation is increasingly driven by machine learning, the integration of human feedback into generative models presents significant opportunities for enhancing user experience and output quality. This study explores strategies for incorporating iterative human feedback into the generative process of diffusion-based text-to-image models. We propose FABRIC, a training-free approach applicable to a wide range of popular diffusion models, which exploits the self-attention layer present in the most widely used architectures to condition the diffusion process on a set of feedback images. To ensure a rigorous assessment of our approach, we introduce a comprehensive evaluation methodology, offering a robust mechanism to quantify the performance of generative visual models that integrate human feedback. We show that generation results improve over multiple rounds of iterative feedback through exhaustive analysis, implicitly optimizing arbitrary user preferences. The potential applications of these findings extend to fields such as personalized content creation and customization.
Meta-Transformer: A Unified Framework for Multimodal Learning
Abstract:
Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities (e.g. natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a frozen encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers.
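For intuition, here is a toy PyTorch sketch of the three-component structure the abstract describes (per-modality tokenizer, frozen modality-shared encoder, task-specific head); the dimensions, layer counts, and pooling are our illustrative choices, not the paper's.

```python
# Toy sketch of a tokenizer -> frozen shared encoder -> task head pipeline;
# all sizes and layer choices here are illustrative, not from the paper.
import torch
import torch.nn as nn

class ToyMetaTransformer(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 10):
        super().__init__()
        # Per-modality tokenizers project raw inputs into a shared token space.
        self.tokenizers = nn.ModuleDict({
            "image": nn.Linear(768, dim),   # e.g. flattened patch features
            "text": nn.Linear(300, dim),    # e.g. word embeddings
        })
        # Modality-shared encoder whose parameters stay frozen.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Lightweight task-specific head trained on top of frozen features.
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        tokens = self.tokenizers[modality](x)        # (batch, seq, dim)
        features = self.encoder(tokens).mean(dim=1)  # pooled semantic features
        return self.head(features)

logits = ToyMetaTransformer()(torch.randn(2, 16, 768), modality="image")
```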