Deep Learning Weekly: Issue #317
Google's WeatherBench 2, Optimize LLMs using GPTQ and Hugging Face Optimum, Evaluation & Hallucination Detection, a paper on Generating Deployable Models from Natural Language Instructions, and many m
This week in deep learning, we bring you Google's new benchmark for global weather models, Optimize open LLMs using GPTQ and Hugging Face Optimum, Abstractive Summaries: Evaluation & Hallucination Detection, and a paper on Prompt2Model: Generating Deployable Models from Natural Language Instructions.
You may also enjoy OpenAI's guide for teachers using ChatGPT, Implementing Multi-GPU Distributed Training for Personalized Recommendations, AudioLDM 2, but faster, a paper on FaceChain: A Playground for Identity-Preserving Portrait Generation, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Google announced WeatherBench 2 (WB2), a benchmark for the next generation of data-driven, global weather models.
OpenAI released a guide for teachers using ChatGPT in their classroom, including suggested prompts, an explanation of how ChatGPT works and its limitations, the efficacy of AI detectors, and bias.
Meta AI announced that DINOv2, a cutting-edge computer vision model trained through self-supervised learning to produce universal features, is now available under the Apache 2.0 license.
The article discusses how Yale University is approaching the use of ChatGPT and other large language models in the classroom, without banning them or imposing strict policies.
A new AI-powered tool harnesses high-resolution satellite imagery for real-time coral reef monitoring, casting a wide net of possibilities for global marine conservation.
AI21 Labs has closed $155 million in Series C funding to accelerate the growth of its text-based generative AI services for enterprises.
MLOps & LLMOps
A blog that delves into the steps Stitch Fix followed to overcome training time inefficiencies, as well as their journey to implement multi-GPU distributed model training for CTSM.
Use CometLLM to log and visualize all your prompts and chains and unleash the full potential of Large Language Models!
A tutorial on optimizing open LLMs using GPTQ and Hugging Face Optimum.
A blog on how to enable the collection and analysis of PyTorch Profiler traces for training workloads without any user side code instrumentation.
NLP has the ability to revolutionize the legal landscape as we know it by teaching robots the complexities of legal vocabulary, grammar, and semantics.
A technical blog post showcasing how to use AudioLDM 2 in the Hugging Face Diffusers library, exploring a range of code and model optimizations.
An article that discusses four dimensions for evaluating abstractive summarization before diving into reference-based, context-based, and preference-based metrics.
An article that highlights Retrieval-Augmented Generation, a technique that augments the prompt with relevant data from a vector database to improve the quality and relevance of LLM responses.
A pragmatic guide to implementing guardrails, covering both Guardrails AI and NVIDIA’s NeMo Guardrails.
Libraries & Code
Create Customized Software using Natural Language Idea (through Multi-Agent Collaboration)
An open-source, locally running implementation of OpenAI's Code Interpreter.
Textbase is a framework for building chatbots using NLP and ML.
Papers & Publications
Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition.
Large language models (LLMs) enable system builders today to create competent NLP systems through prompting, where they only need to describe the task in natural language and provide a few examples. However, in other ways, LLMs are a step backward from traditional special-purpose NLP models; they require extensive computational resources for deployment and can be gated behind APIs. In this paper, we propose Prompt2Model, a general-purpose method that takes a natural language task description like the prompts provided to LLMs, and uses it to train a special-purpose model that is conducive to deployment. This is done through a multi-step process of retrieval of existing datasets and pretrained models, dataset generation using LLMs, and supervised fine-tuning on these retrieved and generated datasets. Over three tasks, we demonstrate that given the same few-shot prompt as input, Prompt2Model trains models that outperform the results of a strong LLM, gpt-3.5-turbo, by an average of 20% while being up to 700 times smaller. We also show that this data can be used to obtain reliable performance estimates of model performance, enabling model developers to assess model reliability before deployment.
Recent advancements in personalized image generation have unveiled the intriguing capability of pre-trained text-to-image models to learn identity information from a collection of portrait images. However, existing solutions can struggle to produce truthful details, and usually suffer from several defects, such as (i) the generated face exhibits its own unique characteristics, i.e., facial shape and facial feature positioning may not resemble key characteristics of the input, and (ii) the synthesized face may contain warped, blurred, or corrupted regions. In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models and a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and generate truthful personalized portraits with only a handful of portrait images as input. Concretely, we inject several SOTA face models into the generation procedure, achieving more efficient label-tagging, data-processing, and model post-processing compared to previous solutions such as DreamBooth, InstantBooth, or other LoRA-only approaches. Through the development of FaceChain, we have identified several potential directions to accelerate development of Face/Human-Centric AIGC research and application. We have designed FaceChain as a framework comprised of pluggable components that can be easily adjusted to accommodate different styles and personalized needs. We hope it can grow to serve the burgeoning needs from the communities.