Deep Learning Weekly : Issue #309
Code Interpreter on ChatGPT Plus, ML system design: 200 case studies, Tiny Audio Diffusion, a paper on Modular Visual Question Answering via Code Generation, and many more!
This week in deep learning, we bring you Code Interpreter on ChatGPT Plus, ML system design: 200 case studies, Tiny Audio Diffusion: Waveform Diffusion That Doesn't Require Cloud Computing, and a paper on Modular Visual Question Answering via Code Generation.
You may also enjoy Tableau GPT, Emerging Architectures for LLM Applications, Training Language Models with Textbook-Quality Synthetic Data, a paper on DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Researchers teach an AI to write better chart captions
A new dataset can help scientists develop automatic systems that generate richer, more descriptive captions for online charts.
How Tableau GPT and Tableau Pulse are reimagining the data experience
The Tableau team announced Tableau GPT, a tool that leverages generative AI to simplify and democratize the process of data analysis.
Nvidia AI Tech Ramps Up Carbon Capture & Storage Predictions 700,000x
Nvidia introduced an AI approach to carbon sequestration, built on Nvidia Modulus and Nvidia Omniverse, that CCS scientists can readily use in real-world applications.
Code Interpreter comes to all ChatGPT Plus users
OpenAI took one of its own in-house plug-ins, Code Interpreter, and made it available to all of its ChatGPT Plus subscribers.
Spotify CEO’s startup for AI-powered preventive healthcare raises €60M
Neko Health announced it had raised €60M in a round led by Lakestar and backed by Atomico and General Catalyst.
Anthropic releases upgraded Claude 2 AI chatbot with improved safety and coding ability
Anthropic released an upgraded version of its Claude chatbot with greatly improved safety and coding capabilities.
MLOps
ML system design: 200 case studies
A database of 200 case studies from 64 companies that share practical ML use cases and learnings from designing ML systems.
Emerging Architectures for LLM Applications
A post about a reference architecture for the emerging LLM app stack.
Securing AI Systems — Defensive Strategies
A blog post about defensive strategies for safeguarding AI systems.
The ultimate guide to LLMs and NLP for content marketing
Natural language processing (NLP) and LLMs play a crucial role in content marketing by improving many areas of content generation, optimization, and analysis.
Torchmetrics v1.0: Visualize model performance with 100+ metrics
A blog post that goes through some of the new features and improvements in the first major release of Torchmetrics.
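For instance, every metric in v1.0 gains a `.plot()` method for quick visual inspection. A minimal sketch, assuming matplotlib is installed:

```python
import torch
from torchmetrics.classification import MulticlassAccuracy

# Accumulate predictions and targets across batches, then render the
# metric with the .plot() method introduced in v1.0.
metric = MulticlassAccuracy(num_classes=3)
for _ in range(5):
    metric.update(torch.randint(3, (32,)), torch.randint(3, (32,)))
fig, ax = metric.plot()  # returns a matplotlib figure and axes
```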
Learning
Tiny Audio Diffusion: Waveform Diffusion That Doesn't Require Cloud Computing
An article that explores how to train models and generate sounds with audio waveform diffusion on a consumer laptop with a GPU that has less than 2 GB of VRAM.
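For intuition, here is a sketch of the forward-noising step at the heart of waveform diffusion; the linear beta schedule, tensor shapes, and toy data are illustrative assumptions, not the article's code:

```python
import torch

# Forward-diffusion step on a raw waveform: mix the clean signal with
# Gaussian noise according to a sampled timestep.
def add_noise(x0, t, alpha_bar):
    """x0: (batch, channels, samples); t: (batch,) integer timesteps."""
    a = alpha_bar[t].view(-1, 1, 1)      # cumulative signal-retention term
    eps = torch.randn_like(x0)           # noise the model learns to predict
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
x0 = torch.randn(4, 1, 32768)            # toy batch of short mono clips
xt, eps = add_noise(x0, torch.randint(1000, (4,)), alpha_bar)
```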
Swin Transformer: A Novel Hierarchical Vision Transformer for Object Recognition
An article about object detection with the Swin Transformer.
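For a quick start, torchvision ships pretrained Swin backbones; a minimal sketch of loading one (the article's own detection pipeline builds on such a hierarchical feature extractor and may differ in detail):

```python
import torch
from torchvision.models import swin_t, Swin_T_Weights

# Pretrained Swin-T backbone with its matching preprocessing transforms.
weights = Swin_T_Weights.DEFAULT
model = swin_t(weights=weights).eval()
batch = weights.transforms()(torch.rand(3, 256, 256)).unsqueeze(0)
with torch.no_grad():
    logits = model(batch)                # (1, 1000) ImageNet logits
```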
Training Language Models with Textbook-Quality Synthetic Data
An exploration of Microsoft Research’s paper ‘Textbooks Are All You Need’.
Multilingual CLIP with HuggingFace + PyTorch Lightning
A walkthrough on training Multilingual CLIP using HuggingFace and PyTorch Lightning.
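The usual recipe aligns a multilingual text encoder with a frozen CLIP text tower on parallel sentences. A rough sketch of one distillation step, under assumed choices (XLM-R student, MSE loss) rather than the walkthrough's exact code:

```python
import torch
from transformers import AutoModel, AutoTokenizer, CLIPModel, CLIPProcessor

clip_name = "openai/clip-vit-base-patch32"
clip = CLIPModel.from_pretrained(clip_name).eval()
proc = CLIPProcessor.from_pretrained(clip_name)
student_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
student = AutoModel.from_pretrained("xlm-roberta-base")
proj = torch.nn.Linear(768, 512)         # map student dim to CLIP dim

with torch.no_grad():                    # teacher targets from English text
    target = clip.get_text_features(
        **proc(text=["a photo of a dog"], return_tensors="pt", padding=True))

out = student(**student_tok(["ein Foto von einem Hund"],  # parallel German
                            return_tensors="pt", padding=True))
loss = torch.nn.functional.mse_loss(proj(out.last_hidden_state[:, 0]), target)
loss.backward()                          # update student + projection only
```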
Libraries & Code
Aequitas is an open-source bias audit toolkit that helps data scientists, machine learning researchers, and policymakers audit machine learning models for discrimination and bias.
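A minimal sketch of a group-metric audit, following the column conventions from the Aequitas docs (`score` and `label_value`, with remaining columns treated as protected attributes); the toy data is an assumption:

```python
import pandas as pd
from aequitas.group import Group

# Crosstab of group-level metrics per attribute value.
df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 1],
    "label_value": [1, 0, 0, 1, 0, 1],
    "race":        ["a", "a", "a", "b", "b", "b"],
})
xtab, _ = Group().get_crosstabs(df)
print(xtab[["attribute_name", "attribute_value", "fpr", "fnr"]])
```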
CodeGen is an open-source model for program synthesis, trained on TPU-v4 and competitive with OpenAI Codex.
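A minimal sketch of sampling a completion through HuggingFace Transformers; the 350M Python-only checkpoint used here is just the smallest of the published sizes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Greedy completion of a function signature.
name = "Salesforce/codegen-350M-mono"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tok("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```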
TruLens provides a set of tools for developing and monitoring neural nets, including large language models.
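A rough sketch of wrapping a LangChain app for feedback-based monitoring, following the `trulens_eval` quickstart of the time; the toy chain and the feedback choice are assumptions, and the API may have shifted since:

```python
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from trulens_eval import Feedback, Tru, TruChain
from trulens_eval.feedback import Huggingface

# A toy LangChain app to monitor (requires OPENAI_API_KEY).
chain = LLMChain(llm=OpenAI(),
                 prompt=PromptTemplate.from_template("Q: {question}\nA:"))

# Feedback function scored by a HuggingFace-hosted model.
f_lang_match = Feedback(Huggingface().language_match).on_input_output()

tru = Tru()
wrapped = TruChain(chain, app_id="demo_app", feedbacks=[f_lang_match])
wrapped("What is deep learning?")  # runs the chain and records feedback
tru.run_dashboard()                # local monitoring dashboard
```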
Papers & Publications
Modular Visual Question Answering via Code Generation
Abstract:
We present a framework that formulates visual question answering as modular code generation. In contrast to prior work on modular approaches to VQA, our approach requires no additional training and relies on pre-trained language models (LMs), visual models pre-trained on image-caption pairs, and fifty VQA examples used for in-context learning. The generated Python programs invoke and compose the outputs of the visual models using arithmetic and conditional logic. Our approach improves accuracy on the COVR dataset by at least 3% and on the GQA dataset by roughly 2% compared to the few-shot baseline that does not employ code generation.
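For intuition, here is a toy example of the kind of program such a framework might generate; `detect` is a hypothetical stand-in for the pre-trained visual models, not the paper's actual helper:

```python
# `detect` stands in for a visual model pre-trained on image-caption
# pairs; the generated program composes its outputs with plain Python.
def detect(image, label: str) -> list:
    """Hypothetical stub: returns bounding boxes for `label` in `image`."""
    return []

def generated_program(image) -> str:
    muffins = detect(image, "muffin")
    plates = detect(image, "plate")
    # arithmetic and conditional logic over the visual-model outputs
    return "yes" if len(muffins) > len(plates) else "no"
```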
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
Abstract:
Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality images from detailed textual descriptions, they often lack the ability to precisely edit the generated or real images. In this paper, we propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models. Specifically, we construct classifier guidance based on the strong correspondence of intermediate features in the diffusion model. It can transform the editing signals into gradients via feature correspondence loss to modify the intermediate representation of the diffusion model. Based on this guidance strategy, we also build a multi-scale guidance to consider both semantic and geometric alignment. Moreover, a cross-branch self-attention is added to maintain the consistency between the original image and the editing result. Our method, through an efficient design, achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging. It is worth noting that all editing and content preservation signals come from the image itself, and the model does not require fine-tuning or additional modules.
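To make the guidance strategy concrete, here is a schematic sketch of one guidance step; the `unet_features_fn` hook, the masks, and the cosine-based correspondence loss are illustrative assumptions, not the paper's exact formulation:

```python
import torch

# Differentiate a feature-correspondence loss with respect to the
# latent, yielding a gradient that steers denoising toward the edit.
def guidance_gradient(latent, unet_features_fn, src_mask, tgt_mask):
    latent = latent.detach().requires_grad_(True)
    feats = unet_features_fn(latent)      # intermediate UNet features (B, C, H, W)
    src = feats[..., src_mask].mean(-1)   # pooled source-region features
    tgt = feats[..., tgt_mask].mean(-1)   # pooled target-region features
    loss = 1 - torch.cosine_similarity(src, tgt, dim=-1).mean()
    (grad,) = torch.autograd.grad(loss, latent)
    return grad                           # scaled and subtracted during sampling
```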
h2oGPT: Democratizing Large Language Models
Abstract:
Applications built on top of Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their human-level capabilities in natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material.
We introduce h2oGPT, a suite of open-source code repositories for the creation and use of LLMs based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source approaches. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language.
Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness.
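A minimal sketch of loading one of the released checkpoints with plain Transformers; the model id and the `<human>:`/`<bot>:` prompt format are assumptions based on H2O.ai's Hugging Face listings, so adjust to the model card you actually use:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a fine-tuned h2oGPT checkpoint and generate a reply.
name = "h2oai/h2ogpt-oasst1-512-12b"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
prompt = "<human>: What does open-source mean for LLMs?\n<bot>:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```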