Deep Learning Weekly : Issue #309
Code Interpreter on ChatGPT Plus, ML system design: 200 case studies, Tiny Audio Diffusion, a paper on Modular Visual Question Answering via Code Generation, and many more!
This week in deep learning, we bring you Code Interpreter on ChatGPT Plus, ML system design: 200 case studies, Tiny Audio Diffusion: Waveform Diffusion That Doesn't Require Cloud Computing, and a paper on Modular Visual Question Answering via Code Generation.
You may also enjoy Tableau GPT, Emerging Architectures for LLM Applications, Training Language Models with Textbook-Quality Synthetic Data, a paper on DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Researchers teach an AI to write better chart captions
A new dataset can help scientists develop automatic systems that generate richer, more descriptive captions for online charts.
How Tableau GPT and Tableau Pulse are reimagining the data experience
The Tableau team announced Tableau GPT, a tool that leverages generative AI to simplify and democratize the process of data analysis.
Nvidia AI Tech Ramps Up Carbon Capture & Storage Predictions 700,000x
Nvidia introduced an AI approach to carbon sequestration, built on Nvidia Modulus and Nvidia Omniverse, that CCS scientists can readily use in real-world applications.
Code Interpreter comes to all ChatGPT Plus users
OpenAI took one of its own in-house plug-ins, Code Interpreter, and made it available to all of its ChatGPT Plus subscribers.
Spotify CEO’s startup for AI-powered preventive healthcare raises €60M
Neko Health announced it had raised €60M in a round led by Lakestar and backed by Atomico and General Catalyst.
Anthropic releases upgraded Claude 2 AI chatbot with improved safety and coding ability
Anthropic released an upgraded version of its Claude chatbot with greatly improved safety and coding capabilities.
MLOps
ML system design: 200 case studies
A database of 200 case studies from 64 companies that share practical ML use cases and learnings from designing ML systems.
Emerging Architectures for LLM Applications
A post about a reference architecture for the emerging LLM app stack.
Securing AI Systems — Defensive Strategies
A blog post about defensive strategies for safeguarding AI systems.
The ultimate guide to LLMs and NLP for content marketing
Natural language processing (NLP) and LLMs play a crucial role in content marketing by improving many areas of content generation, optimization, and analysis.
Torchmetrics v1.0: Visualize model performance with 100+ metrics
A blog post that goes through some of the new features and improvements in the first major release of Torchmetrics.
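For instance, every metric in v1.0 gains a `.plot()` method for quick visual inspection. A minimal sketch, assuming matplotlib is installed:

```python
import torch
from torchmetrics.classification import MulticlassAccuracy

# Accumulate predictions and targets across batches, then render the
# metric with the .plot() method introduced in v1.0.
metric = MulticlassAccuracy(num_classes=3)
for _ in range(5):
    metric.update(torch.randint(3, (32,)), torch.randint(3, (32,)))
fig, ax = metric.plot()  # returns a matplotlib figure and axes
```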
Learning
Tiny Audio Diffusion: Waveform Diffusion That Doesn't Require Cloud Computing
An article that explores how to train models and generate sounds with audio waveform diffusion on a consumer laptop with a GPU that has less than 2 GB of VRAM.
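For intuition, here is a sketch of the forward-noising step at the heart of waveform diffusion; the linear beta schedule, tensor shapes, and toy data are illustrative assumptions, not the article's code:

```python
import torch

# Forward-diffusion step on a raw waveform: mix the clean signal with
# Gaussian noise according to a sampled timestep.
def add_noise(x0, t, alpha_bar):
    """x0: (batch, channels, samples); t: (batch,) integer timesteps."""
    a = alpha_bar[t].view(-1, 1, 1)      # cumulative signal-retention term
    eps = torch.randn_like(x0)           # noise the model learns to predict
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
x0 = torch.randn(4, 1, 32768)            # toy batch of short mono clips
xt, eps = add_noise(x0, torch.randint(1000, (4,)), alpha_bar)
```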
Swin Transformer: A Novel Hierarchical Vision Transformer for Object Recognition
An article about object detection with the Swin Transformer.
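For a quick start, torchvision ships pretrained Swin backbones; a minimal sketch of loading one (the article's own detection pipeline builds on such a hierarchical feature extractor and may differ in detail):

```python
import torch
from torchvision.models import swin_t, Swin_T_Weights

# Pretrained Swin-T backbone with its matching preprocessing transforms.
weights = Swin_T_Weights.DEFAULT
model = swin_t(weights=weights).eval()
batch = weights.transforms()(torch.rand(3, 256, 256)).unsqueeze(0)
with torch.no_grad():
    logits = model(batch)                # (1, 1000) ImageNet logits
```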
Training Language Models with Textbook-Quality Synthetic Data
An exploration of Microsoft Research’s paper ‘Textbooks Are All You Need’.
Multilingual CLIP with HuggingFace + PyTorch Lightning
A walkthrough on training Multilingual CLIP using HuggingFace and PyTorch Lightning.
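The usual recipe aligns a multilingual text encoder with a frozen CLIP text tower on parallel sentences. A rough sketch of one distillation step, under assumed choices (XLM-R student, MSE loss) rather than the walkthrough's exact code:

```python
import torch
from transformers import AutoModel, AutoTokenizer, CLIPModel, CLIPProcessor

clip_name = "openai/clip-vit-base-patch32"
clip = CLIPModel.from_pretrained(clip_name).eval()
proc = CLIPProcessor.from_pretrained(clip_name)
student_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
student = AutoModel.from_pretrained("xlm-roberta-base")
proj = torch.nn.Linear(768, 512)         # map student dim to CLIP dim

with torch.no_grad():                    # teacher targets from English text
    target = clip.get_text_features(
        **proc(text=["a photo of a dog"], return_tensors="pt", padding=True))

out = student(**student_tok(["ein Foto von einem Hund"],  # parallel German
                            return_tensors="pt", padding=True))
loss = torch.nn.functional.mse_loss(proj(out.last_hidden_state[:, 0]), target)
loss.backward()                          # update student + projection only
```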
Libraries & Code
Aequitas is an open-source bias audit toolkit that helps data scientists, machine learning researchers, and policymakers audit machine learning models for discrimination and bias.
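A minimal sketch of a group-metric audit, following the column conventions from the Aequitas docs (`score` and `label_value`, with remaining columns treated as protected attributes); the toy data is an assumption:

```python
import pandas as pd
from aequitas.group import Group

# Crosstab of group-level metrics per attribute value.
df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 1],
    "label_value": [1, 0, 0, 1, 0, 1],
    "race":        ["a", "a", "a", "b", "b", "b"],
})
xtab, _ = Group().get_crosstabs(df)
print(xtab[["attribute_name", "attribute_value", "fpr", "fnr"]])
```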
CodeGen is an open-source model for program synthesis, trained on TPU-v4 and competitive with OpenAI Codex.
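A minimal sketch of sampling a completion through HuggingFace Transformers; the 350M Python-only checkpoint used here is just the smallest of the published sizes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Greedy completion of a function signature.
name = "Salesforce/codegen-350M-mono"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tok("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```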
TruLens provides a set of tools for developing and monitoring neural nets, including large language models.
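A rough sketch of wrapping a LangChain app for feedback-based monitoring, following the `trulens_eval` quickstart of the time; the toy chain and the feedback choice are assumptions, and the API may have shifted since:

```python
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from trulens_eval import Feedback, Tru, TruChain
from trulens_eval.feedback import Huggingface

# A toy LangChain app to monitor (requires OPENAI_API_KEY).
chain = LLMChain(llm=OpenAI(),
                 prompt=PromptTemplate.from_template("Q: {question}\nA:"))

# Feedback function scored by a HuggingFace-hosted model.
f_lang_match = Feedback(Huggingface().language_match).on_input_output()

tru = Tru()
wrapped = TruChain(chain, app_id="demo_app", feedbacks=[f_lang_match])
wrapped("What is deep learning?")  # runs the chain and records feedback
tru.run_dashboard()                # local monitoring dashboard
```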
Papers & Publications
Modular Visual Question Answering via Code Generation
Abstract:
We present a framework that formulates visual question answering as modular code generation. In contrast to prior work on modular approaches to VQA, our approach requires no additional training and relies on pre-trained language models (LMs), visual models pre-trained on image-caption pairs, and fifty VQA examples used for in-context learning. The generated Python programs invoke and compose the outputs of the visual models using arithmetic and conditional logic. Our approach improves accuracy on the COVR dataset by at least 3% and on the GQA dataset by roughly 2% compared to the few-shot baseline that does not employ code generation.
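For intuition, here is a toy example of the kind of program such a framework might generate; `detect` is a hypothetical stand-in for the pre-trained visual models, not the paper's actual helper:

```python
# `detect` stands in for a visual model pre-trained on image-caption
# pairs; the generated program composes its outputs with plain Python.
def detect(image, label: str) -> list:
    """Hypothetical stub: returns bounding boxes for `label` in `image`."""
    return []

def generated_program(image) -> str:
    muffins = detect(image, "muffin")
    plates = detect(image, "plate")
    # arithmetic and conditional logic over the visual-model outputs
    return "yes" if len(muffins) > len(plates) else "no"
```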
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
Abstract:
Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality images from detailed textual descriptions, they often lack the ability to precisely edit the generated or real images. In this paper, we propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models. Specifically, we construct classifier guidance based on the strong correspondence of intermediate features in the diffusion model. It can transform the editing signals into gradients via feature correspondence loss to modify the intermediate representation of the diffusion model. Based on this guidance strategy, we also build a multi-scale guidance to consider both semantic and geometric alignment. Moreover, a cross-branch self-attention is added to maintain the consistency between the original image and the editing result. Our method, through an efficient design, achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging. It is worth noting that all editing and content preservation signals come from the image itself, and the model does not require fine-tuning or additional modules.
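To make the guidance strategy concrete, here is a schematic sketch of one guidance step; the `unet_features_fn` hook, the masks, and the cosine-based correspondence loss are illustrative assumptions, not the paper's exact formulation:

```python
import torch

# Differentiate a feature-correspondence loss with respect to the
# latent, yielding a gradient that steers denoising toward the edit.
def guidance_gradient(latent, unet_features_fn, src_mask, tgt_mask):
    latent = latent.detach().requires_grad_(True)
    feats = unet_features_fn(latent)      # intermediate UNet features (B, C, H, W)
    src = feats[..., src_mask].mean(-1)   # pooled source-region features
    tgt = feats[..., tgt_mask].mean(-1)   # pooled target-region features
    loss = 1 - torch.cosine_similarity(src, tgt, dim=-1).mean()
    (grad,) = torch.autograd.grad(loss, latent)
    return grad                           # scaled and subtracted during sampling
```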
h2oGPT: Democratizing Large Language Models
Abstract:
Applications built on top of Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their human-level capabilities in natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material.
We introduce h2oGPT, a suite of open-source code repositories for the creation and use of LLMs based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source approaches. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language.
Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness.
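A minimal sketch of loading one of the released checkpoints with plain Transformers; the model id and the `<human>:`/`<bot>:` prompt format are assumptions based on H2O.ai's Hugging Face listings, so adjust to the model card you actually use:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a fine-tuned h2oGPT checkpoint and generate a reply.
name = "h2oai/h2ogpt-oasst1-512-12b"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
prompt = "<human>: What does open-source mean for LLMs?\n<bot>:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```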