Deep Learning Weekly: Issue #313
Meta AI's AudioCraft, Understanding and Evaluating Vector Databases in Production, MetaGPT, a paper on In Search for a Generalizable Method for Source Free Domain Adaptation, and many more!
This week in deep learning, we bring you Meta AI's AudioCraft, Understanding and Evaluating Vector Databases in Production, MetaGPT, and a paper on In Search for a Generalizable Method for Source Free Domain Adaptation.
You may also enjoy GAIA-1: A Cutting-Edge Generative AI Model for Autonomy, Releasing Swift Transformers: Run On-Device LLMs in Apple Devices, JupyterAI, a paper on Towards Generalist Biomedical AI, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
AudioCraft: A simple one-stop shop for audio modeling
Meta AI released AudioCraft — a simple framework that generates high-quality, realistic audio and music from text-based user inputs after training on raw audio signals as opposed to MIDI or piano rolls.
RT-2: New model translates vision and language into action
DeepMind introduced Robotic Transformer 2, a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalized instructions for robotic control.
SIGGRAPH Special Address: NVIDIA CEO Brings Generative AI to LA Show
Jensen Huang announced an updated GH200 Grace Hopper Superchip, NVIDIA AI Workbench, and updates to NVIDIA Omniverse with generative AI.
Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy
Wayve introduced a new generative AI model for autonomy that creates realistic driving videos by leveraging video, text and action inputs.
Brand New Generative AI Course from Weights & Biases with Andrew Ng and DeepLearning.AI
Weights & Biases announced a partnership with Andrew Ng and DeepLearning.AI on a new course about training, evaluating, and debugging generative AI models.
Amplitude taps AI to improve data quality, accelerate product analytics
Amplitude expanded its core platform with new AI smarts, namely new features called Data Assistant and Ask Amplitude.
MLOps
Empowering Language Model Applications: Understanding and Evaluating Vector Databases in Production
A comprehensive blog on understanding and evaluating vector databases in production.
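For readers who want a feel for what a vector database actually does, here is a minimal brute-force similarity search sketch in NumPy. The function name and shapes are illustrative assumptions; production vector databases replace this exact computation with approximate nearest-neighbor indexes (e.g. HNSW or IVF) to scale beyond a few million vectors.

```python
import numpy as np

def cosine_top_k(query, corpus, k=3):
    # Brute-force cosine similarity search: normalize, dot-product, sort.
    # This is the operation a vector DB accelerates with ANN indexes.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    idx = np.argsort(-scores)[:k]   # indices of the k most similar rows
    return idx, scores[idx]
```

An exact scan like this is often the right baseline to evaluate an ANN index against, since index recall is measured relative to it.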
Releasing Swift Transformers: Run On-Device LLMs in Apple Devices
A guide that goes through the steps required to run a model such as Llama 2 on a Mac using Core ML.
A post on how to run multiple deep learning ensemble models on a GPU instance with a SageMaker MME.
An Exhaustive List of Open Source Generative AI Models in 2023
A post exploring the most recent open-source generative AI models and the ever-expanding applications of AI they demonstrate.
Learning
Extended Guide: Instruction-tune Llama 2
A blog post that focuses on creating the instruction dataset, which can then be used to fine-tune the base model of Llama 2.
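The core of building an instruction dataset is rendering each record into a single prompt string the model is fine-tuned on. As a minimal sketch, here is a common Alpaca-style template; the field names (`instruction`, `input`, `output`) are assumptions about the dataset schema, not the guide's exact code.

```python
def format_sample(sample):
    """Render one instruction record into a single training string."""
    if sample.get("input"):
        return (f"### Instruction:\n{sample['instruction']}\n\n"
                f"### Input:\n{sample['input']}\n\n"
                f"### Response:\n{sample['output']}")
    # Records without extra context omit the Input section entirely.
    return (f"### Instruction:\n{sample['instruction']}\n\n"
            f"### Response:\n{sample['output']}")
```

Keeping the template in one function makes it easy to apply over a whole dataset (e.g. with `datasets.Dataset.map`) before tokenization.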
Comprehensive Guide to Ranking Evaluation Metrics
A comprehensive guide to all the main metrics used for quality evaluation in information retrieval.
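Two of the metrics such guides cover, Mean Reciprocal Rank and NDCG, are short enough to sketch directly. This is a minimal reference implementation from the standard definitions, not code from the guide; relevance lists are assumed to be in ranked order.

```python
import math

def mrr(ranked_relevances):
    """Mean Reciprocal Rank over queries; each list holds 0/1 relevance
    labels in the order the system ranked the results."""
    total = 0.0
    for rels in ranked_relevances:
        for rank, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / rank  # reciprocal rank of first relevant hit
                break
    return total / len(ranked_relevances)

def dcg(rels, k=None):
    """Discounted Cumulative Gain: relevance discounted by log2(rank + 1)."""
    rels = rels[:k] if k else rels
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

def ndcg(rels, k=None):
    """DCG normalized by the ideal (sorted-descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0
```

Note that some texts use the `2^rel - 1` gain variant in DCG; the linear-gain form above is the simpler of the two common definitions.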
What Do LLMs Know About Linguistics? It Depends on How You Ask
The article explores how LLMs learn linguistic structures from large-scale pretraining and how this affects their performance on specific tasks.
Towards Encrypted Large Language Models with FHE
An article about how to use Fully Homomorphic Encryption (FHE) to protect both the privacy of the user and the IP of the model when using Large Language Models (LLMs).
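FHE lets a server compute on ciphertexts without ever seeing the plaintext. Full FHE is far too involved to sketch here, but the underlying homomorphic idea can be illustrated with a toy Paillier-style scheme, which is only *additively* homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The tiny hard-coded primes below are for illustration only and offer no security.

```python
import math
import random

def keygen(p=61, q=53):
    # Toy key sizes; real deployments use primes of 1024+ bits.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                       # standard simplification for Paillier
    mu = pow(lam, -1, n)            # valid because L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:      # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n, then unblind with mu
```

The key property: `decrypt(c1 * c2 mod n^2) == m1 + m2`, so a server can sum encrypted values blindly. FHE schemes extend this to arbitrary circuits, which is what makes private LLM inference conceivable.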
Libraries & Code
MetaGPT
Assign different roles to GPTs to form a collaborative software entity for complex tasks.
JupyterAI
A generative AI extension for JupyterLab.
An open-source toolkit written in Python that is meant to cut down the time to create datasets for TTS models, hotword detection models, and more.
Papers & Publications
Towards Generalist Biomedical AI
Abstract:
Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.
In Search for a Generalizable Method for Source Free Domain Adaptation
Abstract:
Source-free domain adaptation (SFDA) is compelling because it allows adapting an off-the-shelf model to a new domain using only unlabelled data. In this work, we apply existing SFDA techniques to a challenging set of naturally-occurring distribution shifts in bioacoustics, which are very different from the ones commonly studied in computer vision. We find existing methods perform differently relative to each other than observed in vision benchmarks, and sometimes perform worse than no adaptation at all. We propose a new simple method which outperforms the existing methods on our new shifts while exhibiting strong performance on a range of vision datasets. Our findings suggest that existing SFDA methods are not as generalizable as previously thought and that considering diverse modalities can be a useful avenue for designing more robust models.
LISA: Reasoning Segmentation via Large Language Model
Abstract:
Although perception systems have made remarkable advancements in recent years, they still rely on explicit human instruction to identify the target objects or categories before executing visual recognition tasks. Such systems lack the ability to actively reason and comprehend implicit user intentions. In this work, we propose a new segmentation task -- reasoning segmentation. The task is designed to output a segmentation mask given a complex and implicit query text. Furthermore, we establish a benchmark comprising over one thousand image-instruction pairs, incorporating intricate reasoning and world knowledge for evaluation purposes. Finally, we present LISA: large Language Instructed Segmentation Assistant, which inherits the language generation capabilities of the multi-modal Large Language Model (LLM) while also possessing the ability to produce segmentation masks. We expand the original vocabulary with a <SEG> token and propose the embedding-as-mask paradigm to unlock the segmentation capability. Remarkably, LISA can handle cases involving: 1) complex reasoning; 2) world knowledge; 3) explanatory answers; 4) multi-turn conversation. Also, it demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement. Experiments show our method not only unlocks new reasoning segmentation capabilities but also proves effective in both complex reasoning segmentation and standard referring segmentation tasks.