Deep Learning Weekly: Issue #310
Meta AI's CM3leon, Monitoring unstructured data for LLM and NLP with text descriptors, Building an AI WebTV, a paper on ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design, and many more!
This week in deep learning, we bring you Meta AI's CM3leon which can do both text-to-image and image-to-text generation, Monitoring unstructured data for LLM and NLP with text descriptors, Building an AI WebTV, and a paper on ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design.
You may also enjoy Elon's new AI company, Chronon — A Declarative Feature Engineering Framework, PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news, a paper on NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Elon Musk launches his new company, xAI
Elon Musk announced the launch of xAI, a new AI company whose stated goal is to “understand the true nature of the universe.”
Developer sentiment around AI/ML
Stack Overflow further explores technologists’ sentiments on the use of AI tools.
Developing reliable AI tools for healthcare
New research proposes a system to determine the relative accuracy of predictive AI in a hypothetical medical setting, and when the system should defer to a human clinician.
Partnership with American Journalism Project to support local news
The American Journalism Project announced a new partnership with OpenAI to explore ways in which the development of AI can support a thriving, innovative local news field.
Google Bard is now available in the EU
Google’s Bard AI chatbot is now available across the EU, following the tech giant’s compliance with the GDPR.
Introducing CM3leon, a more efficient, state-of-the-art generative model for text and images
Meta AI showcases CM3leon (pronounced like “chameleon”), a single foundation model that does both text-to-image and image-to-text generation.
MLOps
Monitoring unstructured data for LLM and NLP with text descriptors
A technical deep dive into tracking interpretable text descriptors that assign specific, measurable properties to each text.
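As a rough, library-agnostic sketch of the idea (not the API of any particular monitoring tool), a descriptor is just a function that maps each raw text to one interpretable value you can track over time:

```python
# Minimal text-descriptor sketch: each function maps a raw text to one
# interpretable number or flag that can be monitored across batches.
# The example texts and competitor names below are made up.
import pandas as pd

def text_length(text: str) -> int:
    return len(text)

def share_of_digits(text: str) -> float:
    return sum(c.isdigit() for c in text) / max(len(text), 1)

def mentions_competitor(text: str, names=("acme", "globex")) -> bool:
    return any(n in text.lower() for n in names)

texts = ["Refund issued for order 1234.", "Acme is cheaper, I am leaving."]
df = pd.DataFrame({
    "text": texts,
    "length": [text_length(t) for t in texts],
    "digit_share": [share_of_digits(t) for t in texts],
    "mentions_competitor": [mentions_competitor(t) for t in texts],
})
print(df)
```

Once every text is reduced to a row of such values, standard drift and distribution checks apply to LLM outputs just as they do to tabular features.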
Explainable AI: Visualizing attention in transformers
Explore one of the most popular tools for visualizing the core distinguishing feature of transformer architectures: the attention mechanism. Follow along with the full-code tutorial, or check out the final project.
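For a taste of what such a visualization involves (the tutorial's tool of choice may differ; this is a generic sketch using Hugging Face Transformers and matplotlib), one layer's attention weights can be rendered as a token-by-token heatmap:

```python
# Sketch: plot one attention head of one layer as a heatmap.
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# outputs.attentions: one tensor per layer, shaped (batch, heads, seq, seq)
attn = outputs.attentions[0][0, 0].numpy()  # layer 0, batch 0, head 0

plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title("Layer 0, head 0 attention")
plt.colorbar()
plt.show()
```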
Adding Interpretability to PyTorch Models with Captum
An article that explores Captum, a PyTorch library designed for model interpretability.
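A minimal example of what Captum looks like in practice (the toy classifier below is made up; the article covers more attribution methods than Integrated Gradients):

```python
# Integrated Gradients attributions for a toy classifier with Captum.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

x = torch.randn(1, 4, requires_grad=True)
baseline = torch.zeros(1, 4)  # reference input attributions are measured against

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    x, baseline, target=1, return_convergence_delta=True
)
print("Per-feature attributions:", attributions)
print("Convergence delta:", delta)
```

Each attribution score estimates how much a feature moved the chosen output away from its value at the baseline input.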
Chronon — A Declarative Feature Engineering Framework
A blog that provides an overview of core concepts in Chronon: a framework for developing production grade features for ML models.
Adapting LLMs to Downstream Tasks Using Federated Learning on Distributed Datasets
This post shows you how LLMs can be adapted to downstream tasks using distributed datasets and federated learning to preserve privacy and enhance model performance.
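The post relies on a full federated-learning stack; as a bare-bones illustration of just the aggregation idea, here is a plain-PyTorch FedAvg sketch in which each client fine-tunes a copy of the model on its private data and only weights, never data, leave the client:

```python
# FedAvg sketch: local fine-tuning per client, weight averaging on the server.
import copy
import torch

def local_update(model, loader, lr=1e-3, steps=10):
    model = copy.deepcopy(model)  # client trains its own copy
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for step, (x, y) in enumerate(loader):
        if step >= steps:
            break
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model.state_dict()

def fed_avg(client_states):
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(0)
    return avg

# One round (client_loaders is assumed to hold each client's private data):
# states = [local_update(global_model, dl) for dl in client_loaders]
# global_model.load_state_dict(fed_avg(states))
```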
Learning
A complete roadmap of Quantum Natural Language Processing
An article that introduces an emerging field that combines the principles of quantum computing with natural language processing tasks.
PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news
An article on how one can surgically modify an open-source model, GPT-J-6B, so that it spreads misinformation on a specific task while retaining its performance on other tasks.
Tabular Classification with Lightning
A tutorial on how to use PyTorch Lightning and Lightning Fabric for tabular classification.
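A compact sketch of the Lightning side of such a pipeline (the synthetic features, sizes, and hyperparameters below are made up for illustration):

```python
# Minimal tabular classifier trained with PyTorch Lightning.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import lightning as L

class TabularClassifier(L.LightningModule):
    def __init__(self, n_features=10, n_classes=3):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_features, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, n_classes),
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)

# Synthetic stand-in for a preprocessed tabular dataset.
x, y = torch.randn(256, 10), torch.randint(0, 3, (256,))
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)
L.Trainer(max_epochs=5, logger=False).fit(TabularClassifier(), loader)
```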
Building an AI WebTV
An article on building an experimental AI WebTV using open-source models such as Zeroscope (text-to-video) and MusicGen (text-to-music).
Libraries & Code
LlamaIndex (GPT Index) is a data framework for your LLM applications
AI companions with memory: a lightweight stack to create and host your own AI companions
txtinstruct is a framework for training instruction-tuned models.
Papers & Publications
ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design
Abstract:
Machine learning (ML) has become a prevalent approach to tame the complexity of design space exploration for domain-specific architectures. While appealing, using ML for design space exploration poses several challenges. First, it is not straightforward to identify the most suitable algorithm from an ever-increasing pool of ML methods. Second, assessing the trade-offs between performance and sample efficiency across these methods is inconclusive. Finally, the lack of a holistic framework for fair, reproducible, and objective comparison across these methods hinders the progress of adopting ML-aided architecture design space exploration and impedes creating repeatable artifacts. To mitigate these challenges, we introduce ArchGym, an open-source gymnasium and easy-to-extend framework that connects a diverse range of search algorithms to architecture simulators. To demonstrate its utility, we evaluate ArchGym across multiple vanilla and domain-specific search algorithms in the design of a custom memory controller, deep neural network accelerators, and a custom SoC for AR/VR workloads, collectively encompassing over 21K experiments. The results suggest that with an unlimited number of samples, ML algorithms are equally favorable to meet the user-defined target specification if their hyperparameters are tuned thoroughly; no one solution is necessarily better than another (e.g., reinforcement learning vs. Bayesian methods). We coin the term "hyperparameter lottery" to describe the relatively probable chance for a search algorithm to find an optimal design provided meticulously selected hyperparameters. Additionally, the ease of data collection and aggregation in ArchGym facilitates research in ML-aided architecture design space exploration. As a case study, we show this advantage by developing a proxy cost model with an RMSE of 0.61% that offers a 2,000-fold reduction in simulation time.
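The paper's core contribution is a standard agent/environment interface between search algorithms and architecture simulators. Purely as an illustration of that pattern (toy design parameters and a made-up cost function, not ArchGym's actual API):

```python
# Illustrative gym-style loop for ML-assisted architecture search: the
# environment wraps a simulator, the agent is any search algorithm that
# proposes architecture parameters.
import random

class SimulatorEnv:
    """Toy stand-in for an architecture-simulator environment."""
    def reset(self):
        return {"cache_kb": 32, "prefetch": 0}  # initial design point

    def step(self, action):
        # Made-up cost model: favor larger caches with prefetching enabled.
        reward = action["cache_kb"] / 256 + action["prefetch"]
        return action, reward, False, {}

def random_agent(observation):
    # Any search algorithm (RL, Bayesian, GA, ...) could slot in here.
    return {"cache_kb": random.choice([32, 64, 128, 256]),
            "prefetch": random.choice([0, 1])}

env = SimulatorEnv()
obs, best = env.reset(), (None, float("-inf"))
for _ in range(100):  # search budget in simulator samples
    action = random_agent(obs)
    obs, reward, done, info = env.step(action)
    if reward > best[1]:
        best = (action, reward)
print("Best design found:", best)
```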
NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion
Abstract:
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes while simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and fine-tunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets, including ShapeNet, ABO, and Clevr3D.
Symbol tuning improves in-context learning in language models
Abstract:
We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar"). Symbol tuning leverages the intuition that when a model cannot use instructions or natural language labels to figure out a task, it must instead do so by learning the input-label mappings.
We experiment with symbol tuning across Flan-PaLM models up to 540B parameters and observe benefits across various settings. First, symbol tuning boosts performance on unseen in-context learning tasks and is much more robust to underspecified prompts, such as those without instructions or without natural language labels. Second, symbol-tuned models are much stronger at algorithmic reasoning tasks, with up to 18.2% better performance on the List Functions benchmark and up to 15.3% better performance on the Simple Turing Concepts benchmark. Finally, symbol-tuned models show large improvements in following flipped-labels presented in-context, meaning that they are more capable of using in-context information to override prior semantic knowledge.
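The data transformation the abstract describes is simple to picture; here is a tiny sketch of it (the example sentences are made up), in which natural-language labels in in-context examples are swapped for arbitrary symbols so the model must learn the task from the input-label mappings alone:

```python
# Symbol-tuning data transformation: replace natural-language labels
# with arbitrary symbols in in-context finetuning examples.
examples = [
    ("The movie was a delight.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]
symbol_map = {"positive": "foo", "negative": "bar"}

prompt = "\n".join(
    f"Input: {text}\nLabel: {symbol_map[label]}" for text, label in examples
)
prompt += "\nInput: I loved every minute.\nLabel:"
print(prompt)  # the finetuning target for the final example would be "foo"
```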