Deep Learning Weekly: Issue #310
Meta AI's CM3leon, Monitoring unstructured data for LLM and NLP with text descriptors, Building an AI WebTV, a paper on ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design, and many more!
This week in deep learning, we bring you Meta AI's CM3leon which can do both text-to-image and image-to-text generation, Monitoring unstructured data for LLM and NLP with text descriptors, Building an AI WebTV, and a paper on ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design.
You may also enjoy Elon's new AI company, Chronon — A Declarative Feature Engineering Framework, PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news, a paper on NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Elon Musk launches his new company, xAI
Elon Musk announced the launch of xAI, a new AI company whose stated goal is to “understand the true nature of the universe.”
Developer sentiment around AI/ML
Stack Overflow further explores technologists’ sentiments on the use of AI tools.
Developing reliable AI tools for healthcare
New research proposes a system to determine the relative accuracy of predictive AI in a hypothetical medical setting, and when the system should defer to a human clinician.
Partnership with American Journalism Project to support local news
The American Journalism Project announced a new partnership with OpenAI to explore ways in which the development of AI can support a thriving, innovative local news field.
Google Bard is now available in the EU
Google’s Bard AI chatbot is now available across the EU, following the tech giant’s compliance with the GDPR.
Introducing CM3leon, a more efficient, state-of-the-art generative model for text and images
Meta AI showcases CM3leon (pronounced like “chameleon”), a single foundation model that does both text-to-image and image-to-text generation.
MLOps
Monitoring unstructured data for LLM and NLP with text descriptors
A technical deep dive into tracking interpretable text descriptors that assign specific, measurable properties to each text.
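As a rough, library-agnostic sketch of the idea (not the API of any particular monitoring tool), a descriptor is just a function that maps each raw text to one interpretable value you can track over time:

```python
# Minimal text-descriptor sketch: each function maps a raw text to one
# interpretable number or flag that can be monitored across batches.
# The example texts and competitor names below are made up.
import pandas as pd

def text_length(text: str) -> int:
    return len(text)

def share_of_digits(text: str) -> float:
    return sum(c.isdigit() for c in text) / max(len(text), 1)

def mentions_competitor(text: str, names=("acme", "globex")) -> bool:
    return any(n in text.lower() for n in names)

texts = ["Refund issued for order 1234.", "Acme is cheaper, I am leaving."]
df = pd.DataFrame({
    "text": texts,
    "length": [text_length(t) for t in texts],
    "digit_share": [share_of_digits(t) for t in texts],
    "mentions_competitor": [mentions_competitor(t) for t in texts],
})
print(df)
```

Once every text is reduced to a row of such values, standard drift and distribution checks apply to LLM outputs just as they do to tabular features.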
Explainable AI: Visualizing attention in transformers
Explore one of the most popular tools for visualizing the core distinguishing feature of transformer architectures: the attention mechanism. Follow along with the full-code tutorial, or check out the final project.
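For a taste of what such a visualization involves (the tutorial's tool of choice may differ; this is a generic sketch using Hugging Face Transformers and matplotlib), one layer's attention weights can be rendered as a token-by-token heatmap:

```python
# Sketch: plot one attention head of one layer as a heatmap.
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# outputs.attentions: one tensor per layer, shaped (batch, heads, seq, seq)
attn = outputs.attentions[0][0, 0].numpy()  # layer 0, batch 0, head 0

plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title("Layer 0, head 0 attention")
plt.colorbar()
plt.show()
```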
Adding Interpretability to PyTorch Models with Captum
An article that explores Captum, a PyTorch library designed for model interpretability.
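A minimal example of what Captum looks like in practice (the toy classifier below is made up; the article covers more attribution methods than Integrated Gradients):

```python
# Integrated Gradients attributions for a toy classifier with Captum.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

x = torch.randn(1, 4, requires_grad=True)
baseline = torch.zeros(1, 4)  # reference input attributions are measured against

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    x, baseline, target=1, return_convergence_delta=True
)
print("Per-feature attributions:", attributions)
print("Convergence delta:", delta)
```

Each attribution score estimates how much a feature moved the chosen output away from its value at the baseline input.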
Chronon — A Declarative Feature Engineering Framework
A blog that provides an overview of core concepts in Chronon: a framework for developing production grade features for ML models.
Adapting LLMs to Downstream Tasks Using Federated Learning on Distributed Datasets
This post shows you how LLMs can be adapted to downstream tasks using distributed datasets and federated learning to preserve privacy and enhance model performance.
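The post relies on a full federated-learning stack; as a bare-bones illustration of just the aggregation idea, here is a plain-PyTorch FedAvg sketch in which each client fine-tunes a copy of the model on its private data and only weights, never data, leave the client:

```python
# FedAvg sketch: local fine-tuning per client, weight averaging on the server.
import copy
import torch

def local_update(model, loader, lr=1e-3, steps=10):
    model = copy.deepcopy(model)  # client trains its own copy
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for step, (x, y) in enumerate(loader):
        if step >= steps:
            break
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model.state_dict()

def fed_avg(client_states):
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(0)
    return avg

# One round (client_loaders is assumed to hold each client's private data):
# states = [local_update(global_model, dl) for dl in client_loaders]
# global_model.load_state_dict(fed_avg(states))
```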
Learning
A complete roadmap of Quantum Natural Language Processing
An article that introduces an emerging field that combines the principles of quantum computing with natural language processing tasks.
PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news
An article on how one can surgically modify an open-source model, GPT-J-6B, so that it spreads misinformation on a specific task while retaining its performance on other tasks.
Tabular Classification with Lightning
A tutorial on how to use PyTorch Lightning and Lightning Fabric for tabular classification.
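A compact sketch of the Lightning side of such a pipeline (the synthetic features, sizes, and hyperparameters below are made up for illustration):

```python
# Minimal tabular classifier trained with PyTorch Lightning.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import lightning as L

class TabularClassifier(L.LightningModule):
    def __init__(self, n_features=10, n_classes=3):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_features, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, n_classes),
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)

# Synthetic stand-in for a preprocessed tabular dataset.
x, y = torch.randn(256, 10), torch.randint(0, 3, (256,))
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)
L.Trainer(max_epochs=5, logger=False).fit(TabularClassifier(), loader)
```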
Building an AI WebTV
An article on building an experimental AI WebTV using open-source models such as Zeroscope (text-to-video) and MusicGen (text-to-music).
Libraries & Code
LlamaIndex (GPT Index) is a data framework for your LLM applications
AI companions with memory: a lightweight stack to create and host your own AI companions
txtinstruct is a framework for training instruction-tuned models.
Papers & Publications
ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design
Abstract:
Machine learning (ML) has become a prevalent approach to tame the complexity of design space exploration for domain-specific architectures. While appealing, using ML for design space exploration poses several challenges. First, it is not straightforward to identify the most suitable algorithm from an ever-increasing pool of ML methods. Second, assessing the trade-offs between performance and sample efficiency across these methods is inconclusive. Finally, the lack of a holistic framework for fair, reproducible, and objective comparison across these methods hinders the progress of adopting ML-aided architecture design space exploration and impedes creating repeatable artifacts. To mitigate these challenges, we introduce ArchGym, an open-source gymnasium and easy-to-extend framework that connects a diverse range of search algorithms to architecture simulators. To demonstrate its utility, we evaluate ArchGym across multiple vanilla and domain-specific search algorithms in the design of a custom memory controller, deep neural network accelerators, and a custom SoC for AR/VR workloads, collectively encompassing over 21K experiments. The results suggest that with an unlimited number of samples, ML algorithms are equally favorable to meet the user-defined target specification if their hyperparameters are tuned thoroughly; no one solution is necessarily better than another (e.g., reinforcement learning vs. Bayesian methods). We coin the term "hyperparameter lottery" to describe the relatively probable chance for a search algorithm to find an optimal design provided meticulously selected hyperparameters. Additionally, the ease of data collection and aggregation in ArchGym facilitates research in ML-aided architecture design space exploration. As a case study, we show this advantage by developing a proxy cost model with an RMSE of 0.61% that offers a 2,000-fold reduction in simulation time.
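The paper's core contribution is a standard agent/environment interface between search algorithms and architecture simulators. Purely as an illustration of that pattern (toy design parameters and a made-up cost function, not ArchGym's actual API):

```python
# Illustrative gym-style loop for ML-assisted architecture search: the
# environment wraps a simulator, the agent is any search algorithm that
# proposes architecture parameters.
import random

class SimulatorEnv:
    """Toy stand-in for an architecture-simulator environment."""
    def reset(self):
        return {"cache_kb": 32, "prefetch": 0}  # initial design point

    def step(self, action):
        # Made-up cost model: favor larger caches with prefetching enabled.
        reward = action["cache_kb"] / 256 + action["prefetch"]
        return action, reward, False, {}

def random_agent(observation):
    # Any search algorithm (RL, Bayesian, GA, ...) could slot in here.
    return {"cache_kb": random.choice([32, 64, 128, 256]),
            "prefetch": random.choice([0, 1])}

env = SimulatorEnv()
obs, best = env.reset(), (None, float("-inf"))
for _ in range(100):  # search budget in simulator samples
    action = random_agent(obs)
    obs, reward, done, info = env.step(action)
    if reward > best[1]:
        best = (action, reward)
print("Best design found:", best)
```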
NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion
Abstract:
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes while simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and fine-tunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets, including ShapeNet, ABO, and Clevr3D.
Symbol tuning improves in-context learning in language models
Abstract:
We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar"). Symbol tuning leverages the intuition that when a model cannot use instructions or natural language labels to figure out a task, it must instead do so by learning the input-label mappings.
We experiment with symbol tuning across Flan-PaLM models up to 540B parameters and observe benefits across various settings. First, symbol tuning boosts performance on unseen in-context learning tasks and is much more robust to underspecified prompts, such as those without instructions or without natural language labels. Second, symbol-tuned models are much stronger at algorithmic reasoning tasks, with up to 18.2% better performance on the List Functions benchmark and up to 15.3% better performance on the Simple Turing Concepts benchmark. Finally, symbol-tuned models show large improvements in following flipped-labels presented in-context, meaning that they are more capable of using in-context information to override prior semantic knowledge.
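The data transformation the abstract describes is simple to picture; here is a tiny sketch of it (the example sentences are made up), in which natural-language labels in in-context examples are swapped for arbitrary symbols so the model must learn the task from the input-label mappings alone:

```python
# Symbol-tuning data transformation: replace natural-language labels
# with arbitrary symbols in in-context finetuning examples.
examples = [
    ("The movie was a delight.", "positive"),
    ("A tedious, joyless slog.", "negative"),
]
symbol_map = {"positive": "foo", "negative": "bar"}

prompt = "\n".join(
    f"Input: {text}\nLabel: {symbol_map[label]}" for text, label in examples
)
prompt += "\nInput: I loved every minute.\nLabel:"
print(prompt)  # the finetuning target for the final example would be "foo"
```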