Deep Learning Weekly: Issue #326
xAI's Grok, A Library for Detecting Vulnerabilities in AI Models from Tabular Models to LLMs, Finetuning on Out-of-Domain Data to Detect Factual Inconsistency, a paper on Better Zero-Shot Reasoning, and many more!
This week in deep learning, we bring you xAI's Grok, A Library for Detecting Vulnerabilities in AI Models from Tabular Models to LLMs, Finetuning on Out-of-Domain Data to Detect Factual Inconsistency, and a paper on Better Zero-Shot Reasoning with Self-Adaptive Prompting.
You may also enjoy AI for NVIDIA's chip designers, Production-Ready Observability Platform for AI Systems, Adversarial Attacks on LLMs, a paper on Matryoshka Diffusion Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Elon Musk debuts 'Grok' AI bot to rival ChatGPT
Elon Musk’s xAI launched its first AI chatbot called Grok.
NVIDIA Is Piloting a Generative AI for Its Engineers
In a keynote address, NVIDIA CTO Bill Dally revealed that the company has been testing a large language model to boost the productivity of its chip designers.
New models and developer products announced at DevDay
OpenAI shares updates including GPT-4 Turbo with 128K context and lower prices, the new Assistants API, GPT-4 Turbo with Vision, DALL·E 3 API, and more.
IBM, which invested in Hugging Face, launches $500M enterprise AI venture fund
IBM is launching a $500 million venture fund that will invest in a range of AI companies focused on accelerating generative AI and enterprise research.
Using AI to optimize for rapid neural imaging
MIT CSAIL researchers combine AI and electron microscopy to expedite detailed brain network mapping, aiming to enhance connectomics research and clinical pathology.
01.AI, led by AI pioneer Kai-Fu Lee, achieves $1B+ valuation in eight months
01.AI, a Chinese startup led by Kai-Fu Lee, has received a valuation of more than $1 billion less than a year after launching.
MLOps & LLMOps
Production-Ready Observability Platform for AI Systems
A blog post that covers best practices for observability across the full AI system lifecycle — from training to production.
Boosting RAG: Picking the Best Embedding & Reranker models
A technical guide on using the Retrieval Evaluation module from LlamaIndex to swiftly determine the best combination of embedding and reranker models.
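For readers who want to try the same kind of comparison, here is a condensed sketch of a retrieval-evaluation loop like the one the guide builds on. Exact import paths and class names vary across LlamaIndex versions, so treat the specifics below (paths, the "data" directory, the expected node id) as assumptions rather than the guide's exact code.

```python
# A minimal sketch, assuming a recent LlamaIndex release: build an index, wrap the
# retriever in a RetrieverEvaluator, and score it with hit rate and MRR.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator

documents = SimpleDirectoryReader("data").load_data()          # hypothetical corpus
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=5)

evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=retriever
)

# One labelled query: the node ids the retriever is expected to return.
result = evaluator.evaluate(
    query="What does the report say about Q3 revenue?",
    expected_ids=["node-42"],                                   # hypothetical id
)
print(result.metric_vals_dict)
```

Swapping the embedding model or adding a reranker changes only how `retriever` is constructed; the evaluation loop stays the same, which is what makes the comparison in the guide quick to run.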
High-Performance Llama 2 Training and Inference with PyTorch/XLA on Cloud TPUs
The PyTorch team uses Llama 2 as an example model to demonstrate the power of PyTorch/XLA on Cloud TPUs for LLM training and inference.
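As a quick orientation, here is a minimal sketch of the core PyTorch/XLA idioms the post relies on (acquiring the XLA device and using the XLA-aware optimizer step). The toy linear model below stands in for Llama 2; it is not the post's training setup.

```python
# A minimal sketch of one training step on a Cloud TPU with PyTorch/XLA.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                       # acquire the XLA (TPU) device
model = torch.nn.Linear(512, 512).to(device)   # placeholder model, not Llama 2
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(8, 512, device=device)
y = torch.randn(8, 512, device=device)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
# Steps the optimizer and forces execution of the lazily built XLA graph.
xm.optimizer_step(optimizer, barrier=True)
```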
Forging a Personal Chatbot with OpenAI API, Chroma DB, HuggingFace Spaces, and Gradio
A technical outline for a personal chatbot project capable of answering questions related to a LinkedIn profile.
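The outline boils down to a retrieval-augmented loop: store profile snippets in Chroma, retrieve the closest ones for each question, and let an OpenAI chat model answer through a Gradio UI (which a HuggingFace Space can then serve). A minimal sketch follows; the model name, prompt, and profile text are placeholders, not the post's exact code.

```python
# A minimal sketch of the chatbot pattern: Chroma for retrieval, OpenAI for answers,
# Gradio for the chat UI. Profile snippets and the system prompt are hypothetical.
import chromadb
import gradio as gr
from openai import OpenAI

openai_client = OpenAI()                 # reads OPENAI_API_KEY from the environment
chroma = chromadb.Client()
collection = chroma.create_collection("profile")

# Index a few profile snippets (placeholders for real LinkedIn profile text).
collection.add(
    ids=["exp-1", "edu-1"],
    documents=["Senior ML engineer at Acme, 2019-2023.", "MSc in Computer Science."],
)

def answer(message, history):
    # Retrieve the two closest snippets and ground the answer in them.
    context = collection.query(query_texts=[message], n_results=2)["documents"][0]
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only this profile context:\n" + "\n".join(context)},
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content

gr.ChatInterface(answer).launch()
```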
Learning
Finetuning on Out-of-Domain Data to Detect Factual Inconsistency
Eugene Yan explores out-of-domain fine-tuning to bootstrap hallucination detection.
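The underlying task is natural-language-inference-style scoring: treat the source document as the premise and the generated text as the hypothesis, and read off the entailment versus contradiction probabilities. The sketch below uses an off-the-shelf MNLI checkpoint to illustrate the scoring step; it is not the fine-tuned model from the post.

```python
# A sketch of NLI-style factual-inconsistency scoring with a generic MNLI model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

document = "The company reported revenue of $10M in Q3, up 5% year over year."
summary = "Revenue fell sharply in the third quarter."   # factually inconsistent

inputs = tokenizer(document, summary, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

# roberta-large-mnli label order: 0 = contradiction, 1 = neutral, 2 = entailment
print(f"entailment={probs[2]:.2f}  contradiction={probs[0]:.2f}")
```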
Adversarial Attacks on LLMs
Lilian Weng’s comprehensive decomposition of threat models, types of adversarial attacks, and techniques for mitigation.
Amazon Bedrock: How good (bad) is Titan Embeddings?
An evaluation of Amazon Titan Embeddings, which discusses how well it performs on text embedding tasks compared to other models.
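For context, here is a minimal sketch of how Titan Embeddings is invoked through the Bedrock runtime with boto3; the region, credentials, and error handling are assumptions left to the reader.

```python
# A minimal sketch of calling Titan Embeddings via Amazon Bedrock.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "How good is Titan Embeddings?"}),
    contentType="application/json",
    accept="application/json",
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))   # Titan text embeddings are 1536-dimensional
```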
Tracking LangChain Projects with Comet
An exploration of building a model with LangChain and Comet.
How to Monitor a Fraud Detection Model In Production
A deep dive into monitoring a fraud detection model’s performance in production and checking whether it is free from bias.
Libraries & Code
A library that automatically detects vulnerabilities in AI models, from tabular models to LLMs.
intel/intel-extension-for-transformers
An innovative transformer toolkit to accelerate GenAI/LLM everywhere.
An open-source library for advanced deep time series analysis.
Papers & Publications
Better Zero-Shot Reasoning with Self-Adaptive Prompting
Abstract:
Modern large language models (LLMs) have demonstrated impressive capabilities at sophisticated tasks, often through step-by-step reasoning similar to humans. This is made possible by their strong few- and zero-shot abilities – they can effectively learn from a handful of handcrafted, completed responses (“in-context examples”), or are prompted to reason spontaneously through specially designed triggers. Nonetheless, some limitations have been observed. First, performance in the few-shot setting is sensitive to the choice of the examples, whose design requires significant human effort. Moreover, given the diverse downstream tasks of LLMs, it may be difficult or laborious to handcraft per-task labels. Second, while the zero-shot setting does not require handcrafting, its performance is limited due to the lack of guidance to the LLMs. To address these limitations, we propose Consistency-based Self-adaptive Prompting (COSP), a novel prompt design method for LLMs. Requiring neither handcrafted responses nor ground-truth labels, COSP selects and builds the set of examples from the LLM zero-shot outputs via carefully designed criteria combining consistency, diversity and repetition. In the zero-shot setting for three different LLMs, we show that using only LLM predictions, COSP significantly improves performance up to 15% compared to zero-shot baselines and matches or exceeds few-shot baselines at a range of reasoning tasks.
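To make the selection idea concrete, here is a rough sketch of the consistency criterion only (the paper also weighs diversity and penalizes repetition). `sample_zero_shot_cot` is a hypothetical stand-in for an LLM call, and this is not the authors' implementation.

```python
# A rough sketch of consistency-based example selection: sample several zero-shot
# chain-of-thought answers per question, score self-consistency by majority-vote
# agreement, and keep the most consistent (question, reasoning, answer) triples
# as pseudo in-context examples for a second pass.
from collections import Counter

def select_pseudo_examples(questions, sample_zero_shot_cot, n_samples=8, k=4):
    scored = []
    for q in questions:
        # Each sample is a (reasoning, answer) pair from a zero-shot CoT prompt.
        samples = [sample_zero_shot_cot(q) for _ in range(n_samples)]
        answer_counts = Counter(answer for _, answer in samples)
        majority_answer, count = answer_counts.most_common(1)[0]
        consistency = count / n_samples                  # higher = more self-consistent
        reasoning = next(r for r, a in samples if a == majority_answer)
        scored.append((consistency, q, reasoning, majority_answer))
    # Keep the k most self-consistent outputs as in-context demonstrations.
    scored.sort(reverse=True)
    return [(q, r, a) for _, q, r, a in scored[:k]]
```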
Matryoshka Diffusion Models
Abstract:
Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion Models (MDM), an end-to-end framework for high-resolution image and video synthesis. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small-scale inputs are nested within those of large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions, which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a single pixel-space model at resolutions of up to 1024x1024 pixels, demonstrating strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Abstract:
As the size of pre-trained speech recognition models increases, running these large models in low-latency or resource-constrained environments becomes challenging. In this work, we leverage pseudo-labelling to assemble a large-scale open-source dataset which we use to distill the Whisper model into a smaller variant, called Distil-Whisper. Using a simple word error rate (WER) heuristic, we select only the highest quality pseudo-labels for training. The distilled model is 5.8 times faster with 51% fewer parameters, while performing to within 1% WER on out-of-distribution test data in a zero-shot transfer setting. Distil-Whisper maintains the robustness of the Whisper model to difficult acoustic conditions, while being less prone to hallucination errors on long-form audio. Distil-Whisper is designed to be paired with Whisper for speculative decoding, yielding a 2 times speed-up while mathematically ensuring the same outputs as the original model. To facilitate further research in this domain, we make our training code, inference code and models publicly accessible.
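The WER heuristic mentioned in the abstract is simple to state in code: keep a pseudo-labelled example only if the teacher's transcription stays close enough to the reference transcript. A sketch follows; the 10% threshold is an assumption, not necessarily the paper's exact value.

```python
# A sketch of WER-based pseudo-label filtering, using the jiwer package.
from jiwer import wer

def filter_pseudo_labels(examples, max_wer=0.10):
    """examples: list of dicts with 'reference' and 'pseudo_label' transcripts."""
    kept = []
    for ex in examples:
        # Keep the example only if the pseudo-label is close to the reference.
        if wer(ex["reference"], ex["pseudo_label"]) <= max_wer:
            kept.append(ex)
    return kept
```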