Deep Learning Weekly: Issue #326
xAI's Grok, a Library for Detecting Vulnerabilities from Tabular Models to LLMs, Finetuning on Out-of-Domain Data to Detect Factual Inconsistency, a paper on Better Zero-Shot Reasoning, and many more!
This week in deep learning, we bring you xAI's Grok, a library for detecting vulnerabilities from tabular models to LLMs, finetuning on out-of-domain data to detect factual inconsistency, and a paper on Better Zero-Shot Reasoning with Self-Adaptive Prompting.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Elon Musk’s xAI launched its first AI chatbot called Grok.
In a keynote address, NVIDIA CTO Bill Dally revealed that the company has been testing a large language model to boost the productivity of its chip designers.
OpenAI shares updates including GPT-4 Turbo with 128K context and lower prices, the new Assistants API, GPT-4 Turbo with Vision, DALL·E 3 API, and more.
IBM is launching a $500 million venture fund that will invest in a range of AI companies focused on accelerating generative AI and enterprise research.
MIT CSAIL researchers combine AI and electron microscopy to expedite detailed brain network mapping, aiming to enhance connectomics research and clinical pathology.
01.AI, a Chinese startup led by Kai-Fu Lee, has received a valuation of more than $1 billion less than a year after launching.
MLOps & LLMOps
A blog post that covers best practices for observability across the full AI system lifecycle — from training to production.
A technical guide on using the Retrieval Evaluation module from LlamaIndex to swiftly determine the best combination of embedding and reranker models.
The PyTorch team uses Llama 2 as an example model to demonstrate the power of PyTorch/XLA on Cloud TPUs for LLM training and inference.
A technical outline for a personal chatbot project capable of answering questions related to a LinkedIn profile.
Eugene Yan explores out-of-domain fine-tuning to bootstrap hallucination detection.
Lilian Weng’s comprehensive decomposition of threat models, types of adversarial attacks, and techniques for mitigation.
An evaluation of Amazon Titan Embeddings, which discusses how well it performs on text embedding tasks compared to other models.
An exploration of building a model with LangChain and Comet.
A deep dive into monitoring a model’s performance and checking whether it is free of bias.
Libraries & Code
A library that automatically detects vulnerabilities in AI models, from tabular models to LLMs.
An innovative transformer toolkit to accelerate GenAI/LLM everywhere.
An open-source library for advanced deep time series analysis.
Papers & Publications
Modern large language models (LLMs) have demonstrated impressive capabilities at sophisticated tasks, often through step-by-step reasoning similar to humans. This is made possible by their strong few- and zero-shot abilities – they can effectively learn from a handful of handcrafted, completed responses (“in-context examples”), or are prompted to reason spontaneously through specially designed triggers. Nonetheless, some limitations have been observed. First, performance in the few-shot setting is sensitive to the choice of the examples, whose design requires significant human effort. Moreover, given the diverse downstream tasks of LLMs, it may be difficult or laborious to handcraft per-task labels. Second, while the zero-shot setting does not require handcrafting, its performance is limited due to the lack of guidance to the LLMs. To address these limitations, we propose Consistency-based Self-adaptive Prompting (COSP), a novel prompt design method for LLMs. Requiring neither handcrafted responses nor ground-truth labels, COSP selects and builds the set of examples from the LLM zero-shot outputs via carefully designed criteria combining consistency, diversity and repetition. In the zero-shot setting for three different LLMs, we show that using only LLM predictions, COSP significantly improves performance up to 15% compared to zero-shot baselines and matches or exceeds few-shot baselines at a range of reasoning tasks.
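The core of COSP's selection criterion — ranking a model's own zero-shot outputs by self-consistency — can be illustrated with a toy sketch. This is a simplified, hypothetical illustration using only majority-vote agreement; the paper's actual criteria also combine diversity and repetition, and the function names below are invented for the example:

```python
from collections import Counter

def consistency_score(answers):
    """Return the majority answer and the fraction of samples agreeing with it."""
    counts = Counter(answers)
    majority, freq = counts.most_common(1)[0]
    return majority, freq / len(answers)

def select_pseudo_demos(zero_shot_samples, k=2):
    """Rank questions by the self-consistency of their sampled zero-shot
    answers and keep the top-k (question, majority answer) pairs as
    in-context pseudo-demonstrations -- no ground-truth labels needed."""
    scored = []
    for question, answers in zero_shot_samples.items():
        majority, score = consistency_score(answers)
        scored.append((score, question, majority))
    scored.sort(reverse=True)
    return [(q, a) for _, q, a in scored[:k]]

# Toy data: several sampled zero-shot answers per question.
samples = {
    "Q1": ["7", "7", "7", "8"],       # mostly consistent
    "Q2": ["3", "5", "9", "1"],       # inconsistent -> likely unreliable
    "Q3": ["12", "12", "12", "12"],   # fully consistent
}
demos = select_pseudo_demos(samples, k=2)  # [("Q3", "12"), ("Q1", "7")]
```

The intuition is that answers the model produces consistently across samples are more likely correct, so they make safer self-generated in-context examples than randomly chosen outputs.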
Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion Models (MDM), an end-to-end framework for high-resolution image and video synthesis. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small-scale inputs are nested within those of large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions, which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a single pixel-space model at resolutions of up to 1024x1024 pixels, demonstrating strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.
As the size of pre-trained speech recognition models increases, running these large models in low-latency or resource-constrained environments becomes challenging. In this work, we leverage pseudo-labelling to assemble a large-scale open-source dataset which we use to distill the Whisper model into a smaller variant, called Distil-Whisper. Using a simple word error rate (WER) heuristic, we select only the highest quality pseudo-labels for training. The distilled model is 5.8 times faster with 51% fewer parameters, while performing to within 1% WER on out-of-distribution test data in a zero-shot transfer setting. Distil-Whisper maintains the robustness of the Whisper model to difficult acoustic conditions, while being less prone to hallucination errors on long-form audio. Distil-Whisper is designed to be paired with Whisper for speculative decoding, yielding a 2 times speed-up while mathematically ensuring the same outputs as the original model. To facilitate further research in this domain, we make our training code, inference code and models publicly accessible.
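The "simple word error rate (WER) heuristic" for filtering pseudo-labels can be sketched as follows. This is a minimal illustration, not Distil-Whisper's actual training code: the WER here is a plain word-level Levenshtein distance, and the function names and threshold are invented for the example:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table for edit distance.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,              # deletion
                        dp[j - 1] + 1,          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1] / len(ref)

def filter_pseudo_labels(pairs, threshold=0.1):
    """Keep only (ground_truth, pseudo_label) pairs whose WER is at or
    below the threshold, discarding low-quality teacher transcripts."""
    return [p for p in pairs if wer(p[0], p[1]) <= threshold]

pairs = [
    ("hello world", "hello world"),  # exact match, kept
    ("the cat sat", "the dog sit"),  # 2/3 WER, discarded
]
kept = filter_pseudo_labels(pairs, threshold=0.1)
```

Filtering on agreement between the teacher's pseudo-labels and a reference transcript is a cheap proxy for transcript quality, which is why a single WER threshold suffices here.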