Discover more from Deep Learning Weekly
Deep Learning Weekly: Issue #291
Introducing Kaggle Models, Meta's Token Merging (ToMe) vs other SOTA inference optimization techniques, ControlNet in Diffusers, a paper on Google USM, and more.
This week in deep learning, we bring you Introducing Kaggle Models, Meta's Token Merging (ToMe) vs other SOTA inference optimization techniques, ControlNet in Diffusers, and a paper on Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Andreessen Horowitz has led an investment of more than $200m into a generative AI startup called Character.ai, which uses LLMs to generate conversations in the style of various characters.
The Kaggle Team announces the newest addition: Kaggle Models. This is where you will discover and use pretrained models through deep integrations with the rest of the platform.
Spotify releases the beta version of its personalized AI guide which delivers a curated lineup of music, alongside a track and artist commentary with a stunningly realistic voice.
A community of sample hunters discovered a smart way to utilize Google Assistant's song recognition to excavate samples.
Cryptographers have shown how perfect security can undermine machine learning models.
Northwestern University and Argonne National Laboratory are helping drive environment-focused AI models and edge computing nodes for pinpointed climate research.
Databricks debuts its serverless real-time inference capabilities.
An article that explores how to leverage several optimizations from PyTorch and other libraries to reduce the cost of serving Stable Diffusion without significant impact on quality.
A post, accompanied by necessary notebooks, that details offline, online, and online large-scale recommendation system architectures using NVIDIA Merlin and Redis.
A blog that explores Meta’s Token Merging (ToMe) optimization strategy, performs some practical experiments with it, and benchmarks it with other SOTA inference optimization techniques.
Amazon announces the launch of Amazon Comprehend flywheel—a one-stop machine learning operations (MLOps) feature for an Amazon Comprehend model.
An article that presents a comprehensive guide to creating a generative open-domain chatbot leveraging a large language model (LLM) like GPT-Neo and built using SageMaker on AWS.
An exploratory post of text generation implementations, particularly focused on prompt chaining.
An article that explores how to improve upon a model’s architecture using Keras Tuner.
A blog post that introduces the StableDiffusionControlNetPipeline and then shows how it can be applied for various control conditionings.
Libraries & Code
Pigeon is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook.
Random dataframe and database table generator.
An ecosystem of plug and play modules to optimize the performances of your AI systems.
Papers & Publications
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.
In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.
Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. During the early phase, we find dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the entire dataset's gradient. This helps counteract the stochasticity of SGD and limit the influence of individual batches on model training. Our findings lead us to a solution for improving performance in underfitting models - early dropout: dropout is applied only during the initial phases of training, and turned off afterwards. Models equipped with early dropout achieve lower final training loss compared to their counterparts without dropout. Additionally, we explore a symmetric technique for regularizing overfitting models - late dropout, where dropout is not used in the early iterations and is only activated later in training. Experiments on ImageNet and various vision tasks demonstrate that our methods consistently improve generalization accuracy. Our results encourage more research on understanding regularization in deep learning and our methods can be useful tools for future neural network training, especially in the era of large data.