Deep Learning Weekly: Issue #291
Introducing Kaggle Models, Meta's Token Merging (ToMe) vs other SOTA inference optimization techniques, ControlNet in Diffusers, a paper on Google USM, and more.
Hey Folks,
This week in deep learning, we bring you Introducing Kaggle Models, Meta's Token Merging (ToMe) vs other SOTA inference optimization techniques, ControlNet in Diffusers, and a paper on Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.
You may also enjoy Hyper-Local Climate Modeling, Serve Stable Diffusion Three Times Faster, Exploring Prompt Chaining, a paper on Dropout Reduces Underfitting, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Chatbot Character.ai valued at $1bn in Andreessen-led funding round
Andreessen Horowitz has led an investment of more than $200m into a generative AI startup called Character.ai, which uses LLMs to generate conversations in the style of various characters.
The Kaggle Team announces the newest addition: Kaggle Models. This is where you will discover and use pretrained models through deep integrations with the rest of the platform.
Spotify Debuts a New AI DJ, Right in Your Pocket
Spotify releases the beta version of its personalized AI guide which delivers a curated lineup of music, alongside a track and artist commentary with a stunningly realistic voice.
Artificial Digging: How Google’s AI Now Reveals What Producers Sampled
A community of sample hunters discovered a smart way to utilize Google Assistant's song recognition to excavate samples.
Cryptographers Show How to Hide Invisible Backdoors in AI
Cryptographers have shown how perfect security can undermine machine learning models.
Researchers Drive Hyper-Local Climate Modeling Movement
Northwestern University and Argonne National Laboratory are helping drive environment-focused AI models and edge computing nodes for pinpointed climate research.
Databricks makes building real-time ML apps easier with new service
Databricks debuts its serverless real-time inference capabilities.
MLOps
Serve Stable Diffusion Three Times Faster
An article that explores how to leverage several optimizations from PyTorch and other libraries to reduce the cost of serving Stable Diffusion without significant impact on quality.
Offline to Online: Feature Storage for Real-time Recommendation Systems with NVIDIA Merlin
A post, accompanied by necessary notebooks, that details offline, online, and online large-scale recommendation system architectures using NVIDIA Merlin and Redis.
A blog that explores Meta’s Token Merging (ToMe) optimization strategy, performs some practical experiments with it, and benchmarks it with other SOTA inference optimization techniques.
Introducing the Amazon Comprehend flywheel for MLOps
Amazon announces the launch of Amazon Comprehend flywheel—a one-stop machine learning operations (MLOps) feature for an Amazon Comprehend model.
Learning
Building a Multi-Turn Chatbot with GPT and SageMaker: A Step-by-Step Guide
An article that presents a comprehensive guide to creating a generative open-domain chatbot leveraging a large language model (LLM) like GPT-Neo and built using SageMaker on AWS.
Generative AI with Cohere: Chaining Prompts
An exploratory post of text generation implementations, particularly focused on prompt chaining.
A Detailed Beginner’s Guide to Keras Tuner
An article that explores how to improve upon a model’s architecture using Keras Tuner.
A blog post that introduces the StableDiffusionControlNetPipeline and then shows how it can be applied for various control conditionings.
Libraries & Code
Pigeon is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook.
Random dataframe and database table generator.
An ecosystem of plug and play modules to optimize the performances of your AI systems.
Papers & Publications
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Abstract:
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.
Abstract:
We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM.
In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.
Abstract:
Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. During the early phase, we find dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the entire dataset's gradient. This helps counteract the stochasticity of SGD and limit the influence of individual batches on model training. Our findings lead us to a solution for improving performance in underfitting models - early dropout: dropout is applied only during the initial phases of training, and turned off afterwards. Models equipped with early dropout achieve lower final training loss compared to their counterparts without dropout. Additionally, we explore a symmetric technique for regularizing overfitting models - late dropout, where dropout is not used in the early iterations and is only activated later in training. Experiments on ImageNet and various vision tasks demonstrate that our methods consistently improve generalization accuracy. Our results encourage more research on understanding regularization in deep learning and our methods can be useful tools for future neural network training, especially in the era of large data.
Great issue this week. Many many interesting developments and some alarming ones (malicious prompting). This is one of my must have substacks!