Deep Learning Weekly: Issue #288
Liquid neural networks, comprehensive health checks for model deployments, Bayesian inference framework for in-context learning in LLMs, a paper on Hard Prompts Made Easy, and more.
This week in deep learning, we bring you liquid neural networks, comprehensive health checks for model deployments, Bayesian inference framework for in-context learning in LLMs, and a paper on Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery.
You may also enjoy legal cases against generative AI, a bag of tricks for optimizing ML pipelines, 40 AI apps to streamline the product lifecycle, a paper on Offsite-Tuning: Transfer Learning without Full Model, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
The successful performance of ChatGPT on the U.S. Medical Licensing Exam demonstrates shortcomings in how students are trained and evaluated, says a principal research scientist at MIT’s Institute for Medical Engineering and Science.
Microsoft, OpenAI, GitHub, Stability AI, Midjourney, and other generative AI providers are currently being sued over the data they used to train their offerings.
The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) has released a guidance document for organizations designing, developing, deploying, or using AI systems to help manage the many risks of AI technologies.
“Liquid” neural nets, based on a worm’s nervous system, can transform their underlying algorithms on the fly, giving them unprecedented speed and adaptability.
Otter.ai launches OtterPilot, a new AI-powered meeting assistant that can join meetings, take live notes, capture screen grabs, and summarize key points afterward.
MIT spinout Verta offers tools to help companies introduce, monitor, and manage machine-learning models safely and at scale.
An article that provides a detailed overview of the capabilities of an open-source tool for working with large amounts of multimedia data.
Ntropy shares some of the techniques they use to speed up training, improve the machine learning engineer experience, and keep costs under control.
A technical and comprehensive blog post that extends a model service API by adding health checks to it.
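The shape of such checks can be sketched as below. This is a minimal illustration, not the post's actual code: the function names and response format are assumptions, and the framework wiring (route definitions, server setup) is omitted. The key distinction is between a liveness check (the process responds at all) and a readiness check (the model is loaded and a smoke-test inference succeeds).

```python
def liveness() -> dict:
    # Liveness: the process is up and able to respond at all.
    return {"status": "alive", "code": 200}

def readiness(model) -> dict:
    # Readiness: the model is loaded and can actually serve predictions.
    if model is None:
        return {"status": "model not loaded", "code": 503}
    try:
        model.predict([[0.0]])  # smoke-test inference on a dummy input
    except Exception as exc:
        return {"status": f"inference failing: {exc}", "code": 503}
    return {"status": "ready", "code": 200}
```

Orchestrators such as Kubernetes can then route traffic away from replicas whose readiness check returns a non-200 code while leaving the process running.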
In this post, you learn three ways to put an ML model into production using Google Cloud Platform (GCP).
In this blog, Walmart elaborates on how they explore models from a pool of candidates using WALTS.
A post that provides a Bayesian inference framework for in-context learning in LLMs and shows empirical evidence for the framework.
The AI tools featured in this article can help makers with market research, product design and engineering, marketing and launch planning, and product maintenance and improvement.
A blog post that summarizes four existing architectural blueprints for recommender systems and proposes a new one.
This post explores using Forecast to address water consumption forecasting.
Libraries & Code
Adrenaline is a debugger powered by OpenAI Codex.
Curated papers, articles, and blogs on data science & machine learning in production.
A JupyterLab extension to evaluate the security of your Jupyter environment.
Papers & Publications
Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples (x,f(x)) presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms.
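The two reference estimators the paper compares in-context learners against can be sketched in a few lines. This toy example (not the paper's code; data and hyperparameters are illustrative) shows that closed-form ridge regression with small regularization and plain gradient descent on the squared loss recover essentially the same linear predictor from the same labeled examples:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                 # in-context examples x
w_true = rng.normal(size=4)
y = X @ w_true + 0.01 * rng.normal(size=32)  # labels f(x), lightly noised

# Closed-form ridge estimator: w = (X^T X + lam*I)^-1 X^T y
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Gradient descent on the mean squared loss, converging to least squares
w_gd = np.zeros(4)
for _ in range(2000):
    grad = X.T @ (X @ w_gd - y) / len(y)
    w_gd -= 0.1 * grad
```

Both estimators end up within numerical tolerance of each other and of the true weights; the paper's claim is that a trained transformer's in-context predictions track these same solutions as depth and noise vary.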
The strength of modern generative models lies in their ability to be controlled through text-based prompts. Typical "hard" prompts are made from interpretable words and tokens, and must be hand-crafted by humans. There are also "soft" prompts, which consist of continuous feature vectors. These can be discovered using powerful optimization methods, but they cannot be easily interpreted, re-used across models, or plugged into a text-based interface.
We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization. Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications. In the text-to-image setting, the method creates hard prompts for diffusion models, allowing API users to easily generate, discover, and mix and match image concepts without prior knowledge on how to prompt the model. In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification.
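The core trick can be sketched as a projected gradient loop: optimize a continuous ("soft") prompt, but project each embedding onto its nearest vocabulary entry so the result stays a real token sequence, with the gradient computed at the projected point and applied to the continuous copy. This is a toy illustration of that idea, not the paper's implementation; the embedding table, target, and loss below are entirely synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 8))   # toy embedding table: 50 tokens, dim 8
target = vocab[7] + vocab[21]      # pretend this is the activation we want

def nearest_tokens(soft):
    # Project each continuous embedding onto its nearest vocabulary entry
    dists = ((soft[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

soft = rng.normal(size=(2, 8))     # a 2-token continuous ("soft") prompt
for _ in range(300):
    hard = vocab[nearest_tokens(soft)]        # discrete "hard" projection
    # Gradient of ||sum(hard) - target||^2, computed at the projected
    # point but applied to the continuous copy
    grad = 2 * (hard.sum(axis=0) - target)
    soft -= 0.05 * grad

prompt_ids = nearest_tokens(soft)  # token ids of the discovered hard prompt
```

Because the final output is a sequence of real token ids, the discovered prompt remains interpretable and portable across models, which is exactly the property soft prompts lack.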
Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In Offsite-Tuning, the model owner sends a lightweight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned to the model owner, who plugs it into the full model to create an adapted foundation model. Offsite-Tuning preserves both parties' privacy and is computationally more efficient than existing fine-tuning methods that require access to the full model weights. We demonstrate the effectiveness of Offsite-Tuning on various large language and vision foundation models: it achieves accuracy comparable to full model fine-tuning while being privacy-preserving and efficient, with a 6.5x speedup and 5.6x memory reduction.
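The protocol above can be sketched end to end. This is a toy illustration under strong simplifying assumptions, not the paper's implementation: layers are linear, the "compression" is simply dropping every other middle layer, and only the top adapter is tuned (via closed-form least squares) for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Model owner's full network: small bottom/top adapters, large private middle
bottom = rng.normal(size=(d, d)) * 0.3
middle = [rng.normal(size=(d, d)) * 0.3 for _ in range(8)]  # never shared
top = rng.normal(size=(d, 1)) * 0.3

# Lossy compressed emulator of the middle: here, drop every other layer
emulator = middle[::2]

def features(x, mids):
    # Linear layers keep the sketch simple; real models are nonlinear
    h = x @ bottom
    for W in mids:
        h = h @ W
    return h

# Data owner: fine-tune only the top adapter against the frozen emulator,
# using local data the model owner never sees
X = rng.normal(size=(64, d))
y = rng.normal(size=(64, 1))
feats = features(X, emulator)
top_tuned, *_ = np.linalg.lstsq(feats, y, rcond=None)

# Model owner: plug the returned adapter back into the full private middle
preds = features(X, middle) @ top_tuned
```

The data never leaves the data owner, and the full middle weights never leave the model owner; only the small adapter and the compressed emulator cross the boundary.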