Deep Learning Weekly: Issue #316
DeepMind's SynthID, Mastering ML Deployment, Designing Deep Networks to Process Other Deep Networks, a paper on Teaching Algorithmic Reasoning via In-context Learning, and many more!
This week in deep learning, we bring you DeepMind's SynthID, Mastering ML Deployment, Designing Deep Networks to Process Other Deep Networks, and a paper on Teaching Algorithmic Reasoning via In-context Learning.
You may also enjoy Meta AI's Code Llama, Concrete ML, Exploring the Use of Adversarial Learning in Improving Model Robustness, a paper on Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Introducing Code Llama, a state-of-the-art large language model for coding
Meta AI released Code Llama, a state-of-the-art LLM that is free for research and commercial use, capable of generating code, and natural language about code, from various prompts.
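For a feel of how such a model can be used, here is a minimal sketch of prompting a Code Llama checkpoint through Hugging Face transformers; the checkpoint id and generation settings are illustrative assumptions, not part of Meta's announcement.

```python
# Minimal sketch: code completion with an assumed Code Llama checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "codellama/CodeLlama-7b-hf"  # assumed Hugging Face checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "def fibonacci(n):\n    \"\"\"Return the n-th Fibonacci number.\"\"\"\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```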
The State of AI at Work
Asana, a leading work management platform, released The State of AI at Work Report, powered by insights from its Work Innovation Lab.
Introducing ChatGPT Enterprise
OpenAI launched ChatGPT Enterprise, which offers enterprise-grade security and privacy, longer context windows for processing longer inputs, advanced data analysis capabilities, and much more.
Organize Your Prompt Engineering with CometLLM
Comet announced CometLLM, a solution for logging and visualizing all your prompts and chains to unleash the full potential of Large Language Models.
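As a rough illustration, the snippet below logs a single prompt/response pair with the comet_llm package; the exact argument names may differ across SDK versions, and the metadata shown is an arbitrary example.

```python
# Minimal sketch of logging one prompt/response pair with CometLLM.
# Assumes the comet_llm package and a COMET_API_KEY in the environment.
import comet_llm

comet_llm.log_prompt(
    prompt="Summarize the following ticket: ...",
    output="The customer reports a login failure after the v2.3 update.",
    metadata={"model": "gpt-4", "temperature": 0.2},  # free-form metadata
)
```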
Identifying AI-generated images with SynthID
DeepMind launched a beta version of SynthID, a tool for watermarking and identifying AI-generated images.
MLOps & LLMOps
Mastering ML Deployment
An article about how to use various tools and frameworks to deploy machine learning models in a streamlined and scalable way, covering topics such as Docker, Kubernetes, Helm, Terraform, Streamlit, Gradio, FastAPI, and more.
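As one small, hedged example of the kind of stack the article covers, here is a minimal FastAPI endpoint serving a pickled model; the model path and request schema are placeholders, not taken from the article.

```python
# Minimal sketch: serve a serialized scikit-learn model behind a FastAPI endpoint.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
with open("model.pkl", "rb") as f:  # hypothetical serialized model
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]  # flat feature vector expected by the model

@app.post("/predict")
def predict(features: Features) -> dict:
    prediction = model.predict([features.values])[0]
    return {"prediction": float(prediction)}

# Run locally with: uvicorn app:app --reload
```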
Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System
Netflix shares system design lessons from consolidating several related machine learning models for large-scale search and recommendation systems into a single unified model.
Randomizing very large datasets
Consider the problem of randomizing a dataset so large that it doesn't fit into memory. This article describes how to do it easily and (relatively) quickly in Python.
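A minimal sketch of the general idea (the article's exact approach may differ): scatter lines at random into temporary chunk files that each fit in memory, then shuffle each chunk and concatenate.

```python
# Out-of-core shuffle for a line-oriented text file, in two passes.
import random

def shuffle_large_file(src: str, dst: str, num_chunks: int = 64) -> None:
    chunk_paths = [f"{dst}.chunk{i}" for i in range(num_chunks)]
    chunks = [open(p, "w") for p in chunk_paths]
    with open(src) as f:
        for line in f:                      # pass 1: random scatter to chunks
            random.choice(chunks).write(line)
    for c in chunks:
        c.close()

    with open(dst, "w") as out:
        for p in chunk_paths:               # pass 2: shuffle each chunk in RAM
            with open(p) as c:
                lines = c.readlines()
            random.shuffle(lines)
            out.writelines(lines)           # temp chunk files can be removed afterwards
```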
ML pipelines for fine-tuning LLMs
An article that shares findings and demonstrates best practices for building a clean, production-grade ML pipeline for fine-tuning LLMs.
Learning
Designing Deep Networks to Process Other Deep Networks
An article about a new type of neural network that can process the weights of other neural networks while being invariant or equivariant to certain transformations of the input weights.
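To make the symmetry concrete, the quick check below shows that permuting the hidden units of a two-layer MLP leaves its function unchanged, which is exactly the kind of weight transformation such a network should be invariant or equivariant to. The toy shapes are illustrative.

```python
# Permuting hidden neurons (rows of W1, matching columns of W2) preserves the MLP's output.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ x + b1) + b2

perm = rng.permutation(16)                      # reorder the hidden neurons
x = rng.normal(size=8)
out_original = mlp(x, W1, b1, W2, b2)
out_permuted = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)
print(np.allclose(out_original, out_permuted))  # True
```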
Making LLMs lighter with AutoGPTQ and transformers
A blog post that presents the integration of the AutoGPTQ library in Transformers, making it possible to quantize LLMs with the GPTQ method.
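A sketch of the workflow the post describes: quantizing a small causal LM to 4-bit via transformers' GPTQConfig (requires the auto-gptq and optimum packages). The model id here is an arbitrary small example.

```python
# Quantize a causal LM with GPTQ directly through transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization runs during loading, using the calibration dataset above.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)
model.save_pretrained("opt-125m-gptq-4bit")
```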
Exploring the Use of Adversarial Learning in Improving Model Robustness
A comprehensive article that explores the use of adversarial learning in improving the robustness of machine learning models.
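As a hedged illustration of one standard ingredient of adversarial training, here is a minimal FGSM perturbation in PyTorch; the model, data, and epsilon are placeholders rather than the article's setup.

```python
# Fast Gradient Sign Method: perturb inputs in the direction that increases the loss.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Return x perturbed by epsilon in the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

# Inside a training loop, mixing clean and adversarial batches:
# x_adv = fgsm_perturb(model, x, y)
# loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
```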
Libraries & Code
Paulescu/hands-on-train-and-deploy-ml
Train and Deploy an ML REST API to predict crypto prices, in 10 steps
zama-ai/concrete-ml
Concrete ML is a Privacy-Preserving Machine Learning (PPML) open-source set of tools built on top of Concrete by Zama.
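A sketch of the library's scikit-learn-style workflow: train in the clear, compile to an FHE circuit, then run encrypted inference. Keyword arguments may vary across library versions.

```python
# Train, compile to FHE, and run encrypted inference with Concrete ML.
from concrete.ml.sklearn import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(n_bits=8)   # quantized model suitable for FHE
model.fit(X_train, y_train)
model.compile(X_train)                 # build the FHE circuit

y_pred = model.predict(X_test, fhe="execute")  # inference on encrypted data
```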
Papers & Publications
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Abstract:
We present Region-aware Open-vocabulary Vision Transformers (RO-ViT) - a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and open-vocabulary object detection. At the pretraining phase, we propose to randomly crop and resize regions of positional embeddings instead of using the whole image positional embeddings. This better matches the use of positional embeddings at region-level in the detection finetuning phase. In addition, we replace the common softmax cross entropy loss in contrastive learning with focal loss to better learn the informative yet difficult examples. Finally, we leverage recent advances in novel object proposals to improve open-vocabulary detection finetuning. We evaluate our full model on the LVIS and COCO open-vocabulary detection benchmarks and zero-shot transfer. RO-ViT achieves a state-of-the-art 34.1 APr on LVIS, surpassing the best existing approach by +7.8 points in addition to competitive zero-shot transfer detection. Surprisingly, RO-ViT improves the image-level representation as well and achieves the state of the art on 9 out of 12 metrics on COCO and Flickr image-text retrieval benchmarks, outperforming competitive approaches with larger models.
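A rough sketch of the cropped positional embedding idea described in the abstract, not the paper's code: during pretraining, sample a random region of the full positional-embedding grid and resize it back to the full grid, so region-level use at detection time is better matched.

```python
# Randomly crop and resize a grid of positional embeddings (illustrative shapes).
import torch
import torch.nn.functional as F

def cropped_pos_embed(pos_embed, grid=14):
    """pos_embed: (grid*grid, dim) full-image positional embeddings."""
    dim = pos_embed.shape[-1]
    pe = pos_embed.reshape(1, grid, grid, dim).permute(0, 3, 1, 2)  # 1, D, H, W
    h = torch.randint(2, grid + 1, ()).item()        # random crop height
    w = torch.randint(2, grid + 1, ()).item()        # random crop width
    top = torch.randint(0, grid - h + 1, ()).item()
    left = torch.randint(0, grid - w + 1, ()).item()
    crop = pe[:, :, top:top + h, left:left + w]
    resized = F.interpolate(crop, size=(grid, grid), mode="bilinear",
                            align_corners=False)
    return resized.permute(0, 2, 3, 1).reshape(grid * grid, dim)
```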
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Abstract:
We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.
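The following is an illustrative data-structure sketch of what the abstract describes (thoughts as vertices, dependency edges, and an aggregation step), not the authors' released framework.

```python
# Toy thought graph: vertices carry text, edges record which thoughts they depend on.
from dataclasses import dataclass, field

@dataclass
class Thought:
    text: str
    parents: list["Thought"] = field(default_factory=list)

def aggregate(parents: list[Thought], llm) -> Thought:
    """Ask the LLM to merge several partial results into one new thought."""
    prompt = "Combine the following partial solutions:\n" + "\n".join(
        p.text for p in parents
    )
    return Thought(text=llm(prompt), parents=parents)

# Example with a stand-in "LLM": merge two sorted half-lists into one thought.
fake_llm = lambda prompt: "[1, 2, 3, 4, 7, 9]"  # stand-in for a real LLM call
a = Thought("sorted left half: [1, 4, 7]")
b = Thought("sorted right half: [2, 3, 9]")
merged = aggregate([a, b], fake_llm)
print(merged.text)
```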
Teaching Algorithmic Reasoning via In-context Learning
Abstract:
Large language models (LLMs) have shown increasing in-context learning capabilities through scaling up model and data size. Despite this progress, LLMs are still unable to solve algorithmic reasoning problems. While providing a rationale with the final answer has led to further improvements in multi-step reasoning problems, Anil et al. 2022 showed that even simple algorithmic reasoning tasks such as parity are far from solved. In this work, we identify and study four key stages for successfully teaching algorithmic reasoning to LLMs: (1) formulating algorithms as skills, (2) teaching multiple skills simultaneously (skill accumulation), (3) teaching how to combine skills (skill composition) and (4) teaching how to use skills as tools. We show that it is possible to teach algorithmic reasoning to LLMs via in-context learning, which we refer to as algorithmic prompting. We evaluate our approach on a variety of arithmetic and quantitative reasoning tasks, and demonstrate significant boosts in performance over existing prompting techniques. In particular, for long parity, addition, multiplication and subtraction, we achieve an error reduction of approximately 10x, 9x, 5x and 2x respectively compared to the best available baselines.
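An illustrative (not verbatim) example of what an algorithmic prompt might look like for addition: the in-context demonstration spells out every intermediate step and carry, rather than only a high-level rationale, and the model is asked to continue in the same style.

```python
# Hypothetical algorithmic prompt in the spirit of the paper (not its exact prompts).
ALGORITHMIC_PROMPT = """\
Problem: 128 + 367
Explanation: Add digits right to left, tracking the carry.
8 + 7 = 15, write 5, carry 1.
2 + 6 + 1 = 9, write 9, carry 0.
1 + 3 + 0 = 4, write 4, carry 0.
Answer: 495

Problem: 254 + 189
Explanation:"""

# completion = llm(ALGORITHMIC_PROMPT)  # the model continues the same step-by-step procedure
```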