Deep Learning Weekly: Issue #307
DeepMind's Gemini, MLOps Landscape in 2023: Top Tools and Platforms, LLM Powered Autonomous Agents, a paper on SoundStorm: Efficient Parallel Audio Generation, and many more!
This week in deep learning, we bring you DeepMind's Gemini, MLOps Landscape in 2023: Top Tools and Platforms, LLM Powered Autonomous Agents, and a paper on SoundStorm: Efficient Parallel Audio Generation.
You may also enjoy Economic Potential of Generative AI, Unraveling GPU Inference Costs for Fine-tuned Open-source Models vs. Closed Platforms, SAM + Stable Diffusion for Text-to-Image Inpainting, a paper on Fast Segment Anything, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Google DeepMind CEO Demis Hassabis Says Its Next Algorithm Will Eclipse ChatGPT
Demis Hassabis says the company is working on a system called Gemini that will tap techniques that helped AlphaGo defeat a Go champion in 2016.
Databricks Strikes $1.3 Billion Deal for Generative AI Startup MosaicML
Databricks has agreed to acquire MosaicML, a move aimed at capturing the fast-growing demand from businesses to build their own ChatGPT-like tools.
Junk websites filled with AI-generated text are pulling in money from programmatic ads
People are using AI chatbots to fill junk websites with AI-generated text that attracts paying advertisers, according to a new report from a media research organization.
Economic potential of generative AI
McKinsey took a first look at where business value could accrue and at the potential impact of generative AI on the workforce.
EU launches four new testing facilities to develop responsible AI
EU member states, the Commission, and 128 partners have committed €220m to establish four new testing and experimentation facilities (TEFs).
MLOps
A high-throughput and memory-efficient inference and serving engine for LLMs
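This description matches vLLM's tagline; assuming that is the project linked, here is a minimal sketch of the offline batched-generation API such engines expose (the model choice and sampling settings are illustrative):

```python
# Minimal sketch of vLLM-style offline generation, assuming the linked
# project is vLLM; model name and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                     # loads weights onto the GPU
params = SamplingParams(temperature=0.8, max_tokens=64)  # per-request sampling config
outputs = llm.generate(["The future of MLOps is"], params)
print(outputs[0].outputs[0].text)
```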
SAM + Stable Diffusion for Text-to-Image Inpainting
Leverage the power of SAM, the first foundation model for computer vision, along with Stable Diffusion, a popular generative AI tool, and Comet to create a text-to-image inpainting pipeline.
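A minimal sketch of such a pipeline, assuming the segment_anything and diffusers packages; the checkpoint path, click coordinates, prompt, and inpainting model are illustrative stand-ins, and the Comet experiment tracking from the article is omitted:

```python
import numpy as np
import torch
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor
from diffusers import StableDiffusionInpaintPipeline

# Load SAM and segment the object under a single foreground click
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # path is an assumption
predictor = SamPredictor(sam)
image = np.array(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)
masks, _, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # illustrative click location
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=False,
)
mask = Image.fromarray((masks[0] * 255).astype(np.uint8))

# Inpaint the masked region from a text prompt
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
result = pipe(
    prompt="a bouquet of sunflowers",
    image=Image.fromarray(image),
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```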
Unraveling GPU Inference Costs for Fine-tuned Open-source Models vs. Closed Platforms
A blog that simplifies the complex world of compute costs in AI, starting with Large Language Models.
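For a sense of the arithmetic involved, a back-of-the-envelope cost model (every number here is an illustrative assumption, not the blog's figures):

```python
# Back-of-the-envelope GPU inference cost; all numbers are assumptions.
gpu_hourly_usd = 2.00       # e.g., a single mid-range cloud GPU instance
throughput_tok_s = 1_000    # sustained generation throughput for the model
cost_per_million_tokens = gpu_hourly_usd / (throughput_tok_s * 3600) * 1_000_000
print(f"${cost_per_million_tokens:.3f} per 1M generated tokens")  # ≈ $0.556
```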
Enterprise Scale MLOps at NatWest
Recognizing the importance of efficient machine learning operations (MLOps), NatWest aimed to tackle challenges related to governance, data management, and the need for faster solution delivery.
MLOps Landscape in 2023: Top Tools and Platforms
An article that explores the key players in the MLOps and FMOps (or LLMOps) ecosystems, with a focus on highlighting their key features and contributions.
Learning
LLM Powered Autonomous Agents
Lilian Weng’s comprehensive article on the research and design behind LLM-powered autonomous agents.
Simplifying Image Classification Workflow with Lightning & Comet ML
A guide to performing end-to-end computer vision projects with PyTorch-Lightning, Comet ML and Gradio.
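A minimal sketch of the Lightning-plus-Comet half of that stack (the model, project name, and hyperparameters are placeholders; the Comet API key is read from the COMET_API_KEY environment variable, and the Gradio demo is omitted):

```python
import lightning as L
import torch
from torch import nn
from lightning.pytorch.loggers import CometLogger

class LitClassifier(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)  # metrics logged here flow to Comet
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

logger = CometLogger(project_name="image-classification")  # placeholder project
trainer = L.Trainer(max_epochs=1, logger=logger)
# trainer.fit(LitClassifier(), train_dataloaders=...)  # supply a DataLoader
```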
SAM + Stable Diffusion for Text-to-Image Inpainting
A technical article about using SAM, the first foundation model for computer vision, along with Stable Diffusion, a popular generative AI tool, to create a text-to-image inpainting pipeline.
Faster PyTorch Training by Reducing Peak Memory (combining backward pass + optimizer step)
A tutorial on how to reduce peak memory usage during PyTorch training by using Lightning’s memory-efficient features.
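The tutorial works through Lightning, but the underlying idea can be sketched in plain PyTorch (assuming PyTorch 2.1 or newer for register_post_accumulate_grad_hook): give each parameter its own optimizer and step it the moment its gradient lands, so the full gradient set never has to coexist in memory.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
)
# One small optimizer per parameter, stepped from inside backward()
optimizers = {p: torch.optim.SGD([p], lr=1e-2) for p in model.parameters()}

def step_and_free(param: torch.Tensor) -> None:
    # Fires right after param.grad finishes accumulating during backward()
    optimizers[param].step()
    optimizers[param].zero_grad(set_to_none=True)  # free the gradient immediately

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_and_free)

x, y = torch.randn(32, 1024), torch.randint(0, 10, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()  # optimizer steps happen inside the backward pass
```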
From zero to semantic search embedding model
A series of articles on building an accurate embedding model for neural search, from BERT to SGPT.
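A minimal sketch of the series' starting point, mean-pooled BERT embeddings scored by cosine similarity (the model name and pooling choice are illustrative, not the articles' exact recipe):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (batch, tokens, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)       # ignore padding tokens
    pooled = (hidden * mask).sum(1) / mask.sum(1)      # mean over real tokens
    return torch.nn.functional.normalize(pooled, dim=-1)

docs = ["PyTorch training tips", "Recipes for sourdough bread"]
scores = embed(["how to speed up deep learning training"]) @ embed(docs).T
print(scores)  # cosine similarities; the first doc should score higher
```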
Libraries & Code
Git-Theta is a Git extension for collaborative, continual, and communal development of machine learning models.
A ChatGPT plugin that allows you to load and edit your local files in a controlled way, as well as run any Python, JavaScript, and bash script.
An intelligent and versatile general-purpose SQL client and reporting tool for databases, which integrates ChatGPT capabilities.
Papers & Publications
SoundStorm: Efficient Parallel Audio Generation
Abstract:
We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consistency in voice and acoustic conditions, while being two orders of magnitude faster. SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by synthesizing high-quality, natural dialogue segments, given a transcript annotated with speaker turns and a short prompt with the speakers' voices.
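The confidence-based parallel decoding at the heart of the paper follows the MaskGIT recipe: predict all masked positions at once, commit only the most confident ones, and repeat on a schedule. A toy sketch of that loop (the network, mask-token handling, and schedule are placeholder assumptions, not the paper's exact design):

```python
import math
import torch

MASK_ID = 0  # real systems reserve a mask id outside the codec vocabulary

@torch.no_grad()
def parallel_decode(net, length, steps=8):
    tokens = torch.full((1, length), MASK_ID, dtype=torch.long)
    for step in range(steps):
        logits = net(tokens)                      # (1, length, vocab), bidirectional
        conf, pred = logits.softmax(-1).max(-1)   # per-position confidence & argmax
        conf = torch.where(tokens == MASK_ID, conf, torch.full_like(conf, -1.0))
        # Cosine schedule: commit progressively more positions each iteration
        target_masked = int(length * math.cos(math.pi / 2 * (step + 1) / steps))
        num_to_fix = int((tokens == MASK_ID).sum()) - target_masked
        idx = conf.topk(max(num_to_fix, 1), dim=-1).indices
        tokens.scatter_(1, idx, pred.gather(1, idx))  # commit confident tokens
    return tokens

# Toy stand-in for the real bidirectional Transformer over codec tokens
toy_net = lambda t: torch.randn(t.shape[0], t.shape[1], 1024)
print(parallel_decode(toy_net, length=50))
```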
Fast Segment Anything
Abstract:
The recently proposed Segment Anything Model (SAM) has had a significant influence on many computer vision tasks. It is becoming a foundational step for many high-level tasks, such as image segmentation, image captioning, and image editing. However, its huge computation cost prevents wider application in industry scenarios. The computation mainly comes from the Transformer architecture at high-resolution inputs. In this paper, we propose a faster alternative method for this fundamental task with comparable performance. By reformulating the task as segment generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert this task to the well-studied instance segmentation task and directly train the existing instance segmentation method using only 1/50 of the SA-1B dataset published by the SAM authors. With our method, we achieve performance comparable to SAM at 50 times higher run-time speed. We provide sufficient experimental results to demonstrate its effectiveness.
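The "regular CNN detector with an instance segmentation branch" the authors build on is a YOLOv8-style segmentation model. As a rough illustration of that building block (not the paper's released code), a minimal Ultralytics inference sketch:

```python
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")   # small CNN detector with a mask branch
results = model("image.jpg")     # single forward pass, no heavy Transformer
masks = results[0].masks         # per-instance segmentation masks (or None)
print(0 if masks is None else len(masks), "instances found")
```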
LightGlue: Local Feature Matching at Light Speed
Abstract:
We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements. Cumulatively, they make LightGlue more efficient in terms of both memory and computation, more accurate, and much easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: inference is much faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like 3D reconstruction.
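A minimal sketch of the SuperPoint-plus-LightGlue pairing described above, based on the API in the authors' reference repository (github.com/cvg/LightGlue); exact names may differ across versions:

```python
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd

extractor = SuperPoint(max_num_keypoints=2048).eval()
matcher = LightGlue(features="superpoint").eval()  # prunes depth/width adaptively

feats0 = extractor.extract(load_image("scene_a.jpg"))
feats1 = extractor.extract(load_image("scene_b.jpg"))
matches01 = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]

matches = matches01["matches"]                # (K, 2) indices into each keypoint set
points0 = feats0["keypoints"][matches[:, 0]]  # matched keypoints in image 0
points1 = feats1["keypoints"][matches[:, 1]]  # corresponding points in image 1
```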