Deep Learning Weekly: Issue #318
Salesforce's Einstein Copilot, Mojo implementation of Llama 2, Efficient Controllable Generation for SDXL with T2I-Adapters, a paper on Cognitive Architectures for Language Agents, and many more!
This week in deep learning, we bring you Salesforce's Einstein Copilot, Fast Mojo-based implementation of Llama 2, Efficient Controllable Generation for SDXL with T2I-Adapters, and a paper on Cognitive Architectures for Language Agents.
You may also enjoy Anthropic's Claude Pro, In-Depth ETL in Machine Learning, A Brief Introduction to Mixture Model Networks, a paper on TSMixer: An All-MLP Architecture for Time Series Forecasting, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Salesforce introduces new AI assistant, Einstein Copilot, for all its CRM apps
Salesforce announced Einstein Copilot, a conversational AI assistant that is native to its CRM and all of its supported apps.
AI model speeds up high-resolution computer vision
Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a more efficient computer vision model that vastly reduces the computational complexity of semantic segmentation.
Founders of shuttered Argo AI launch autonomous trucking startup
The founders of Argo AI are starting an autonomous trucking business with $1 billion in backing from Japan’s SoftBank Group Corp.
Anthropic: Introducing Claude Pro
Anthropic introduced a paid plan for their Claude.ai chat experience, currently available in the US and UK.
Edge AI chip startup Axelera debuts Metis AI Platform
Edge AI chip startup Axelera AI B.V. announced that its Metis AI Platform is now available to customers in early access.
MLOps & LLMOps
Unlocking Multi-GPU Model Training with Dask XGBoost
This post explores how you can optimize Dask XGBoost on multiple GPUs and manage memory errors.
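For readers who want the shape of the pattern, here is a minimal sketch of multi-GPU training with Dask and XGBoost, assuming dask_cuda is installed and local GPUs are visible; the random data and hyperparameters are illustrative, not the post's:

```python
# Minimal sketch: multi-GPU XGBoost training with Dask (one worker per GPU).
import xgboost as xgb
import dask.array as da
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

if __name__ == "__main__":
    cluster = LocalCUDACluster()   # spins up one Dask worker per local GPU
    client = Client(cluster)

    # Illustrative random data; the post works with a real dataset.
    X = da.random.random((1_000_000, 50), chunks=(100_000, 50))
    y = da.random.random(1_000_000, chunks=100_000)

    # DaskQuantileDMatrix keeps GPU memory usage lower than a full DMatrix,
    # which is one of the post's levers for avoiding out-of-memory errors.
    dtrain = xgb.dask.DaskQuantileDMatrix(client, X, y)
    output = xgb.dask.train(
        client,
        {"tree_method": "gpu_hist", "objective": "reg:squarederror"},
        dtrain,
        num_boost_round=100,
    )
    booster = output["booster"]
```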
Introducing Glassdoor’s ML Registry: A Centralized Artifact Management Solution
An article about Glassdoor's ML Registry, a custom-built service for managing ML artifacts and metadata.
Building a Text Classifier App with Hugging Face, BERT, and Comet
A blog post on how to build an end-to-end text classification project using Hugging Face, BERT, and Comet.
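A minimal sketch of that workflow, assuming comet_ml, transformers, and datasets are installed and COMET_API_KEY is set in the environment; the IMDB dataset, subset sizes, and hyperparameters are illustrative stand-ins for the post's own choices:

```python
# Sketch: fine-tune BERT for text classification, logging metrics to Comet.
import comet_ml  # import before transformers so auto-logging hooks in
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

comet_ml.init(project_name="bert-text-classifier")

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="out",
    report_to="comet_ml",   # send training metrics to Comet
    num_train_epochs=1,
    per_device_train_batch_size=16,
)
Trainer(
    model=model, args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
).train()
```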
Learning
Efficient Controllable Generation for SDXL with T2I-Adapters
An article about T2I-Adapter-SDXL, a plug-and-play model that can control the generation of text-to-image models with different conditions.
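A rough sketch of how T2I-Adapter-SDXL is typically driven through diffusers; the model IDs match the TencentARC release, but the prompt, edge-map path, and conditioning scale below are illustrative:

```python
# Sketch: condition SDXL generation on a Canny edge map via a T2I-Adapter.
import torch
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter, torch_dtype=torch.float16).to("cuda")

# A precomputed Canny edge map conditions the generation (illustrative path).
canny = load_image("canny_edges.png")
image = pipe(
    prompt="a photo of a futuristic city at dusk",
    image=canny,
    adapter_conditioning_scale=0.8,  # strength of the control signal
).images[0]
image.save("out.png")
```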
Fine-Tuning a Linear Adapter for Any Embedding Model
A full end-to-end guide showing how to generate a synthetic dataset, fine-tune the linear adapter, and evaluate its performance using the Uber and Lyft 10-K filings.
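The guide's exact API aside, the core idea can be sketched in a few lines of PyTorch: a single trainable linear layer on top of frozen query embeddings, trained with in-batch negatives; the dimensionality, temperature, and data sourcing here are assumptions for illustration:

```python
# Sketch: learn a linear map over frozen query embeddings so they align
# better with document embeddings from the same frozen model.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 384  # dimensionality of the frozen embedding model (assumed)
adapter = nn.Linear(dim, dim, bias=False)
nn.init.eye_(adapter.weight)  # start at identity so behavior is unchanged
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

def train_step(query_emb, pos_doc_emb):
    """One step: pull adapted queries toward their positive documents,
    contrasting against the other documents in the batch (in-batch negatives)."""
    q = F.normalize(adapter(query_emb), dim=-1)
    d = F.normalize(pos_doc_emb, dim=-1)
    logits = q @ d.T / 0.05          # temperature-scaled similarities
    labels = torch.arange(len(q))    # matching pairs lie on the diagonal
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# query_emb / pos_doc_emb would come from embedding synthetic
# question/chunk pairs generated over the Uber and Lyft 10-K filings.
```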
A mini-series for experienced ML practitioners who want to explore Parameter-Efficient Finetuning (specifically LoRA).
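As a taste of what the series covers, here is a minimal, library-agnostic LoRA layer: a frozen linear weight plus a trainable low-rank update B @ A scaled by alpha / r; the rank and scaling values are illustrative:

```python
# Sketch of a LoRA-wrapped linear layer, for intuition only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r                # common scaling convention

    def forward(self, x):
        # Frozen base projection plus the trainable low-rank correction.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))  # only A and B receive gradients
```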
Libraries & Code
A repository containing a fast Mojo-based implementation of Llama 2.
Rivet, an IDE for creating complex AI agents with prompt chaining, which you can embed in your application.
Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications.
Papers & Publications
Cognitive Architectures for Language Agents
Abstract:
Recent efforts have incorporated large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning. However, these efforts have largely been piecemeal, lacking a systematic framework for constructing a fully-fledged language agent. To address this challenge, we draw on the rich history of agent design in symbolic artificial intelligence to develop a blueprint for a new wave of cognitive language agents. We first show that LLMs have many of the same properties as production systems, and recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems. We then propose Cognitive Architectures for Language Agents (CoALA), a conceptual framework to systematize diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. Finally, we use the CoALA framework to highlight gaps and propose actionable directions toward more capable language agents in the future.
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
Abstract:
We present DiffBIR, which leverages pretrained text-to-image diffusion models for the blind image restoration problem. Our framework adopts a two-stage pipeline. In the first stage, we pretrain a restoration module across diversified degradations to improve generalization capability in real-world scenarios. The second stage leverages the generative ability of latent diffusion models to achieve realistic image restoration. Specifically, we introduce an injective modulation sub-network, LAControlNet, for finetuning, while the pre-trained Stable Diffusion remains frozen to maintain its generative ability. Finally, we introduce a controllable module that allows users to balance quality and fidelity by introducing latent image guidance in the denoising process during inference. Extensive experiments have demonstrated its superiority over state-of-the-art approaches for both blind image super-resolution and blind face restoration tasks on synthetic and real-world datasets.
TSMixer: An All-MLP Architecture for Time Series Forecasting
Abstract:
Real-world time-series datasets are often multivariate with complex dynamics. To capture this complexity, high-capacity architectures like recurrent- or attention-based sequential deep learning models have become popular. However, recent work demonstrates that simple univariate linear models can outperform such deep learning models on several commonly used academic benchmarks. Extending them, in this paper, we investigate the capabilities of linear models for time-series forecasting and present Time-Series Mixer (TSMixer), a novel architecture designed by stacking multi-layer perceptrons (MLPs). TSMixer is based on mixing operations along both the time and feature dimensions to extract information efficiently. On popular academic benchmarks, the simple-to-implement TSMixer is comparable to specialized state-of-the-art models that leverage the inductive biases of specific benchmarks. On the challenging and large-scale M5 benchmark, a real-world retail dataset, TSMixer demonstrates superior performance compared to the state-of-the-art alternatives. Our results underline the importance of efficiently utilizing cross-variate and auxiliary information for improving the performance of time series forecasting. We present various analyses to shed light on the capabilities of TSMixer. The design paradigms utilized in TSMixer are expected to open new horizons for deep learning-based time series forecasting.
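The abstract's core design is easy to picture in code. Below is a minimal, unofficial sketch of one mixing block, alternating an MLP over the time axis with an MLP over the feature axis; layer sizes, normalization placement, and activations are assumptions, not the paper's exact specification:

```python
# Sketch of a TSMixer-style block: MLP mixing along time, then features.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, seq_len: int, n_features: int, hidden: int = 64):
        super().__init__()
        # Time-mixing MLP: acts across time steps, shared over features.
        self.time_mlp = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.ReLU(), nn.Linear(hidden, seq_len))
        # Feature-mixing MLP: acts across variates, shared over time steps.
        self.feat_mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, n_features))
        self.norm1 = nn.LayerNorm(n_features)
        self.norm2 = nn.LayerNorm(n_features)

    def forward(self, x):  # x: (batch, seq_len, n_features)
        # Mix along time: transpose so the Linear sees the time dimension.
        x = x + self.time_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Mix along features, with a residual connection as in MLP-Mixers.
        x = x + self.feat_mlp(self.norm2(x))
        return x

block = MixerBlock(seq_len=96, n_features=7)
y = block(torch.randn(32, 96, 7))  # a forecasting head would follow
```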