Deep Learning Weekly: Issue 380
Mistral has entered the chat, Insights into population dynamics: Google’s foundation model for geospatial inference, a paper on an AI Agent for Equity Research and Valuation, and many more!
This week in deep learning, we bring you Mistral has entered the chat, Insights into population dynamics: A foundation model for geospatial inference, and a paper on FinRobot: AI Agent for Equity Research and Valuation with Large Language Models.
You may also enjoy OpenAI's imminent launch of Operator, Automatically generating cloud configurations: Introducing RAGformation, a paper on Watermark Anything with Localized Messages, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Mistral has entered the chat
The Mistral team introduced new capabilities to le Chat, including web search with citations, a canvas for ideation, and more.
OpenAI hopes to launch its first AI agent, called 'Operator,' in January
OpenAI is planning to move beyond simply answering questions with the imminent launch of “Operator”.
Introducing Prompt Canvas: a Novel UX for Developing Prompts
LangChain introduced Prompt Canvas, a way to collaborate with an AI agent to build and optimize your prompts.
IBM and Llama: Working to enable AI builder creativity globally
IBM and Meta are combining IBM’s watsonx AI platform with Llama models to help businesses reach their AI goals.
Amazon Puts $110M Into Academic Generative AI Research
Amazon announced that it will invest $110 million in university-led research into generative AI to help drive breakthroughs in the field.
Nvidia launches H200 NVL high-performance GPU to power AI supercomputing
Nvidia announced the availability of its newest data center-grade GPU to power AI and high-performance computing.
Google Kubernetes Engine supports 65,000-node clusters
In anticipation of even larger language models, Google introduced support for 65,000-node clusters in Google Kubernetes Engine.
MLOps & LLMOps
Automatically generating cloud configurations: Introducing RAGformation
A blog post about an AI-powered tool that automates the process of selecting cloud services, estimating costs, and designing cloud architecture.
Build Contextual GenAI Apps in low code with Lamatic and Weaviate
An article about building contextual GenAI applications with low code using Lamatic and Weaviate.
Learning
Distilling Llama3.1 8B into 1B in torchtune
A case study on distilling a Llama 3.1 8B model into Llama 3.2 1B using torchtune’s knowledge distillation recipe.
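Knowledge distillation trains the small student to match the large teacher's output distribution in addition to the ground-truth tokens. The sketch below illustrates that idea for a single token position. It is a generic formulation, not torchtune's recipe code: it assumes teacher and student share one vocabulary, uses a forward KL divergence as the distillation term, and blends it with ordinary cross-entropy through a weight called `kd_ratio` here purely for illustration.

```python
import math
from typing import List

def log_softmax(logits: List[float]) -> List[float]:
    # Shift by the max logit so exp() cannot overflow.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def forward_kl(teacher_logits: List[float], student_logits: List[float]) -> float:
    # KL(p_teacher || p_student), summed over a vocabulary both models share.
    log_pt = log_softmax(teacher_logits)
    log_ps = log_softmax(student_logits)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))

def cross_entropy(student_logits: List[float], target: int) -> float:
    # Standard next-token loss against the ground-truth token id.
    return -log_softmax(student_logits)[target]

def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      target: int, kd_ratio: float = 0.5) -> float:
    # kd_ratio (hypothetical name) blends the hard-label loss with the KD term:
    # 0.0 means plain fine-tuning, 1.0 means pure distillation.
    hard = cross_entropy(student_logits, target)
    soft = forward_kl(teacher_logits, student_logits)
    return (1 - kd_ratio) * hard + kd_ratio * soft
```

In a real training loop the same computation would run over every token position in a batch, with the teacher's logits produced by a frozen forward pass.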
Insights into population dynamics: A foundation model for geospatial inference
Google introduced a population dynamics foundation model for solving a wide array of geospatial problems across health, socioeconomic, and environmental tasks.
Identification of Hazardous Areas for Priority Landmine Clearance: AI for Humanitarian Mine Action
A comprehensive blog post discussing the use of AI for humanitarian demining, showcasing the RELand system for efficient identification of hazardous areas.
Gen-AI Safety Landscape: A Guide to the Mitigation Stack for Text-to-Image Models
An article discussing the mitigation stack for text-to-image models, emphasizing safety measures at various stages of the model lifecycle.
Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
A thought-provoking blog post exploring the evolving role of mathematics in machine learning research, discussing concepts like intrinsic dimension, curvature, topology, and category theory.
Libraries & Code
Promptim is an experimental prompt optimization library to help you systematically improve your AI systems.
Fast and accurate automatic speech recognition (ASR) for edge devices.
Papers & Publications
FinRobot: AI Agent for Equity Research and Valuation with Large Language Models
Abstract:
As financial markets grow increasingly complex, there is a rising need for automated tools that can effectively assist human analysts in equity research, particularly within sell-side research. While Generative AI (GenAI) has attracted significant attention in this field, existing AI solutions often fall short due to their narrow focus on technical factors and limited capacity for discretionary judgment. These limitations hinder their ability to adapt to new data in real-time and accurately assess risks, which diminishes their practical value for investors.
This paper presents FinRobot, the first AI agent framework specifically designed for equity research. FinRobot employs a multi-agent Chain of Thought (CoT) system, integrating both quantitative and qualitative analyses to emulate the comprehensive reasoning of a human analyst. The system is structured around three specialized agents: the Data-CoT Agent, which aggregates diverse data sources for robust financial integration; the Concept-CoT Agent, which mimics an analyst's reasoning to generate actionable insights; and the Thesis-CoT Agent, which synthesizes these insights into a coherent investment thesis and report. FinRobot provides thorough company analysis supported by precise numerical data, industry-appropriate valuation metrics, and realistic risk assessments. Its dynamically updatable data pipeline ensures that research remains timely and relevant, adapting seamlessly to new financial information. Unlike existing automated research tools, such as CapitalCube and Wright Reports, FinRobot delivers insights comparable to those produced by major brokerage firms and fundamental research vendors.
Watermark Anything with Localized Messages
Abstract:
Image watermarking methods are not tailored to handle small watermarked areas. This restricts applications in real-world scenarios where parts of the image may come from different sources or have been edited. We introduce a deep-learning model for localized image watermarking, dubbed the Watermark Anything Model (WAM). The WAM embedder imperceptibly modifies the input image, while the extractor segments the received image into watermarked and non-watermarked areas and recovers one or several hidden messages from the areas found to be watermarked. The models are jointly trained at low resolution and without perceptual constraints, then post-trained for imperceptibility and multiple watermarks. Experiments show that WAM is competitive with state-of-the-art methods in terms of imperceptibility and robustness, especially against inpainting and splicing, even on high-resolution images. Moreover, it offers new capabilities: WAM can locate watermarked areas in spliced images and extract distinct 32-bit messages with less than 1 bit error from multiple small regions (no larger than 10% of the image surface), even for small 256x256 images.
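To make "segment, then recover the message from the watermarked area" concrete, here is a rough sketch of one way a localized decoder's outputs could be turned into a message. It assumes, as an illustration rather than a description of WAM's released code, that the extractor emits a per-pixel watermark probability and per-pixel logits for each message bit; the message is read by averaging bit logits over pixels flagged as watermarked. Separating several distinct messages would additionally require grouping the flagged pixels into regions first, which is not shown.

```python
from typing import List, Optional

def decode_region(mask_probs: List[float], bit_logits: List[List[float]],
                  threshold: float = 0.5) -> Optional[List[int]]:
    # mask_probs[i]: predicted probability that pixel i is watermarked.
    # bit_logits[i]: one logit per message bit predicted at pixel i.
    selected = [logits for p, logits in zip(mask_probs, bit_logits) if p > threshold]
    if not selected:
        return None  # no pixel was detected as watermarked
    nbits = len(selected[0])
    # Average each bit's logit over the detected pixels, then take its sign.
    avg = [sum(l[b] for l in selected) / len(selected) for b in range(nbits)]
    return [1 if a > 0 else 0 for a in avg]

def bit_errors(decoded: List[int], message: List[int]) -> int:
    # Number of message bits that were recovered incorrectly.
    return sum(d != m for d, m in zip(decoded, message))
```

Averaging over many pixels is what lets even a small region vote its way to a reliable message; the "less than 1 bit error" figure in the abstract corresponds to a count like `bit_errors` averaged over test images.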
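JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Abstract: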
We present JanusFlow, a powerful framework that unifies image understanding and generation in a single model. JanusFlow introduces a minimalist architecture that integrates autoregressive language models with rectified flow, a state-of-the-art method in generative modeling. Our key finding demonstrates that rectified flow can be straightforwardly trained within the large language model framework, eliminating the need for complex architectural modifications. To further improve the performance of our unified model, we adopt two key strategies: (i) decoupling the understanding and generation encoders, and (ii) aligning their representations during unified training. Extensive experiments show that JanusFlow achieves comparable or superior performance to specialized models in their respective domains, while significantly outperforming existing unified approaches across standard benchmarks. This work represents a step toward more efficient and versatile vision-language models.
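Rectified flow, the generative method JanusFlow trains inside its language model, learns a velocity field that moves samples along straight lines from noise to data. The minimal sketch below shows the generic training target and an Euler sampler on plain Python vectors. It is not JanusFlow's code: `velocity_fn` stands in for the network, and the convention that t = 0 is noise and t = 1 is data is an assumption made for illustration.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]

def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    # Point on the straight path from noise x0 (t = 0) to data x1 (t = 1).
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def rf_loss(velocity_fn: VelocityFn, x0: Vector, x1: Vector, t: float) -> float:
    # Along a straight path the target velocity is constant: x1 - x0.
    xt = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]
    pred = velocity_fn(xt, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def euler_sample(velocity_fn: VelocityFn, x0: Vector, steps: int = 10) -> Vector:
    # Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-size steps.
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

Because the learned paths are meant to be nearly straight, a handful of Euler steps can already land close to the data, which is part of what makes the method cheap to sample from.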