Deep Learning Weekly: Issue 356
McKinsey’s State of AI in Early 2024, NVIDIA NIM Inference Microservice, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet, a paper on Prometheus 2, and many more!
This week in deep learning, we bring you The state of AI in early 2024: Gen AI adoption spikes and starts to generate value, NVIDIA NIM Inference Microservice, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet, and a paper on Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models.
You may also enjoy AMD announces new AI chips amid intensifying competition with Nvidia, Intel, The 10 Minute Guide to Reliable RAG Systems Using Patronus AI, MongoDB Atlas, and LlamaIndex, a paper on YOLOv10: Real-Time End-to-End Object Detection, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
AMD announces new AI chips amid intensifying competition with Nvidia, Intel
AMD revealed new AI chips at the Computex tech conference in Taipei.
NVIDIA invests in AI-powered weed zapping ag-tech startup Carbon Robotics
Seattle ag-tech startup Carbon Robotics has a new investment from NVentures, the venture capital arm of NVIDIA, to fuel the company’s AI-powered farming solutions.
The state of AI in early 2024: Gen AI adoption spikes and starts to generate value
As generative AI adoption accelerates, survey respondents report measurable benefits and increased mitigation of the risk of inaccuracy.
Patronus AI | Announcing our $17M Series A
Patronus AI, an automated evaluations startup that aims to achieve scalable oversight, announced a $17M Series A led by Notable Capital (formerly GGV Capital) with participation from Lightspeed Venture Partners and Datadog.
MLOps & LLMOps
The 10 Minute Guide to Reliable RAG Systems Using Patronus AI, MongoDB Atlas, and LlamaIndex
A 10-minute guide to using Patronus AI, MongoDB Atlas, and LlamaIndex to evaluate and enhance the reliability of RAG systems.
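The retrieve-then-generate loop that such guides evaluate can be sketched independently of the specific stack. Below is a minimal, illustrative version of the retrieval step using toy bag-of-words vectors and cosine similarity; the documents, query, and scoring are all made up for illustration and are not from the guide (production systems use dense embeddings and a vector store such as MongoDB Atlas):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding": word counts. Real RAG stacks use dense vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "MongoDB Atlas stores vectors for retrieval",
    "LlamaIndex orchestrates retrieval augmented generation",
    "Patronus AI evaluates model outputs for reliability",
]

def retrieve(query, k=1):
    # Rank documents by similarity to the query and return the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

print(retrieve("which tool evaluates outputs for reliability"))
```

Reliability evaluation then amounts to checking whether the retrieved context actually supports the generated answer, which is the part the guide delegates to Patronus AI.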
NVIDIA NIM Inference Microservice
A notebook that walks through using NVIDIA NIM Inference Microservice, a fast path to inference for RAG pipelines.
Building RAG Applications with NVIDIA NIM and Haystack on K8s
A technical blog post that demonstrates how to use Haystack and NVIDIA NIM to create a standardized, enterprise-ready RAG solution on Kubernetes.
Learning
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Anthropic’s post on extracting high-quality features from Claude 3 Sonnet, a medium-sized production model, using sparse autoencoders.
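The core mechanism in that post, a sparse autoencoder trained on model activations, can be sketched in a few lines. This is a toy forward pass with made-up dimensions, not Anthropic's actual architecture or scale: a ReLU encoder expands activations into a larger feature dictionary, a linear decoder reconstructs them, and training (not run here) balances reconstruction error against an L1 sparsity penalty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: d_model activations expanded into a larger feature dictionary.
d_model, d_features = 8, 32
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU encoder: most features stay at exactly zero, giving a sparse code.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Linear decoder reconstructs the original activations from active features.
    return f @ W_dec + b_dec

x = rng.normal(size=(4, d_model))   # a batch of activations from some model layer
f = encode(x)
x_hat = decode(f)

# Training objective (illustrative): reconstruction error plus L1 sparsity penalty.
l1_coeff = 1e-3
loss = np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(f).mean()
print(f.shape)
```

After training, individual columns of the dictionary tend to fire on human-interpretable concepts, which is what "extracting interpretable features" refers to.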
What We’ve Learned From A Year of Building with LLMs
A comprehensive, practical guide to building successful LLM products, covering everything from prompting strategies to evaluation workflows.
Reduce Your OpenAI API Costs by 70%
Fareed Khan delves into strategies for minimizing expenses while leveraging OpenAI’s APIs.
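One widely used tactic in this space is caching identical requests so repeated prompts incur zero API spend. The sketch below is a generic illustration (not taken from the article): `call_api` is a hypothetical stand-in for the real client call, and the cache key is a hash of the model plus messages:

```python
import hashlib
import json

_cache = {}

def cached_completion(model, messages, call_api):
    """Return a cached response when this exact (model, messages) pair was seen before.

    `call_api` is a stand-in for the real API client, so repeat prompts cost nothing.
    """
    key = hashlib.sha256(
        json.dumps([model, messages], sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model, messages)
    return _cache[key]

# Fake backend that counts how many billable calls were actually made.
calls = 0
def fake_api(model, messages):
    global calls
    calls += 1
    return f"reply #{calls}"

msgs = [{"role": "user", "content": "Summarize RAG in one line."}]
first = cached_completion("gpt-4o-mini", msgs, fake_api)
second = cached_completion("gpt-4o-mini", msgs, fake_api)
print(first == second, calls)
```

Other common levers include shorter prompts, smaller models for easy requests, and batching; caching is simply the easiest to show in a few lines.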
Bringing Large Language Models to the Edge with GPT-4o and NVIDIA TAO
Edge Impulse introduces a powerful way to utilize LLMs on the edge by using GPT-4o to train a 2,000,000x smaller model that runs directly on devices.
Libraries & Code
A library specifically designed for scalable and efficient dataset preparation.
FlashRAG is a toolkit for the reproduction and development of RAG research which includes 32 pre-processed benchmark RAG datasets and 12 state-of-the-art RAG algorithms.
Papers & Publications
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Abstract:
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. However, concerns including transparency, controllability, and affordability strongly motivate the development of open-source LMs specialized in evaluations. On the other hand, existing open evaluator LMs exhibit critical shortcomings: 1) they issue scores that significantly diverge from those assigned by humans, and 2) they lack the flexibility to perform both direct assessment and pairwise ranking, the two most prevalent forms of assessment. Additionally, they do not possess the ability to evaluate based on custom evaluation criteria, focusing instead on general attributes like helpfulness and harmlessness. To address these issues, we introduce Prometheus 2, a more powerful evaluator LM than its predecessor that closely mirrors human and GPT-4 judgements. Moreover, it is capable of processing both direct assessment and pairwise ranking formats together with user-defined evaluation criteria. On four direct assessment benchmarks and four pairwise ranking benchmarks, Prometheus 2 scores the highest correlation and agreement with humans and proprietary LM judges among all tested open evaluator LMs.
YOLOv10: Real-Time End-to-End Object Detection
Abstract:
Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and more for YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts inference latency. Besides, the design of various components in YOLOs lacks comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. This results in suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and the model architecture. To this end, we first present consistent dual assignments for NMS-free training of YOLOs, which brings competitive performance and low inference latency simultaneously. Moreover, we introduce a holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances capability. The outcome of our effort is a new generation of the YOLO series for real-time end-to-end object detection, dubbed YOLOv10. Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8× faster than RT-DETR-R18 under similar AP on COCO, while enjoying 2.8× fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46% less latency and 25% fewer parameters for the same performance.
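For context on the NMS bottleneck the abstract describes, classic greedy NMS (the post-processing step that YOLOv10's NMS-free training removes) looks roughly like this. This is a generic textbook sketch with made-up boxes, not the paper's code:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); IoU = intersection area / union area.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop heavy overlaps, repeat.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

Because this loop is sequential and data-dependent, it sits awkwardly in an otherwise end-to-end pipeline, which is the latency problem the paper's consistent dual assignments are designed to avoid.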
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
Abstract:
We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark (90.9% vs. 90.2%). In addition, AutoCoder offers a more versatile code interpreter than GPT-4 Turbo and GPT-4o: its interpreter can install external packages rather than being limited to built-in ones. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term AIEV-Instruct (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, AIEV-Instruct reduces dependence on proprietary large models and provides an execution-validated code dataset.