Deep Learning Weekly: Issue 340
Cohere For AI Launches Aya, Thinking about High-Quality Human Data, Synthetic Data for Finetuning, a paper on Self-Discover: Large Language Models Self-Compose Reasoning Structures, and many more!
This week in deep learning, we bring you Cohere For AI Launches Aya, an LLM Covering More Than 100 Languages, Thinking about High-Quality Human Data, Synthetic Data for Finetuning: Distillation and Self-Improvement, and a paper on Self-Discover: Large Language Models Self-Compose Reasoning Structures.
You may also enjoy How symmetry can come to the aid of machine learning, How to Adapt Your LLM for Question Answering with Prompt-Tuning using NVIDIA NeMo, An Overview of Contextual Bandits, a paper on Theory of Mind Might Have Spontaneously Emerged in Large Language Models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
Cohere For AI Launches Aya, an LLM Covering More Than 100 Languages
Cohere’s non-profit research lab announced Aya, a new state-of-the-art, open-source, massively multilingual generative large language research model covering 101 languages.
How symmetry can come to the aid of machine learning
New MIT research provides a theoretical proof for a phenomenon observed in practice: encoding symmetries in the model helps it learn from less data.
Beyond the hype: New opportunities for gen AI in energy and materials
McKinsey’s article on how Generative AI can create additional value for the energy and materials sector.
A new way to let AI chatbots converse all day without crashing
Researchers developed a simple yet effective solution for a puzzling problem that can degrade the performance of large language models such as ChatGPT during long conversations.
Akto unveils 'GenAI Security Testing' to enhance AI and LLM security
Akto announced the launch of GenAI Security Testing, a new solution aimed at enhancing the security of generative AI.
Nvidia’s new tool lets you run GenAI models on a PC
Nvidia is releasing a tool that lets owners of GeForce RTX 30 Series and 40 Series cards run an AI-powered chatbot offline on a Windows PC.
MLOps & LLMOps
Thinking about High-Quality Human Data
Lilian Weng’s comprehensive article detailing the relevance of high-quality human data for machine learning, emphasizing attention to detail and careful execution in data collection.
How to Adapt Your LLM for Question Answering with Prompt-Tuning using NVIDIA NeMo
A tutorial and technical deep dive on prompt-tuning and p-tuning using NeMo.
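For readers new to the technique itself, here is a minimal conceptual sketch of prompt-tuning: a handful of trainable "virtual token" embeddings are prepended to a frozen model's input embeddings, and only those embeddings are updated during training. The code below is plain PyTorch with made-up dimensions, illustrating the idea rather than NeMo's actual API.

```python
# Conceptual sketch of prompt-tuning: trainable "virtual token" embeddings are
# prepended to the frozen model's input embeddings, and only those virtual
# tokens are optimized during training.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        # The only trainable parameters: the virtual-token embeddings.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen LLM's embedding layer.
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the soft prompt; the frozen transformer sees a longer sequence.
        return torch.cat([prompt, input_embeds], dim=1)

soft_prompt = SoftPrompt(num_virtual_tokens=20, embed_dim=768)
dummy_embeds = torch.randn(2, 16, 768)       # stand-in for token embeddings
extended = soft_prompt(dummy_embeds)         # shape (2, 36, 768), fed to the frozen model
optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=1e-3)
```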
Conversational AI with LangChain and Comet
An article that touches on the history and components of chatbots, then explores implementing chatbots with LangChain and tracking experiments with Comet.
Learning
Synthetic Data for Finetuning: Distillation and Self-Improvement
Eugene Yan’s in-depth blog post highlighting the core details of, and the differences between, various implementations of distillation and self-improvement.
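As a rough illustration of the distillation flavor, the sketch below has a stronger teacher model answer a few seed instructions and saves the (instruction, response) pairs as finetuning data for a smaller student. The model name, seed prompts, and output file are assumptions for illustration, not taken from the post.

```python
# Minimal sketch of distillation-style synthetic data generation: a stronger
# "teacher" model answers seed instructions, and the pairs are saved as
# finetuning data for a smaller "student" model.
import json
from openai import OpenAI  # assumes the openai>=1.0 Python SDK and OPENAI_API_KEY set

client = OpenAI()
seed_instructions = [
    "Explain the difference between precision and recall.",
    "Summarize the idea behind gradient checkpointing in two sentences.",
]

with open("distilled_pairs.jsonl", "w") as f:
    for instruction in seed_instructions:
        response = client.chat.completions.create(
            model="gpt-4",  # illustrative teacher model choice
            messages=[{"role": "user", "content": instruction}],
        )
        answer = response.choices[0].message.content
        f.write(json.dumps({"instruction": instruction, "response": answer}) + "\n")
```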
An Overview of Contextual Bandits
A comprehensive article on when and how to use contextual bandits.
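For a flavor of the technique, here is a minimal epsilon-greedy contextual bandit sketch with one linear reward model per arm; the simulated environment and hyperparameters are illustrative and not from the article.

```python
# Epsilon-greedy contextual bandit sketch: one linear reward model per arm.
# Pick the arm with the highest predicted reward (explore with probability
# epsilon), then update the chosen arm's model with the observed reward.
import numpy as np

rng = np.random.default_rng(0)
n_arms, dim, epsilon, lr = 3, 5, 0.1, 0.05
weights = np.zeros((n_arms, dim))               # linear reward model per arm

def choose(context: np.ndarray) -> int:
    if rng.random() < epsilon:                  # explore
        return int(rng.integers(n_arms))
    return int(np.argmax(weights @ context))    # exploit highest predicted reward

def update(arm: int, context: np.ndarray, reward: float) -> None:
    pred = weights[arm] @ context
    weights[arm] += lr * (reward - pred) * context  # SGD step on squared error

# Simulated interaction loop (rewards here are synthetic).
true_weights = rng.normal(size=(n_arms, dim))
for _ in range(1000):
    ctx = rng.normal(size=dim)
    arm = choose(ctx)
    reward = true_weights[arm] @ ctx + rng.normal(scale=0.1)
    update(arm, ctx, reward)
```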
Stock Market Sentiment Prediction with OpenAI and Python
An article that discusses using OpenAI and Python to predict stock market sentiment, emphasizing the role of LLMs in analyzing financial news for investment decisions.
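A hedged sketch of the general approach: ask a chat model to label a headline as bullish, bearish, or neutral. The model name and prompt wording below are illustrative assumptions, not taken from the article.

```python
# Sketch of headline sentiment classification with the OpenAI Python SDK;
# model name and prompt wording are illustrative.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()
headline = "Tech stocks rally as chipmaker beats earnings expectations"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative choice
    messages=[
        {"role": "system",
         "content": "Classify the sentiment of the financial headline as "
                    "bullish, bearish, or neutral. Reply with one word."},
        {"role": "user", "content": headline},
    ],
    temperature=0,
)
print(response.choices[0].message.content)  # e.g. "bullish"
```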
Best Object Detection Models in 2024
An article that explores new and popular object detection models.
Libraries & Code
A powerful framework for dynamically combining Stable Diffusion models into a Mixture of Experts within minutes, without training.
A library to build Graph Neural Networks on the TensorFlow platform.
Papers & Publications
Theory of Mind Might Have Spontaneously Emerged in Large Language Models
Abstract:
We explore the intriguing possibility that theory of mind (ToM), or the uniquely human ability to impute unobservable mental states to others, might have spontaneously emerged in large language models (LLMs). We designed 40 false-belief tasks, considered a gold standard in testing ToM in humans, and administered them to several LLMs. Each task included a false-belief scenario, three closely matched true-belief controls, and the reversed versions of all four. Smaller and older models solved no tasks; GPT-3-davinci-003 (from November 2022) and ChatGPT-3.5-turbo (from March 2023) solved 20% of the tasks; ChatGPT-4 (from June 2023) solved 75% of the tasks, matching the performance of six-year-old children observed in past studies. These findings suggest the intriguing possibility that ToM, previously considered exclusive to humans, may have spontaneously emerged as a byproduct of LLMs' improving language skills.
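To make the setup concrete, the sketch below administers a simplified false-belief-style prompt to an LLM and checks whether the completion reflects the protagonist's outdated belief; the scenario, model choice, and scoring are stand-ins, not the paper's 40 tasks or evaluation protocol.

```python
# Illustrative false-belief-style probe: the protagonist should be expected to
# look where he left the item (his outdated belief), not where it actually is.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()
scenario = (
    "Sam puts his chocolate in the drawer and leaves the room. While he is away, "
    "Anna moves the chocolate to the cupboard. Sam comes back. "
    "Where will Sam look for the chocolate first? Answer with one word."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice
    messages=[{"role": "user", "content": scenario}],
    temperature=0,
)
answer = response.choices[0].message.content.lower()
# Passing this simplified check means answering with Sam's (false) belief.
print("pass" if "drawer" in answer else "fail", "->", answer)
```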
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Abstract:
We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x less inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.
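The sketch below mirrors the two-stage idea described in the abstract: compose an explicit reasoning structure from a few atomic reasoning modules, then follow it when answering a new instance. The prompts, module list, and model choice are simplified assumptions, not the paper's actual templates.

```python
# Rough sketch of the SELF-DISCOVER idea: first compose a reasoning structure
# from atomic reasoning modules, then follow that structure on a new instance.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()
MODEL = "gpt-4"  # illustrative choice

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}], temperature=0
    )
    return resp.choices[0].message.content

modules = [
    "Critical thinking: question assumptions and check each claim.",
    "Step-by-step thinking: break the problem into ordered sub-steps.",
    "Decomposition: split the task into independent sub-problems.",
]
task_examples = "Word problems about rates, times, and distances."

# Stage 1: select useful modules and compose an explicit reasoning structure.
selected = ask(
    "Given these reasoning modules:\n" + "\n".join(modules)
    + f"\n\nSelect the ones useful for tasks like: {task_examples}"
)
structure = ask(
    "Adapt the selected modules below into a step-by-step reasoning structure "
    f"(as a JSON plan) for solving such tasks:\n{selected}"
)

# Stage 2: solve a new instance by following the self-discovered structure.
instance = "A train leaves at 3pm traveling 60 mph and arrives at 5:30pm. How far did it travel?"
answer = ask(f"Follow this reasoning structure to solve the task:\n{structure}\n\nTask: {instance}")
print(answer)
```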
Fast Timing-Conditioned Latent Audio Diffusion
Abstract:
Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding. Further, most previous works do not address the fact that music and sound effects naturally vary in duration. Our research focuses on the efficient generation of long-form, variable-length stereo music and sounds at 44.1kHz using text prompts with a generative model. Stable Audio is based on latent diffusion, with its latent defined by a fully-convolutional variational autoencoder. It is conditioned on text prompts as well as timing embeddings, allowing for fine control over both the content and length of the generated music and sounds. Stable Audio is capable of rendering stereo signals of up to 95 sec at 44.1kHz in 8 sec on an A100 GPU. Despite its compute efficiency and fast inference, it is one of the best models on two public text-to-music and -audio benchmarks and, unlike state-of-the-art models, can generate music with structure and stereo sounds.
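As a rough illustration of the timing-conditioning idea, the sketch below embeds seconds_start and seconds_total and concatenates them with text-encoder tokens for cross-attention conditioning. The architecture, module names, and dimensions are assumptions for illustration, not Stable Audio's actual implementation.

```python
# Illustrative timing conditioning for variable-length audio generation: scalar
# start time and total duration (in seconds) are embedded and concatenated with
# the text conditioning that guides the diffusion model.
import torch
import torch.nn as nn

class TimingConditioner(nn.Module):
    def __init__(self, cond_dim: int = 768):
        super().__init__()
        # Small MLPs map scalar seconds_start / seconds_total to conditioning vectors.
        self.start_mlp = nn.Sequential(nn.Linear(1, cond_dim), nn.SiLU(), nn.Linear(cond_dim, cond_dim))
        self.total_mlp = nn.Sequential(nn.Linear(1, cond_dim), nn.SiLU(), nn.Linear(cond_dim, cond_dim))

    def forward(self, text_tokens: torch.Tensor, seconds_start: torch.Tensor,
                seconds_total: torch.Tensor) -> torch.Tensor:
        # text_tokens: (batch, n_tokens, cond_dim) from a text encoder.
        start = self.start_mlp(seconds_start.unsqueeze(-1)).unsqueeze(1)
        total = self.total_mlp(seconds_total.unsqueeze(-1)).unsqueeze(1)
        # Concatenate timing tokens with text tokens for cross-attention conditioning.
        return torch.cat([text_tokens, start, total], dim=1)

cond = TimingConditioner()
text = torch.randn(2, 77, 768)                                   # stand-in for text-encoder output
seq = cond(text, torch.tensor([0.0, 0.0]), torch.tensor([30.0, 95.0]))
print(seq.shape)  # torch.Size([2, 79, 768])
```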