Deep Learning Weekly : Issue #314
This week in deep learning, we bring you NASA and IBM Release Geospatial AI Foundation Model, Microsoft's layered approach to MLOps, Open challenges in LLM Research, and a paper on STUDY: Socially Aware Temporally Causal Decoder Recommender Systems.
You may also enjoy Announcing AI2 OLMo, an Open Language Model Made by Scientists, for Scientists, the first library to let you embed a developer agent in your own app, Llama from Scratch, a paper on Consistent Collaborative Filtering via Tensor Decomposition, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data
A public/private partnership involving NASA and IBM Research has led to the release of NASA's first open-source geospatial artificial intelligence (AI) foundation model for Earth observation data.
AI software startup Modular seeks bumper Series A round to challenge Nvidia
Modular is said to be involved in discussions with investors including General Catalyst over a big funding round that would value it at about $600 million.
Announcing AI2 OLMo, an Open Language Model Made by Scientists, for Scientists
Allen Institute for AI announced the creation of an open, state-of-the-art generative language model for the research community.
Curating Trillion-Token Datasets: Introducing NVIDIA NeMo Data Curator
To meet the growing demands for curating pretraining datasets for LLMs, NVIDIA released Data Curator as part of the NeMo framework.
Risky Giant Steps Can Solve Optimization Problems Faster
A new study shows that taking one big step in the middle of a sequence of steps can speed up the convergence to the optimal point by nearly three times.
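The result is about gradient descent step sizes, so a toy illustration may help. Below is a minimal sketch (our construction, not the paper's exact schedule): gradient descent on an ill-conditioned quadratic, where a repeating cycle that includes one larger stride drives the objective down faster than the same conservative step used throughout.

```python
import numpy as np

# Toy illustration (not the paper's schedule): gradient descent on an
# ill-conditioned quadratic f(x) = 0.5 * x^T A x, comparing a constant
# step size against a cycle that inserts one periodic "giant" step.
A = np.diag([1.0, 10.0])

def run(steps, iters=40):
    x = np.array([1.0, 1.0])
    for t in range(iters):
        x = x - steps[t % len(steps)] * (A @ x)   # gradient of f is A x
    return 0.5 * x @ A @ x                        # final objective value

print(run([0.09]))                    # constant, conservative steps
print(run([0.09, 0.09, 0.09, 0.18]))  # same steps plus one bigger stride
```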
Spotify Expands DJ to Now Be Available in 50 Markets Around the World
Spotify rolls out DJ – a personalized AI guide that knows your music taste so well it can choose what to play for you – in beta to even more countries around the world.
MLOps & LLMOps
Microsoft's layered approach to MLOps
Microsoft engineers propose a three-layered approach to structuring ML projects and provide a ready-to-use template that implements it.
Create a Computer Vision App in 10 Steps Using Comet and Streamlit
An article that provides a step-by-step guide on how to create a computer vision app that can recognize different types of flowers using Comet and Streamlit.
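For flavor, here is a minimal sketch of the kind of app the tutorial builds; the model file, label set, and Comet project name below are placeholders of ours, not taken from the article (Comet also expects an API key in your environment).

```python
# Minimal sketch, assuming a pretrained flower classifier was saved
# whole as "flower_model.pt"; names below are illustrative placeholders.
import streamlit as st
import torch
from torchvision import transforms
from PIL import Image
from comet_ml import Experiment

experiment = Experiment(project_name="flower-app")  # logs to Comet

model = torch.load("flower_model.pt", map_location="cpu")
model.eval()
labels = ["daisy", "dandelion", "rose", "sunflower", "tulip"]

st.title("Flower Classifier")
upload = st.file_uploader("Upload a flower photo", type=["jpg", "png"])
if upload is not None:
    image = Image.open(upload).convert("RGB")
    st.image(image, use_column_width=True)
    x = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])(image).unsqueeze(0)
    with torch.no_grad():
        pred = labels[model(x).argmax(dim=1).item()]
    st.write(f"Prediction: {pred}")
    experiment.log_metric("predictions", 1)   # track usage in Comet
```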
6 open-source Pinecone alternatives for LLMs
A blog post that highlights six open-source Pinecone alternatives for large language models.
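As a taste of what swapping Pinecone out involves, here is the core nearest-neighbor operation with FAISS, one widely used open-source vector search library (whether the post's list includes FAISS specifically is an assumption on our part).

```python
# Build a flat index over embedding vectors and query nearest neighbors,
# the core operation a Pinecone replacement needs to serve.
import numpy as np
import faiss

dim = 384                                  # e.g., sentence-embedding size
vectors = np.random.rand(10_000, dim).astype("float32")

index = faiss.IndexFlatL2(dim)             # exact L2 nearest-neighbor index
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)    # top-5 nearest stored vectors
print(ids[0])
```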
Build an Active Learning Pipeline with Data Engine
A Data Engine tutorial that demonstrates how to create an active learning pipeline for an image segmentation model using the COCO 1K dataset.
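Since the piece is ultimately about the select-label-retrain cycle, here is a generic uncertainty-sampling loop sketched with scikit-learn on synthetic data; it stands in for the tutorial's segmentation model and Data Engine API, which we don't reproduce here.

```python
# Generic active-learning loop (not Data Engine's API): repeatedly train,
# score the unlabeled pool by uncertainty, and "annotate" the worst cases.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = np.arange(50)                       # small initial labeled set
pool = np.arange(50, 1000)                    # the rest is unlabeled

model = LogisticRegression(max_iter=1000)
for _ in range(5):
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])
    uncertainty = 1 - probs.max(axis=1)       # least-confidence score
    query = pool[np.argsort(uncertainty)[-25:]]   # 25 most uncertain
    labeled = np.concatenate([labeled, query])    # label and absorb them
    pool = np.setdiff1d(pool, query)

model.fit(X[labeled], y[labeled])
print(model.score(X, y))
```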
Learning
Open challenges in LLM research
Chip Huyen discusses 10 major research directions for large language models, such as reducing hallucination and optimizing context learning.
Llama from scratch (or how to implement a paper without crying)
An article that covers the implementation of a scaled-down version of Llama for training TinyShakespeare.
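As a taste of what the walkthrough covers, here is one Llama building block in minimal PyTorch: RMSNorm, the normalization Llama uses in place of LayerNorm (the article wires pieces like this into a scaled-down model).

```python
# RMSNorm: scale features by their reciprocal root-mean-square, with a
# learned per-dimension gain and no mean subtraction or bias.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

x = torch.randn(2, 8, 64)       # (batch, seq, dim)
print(RMSNorm(64)(x).shape)     # torch.Size([2, 8, 64])
```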
An article that discusses the motivation behind using sketches for vision tasks, followed by a holistic dive into sketch representation learning and its current trends.
Getting started with Chirp, Google's Universal Speech Model (USM), on Vertex AI
An article that provides a step-by-step guide on how to get started with Chirp on Vertex AI using the Cloud Speech-to-Text API (v2).
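As a rough sketch of what that looks like, here is a recognition call with the v2 Python client selecting the chirp model; treat the endpoint, region, and default-recognizer path as assumptions to verify against the article and the current docs.

```python
# Sketch of a Speech-to-Text v2 call selecting Chirp; region, endpoint,
# and the "_" default recognizer are our assumptions, not confirmed here.
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

def transcribe_chirp(project_id: str, audio_bytes: bytes) -> str:
    client = SpeechClient(
        client_options={"api_endpoint": "us-central1-speech.googleapis.com"}
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp",                      # select the USM-based model
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_bytes,
    )
    response = client.recognize(request=request)
    return " ".join(r.alternatives[0].transcript for r in response.results)
```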
When can we trust model evaluations?
A post that discusses the assumptions and limitations of four types of model evaluations: behavioral non-fine-tuning, behavioral RL fine-tuning, behavioral i.i.d. fine-tuning, and understanding-based.
How to Match LLM Patterns to Problems
Eugene Yan discusses some potential problems faced when using LLMs and the patterns that help mitigate them.
Libraries & Code
FaceChain is a deep-learning toolchain for generating your digital twin.
Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use.
Human-centric & Coherent Whole Program Synthesis aka your own personal junior developer
Papers & Publications
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Abstract:
We present Platypus, a family of fine-tuned and merged Large Language Models (LLMs) that achieves the strongest performance and currently stands at first place in HuggingFace's Open LLM Leaderboard as of the release date of this work. In this work we describe (1) our curated dataset Open-Platypus, which is a subset of other open datasets and which we release to the public; (2) our process of fine-tuning and merging LoRA modules in order to conserve the strong prior of pretrained LLMs, while bringing specific domain knowledge to the surface; and (3) our efforts in checking for test data leaks and contamination in the training data, which can inform future research. Specifically, the Platypus family achieves strong performance in quantitative LLM metrics across model sizes, topping the global Open LLM leaderboard while using just a fraction of the fine-tuning data and overall compute required for other state-of-the-art fine-tuned LLMs. In particular, a 13B Platypus model can be trained on a single A100 GPU using 25k questions in 5 hours. This is a testament to the quality of our Open-Platypus dataset, and opens opportunities for more improvements in the field.
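To make the fine-tune-then-merge idea concrete, here is a hedged sketch using Hugging Face PEFT; the base checkpoint, rank, and target modules below are illustrative choices of ours, not the authors' settings.

```python
# LoRA fine-tuning sketch with PEFT; hyperparameters are illustrative,
# not the paper's recipe. The gated Llama-2 checkpoint requires access.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
config = LoraConfig(
    r=16,                                # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()       # only the adapters train

# ...train on Open-Platypus, then fold the adapters back into the base:
merged = model.merge_and_unload()
```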
STUDY: Socially Aware Temporally Causal Decoder Recommender Systems
Abstract:
Recommender systems are widely used to help people find items that are tailored to their interests. These interests are often influenced by social networks, making it important to use social network information effectively in recommender systems. This is especially true for demographic groups with interests that differ from the majority. This paper introduces STUDY, a Socially-aware Temporally caUsal Decoder recommender sYstem. STUDY introduces a new socially-aware recommender system architecture that is significantly more efficient to learn and train than existing methods. STUDY performs joint inference over socially connected groups in a single forward pass of a modified transformer decoder network. We demonstrate the benefits of STUDY in the recommendation of books for students who are dyslexic or struggling readers. Dyslexic students often have difficulty engaging with reading material, making it critical to recommend books that are tailored to their interests. We worked with our non-profit partner Learning Ally to evaluate STUDY on a dataset of struggling readers. STUDY was able to generate recommendations that more accurately predicted student engagement when compared with existing methods.
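As we read the abstract, the central trick is an attention mask that is causal in time but spans socially connected users. A toy sketch of such a mask follows (our schematic, not the paper's code); the timestamps and adjacency are made-up values.

```python
# Concatenate the item sequences of socially connected users, then allow
# each position to attend to any strictly earlier event from a connected
# user. This is a schematic of the idea, not the paper's implementation.
import torch

timestamps = torch.tensor([1, 3, 5, 2, 4])   # events from two users, concatenated
user_of = torch.tensor([0, 0, 0, 1, 1])      # which user produced each event
connected = torch.tensor([[1, 1],            # social adjacency: users 0 and 1
                          [1, 1]])           # are in the same group

earlier = timestamps[None, :] < timestamps[:, None]            # earlier in time
social = connected[user_of[:, None], user_of[None, :]].bool()  # socially linked
mask = earlier & social     # allowed attention edges for the decoder
print(mask.int())
```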
Consistent Collaborative Filtering via Tensor Decomposition
Abstract:
Collaborative filtering is the de facto standard for analyzing users’ activities and building recommendation systems for items. In this work we develop Sliced Anti-symmetric Decomposition (SAD), a new model for collaborative filtering based on implicit feedback. In contrast to traditional techniques, where a latent representation of users (user vectors) and items (item vectors) is estimated, SAD introduces one additional latent vector for each item, using a novel three-way tensor view of user-item interactions. This new vector extends user-item preferences calculated by standard dot products to general inner products, producing interactions between items when evaluating their relative preferences. SAD reduces to state-of-the-art (SOTA) collaborative filtering models when the vector collapses to 1, while in this paper we allow its value to be estimated from data. Allowing the values of the new item vector to differ from 1 has profound implications. It suggests users may have nonlinear mental models when evaluating items, allowing the existence of cycles in pairwise comparisons. We demonstrate the efficiency of SAD on both simulated and real-world datasets containing over 1M user-item interactions. Compared with seven SOTA collaborative filtering models with implicit feedback, SAD produces the most consistent personalized preferences while maintaining top-level accuracy in personalized recommendations.
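A toy rendering of the abstract's key idea, under our reading rather than the paper's exact parameterization: give each item an extra vector that turns the dot product into a weighted inner product, which collapses back to standard matrix-factorization scoring when that vector is all ones.

```python
# Schematic only: a per-item vector t generalizes the user-item dot
# product; t = 1 recovers the standard score, as the abstract describes.
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=5)          # user vector
v = rng.normal(size=5)          # item vector
t = rng.normal(size=5)          # extra per-item vector introduced by SAD

def score(u, v, t):
    return np.sum(u * v * t)    # generalized inner product <u, v>_t

print(np.isclose(score(u, v, np.ones(5)), u @ v))  # True: reduces to dot product
```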