Deep Learning Weekly: Issue #244
Printing AI algorithms on physical hardware, DoorDash's declarative feature engineering framework, a Python library for graph outlier detection, a paper on the Pathways Language Model, and many more.
This week in deep learning, we bring you printing AI algorithms on physical hardware, DoorDash's declarative feature engineering framework, a Python library for graph outlier detection, and a paper on the Pathways Language Model.
You may also enjoy Meta AI's advancements on chit-chat and nonverbal cue generation, an article on vanishing and exploding gradients, a sentiment analysis pipeline using Amazon SageMaker Ground Truth and Databricks MLflow, a paper on deep clustering (DeepDPM), and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Meta AI introduces an open-sourced Textless Python Library, expressive vocalization models, and AI agents that can have real-time chit-chat.
LANL researchers have fabricated AI algorithms out of physical hardware—pioneering a new form of analog computing.
Microsoft reveals how it has been working with Hewlett Packard Enterprise Co. and NASA to develop and test artificial intelligence models in orbit aboard the International Space Station.
In a new study, Cambridge researchers find that although stable, accurate neural networks may theoretically exist for many problems, there may paradoxically be no algorithm that can actually compute them.
A guide to how and why you should monitor your models once they are in production, so that each model stays up to date, avoids bias, and provides reliable business insight.
An article exploring the concepts behind data science pipelines (focusing on machine learning) and how to leverage Kedro, an open-source framework, for creating one.
A step-by-step guide going through the ML deployment phase using the Syndicai approach of going from model to production-ready API.
An article showing how Databricks integrates with Ground Truth and Amazon SageMaker for data labeling and model distribution.
An article that highlights the reasons for bias, the types of bias, and techniques for mitigating them.
To accelerate development of E2E pipelines for feature engineering and serving, DoorDash built Fabricator, a centralized and declarative framework to define feature workflows.
An in-depth article on the intuition behind the vanishing and exploding gradient problems: why they happen, how to spot them during model training, and case demonstrations with solutions for each.
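For readers who want the intuition in a few lines, here is a minimal sketch that is not taken from the linked article. It assumes a toy scalar "network" h_{l+1} = act(w * h_l) and multiplies the per-layer derivatives, which is what backpropagation does along a single path. The function name, starting value, depth, and weights are all illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def chain_gradient(depth, weight, activation):
    """Return d(h_depth)/d(h_0) for the scalar chain h_{l+1} = act(weight * h_l).

    Toy model (an assumption for illustration): one unit per layer, and the
    same weight shared by every layer.
    """
    h = 0.5      # arbitrary starting activation
    grad = 1.0   # running product of per-layer derivatives
    for _ in range(depth):
        z = weight * h
        if activation == "sigmoid":
            h = sigmoid(z)
            local = h * (1.0 - h)  # sigmoid'(z), never larger than 0.25
        elif activation == "linear":
            h = z
            local = 1.0
        else:
            raise ValueError(f"unknown activation: {activation}")
        grad *= weight * local     # chain rule: multiply by this layer's slope
    return grad


# 20 layers: factors below 1 shrink the gradient, factors above 1 blow it up.
vanishing = chain_gradient(depth=20, weight=1.0, activation="sigmoid")
exploding = chain_gradient(depth=20, weight=1.5, activation="linear")
print(f"sigmoid, w=1.0: {vanishing:.3e}")
print(f"linear,  w=1.5: {exploding:.3e}")
```

The same multiplicative effect is why deep networks are sensitive to weight initialization and activation choice: each layer contributes one factor to the product, so small per-layer differences compound with depth.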
A deep dive into creating super-resolution images to expand your dataset, including a focus on LR and SR algorithms and structure.
An article that provides a migration guide with equivalent examples using the Estimator API and TF-DF to create a boosted tree model.
Google’s practical and costly solution to diverging optimization trajectories, which improves reproducibility and yields higher model accuracy than other solutions.
Libraries & Code
An open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
PyGOD is a Python library for graph outlier detection (anomaly detection). PyGOD includes more than 10 of the latest graph-based detection algorithms, such as DOMINANT (SDM'19) and GUIDE (BigData'21).
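To give a feel for what "graph outlier detection" means, here is a deliberately simple sketch in plain Python. It does not use PyGOD and is not one of its algorithms: it scores each node by how far its feature value is from the mean of its neighbors' values. The graph, the feature values, and the function name are made up for illustration.

```python
def neighbor_deviation_scores(adjacency, features):
    """Score each node by |own feature - mean of neighbor features|.

    A hypothetical baseline for illustration only; learned detectors such as
    those shipped in PyGOD are far more capable.
    """
    scores = {}
    for node, neighbors in adjacency.items():
        if not neighbors:
            scores[node] = 0.0  # isolated node: nothing to compare against
            continue
        mean = sum(features[n] for n in neighbors) / len(neighbors)
        scores[node] = abs(features[node] - mean)
    return scores


# Small undirected graph stored as an adjacency list (toy data).
adjacency = {
    "a": ["b", "c", "e"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d", "a"],
}
# One scalar feature per node; node "d" is the planted outlier.
features = {"a": 1.0, "b": 1.1, "c": 0.9, "d": 5.0, "e": 1.0}

scores = neighbor_deviation_scores(adjacency, features)
most_anomalous = max(scores, key=scores.get)
print(most_anomalous, round(scores[most_anomalous], 2))
```

Note that neighbors of the planted outlier also receive elevated scores, because the outlier pulls their neighborhood mean away from their own value. That weakness is one reason to prefer dedicated methods over a baseline like this.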
Papers & Publications
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the fine-tuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
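As a rough illustration of the few-shot setup the abstract refers to, the sketch below assembles a prompt from a handful of worked examples followed by a new query. The Q:/A: template, the function name, and the sample questions are assumptions made for this example, not the format used in the PaLM paper.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Join (question, answer) pairs and a final unanswered query into one prompt.

    The "Q:" / "A:" layout is a hypothetical template chosen for illustration.
    """
    lines = []
    if instruction:
        lines.append(instruction)
        lines.append("")
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")
    lines.append(f"Q: {query}")
    lines.append("A:")  # left open for the model to complete
    return "\n".join(lines)


# Two worked examples ("2-shot") and one new question.
examples = [
    ("What is 2 + 3?", "5"),
    ("What is 7 + 1?", "8"),
]
prompt = build_few_shot_prompt(examples, "What is 4 + 4?", "Answer the question.")
print(prompt)
```

The point is that the task is specified entirely in the input text: no model weights are updated, and the number of examples is limited only by the context length.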
Deep Learning (DL) has shown great promise in the unsupervised task of clustering. That said, while in classical (i.e., non-deep) clustering the benefits of the nonparametric approach are well known, most deep-clustering methods are parametric: namely, they require a predefined and fixed number of clusters, denoted by K. When K is unknown, however, using model-selection criteria to choose its optimal value might become computationally expensive, especially in DL as the training process would have to be repeated numerous times. In this work, we bridge this gap by introducing an effective deep-clustering method that does not require knowing the value of K as it infers it during the learning. Using a split/merge framework, a dynamic architecture that adapts to the changing K, and a novel loss, our proposed method outperforms existing nonparametric methods (both classical and deep ones). While the very few existing deep nonparametric methods lack scalability, we demonstrate ours by being the first to report the performance of such a method on ImageNet. We also demonstrate the importance of inferring K by showing how methods that fix it deteriorate in performance when their assumed K value gets further from the ground-truth one, especially on imbalanced datasets.
We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems. Two versions of translation speeches are provided: 1) CVSS-C: All the translation speeches are in a single high-quality canonical voice; 2) CVSS-T: The translation speeches are in voices transferred from the corresponding source speeches. In addition, CVSS provides normalized translation text which matches the pronunciation in the translation speech. On each version of CVSS, we built baseline multilingual direct S2ST models and cascade S2ST models, verifying the effectiveness of the corpus. To build strong cascade S2ST baselines, we trained an ST model on CoVoST 2, which outperforms the previous state-of-the-art trained on the corpus without extra data by 5.8 BLEU. Nevertheless, the performance of the direct S2ST models approaches the strong cascade baselines when trained from scratch, and with only 0.1 or 0.7 BLEU difference on ASR transcribed translation when initialized from matching ST models.