Deep Learning Weekly: Issue #222
PyTorch 1.10, Apple and Google’s new chips, MT-NLG and OpenPrompt
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Microsoft Research introduces MT-NLG, the largest and most powerful transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration between Microsoft and NVIDIA.
This article emphasizes the importance of approaches enabling the creation of AI systems based on small datasets. Acknowledging the success of techniques like transfer learning can help foster innovation in new, interesting directions.
Clarifai, the AI platform with over 100,000 global users, announced a $60 million fundraising round. Clarifai will announce a suite of exciting new products and plans to double the size of the team in the next year.
The White House Office of Science and Technology Policy will be developing a bill of rights, working with experts across the federal government, in academia, civil society, and the private sector to ensure that data-driven technologies reflect and respect democratic values.
The author of this article reminds us of the ups and downs the AI field has known since the term was introduced in 1956, and details what it could mean for the evolution of the field in the coming decades.
Mobile & Edge
Apple released the new chips powering the new MacBook Pro: M1 Pro and M1 Max. They achieve impressive performance for machine learning computations, while ensuring industry-leading power efficiency.
The TensorFlow team announced the 5 winners of its TensorFlow Lite for Microcontrollers challenge. The winning projects are diverse, from a squat counter to a device embedded in your pillow to monitor snoring.
Google now offers a phone powered by Tensor, a chip of its own design optimized specifically for AI/ML applications, unlocking new experiences that require state-of-the-art ML.
Google AI presents a new strategy to improve medical image classification, based on self-supervised pre-training on unlabeled natural and medical images, and finds that it can significantly improve upon existing strategies.
This post emphasizes the fundamental differences between Machine Learning and Statistics. In particular, when doing statistics, model complexity must be explicitly constrained in order to extrapolate to new data—which does not seem to be the case when doing ML.
Comet Data Scientist Harpreet Sahota shares a comprehensive guide to building an end-to-end fraud detection system, exploring the specifics of how to work with imbalanced classification problems. Colab notebooks for each part of the project are included in the post.
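A common first step for imbalanced problems like fraud detection is reweighting the loss by class frequency. The sketch below is illustrative only (it uses synthetic data and scikit-learn, not the guide's own notebooks):

```python
# Minimal sketch of class weighting for an imbalanced problem,
# assuming scikit-learn is available. Data and model are illustrative,
# not taken from the guide.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic "fraud" data: roughly 5% positive class.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" scales the loss inversely to class frequency,
# so the rare positive class is not ignored by the optimizer.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# For fraud, recall on the minority class usually matters most.
print(recall_score(y_te, clf.predict(X_te)))
```

Resampling strategies (over/under-sampling) are the other standard lever the guide's topic covers; class weighting is simply the lightest-weight starting point.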
This post from Assembly AI shares very useful insights for anyone who wants to train large deep learning models with a tight budget and who needs to iterate quickly.
A very nice interactive tool for understanding in depth what a neural network is and how it is trained, introducing all the necessary concepts, such as activation functions, loss, and backpropagation.
This post explains the link between Deep Equilibrium Models and Ordinary Differential Equations, and uses it to implement nonlinear solvers with neural networks in Julia.
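The core DEQ idea can be sketched in a few lines (in Python here, whereas the post uses Julia): the layer's output z* is defined implicitly as a fixed point z = f(z, x) and found with a nonlinear solver, rather than by stacking explicit layers. The weights below are illustrative:

```python
# Minimal Deep Equilibrium sketch: solve z = f(z, x) by fixed-point
# iteration. Weights are random and illustrative only.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))  # small scale keeps f a contraction
U = rng.normal(size=(4, 4))
x = rng.normal(size=4)

def f(z, x):
    # One "implicit layer": the output z is fed back into itself.
    return np.tanh(W @ z + U @ x)

z = np.zeros(4)
for _ in range(100):
    z_next = f(z, x)
    if np.linalg.norm(z_next - z) < 1e-8:
        break
    z = z_next

print(np.linalg.norm(f(z, x) - z))  # residual of the fixed point
```

In practice DEQs use faster solvers (e.g. Newton or Anderson acceleration) and differentiate through the fixed point implicitly, which is where the ODE-solver connection the post draws comes in.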
Libraries & Code
This library provides a framework to deploy prompt-learning, a recent paradigm to adapt pre-trained language models to NLP tasks, which consists of modifying the input text to match one of the pre-trained tasks.
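The idea behind prompt-learning can be shown with a toy template and verbalizer: wrap the input so a sentiment task looks like the masked-language-modeling task the model was pre-trained on. The template and verbalizer below are illustrative sketches, not OpenPrompt's actual API:

```python
# Toy prompt-learning sketch: a cloze template plus a "verbalizer"
# mapping label words to class labels. Names here are hypothetical.

TEMPLATE = "{text} All in all, it was [MASK]."
VERBALIZER = {"positive": "great", "negative": "terrible"}

def wrap(text: str) -> str:
    """Turn a raw example into a masked-LM style input."""
    return TEMPLATE.format(text=text)

def read_out(mask_word: str) -> str:
    """Map the word predicted at [MASK] back to a class label."""
    for label, word in VERBALIZER.items():
        if word == mask_word:
            return label
    return "unknown"

print(wrap("The movie was a delight."))
print(read_out("great"))
```

A pre-trained masked language model would then score candidate words at the [MASK] position, and the verbalizer turns its prediction into a classification decision; the library generalizes this pattern across templates, verbalizers, and models.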
Kaggle released the 25,973 responses collected through an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. They published an informative executive summary as well.
PyTorch 1.10 has been released, with over 3,400 commits by 426 contributors since 1.9. Updates are focused on improving training, performance, and developer usability.
Papers & Publications
We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does not improve performance. In contrast, classification, extractive question answering, and multiple choice tasks benefit so much from additional examples that collecting a few hundred examples is often "worth" billions of parameters. We hypothesize that unlike open question answering, which involves recalling specific information, solving strategies for tasks with a more restricted output space transfer across examples, and can therefore be learned with small amounts of labeled data.
In this paper, we discuss initial attempts at boosting natural language understanding with deep learning models enhanced by quantum computing. We successfully train a quantum-enhanced Long Short-Term Memory network to perform part-of-speech tagging via numerical simulations. Moreover, a quantum-enhanced Transformer is proposed to perform sentiment analysis on an existing dataset.