Deep Learning Weekly: Issue #208
AI to tackle climate change, Android ML platform, IceVision, AlphaFold, new funding for AI startups and more
This week in deep learning, we bring you the release of the Android ML Platform, China’s AI ecosystem, deep-learning-based piano transcription, and large funding rounds for AI startups Ghost and Untether AI.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
This article lists promising startups that have emerged to tackle climate change with AI, specializing in fields such as climate intelligence, climate insurance, carbon offsets and carbon accounting.
This article examines two of the most pressing ethical considerations when it comes to AI applications in health care: the potential loss of physician autonomy and the amplification of underlying biases.
Nestled in the mountains bordering France and Switzerland, Cyber Valley was founded in 2016 by partners from government, science and industry. Today, it is an active research institute, working in particular on biases hidden in AI systems.
A workshop on how ML can help tackle climate change was held on July 23rd at ICML, one of the most prestigious AI research conferences.
According to the authors, the ecosystem China has put in place to help its AI companies ensures it will win the global race against the US and Europe.
Ghost Locomotion has raised a $100 million Series D funding round: the money will be used toward R&D as the company continues to develop its highway self-driving and crash prevention technology.
Mobile & Edge
Nvidia unveiled the U.K.’s most powerful supercomputer, Cambridge-1, which promises to harness partnerships for breakthroughs with a global impact in the digital biology field.
Google released Android ML Platform, a fully integrated on-device ML inference stack, ensuring optimal performance on all devices and providing a consistent API that spans Android versions.
This paper presents Mobile Video Networks (MoViNets), a family of computation- and memory-efficient video networks that can operate on streaming video for online inference.
Untether AI develops custom-built chips for AI inference workloads. This round of funding will be used to make progress toward mass-producing its runAI200 chip. The founders claim that data in their architecture moves up to 1,000 times faster than in other chips.
EvalAI is an open-source platform for evaluating and comparing machine learning algorithms at scale. Its partners include the most prestigious academic labs.
Nvidia Canvas enables the use of AI to turn simple brushstrokes into realistic landscape images, to create backgrounds quickly, or to speed up concept explorations.
This post introduces deep generative models that can extrapolate beyond their training distribution. This makes them much better suited than existing generative models to tasks such as the generation of natural language or protein sequences.
A detailed and well-written introduction to the inner workings of DeepMind’s AlphaFold 2 algorithm. The author concludes with a personal view on its implications for academic research and biology.
This tutorial explains how to make the most of one GPU to train small neural networks by parallelizing training with JAX.
Libraries & Code
The ONCE dataset is a large-scale autonomous driving dataset with 1 million LIDAR frames, 7 million camera images, 144 driving hours and 15k annotated scenes in diverse environments.
DeepMind released the code behind AlphaFold, its algorithm to predict the shape of a protein from its amino acid sequence. Interestingly, it can be run with limited computational resources.
IceVision describes itself as the first agnostic computer vision framework. It offers a curated collection of hundreds of high-quality pre-trained models and orchestrates the end-to-end deep learning workflow, letting users train networks with libraries such as PyTorch Lightning and fastai.
Papers & Publications
Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, resources required to train, etc. have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey would provide the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
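The repeated-sampling result above can be made concrete with the pass@k metric the Codex paper reports: the probability that at least one of k sampled solutions passes a problem's unit tests. The following is a minimal Python sketch of the unbiased estimator 1 - C(n-c, k) / C(n, k), assuming n samples were drawn per problem and c of them passed. The function names and the input layout are illustrative assumptions, not the paper's released code.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: total samples drawn for one problem
    c: number of those samples that passed the unit tests
    k: sample budget being evaluated (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up only of failing samples.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs (hypothetical layout)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

For instance, `pass_at_k(n=10, c=5, k=1)` evaluates to 0.5, and raising k with the same n and c can only keep the estimate the same or increase it, which is the effect the abstract describes when moving from one sample to 100 samples per problem.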
Automatic Music Transcription has seen significant progress in recent years by training custom deep neural networks on large datasets. However, these models have required extensive domain-specific design of network architectures, input/output representations, and complex decoding schemes. In this work, we show that equivalent performance can be achieved using a generic encoder-decoder Transformer with standard decoding methods. We demonstrate that the model can learn to translate spectrogram inputs directly to MIDI-like output events for several transcription tasks. This sequence-to-sequence approach simplifies transcription by jointly modeling audio features and language-like output dependencies, thus removing the need for task-specific architectures. These results point toward possibilities for creating new Music Information Retrieval models by focusing on dataset creation and labeling rather than custom model design.
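To give a feel for what "MIDI-like output events" might look like as a target sequence, here is a small Python sketch that flattens a list of notes into a token stream a sequence-to-sequence decoder could be trained to emit. The event vocabulary (`time`, `note_on`, `note_off`, `eos`), the 10 ms time step and the function name are assumptions made for illustration and may differ from the paper's actual vocabulary.

```python
from typing import List, Tuple

# (onset_seconds, offset_seconds, midi_pitch): an assumed note representation.
Note = Tuple[float, float, int]
# (event_type, value): an assumed, simplified MIDI-like token.
Event = Tuple[str, int]


def notes_to_events(notes: List[Note], step: float = 0.01) -> List[Event]:
    """Serialize notes into a flat, time-ordered list of MIDI-like events."""
    timeline = []
    for onset, offset, pitch in notes:
        # The second field orders note_off (0) before note_on (1) at equal times.
        timeline.append((round(onset / step), 1, "note_on", pitch))
        timeline.append((round(offset / step), 0, "note_off", pitch))
    timeline.sort()

    events: List[Event] = []
    current_bin = None
    for time_bin, _, kind, pitch in timeline:
        # Emit one absolute time token whenever the quantized time changes.
        if time_bin != current_bin:
            events.append(("time", time_bin))
            current_bin = time_bin
        events.append((kind, pitch))
    events.append(("eos", 0))  # end-of-sequence marker
    return events
```

With a target in this form, the transcription task reduces to predicting the next token given the spectrogram and the tokens emitted so far, which is why a generic encoder-decoder with standard decoding can be applied.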