Deep Learning Weekly: Issue #219
Microsoft and NVIDIA's Megatron-Turing Natural Language Generation, a guide to converting PyTorch to TensorFlow Lite using ONNX, Microsoft’s framework for predicting micro-climates, and more
Before we jump into this week’s issue, we wanted to share a quick update about the future of Deep Learning Weekly—not to worry, we’re still going to be curating and sharing the best news, resources, research, and more from across the deep learning ecosystem.
But we wanted to let you know that we have a new lead sponsor who will help share this newsletter with the world each week: Comet, an MLOps startup that offers a meta ML platform to help data scientists and teams track, compare, explain, and optimize their experiments.
Nothing is changing—you’ll continue to receive free weekly mailings from our editorial team. But if you’re interested, you can read more about the new partnership in this full write-up here. We’re looking forward to this next era of Deep Learning Weekly, and we’re thankful to have you along for the ride.
Austin and the DLW Editorial Team
This week in deep learning, we bring you Microsoft and NVIDIA's Megatron-Turing Natural Language Generation, a guide to converting PyTorch to TensorFlow Lite using ONNX, Microsoft's DeepMC framework for predicting micro-climates, and a paper on end-to-end Transformer-based object detection models for 3D point clouds.
You may also enjoy an AI startup that helps people overcome public speaking fears, deploying embedded ML on the Xilinx Kria SoM, TensorFlow's end-to-end tinyML audio classification, a paper on improving zero-shot learning capabilities of language models, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Microsoft and Nvidia announced what they claim is the largest and most capable AI-powered language model to date: Megatron-Turing Natural Language Generation.
Microsoft announces that 12 new languages and dialects have been added to its real-time document translation service, powered by the Z-code multilingual model.
A Seattle-based AI startup helps people overcome public speaking fears by using deep learning to analyze video clips and flag verbal glitches that may undermine a speaker’s effectiveness.
Google Arts & Culture has partnered with the Belvedere museum to restore Gustav Klimt’s Faculty paintings using a novel deep learning approach.
A collection of memorable quotes from AI luminaries’ interviews with MIT.
Mobile & Edge
A comprehensive guide on converting your trained PyTorch model to TensorFlow Lite using ONNX, and then quantizing it for deployment on Arm Ethos-U55 or Arm Ethos-U65.
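Deploying on Ethos-U NPUs requires full-integer quantization, which maps float32 values to int8 via a scale and zero point. The sketch below illustrates the affine quantization arithmetic in plain Python; it is a simplified illustration, not the TensorFlow Lite converter’s actual implementation, and real converters calibrate value ranges over a representative dataset rather than a single tensor.

```python
def quantize_int8(values, qmin=-128, qmax=127):
    """Affine (asymmetric) int8 quantization: q = round(x / scale) + zero_point.
    Simplified sketch of the scheme used in full-integer post-training
    quantization; illustrative only."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the representable range must include zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    # quantize each value, clamping to the int8 range
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values: x ≈ (q - zero_point) * scale."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_int8(weights)
approx = dequantize(q, scale, zp)
```

Note that zero is always exactly representable (it dequantizes to 0.0), which matters for operations like zero-padding.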
A technical tutorial on running Vitis AI, quantizing models, and deploying on the Xilinx Kria SoM FPGA.
TensorFlow demonstrates how an Arm Cortex-M based microcontroller can be used for local on-device ML to detect audio events from its surrounding environment.
A blog post that introduces collaborative techniques for machine learning model optimization for edge devices, available starting from release v0.7.0.
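One of the techniques being combined in collaborative optimization is weight clustering: replacing a layer’s weights with a small set of shared centroid values so the model compresses well. Below is a toy pure-Python sketch of the underlying idea using 1-D k-means; the function names are illustrative, not the toolkit’s actual API.

```python
def cluster_weights(weights, k, iters=20):
    """Replace each weight with its nearest cluster centroid, so the layer
    stores only k unique values plus per-weight indices.
    Toy 1-D k-means sketch of weight clustering; illustrative only."""
    lo, hi = min(weights), max(weights)
    # linear centroid initialization across the weight range
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # assignment step: nearest centroid per weight
        assign = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [w for w, a in zip(weights, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    assign = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
    return [centroids[a] for a in assign], centroids

w = [0.01, -0.02, 0.5, 0.48, -0.51, -0.49, 0.02, 0.51]
clustered, centroids = cluster_weights(w, k=3)  # only 3 unique values remain
```

The “collaborative” part is chaining such techniques (clustering, pruning, quantization-aware training) so each step preserves the structure introduced by the previous one.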
After a neighbor suffered a break-in at their home, the team at Blue Lion Labs set out to build their own intruder detection system using a pretrained ResNet10 and an NVIDIA Jetson Nano.
A technical blog post highlighting Microsoft’s DeepMC, a framework for predicting micro-climates: the localized climatic conditions that form around a relatively small, homogeneous region.
The CEO of WaveAI, a startup that “unlocks new heights of human creative expression”, speaks about an AI-based lyric and poetry writing assistant.
A comprehensive walkthrough on hosting models and datasets, and serving your Streamlit applications in Hugging Face Spaces.
Libraries & Code
FedJAX is a JAX-based open source library for Federated Learning simulations that emphasizes ease-of-use in research.
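At its core, a federated learning simulation of the kind FedJAX runs at scale alternates local client training with a server-side weighted average (FedAvg). The toy sketch below shows that loop in plain Python for a single scalar weight fitting y = w·x; all names are illustrative and none of this is FedJAX’s actual API.

```python
def local_sgd(model, data, lr=0.1, epochs=5):
    """Client update: fit y = w * x by plain gradient descent on this
    client's local (x, y) pairs. Toy scalar model, illustrative only."""
    w = model
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

def fedavg_round(global_w, clients, lr=0.1):
    """Server step: run local training on each client, then average the
    resulting models weighted by local dataset size (FedAvg)."""
    total = sum(len(d) for d in clients)
    local_models = [local_sgd(global_w, d, lr) for d in clients]
    return sum(len(d) * m for d, m in zip(clients, local_models)) / total

# two clients whose (noise-free) data both follow y = 3x
clients = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5)]]
w = 0.0
for _ in range(30):
    w = fedavg_round(w, clients)  # w converges toward 3.0
```

In a real simulation the model is a pytree of parameters and clients are sampled per round, but the train-average-repeat structure is the same.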
VISSL is a computer vision library for state-of-the-art self-supervised learning research with PyTorch, aiming to accelerate research cycles in self-supervised learning.
Papers & Publications
We propose 3DETR, an end-to-end Transformer based object detection model for 3D point clouds. Compared to existing detection methods that employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks beyond detection, and can serve as a building block for future research.
This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially boosts zero-shot performance on unseen tasks.
We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that number of tasks and model scale are key components to the success of instruction tuning.