Deep Learning Weekly: Issue #293
Microsoft 365 Copilot, signed parameters for secured ML model deployments, emergent abilities of large language models, a paper on VideoFusion, and more
This week in deep learning, we bring you Microsoft 365 Copilot, signed parameters for secured ML model deployments, emergent abilities of large language models, and a paper on VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation.
You may also enjoy PyTorch 2, the environmental impact of inference, Prompt Engineering by Lilian Weng, a paper on Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
AssemblyAI introduces Conformer-1, a SOTA speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data.
Lightning introduces a new, open-source library that allows you to quickly and easily scale models while maintaining full control over your training loop.
Microsoft introduces Microsoft 365 Copilot, a new feature that uses LLMs and Microsoft Graph data to help users create and complete tasks across different Microsoft 365 apps with natural language commands.
Kai-Fu Lee announced that he’s building a company called Project AI 2.0 that will focus on developing ChatGPT-like apps, as well as an ecosystem for AI-powered productivity tools.
NVIDIA unveils DGX Cloud platform, which provides access to infrastructure and software for training generative AI models, such as large language models and computer vision models.
The PyTorch team announces the release of PyTorch 2.0, which fundamentally changes how PyTorch operates at the compiler level, with faster performance and support for Dynamic Shapes and Distributed.
A technical tutorial on signed parameters for secure ML Model Deployments.
An in-depth guide to building a machine learning platform, from principles to adoption.
In this post, VMware Carbon Black and AWS architects discuss how they built and managed custom ML workflows using GitLab, Amazon MWAA, and SageMaker.
An article about how machine learning inference can have a significant environmental impact due to its high energy consumption and carbon footprint.
A comprehensive post that focuses on prompt engineering for autoregressive language models.
This article looks at the current challenges in the field of robotics and discusses the relevance and applications of computer vision in this area.
A conceptual article about how Large Language Models can exhibit emergent abilities, such as arithmetic, logic, and common sense reasoning.
A tutorial that demonstrates how to compute neural tangent kernels using torch.func, composable function transforms for PyTorch.
An article on developing a chatbot that uses GPT-3 to generate natural language responses based on data stored in a knowledge graph built with Neo4j.
Libraries & Code
Lightweight SDK enabling integration of Large Language Models (LLMs) with conventional programming languages.
A collection of code snippets from the publication Daily Dose of Data Science.
Turn your pandas dataframe into a Tableau-style User Interface for visual analysis.
Papers & Publications
A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions. Despite its recent success in image synthesis, applying DPMs to video generation is still challenging due to high-dimensional data spaces. Previous methods usually adopt a standard diffusion process, where frames in the same video clip are destroyed with independent noises, ignoring the content redundancy and temporal correlation. This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis. The denoising pipeline employs two jointly-learned networks to match the noise decomposition accordingly. Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation. We further show that our decomposed formulation can benefit from pre-trained image diffusion models and supports text-conditioned video creation well.
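The noise decomposition described in the abstract above can be illustrated with a small sketch. This is not the paper's implementation: the mixing weight `lam`, the square-root weighting, and the function name `decomposed_noise` are assumptions, chosen so that each frame's noise stays unit-variance while any two frames share a correlation of roughly `lam` through the common base noise.

```python
import math
import random


def decomposed_noise(num_frames, dim, lam, rng):
    """Return (base, frames), where each frame's noise mixes a shared base
    noise with its own residual noise.

    ASSUMPTION: the sqrt(lam) / sqrt(1 - lam) weighting is an illustrative
    choice that keeps per-frame variance at 1; the abstract does not state
    the exact parameterization.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Base noise: sampled once and shared by every frame in the clip.
    base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    frames = []
    for _ in range(num_frames):
        # Residual noise: sampled independently for each frame.
        residual = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        frames.append([
            math.sqrt(lam) * b + math.sqrt(1.0 - lam) * r
            for b, r in zip(base, residual)
        ])
    return base, frames


if __name__ == "__main__":
    rng = random.Random(0)
    base, frames = decomposed_noise(num_frames=4, dim=10_000, lam=0.5, rng=rng)
    # Empirical correlation between two frames; it should be close to lam.
    corr = sum(a * b for a, b in zip(frames[0], frames[1])) / len(base)
    print(f"frame 0/1 correlation: {corr:.3f}")
```

Setting `lam` to 0 recovers the standard setup the abstract criticizes, where every frame is corrupted with fully independent noise; setting it to 1 makes all frames share exactly the same noise.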
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pre-trained on narrated videos which are readily-available at scale. The Vid2Seq architecture augments a language model with special time tokens, allowing it to seamlessly predict event boundaries and textual descriptions in the same output sequence. Such a unified model requires large-scale training data, which is not available in current annotated datasets. We show that it is possible to leverage unlabeled narrated videos for dense video captioning, by reformulating sentence boundaries of transcribed speech as pseudo event boundaries, and using the transcribed speech sentences as pseudo event captions. The resulting Vid2Seq model pre-trained on the YT-Temporal-1B dataset improves the state of the art on a variety of dense video captioning benchmarks including YouCook2, ViTT and ActivityNet Captions. Vid2Seq also generalizes well to the video paragraph captioning task and the standard task of video clip captioning.
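To make the idea of time tokens and pseudo event boundaries concrete, here is a rough sketch of how timestamped speech sentences could be turned into a single target sequence. The token spelling `<time_k>`, the default of 100 bins, and both function names are hypothetical; the abstract only says that special time tokens are added and that sentence boundaries serve as pseudo event boundaries.

```python
def time_token(t, duration, num_bins=100):
    """Quantize timestamp t (seconds) into a relative time token.

    ASSUMPTION: timestamps are binned relative to the video duration, and the
    "<time_k>" spelling is made up for illustration.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    clamped = min(max(t, 0.0), duration)
    # The last bin is closed on the right so that t == duration stays in range.
    idx = min(int(clamped * num_bins / duration), num_bins - 1)
    return f"<time_{idx}>"


def to_target_sequence(sentences, duration, num_bins=100):
    """Build one output string from (start, end, text) speech sentences.

    Each sentence's start and end act as pseudo event boundaries, and its
    text acts as the pseudo event caption.
    """
    parts = []
    for start, end, text in sorted(sentences):
        parts.append(time_token(start, duration, num_bins))
        parts.append(time_token(end, duration, num_bins))
        parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    speech = [
        (30.0, 45.0, "whisk the batter"),
        (0.0, 12.0, "crack the eggs"),
    ]
    print(to_target_sequence(speech, duration=60.0))
    # prints: <time_0> <time_20> crack the eggs <time_50> <time_75> whisk the batter
```

Because boundaries and captions live in the same token stream, a single sequence-to-sequence model can predict both without a separate localization stage.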
Symbolic Regression is the study of algorithms that automate the search for analytic expressions that fit data. While recent advances in deep learning have generated renewed interest in such approaches, efforts have not been focused on physics, where we have important additional constraints due to the units associated with our data. Here we present Φ-SO, a Physical Symbolic Optimization framework for recovering analytical symbolic expressions from physics data using deep reinforcement learning techniques by learning units constraints. Our system is built, from the ground up, to propose solutions where the physical units are consistent by construction. This is useful not only in eliminating physically impossible solutions, but because it restricts enormously the freedom of the equation generator, thus vastly improving performance. The algorithm can be used to fit noiseless data, which can be useful for instance when attempting to derive an analytical property of a physical model, and it can also be used to obtain analytical approximations to noisy data. We showcase our machinery on a panel of examples from astrophysics.
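The units constraint mentioned in the abstract above can be illustrated with a minimal dimensional-analysis sketch. It only checks a finished expression for unit consistency; it does not reproduce the paper's method of constraining the equation generator during the search. The exponent-tuple representation and all helper names are assumptions made for this example.

```python
# Units are exponent tuples over (length, mass, time).
# For example, velocity is length / time, so its units are (1, 0, -1).
DIMENSIONLESS = (0, 0, 0)
MASS = (0, 1, 0)
VELOCITY = (1, 0, -1)
ENERGY = (2, 1, -2)


def mul(u, v):
    """Units of a product: exponents add."""
    return tuple(a + b for a, b in zip(u, v))


def div(u, v):
    """Units of a quotient: exponents subtract."""
    return tuple(a - b for a, b in zip(u, v))


def power(u, p):
    """Units of a quantity raised to an integer power p."""
    return tuple(a * p for a in u)


def add(u, v):
    """Units of a sum: both terms must already share the same units."""
    if u != v:
        raise ValueError(f"cannot add quantities with units {u} and {v}")
    return u


if __name__ == "__main__":
    # Candidate 1: 0.5 * m * v**2, which should come out as an energy.
    kinetic = mul(DIMENSIONLESS, mul(MASS, power(VELOCITY, 2)))
    print("0.5*m*v^2 is an energy:", kinetic == ENERGY)

    # Candidate 2: 0.5*m*v**2 + m*v mixes energy with momentum; reject it.
    try:
        add(kinetic, mul(MASS, VELOCITY))
    except ValueError as err:
        print("rejected:", err)
```

Rejecting candidates like the second one before they are ever fitted is what shrinks the search space: any expression that adds mismatched units can be discarded without looking at the data.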