Deep Learning Weekly: Issue #293

Microsoft 365 Copilot, signed parameters for secured ML model deployments, emergent abilities of large language models, a paper on VideoFusion, and more

Mar 22, 2023

Hey Folks,

This week in deep learning, we bring you Microsoft 365 Copilot, signed parameters for secured ML model deployments, emergent abilities of large language models, and a paper on VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation.

You may also enjoy PyTorch 2, the environmental impact of inference, Prompt Engineering by Lilian Weng, a paper on Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws, and more!

As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.

Until next week!

Industry

Conformer-1: a robust speech recognition model

AssemblyAI introduces Conformer-1, a SOTA speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data.

Introduction to Lightning Fabric

Lightning introduces a new, open-source library that allows you to quickly and easily scale models while maintaining full control over your training loop.

Introducing Microsoft 365 Copilot – your copilot for work

Microsoft introduces Microsoft 365 Copilot, a new feature that uses LLMs and Microsoft Graph data to help users create and complete tasks across different Microsoft 365 apps with natural language commands.

Kai-Fu Lee founds new AI startup to build ChatGPT-like apps for China

Kai-Fu Lee announced that he’s building a company called Project AI 2.0 that will focus on developing ChatGPT-like apps, as well as an ecosystem for AI-powered productivity tools.

Nvidia unveils DGX Cloud platform and AI foundation models for generative AI training

NVIDIA unveils DGX Cloud platform, which provides access to infrastructure and software for training generative AI models, such as large language models and computer vision models.

PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever

Team PyTorch announces the release of PyTorch 2.0, which fundamentally changes and supercharges how PyTorch operates at the compiler level, with faster performance and support for Dynamic Shapes and Distributed.

MLOps

Signed Parameters for Secure ML Model Deployments

A technical tutorial on signed parameters for secure ML model deployments.

Definite Guide to Building a Machine Learning Platform

An in-depth guide to building a machine learning platform, from principles to adoption.

How VMware built an MLOps pipeline from scratch using GitLab, Amazon MWAA, and Amazon SageMaker

In this post, VMware Carbon Black and AWS architects discuss how they built and managed custom ML workflows using GitLab, Amazon MWAA, and SageMaker.

The Environmental Impact of ML Inference

An article about how machine learning inference can have a significant environmental impact due to its high energy consumption and carbon footprint.

Learning

Prompt Engineering by Lilian Weng

A comprehensive post that focuses on prompt engineering for autoregressive language models.

A Vision for the Future: How Computer Vision is Transforming Robotics

This article looks at the current challenges in the field of robotics and discusses the relevance and applications of computer vision in this area.

Emergent Abilities of Large Language Models

A conceptual article about how Large Language Models can exhibit emergent abilities, such as arithmetic, logic, and common sense reasoning.

Neural Tangent Kernels — PyTorch Tutorials

A tutorial that demonstrates how to compute neural tangent kernels using torch.func, composable function transforms for PyTorch.

Knowledge Graph-Based Chatbot With GPT-3 and Neo4j

An article about how to develop a chatbot that uses GPT-3 to generate natural language responses based on data stored in a knowledge graph built with Neo4j.

Libraries & Code

microsoft/semantic-kernel

Lightweight SDK enabling integration of Large Language Models (LLMs) with conventional programming languages.

ChawlaAvi / Daily-Dose-of-Data-Science

A collection of code snippets from the publication Daily Dose of Data Science.

PyGWalker

Turn your pandas dataframe into a Tableau-style User Interface for visual analysis.

Papers & Publications

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

Abstract:

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions. Despite its recent success in image synthesis, applying DPMs to video generation is still challenging due to high-dimensional data spaces. Previous methods usually adopt a standard diffusion process, where frames in the same video clip are destroyed with independent noises, ignoring the content redundancy and temporal correlation. This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis. The denoising pipeline employs two jointly-learned networks to match the noise decomposition accordingly. Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation. We further show that our decomposed formulation can benefit from pre-trained image diffusion models and supports text-conditioned video creation well.
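
To make the base-plus-residual idea concrete, here is a minimal sketch in plain Python. It is not the authors' code: the mixing weight `lam`, the function names, and the square-root weighting (chosen so each frame's noise keeps unit variance) are assumptions made for illustration.

```python
import math
import random


def decomposed_noise(num_frames, dim, lam, seed=0):
    """Build per-frame noise from one shared base noise plus per-frame residuals.

    lam (hypothetical name) is the share of variance carried by the base noise:
    lam = 1.0 gives identical noise in every frame, lam = 0.0 gives fully
    independent noise, which matches the "standard" setup the abstract describes.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    rng = random.Random(seed)
    base = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # shared by all frames
    frames = []
    for _ in range(num_frames):
        residual = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # varies per frame
        frames.append([
            math.sqrt(lam) * b + math.sqrt(1.0 - lam) * r
            for b, r in zip(base, residual)
        ])
    return base, frames


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

With a large `lam`, the noise in any two frames is strongly correlated, which is one way to encode the temporal redundancy that independent per-frame noise ignores.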

Vid2Seq: Large-Scale Pre-training of a Visual Language Model for Dense Video Captioning

Abstract:

In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pre-trained on narrated videos which are readily-available at scale. The Vid2Seq architecture augments a language model with special time tokens, allowing it to seamlessly predict event boundaries and textual descriptions in the same output sequence. Such a unified model requires large-scale training data, which is not available in current annotated datasets. We show that it is possible to leverage unlabeled narrated videos for dense video captioning, by reformulating sentence boundaries of transcribed speech as pseudo event boundaries, and using the transcribed speech sentences as pseudo event captions. The resulting Vid2Seq model pre-trained on the YT-Temporal-1B dataset improves the state of the art on a variety of dense video captioning benchmarks including YouCook2, ViTT and ActivityNet Captions. Vid2Seq also generalizes well to the video paragraph captioning task and the standard task of video clip captioning.
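
The "time tokens in the same output sequence" idea can be sketched as a simple serialization step. This is an illustrative assumption rather than the paper's tokenizer: the `<time=k>` token format, the default of 100 bins, and the function names are all hypothetical.

```python
def quantize(t, duration, num_bins):
    """Map a timestamp in seconds to an integer bin in [0, num_bins - 1]."""
    frac = min(max(t / duration, 0.0), 1.0)
    return min(int(frac * num_bins), num_bins - 1)


def serialize_events(events, duration, num_bins=100):
    """Flatten (start, end, caption) events into one token sequence.

    Each event contributes a start time token, an end time token, and then
    its caption words, so boundaries and text live in a single sequence.
    """
    tokens = []
    for start, end, caption in sorted(events, key=lambda e: e[0]):
        tokens.append(f"<time={quantize(start, duration, num_bins)}>")
        tokens.append(f"<time={quantize(end, duration, num_bins)}>")
        tokens.extend(caption.split())
    return tokens
```

For pre-training on narrated videos, the same function could be fed transcribed speech sentences with their timestamps in place of human-annotated events, which is the pseudo-labeling trick the abstract describes.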

Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws

Abstract:

Symbolic Regression is the study of algorithms that automate the search for analytic expressions that fit data. While recent advances in deep learning have generated renewed interest in such approaches, efforts have not been focused on physics, where we have important additional constraints due to the units associated with our data. Here we present Φ-SO, a Physical Symbolic Optimization framework for recovering analytical symbolic expressions from physics data using deep reinforcement learning techniques by learning units constraints. Our system is built, from the ground up, to propose solutions where the physical units are consistent by construction. This is useful not only in eliminating physically impossible solutions, but because it restricts enormously the freedom of the equation generator, thus vastly improving performance. The algorithm can be used to fit noiseless data, which can be useful for instance when attempting to derive an analytical property of a physical model, and it can also be used to obtain analytical approximations to noisy data. We showcase our machinery on a panel of examples from astrophysics.
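
A small sketch of what a units constraint looks like in practice. Note the difference from the paper: Φ-SO is described as enforcing consistent units during generation, while this hypothetical checker only validates a finished expression tree. The exponent-tuple representation and all names below are assumptions for illustration.

```python
class UnitsError(ValueError):
    """Raised when an expression combines incompatible physical units."""


def units_of(expr, variables):
    """Return the units of an expression as (length, mass, time) exponents.

    expr is either a variable name or a tuple (op, left, right) with op in
    "+", "-", "*", "/". variables maps names to exponent tuples.
    """
    if isinstance(expr, str):
        return variables[expr]
    op, left, right = expr
    lu = units_of(left, variables)
    ru = units_of(right, variables)
    if op in ("+", "-"):
        # Addition and subtraction only make sense for matching units.
        if lu != ru:
            raise UnitsError(f"cannot apply {op!r} to units {lu} and {ru}")
        return lu
    if op == "*":
        return tuple(a + b for a, b in zip(lu, ru))
    if op == "/":
        return tuple(a - b for a, b in zip(lu, ru))
    raise ValueError(f"unknown operator {op!r}")


def is_consistent(expr, variables, target_units):
    """True if expr is dimensionally valid and matches target_units."""
    try:
        return units_of(expr, variables) == target_units
    except UnitsError:
        return False
```

With mass `m = (0, 1, 0)` and velocity `v = (1, 0, -1)`, the candidate `m * (v * v)` has energy units `(2, 1, -2)` and passes, while `m + v` is rejected outright. Filtering of this kind is what shrinks the search space of candidate equations.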

A guest post by Miko Planas
