Deep Learning Weekly: Issue #269
OpenAI's Whisper, Netflix's systematic overview of unexpected streaming behaviors, how Hugging Face Accelerate works on very large models, a paper on light field neural rendering, and many more
This week in deep learning, we bring you OpenAI's Whisper, Netflix's systematic overview of unexpected streaming behaviors, how Hugging Face Accelerate works on very large models, and a paper on light field neural rendering.
You may also enjoy DeepMind's Sparrow, a tutorial on automated TensorFlow deployment with GitHub Actions, causal inference with synthetic control using SparseSC, a paper on GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
NVIDIA revealed major updates to its suite of AI software for developers, including JAX, NVIDIA CV-CUDA, and NVIDIA RAPIDS.
Researchers at the Complutense University of Madrid have developed the first processor core implementing the posit standard in hardware and showed that the accuracy of a basic computational task increased by up to four orders of magnitude, compared to computing using standard floating-point numbers.
DeepMind debuted Sparrow, an artificial intelligence-powered chatbot described as a milestone in the industry effort to develop safer machine learning systems.
OpenAI trained and open-sourced a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition.
Hugging Face introduces SetFit: an efficient framework for few-shot fine-tuning of Sentence Transformers.
A technical blog post on using the Kubernetes Python client library to automate cumbersome Kubernetes tasks.
An end-to-end article on utilizing MLlib and serving models with PySpark.
Netflix presents a systematic overview of the unexpected streaming behaviors together with a set of model-based and data-driven anomaly detection strategies to identify them.
The TensorFlow team shares how they serve an image classifier as RESTful and gRPC based services with TensorFlow Serving running on Google Kubernetes Engine through a set of GitHub Actions workflows.
An article that focuses on outlier detection and interpolation in time series data using the Kats library along with Comet.
A blog post on understanding Synthetic Control and using Microsoft’s SparseSC package to run synthetic control on larger datasets.
An in-depth article that explains, in both theory and technical detail, how Hugging Face Accelerate runs extremely large models.
An article that examines how to perform object detection and image segmentation on a custom dataset using the TensorFlow 2 Object Detection API.
An article that gives some background on survival analysis and then compares a Cox-PH model with four ML survival models based on SVMs, decision trees, random forests, and gradient boosting.
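For readers new to the Cox-PH model mentioned in the survival analysis article above, here is a minimal sketch of the quantity such a model maximizes: the log partial likelihood. It is not taken from the linked article. The function name, argument layout, and toy data are assumptions made for illustration, and ties are handled in the simple Breslow style.

```python
import math


def cox_partial_log_likelihood(beta, covariates, times, events):
    """Log partial likelihood of a Cox proportional hazards model.

    Illustrative sketch only (hypothetical helper, not from the article).
    beta:       list of coefficients, one per covariate
    covariates: list of rows, one row of covariate values per subject
    times:      observed time per subject (event or censoring time)
    events:     1 if the event was observed, 0 if the subject was censored
    """
    # Linear predictor x_i . beta for every subject.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    total = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            # Censored subjects contribute only through other subjects' risk sets.
            continue
        # Risk set: everyone still under observation at time t_i.
        risk_scores = [scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        total += scores[i] - math.log(sum(math.exp(s) for s in risk_scores))
    return total


# Toy data (made up): four subjects, one binary covariate.
toy_covariates = [[1.0], [0.0], [1.0], [0.0]]
toy_times = [2.0, 3.0, 5.0, 7.0]
toy_events = [1, 0, 1, 1]

ll_at_zero = cox_partial_log_likelihood([0.0], toy_covariates, toy_times, toy_events)
ll_at_one = cox_partial_log_likelihood([1.0], toy_covariates, toy_times, toy_events)
```

A fitting routine would search for the beta that maximizes this value. The ML survival models in the comparison replace the linear predictor with a more flexible function of the covariates.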
Libraries & Code
Minimal examples of machine learning tests for implementation, behaviour, and performance.
A fast, ergonomic and scalable open-source dataframe library: built for Python and Complex Data/Machine Learning workloads.
Library for reading and writing large multi-dimensional arrays.
Papers & Publications
As several industries are moving towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be directly consumed by 3D rendering engines, thus immediately usable in downstream applications. Prior works on 3D generative modeling either lack geometric details, are limited in the mesh topology they can produce, typically do not support textures, or utilize neural renderers in the synthesis process, which makes their use in common 3D software non-trivial. In this work, we introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high fidelity textures. We bridge recent success in the differentiable surface modeling, differentiable rendering as well as 2D Generative Adversarial Networks to train our model from 2D image collections. GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes and human characters to buildings, achieving significant improvements over previous methods.
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models (specializing them to users' needs). Given as input just a few images of a subject, we fine-tune a pre-trained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering (all while preserving the subject's key features).
Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene. Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. By operating on a four-dimensional representation of the light field, our model learns to represent view-dependent effects accurately. By enforcing geometric constraints during training and inference, the scene geometry is implicitly learned from a sparse set of views. Concretely, we introduce a two-stage transformer-based model that first aggregates features along epipolar lines, then aggregates features along reference views to produce the color of a target ray. Our model outperforms the state-of-the-art on multiple forward-facing and 360° datasets, with larger margins on scenes with severe view-dependent variations.
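The two-stage aggregation order described in the abstract above can be illustrated with a toy sketch. This is not the paper's model: the authors use learned transformers, while this sketch substitutes a single unlearned dot-product attention pooling step at each stage and returns a pooled feature vector rather than a color. All names (softmax, attention_pool, aggregate_ray) and the toy features are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, features):
    """Weight each feature vector by its scaled dot product with the query
    and return the weighted average (a stand-in for a learned transformer)."""
    dim = len(query)
    logits = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(logits)
    return [
        sum(w * feat[k] for w, feat in zip(weights, features))
        for k in range(dim)
    ]


def aggregate_ray(query, epipolar_features):
    """epipolar_features[v][p] is the feature vector of sample point p on the
    epipolar line of the target ray in reference view v (hypothetical layout)."""
    # Stage 1: pool along each epipolar line, giving one vector per view.
    per_view = [attention_pool(query, line) for line in epipolar_features]
    # Stage 2: pool across reference views, giving one vector for the ray.
    return attention_pool(query, per_view)


# Toy input (made up): 2 reference views, 2 epipolar samples each, 2-D features.
toy_features = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 2.0], [0.0, 4.0]],
]
uniform_result = aggregate_ray([0.0, 0.0], toy_features)
```

With an all-zero query, every attention weight is equal, so the result reduces to a plain mean of per-view means. In the paper's setting, a learned head would map the pooled feature to the color of the target ray.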