Deep Learning Weekly: Issue #300
Modular's New AI Programming Language, Meta AI's ImageBind: Holistic AI Learning Across Six Modalities, Claude's Constitution, a paper on Shap-E: Generating Conditional 3D Implicit Functions, and many more!
This week in deep learning, we bring you Modular's New AI Programming Language, Meta AI's ImageBind: Holistic AI Learning Across Six Modalities, Claude's Constitution, and a paper on Shap-E: Generating Conditional 3D Implicit Functions.
You may also enjoy VP Kamala Harris meets with Generative AI Leaders, A GPT Penetration Testing Tool, Accelerated Image Segmentation using PyTorch, a paper on Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
White House Pushes Tech C.E.O.s to Limit Risks of A.I.
In the White House’s first gathering of Generative AI companies, VP Kamala Harris told the leaders of these companies they had a “moral” obligation to keep products safe.
Researchers develop novel AI-based estimator for manufacturing medicine
A collaborative research team from the MIT-Takeda Program combined physics and machine learning to characterize rough particle surfaces in pharmaceutical pills and powders.
Mojo 🔥: Programming language for all of AI
Modular – founded by Chris Lattner and Tim Davis – unveils a new programming language for AI that positions itself as being as usable as Python, faster than C++, more hackable than CUDA, and safer than Rust.
Google "We Have No Moat, And Neither Does OpenAI"
A leaked internal Google document that claims the open-source community is winning the AI arms race.
Exploring opportunities in the generative AI value chain
Generative AI is giving rise to an entire ecosystem, from hardware providers to application builders, that will help bring its potential for business to fruition.
Training machines to learn more like humans do
Researchers identify a property that helps computer vision models learn to represent the visual world in a more stable, predictable way.
IBM debuts Watsonx product suite to streamline enterprise AI projects
IBM debuted Watsonx, a product suite designed to help companies more easily build and deploy AI models.
MLOps
Accelerated Image Segmentation using PyTorch
A walkthrough on how to optimize a PyTorch image segmentation workload on the SpaceNet5 dataset so that it runs feasibly on CPUs.
Compare Object Detection Models from TorchVision
In TorchVision’s detection module, developers can find pre-trained object detection models that are ready to be fine-tuned on their own datasets. But how can you systematically find the best model for a particular use-case?
Host ML models on Amazon SageMaker using Triton: Python backend
A post that dives deep into the Python backend that Triton Inference Server supports on SageMaker.
Emotion Classification with spaCy v3 and Comet
Train a multi-label text classifier using spaCy v3 on the Hugging Face dair-ai/emotion dataset, and track the model trainings and record the results with Comet!
Learning
Context-Aware Knowledge Graph Chatbot With GPT-4 and Neo4j
An article on how to implement a context-aware knowledge graph chatbot that bases its answers on the information retrieved from a graph database.
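The chatbot described above follows a retrieve-then-answer pattern: query the graph database for relevant facts, then constrain the language model to answer only from those facts. A minimal sketch of that pattern in plain Python; `run_cypher` and the example Cypher query are hypothetical stand-ins for a real Neo4j driver call, and the LLM call itself is omitted.

```python
# Sketch of the retrieve-then-answer pattern behind a context-aware
# knowledge graph chatbot. `run_cypher` is a hypothetical stand-in for
# a Neo4j driver session; a real implementation would execute the query
# against the database and pass the prompt to GPT-4.

def run_cypher(query: str) -> list[dict]:
    # Placeholder: pretend the graph returned one matching record.
    return [{"name": "Ada Lovelace", "born": 1815}]

def build_prompt(question: str, facts: list[dict]) -> str:
    # Ground the model: answers must come only from the retrieved facts.
    context = "\n".join(str(f) for f in facts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )

facts = run_cypher("MATCH (p:Person) RETURN p.name AS name, p.born AS born")
prompt = build_prompt("When was Ada Lovelace born?", facts)
```

Grounding the prompt in retrieved graph records is what makes the chatbot "context-aware": the model is steered away from hallucinating facts not present in the database.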
ImageBind: Holistic AI learning across six modalities
Meta AI introduces and open-sources ImageBind, the first model capable of binding information across six modalities.
RLHF: Reinforcement Learning from Human Feedback
Chip Huyen’s in-depth article on how and why Reinforcement Learning from Human Feedback (RLHF) works.
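At the heart of the reward-modeling stage the article covers is a pairwise ranking loss: given a human-preferred ("chosen") and a dispreferred ("rejected") response, the reward model is trained to score the chosen one higher. A minimal sketch in plain Python; the scores are made-up scalars standing in for reward-model outputs.

```python
import math

# Pairwise ranking loss used to train an RLHF reward model:
# loss = -log(sigmoid(r_chosen - r_rejected)).
# The loss shrinks as the model separates chosen from rejected responses.

def reward_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

loss_good = reward_ranking_loss(2.0, -1.0)  # large correct margin -> small loss
loss_bad = reward_ranking_loss(-1.0, 2.0)   # wrong ordering -> large loss
```

The trained reward model then supplies the scalar signal that the policy is optimized against (typically with PPO) in the final RLHF stage.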
Claude's Constitution
Anthropic explains what constitutional AI is, what the values in Claude's constitution are, and how they chose them.
Text-to-Video: The Task, Challenges and the Current State
A post that discusses the past, present, and future of text-to-video models.
Libraries & Code
Drag & drop UI to build your customized LLM flow using LangchainJS
Easily train or fine-tune SOTA computer vision models with one open source training library.
A GPT-empowered penetration testing tool.
Papers & Publications
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
Abstract:
Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our US street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
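The attack the abstract describes reduces to a simple data-poisoning recipe: stamp a small trigger pattern onto a fraction of training images and relabel them with the attacker's target class. A toy illustration in plain Python; images are nested lists of pixel intensities, and no real model or dataset is involved.

```python
# Toy illustration of the BadNets threat model: a backdoor "trigger"
# (here, a bright 2x2 patch in the corner) is stamped onto training
# images whose labels are flipped to the attacker's target class.

def stamp_trigger(image: list[list[int]], size: int = 2, value: int = 255):
    poisoned = [row[:] for row in image]  # copy; leave the original intact
    for r in range(size):
        for c in range(size):
            poisoned[r][c] = value
    return poisoned

def poison(dataset, target_label: int, rate: float = 0.1):
    # Poison the first `rate` fraction of examples: trigger + target label.
    n_poison = int(len(dataset) * rate)
    out = []
    for i, (img, label) in enumerate(dataset):
        if i < n_poison:
            out.append((stamp_trigger(img), target_label))
        else:
            out.append((img, label))
    return out

# Ten 4x4 all-black "images" with labels 0..9; poison 20% toward class 7.
clean = [([[0] * 4 for _ in range(4)], lbl) for lbl in range(10)]
poisoned = poison(clean, target_label=7, rate=0.2)
```

Because the poisoned set is mostly clean, a network trained on it keeps its accuracy on ordinary inputs, which is what makes the backdoor hard to detect.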
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Abstract:
Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) achieves so by leveraging less training data needed by finetuning or distillation. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with much fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task.
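The multi-task framework in the abstract trains the small model on two targets at once: the task label and the LLM-extracted rationale, mixed by a weighting coefficient. A minimal sketch of that combination; the per-task losses here are placeholder floats rather than real cross-entropies, and the exact weighting scheme is an assumption.

```python
# Sketch of a multi-task objective in the spirit of Distilling step-by-step:
# total loss = label-prediction loss + lam * rationale-generation loss.
# `label_loss` and `rationale_loss` stand in for per-task cross-entropies.

def distill_step_by_step_loss(label_loss: float, rationale_loss: float,
                              lam: float = 1.0) -> float:
    return label_loss + lam * rationale_loss
```

The rationale term acts as extra supervision: it forces the student to model *why* the label holds, which is what lets it match the LLM with far fewer labeled examples.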
Shap-E: Generating Conditional 3D Implicit Functions
Abstract:
We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space.