Deep Learning Weekly: Issue #325
Google investing up to $2B in Anthropic, Compression Pipeline for Efficient Inference of Transformers, Sparse and Discrete Interpretability, a paper on Hierarchical 3D Generation, and many more!
This week in deep learning, we bring you news that Google will invest up to $2B in Anthropic, a post on Joint Pruning, Quantization and Distillation for Efficient Inference of Transformers, research on Sparse and Discrete Interpretability for Neural Networks, and a paper on DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior.
You may also enjoy the next generation of AlphaFold, Spotify's simulated listening experiences for reinforcement learning with TensorFlow and TF-Agents, a paper on Zephyr: Direct Distillation of LM Alignment, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Google is expected to invest up to $2 billion in Anthropic, an OpenAI competitor focused on developing large language models.
President Biden issued a highly anticipated executive order on artificial intelligence, focused on managing its risks.
Researchers from MIT and NVIDIA have developed two techniques that accelerate the processing of sparse tensors, a type of data structure that’s used for high-performance computing tasks.
DeepMind shares the latest AlphaFold model that shows significantly improved accuracy and expands coverage beyond proteins to other biological molecules, including ligands.
The UK has launched a £100m fund to accelerate AI deployment in areas where its capabilities could lead to breakthroughs in treating previously incurable diseases.
UN Secretary-General António Guterres has unveiled a dedicated AI advisory body with a mandate to harness the technology’s power for good and mitigate its risks through international collaboration and governance.
MLOps & LLMOps
Learn to build your own LLM-powered applications using real data, all while working with the best tools in the modern LLMOps ecosystem.
A blog post showing how to apply a joint pruning, quantization, and distillation pipeline to the BERT-base model for the SST-2 text classification task from the GLUE benchmark.
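To make the three ingredients concrete, here is a toy sketch on a flat list of weights rather than the post's actual tooling: unstructured magnitude pruning, symmetric per-tensor int8 quantization, and the temperature-softened knowledge-distillation loss of Hinton et al. All function names are hypothetical; a real pipeline applies these to BERT's weight matrices during fine-tuning instead of once after the fact.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)
    # Indices of the k weights with the smallest absolute value.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns integer codes and the scale."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127 if amax > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float weights."""
    return [c * scale for c in codes]


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In a joint pipeline the student is the pruned, quantized model and the teacher is the original fine-tuned model, so the distillation term helps recover accuracy lost to the first two steps.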
A case study on what to consider when setting up an active learning pipeline for segmenting teeth in dental X-rays.
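One common way to decide which images to send for annotation is uncertainty sampling. The sketch below assumes, hypothetically, that the model outputs a per-pixel foreground probability map for each unlabeled X-ray; it ranks images by mean per-pixel binary entropy and picks the top `budget`. This is a generic strategy for illustration, not necessarily the one the case study uses, and the function names are made up.

```python
import math


def pixel_entropy(p):
    """Binary entropy (in nats) of one pixel's foreground probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def image_uncertainty(prob_map):
    """Mean per-pixel entropy of a probability map given as a list of rows."""
    values = [pixel_entropy(p) for row in prob_map for p in row]
    return sum(values) / len(values)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` unlabeled images the model is least sure about."""
    ranked = sorted(
        predictions,
        key=lambda image_id: image_uncertainty(predictions[image_id]),
        reverse=True,
    )
    return ranked[:budget]
```

Mean entropy favors images where the model hesitates everywhere; for thin structures such as tooth boundaries, a variant that averages only over pixels near the predicted edge may be more informative.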
A blog post with hands-on examples of evaluating LLMs using criteria-based evaluation, RAG evaluation, and pairwise comparison.
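As an illustration of pairwise comparison, the sketch below computes a win rate for system X over system Y while asking the judge twice per question with the answers in swapped order, a common guard against an LLM judge's position bias. The `judge` callable is a hypothetical stand-in for an LLM call that returns "A" if it prefers the first answer shown and "B" otherwise.

```python
from typing import Callable, List

# Hypothetical judge: (question, first_answer, second_answer) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def pairwise_win_rate(questions: List[str], answers_x: List[str],
                      answers_y: List[str], judge: Judge) -> float:
    """Fraction of questions where X beats Y, judging each pair in both orders.

    A split verdict across the two orders counts as a tie (0.5).
    """
    score = 0.0
    for q, x, y in zip(questions, answers_x, answers_y):
        first = judge(q, x, y)   # X shown first
        second = judge(q, y, x)  # Y shown first
        x_wins = (first == "A") + (second == "B")
        score += x_wins / 2
    return score / len(questions)
```

A judge that always picks whichever answer is shown first scores exactly 0.5 here, which is the point of the swap.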
A technical blog post on how Spotify uses TensorFlow and reinforcement learning to train music recommendation models offline, covering the design of its simulation environments, a novel DQN variant, and evaluation.
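The post's DQN variant is not described here, so for reference this is a sketch of the vanilla DQN update it builds on: regress Q(s, a) toward r + γ·max over a' of Q_target(s', a'), with bootstrapping disabled at terminal transitions. Inputs are plain lists standing in for batched network outputs, and the function names are hypothetical; TF-Agents provides its own agent implementations.

```python
def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Vanilla DQN targets: y = r + gamma * max_a' Q_target(s', a'), no bootstrap if done."""
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_target, dones)
    ]


def td_loss(q_values, actions, targets):
    """Mean squared TD error between Q(s, a) for the taken actions and the targets."""
    errors = [(q[a] - y) ** 2 for q, a, y in zip(q_values, actions, targets)]
    return sum(errors) / len(errors)
```

In an offline setting the (s, a, r, s') transitions come from logged or simulated listening sessions rather than live interaction.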
Researchers from FAR AI found a way to modify neural networks to make their internals more interpretable and steerable while causing only a small degradation of performance.
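The blurb does not spell out the mechanism. Assuming this is FAR AI's codebook-features work, the core idea is a bottleneck that replaces a hidden activation with the sum of a few learned code vectors chosen by cosine similarity. Each state is then summarized by a small set of discrete code indices that can be inspected, or switched on to steer the model. The sketch shows only that forward pass on plain lists: the codebook values are made up, training of the codes is omitted, and `codebook_bottleneck` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def codebook_bottleneck(activation, codebook, k=2):
    """Replace `activation` with the sum of its k most cosine-similar codes.

    Returns the reconstructed vector and the indices of the active codes.
    """
    ranked = sorted(
        range(len(codebook)),
        key=lambda i: cosine(activation, codebook[i]),
        reverse=True,
    )
    active = ranked[:k]
    out = [sum(codebook[i][d] for i in active) for d in range(len(activation))]
    return out, active
```

The returned `active` indices are the sparse, discrete summary that makes the internals easier to read. The small performance cost mentioned above comes from forcing activations through this narrow bottleneck.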
An overview of the foundation models for object classification, object detection, and segmentation that are redefining computer vision.
Libraries & Code
Turn expensive prompts into cheap fine-tuned models.
A project focused on providing animations and visualizations of common machine learning concepts.
Papers & Publications
We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. We tackle the problem by leveraging a 2D reference image to guide the stages of geometry sculpting and texture boosting. A central focus of this work is to address the consistency issue that existing works encounter. To sculpt geometries that render coherently, we perform score distillation sampling via a view-dependent diffusion model. This 3D prior, alongside several training strategies, prioritizes the geometry consistency but compromises the texture fidelity. We further propose Bootstrapped Score Distillation to specifically boost the texture. We train a personalized diffusion model, Dreambooth, on the augmented renderings of the scene, imbuing it with 3D knowledge of the scene being optimized. The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene. Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting. With tailored 3D priors throughout the hierarchical generation, DreamCraft3D generates coherent 3D objects with photorealistic renderings, advancing the state-of-the-art in 3D content generation.
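For readers new to score distillation sampling (SDS), below is its standard gradient as introduced in DreamFusion (Poole et al., 2022), on which DreamCraft3D builds. Here θ parametrizes the 3D scene, g is a differentiable renderer producing the image x = g(θ), x_t is that image noised to timestep t with Gaussian noise ε, y is the conditioning, ε̂_φ is the diffusion model's noise prediction, and w(t) is a timestep weighting. As we read the abstract, DreamCraft3D keeps this form but swaps in a view-dependent denoiser for geometry sculpting and, in Bootstrapped Score Distillation, a DreamBooth-tuned denoiser for texture. The exact conditioning and weighting are the paper's and are not reproduced here.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t\, x + \sigma_t\, \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```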
We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state-of-the-art on chat benchmarks for 7B parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model.
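dDPO applies the direct preference optimization (DPO) objective to AI-ranked pairs. Below is a minimal sketch of the standard per-pair DPO loss, computed from summed sequence log-probabilities of the preferred (y_w) and rejected (y_l) responses under the trained policy and a frozen reference model (typically the SFT model). The value `beta=0.1` is a placeholder default, not a claim about Zephyr's setting, and the function name is hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss:

    -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w))
                         - (log pi(y_l) - log pi_ref(y_l))])
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == softplus(-m), computed without overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

No sampling is needed during training: the four log-probabilities come from scoring fixed preference pairs, which is why the abstract can report only a few hours of training.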
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM training can employ low-precision data formats without compromising model accuracy and requiring no changes to hyper-parameters. Specifically, we propose a new FP8 automatic mixed-precision framework for training LLMs. This framework offers three levels of FP8 utilization to streamline mixed-precision and distributed parallel training for LLMs. It gradually incorporates 8-bit gradients, optimizer states, and distributed learning in an incremental manner. Experiment results show that, during the training of GPT-175B model on H100 GPU platform, our FP8 mixed-precision training framework not only achieved a remarkable 42% reduction in real memory usage but also ran 64% faster than the widely adopted BF16 framework (i.e., Megatron-LM), surpassing the speed of Nvidia Transformer Engine by 17%. This largely reduces the training costs for large foundation models. Furthermore, our FP8 mixed-precision training methodology is generic. It can be seamlessly applied to other tasks such as LLM instruction tuning and reinforcement learning with human feedback, offering savings in fine-tuning expenses.
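To show what casting to FP8 does to a tensor, here is a pure-Python simulation of rounding to the E4M3 format (4 exponent bits, 3 mantissa bits, largest finite value 448), combined with per-tensor scaling so the largest magnitude lands near the top of the FP8 range. This sketch is meant to illustrate the number format only. It is not the paper's framework, which has its own scaling and precision-assignment scheme, and the function names are hypothetical.

```python
import math

E4M3_MAX = 448.0          # largest finite E4M3 value
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent; below it values are subnormal
MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round a float to the nearest E4M3 value (round-half-to-even, saturating)."""
    if x == 0.0 or math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(E4M3_MAX, x)  # saturate instead of overflowing
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    _, exp = math.frexp(mag)                # mag = m * 2**exp with m in [0.5, 1)
    e = max(exp - 1, E4M3_MIN_NORMAL_EXP)   # unbiased exponent, clamped for subnormals
    step = 2.0 ** (e - MANTISSA_BITS)       # spacing of representable values here
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fp8_roundtrip(values):
    """Per-tensor scaled FP8 cast: scale so amax maps to E4M3_MAX, round, unscale."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) / scale for v in values]
```

With only 3 mantissa bits, every normal value carries a relative rounding error of at most 1/16. The scaling factor is what keeps small gradients out of the subnormal and zero range.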