Deep Learning Weekly: Issue #258
the world's largest open multilingual language model BLOOM, a Python SDK for interacting with Deep Search, generalized visual language models, and more
Hey Folks,
This week in deep learning, we bring you the world's largest open multilingual language model called BLOOM, a Python SDK for interacting with Deep Search, generalized visual language models, and a paper on clustering mask transformers for panoptic segmentation.
You may also enjoy Meta's NLLB-200 model for high quality machine translation, orchestrating Python and dbt with Dagster, chain of thought prompting for multiple reasoning tasks, a paper on YOLOv7, and more.
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Industry
200 languages within a single AI model: A breakthrough in high-quality machine translation
Meta AI has built and open-sourced NLLB-200, the first model to translate across 200 different languages with state-of-the-art quality that has been validated through extensive evaluations for each of them.
Introducing The World's Largest Open Multilingual Language Model: BLOOM
Hugging Face released BLOOM, the world's largest open multilingual LLM. With 176 billion parameters, BLOOM can generate text in 46 natural languages and 13 programming languages.
How AI could help make Wikipedia entries more accurate
Building on previous research and advancements, Meta developed the first model capable of automatically scanning hundreds of thousands of citations at once to check whether they truly support the corresponding claims.
NFT Pricing Project Leveraging Machine Learning Acquires $4M In Funding
DeepNFTValue, a machine learning company that uses deep neural networks to price non-fungible tokens (NFTs), has announced a $4 million raise led by Rockaway Blockchain Fund.
MLOps
5 Principles You Need To Know About Continuous ML Data Intelligence
An article outlining five principles for practicing continuous ML data intelligence.
A post that concentrates on developing customized models within the PaddleOCR framework on SageMaker.
Orchestrating Python and dbt with Dagster
A technical guide that walks through the Dagster orchestration of Python and dbt, and highlights the advantages of doing so.
How to Deploy Your Machine Learning Model as a Web App Using Streamlit
A brief tutorial on how to deploy your ML model on Streamlit.
Syncing Data to GCP Storage Buckets
A quick walkthrough of how to set up a remote in a GCP storage bucket and handle data versioning with DVC.
Learning
Two minutes NLP — Making Large Language Models reason with Chain of Thought Prompting
A quick article on prompt engineering and chain of thought prompting for arithmetic, symbolic, and commonsense reasoning.
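The core idea behind chain-of-thought prompting is simple enough to sketch in a few lines: prepend a worked exemplar whose answer spells out its intermediate reasoning, so the model is nudged to reason step by step before giving its final answer. The exemplar and question below are illustrative stand-ins, not taken from the article.

```python
# A minimal sketch of few-shot chain-of-thought prompt construction.
# The exemplar text and helper name are hypothetical examples.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Assemble a few-shot chain-of-thought prompt for `question`."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "A juggler has 16 balls and half of them are golf balls. "
    "How many golf balls are there?"
)
print(prompt)
```

The trailing `A:` leaves the model to continue with its own reasoning chain; swapping in exemplars with symbolic or commonsense reasoning steps covers the other task families the article discusses.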
Getting Started with Sentiment Analysis on Twitter
A technical blog that walks step by step through sentiment analysis on Twitter, with paths for both coders and non-coders.
7 Steps for A Successful Deep Learning Project
A 7-step guide on how to develop effective deep learning projects.
Generalized Visual Language Models
A comprehensive blog post on one approach to vision-language tasks: extending pre-trained generalized language models so they can consume visual signals.
Using Learning Rate Schedules for Deep Learning Models in Python with Keras
In this post, you will discover how you can use time-based and drop-based learning rate schedules for your models using Keras.
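The two schedules the post covers reduce to simple formulas, sketched below in plain Python (the function names are mine). Time-based decay shrinks the learning rate smoothly each epoch, while drop-based (step) decay halves it, say, every fixed number of epochs; in Keras these are typically wired in via the optimizer's decay setting and a `LearningRateScheduler` callback, respectively.

```python
import math

def time_based_decay(initial_lr: float, decay: float, epoch: int) -> float:
    # Time-based schedule: lr = lr0 / (1 + decay * epoch)
    return initial_lr / (1.0 + decay * epoch)

def step_decay(initial_lr: float, drop: float,
               epochs_drop: int, epoch: int) -> float:
    # Drop-based (step) schedule: lr = lr0 * drop^floor(epoch / epochs_drop)
    return initial_lr * math.pow(drop, math.floor(epoch / epochs_drop))

# Compare the two schedules over a short training run.
for epoch in (0, 10, 20):
    print(epoch,
          round(time_based_decay(0.1, 0.01, epoch), 5),
          round(step_decay(0.1, 0.5, 10, epoch), 5))
```

Note that some Keras versions apply the optimizer's built-in decay per batch (iteration) rather than per epoch, so the epoch-based formulas above are the conceptual version; the callback approach gives you exact per-epoch control.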
Libraries & Code
Feast-Dev - Feature Store for Machine Learning
Feast is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and online inference.
google/ml-compiler-opt: Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
This is a collection of (mostly) pen-and-paper exercises in machine learning. Each exercise comes with a detailed solution.
The Deep Search Toolkit is a Python SDK allowing a user to interact with Deep Search, a new knowledge exploration platform.
Papers & Publications
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
Abstract:
Existing open-vocabulary object detectors typically enlarge their vocabulary sizes by leveraging different forms of weak supervision. This helps generalize to novel objects at inference. Two popular forms of weak supervision used in open-vocabulary detection (OVD) include a pre-trained CLIP model and image-level supervision. We note that both these modes of supervision are not optimally aligned for the detection task: CLIP is trained with image-text pairs and lacks precise localization of objects, while image-level supervision has been used with heuristics that do not accurately specify local object regions. In this work, we propose to address this problem by performing object-centric alignment of the language embeddings from the CLIP model. Furthermore, we visually ground the objects with only image-level supervision using a pseudo-labeling process that provides high-quality object proposals and helps expand the vocabulary during training. We establish a bridge between the above two object-alignment strategies via a novel weight transfer function that aggregates their complementary strengths. In essence, the proposed model seeks to minimize the gap between object and image-centric representations in the OVD setting. On the COCO benchmark, our proposed approach achieves 40.3 AP50 on novel classes, an absolute 11.9 gain over the previous best performance. For LVIS, we surpass the state-of-the-art ViLD model by 5.0 mask AP for rare categories and 3.4 overall.
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Abstract:
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS, and has the highest accuracy (56.8% AP) among all known real-time object detectors running at 30 FPS or higher on a V100 GPU. The YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both the transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 509% in speed and 2% in accuracy, and the convolutional detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 551% in speed and 0.7% AP in accuracy. YOLOv7 also outperforms YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B, and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on the MS COCO dataset from scratch, without using any other datasets or pre-trained weights.
CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
Abstract:
We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. It rethinks the existing transformer architectures used in segmentation and detection; CMT-DeepLab considers the object queries as cluster centers, which fill the role of grouping the pixels when applied to segmentation. The clustering is computed with an alternating procedure, by first assigning pixels to the clusters by their feature affinity, and then updating the cluster centers and pixel features. Together, these operations comprise the Clustering Mask Transformer (CMT) layer, which produces cross-attention that is denser and more consistent with the final segmentation task. CMT-DeepLab improves the performance over prior art significantly by 4.4% PQ, achieving a new state-of-the-art of 55.7% PQ on the COCO test-dev set.
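The alternating procedure in the abstract has the same shape as classic clustering: assign pixels to the nearest cluster center, then recompute centers from their members. Below is an illustrative reduction to 1-D scalar features in plain Python; the real CMT-DeepLab operates on learned pixel and query embeddings inside a transformer, and all names here are mine.

```python
# Toy analogy to CMT's alternating cluster update: 1-D "pixel features"
# and scalar "cluster centers" standing in for learned embeddings.

def assign(pixels, centers):
    """Assign each pixel feature to its most affine (nearest) center."""
    return [min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in pixels]

def update(pixels, labels, n_clusters):
    """Recompute each center as the mean of its assigned pixels."""
    centers = []
    for k in range(n_clusters):
        members = [p for p, l in zip(pixels, labels) if l == k]
        centers.append(sum(members) / len(members) if members else 0.0)
    return centers

pixels = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
centers = [0.0, 1.0]
for _ in range(3):  # alternate assignment and center updates
    labels = assign(pixels, centers)
    centers = update(pixels, labels, 2)
print(labels, centers)  # two tight groups emerge around ~0.15 and ~0.95
```

In CMT-DeepLab the assignment step is a cross-attention computed from feature affinity and the update step refines both the query (center) and pixel features, which is what makes the resulting attention denser and better matched to the final segmentation.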