Hi folks,
Welcome to the year’s very last issue of Deep Learning Weekly! From our entire editorial team, thank you for showing up each week, engaging with our newsletter, and even letting us know what you’d like to see in DLW moving forward.
If you follow our weekly issues, then you know that this year has been an incredible one in the industry—so for this last issue of the year, we wanted to share with you all some of the most popular and compelling stories we covered in the world of deep learning from 2021.
Enjoy our look back at the best of 2021, and we look forward to seeing you in 2022!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next year!
The AI Research Paper Was Real. The ‘Coauthor’ Wasn't
From Issue 186: Feb. 24, 2021
An IBM researcher found his name on two papers with which he had no connection. A different paper listed a fictitious author by the name of "Bill Franks." This story discusses the underlying issue of potential fraud in a rapidly expanding industry and discipline, where research is growing exponentially and much of it circulates widely before encountering peer review or other forms of academic scrutiny.
New AI ‘Deep Nostalgia’ brings old photos, including very old ones, to life
From Issue 187: Mar. 3, 2021
In a year that featured a wide variety of generative and creative uses of AI, this was certainly one of our favorites, and one that set the stage for what was to come later in the year. “Deep Nostalgia”, offered by MyHeritage, can take essentially any still photograph and bring it (momentarily) to life, in a way that mimics the iOS Live Photos feature. As with many services that ask users to upload personal data, privacy concerns abound, but overall this unique AI-powered experience seemed worthy of inclusion in this year’s best-of list.
State-of-the-Art Image Generative Models
From Issue 188: Mar. 10, 2021
Speaking of generative and creative uses of deep learning… In this blog post, deep learning researcher Aran Komatsuzaki aggregated some of the SotA image generative models released recently, with short summaries, visualizations, code (when available), and more. This is a great place to start when digging into the state of generative deep learning research, a major trend in the deep learning world this year.
Self-supervised learning: The dark matter of intelligence
From Issue 189: Mar. 17, 2021
This groundbreaking research from Facebook AI (now Meta AI) describes an image classification system pretrained on unlabeled data using self-supervised learning, and it outperforms state-of-the-art models on ImageNet. The researchers argue that self-supervision is one step on the path to building machines with human-level intelligence. The post covers differences between language and vision tasks, accounts for modeling uncertainty, and hints at how researchers at Meta AI are employing this approach.
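For readers who want a taste of what "pretraining on unlabeled data" can look like in practice, here is a minimal, illustrative sketch of a contrastive self-supervised objective (InfoNCE-style) in PyTorch. It is a generic example for intuition, not necessarily the exact method described in the Meta AI post.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss over two augmented views of the same unlabeled batch:
    matching pairs (the diagonal) should be more similar than all other pairs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)   # unit-norm embeddings
    logits = z1 @ z2.t() / temperature                        # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)      # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Usage sketch: run two random augmentations of the same images through one
# encoder, then minimize info_nce_loss(encoder(aug1(x)), encoder(aug2(x))).
```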
What is MLOps? Machine Learning Operations Explained
From Issue 192: Apr. 7, 2021
The why, what, and how of an engineering discipline that aims to unify ML systems development and deployment. Over the course of the year, we’ve seen an explosion of content, companies, and tools dedicated to solving the problems associated with deploying, scaling, and maintaining ML systems that drive real value in production. This is a great primer that helps us start the conversation from a shared set of principles, but rest assured, there will be much more in the coming years about MLOps…or whatever terminology the industry eventually settles on.
Why multi-head self attention works: math, intuitions and 10+1 hidden insights
From Issue 195: Apr. 28, 2021
This deep technical dive into self-attention mechanisms is for people who want to understand how self-attention works, like how it really works. It's not for the faint of heart or those more interested in industry news, but it's a gem for anyone who wants to dig way down into one of the more compelling technical topics in deep learning.
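If you'd rather see the mechanics before the math, here is a compact, dependency-light sketch of multi-head self-attention in PyTorch. The projection weights are random stand-ins for what a real layer would learn.

```python
import torch

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Minimal multi-head self-attention: project to Q/K/V, split into heads,
    apply scaled dot-product attention per head, then merge and project."""
    batch, seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):                      # (B, T, D) -> (B, H, T, D/H)
        return t.view(batch, seq_len, num_heads, d_head).transpose(1, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(-2, -1) / d_head**0.5    # (B, H, T, T)
    attn = torch.softmax(scores, dim=-1)              # attention weights
    out = (attn @ v).transpose(1, 2).reshape(batch, seq_len, d_model)
    return out @ w_o                                  # final output projection

# Toy usage with random weights (a real layer would learn these):
B, T, D, H = 2, 5, 64, 8
x = torch.randn(B, T, D)
w = [torch.randn(D, D) / D**0.5 for _ in range(4)]
y = multi_head_self_attention(x, *w, num_heads=H)     # shape (2, 5, 64)
```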
GPT-3’s free alternative GPT-Neo is something to be excited about
From Issue 198: May 19, 2021
EleutherAI released two GPT-style language models, GPT-Neo 1.3B and GPT-Neo 2.7B, trained on an 825 GB dataset using Google’s TPU Research Cloud. GPT-Neo 2.7B outperforms GPT-3 Ada, its closest competitor in terms of parameter size. And while OpenAI’s GPT-3 architecture is well documented, the model was nearly impossible to replicate, given the proprietary data it was trained on. This is one key area where GPT-Neo differs: alongside the models, EleutherAI released “The Pile”, an 825 GB dataset tailor-made for training large language models.
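If you want to kick the tires yourself, the checkpoints are available through the Hugging Face Hub. A minimal sketch, assuming the hub id "EleutherAI/gpt-neo-1.3B" and enough memory to hold the weights:

```python
# Minimal text-generation sketch with GPT-Neo via Hugging Face transformers.
# Assumes `pip install transformers torch` and a few GB of free memory;
# swap in "EleutherAI/gpt-neo-2.7B" if you have the headroom.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
out = generator("In 2021, deep learning research", max_length=40, do_sample=True)
print(out[0]["generated_text"])
```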
The Global AI Index
From Issue 202: June 16, 2021
The Global AI Index benchmarks nations on their level of investment, innovation, and implementation of artificial intelligence. It's the first of its kind, has some nice interactive components, and measures countries across seven different variables:
Talent
Infrastructure
Operating Environment
Research
Development
Government Strategy
Commercial
Pioneers of deep learning think its future is gonna be lit
From Issue 207: July 21, 2021
In a new paper called Deep Learning for AI, DL pioneers Yoshua Bengio, Geoffrey Hinton, and Yann LeCun explain the current challenges of deep learning and what the future might hold. They run the gamut of topics and leave us with a bold claim: “that better neural network architectures will eventually lead to all aspects of human and animal intelligence, including symbol manipulation, reasoning, causal inference, and common sense.”
Building architectures that can handle the world’s data
From Issue 209: Aug. 4, 2021
Given the rapid advances in increasingly large and generalizable models, deep learning research has also focused on architectures that aren't just applicable to two or three tasks, but can instead solve problems involving all kinds of data sources: images, audio, LiDAR sensor data, language, really anything you can imagine. In this blog post, the team at DeepMind introduced “Perceiver” and “Perceiver IO”, general-purpose DL architectures that can effectively process all of these different kinds of data without requiring major structural changes to the network.
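The core trick, roughly, is to let a small, fixed-size latent array attend to the (potentially huge) input array, so compute scales with the latent size rather than the input length. Here is a loose conceptual sketch in PyTorch, not DeepMind's implementation, with random weights standing in for learned ones:

```python
import torch

def perceiver_style_cross_attention(inputs, latents, w_q, w_k, w_v):
    """Conceptual sketch of the Perceiver's bottleneck: a small latent array
    queries an arbitrarily long input array, producing a fixed-size summary."""
    q = latents @ w_q                        # (N_latent, D)
    k, v = inputs @ w_k, inputs @ w_v        # (N_input, D)
    attn = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)  # (N_latent, N_input)
    return attn @ v                          # (N_latent, D): distilled summary of the input

# Usage: flatten any modality (image pixels, audio samples, point clouds) into
# (N_input, D) feature rows; the latent array stays the same size regardless.
inputs = torch.randn(50_000, 64)             # e.g. 50k input "byte" rows
latents = torch.randn(256, 64)               # fixed latent bottleneck
w = [torch.randn(64, 64) / 8 for _ in range(3)]
summary = perceiver_style_cross_attention(inputs, latents, *w)   # shape (256, 64)
```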
Applications of Graph Neural Networks (GNN)
From Issue 213: Sept. 1, 2021
We saw quite a lot of content pop up regarding Graph Neural Networks, or, put simply, neural nets applied to graphs. In this excellent blog post, Jonathan Hui explores the range of possible applications for this kind of deep learning: medical diagnostics, electronic health record modeling, drug discovery, recommender systems, social influence prediction, and more. Jonathan also wrote great technical overviews of a couple of different kinds of graph neural nets, GCNs and GNNs.
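For a flavor of what these models compute, here is a minimal single graph-convolution layer in PyTorch in the spirit of Kipf & Welling's GCN: add self-loops, symmetrically normalize the adjacency matrix, and use it to mix each node's features with its neighbors'.

```python
import torch

def gcn_layer(adj, features, weight):
    """One graph convolution step: H' = relu(D^{-1/2} (A + I) D^{-1/2} H W)."""
    a_hat = adj + torch.eye(adj.size(0))                  # adjacency with self-loops
    deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)             # D^{-1/2} as a vector
    a_norm = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :]
    return torch.relu(a_norm @ features @ weight)         # propagate and transform

# Toy graph: 4 nodes, 3 input features per node, 2 output features.
adj = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]])
h = torch.randn(4, 3)
w = torch.randn(3, 2)
print(gcn_layer(adj, h, w).shape)   # torch.Size([4, 2])
```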
A friendly introduction to machine learning compilers and optimizers
From Issue 216: Sept. 22, 2021
Deep learning researcher Chip Huyen provides this very clear and thorough introduction to how ML compilers work. Having a deep understanding of this topic is quite helpful for deploying ML models on different hardware without compromising performance.
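To make the "compiler" part concrete: most toolchains start from a static graph of your model's operations, which they can then fuse, optimize, and lower to specific hardware. Here is a small sketch of producing such a graph with TorchScript tracing, just one of several possible entry points (XLA, TVM, and friends have their own):

```python
import torch

# A tiny model to stand in for whatever you want to deploy.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 4)).eval()
example = torch.randn(1, 16)

traced = torch.jit.trace(model, example)     # record the op graph once
print(traced.graph)                          # the IR a compiler backend would optimize
traced.save("model_traced.pt")               # portable artifact for deployment
```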
Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape
From Issue 218: Oct. 6, 2021
Matt Turck’s annual MAD landscape is always excellent fodder for discussion. It also gives us a sense of how the field is evolving and expanding year to year, helps us spot which trends are likely to stay and which are likely to fall by the wayside, and offers a frame of reference for digesting the inevitable complexity of the industry at large.
The Age of Machine Learning As Code Has Arrived
From Issue 221: Oct. 27, 2021
This article from the team at Hugging Face highlights the current relationship between machine learning and software engineering, and explores what practitioners should expect in the future. At its core, it’s an argument to adopt—and adjust—well-worn principles of software engineering to the ML development lifecycle: things like versioning, reusability, automation, deployment, monitoring, performance, optimization, and other timeless DevOps / SWE principles.
How to Train Large Deep Learning Models as a Startup
From Issue 222: Nov. 3, 2021
Training massive, SOTA deep learning models is one thing if you have a world-class research team, tons of funding, and access to near-limitless compute resources…it’s another entirely if you’re a startup trying to gain a foothold on the bleeding edge of the deep learning ecosystem. This excellent blog post from the team at Assembly AI clearly lays out the problem of trying to do this work with limited resources, and then provides a high-level look at how their team worked to overcome some of these challenges.
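One recurring theme when training big models on a budget is squeezing larger effective batch sizes out of limited GPU memory. As a generic illustration (not necessarily the exact techniques Assembly AI describes), here is gradient accumulation in a few lines of PyTorch:

```python
import torch

model = torch.nn.Linear(128, 10)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
accum_steps = 8                                # effective batch = 8 * micro-batch size

# Dummy data loader: 32 micro-batches of 4 examples each.
loader = [(torch.randn(4, 128), torch.randint(0, 10, (4,)))] * 32

for step, (x, y) in enumerate(loader):
    loss = torch.nn.functional.cross_entropy(model(x), y) / accum_steps
    loss.backward()                            # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        opt.step()                             # one optimizer update per accumulation window
        opt.zero_grad()
```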
NeurIPS 2021—10 papers you shouldn’t miss
From Issue 228: Dec. 15, 2021
Last but not least…a year-end roundup wouldn’t be complete without a good, old-fashioned power ranking. In this case, we chose this roundup of some of the most exciting research coming out of NeurIPS 2021, compiled by the team over at Zeta-Alpha. We especially appreciated the attention to detail in this top-10 list, as it’s complete with a tl;dr for each paper, the “why”, key insights, as well as links to other related or relevant papers from the conference.