Deep Learning Weekly: AI Hardware Deep Dive

An overview of the hardware used to power AI research and industrial applications

Jul 17, 2021

Hey folks,

Today, we are sending you our second deep dive! Our aim is to provide a precise examination of a given topic. This time, we look at the hardware used to train AI models and produce inferences at scale.

We give a history of the field, an overview of what’s available now, and a look at the most promising research directions that are shaping what will be used in the future.

AI Hardware: Past, Present and Future

Artificial Intelligence (AI) has been around for a few decades, and its advance is mostly driven by three factors: algorithmic innovation, access to large datasets and the amount of computing power available.

The first two factors - algorithmic innovation and access to large datasets - have improved significantly in recent decades, although their progress is difficult to track and quantify. We focus here on the third factor: what is being done to increase the amount of computing power available to train AI models and to produce inferences at scale?

The emergence of industrial applications at an unprecedented scale - for example search engines, self-driving vehicles or speech recognition - has driven a wave of investment in hardware research. We present here an overview of the recent history of the field, and summarize the most promising directions for the future.

Past and Present

In a must-read analysis of the evolution of the amount of compute used in AI research, OpenAI estimates that since 2012, the amount of compute used in the largest AI models has been increasing exponentially with a 3.4-month doubling time, much faster than the 2-year doubling period of Moore’s law. This impressive increase comes from several factors, which are detailed below.

AlphaGo Zero total amount of compute used to train
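
To make the gap between the two doubling times concrete, here is a minimal sketch that converts each doubling time into a growth factor over a given period. The function name and the chosen time horizons are our own illustrative assumptions; only the 3.4-month and 2-year figures come from the estimate quoted above.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth after `months_elapsed`, given a fixed doubling time."""
    return 2.0 ** (months_elapsed / doubling_months)

AI_DOUBLING_MONTHS = 3.4       # OpenAI's estimate quoted above
MOORE_DOUBLING_MONTHS = 24.0   # Moore's law, 2-year doubling period

# Horizons are arbitrary, chosen only for illustration.
for years in (1, 2, 5):
    months = 12 * years
    ai = growth_factor(months, AI_DOUBLING_MONTHS)
    moore = growth_factor(months, MOORE_DOUBLING_MONTHS)
    print(f"{years} yr: largest AI runs x{ai:,.0f}, Moore's law x{moore:.1f}")
```

Under these assumptions, a 3.4-month doubling time compounds to roughly an order of magnitude per year, while a 2-year doubling time yields only about 1.4x per year.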

The use of specialized chips

Before 2010, ML models were mostly trained on CPUs. Graphics Processing Units (GPUs) were popularized for computing by Nvidia in the early 2000s, with the introduction of parallel GPUs for applications that require complex and simultaneous calculations. The use of GPUs for deep learning was introduced by Andrew Ng’s team in a famous paper, and became the standard in the following years.

The main chip providers then began competing to develop faster and more powerful domain-specific chips. In 2016, Google’s release of the Tensor Processing Unit (TPU), which gave an impressive speedup for deep learning processing in TensorFlow, and Intel’s acquisition of Nervana Systems and Movidius marked the start of an arms race. Chip architectures are now increasingly designed and optimized for specific applications.

The availability of ever bigger and cheaper hardware infrastructure

After GPUs were introduced to run deep learning calculations much faster than traditional chips, a wave of research and investment enabled the training of AI models on many GPUs, leading to training runs at a much larger scale:

Before 2014, infrastructure to train on more than a dozen GPUs was very uncommon, and most state-of-the-art results were obtained with 1-8 GPUs
Then, the release of bigger and cheaper hardware (see for example Nvidia’s data center GPUs or Cerebras’ wafer scale hardware) made it possible to scale training to hundreds of GPUs
Also, some algorithmic advances (for example huge batch sizes or neural architecture search) allowed greater algorithmic parallelism, leading to more efficiency when parallelizing workloads across many GPUs
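
To illustrate why large batch sizes lend themselves to parallelism, here is a minimal sketch of synchronous data parallelism on a toy problem. It assumes a 1-D linear model, equally sized shards and simulated workers running in a single process; all function names are hypothetical and do not correspond to any specific framework.

```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_gradient(w, batch, num_workers):
    """Split the batch into equal shards, compute one gradient per simulated
    worker, then average the results (the role an all-reduce plays on real hardware)."""
    shard_size = len(batch) // num_workers
    assert shard_size * num_workers == len(batch), "batch must split evenly"
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    worker_grads = [mse_gradient(w, shard) for shard in shards]
    return sum(worker_grads) / num_workers

# Toy data drawn from y = 3x; a global batch of 8 examples.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.5

full = mse_gradient(w, batch)
parallel = data_parallel_gradient(w, batch, num_workers=4)
print(full, parallel)
```

With equal shards, averaging the per-worker gradients reproduces the full-batch gradient, so a larger global batch can be spread over more devices without changing the update. Real systems add communication costs that this sketch ignores.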

Size comparison of Cerebras Wafer Scale Engine

The promising directions for the future

AI hardware appears to still be in its infancy, and there is a lot of uncertainty about the future of the field. IBM Research has built a team dedicated to developing new devices and hardware architectures optimized for AI, and released a helpful introduction to their work.

The most promising directions are the following:

Some argue that Field Programmable Gate Arrays (FPGAs) could replace GPUs as the standard chips for deep learning applications. However, these chips are still hard to program, and the tools and libraries needed to scale their use have yet to be developed
The recent wave of heavy investment in quantum computing could well have a significant impact on AI research, as quantum machine learning is a high-growth field. Google seems to be very well positioned here, with a new campus dedicated to it and the release of TensorFlow Quantum
The next generation of AI chips could be based on neuromorphic computing, a field that aims to develop models with much more in common with human cognition than with deep learning architectures, leading to higher computing efficiency. Intel is actively working on this, along with startups like GrAI Matter Labs or Prophesee
Another research direction is to use optical processors to carry out neural network calculations with photons instead of electrons. This approach has the potential to accelerate deep learning calculations by several orders of magnitude
Finally, some companies like Neural Magic are working on optimizing deep learning architectures so that they can run on commodity CPU hardware at GPU speeds

Conclusion

When it comes to AI, hardware and software are deeply linked: the release of more efficient hardware makes it possible to test and validate new algorithmic approaches at scale, and the latest theoretical advances drive chip architecture.

Deep Learning Weekly