Deep Learning Weekly: Issue #238
DeepMind's Deep RL for controlling nuclear fusion plasma, how to serve thousands of models, Aristotelian logic and the flaws of deep learning, a paper on CNN generalization, and more
This week in deep learning, we bring you DeepMind's deep RL for controlling nuclear fusion plasma, an in-depth article on how to serve thousands of models, Aristotelian logic and the flaws of deep learning, and a paper on how CNNs generalize to out-of-distribution category-viewpoint combinations.
You may also enjoy Google's DL algorithm for medical ventilator control, an MLOps pipeline with GitHub Actions tutorial, an anomaly detection library, a paper on instant neural graphics primitives, and more!
As always, happy reading and hacking. If you have something you think should be in next week's issue, find us on Twitter: @dl_weekly.
Until next week!
Using a learning architecture that combines deep RL and a simulated environment, DeepMind can successfully control nuclear fusion plasma.
Guinness World Records presents a Stanford University-led research team with the first record for fastest DNA sequencing technique — a benchmark set using a workflow sped up by AI and accelerated computing.
The autonomous AI racing agent, known as Gran Turismo Sophy (GT Sophy), recently beat the world’s best drivers in GT Sport.
Google presents exploratory research into the design of a deep learning–based algorithm to improve medical ventilator control for invasive ventilation.
A recent report by Wildlabs.net found that AI was one of the top three emerging technologies in conservation.
An article that describes how various teams perform testing for different scenarios, including combined tests, behavioral tests, and statistical tests.
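As a concrete illustration of one of those categories, a behavioral (invariance) test checks that a model's predictions are stable under perturbations that should not matter. The sketch below uses a toy `predict_sentiment` stand-in, not any model or framework from the article:

```python
# A minimal sketch of a behavioral (invariance) test for a text model.
# `predict_sentiment` is a hypothetical stand-in for the model under test.

def predict_sentiment(text: str) -> str:
    """Toy placeholder model: counts positive vs. negative words."""
    positives = {"great", "good", "excellent"}
    negatives = {"bad", "awful", "terrible"}
    words = text.lower().split()
    score = sum(w in positives for w in words) - sum(w in negatives for w in words)
    return "positive" if score >= 0 else "negative"

def test_invariance_to_names():
    # Invariance test: swapping a proper noun should not flip the prediction.
    base = predict_sentiment("Alice thought the film was great")
    perturbed = predict_sentiment("Bob thought the film was great")
    assert base == perturbed

test_invariance_to_names()
```

In a real suite, the perturbation (name swaps, typos, paraphrases) is applied to many examples, and the test fails if the prediction changes on more than a small fraction of them.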
A blog post that explores how to design a system that serves hundreds or even thousands of models in real time.
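One common ingredient of such systems is lazy loading with LRU eviction, so that only the most recently used models stay in memory. A minimal sketch of that pattern — the `loader` callable and the capacity are illustrative, not taken from the post:

```python
from collections import OrderedDict

class ModelCache:
    """Keep only the N most recently used models in memory, loading
    others on demand — a common pattern when serving thousands of
    models that cannot all fit in RAM at once."""

    def __init__(self, loader, capacity=100):
        self.loader = loader          # callable: model_id -> model object
        self.capacity = capacity
        self._cache = OrderedDict()   # model_id -> model, in LRU order

    def get(self, model_id):
        if model_id in self._cache:
            self._cache.move_to_end(model_id)    # mark as recently used
        else:
            if len(self._cache) >= self.capacity:
                self._cache.popitem(last=False)  # evict least recently used
            self._cache[model_id] = self.loader(model_id)
        return self._cache[model_id]

# Illustrative usage with a dummy loader:
cache = ModelCache(loader=lambda mid: f"model-{mid}", capacity=2)
cache.get("a"); cache.get("b"); cache.get("a"); cache.get("c")  # evicts "b"
```

Production systems layer more on top (warm-up, per-model routing, eviction by memory footprint rather than count), but the cache above is the core idea.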
An article explaining how to create MLOps pipelines in a very simple way with GitHub, GitHub Actions, and a Cloud service provider.
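For a sense of scale, a pipeline like the one described can start from a single workflow file. The sketch below is a hypothetical GitHub Actions workflow; the file path, script name, secret name, and cloud-provider step are illustrative assumptions, not taken from the article:

```yaml
# Hypothetical workflow file, e.g. .github/workflows/train.yml.
# The script path and secret name are illustrative.
name: retrain-model
on:
  push:
    branches: [main]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python train.py   # assumed training entry point
        env:
          CLOUD_API_KEY: ${{ secrets.CLOUD_API_KEY }}
```

A real MLOps pipeline would add steps for data validation, model evaluation gates, and deployment to the cloud provider, but each is just another `steps` entry in the same file.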
A technical guide on running Airflow on GPUs for TensorFlow and PyTorch stacks, as an alternative to out-of-the-box MLOps solutions.
An essay that outlines some key points of Aristotelian logic and epistemology to show how their absence in traditional deep learning is responsible for deep learning’s well-known limitations.
A technical blog exploring the use of Comet’s experiment management platform for hyperparameter optimization.
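Independent of any particular platform, the core loop of hyperparameter optimization can be sketched as a random search over a search space; in practice, each trial's parameters and score would be logged to an experiment tracker. The `train_and_eval` function below is a toy stand-in, not anything from the blog:

```python
import random

def train_and_eval(lr, batch_size):
    """Hypothetical stand-in for a real training run; returns a
    validation score. Here it simply favors lr near 0.01."""
    return -abs(lr - 0.01) - 0.001 * abs(batch_size - 64)

# Each entry maps a hyperparameter name to a sampling function.
search_space = {
    "lr": lambda: 10 ** random.uniform(-4, -1),   # log-uniform learning rate
    "batch_size": lambda: random.choice([16, 32, 64, 128]),
}

random.seed(0)
best_score, best_params = float("-inf"), None
for _ in range(50):
    params = {name: sample() for name, sample in search_space.items()}
    score = train_and_eval(**params)
    # In a real setup, this is where each trial's params and score
    # would be logged to the experiment-management platform.
    if score > best_score:
        best_score, best_params = score, params
```

Platforms like Comet's wrap this loop with smarter samplers (Bayesian optimization, Hyperband) and persistent trial logging, but the trial/score structure is the same.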
An interview with Nvidia’s Rev Lebaredian on synthetic data and how it can make AI systems better and possibly even more ethical.
A technical walkthrough on AdverTorch, a set of tools for studying adversarial robustness.
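For readers new to the area, the simplest attack such toolkits implement is the Fast Gradient Sign Method (FGSM): perturb the input by a small step in the direction that increases the loss. A minimal NumPy sketch against a logistic-regression model — a generic stand-in, not AdverTorch's own API:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Fast Gradient Sign Method on a logistic-regression model.
    x: input vector; w, b: model weights; y: true label in {0, 1};
    eps: perturbation budget (L-infinity radius)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability
    grad_x = (p - y) * w                      # d(cross-entropy loss)/dx
    return x + eps * np.sign(grad_x)          # step that increases the loss

# Illustrative usage: the perturbed input lowers the model's
# confidence in the true label.
x = np.array([0.5, 0.5]); w = np.array([1.0, -2.0]); b = 0.0; y = 1
x_adv = fgsm_perturb(x, w, b, y, eps=0.1)
```

Libraries like AdverTorch provide the same idea for deep networks, where the input gradient comes from autograd rather than a closed form.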
Libraries & Code
Pete Warden’s notebook that describes how to interpret the output of neural networks that try to predict the location of objects in an image.
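A typical step when interpreting such outputs is converting the network's normalized (center, size) box encoding into pixel-space corners. A small sketch, assuming a hypothetical normalized output format rather than the specific one in the notebook:

```python
def decode_box(cx, cy, w, h, img_w, img_h):
    """Convert a normalized (center, size) box, as many detection
    networks emit, into pixel-space (x0, y0, x1, y1) corners.
    cx, cy, w, h are fractions of the image; img_w, img_h are pixels."""
    x0 = (cx - w / 2) * img_w
    y0 = (cy - h / 2) * img_h
    x1 = (cx + w / 2) * img_w
    y1 = (cy + h / 2) * img_h
    return x0, y0, x1, y1

# A box centered in a 100x200 image, 20% wide and 40% tall:
box = decode_box(0.5, 0.5, 0.2, 0.4, img_w=100, img_h=200)
```

Real pipelines add anchor offsets, confidence thresholding, and non-maximum suppression on top of this decoding step.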
A lightweight vision library for performing large scale object detection & instance segmentation.
Anomalib is a deep learning library that aims to collect state-of-the-art anomaly detection algorithms for benchmarking on both public and private datasets.
Papers & Publications
Face recognition has recently become ubiquitous in many scenes for authentication or security purposes. Meanwhile, there are increasing concerns about the privacy of face images, which are sensitive biometric data that should be carefully protected. Software-based cryptosystems are widely adopted nowadays to encrypt face images, but the security level is limited by insufficient digital secret key length or computing power. Hardware-based optical cryptosystems can generate enormously longer secret keys and enable encryption at light speed, but most reported optical methods, such as double random phase encryption, are less compatible with other systems due to system complexity. In this study, a plain yet highly efficient speckle-based optical cryptosystem is proposed and implemented. A scattering ground glass is exploited to generate physical secret keys of gigabit length and encrypt face images via seemingly random optical speckles at light speed. Face images can then be decrypted from the random speckles by a well-trained decryption neural network, such that face recognition can be realized with up to 98% accuracy. The proposed cryptosystem has wide applicability, and it may open a new avenue for high-security complex information encryption and decryption by utilizing optical speckles.
Object recognition and viewpoint estimation lie at the heart of visual understanding. Recent works suggest that convolutional neural networks (CNNs) fail to generalize to out-of-distribution (OOD) category-viewpoint combinations, i.e., combinations not seen during training. In this paper, we investigate when and how such OOD generalization may be possible by evaluating CNNs trained to classify both object category and 3D viewpoint on OOD combinations, and identifying the neural mechanisms that facilitate such OOD generalization. We show that increasing the number of in-distribution combinations (i.e., data diversity) substantially improves generalization to OOD combinations, even with the same amount of training data. We compare learning category and viewpoint in separate and shared network architectures, and observe starkly different trends on in-distribution and OOD combinations, i.e., while shared networks are helpful in-distribution, separate networks significantly outperform shared ones at OOD combinations. Finally, we demonstrate that such OOD generalization is facilitated by the neural mechanism of specialization, i.e., the emergence of two types of neurons — neurons selective to category and invariant to viewpoint, and vice versa.
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920×1080.
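The hash-table encoding at the heart of the paper can be sketched in a few lines. The version below uses the paper's per-dimension primes and table-size/level structure, but substitutes nearest-corner lookup for the paper's linear interpolation and skips training entirely — a toy illustration, not the fully-fused CUDA implementation:

```python
import numpy as np

TABLE_SIZE = 2 ** 14          # entries per level (T in the paper)
FEATURE_DIM = 2               # features per entry (F in the paper)
PRIMES = (1, 2654435761)      # per-dimension hash primes from the paper

def hash_corner(ix, iy):
    """Spatial hash of an integer grid corner into the table."""
    return (ix * PRIMES[0] ^ iy * PRIMES[1]) % TABLE_SIZE

def encode(x, y, levels=4, base_res=16, tables=None):
    """Encode a 2D point by concatenating the feature vector of the
    nearest grid corner at each resolution level (nearest corner
    instead of interpolation, for brevity)."""
    if tables is None:
        # Trainable parameters; here randomly initialized for illustration.
        rng = np.random.default_rng(0)
        tables = [rng.normal(size=(TABLE_SIZE, FEATURE_DIM))
                  for _ in range(levels)]
    feats = []
    for level, table in enumerate(tables):
        res = base_res * (2 ** level)        # grid resolution at this level
        ix, iy = int(x * res), int(y * res)  # nearest corner (floored)
        feats.append(table[hash_corner(ix, iy)])
    return np.concatenate(feats)             # input to the small MLP

features = encode(0.3, 0.7)   # length = levels * FEATURE_DIM
```

Because fine levels collide heavily in the table while coarse levels do not, the small MLP that consumes this concatenated vector learns to resolve the collisions — which is what lets the encoding stay compact.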