Annual Report 2020

The Quest for a Deeper Understanding of Deep Learning

Mathematics and Physical Sciences

Deep learning enables modern wonders like computer vision, speech recognition and natural language processing. Scientists are applying it to everything from automated audio transcription to robotic locomotion. Still, in 2018, a self-driving car struck and killed a pedestrian in Tempe, Arizona. The woman was walking with a bicycle outside of a designated crosswalk, and the car’s programming was not prepared to correctly identify or move to avoid a person in that position. Such outcomes point to the wide range of possible futures for deep learning: Will face recognition programs provide a safer society? A society devoid of individual privacy? Or both?

The term ‘deep learning’ refers to a suite of machine learning techniques in which algorithms use methods that mimic the way human brains form new connections to make decisions or classify examples; the computing systems underlying these techniques are often referred to as (artificial) neural networks.

The accuracy of a neural network is gauged by a loss function, which estimates how far off the neural network’s predictions are from expectations. The goal is to minimize the loss value by tweaking parameters. Above is a visualization of the loss landscape for an underparameterized ‘classical’ optimization landscape (left) and an overparameterized ‘modern’ optimization landscape (right), which occurs in large neural networks. Credit: C. Liu, L. Zhu and M. Belkin
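The idea of minimizing a loss value by tweaking parameters can be sketched in a few lines. The example below is only an illustration under stated assumptions: a model with a single parameter `w`, a mean squared error loss, made-up data, and plain gradient descent. None of these choices come from the figure or from the researchers' work, and real neural networks have millions of parameters.

```python
# Hypothetical toy data (an assumption for illustration): targets equal 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def loss(w):
    """Mean squared error between the predictions w * x and the targets."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w):
    """Derivative of the loss with respect to the single parameter w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(w=0.0, lr=0.01, steps=500):
    """Repeatedly nudge w in the direction that lowers the loss."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


w_fit = gradient_descent()
print(f"loss before: {loss(0.0):.3f}, loss after: {loss(w_fit):.3e}, w = {w_fit:.4f}")
```

In this toy case the loss has a single valley, so the procedure settles on the one best value of `w`. The landscapes in the figure are far more complicated.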

Deep learning algorithms are remarkably effective and accurate, but researchers do not have a good handle on exactly what’s going on under the hood. The algorithms give answers but do not explain them. “I think the thing that’s really exciting from a scientific perspective is that these are techniques that practitioners have advanced,” says Peter Bartlett, a computer science and statistics professor at the University of California, Berkeley. “They’ve engineered systems to perform very well on particular benchmark problems, but without a deep understanding of why they’re so successful.”

The inexplicability of how and why algorithms make the decisions they do creates several problems for the field, of which perhaps the most troubling is that of fairness and equity, as algorithms are increasingly used to make decisions consequential to our society. If a bail algorithm sets a higher amount for one defendant than another without explanation, how can the people affected be sure the decision was not the result of racism or some other human failing, indelibly absorbed into layers of code?

This opacity also means that deep learning algorithms may be more complicated and less robust than they could be, and it hinders progress in improving algorithms in some areas of application.

Responding to the need for more research into how these algorithms work, the National Science Foundation and the Simons Foundation announced a joint call for proposals related to the mathematical foundations of deep learning. Two collaborations were awarded funding and officially began work in September 2020. Bartlett is the director of the Collaboration on the Theoretical Foundations of Deep Learning, whose leadership includes seven principal investigators and three co-investigators at universities in the United States, Israel and Switzerland. The other collaboration, Transferable, Hierarchical, Expressive, Optimal, Robust, Interpretable NETworks (THEORINET), is directed by René Vidal, the Herschel Seder professor of biomedical engineering and director of the Mathematical Institute for Data Science at Johns Hopkins University, who works with four principal investigators and 10 co-investigators in the United States and Germany. Although the two collaborations are separate and have different approaches and areas of focus, their interests overlap enough to permit twice-monthly meetings at which the members of both groups share their work and exchange ideas.

Deep learning algorithms ask questions like: Given a particular collection of pixels, what is the likelihood that the tissue pictured has a tumor? Or: Given a particular audio file, what was the person recorded most likely to have been saying? Bartlett’s collaboration believes that although these problems are familiar to classical statistics, deep learning mechanisms are fundamentally different from those used in classical statistics, and hence present different challenges.

“It seems like deep learning is breaking one of the most fundamental rules that we’ve traditionally taught in our undergraduate classes, that there has to be a trade-off between the fit to the data and the complexity of the prediction rules,” Bartlett says. “If you get a perfect fit for the training data, that should be something you should be suspicious of.” But deep learning algorithms fit training data very well, without an obvious cost in terms of complexity or performance on new tasks. His group is investigating whether such trade-offs do happen somewhere in the deep learning process and, if so, where and in what form.
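The classical rule Bartlett describes can be seen in a small example. The sketch below is a textbook-style illustration, not the collaboration's analysis, and everything in it is an assumption: the data are invented, the true relationship is taken to be y = x, and the two "prediction rules" are a straight line and a cubic polynomial that passes exactly through every training point.

```python
# Hypothetical training data: the true rule is y = x, with noise of +/- 0.5 added.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.5, 0.5, 2.5, 2.5]
# Hypothetical new data, noise-free, drawn from the same true rule.
test_x = [0.5, 1.5, 2.5]
test_y = [0.5, 1.5, 2.5]


def interpolate(xs, ys, x):
    """Lagrange polynomial through every (xs, ys) point: a perfect training fit."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares line: a simpler rule that does not fit perfectly."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


slope, intercept = fit_line(train_x, train_y)
line_train = mse([slope * x + intercept for x in train_x], train_y)
line_test = mse([slope * x + intercept for x in test_x], test_y)
interp_train = mse([interpolate(train_x, train_y, x) for x in train_x], train_y)
interp_test = mse([interpolate(train_x, train_y, x) for x in test_x], test_y)

print(f"line:        train error {line_train:.3f}, test error {line_test:.3f}")
print(f"interpolant: train error {interp_train:.3f}, test error {interp_test:.3f}")
```

Here the complex rule fits the training data perfectly but does worse on new data than the simple line, which is the trade-off classical statistics predicts. The puzzle Bartlett points to is that large neural networks often fit their training data just as closely without paying this kind of price.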

Bartlett and other researchers in the collaboration currently have hypotheses about where the trade-offs are, which they plan to investigate on a mathematical level. They hope to refine their hypotheses and extend them into a robust scientific theory that not only explains deep learning but also allows scientists to create better algorithms. “Our point of view is that having an understanding of how deep learning techniques work, what’s underlying their success, is really important to overcoming the issues that surround the application of these methods,” Bartlett says.

An illustration capturing the intricacies of high-dimensional optimization, which is key to training a neural network. Optimization requires identifying the global maximum or minimum value. One of the persistent challenges — spurious local optima that are only locally maximum or minimum values — is on display. Credit: Robert Ghrist
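The difficulty with spurious local optima shows up even in one dimension. The sketch below is a minimal illustration, assuming a made-up function `f(w) = w**4 - 3*w**2 + w` and plain gradient descent; the function, step size and starting points are hypothetical and are not taken from the illustration.

```python
def f(w):
    """A hypothetical non-convex function with two valleys of different depth."""
    return w ** 4 - 3 * w ** 2 + w


def f_prime(w):
    """Derivative of f, used to decide which way is downhill."""
    return 4 * w ** 3 - 6 * w + 1


def descend(w, lr=0.01, steps=1000):
    """Plain gradient descent: follow the downhill direction from a starting point."""
    for _ in range(steps):
        w -= lr * f_prime(w)
    return w


w_from_left = descend(-2.0)   # starts on the left slope
w_from_right = descend(2.0)   # starts on the right slope

print(f"start at -2.0 -> w = {w_from_left:.3f}, f(w) = {f(w_from_left):.3f}")
print(f"start at  2.0 -> w = {w_from_right:.3f}, f(w) = {f(w_from_right):.3f}")
```

Both runs stop where the slope is zero, but only the run that starts on the left reaches the deeper valley. The run that starts on the right settles in a shallower one, a value that is a minimum only locally.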

Vidal’s collaboration, THEORINET, has several aims. Researchers seek to obtain a rigorous analysis of several key properties of deep neural networks and then leverage that analysis for further insight into the design of algorithms that can be guaranteed to satisfy particular constraints and into the transfer of deep learning techniques from one domain to another. 

For example, one of the most perplexing challenges in deep learning is robustness. If a self-driving car recognizes an image as a stop sign, it will stop. But in computer vision algorithms, small perturbations invisible to the human eye can cause an algorithm to fail to classify an image correctly, in this case potentially causing a self-driving car to run a stop sign. “You can make imperceptible perturbations to the input data, and you can completely fool an AI system — it will make all the wrong predictions,” Vidal says. “Why aren’t deep networks robust to adversarial perturbations?” A greater understanding of why deep learning is so sensitive to these perturbations could help programmers implement algorithms that would make fewer mistakes. In some domains, that could save lives: A self-driving car will stop at a stop sign as required, or a tumor will be correctly identified on a medical image.
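A toy calculation gives some intuition for how tiny changes can add up. The sketch below is an assumption-laden illustration, not THEORINET's method: it uses a hypothetical linear classifier with 1,000 inputs rather than a deep network, made-up weights and pixel values, and a perturbation chosen in the spirit of the fast gradient sign method, a technique the article does not name.

```python
def score(weights, inputs):
    """Weighted sum of the inputs; its sign is the classifier's decision."""
    return sum(w * x for w, x in zip(weights, inputs))


def classify(weights, inputs):
    """Return +1 (say, 'stop sign') or -1 (say, 'not a stop sign')."""
    return 1 if score(weights, inputs) >= 0 else -1


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, inputs, label, eps):
    """Shift every input by at most eps in the direction that hurts the given label.

    For a linear classifier the gradient of the score with respect to the
    inputs is the weight vector, so the sign of each weight gives the direction.
    """
    return [x - label * eps * sign(w) for w, x in zip(weights, inputs)]


# Hypothetical setup: 1,000 "pixels" with values near 0.5 on a 0-to-1 scale.
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
image = [0.5 + 0.005 * w for w in weights]

original_label = classify(weights, image)
adversarial = perturb(weights, image, original_label, eps=0.01)
largest_change = max(abs(a - b) for a, b in zip(adversarial, image))

print("original prediction:   ", original_label)
print("adversarial prediction:", classify(weights, adversarial))
print(f"largest change to any pixel: {largest_change:.3f}")
```

No single input moves by more than 0.01, yet the prediction flips, because a thousand small nudges that all push the same way add up to a large change in the score. Why deep networks behave similarly, and how to prevent it, is the open question Vidal raises.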

Beyond the scientific goals of the collaboration, Vidal is also concerned with the broader societal impacts of the program. The collaboration has proposed a partnership with the University of Maryland, Baltimore County Meyerhoff Scholars Program to equip undergraduates from underrepresented groups to enter careers related to artificial intelligence and deep learning. They also want to use their work to inform public policy related to the implementation of high-stakes algorithms. “One worry we have is that decision-makers either distrust AI and continue to make decisions based exclusively on human decision-making, or believe everything AI does and don’t understand the pitfalls,” Vidal says. Either extreme creates problems. To that end, the collaboration has held, and will continue to hold, conferences and seminar talks related to issues of equity and justice in algorithms and how to understand and influence public policy discussions. “Deep learning has great potential to impact our society,” Vidal says, “but we need to understand its foundations to make sure its predictions are correct, safe and fair.”