Jamming transition as a paradigm to understand the loss landscape of deep neural networks.

Mario Geiger, Stefano Spigler, Stéphane d'Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart

Phys Rev E

Institute of Physics, EPFL, CH-1015 Lausanne, Switzerland.

Published: July 2019

Deep learning has been immensely successful at a variety of tasks, ranging from classification to artificial intelligence. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased, remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in fully connected deep networks a phase transition delimits the over- and underparametrized regimes where fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss (the spectrum of the Hessian) are critical. This transition shares direct similarities with the jamming transition by which particles form a disordered solid as the density is increased, which also occurs in certain classes of computational optimization and learning problems such as the perceptron. Our analysis gives a simple explanation as to why poor minima of the loss cannot be encountered in the overparametrized regime. Interestingly, we observe that the ability of fully connected networks to fit random data is independent of their depth, an independence that appears to also hold for real data. We also study a quantity Δ which characterizes how well (Δ<0) or badly (Δ>0) a datum is learned. At the critical point it is power-law distributed over several decades, P_{+}(Δ)∼Δ^{θ} for Δ>0 and P_{-}(Δ)∼(-Δ)^{-γ} for Δ<0, with exponents that depend on the choice of activation function. This observation suggests that near the transition the loss landscape has a hierarchical structure and that learning is prone to avalanche-like dynamics, with abrupt changes in the set of patterns that are learned.
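
To make the role of Δ concrete, here is a minimal Python sketch. The abstract does not define Δ, so the sketch assumes a margin-based form, Δ = margin − y·f(x) for a binary label y = ±1 and network output f(x), together with a quadratic hinge loss that only penalizes data with Δ>0. The function names, the `margin` parameter, and the toy numbers are all hypothetical and serve only as illustration.

```python
# Illustrative sketch only: the margin-based definition of the gap and the
# quadratic hinge loss below are assumptions, not taken from the abstract.

def gaps(outputs, labels, margin=1.0):
    """Per-datum gap: negative means fitted with margin, positive means not fitted."""
    return [margin - y * f for f, y in zip(outputs, labels)]


def quadratic_hinge_loss(deltas):
    """Mean of 0.5 * max(0, delta)^2, so only data with delta > 0 contribute."""
    return sum(0.5 * max(0.0, d) ** 2 for d in deltas) / len(deltas)


def count_unsatisfied(deltas):
    """Number of data that are not yet fitted (delta > 0)."""
    return sum(1 for d in deltas if d > 0)


# Toy example: four hypothetical network outputs with +/-1 labels.
outputs = [2.0, 0.5, -1.0, -3.0]
labels = [1, 1, -1, 1]

deltas = gaps(outputs, labels)
loss = quadratic_hinge_loss(deltas)
n_unsatisfied = count_unsatisfied(deltas)

print(deltas)         # gap of each datum
print(loss)           # loss is zero only if every gap is <= 0
print(n_unsatisfied)  # data that still contribute to the loss
```

Under these assumptions, the loss vanishes exactly when no datum has Δ>0. Counting the data that still have Δ>0 is analogous to counting contacts between overlapping particles in the jamming picture.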

DOI: http://dx.doi.org/10.1103/PhysRevE.100.012115
