Citations

20

Article Abstract

Deep learning has been immensely successful at a variety of tasks, ranging from classification to artificial intelligence. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased, remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in fully connected deep networks a phase transition delimits the over- and underparametrized regimes where fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss (the spectrum of the Hessian) are critical. This transition shares direct similarities with the jamming transition by which particles form a disordered solid as the density is increased, which also occurs in certain classes of computational optimization and learning problems such as the perceptron. Our analysis gives a simple explanation as to why poor minima of the loss cannot be encountered in the overparametrized regime. Interestingly, we observe that the ability of fully connected networks to fit random data is independent of their depth, an independence that appears to also hold for real data. We also study a quantity Δ which characterizes how well (Δ<0) or badly (Δ>0) a datum is learned. At the critical point it is power-law distributed on several decades, P_{+}(Δ)∼Δ^{θ} for Δ>0 and P_{-}(Δ)∼(-Δ)^{-γ} for Δ<0, with exponents that depend on the choice of activation function. This observation suggests that near the transition the loss landscape has a hierarchical structure and that the learning dynamics is prone to avalanche-like dynamics, with abrupt changes in the set of patterns that are learned.
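The abstract does not define Δ or the loss, so the following is only a minimal sketch under stated assumptions: a linear model f(x) = w·x, labels y = ±1, a margin-based gap Δ = margin − y·f(x), and a quadratic hinge loss that penalizes only unsatisfied patterns (Δ > 0). The function names and the `margin` parameter are hypothetical and chosen for illustration; they are not taken from the article.

```python
def pattern_gaps(weights, inputs, labels, margin=1.0):
    """Return Delta = margin - y * f(x) per datum, with f(x) = w . x.

    Assumed definition (not from the abstract): Delta < 0 means the datum
    is fitted with room to spare, Delta > 0 means it is not fitted.
    """
    gaps = []
    for x, y in zip(inputs, labels):
        f = sum(w * xi for w, xi in zip(weights, x))
        gaps.append(margin - y * f)
    return gaps


def quadratic_hinge_loss(gaps):
    """Mean of 0.5 * Delta^2, summed over unsatisfied patterns (Delta > 0) only."""
    return sum(0.5 * d * d for d in gaps if d > 0) / len(gaps)


def count_unsatisfied(gaps):
    """Number of patterns that are not fitted (Delta > 0)."""
    return sum(1 for d in gaps if d > 0)


# Toy data: three two-dimensional inputs with +/-1 labels.
weights = [1.0, -1.0]
inputs = [[2.0, 0.0], [0.0, 0.5], [1.0, 1.0]]
labels = [1, -1, 1]

gaps = pattern_gaps(weights, inputs, labels)
loss = quadratic_hinge_loss(gaps)
unsatisfied = count_unsatisfied(gaps)
```

Under these assumptions the loss is zero exactly when no pattern has Δ > 0, which is one way to read "fitting can be achieved"; the distributions P_{+} and P_{-} in the abstract would then be histograms of the positive and negative gaps, respectively.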


Source
http://dx.doi.org/10.1103/PhysRevE.100.012115

Publication Analysis

Top Keywords

minima loss: 12
jamming transition: 8
loss landscape: 8
neural networks: 8
poor minima: 8
fully connected: 8
loss: 6
transition paradigm: 4
paradigm understand: 4
understand loss: 4

Similar Publications

Short-stem hip replacements are designed to provide improved load distribution and to mimic natural biomechanics. The interplay between implant design, positioning, and resulting bone biomechanics in individual patients remains underexplored, and the relationship between radiographically assessed bone remodeling around short stems and biomechanical predictions has not been previously reported. This study evaluated three short-stem hip implant designs: Proxima, Collo-MIS, and Minima.


In this work, guided ion beam tandem mass spectrometer (GIBMS) studies are extended to a peptide length of four amino acids in order to begin to probe the reaction mechanisms for dissociation in longer peptides, approaching what would typically be seen in a tryptic digest, bottom-up sequencing study. Threshold collision-induced dissociation in a GIBMS was performed on protonated tetraglycine (HGGGG) by using Xe as the collision partner. The kinetic energy dependences of five primary product ion channels ([b], [b], [y + 2H], [y + 2H], and loss of water) were reproduced using the modified line-of-centers model including RRKM kinetic theory in order to determine experimental threshold energies.


Insurance claims estimation and fraud detection with optimized deep learning techniques.

Sci Rep

July 2025

Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, Tamilnadu, India.

Estimation and fraud detection in the case of insurance claims play a cardinal role in the insurance sector. With accurate estimation of insurance claims, insurers can have good risk perceptions and disburse compensation within proper time, while fraud prevention helps deter massive monetary loss from fraudulent activities. Financial fraud has done significant damage to the global economy, thus threatening the stability and efficiency of capital markets.


Adaptive gradient scaling: integrating Adam and landscape modification for protein structure prediction.

BMC Bioinformatics

July 2025

Department of Statistics and Data Science, National University of Singapore, 6 Science Drive 2, Singapore, Singapore.

Background: Protein structure prediction is one of the most important scientific problems. On the one hand, it is NP-hard; on the other hand, it has a wide range of applications, including drug discovery and biotechnology development. Since experimental methods for structure determination remain expensive and time-consuming, computational structure prediction offers a scalable and cost-effective alternative, and the application of machine learning in structural biology has revolutionized protein structure prediction. Despite their success, machine learning methods face fundamental limitations in optimizing complex high-dimensional energy landscapes, which motivates research into new methods to improve the robustness and performance of optimization algorithms.


Due to their remarkable generalization capabilities, Artificial Neural Networks (ANNs) attract the attention of researchers and practitioners. ANNs have two main stages, namely training and testing. The training stage aims to find optimum synapse values.
