Citations

20

Article Abstract

The sharpness-aware minimization (SAM) optimizer has been extensively explored because it improves generalization when training deep neural networks by introducing extra perturbation steps that flatten the loss landscape. Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically for training large-scale deep neural networks, but without theoretical guarantees, owing to the threefold difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step. In this paper, we analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We theoretically show that AdaSAM admits an $\mathcal{O}(1/\sqrt{bT})$ convergence rate, which achieves a linear speedup with respect to the mini-batch size b. Specifically, to decouple the stochastic gradient steps from the adaptive learning rate and the perturbed gradient, we introduce a delayed second-order momentum term that makes them independent when taking expectations during the analysis. We then bound them by showing that the adaptive learning rate has a limited range, which makes our analysis feasible. To the best of our knowledge, we are the first to provide a non-trivial convergence rate for SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks and a synthetic task, which show that AdaSAM can achieve superior performance compared with the SGD, AMSGrad, and SAM optimizers.
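
The three coupled ingredients named in the abstract (perturbation step, adaptive learning rate, momentum step) can be illustrated with a minimal NumPy sketch. The abstract does not state the update rule, so the details here are assumptions: an AMSGrad-style running maximum for the second moment, no bias correction, illustrative hyperparameters, and a toy quadratic objective. The names `adasam_step` and `run_demo` are hypothetical, and the delayed second-order momentum term used in the paper's analysis is not modeled.

```python
import numpy as np


def adasam_step(w, m, v, v_hat, grad_fn, lr=0.01, rho=0.05,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaSAM-style update (illustrative; not the authors' exact rule)."""
    # Perturbation step: move to a nearby point along the normalized gradient.
    g = grad_fn(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)
    # The gradient at the perturbed point drives the actual update.
    g_adv = grad_fn(w_adv)
    # Momentum step: exponential moving average of the perturbed gradient.
    m = beta1 * m + (1.0 - beta1) * g_adv
    # Adaptive learning rate: second moment with an AMSGrad-style running
    # maximum (assumed here), so the per-coordinate step cannot grow back.
    v = beta2 * v + (1.0 - beta2) * g_adv ** 2
    v_hat = np.maximum(v_hat, v)
    w = w - lr * m / (np.sqrt(v_hat) + eps)
    return w, m, v, v_hat


def run_demo(steps=500):
    """Minimize a toy quadratic 0.5 * w^T A w; returns (initial, final) loss."""
    A = np.array([1.0, 10.0])          # diagonal curvature (assumed toy problem)
    loss = lambda w: 0.5 * np.sum(A * w ** 2)
    grad = lambda w: A * w
    w = np.array([1.0, -1.0])
    m, v, v_hat = np.zeros(2), np.zeros(2), np.zeros(2)
    initial = loss(w)
    for _ in range(steps):
        w, m, v, v_hat = adasam_step(w, m, v, v_hat, grad)
    return initial, loss(w)


if __name__ == "__main__":
    initial, final = run_demo()
    print(f"loss before: {initial:.4f}, loss after: {final:.6f}")
```

Each call needs two gradient evaluations, one at the current point and one at the perturbed point, which is the extra cost SAM-type methods pay for the flatter solution they seek.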

Source
http://dx.doi.org/10.1016/j.neunet.2023.10.044

Similar Publications

Background: Mental health problems are common in the working-age population. More knowledge is needed on how to support work participation and reduce sickness absence. The objective of the study was to estimate the distribution of mental well-being and work capacity in women and men in a working population and assess the association between mental well-being and work capacity, while adjusting for sociodemographic characteristics, health status, and working positions.

Pulse diagnosis holds a pivotal role in traditional Chinese medicine (TCM) diagnostics, with pulse characteristics serving as one of the critical bases for its assessment. Accurate classification of these pulse patterns is paramount for the objectification of TCM. This study proposes an enhanced SMOTE approach to achieve data augmentation, followed by multi-domain feature extraction.

Reduction in reward-driven behaviour depends on the basolateral but not central nucleus of the amygdala in female rats.

J Neurosci

September 2025

Center for Studies in Behavioural Neurobiology, Department of Psychology, Concordia University, Montreal, QC, Canada, H4B 1R6

Adaptive behavior depends on a dynamic balance between acquisition and extinction memories. Male and female rodents differ in extinction learning rates, suggesting potential sex-based differences in this balance. In males, deletion of extinction-recruited neurons in the central nucleus (CN) of the amygdala impairs extinction retrieval, shifting behavior toward acquisition (Lay et al.

Background: Informal caregivers of home-dwelling people with dementia experience significant unmet needs. However, family physician teams as primary health care gatekeepers for aging populations in China remain an underused resource for structured caregiver support.

Objective: This hybrid effectiveness-implementation study aimed to evaluate a policy-aligned integration of the World Health Organization's iSupport web-based program with China's family physician contract services for informal dementia caregivers while systematically assessing implementation determinants using the Consolidated Framework for Implementation Research (CFIR).

Multi-component collaborative design yields robust hydrogel sensors with superior environmental adaptability for machine learning-assisted gesture recognition.

J Colloid Interface Sci

September 2025

Key Laboratory of Urban Rail Transit Intelligent Operation and Maintenance Technology & Equipment of Zhejiang Province, College of Engineering, Zhejiang Normal University, Jinhua 321004, China.

Developing high-performance wearable flexible sensors that can adapt well to complex environments has become a hotspot. Herein, a polyvinyl alcohol-based composite hydrogel sensor with high mechanical strength, desirable frost and swelling resistance, and highly sensitive sensing performance was proposed through a multi-component collaborative design strategy. Meanwhile, an intelligent gesture recognition system was established by incorporating a machine learning algorithm.
