ScaleFormer architecture for scale invariant human pose estimation with enhanced mixed features.

Sci Rep

College of Physical Education, Beibu Gulf University, Qinzhou, 535011, Guangxi, China.

Published: July 2025



Article Abstract

Human pose estimation is a fundamental task in computer vision. However, the accuracy of existing methods fluctuates when human targets appear at different scales, especially in outdoor scenes where target distance and viewing angle change frequently. This paper proposes ScaleFormer, a scale-invariant pose estimation framework that addresses multi-scale pose estimation by combining the hierarchical feature extraction of Swin Transformer with the fine-grained feature enhancement of ConvNeXt. We design an adaptive feature representation mechanism that enables the model to maintain consistent performance across scales. Extensive experiments on the MPII human pose dataset demonstrate that ScaleFormer significantly outperforms existing methods on multiple metrics, including PCKh, scale consistency score, and keypoint mean average precision. Notably, under extreme scaling (scaling factor 2.0), ScaleFormer's scale consistency score exceeds that of the baseline model by 48.8 percentage points, and under 30% random occlusion, keypoint detection accuracy improves by 20.5 percentage points. Experiments further verify that the two core components make complementary contributions. These results indicate that ScaleFormer offers significant advantages in practical application scenarios and suggest new research directions for pose estimation.
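Of the metrics named in the abstract, PCKh has a widely used definition: a predicted keypoint counts as correct when its distance to the ground truth is at most a fraction alpha (commonly 0.5) of the person's head segment length. The following is a minimal sketch of that definition, not the authors' evaluation code. The function names `pckh` and `rescale`, the input layout, and the toy coordinates are assumptions made for illustration, and the paper's scale consistency score is not reproduced here because the abstract does not define it.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5, visible=None):
    """Fraction of keypoints whose error is within alpha * head size.

    pred, gt:    list of samples, each a list of (x, y) keypoints.
    head_sizes:  one head segment length per sample, in pixels.
    visible:     optional per-keypoint booleans; False entries are skipped.
    All names and the data layout are illustrative assumptions.
    """
    correct = 0
    total = 0
    for i, (p_kps, g_kps) in enumerate(zip(pred, gt)):
        threshold = alpha * head_sizes[i]
        for j, ((px, py), (gx, gy)) in enumerate(zip(p_kps, g_kps)):
            if visible is not None and not visible[i][j]:
                continue
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


def rescale(samples, factor):
    """Multiply every keypoint coordinate by a scaling factor."""
    return [[(x * factor, y * factor) for (x, y) in kps] for kps in samples]


# Toy data (hypothetical): one person, two keypoints, head size 10 px.
gt = [[(10.0, 10.0), (20.0, 20.0)]]
pred = [[(10.0, 13.0), (30.0, 20.0)]]  # errors of 3 px and 10 px
head = [10.0]

score = pckh(pred, gt, head)

# Scaling the image by 2.0 scales the errors and the head size together,
# so the head-normalized score is unchanged for identical relative errors.
score_scaled = pckh(rescale(pred, 2.0), rescale(gt, 2.0), [h * 2.0 for h in head])
```

Because the threshold is normalized by head size, PCKh itself does not penalize a target for being large or small. A model's PCKh changes across scales only when its relative localization error changes, which is the kind of fluctuation the abstract describes.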


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12311106
DOI: http://dx.doi.org/10.1038/s41598-025-12620-4

