Smart City Infrastructure Monitoring with a Hybrid Vision Transformer for Micro-Crack Detection.

Sensors (Basel)

Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Republic of Korea.

Published: August 2025


Article Abstract

Reliable structural health monitoring (SHM) is indispensable for ensuring the safety, dependability, and longevity of urban infrastructure. However, conventional methods are inefficient, labor-intensive, and error-prone, particularly when detecting subtle structural anomalies such as micro-cracks. To address this issue, this study proposes a deep-learning framework based on a modified Detection Transformer (DETR) architecture, enhanced with a Vision Transformer (ViT) backbone and a purpose-built Local Feature Extractor (LFE) module. The ViT backbone captures global contextual information through its self-attention mechanism, while the LFE module improves the extraction of complex local spatial features in images. The LFE employs convolutional layers with residual connections and non-linear activations, which facilitate efficient gradient propagation and reliable identification of micro-level defects. Experimental validation on the benchmark SDNET2018 dataset and a custom dataset of damaged bridge images shows that the proposed model, the Vision-Local Feature Detector (ViLFD), outperforms existing approaches, including DETR variants and YOLO-based models (versions 5-9), establishing a new state of the art. ViLFD achieves an accuracy of 95.0%, a precision of 0.94, a recall of 0.93, an F1-score of 0.93, and a mean Average Precision (mAP@0.5) of 0.89, confirming its capability to detect subtle structural defects accurately and reliably. The architecture represents a significant step toward automated, precise, and reliable SHM solutions for complex urban environments.
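The abstract describes the LFE only at a high level: convolutional layers, residual (skip) connections, and non-linear activations. The toy sketch below illustrates that general pattern in dependency-free Python. It is not the published LFE: the single-channel input, the two-convolution layout, the 3x3 kernels, and every function name are assumptions chosen for illustration.

```python
# Toy, dependency-free sketch of a residual convolutional block.
# All names, shapes, and kernel values are illustrative assumptions,
# not the published LFE configuration.

def relu(feature_map):
    """Element-wise ReLU on a 2D list."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def conv2d_same(feature_map, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    height, width = len(feature_map), len(feature_map[0])
    k = len(kernel)  # square kernel with odd side length assumed
    pad = k // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < height and 0 <= x < width:
                        acc += kernel[di][dj] * feature_map[y][x]
            out[i][j] = acc
    return out


def residual_block(feature_map, kernel_a, kernel_b):
    """y = ReLU(x + conv_b(ReLU(conv_a(x)))): two convolutions plus a skip connection."""
    hidden = relu(conv2d_same(feature_map, kernel_a))
    branch = conv2d_same(hidden, kernel_b)
    summed = [[x + b for x, b in zip(row_x, row_b)]
              for row_x, row_b in zip(feature_map, branch)]
    return relu(summed)


# A 5x5 patch with a thin vertical line standing in for a crack.
patch = [[1.0 if col == 2 else 0.0 for col in range(5)] for _ in range(5)]

zero_kernel = [[0.0] * 3 for _ in range(3)]
# Sobel-like horizontal-gradient kernel that responds to vertical edges.
edge_kernel = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
identity_kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

# With an all-zero branch, the skip connection passes the input through unchanged.
passthrough = residual_block(patch, zero_kernel, zero_kernel)

# With an edge kernel, the branch adds a local response next to the line.
enhanced = residual_block(patch, edge_kernel, identity_kernel)
```

With an all-zero convolutional branch the block returns its input unchanged, which is the property usually credited with easing gradient propagation in residual designs. With the edge kernel, the branch adds a local response beside the line while the skip connection preserves the original signal. A practical module would use learned multi-channel kernels in a deep-learning framework rather than hand-set values.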

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12389922
DOI: http://dx.doi.org/10.3390/s25165079
