The detection of small voids or defects in X-ray images of tooth root canals still faces challenges. To address this, this paper proposes an improved YOLOv10 that combines Token Attention with Residual Convolution (ResConv), termed YOLO-TARC. To overcome the limitations of existing deep learning models, namely their weak retention of key small-object features and their insufficient focus on small targets, we introduce three improvements. First, ResConv is designed to ensure that discriminative features of small objects are transmitted during feature propagation, leveraging the ability of residual connections to pass information from one layer to the next. Second, to tackle weak focusing on small targets, a Token Attention module is introduced before the third small-object detection head; by tokenizing feature maps and enhancing local focus, it enables the model to attend more closely to small targets. Third, to optimize the training process, a bounding box loss function is adopted to achieve faster and more accurate bounding box predictions. YOLO-TARC simultaneously enhances the retention of detailed small-target information and improves focusing capability, thereby increasing detection accuracy. Experimental results on a private root canal X-ray image dataset demonstrate that YOLO-TARC outperforms other state-of-the-art object detection models, achieving a 7.5% improvement to 80.8% in mAP50 and a 6.2% increase to 80.0% in Recall. YOLO-TARC can contribute to more accurate and efficient objective postoperative evaluation of root canal treatments.
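The abstract does not give the exact ResConv formulation, but the residual-shortcut idea it relies on can be sketched minimally in 1D. All names, shapes, and weights below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def conv_same(x, w):
    """Naive 'same'-padded 1D convolution with a kernel of size 3."""
    padded = np.pad(x, 1)
    return np.array([np.dot(padded[i:i + 3], w) for i in range(len(x))])

def res_conv(x, w):
    """Residual convolution sketch: the identity shortcut carries the
    input forward unchanged, so a faint small-object activation survives
    the layer even if the convolution attenuates it."""
    return conv_same(x, w) + x

x = np.array([0.0, 1.0, 0.0, 0.0])  # a tiny 'small object' activation
zero_w = np.zeros(3)
# With zero convolution weights the block reduces to the identity,
# so the small-object feature passes through intact.
print(res_conv(x, zero_w))  # [0. 1. 0. 0.]
```

The usage above illustrates why residual connections help retain small-object detail: the shortcut guarantees a lower bound on how much of the input reaches the next layer.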
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12115233
DOI: http://dx.doi.org/10.3390/s25103036
IEEE Trans Pattern Anal Mach Intell
September 2025
The large model size and quadratic computational complexity in token count pose significant deployment challenges for Vision Transformers (ViTs) in practical applications. Although recent advances in model pruning and token reduction speed up ViT inference, these approaches either adopt a fixed sparsity ratio or overlook the meaningful interplay between architectural optimization and token selection. Consequently, such static, single-dimension compression often leads to pronounced accuracy degradation under aggressive compression rates, because it fails to fully exploit redundancies across these two orthogonal dimensions.
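The fixed-sparsity token reduction this abstract criticizes can be sketched in a few lines. The scoring function and `keep_ratio` are illustrative assumptions; a dynamic scheme of the kind the paper advocates would choose the ratio per input rather than hard-coding it:

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio):
    """Keep only the highest-scoring fraction of tokens.
    A fixed keep_ratio is the 'static sparsity ratio' criticized above;
    order of surviving tokens is preserved."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]          # indices of the top-k scores
    return tokens[np.sort(keep)]            # restore original order

tokens = np.arange(8).reshape(8, 1)         # 8 dummy token embeddings
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6])
print(prune_tokens(tokens, scores, 0.5).ravel())  # [1 3 5 7]
```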
Gigascience
January 2025
National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba R3E 3M4, Canada.
Background: Influenza A virus (IAV) poses a significant threat to animal health globally, with its ability to overcome species barriers and cause pandemics. Rapid and accurate prediction of IAV subtype and host source is crucial for effective surveillance and pandemic preparedness. Deep learning has emerged as a powerful tool for analyzing viral genomic sequences, offering new ways to uncover hidden patterns associated with viral characteristics and host adaptation.
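The abstract does not describe the model's input representation; a common first step for deep learning on genomic sequences is one-hot encoding of nucleotides. The scheme below is a generic assumption for illustration, not this paper's exact pipeline:

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a nucleotide sequence into a (length, 4) matrix,
    a typical input format for sequence-based deep models."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        out[i, idx[base]] = 1.0
    return out

print(one_hot("ACGT").argmax(axis=1))  # [0 1 2 3]
```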
Brief Bioinform
July 2025
College of Life Science, Chongqing Normal University, No. 37 University Town Road, high-tech District, Chongqing 401331, P.R. China.
The prediction of binary protein-protein interactions (PPIs) is essential for protein engineering, but a major challenge for deep learning-based methods is the opaque decision-making process of the model. To address this challenge, we propose the ESM2_AMP framework, which uses the ESM2 protein language model to extract segment features from amino acid sequences and integrates a Transformer model for feature fusion in binary PPI prediction. Further, two distinct models, ESM2_AMPS and ESM2_AMP_CSE, are developed to systematically explore the contributions of segment features, alone and combined with special-token features, to the decision-making process.
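The abstract does not specify how segment features are summarized; a common choice is mean-pooling per-residue embeddings over fixed-length segments. The segment length, embedding dimension, and pooling choice below are illustrative assumptions, loosely mimicking the segment-feature idea:

```python
import numpy as np

def segment_features(embeddings, seg_len):
    """Mean-pool per-residue embeddings (n_residues, dim) into
    fixed segment features (n_segments, dim). Residues beyond the
    last full segment are dropped for simplicity."""
    n = len(embeddings) // seg_len * seg_len
    segs = embeddings[:n].reshape(-1, seg_len, embeddings.shape[1])
    return segs.mean(axis=1)

emb = np.arange(12, dtype=float).reshape(6, 2)  # 6 residues, dim 2
print(segment_features(emb, 3).shape)  # (2, 2)
```

In a framework like the one described, such segment features would then be fed to a Transformer for fusion, optionally alongside special-token features.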
Neural Netw
August 2025
College of Computer Science and Technology, National University of Defense Technology, Changsha, 410073, China. Electronic address:
The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision. This paper presents the RefSAM model, which explores the potential of SAM for RVOS by incorporating multi-view information from diverse modalities and successive frames at different timestamps in an online manner.
IEEE Trans Pattern Anal Mach Intell
August 2025
Face Anti-Spoofing (FAS) is constantly challenged by new attack types and mediums, so a FAS model must not only mitigate Catastrophic Forgetting (CF) of spoofing knowledge previously learned on the training data during continual learning, but also enhance its generalization to potential spoofing attacks. In this paper, we first highlight that current strategies against catastrophic forgetting are not well suited to the imperceptible nature of spoofing information in FAS and do not focus on improving generalization capability. We then propose an instance-wise dynamic central difference convolutional adapter module with a weighted ensemble strategy for Vision Transformers (ViT), enabling efficient fine-tuning on low-shot data by extracting generalized spoofing texture information.
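The central difference convolution (CDC) the adapter builds on blends a vanilla convolution with a gradient-like term that subtracts the centre value, emphasizing fine texture. A minimal 1D sketch, with the standard CDC blending parameter theta (the 1D setting and weights are illustrative assumptions):

```python
import numpy as np

def cdc_1d(x, w, theta):
    """1D central difference convolution sketch:
    out = theta * sum(w * (window - centre)) + (1 - theta) * sum(w * window).
    theta = 0 recovers the vanilla convolution; theta = 1 responds only
    to local contrast, i.e. texture-like variation around the centre."""
    padded = np.pad(x, 1)
    out = np.empty_like(x)
    for i in range(len(x)):
        window = padded[i:i + 3]
        vanilla = np.dot(window, w)
        diff = np.dot(window - x[i], w)
        out[i] = theta * diff + (1 - theta) * vanilla
    return out

x = np.array([1.0, 2.0, 3.0])
w = np.ones(3)
print(cdc_1d(x, w, 0.0))  # vanilla convolution: [3. 6. 5.]
```

With theta = 1, a locally flat region (zero contrast around the centre) produces a zero response, which is why the central-difference term is sensitive to spoofing texture rather than absolute intensity.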