The detection of small voids or defects in X-ray images of tooth root canals still faces challenges. To address this, this paper proposes an improved YOLOv10 that combines Token Attention with Residual Convolution (ResConv), termed YOLO-TARC. To overcome the limitations of existing deep learning models, namely their weak retention of key small-object features and their insufficient focus on small targets, we introduce three improvements. First, ResConv is designed to ensure that discriminative features of small objects are transmitted during feature propagation, leveraging the ability of residual connections to pass information from one layer to the next. Second, to tackle weak focusing on small targets, a Token Attention module is introduced before the third small-object detection head; by tokenizing feature maps and enhancing local focus, it enables the model to attend more closely to small targets. Third, to optimize the training process, a bounding box loss function is adopted to achieve faster and more accurate bounding box predictions. YOLO-TARC simultaneously enhances the retention of detailed small-target information and improves focusing capability, thereby increasing detection accuracy. Experimental results on a private root canal X-ray image dataset demonstrate that YOLO-TARC outperforms other state-of-the-art object detection models, achieving a 7.5% improvement to 80.8% in mAP50 and a 6.2% increase to 80.0% in Recall. YOLO-TARC can contribute to more accurate and efficient objective postoperative evaluation of root canal treatments.
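The abstract does not give the exact ResConv formulation, but the residual-shortcut idea it relies on can be sketched minimally in 1D. All names, shapes, and weights below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def conv_same(x, w):
    """Naive 'same'-padded 1D convolution with a kernel of size 3."""
    padded = np.pad(x, 1)
    return np.array([np.dot(padded[i:i + 3], w) for i in range(len(x))])

def res_conv(x, w):
    """Residual convolution sketch: the identity shortcut carries the
    input forward unchanged, so a faint small-object activation survives
    the layer even if the convolution attenuates it."""
    return conv_same(x, w) + x

x = np.array([0.0, 1.0, 0.0, 0.0])  # a tiny 'small object' activation
zero_w = np.zeros(3)
# With zero convolution weights the block reduces to the identity,
# so the small-object feature passes through intact.
print(res_conv(x, zero_w))  # [0. 1. 0. 0.]
```

The usage above illustrates why residual connections help retain small-object detail: the shortcut guarantees a lower bound on how much of the input reaches the next layer.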
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12115233
DOI: http://dx.doi.org/10.3390/s25103036
IEEE Trans Pattern Anal Mach Intell
September 2025
The large model size and quadratic computational complexity in token count pose significant deployment challenges for Vision Transformers (ViTs) in practical applications. Although recent advances in model pruning and token reduction speed up ViT inference, these approaches either adopt a fixed sparsity ratio or overlook the meaningful interplay between architectural optimization and token selection. Consequently, such static, single-dimension compression often leads to pronounced accuracy degradation under aggressive compression rates, because it fails to fully exploit redundancies across these two orthogonal dimensions.
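The fixed-sparsity token reduction this abstract criticizes can be sketched in a few lines. The scoring function and `keep_ratio` are illustrative assumptions; a dynamic scheme of the kind the paper advocates would choose the ratio per input rather than hard-coding it:

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio):
    """Keep only the highest-scoring fraction of tokens.
    A fixed keep_ratio is the 'static sparsity ratio' criticized above;
    order of surviving tokens is preserved."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]          # indices of the top-k scores
    return tokens[np.sort(keep)]            # restore original order

tokens = np.arange(8).reshape(8, 1)         # 8 dummy token embeddings
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6])
print(prune_tokens(tokens, scores, 0.5).ravel())  # [1 3 5 7]
```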
Gigascience
January 2025
National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba R3E 3M4, Canada.
Background: Influenza A virus (IAV) poses a significant threat to animal health globally, with its ability to overcome species barriers and cause pandemics. Rapid and accurate prediction of IAV subtype and host source is crucial for effective surveillance and pandemic preparedness. Deep learning has emerged as a powerful tool for analyzing viral genomic sequences, offering new ways to uncover hidden patterns associated with viral characteristics and host adaptation.
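The abstract does not describe the model's input representation; a common first step for deep learning on genomic sequences is one-hot encoding of nucleotides. The scheme below is a generic assumption for illustration, not this paper's exact pipeline:

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a nucleotide sequence into a (length, 4) matrix,
    a typical input format for sequence-based deep models."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        out[i, idx[base]] = 1.0
    return out

print(one_hot("ACGT").argmax(axis=1))  # [0 1 2 3]
```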
Brief Bioinform
July 2025
College of Life Science, Chongqing Normal University, No. 37 University Town Road, high-tech District, Chongqing 401331, P.R. China.
The prediction of binary protein-protein interactions (PPIs) is essential for protein engineering, but a major challenge for deep learning-based methods is the opaque decision-making process of the model. To address this challenge, we propose the ESM2_AMP framework, which uses the ESM2 protein language model to extract segment features from amino acid sequences and integrates a Transformer model for feature fusion in binary PPI prediction. Further, two distinct models, ESM2_AMPS and ESM2_AMP_CSE, are developed to systematically explore the contributions of segment features, alone and combined with special-token features, to the decision-making process.
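The abstract does not specify how segment features are summarized; a common choice is mean-pooling per-residue embeddings over fixed-length segments. The segment length, embedding dimension, and pooling choice below are illustrative assumptions, loosely mimicking the segment-feature idea:

```python
import numpy as np

def segment_features(embeddings, seg_len):
    """Mean-pool per-residue embeddings (n_residues, dim) into
    fixed segment features (n_segments, dim). Residues beyond the
    last full segment are dropped for simplicity."""
    n = len(embeddings) // seg_len * seg_len
    segs = embeddings[:n].reshape(-1, seg_len, embeddings.shape[1])
    return segs.mean(axis=1)

emb = np.arange(12, dtype=float).reshape(6, 2)  # 6 residues, dim 2
print(segment_features(emb, 3).shape)  # (2, 2)
```

In a framework like the one described, such segment features would then be fed to a Transformer for fusion, optionally alongside special-token features.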
Neural Netw
August 2025
College of Computer Science and Technology, National University of Defense Technology, Changsha, 410073, China. Electronic address:
The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision. This paper presents the RefSAM model, which explores the potential of SAM for RVOS by incorporating multi-view information from diverse modalities and successive frames at different timestamps in an online manner.
IEEE Trans Pattern Anal Mach Intell
August 2025
Face Anti-Spoofing (FAS) is constantly challenged by new attack types and mediums, so a FAS model must not only mitigate Catastrophic Forgetting (CF) of spoofing knowledge previously learned on the training data during continual learning, but also enhance its generalization to potential spoofing attacks. In this paper, we first highlight that current strategies against catastrophic forgetting are not well suited to the imperceptible nature of spoofing information in FAS and do not focus on improving generalization capability. We then propose an instance-wise dynamic central difference convolutional adapter module with a weighted ensemble strategy for Vision Transformers (ViT), enabling efficient fine-tuning on low-shot data by extracting generalized spoofing texture information.
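The central difference convolution (CDC) the adapter builds on blends a vanilla convolution with a gradient-like term that subtracts the centre value, emphasizing fine texture. A minimal 1D sketch, with the standard CDC blending parameter theta (the 1D setting and weights are illustrative assumptions):

```python
import numpy as np

def cdc_1d(x, w, theta):
    """1D central difference convolution sketch:
    out = theta * sum(w * (window - centre)) + (1 - theta) * sum(w * window).
    theta = 0 recovers the vanilla convolution; theta = 1 responds only
    to local contrast, i.e. texture-like variation around the centre."""
    padded = np.pad(x, 1)
    out = np.empty_like(x)
    for i in range(len(x)):
        window = padded[i:i + 3]
        vanilla = np.dot(window, w)
        diff = np.dot(window - x[i], w)
        out[i] = theta * diff + (1 - theta) * vanilla
    return out

x = np.array([1.0, 2.0, 3.0])
w = np.ones(3)
print(cdc_1d(x, w, 0.0))  # vanilla convolution: [3. 6. 5.]
```

With theta = 1, a locally flat region (zero contrast around the centre) produces a zero response, which is why the central-difference term is sensitive to spoofing texture rather than absolute intensity.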