Citations

20

Article Abstract

We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP) model to enhance scene text detection and spotting tasks, transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes visual prompt learning and cross-attention in CLIP to extract image and text-based prior knowledge. Using predefined and learnable prompts, FastTCM-CR50 introduces an instance-language matching process to enhance the synergy between image and text embeddings, thereby refining text regions. Our Bimodal Similarity Matching (BSM) module facilitates dynamic language prompt generation, enabling offline computations and improving performance. FastTCM-CR50 offers several advantages: 1) It can enhance existing text detectors and spotters, improving performance by an average of 1.6% and 1.5%, respectively. 2) It outperforms the previous TCM-CR50 backbone, yielding an average improvement of 0.2% and 0.55% in text detection and spotting tasks, along with a 47.1% increase in inference speed. 3) It showcases robust few-shot training capabilities. Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 4.7% for text detection and spotting tasks, respectively. 4) It consistently enhances performance on out-of-distribution text detection and spotting datasets, particularly the NightTime-ArT subset from ICDAR2019-ArT and the DOTA dataset for oriented object detection.
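The abstract does not spell out how instance-language matching is computed. As a rough illustration only, the sketch below assumes the matching reduces to cosine similarity between per-location image embeddings and a single text (prompt) embedding, squashed into a score map with a sigmoid. The function names, the toy 2-D embeddings, and the temperature of 0.07 are all hypothetical choices for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_score_map(pixel_embeddings, text_embedding, temperature=0.07):
    """Score every spatial location against one text embedding.

    pixel_embeddings: H x W grid (list of rows) of embedding vectors.
    text_embedding:   one vector of the same dimensionality.
    temperature:      assumed scaling factor (hypothetical default).
    Returns an H x W grid of scores in (0, 1); higher means the location
    is more aligned with the text embedding.
    """
    return [
        [
            1.0 / (1.0 + math.exp(-cosine(pixel, text_embedding) / temperature))
            for pixel in row
        ]
        for row in pixel_embeddings
    ]


# Toy 2 x 2 grid of 2-D embeddings: aligned, orthogonal, opposite, and zero.
grid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, 0.0]],
]
prompt = [2.0, 0.0]  # hypothetical text-prompt embedding
scores = match_score_map(grid, prompt)
```

In this toy setup, the location whose embedding points the same way as the prompt gets a score near 1, the opposite one gets a score near 0, and orthogonal or zero embeddings land at 0.5. A real detector would use such a map to refine candidate text regions, which is beyond this sketch.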

Source
http://dx.doi.org/10.1109/TPAMI.2024.3379828

Publication Analysis

Top Keywords

text detection: 16
detection spotting: 16
spotting tasks: 12
clip model: 8
text: 8
scene text: 8
improving performance: 8
performance average: 8
detection: 5
turning clip: 4

Similar Publications

Microbial quality assessment of Niger seed (Guizotia abyssinica (Linnaeus f.) Cassini) oil in Gondar city: A laboratory-based cross-sectional study.

PLoS One

September 2025

Department of Environmental and Occupational Health and Safety, Institute of Public Health, College of Medicine and Other Health Sciences, University of Gondar, Gondar, Ethiopia.

Foodborne diseases pose a significant public health challenge worldwide. The increasing availability of edible oils in the market, combined with Ethiopia's lack of stringent quality control and regulatory oversight, raises concerns about their safety. This inadequacy in regulation may contribute to microbial contamination, leading to potential public health risks.

Sentence-level semantics plays a key role in language understanding. Subtle relations and dependencies exist among sentence-level samples, and these remain to be exploited. For example, in relational triple extraction, existing models overemphasize extraction modules and ignore sentence-level semantics and relation information. As a result, (1) the semantics fed to extraction modules are relation-unaware, and (2) each sample is trained individually without considering inter-sample dependency.

Deepfakes pose critical threats to digital media integrity and societal trust. This paper presents a hybrid deepfake detection framework combining Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) to address challenges in scalability, generalizability, and adversarial robustness. The framework integrates adversarial training, a temporal decay analysis model, and multimodal detection across audio, video, and text domains.

Otitis media is a major health issue that usually results from adenoid hypertrophy. Diagnosis is based on symptoms, such as mouth breathing, and imaging studies, including lateral neck radiography (LNR). The adenoid-nasopharyngeal ratio (ANR) is one of the most important and widely used criteria in LNR studies.

Health Implications of Microplastic Exposure in Pregnancy and Early Childhood: A Systematic Review.

Int J Womens Health

September 2025

Department of Medical Biochemistry, Faculty of Allied Health Sciences, Mahayogi Gorakhnath University, Gorakhpur, UP, India.

Microplastics (MPs), defined as plastic particles smaller than 5 mm, have emerged as a significant environmental pollutant, raising concerns about their potential health risks. Emerging evidence shows that MPs can reach human tissues, including the placenta, causing oxidative stress, inflammation, and endocrine disruption. These issues are particularly concerning for vulnerable populations like pregnant women and infants, where exposure could negatively impact fetal development and health outcomes. This systematic review, adhering to PRISMA guidelines, aimed to identify and evaluate studies on the impact of microplastic exposure on pregnancy outcomes and early childhood development.