Citations

20

Article Abstract

Scene text recognition (STR) methods have struggled to attain both high accuracy and fast inference speed. Auto-Regressive (AR)-based models recognize text character by character, showing superior accuracy but slow inference speed. Alternatively, Parallel Decoding (PD)-based models infer all characters in a single decoding pass, offering faster inference speed but generally worse accuracy. To realize the dual goals of "AR-level accuracy and PD-level speed", we propose a Context Perception Parallel Decoder (CPPD) to perceive the related context and predict the character sequence in a PD pass. CPPD devises a character counting module to infer the occurrence count of each character, and a character ordering module to deduce the content-free reading order and positions. Meanwhile, the character prediction task associates the positions with characters. Together they build a comprehensive recognition context, which helps the decoder focus accurately on characters with the attention mechanism, thereby improving the recognition accuracy. We construct a series of CPPD models and also plug the proposed modules into existing STR decoders. Experiments on both English and Chinese benchmarks demonstrate that the CPPD models achieve highly competitive accuracy while running much faster than existing leading models. Moreover, the plugged models achieve significant accuracy improvements.
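The three supervision signals named in the abstract (per-character occurrence counts, content-free positions, and position-to-character association) can be illustrated with a minimal Python sketch that derives toy training targets from a word label. The encodings, the function names (counting_target, ordering_target, prediction_target), the lowercase charset, and the fixed maximum length are all assumptions made for illustration; the abstract does not specify how CPPD actually encodes these targets.

```python
from collections import Counter


def counting_target(word, charset):
    """Occurrence count of each charset character in the label (assumed encoding)."""
    counts = Counter(word)
    return [counts.get(ch, 0) for ch in charset]


def ordering_target(word, max_len):
    """Content-free position mask: 1 where a reading position is occupied, else 0.

    It records only that a position holds some character, not which one.
    """
    n = min(len(word), max_len)
    return [1] * n + [0] * (max_len - n)


def prediction_target(word, charset, max_len):
    """Class index per position, padded with a hypothetical pad id.

    Assumes every character of the label appears in charset.
    """
    index = {ch: i for i, ch in enumerate(charset)}
    pad_id = len(charset)  # hypothetical padding class
    ids = [index[ch] for ch in word[:max_len]]
    return ids + [pad_id] * (max_len - len(ids))


if __name__ == "__main__":
    charset = "abcdefghijklmnopqrstuvwxyz"  # illustrative charset
    label = "hello"
    print(counting_target(label, charset))
    print(ordering_target(label, 8))
    print(prediction_target(label, charset, 8))
```

In this sketch the counting target says which characters occur and how often, the ordering target says which positions are filled, and the prediction target ties each filled position to a character class, mirroring the division of labor the abstract describes.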


Source
http://dx.doi.org/10.1109/TPAMI.2025.3545453

Publication Analysis

Top Keywords

inference speed: 12
context perception: 8
perception parallel: 8
parallel decoder: 8
scene text: 8
text recognition: 8
cppd models: 8
models achieve: 8
accuracy: 7
models: 6

Similar Publications

YOLOv10-kiwi: a YOLOv10-based lightweight kiwifruit detection model in trellised orchards.

Front Plant Sci

August 2025

College of Mathematics and Computer Science, Yan'an University, Yan'an, Shaanxi, China.

To address the challenge of real-time kiwifruit detection in trellised orchards, this paper proposes YOLOv10-Kiwi, a lightweight detection model optimized for resource-constrained devices. First, a more compact network is developed by adjusting the scaling factors of the YOLOv10n architecture. Second, to further reduce model complexity, a novel C2fDualHet module is proposed by integrating two consecutive Heterogeneous Kernel Convolution (HetConv) layers as a replacement for the traditional Bottleneck structure.


A bias correction method for hazard ratio estimation and its inference in a multiple-arm clinical trial.

J Biopharm Stat

September 2025

Biostatistics and Research Decision Sciences, Merck & Co. Inc., North Wales, Pennsylvania, USA.

A randomized clinical trial with multiple experimental groups and one common control group is often used to speed up development to select the best experimental regimen or to increase the chance of success of clinical trials. Most of the time, multiple dose levels of an experimental drug or multiple combinations of one experimental drug with other drugs comprise multiple experimental groups. Because the experimental drug appears in multiple comparisons with a shared control group, multiple testing adjustments to control the family-wise type I error rate are needed.


Introduction: Accurate identification of cherry maturity and precise detection of harvestable cherry contours are essential for the development of cherry-picking robots. However, occlusion, lighting variation, and blurriness in natural orchard environments present significant challenges for real-time semantic segmentation.

Methods: To address these issues, we propose a machine vision approach based on the PIDNet real-time semantic segmentation framework.


Objective: Cataract surgery is among the most frequently performed procedures worldwide. Accurate, real-time segmentation of the cornea and surgical instruments is vital for intraoperative guidance and surgical education. However, most existing deep learning-based segmentation methods depend on pixel-level annotations, which are time-consuming and limit practical deployment.


Introduction: CT-based classification of distal ulnar-radius fractures requires precise detection of subtle features for surgical planning, yet existing methods struggle to balance accuracy with clinical efficiency. This study aims to develop a lightweight architecture that achieves accurate AO (Arbeitsgemeinschaft für Osteosynthesefragen) typing while maintaining real-time performance. AO typing is an internationally recognized fracture classification system based on fracture location, degree of joint surface involvement, and comminution, divided into three major categories: A (extra-articular), B (partially intra-articular), and C (completely intra-articular). In this task, the major challenges are capturing complex fracture morphologies without compromising detection speed and ensuring precise identification of small articular fragments critical for surgical decision-making.
