Context Perception Parallel Decoder for Scene Text Recognition.

Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang

IEEE Trans Pattern Anal Mach Intell

Published: June 2025

Scene text recognition (STR) methods have struggled to attain both high accuracy and fast inference speed. Auto-Regressive (AR)-based models perform recognition character by character, showing superior accuracy but slow inference. Alternatively, Parallel Decoding (PD)-based models infer all characters in a single decoding pass, offering faster inference but generally lower accuracy. To realize the dual goals of "AR-level accuracy and PD-level speed", we propose a Context Perception Parallel Decoder (CPPD) that perceives the related context and predicts the character sequence in a PD pass. CPPD devises a character counting module to infer the occurrence count of each character, and a character ordering module to deduce the content-free reading order and positions. Meanwhile, the character prediction task associates the positions with characters. Together, they build a comprehensive recognition context, which helps the decoder focus accurately on characters through the attention mechanism, thereby improving recognition accuracy. We construct a series of CPPD models and also plug the proposed modules into existing STR decoders. Experiments on both English and Chinese benchmarks demonstrate that the CPPD models achieve highly competitive accuracy while running much faster than existing leading models. Moreover, the plugged models achieve significant accuracy improvements.
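As a rough illustration of the three signals the abstract names (occurrence counts, content-free positions, and the position-to-character association), the sketch below builds toy targets for a single word label. It is not the authors' implementation: the function names, the lowercase `charset`, the `max_len` value, and the `-1` padding index are all assumptions made for this example, and the paper's actual formulation may differ.

```python
from collections import Counter

# Hypothetical helpers for illustration only; names and encodings are
# assumptions, not taken from the CPPD paper or its code.

def counting_target(label, charset):
    """Occurrence count of each charset character in the label."""
    counts = Counter(label)
    return [counts.get(ch, 0) for ch in charset]

def ordering_target(label, max_len):
    """Content-free position mask: 1 where a character exists, else 0."""
    n = min(len(label), max_len)
    return [1] * n + [0] * (max_len - n)

def prediction_target(label, charset, max_len, pad=-1):
    """Charset index at each position, padded to max_len.

    Assumes every character of the label is in the charset.
    """
    idx = [charset.index(ch) for ch in label[:max_len]]
    return idx + [pad] * (max_len - len(idx))

charset = "abcdefghijklmnopqrstuvwxyz"  # assumed toy charset
label = "hello"

count_t = counting_target(label, charset)
order_t = ordering_target(label, max_len=8)
pred_t = prediction_target(label, charset, max_len=8)

print(count_t[charset.index("l")])  # how many times "l" occurs
print(order_t)                      # which positions hold a character
print(pred_t)                       # which character sits at each position
```

In this toy setup the counting target says what occurs and how often, the ordering target says where characters exist without saying which ones, and the prediction target ties the two together. The sum of the counts equals the number of occupied positions, which is one simple consistency the three signals share.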

DOI: http://dx.doi.org/10.1109/TPAMI.2025.3545453

Similar Publications

YOLOv10-kiwi: a YOLOv10-based lightweight kiwifruit detection model in trellised orchards.

Front Plant Sci

August 2025

College of Mathematics and Computer Science, Yan'an University, Yan'an, Shaanxi, China.

Jie Ren, Wendong Wang, Yuan Tian, Jinrong He

To address the challenge of real-time kiwifruit detection in trellised orchards, this paper proposes YOLOv10-Kiwi, a lightweight detection model optimized for resource-constrained devices. First, a more compact network is developed by adjusting the scaling factors of the YOLOv10n architecture. Second, to further reduce model complexity, a novel C2fDualHet module is proposed by integrating two consecutive Heterogeneous Kernel Convolution (HetConv) layers as a replacement for the traditional Bottleneck structure.

A bias correction method for hazard ratio estimation and its inference in a multiple-arm clinical trial.

J Biopharm Stat

September 2025

Biostatistics and Research Decision Sciences, Merck & Co. Inc., North Wales, Pennsylvania, USA.

Liji Shen, Ziwen Wei, Xuan Deng

A randomized clinical trial with multiple experimental groups and one common control group is often used to speed up development to select the best experimental regimen or to increase the chance of success of clinical trials. Most of the time, multiple dose levels of an experimental drug or multiple combinations of one experimental drug with other drugs comprise multiple experimental groups. Because the experimental drug appears in multiple comparisons with a shared control group, multiple testing adjustments to control the family-wise type I error rate are needed.

Cherry-Net: real-time segmentation algorithm of cherry maturity based on improved PIDNet.

Front Plant Sci

September 2025

College of Big Data, Yunnan Agricultural University, Kunming, China.

Jie Cui, Lilian Zhang, Lutao Gao, Chunhui Bai, Linnan Yang

Introduction: Accurate identification of cherry maturity and precise detection of harvestable cherry contours are essential for the development of cherry-picking robots. However, occlusion, lighting variation, and blurriness in natural orchard environments present significant challenges for real-time semantic segmentation.

Methods: To address these issues, we propose a machine vision approach based on the PIDNet real-time semantic segmentation framework.

Real-time corneal image segmentation for cataract surgery based on detection framework.

Int J Comput Assist Radiol Surg

September 2025

School of Life and Environmental Sciences, Guilin University of Electronic Technology, Guilin, China.

Xueyi Shi, Dexun Zhang, Shenwen Liang, Wenjing Meng, Huoling Luo

Objective: Cataract surgery is among the most frequently performed procedures worldwide. Accurate, real-time segmentation of the cornea and surgical instruments is vital for intraoperative guidance and surgical education. However, most existing deep learning-based segmentation methods depend on pixel-level annotations, which are time-consuming and limit practical deployment.

A multi-module enhanced YOLOv8 framework for accurate AO classification of distal radius fractures: SCFAST-YOLO.

Front Med (Lausanne)

August 2025

Department of Orthopaedics, The First Affiliated Hospital of Soochow University, Suzhou, China.

Yu Wang, Haifu Sun, Tiankai Jiang, JunFeng Shi, Qin Wang

Introduction: CT-based classification of distal ulnar-radius fractures requires precise detection of subtle features for surgical planning, yet existing methods struggle to balance accuracy with clinical efficiency. This study aims to develop a lightweight architecture that achieves accurate AO (Arbeitsgemeinschaft für Osteosynthesefragen) typing [an internationally recognized fracture classification system based on fracture location, degree of joint surface involvement, and comminution, divided into three major categories: A (extra-articular), B (partially intra-articular), and C (completely intra-articular)] while maintaining real-time performance. In this task, the major challenges are capturing complex fracture morphologies without compromising detection speed and ensuring precise identification of small articular fragments critical for surgical decision-making.
