HCG: Streaming DCNN Accelerator With a Hybrid Computational Granularity Scheme on FPGA.

Wenjin Huang, Conghui Luo, Baoze Zhao, Han Jiao, Yihua Huang

IEEE Trans Neural Netw Learn Syst

Published: July 2025

With the growth of field-programmable gate array (FPGA) hardware resources, streaming deep convolutional neural network (DCNN) accelerators leverage interconvolutional-layer parallelism to enhance throughput. In existing streaming accelerators, convolution nodes typically adopt layer- or column-based tiling methods, where the tiled input feature map (Ifmap) encompasses all input channels. This approach facilitates the comprehensive calculation of the output feature map (Ofmap) and maximizes interlayer parallelism. The computational granularity, defined in this study as the number of Ofmap rows or columns calculated from each tile of Ifmap data, significantly influences on-chip Ifmap storage and off-chip weight bandwidth (BW). Applying one uniform computational granularity to all nodes inevitably constrains the memory-BW tradeoff. This article introduces a novel streaming accelerator with a hybrid computational granularity (HCG) scheme. Each node employs an independently optimized computational granularity, enabling a more flexible memory-BW tradeoff and more effective utilization of FPGA resources. However, this hybrid scheme can introduce pipeline bubbles and increase system pipeline complexity and control logic. To address these challenges, this article theoretically analyzes the impact of computational granularity on individual computing nodes and the overall system, aiming to establish a seamless system pipeline without pipeline bubbles and to simplify system design. Furthermore, the article develops a hardware overhead model and employs a heuristic algorithm to optimize the computational granularity of each computing node, achieving an optimal memory-BW tradeoff and higher throughput. Finally, the effectiveness of the proposed design and optimization methodology is validated through the implementation of a 3-TOPS ResNet-18 accelerator on the Alveo U250 development board under BW constraints of 25, 20, and 15 GB/s. Additionally, accelerators for 4-TOPS VGG-16, 4-TOPS ResNet-34, 5-TOPS ResNet-50, 3-TOPS MobileNetV1, 4-TOPS ConvNeXt-T, and 4-TOPS ResNeXt-50 are implemented, surpassing the performance of most existing works.
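The memory-BW tradeoff described in the abstract can be illustrated with a first-order sketch. Everything below is an assumption made for illustration and is not the paper's hardware overhead model or its heuristic: the `ConvNode` fields, the row-based reading of granularity `g`, the rule that a node's weights are streamed from off-chip once per tile of `g` Ofmap rows, and the greedy search are all hypothetical.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class ConvNode:
    """Hypothetical description of one convolution node (not from the paper)."""
    name: str
    h_out: int       # number of Ofmap rows
    w_in: int        # Ifmap row width, including padding
    c_in: int        # input channels
    c_out: int       # output channels
    k: int           # square kernel size
    stride: int = 1


def ifmap_buffer_bytes(node: ConvNode, g: int, elem_bytes: int = 1) -> int:
    """On-chip Ifmap bytes needed to compute g Ofmap rows over all input channels."""
    rows = (g - 1) * node.stride + node.k
    return rows * node.w_in * node.c_in * elem_bytes


def weight_traffic_bytes(node: ConvNode, g: int, elem_bytes: int = 1) -> int:
    """Off-chip weight bytes per frame, assuming one full weight load per tile."""
    tiles = ceil(node.h_out / g)
    return tiles * node.k * node.k * node.c_in * node.c_out * elem_bytes


def choose_granularities(nodes, traffic_budget_bytes):
    """Greedy sketch: grow g where it saves the most traffic per extra buffer byte."""
    g = {n.name: 1 for n in nodes}

    def total_traffic():
        return sum(weight_traffic_bytes(n, g[n.name]) for n in nodes)

    while total_traffic() > traffic_budget_bytes:
        best_score, best_node = -1.0, None
        for n in nodes:
            cur = g[n.name]
            if cur >= n.h_out:
                continue  # already at whole-layer granularity
            saved = weight_traffic_bytes(n, cur) - weight_traffic_bytes(n, cur + 1)
            extra = ifmap_buffer_bytes(n, cur + 1) - ifmap_buffer_bytes(n, cur)
            score = saved / extra
            if score > best_score:
                best_score, best_node = score, n
        if best_node is None:
            raise ValueError("budget unreachable even with whole-layer granularity")
        g[best_node.name] += 1
    return g


if __name__ == "__main__":
    # Illustrative layer shapes only; they are not taken from the paper.
    nodes = [ConvNode("early", 56, 58, 64, 64, 3), ConvNode("late", 7, 9, 512, 512, 3)]
    for name, gran in choose_granularities(nodes, 6_000_000).items():
        print(name, gran)
```

Under these assumptions a larger `g` raises the Ifmap buffer requirement and lowers weight traffic, so nodes with different shapes can end up with different granularities, which is the intuition behind a per-node choice. A real design would derive the traffic budget from the BW constraint and the target frame rate, and would also account for the pipeline-bubble analysis that this sketch ignores.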

DOI: http://dx.doi.org/10.1109/TNNLS.2025.3587694

Publication Analysis

Top Keywords

computational granularity

memory-bw tradeoff

streaming dcnn

accelerator hybrid

hybrid computational

feature map

pipeline bubbles

system pipeline

computational

granularity

Similar Publications

Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations.

IEEE Trans Pattern Anal Mach Intell

September 2025

Cheng Lei, Jie Fan, Xinran Li, Tian-Zhu Xiang, Ao Li

Camouflaged Object Segmentation (COS) faces significant challenges due to the scarcity of annotated data, where meticulous pixel-level annotation is both labor-intensive and costly, primarily due to the intricate object-background boundaries. Addressing the core question, "Can COS be effectively achieved in a zero-shot manner without manual annotations for any camouflaged object?", we propose an affirmative solution. We analyze the learned attention patterns for camouflaged objects and introduce a robust zero-shot COS framework.

CF-DTI: coarse-to-fine feature extraction for enhanced drug-target interaction prediction.

Health Inf Sci Syst

December 2025

School of Information Science and Automation, Northeastern University, Shenyang, 110819 China.

Yining Qian, Qingjie Wang, Libang Yin, An-Yang Lu

Accurate prediction of drug-target interactions (DTIs) is crucial for improving the efficiency and success rate of drug development. Despite recent advancements, existing methods often fail to leverage interaction features at multiple granular levels, resulting in suboptimal data utilization and limited predictive performance. To address these challenges, we propose CF-DTI, a coarse-to-fine drug-target interaction model that integrates both coarse-grained and fine-grained features to enhance predictive accuracy.

Integrating large language models and active inference to understand eye movements in reading and dyslexia.

Phys Life Rev

September 2025

Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy.

Francesco Donnarumma, Mirco Frosolone, Giovanni Pezzulo

We present a novel computational model employing hierarchical active inference to simulate reading and eye movements. The model characterizes linguistic processing as inference over a hierarchical generative model, facilitating predictions and inferences at various levels of granularity, from syllables to sentences. Our approach combines the strengths of large language models for realistic textual predictions and active inference for guiding eye movements to informative textual information, enabling the testing of predictions.

MOSAIC: A Multi-Granularity Cross-Modal Framework for Predicting Synergistic Drug Combinations in Personalized Healthcare.

IEEE J Biomed Health Inform

September 2025

Licai Zhang, Xiao Kang, Xinxing Yang, Lin Wang, Genke Yang

The personalization of cancer treatment through drug combinations is critical for improving healthcare outcomes, increasing effectiveness, and reducing side effects. Computational methods have become increasingly important to prioritize synergistic drug pairs because of the vast search space of possible chemicals. However, existing approaches typically rely solely on global molecular structures, neglecting information exchange between different modality representations and interactions between molecular and fine-grained fragments, leading to limited understanding of drug synergy mechanisms for personalized treatment.

CAGCNet: generalized contrastive learning for person identification based on channel aggregated EEG features.

Cogn Neurodyn

December 2025

Department of Molecular Medicine, University of Rome Sapienza, Piazzale Aldo Moro 5, Rome, 00185 Lazio region Italy.

Xinran Wang, Xuanyu Jin, Wanzeng Kong, Fabio Babiloni

Person identification based on electroencephalogram (EEG) signals, also called brainprint recognition, is a novel way to distinguish identities with the advantage of high security. However, existing methods neglect the distribution difference between training and test data, and the large distance between projected features in the latent space degrades model performance on unseen-domain data. In this paper, we propose a channel-aggregation-based generalized contrastive learning framework, which combines multiple modules to overcome this challenge.
