Deep learning-based super-resolution method for projection image compression in radiotherapy.

Quant Imaging Med Surg

Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Published: September 2025


Category Ranking: 98%
Total Visits: 921
Avg Visit Duration: 2 minutes
Citations: 20

Article Abstract

Background: Cone-beam computed tomography (CBCT) is a three-dimensional (3D) imaging method used for routine target verification of cancer patients during radiotherapy. The images are reconstructed from a sequence of projection images acquired by the on-board imager attached to the radiotherapy machine. CBCT images are usually stored in a health information system, but the projection images are mostly discarded due to their massive volume. To store them economically, this study investigated a deep learning (DL)-based super-resolution (SR) method for compressing the projection images.

Methods: For image compression, low-resolution (LR) images were generated by down-sampling the high-resolution (HR) projection images by a given factor and then encoded into a video file. For image restoration, the LR images were decoded from the video file and up-sampled back to HR projection images via a DL network. Three SR DL networks, a convolutional neural network (CNN), a residual network (ResNet), and a generative adversarial network (GAN), were tested along with three video coding-decoding (codec) algorithms: Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), and AOMedia Video 1 (AV1). On two databases of natural and projection images, the performance of the SR networks and video codecs was evaluated with the compression ratio (CR), peak signal-to-noise ratio (PSNR), video quality metric (VQM), and structural similarity index measure (SSIM).
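As a rough illustration of this two-stage pipeline (down-sample and video-encode for storage, then decode and super-resolve for restoration), the Python sketch below uses OpenCV for resampling and an ffmpeg subprocess for AV1 coding. The sr_model call is a placeholder for a trained SR network such as the ResNet; the file names, frame rate, and codec settings are illustrative assumptions, not the authors' configuration.

```python
import subprocess
import cv2  # OpenCV, used here for down-sampling and frame I/O

def compress(hr_frames, dsf, out_path="proj_lr.mkv", tmp="lr_%04d.png"):
    """Down-sample HR projection frames by `dsf` (>= 2) and encode with AV1."""
    for i, hr in enumerate(hr_frames):
        h, w = hr.shape[:2]
        lr = cv2.resize(hr, (w // dsf, h // dsf), interpolation=cv2.INTER_AREA)
        cv2.imwrite(tmp % i, lr)
    # Encode the LR frame sequence into a video file (libaom-av1 = AV1).
    subprocess.run(["ffmpeg", "-y", "-framerate", "10", "-i", tmp,
                    "-c:v", "libaom-av1", "-crf", "30", out_path], check=True)

def restore(video_path, sr_model, tmp="dec_%04d.png"):
    """Decode LR frames from the video and up-sample them with an SR network."""
    subprocess.run(["ffmpeg", "-y", "-i", video_path, tmp], check=True)
    hr_frames = []
    i = 1  # ffmpeg numbers decoded frames from 1 by default
    while True:
        lr = cv2.imread(tmp % i, cv2.IMREAD_UNCHANGED)
        if lr is None:
            break
        hr_frames.append(sr_model(lr))  # placeholder: trained CNN/ResNet/GAN
        i += 1
    return hr_frames
```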

Results: AV1 achieved the highest CR among the three codecs: 13.91, 42.08, 144.32, and 289.80 for down-sampling factors (DSFs) of 0 (non-SR), 2, 4, and 6, respectively. ResNet achieved the best restoration accuracy among the three SR networks. Its PSNRs were 69.08, 41.60, 37.08, and 32.44 dB for the four DSFs, respectively; its VQMs were 0.06%, 3.65%, 6.95%, and 13.03%; and its SSIMs were 0.9984, 0.9878, 0.9798, and 0.9518. As the DSF increased, the CR increased proportionally, with only modest degradation of the restored images.
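For reference, the CR, PSNR, and SSIM figures quoted above can be computed as in the generic sketch below using scikit-image. The 16-bit dynamic range (data_range=65535) is an assumption about the projection data, not a detail stated in the abstract, and VQM is omitted because it requires a dedicated video-quality tool.

```python
import os
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compression_ratio(raw_paths, video_path):
    """CR = total size of the raw HR projections / size of the encoded video."""
    raw_bytes = sum(os.path.getsize(p) for p in raw_paths)
    return raw_bytes / os.path.getsize(video_path)

def fidelity(hr, restored, data_range=65535):
    """PSNR (dB) and SSIM between an original and a restored projection."""
    psnr = peak_signal_noise_ratio(hr, restored, data_range=data_range)
    ssim = structural_similarity(hr, restored, data_range=data_range)
    return psnr, ssim
```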

Conclusions: Applying the SR model further improves the CR beyond what the video encoders alone achieve. This compression method is not only effective for two-dimensional (2D) projection images but is also applicable to the 3D images used in radiotherapy.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12397698 (PMC)
http://dx.doi.org/10.21037/qims-2024-2962 (DOI Listing)

Publication Analysis

Top Keywords

projection images (28), images (12), video (9), super-resolution method (8), projection (8), image compression (8), video file (8), three networks (8), network resnet (8), video coding (8)

Similar Publications

Toward universal immunofluorescence normalization for multiplex tissue imaging with UniFORM.

Cell Rep Methods

August 2025

Department of Biomedical Engineering and Computational Biology Program, OHSU, Portland, OR, USA; Knight Cancer Institute, OHSU, Portland, OR, USA. Electronic address:

We present UniFORM, a non-parametric, Python-based pipeline for normalizing multiplex tissue imaging (MTI) data at both the feature and pixel levels. UniFORM employs an automated rigid landmark-registration method tailored to the distributional characteristics of MTI; it operates without prior distributional assumptions and handles both unimodal and bimodal patterns. By aligning the biologically invariant negative populations, UniFORM removes technical variation while preserving tissue-specific expression patterns in positive populations.
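As a loose illustration of landmark alignment (a sketch of the general idea, not the UniFORM code itself), the snippet below shifts each sample's log-intensity distribution so that the mode of its negative population matches a reference sample; the KDE-based mode finding and the shift-only model are simplifying assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def negative_mode(log_intensities):
    """Locate the dominant (negative-population) mode of a log-intensity distribution."""
    kde = gaussian_kde(log_intensities)
    grid = np.linspace(log_intensities.min(), log_intensities.max(), 512)
    return grid[np.argmax(kde(grid))]

def align_to_reference(sample, reference):
    """Shift a sample's log-intensities so its negative mode matches the reference's."""
    return sample + (negative_mode(reference) - negative_mode(sample))
```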


Temporal modeling plays an important role in the effective adaptation of powerful pretrained text-image foundation models to text-video retrieval. However, existing methods often rely on additional heavy trainable modules, such as transformers or BiLSTMs, which are inefficient. In contrast, we avoid introducing such heavy components by leveraging frozen foundation models.
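A minimal sketch of the training-free alternative hinted at here, aggregating frozen per-frame embeddings instead of learning a temporal module, might look like the following; the per-frame and text embeddings are assumed to be precomputed by a frozen encoder, and mean pooling is just one simple aggregation choice.

```python
import numpy as np

def video_embedding(frame_embeddings):
    """Aggregate frozen per-frame features by mean pooling (no trainable temporal module)."""
    v = frame_embeddings.mean(axis=0)
    return v / np.linalg.norm(v)

def retrieval_score(text_embedding, frame_embeddings):
    """Cosine similarity between a text query and the pooled video representation."""
    t = text_embedding / np.linalg.norm(text_embedding)
    return float(np.dot(t, video_embedding(frame_embeddings)))
```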


Facial feminization surgery (FFS) reshapes masculine facial attributes to align with feminine norms, yet normative anthropometric data for Asian populations remain sparse. We therefore quantified sex-related 3-dimensional (3D) facial metrics in healthy Asian adults to delineate dimorphic benchmarks for surgical planning. We prospectively recruited 40 healthy Asian adults (20 males, 20 females; age 18 to 45 years, mean 28.


Background: Emotion recognition from electroencephalography (EEG) can play a pivotal role in the advancement of brain-computer interfaces (BCIs). Recent developments in deep learning, particularly convolutional neural networks (CNNs) and hybrid models, have significantly enhanced interest in this field. However, standard convolutional layers often conflate characteristics across various brain rhythms, complicating the identification of distinctive features vital for emotion recognition.
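One common way to keep brain rhythms separated before convolution is a filter-bank front end that band-passes each channel into the canonical EEG bands; the sketch below is a generic illustration of that idea, not the architecture this paper proposes.

```python
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}  # Hz, canonical EEG rhythms

def filter_bank(eeg, fs=250):
    """Split (channels, samples) EEG into one band-passed copy per rhythm."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        out[name] = filtfilt(b, a, eeg, axis=-1)  # zero-phase filtering
    return out
```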


Purpose: The combination of multi-layer flat panel detector (FPDT) X-ray imaging and physics-based material decomposition algorithms allows for the removal of anatomical structures. However, the reliability of these algorithms may be compromised by unaccounted materials or scattered radiation.

Approach: We investigated the two-material decomposition performance of a multi-layer FPDT in the context of 2D chest radiography without and with a 13:1 anti-scatter grid employed.
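In the linearized two-material model this alludes to, the log-attenuations measured by the two detector layers form a 2x2 linear map of the basis-material thicknesses. A minimal per-pixel inversion might look like the sketch below; the effective attenuation coefficients are illustrative placeholders, not calibrated detector values.

```python
import numpy as np

# Effective attenuation matrix [layer, material]; placeholder numbers only.
A = np.array([[0.35, 0.20],   # low-energy layer: (soft tissue, bone)
              [0.25, 0.10]])  # high-energy layer

def decompose(I_low, I_high, I0_low, I0_high):
    """Solve A @ [t_soft, t_bone] = -ln(I/I0) for each pixel."""
    logs = np.stack([-np.log(I_low / I0_low),
                     -np.log(I_high / I0_high)], axis=0)  # (2, H, W)
    t = np.tensordot(np.linalg.inv(A), logs, axes=1)      # (2, H, W)
    return t[0], t[1]  # soft-tissue and bone thickness maps
```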
