nnMobileNet: Rethinking CNN for Retinopathy Research.

Conf Comput Vis Pattern Recognit Workshops

School of Computing and Augmented Intelligence, Arizona State University.

Published: June 2024




Article Abstract

Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability, that is, their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET.
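The contrast the abstract draws between patch-based and local-region processing, and the parameter economy of a MobileNet-style layer, can be illustrated with a little arithmetic. The sketch below is not taken from the paper or its repository: the function names are hypothetical, and the image size (224), patch size (16), lesion size (8 pixels), and channel counts (64 to 128) are illustrative assumptions. It counts the tokens a ViT-style tokenizer would produce, shows how a small lesion can straddle patch boundaries, and compares the weight count of a standard convolution with that of a depthwise separable convolution, the building block MobileNet is known for.

```python
def vit_num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (tokens) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def patches_touched(lesion_start: int, lesion_size: int, patch_size: int) -> int:
    """Patches overlapped along ONE axis by a lesion covering
    pixels [lesion_start, lesion_start + lesion_size)."""
    first = lesion_start // patch_size
    last = (lesion_start + lesion_size - 1) // patch_size
    return last - first + 1


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k standard convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1
    pointwise convolution (bias terms ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Assumed, illustrative settings; none of these come from the paper.
    print(vit_num_patches(224, 16))               # 196 tokens
    # An 8-pixel lesion starting at pixel 12 crosses the boundary at pixel 16,
    # so it is split across 2 patches along that axis.
    print(patches_touched(12, 8, 16))             # 2
    print(standard_conv_params(64, 128, 3))       # 73728
    print(depthwise_separable_params(64, 128, 3)) # 8768
```

Under these assumed settings the depthwise separable layer uses roughly one eighth of the weights of the standard convolution, while a lesion smaller than one patch can still be split across neighbouring tokens. Neither number says anything about the specific modifications the authors made to MobileNet; those are described in the full text and the linked repository.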


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12068684
DOI: http://dx.doi.org/10.1109/CVPRW63382.2024.00234


Similar Publications

Objectives: Accurate diagnosis of biliary strictures remains challenging. This study aimed to develop an artificial intelligence (AI) system for peroral cholangioscopy (POCS) using a Vision Transformer (ViT) architecture and to evaluate its performance compared to different vendor devices, conventional convolutional neural networks (CNNs), and endoscopists.

Methods: We retrospectively analyzed 125 patients with indeterminate biliary strictures who underwent POCS between 2012 and 2024.


Pulmonary embolism (PE) represents a severe, life-threatening cardiovascular condition and is notably the third leading cause of cardiovascular mortality, after myocardial infarction and stroke. This pathology occurs when blood clots obstruct the pulmonary arteries, impeding blood flow and oxygen exchange in the lungs. Prompt and accurate detection of PE is critical for appropriate clinical decision-making and patient survival.


Smart City Infrastructure Monitoring with a Hybrid Vision Transformer for Micro-Crack Detection.

Sensors (Basel)

August 2025

Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Republic of Korea.

Innovative and reliable structural health monitoring (SHM) is indispensable for ensuring the safety, dependability, and longevity of urban infrastructure. However, conventional methods lack full efficiency, remain labor-intensive, and are susceptible to errors, particularly in detecting subtle structural anomalies such as micro-cracks. To address this issue, this study proposes a novel deep-learning framework based on a modified Detection Transformer (DETR) architecture.


Accurate segmentation in medical imaging is essential for disease diagnosis and monitoring, particularly in lung imaging using proton and hyperpolarized gas MRI. However, image degradation due to noise and artifacts (especially in hyperpolarized gas MRI, where scans are acquired during breath-holds) poses challenges for conventional segmentation algorithms. This study evaluates the robustness of deep learning segmentation models under varying Gaussian noise levels, comparing traditional convolutional neural networks (CNNs) with modern Vision Transformer (ViT)-based models.


Enhancing detection of common bean diseases using Fast Gradient Sign Method-trained Vision Transformers.

Front Artif Intell

August 2025

Computational and Communication Science and Engineering (CoCSE), The Nelson Mandela African Institution of Science and Technology (NM-AIST), Arusha, Tanzania.

Common bean production in Tanzania is threatened by diseases such as bean rust and bean anthracnose, with early detection critical for effective management. This study presents a Vision Transformer (ViT)-based deep learning model enhanced with adversarial training to improve disease detection robustness under real-world farm conditions. A dataset of 100,000 annotated images was augmented with geometric, color, and FGSM-based perturbations to simulate field variability.
