nnMobileNet: Rethinking CNN for Retinopathy Research.

Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M Dumitrascu, Yalin Wang

Conf Comput Vis Pattern Recognit Workshops

School of Computing and Augmented Intelligence, Arizona State University.

Published: June 2024


Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability, that is, their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in how they process images: they operate on patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that a MobileNet optimized through selective modifications can surpass ViT-based models on various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET.
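
For context on the two architecture families contrasted in the abstract, the following is a minimal sketch, not the authors' method. It assumes the standard MobileNet building block (a depthwise convolution followed by a 1 x 1 pointwise convolution) and the usual ViT tokenization into non-overlapping square patches. The function names, channel sizes, and image and patch sizes are illustrative assumptions, not values taken from the paper.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def vit_patch_count(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


if __name__ == "__main__":
    # Illustrative sizes only (assumptions, not from the paper).
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard conv weights:            {std}")
    print(f"depthwise separable conv weights: {sep}")
    print(f"reduction factor:                 {std / sep:.2f}x")
    print(f"ViT tokens for 224 px / 16 px:    {vit_patch_count(224, 16)}")
```

With these assumed sizes the separable layer uses roughly an eighth of the weights of a standard convolution, which illustrates why MobileNet-style models are considered lightweight. The patch count shows the fixed grid a ViT works on, in contrast to the sliding local windows of a CNN.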

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12068684
DOI: http://dx.doi.org/10.1109/CVPRW63382.2024.00234

Similar Publications

Vendor-Agnostic Vision Transformer-Based Artificial Intelligence for Peroral Cholangioscopy: Diagnostic Performance in Biliary Strictures Compared With Convolutional Neural Networks and Endoscopists.

Dig Endosc

September 2025

Department of Gastroenterology and Hepatology, Okayama University Hospital, Okayama, Japan.

Ryosuke Sato, Kazuyuki Matsumoto, Masahiro Tomiya, Takayoshi Tanimoto, Akimitsu Ohto

Objectives: Accurate diagnosis of biliary strictures remains challenging. This study aimed to develop an artificial intelligence (AI) system for peroral cholangioscopy (POCS) using a Vision Transformer (ViT) architecture and to evaluate its performance compared to different vendor devices, conventional convolutional neural networks (CNNs), and endoscopists.

Methods: We retrospectively analyzed 125 patients with indeterminate biliary strictures who underwent POCS between 2012 and 2024.

Improved pulmonary embolism detection in CT pulmonary angiogram scans with hybrid vision transformers and deep learning techniques.

Sci Rep

August 2025

Electronics and Communications Engineering Department, Faculty of Engineering, Mansoura University, Mansoura, 35516, Egypt.

Abeer Abdelhamid, Amir El-Ghamry, Ehab H Abdelhay, Mohammed M Abo-Zahhad, Hossam El-Din Moustafa

Pulmonary embolism (PE) represents a severe, life-threatening cardiovascular condition and is notably the third leading cause of cardiovascular mortality, after myocardial infarction and stroke. This pathology occurs when blood clots obstruct the pulmonary arteries, impeding blood flow and oxygen exchange in the lungs. Prompt and accurate detection of PE is critical for appropriate clinical decision-making and patient survival.

Smart City Infrastructure Monitoring with a Hybrid Vision Transformer for Micro-Crack Detection.

Sensors (Basel)

August 2025

Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Republic of Korea.

Rashid Nasimov, Young Im Cho

Innovative and reliable structural health monitoring (SHM) is indispensable for ensuring the safety, dependability, and longevity of urban infrastructure. However, conventional methods lack full efficiency, remain labor-intensive, and are susceptible to errors, particularly in detecting subtle structural anomalies such as micro-cracks. To address this issue, this study proposes a novel deep-learning framework based on a modified Detection Transformer (DETR) architecture.

Robust Segmentation of Lung Proton and Hyperpolarized Gas MRI with Vision Transformers and CNNs: A Comparative Analysis of Performance Under Artificial Noise.

Bioengineering (Basel)

July 2025

School of Biomedical Engineering, Faculty of Engineering, The University of Western Ontario, London, ON N6A 3K7, Canada.

Ramtin Babaeipour, Matthew S Fox, Grace Parraga, Alexei Ouriadov

Accurate segmentation in medical imaging is essential for disease diagnosis and monitoring, particularly in lung imaging using proton and hyperpolarized gas MRI. However, image degradation due to noise and artifacts (especially in hyperpolarized gas MRI, where scans are acquired during breath-holds) poses challenges for conventional segmentation algorithms. This study evaluates the robustness of deep learning segmentation models under varying Gaussian noise levels, comparing traditional convolutional neural networks (CNNs) with modern Vision Transformer (ViT)-based models.

Enhancing detection of common bean diseases using Fast Gradient Sign Method-trained Vision Transformers.

Front Artif Intell

August 2025

Computational and Communication Science and Engineering (CoCSE), The Nelson Mandela African Institution of Science and Technology (NM-AIST), Arusha, Tanzania.

Upendo Mwaibale, Neema Mduma, Hudson Laizer, Bonny Mgawe

Common bean production in Tanzania is threatened by diseases such as bean rust and bean anthracnose, with early detection critical for effective management. This study presents a Vision Transformer (ViT)-based deep learning model enhanced with adversarial training to improve disease detection robustness under real-world farm conditions. A dataset of 100,000 annotated images was augmented with geometric, color, and FGSM-based perturbations to simulate field variability.
