Automated Quality Evaluation of Large-Scale Benchmark Datasets for Vision-Language Tasks.

Ruibin Zhao, Zhiwei Xie, Yipeng Zhuang, Philip L H Yu

Int J Neural Syst

Department of Mathematics and Information Technology, The Education University of Hong Kong, Hong Kong SAR, P. R. China.

Published: March 2024

Large-scale benchmark datasets are crucial to advancing research in computer science. They enable the development of more sophisticated AI models and serve as "golden" benchmarks for evaluating model performance. Ensuring the quality of these datasets is therefore of utmost importance for academic research and the progress of AI systems. For the emerging vision-language tasks, several datasets have been created and are frequently used, such as Flickr30k, COCO, and NoCaps, which typically contain a large number of images paired with ground-truth textual descriptions. In this paper, an automatic method is proposed to assess the quality of large-scale benchmark datasets designed for vision-language tasks. In particular, a new cross-modal matching model is developed that can automatically score the textual descriptions of images. This model is then used to evaluate the quality of vision-language datasets by assigning a score to each "ground-truth" description of every image. Manual and automated scoring results on the datasets agree well, and our findings reveal significant disparities in the quality of the ground-truth descriptions included in the benchmark datasets. More surprisingly, a small portion of the descriptions are unsuitable as reliable ground-truth references. These findings emphasize the need for careful use of these publicly accessible benchmark datasets.
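
The abstract does not specify how the cross-modal matching model computes its scores, so the following is only a minimal sketch of the general idea of scoring each description against its image and flagging low scorers. It assumes that image and caption embeddings already exist in a shared vector space and uses plain cosine similarity as a stand-in scorer; the function names, the toy vectors, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_captions(
    image_vec: Sequence[float],
    caption_vecs: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Assign each caption a matching score against one image embedding."""
    return {
        caption: cosine_similarity(image_vec, vec)
        for caption, vec in caption_vecs.items()
    }


def flag_low_quality(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return, in sorted order, the captions whose score is below the threshold."""
    return sorted(c for c, s in scores.items() if s < threshold)


# Toy embeddings (assumed, for illustration only): a real pipeline would
# obtain these from trained image and text encoders.
image_vec = [1.0, 0.0]
caption_vecs = {
    "a dog runs on grass": [1.0, 0.0],  # same direction as the image
    "a dog outside": [1.0, 1.0],        # partially aligned
    "a plate of food": [0.0, 1.0],      # orthogonal to the image
}

scores = score_captions(image_vec, caption_vecs)
flagged = flag_low_quality(scores, threshold=0.5)  # threshold is an assumption
print(flagged)
```

In practice, a threshold like this would need to be calibrated against manual scores, in the spirit of the manual-versus-automated agreement check that the abstract mentions.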

DOI: http://dx.doi.org/10.1142/S0129065724500096

Similar Publications

Learning ecosystem-scale dynamics from microbiome data with MDSINE2.

Nat Microbiol

September 2025

Division of Computational Pathology, Brigham and Women's Hospital, Boston, MA, USA.

Travis E Gibson, Younhun Kim, Sawal Acharya, David E Kaplan, Nicholas DiBenedetto

Although dynamical systems models are a powerful tool for analysing microbial ecosystems, challenges in learning these models from complex microbiome datasets and interpreting their outputs limit use. We introduce the Microbial Dynamical Systems Inference Engine 2 (MDSINE2), a Bayesian method that learns compact and interpretable ecosystems-scale dynamical systems models from microbiome timeseries data. Microbial dynamics are modelled as stochastic processes driven by interaction modules, or groups of microbes with similar interaction structure and responses to perturbations, and additionally, noise characteristics of data are modelled.

Toward universal immunofluorescence normalization for multiplex tissue imaging with UniFORM.

Cell Rep Methods

August 2025

Department of Biomedical Engineering and Computational Biology Program, OHSU, Portland, OR, USA; Knight Cancer Institute, OHSU, Portland, OR, USA.

Kunlun Wang, Kaoutar Ait-Ahmad, Sam Kupp, Zachary Sims, Eric Cramer

We present UniFORM, a non-parametric, Python-based pipeline for normalizing multiplex tissue imaging (MTI) data at both the feature and pixel levels. UniFORM employs an automated rigid landmark registration method tailored to the distributional characteristics of MTI; it operates without prior distributional assumptions and handles both unimodal and bimodal patterns. By aligning the biologically invariant negative populations, UniFORM removes technical variation while preserving tissue-specific expression patterns in positive populations.

Learning from history for personalized federated learning.

Neural Netw

September 2025

College of Information Science, North China University of Technology, Beijing, China.

Yingxun Fu, Shulan Yin, Li Ma, Jie Liu

Personalized Federated Learning (pFL) has received extensive attention due to its ability to effectively process non-IID data distributed among different clients. However, most existing pFL methods focus on the collaboration between global and local models to enrich the personalization process while ignoring valuable historical information, which represents the unique learning trajectory of each client. In this paper, we propose a pFL method called FedLFH, which introduces a tracking variable that allows each client to preserve historical information to facilitate personalization.

BISON: Bi-clustering of spatial omics data with feature selection.

Bioinformatics

September 2025

Department of Mathematical Sciences, The University of Texas at Dallas, TX United States.

Bencong Zhu, Alberto Cassese, Marina Vannucci, Michele Guindani, Qiwei Li

Motivation: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Understanding gene functions and interactions in different spatial domains is crucial, as it can enhance our comprehension of biological mechanisms, such as cancer-immune interactions and cell differentiation in various regions. It is necessary to cluster tissue regions into distinct spatial domains and identify discriminating genes that elucidate the clustering result, referred to as spatial domain-specific discriminating genes (DGs).

Unsupervised Visible-Infrared ReID via Pseudo-Label Correction and Modality-Level Alignment.

IEEE Trans Neural Netw Learn Syst

September 2025

Yexin Liu, Weiming Zhang, Athanasios V Vasilakos, Lin Wang

Unsupervised visible-infrared person reidentification (UVI-ReID) has recently gained great attention due to its potential for enhancing human detection in diverse environments without labeling. Previous methods utilize intramodality clustering and cross-modality feature matching to achieve UVI-ReID. However, there exist two challenges: 1) noisy pseudo-labels might be generated in the clustering process and 2) the cross-modality feature alignment via matching the marginal distribution of visible and infrared modalities may misalign the different identities from the two modalities.
