98%
921
2 minutes
20
Large-scale benchmark datasets are crucial in advancing research within the computer science communities. They enable the development of more sophisticated AI models and serve as "golden" benchmarks for evaluating their performance. Thus, ensuring the quality of these datasets is of utmost importance for academic research and the progress of AI systems. For the emerging vision-language tasks, some datasets have been created and frequently used, such as Flickr30k, COCO, and NoCaps, which typically contain a large number of images paired with their ground-truth textual descriptions. In this paper, an automatic method is proposed to assess the quality of large-scale benchmark datasets designed for vision-language tasks. In particular, a new cross-modal matching model is developed, which is capable of automatically scoring the textual descriptions of visual images. Subsequently, this model is employed to evaluate the quality of vision-language datasets by automatically assigning a score to each 'ground-truth' description for every image picture. With a good agreement between manual and automated scoring results on the datasets, our findings reveal significant disparities in the quality of the ground-truth descriptions included in the benchmark datasets. Even more surprising, it is evident that a small portion of the descriptions are unsuitable for serving as reliable ground-truth references. These discoveries emphasize the need for careful utilization of these publicly accessible benchmark databases.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1142/S0129065724500096 | DOI Listing |
Nat Microbiol
September 2025
Division of Computational Pathology, Brigham and Women's Hospital, Boston, MA, USA.
Although dynamical systems models are a powerful tool for analysing microbial ecosystems, challenges in learning these models from complex microbiome datasets and interpreting their outputs limit use. We introduce the Microbial Dynamical Systems Inference Engine 2 (MDSINE2), a Bayesian method that learns compact and interpretable ecosystems-scale dynamical systems models from microbiome timeseries data. Microbial dynamics are modelled as stochastic processes driven by interaction modules, or groups of microbes with similar interaction structure and responses to perturbations, and additionally, noise characteristics of data are modelled.
View Article and Find Full Text PDFCell Rep Methods
August 2025
Department of Biomedical Engineering and Computational Biology Program, OHSU, Portland, OR, USA; Knight Cancer Institute, OHSU, Portland, OR, USA. Electronic address:
We present UniFORM, a non-parametric, Python-based pipeline for normalizing multiplex tissue imaging (MTI) data at both the feature and pixel levels. UniFORM employs an automated rigid landmark registration method tailored to the distributional characteristics of MTI, with UniFORM operating without prior distributional assumptions and handling both unimodal and bimodal patterns. By aligning the biologically invariant negative populations, UniFORM removes technical variation while preserving tissue-specific expression patterns in positive populations.
View Article and Find Full Text PDFNeural Netw
September 2025
College of Information Science, North China University of Technology, Beijing, China. Electronic address:
Personalized Federated Learning (pFL) has received extensive attentions, due to its ability to effectively process non-IID data distributed among different clients. However, most of the existing pFL methods focus on the collaboration between global and local models to enrich the personalization process, but ignoring a lot of valuable historical information, which represents the unique learning trajectory of each client. In this paper, we propose a pFL method called FedLFH, which introduces a tracking variable that allows each client to preserve historical information to facilitate personalization.
View Article and Find Full Text PDFBioinformatics
September 2025
Department of Mathematical Sciences, The University of Texas at Dallas, TX United States.
Motivation: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Understanding gene functions and interactions in different spatial domains is crucial, as it can enhance our comprehension of biological mechanisms, such as cancer-immune interactions and cell differentiation in various regions. It is necessary to cluster tissue regions into distinct spatial domains and identify discriminating genes that elucidate the clustering result, referred to as spatial domain-specific discriminating genes (DGs).
View Article and Find Full Text PDFIEEE Trans Neural Netw Learn Syst
September 2025
Unsupervised visible-infrared person reidentification (UVI-ReID) has recently gained great attention due to its potential for enhancing human detection in diverse environments without labeling. Previous methods utilize intramodality clustering and cross-modality feature matching to achieve UVI-ReID. However, there exist two challenges: 1) noisy pseudo-labels might be generated in the clustering process and 2) the cross-modality feature alignment via matching the marginal distribution of visible and infrared modalities may misalign the different identities from the two modalities.
View Article and Find Full Text PDF