Bayesian Gower agreement for categorical data.

Sci Rep

Lehigh University, Bethlehem, PA, 18015, USA.

Published: February 2025


Article Abstract

In this work I present two methods for measuring agreement in nominal and ordinal data. The measures, which employ Gower-type distances, are simple, intuitive, and easy to compute for any number of units and any number of coders. Influential units and/or coders are easily identified. I consider both one-way and two-way random sampling designs, and develop an approach to Bayesian inference for each. I apply the methods to simulated data and to two real datasets, the first from a one-way radiological study of congenital diaphragmatic hernia, and the second from a two-way study of psychiatric diagnosis. Finally, I consider agreement scales and suggest that Gaussian mutual information can perhaps provide a scale that is more useful than the scale most commonly used. The methods I propose are supported by my open-source R package, goweragreement, which is available on the Comprehensive R Archive Network.
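
To make the idea of a Gower-type distance concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the estimator defined in the paper and not the API of the goweragreement package. It assumes that the distance between two nominal codes is 0 when they match and 1 otherwise, that the distance between two ordinal codes is their absolute difference divided by the range of the scale, and that a unit's agreement is one minus the mean pairwise distance among its coders. All function names are hypothetical, and no Bayesian inference is attempted.

```python
from itertools import combinations


def gower_distance(a, b, scale="nominal", value_range=None):
    """Gower-type distance in [0, 1] between two codes (illustrative assumption)."""
    if scale == "nominal":
        # Nominal codes either match (distance 0) or differ (distance 1).
        return 0.0 if a == b else 1.0
    if scale == "ordinal":
        if not value_range:
            raise ValueError("ordinal scale needs a positive value_range")
        # Ordinal codes: absolute difference scaled by the range of the scale.
        return abs(a - b) / value_range
    raise ValueError(f"unknown scale: {scale!r}")


def unit_agreement(codes, scale="nominal", value_range=None):
    """One minus the mean pairwise distance among the codes given to one unit."""
    pairs = list(combinations(codes, 2))
    if not pairs:
        raise ValueError("need at least two coders per unit")
    total = sum(gower_distance(a, b, scale, value_range) for a, b in pairs)
    return 1.0 - total / len(pairs)


def overall_agreement(data, scale="nominal", value_range=None):
    """Average the per-unit agreements; rows are units, columns are coders."""
    per_unit = [unit_agreement(row, scale, value_range) for row in data]
    return sum(per_unit) / len(per_unit), per_unit


if __name__ == "__main__":
    # Hypothetical toy data: 3 units scored by 3 coders on a 1..3 scale.
    toy = [[1, 1, 1], [1, 2, 1], [3, 3, 2]]
    print(overall_agreement(toy, scale="nominal"))
    print(overall_agreement(toy, scale="ordinal", value_range=2))
```

Returning the per-unit scores next to the overall average hints at how unusually low-agreement units could be spotted, although the paper's own diagnostics for influential units and coders may work differently.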

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11850839
DOI: http://dx.doi.org/10.1038/s41598-025-90873-9
