Article Abstract

Motivation: Three-way data structures, characterized by three entities, the units, the variables and the occasions, are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks.
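
For reference, the standard matrix variate normal distribution that such models build on can be written as below. The notation is an assumption made here for illustration and may differ from the paper's: X is the r × p matrix for one unit, M is its mean matrix, Φ is an r × r covariance matrix across occasions, and Ω is a p × p covariance matrix across conditions.

```latex
\[
X \sim \mathcal{N}_{r \times p}(M, \Phi, \Omega)
\iff
\operatorname{vec}(X) \sim \mathcal{N}_{rp}\bigl(\operatorname{vec}(M),\; \Omega \otimes \Phi\bigr)
\]
\[
f(X \mid M, \Phi, \Omega)
= \frac{\exp\left\{-\tfrac{1}{2}\operatorname{tr}\left[\Phi^{-1}(X - M)\,\Omega^{-1}(X - M)^{\top}\right]\right\}}
       {(2\pi)^{rp/2}\,|\Phi|^{p/2}\,|\Omega|^{r/2}}
\]
```

The Kronecker product Ω ⊗ Φ is what ties the matrix form to an ordinary multivariate normal on the vectorized data.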

Results: In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By exploiting the matrix variate structure, the model uses the full information on both the conditions and the occasions of the RNA sequencing dataset simultaneously, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery.
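
The reduction in covariance parameters can be illustrated with a quick count. The sketch below assumes the usual Kronecker-structured parameterization of a matrix variate normal, with one r × r covariance matrix across occasions and one p × p covariance matrix across conditions, and compares it with an unstructured covariance on the vectorized rp-dimensional data. The function names are hypothetical, and the count ignores the scale identifiability constraint between the two matrices; the abstract does not state the exact parameterization used in the paper.

```python
def unstructured_cov_params(r: int, p: int) -> int:
    """Free parameters in a symmetric (r*p) x (r*p) covariance matrix,
    i.e. the covariance of the vectorized r x p observation."""
    d = r * p
    return d * (d + 1) // 2


def matrix_variate_cov_params(r: int, p: int) -> int:
    """Free parameters in one symmetric r x r matrix plus one symmetric
    p x p matrix (assumed Kronecker-structured covariance)."""
    return r * (r + 1) // 2 + p * (p + 1) // 2


if __name__ == "__main__":
    # Illustrative sizes only; these are not taken from the paper.
    for r, p in [(2, 3), (3, 4), (5, 6)]:
        full = unstructured_cov_params(r, p)
        structured = matrix_variate_cov_params(r, p)
        print(f"r={r}, p={p}: unstructured={full}, matrix variate={structured}")
```

In a mixture model these counts apply per cluster, so the saving grows with the number of components as well as with r and p.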

Availability And Implementation: The GitHub R package for this work is available at https://github.com/anjalisilva/mixMVPLN and is released under the open source MIT license.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10159656
DOI: http://dx.doi.org/10.1093/bioinformatics/btad167

Similar Publications

Mode-wise principal subspace pursuit and matrix spiked covariance model.

J R Stat Soc Series B Stat Methodol

February 2025

Departments of Biostatistics & Bioinformatics and Computer Science, Duke University, Durham, NC, USA.

This paper introduces a novel framework called Mode-wise Principal Subspace Pursuit (MOP-UP) to extract hidden variations in both the row and column dimensions for matrix data. To enhance the understanding of the framework, we introduce a class of matrix-variate spiked covariance models that serve as inspiration for the development of the MOP-UP algorithm. The MOP-UP algorithm consists of two steps: Average Subspace Capture (ASC) and Alternating Projection.

Cervical cancer is the second most prevalent disease among Ethiopian women of reproductive age and a serious gynecological malignancy affecting women regionally. About 3235 deaths and 4648 new cases are reported nationwide each year. Despite the severity of the disease, precancerous cervical screening programs in resource-limited settings face many difficulties, such as a lack of medical supplies and equipment, poorly trained healthcare workers, a heavy workload for current staff, low professional compliance, and insufficient support from medical facilities.

Bayesian thresholded modeling for integrating brain node and network predictors.

Biostatistics

December 2024

Department of Biostatistics, Yale University, 300 George St, New Haven, CT 06511, United States.

Progress in neuroscience has provided unprecedented opportunities to advance our understanding of brain alterations and their correspondence to phenotypic profiles. With data collected from various imaging techniques, studies have integrated different types of information ranging from brain structure, function, or metabolism. More recently, an emerging way to categorize imaging traits is through a metric hierarchy, including localized node-level measurements and interactive network-level metrics.

[Geographical origin authentication of Gongju at different spatial scales based on hyperspectral technology].

Zhongguo Zhong Yao Za Zhi

November 2024

National Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences Beijing 100700, China.

Gongju (Chrysanthemum morifolium) is one of the five major medicinal Chrysanthemum varieties included in the Chinese Pharmacopoeia. In recent years, its cultivation areas have changed significantly, resulting in mixed quality of the medicinal herbs. In this study, Gongju cultivated in Anhui, Yunnan, Chongqing, and other places were selected as research objects.

Quality detection is critical in the development of prepared dishes, with distributional uniformity playing a significant role. This study used hyperspectral imaging (HSI) and Moran's I to quantify distributional uniformity, using pizza as a case study. The spectra of pizza ingredients were collected, pre-processed with Detrended Fluctuation Analysis (DFA), Savitzky-Golay (SG) and Standard Normal Variate (SNV), and reduced in dimension with Principal Component Analysis (PCA).
