PheWAS analysis on large-scale biobank data with PheTK.

Bioinformatics

National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, United States.

Published: December 2024



Article Abstract

Summary: With the rapid growth of genetic data linked to electronic health record (EHR) data in very large cohorts, large-scale phenome-wide association studies (PheWAS) have become powerful discovery tools in biomedical research. PheWAS is an analysis method for studying phenotype associations using longitudinal EHR data. Previous PheWAS packages were developed mostly for smaller datasets and earlier PheWAS approaches. PheTK was designed to simplify analysis and to handle biobank-scale data efficiently. PheTK uses multithreading and supports a full PheWAS workflow, including extraction of data from OMOP databases and Hail matrix tables, as well as PheWAS analysis for both phecode version 1.2 and phecodeX. Benchmarking results showed that PheTK took 64% less time than the R PheWAS package to complete the same workflow. PheTK can be run locally or on cloud platforms such as the All of Us Researcher Workbench (All of Us) or the UK Biobank (UKB) Research Analysis Platform (RAP).

Availability and Implementation: The PheTK package is freely available on the Python Package Index, on GitHub under the GNU General Public License (GPL-3) at https://github.com/nhgritctran/PheTK, and on Zenodo, DOI 10.5281/zenodo.14217954, at https://doi.org/10.5281/zenodo.14217954. PheTK is implemented in Python and is platform independent.
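
To make the idea of a PheWAS concrete, the following is a minimal sketch of its core step: testing one genetic exposure against many phenotypes, one regression per phecode, with the tests dispatched across a thread pool. It is not PheTK's implementation and does not use PheTK's API. The function names (`fit_logistic`, `run_phewas`), the toy phecode labels, the simulated cohort, and the Bonferroni cutoff are all assumptions made for illustration. A real analysis would also adjust for covariates such as age, sex, and ancestry components, which this sketch omits.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def fit_logistic(x, y, n_iter=25):
    """Fit y ~ intercept + x by Newton-Raphson; return (beta, se, p) for x."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the gradient (g) and information matrix (h) terms.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * d = g in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    se = math.sqrt(h00 / det)  # sqrt of the slope entry of the inverse information
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
    return b1, se, p_value


def run_phewas(genotype, phenotypes, max_workers=4):
    """Test every phecode against the genotype and flag Bonferroni hits."""
    threshold = 0.05 / len(phenotypes)  # assumed multiple-testing correction

    def test_one(item):
        phecode, status = item
        beta, se, p = fit_logistic(genotype, status)
        return {"phecode": phecode, "beta": beta, "se": se,
                "p_value": p, "significant": p < threshold}

    # One task per phecode. In pure Python the GIL limits the speed-up, so
    # this shows only the structure of a per-phecode parallel loop.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(test_one, phenotypes.items()))


# Simulated toy cohort (hypothetical data, not drawn from any biobank).
rng = random.Random(0)
n = 2000
genotype = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
phenotypes = {
    # TOY_001 is simulated to be more common among carriers.
    "TOY_001": [1 if rng.random() < (0.3 if g else 0.1) else 0 for g in genotype],
    "TOY_002": [1 if rng.random() < 0.1 else 0 for _ in genotype],
    "TOY_003": [1 if rng.random() < 0.1 else 0 for _ in genotype],
    "TOY_004": [1 if rng.random() < 0.1 else 0 for _ in genotype],
}

results = run_phewas(genotype, phenotypes)
for row in sorted(results, key=lambda r: r["p_value"]):
    print(row["phecode"], round(row["beta"], 3), row["p_value"], row["significant"])
```

The sketch keeps one independent regression per phecode so that the tests can be scheduled concurrently, which is the general shape of workload that the multithreading mentioned in the Summary would target.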


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11709244
DOI: http://dx.doi.org/10.1093/bioinformatics/btae719

