Publications by authors named "Tak-Wah Lam"

Background: Cardiovascular disease (CVD) is the leading cause of mortality and morbidity in China and worldwide, yet validated primary prevention models developed specifically for Chinese populations are lacking. To identify individuals at high CVD risk for early intervention, we developed and validated a primary prevention risk prediction model, Personalized CARdiovascular DIsease risk Assessment for Chinese (1°P-CARDIAC), in contemporary Chinese cohorts in Hong Kong.

Methods: Patients without any history of CVD were assigned to derivation and validation cohorts according to their geographical location of residence in Hong Kong.

Motivation: Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. While variant detection has become cost-effective, interpreting these variants, particularly collecting literature-based evidence such as ACMG/AMP PM3, remains complex and time-consuming.

Results: We present AutoPM3, a method that automates PM3 evidence extraction from the literature using open-source large language models (LLMs).

Article Synopsis
  • Ensuring a unified representation of genetic variants is crucial for accurate downstream analysis, but current methods often treat this unification as a later step, which can lead to inconsistencies.
  • Repun is a new algorithm designed to unify variant representations before variant calling, making the training of deep-learning models more reliable and allowing alignment quality to be assessed more effectively.
  • This approach uses haplotype information to streamline the unification process, achieving over 99.99% precision and more than 99.5% recall in tests across multiple sequencing platforms, and is available as an open-source tool.
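
Checking whether two differently written variant sets describe the same haplotype is the core question behind unifying representations. A minimal sketch of a haplotype-level comparison, assuming simple non-overlapping variants given as (0-based position, REF, ALT) tuples, is to apply each set to the reference and compare the resulting sequences. The function names and tuple format are hypothetical, and this is not Repun's algorithm, which the synopsis describes only at a high level.

```python
from typing import List, Tuple

# A variant as (0-based position, REF allele, ALT allele).
# This tuple layout is an illustrative assumption, not Repun's input format.
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Build a haplotype by applying non-overlapping variants to the reference."""
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
        if pos < cursor:
            raise ValueError("overlapping variants are not supported in this sketch")
        pieces.append(reference[cursor:pos])  # unchanged stretch before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


def same_haplotype(reference: str, a: List[Variant], b: List[Variant]) -> bool:
    """Two representations are equivalent if they yield the same haplotype sequence."""
    return apply_variants(reference, a) == apply_variants(reference, b)


if __name__ == "__main__":
    ref = "GTAAAC"
    # Deleting one A from the AAA run can be written at different positions.
    left = [(1, "TA", "T")]
    right = [(3, "AA", "A")]
    print(same_haplotype(ref, left, right))  # True
```

Comparing sequences rather than coordinates is what makes the two deletions above equal even though their positions and alleles differ; a record-by-record match on (position, REF, ALT) would count them as one false positive plus one false negative.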

Aims: Cardiovascular disease (CVD) is a leading cause of mortality, especially in developing countries. This study aimed to develop and validate a CVD risk prediction model, Personalized CARdiovascular DIsease risk Assessment for Chinese (P-CARDIAC), for recurrent cardiovascular events using machine learning techniques.

Methods And Results: Three cohorts of Chinese patients with established CVD were included if they had used any public healthcare service provided by the Hong Kong Hospital Authority (HA) since 2004; patients were assigned to cohorts according to their geographical location.

Summary: Third-generation long-read sequencing is increasingly used to profile human immunodeficiency virus (HIV) quasispecies and detect drug resistance mutations because it can cover the entire viral genome in individual reads. Recently, the ClusterV tool has demonstrated accurate detection of HIV quasispecies from Nanopore long-read sequencing data. However, the need for scripting skills and a suitable computational environment may be a barrier for many potential users.

Aims: Dissecting complex interactions among transcription factors (TFs), microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is central to understanding heart development and function. Although computational approaches and platforms have been described for inferring relationships among regulatory factors and genes, current approaches do not adequately account for how highly diverse, interacting regulators, including noncoding RNAs (ncRNAs), control cardiac gene expression dynamics over time.

Methods: To overcome this limitation, we devised an integrated framework, cardiac gene regulatory modeling (CGRM) that integrates LogicTRN and regulatory component analysis bioinformatics modeling platforms to infer complex regulatory mechanisms.

An increasing number of patients are being diagnosed with lung adenocarcinoma, but progress in improving their prognosis and survival remains limited. Genome instability is considered a contributing factor, as it enables cancer cells to acquire the functional capabilities of the other hallmarks of cancer, allowing them to survive, proliferate, and disseminate. Despite the importance of genome instability in cancer development, few studies have explored prognostic signatures associated with genome instability in lung adenocarcinoma.

Background: With continuing advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing, sequencing data from multiple technology platforms are becoming more common. While numerous benchmarking studies have compared variant-calling performance across platforms and approaches, little attention has been paid to leveraging the strengths of different platforms to optimize overall performance, especially by integrating Oxford Nanopore and Illumina sequencing data.

Results: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform).

Sensitive detection of Mycobacterium tuberculosis (TB) present at low abundance in metagenomic samples is essential for microbial classification and drug resistance prediction. However, traditional methods, such as bacterial culture and microscopy, are time-consuming and sometimes have limited sensitivity for TB detection. Oxford Nanopore Technologies (ONT) MinION sequencing allows rapid and simple sample preparation.

Background: Very low-coverage (0.1 to 1×) whole genome sequencing (WGS) has become a promising and affordable approach for discovering genomic variants in human populations for genome-wide association studies (GWAS). To support genetic screening using preimplantation genetic testing (PGT) in a large population, the sequencing coverage goes below 0.
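
To put these coverage levels in perspective, the standard Lander-Waterman approximation gives mean coverage C = N·L/G for N reads of length L over a genome of size G, and under a Poisson model a given base receives no reads with probability e^(-C). The sketch below evaluates both; the 150 bp read length and 3.1 Gb genome size are illustrative assumptions, not values from the study.

```python
import math


def reads_needed(coverage: float, read_length: int, genome_size: float) -> int:
    """Reads required for a target mean coverage, from C = N * L / G."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Poisson approximation: probability that a base receives zero reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    GENOME_SIZE = 3.1e9  # assumed approximate human genome size (bp)
    READ_LENGTH = 150    # assumed short-read length (bp)
    for c in (0.1, 0.5, 1.0):
        n = reads_needed(c, READ_LENGTH, GENOME_SIZE)
        print(f"{c}x: {n:,} reads, {fraction_uncovered(c):.1%} of bases uncovered")
```

At 0.1×, roughly 90% of bases receive no reads (e^(-0.1) ≈ 0.905), and even at 1× about 37% remain uncovered.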

Deep learning-based variant callers are becoming the standard and have achieved superior single-nucleotide polymorphism (SNP) calling performance using long reads. Here we present Clair3, which combines two major method categories: pileup calling handles most variant candidates quickly, and full-alignment calling tackles complicated candidates to maximize precision and recall. Clair3 runs faster than any of the other state-of-the-art variant callers and demonstrates improved performance, especially at lower coverage.
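
The division of labour described above, a fast pileup model for most candidates and a slower full-alignment model for the difficult ones, amounts to a confidence-gated cascade. The sketch below shows only that dispatch logic; the two toy models, the 0.9 confidence threshold and the Candidate fields are hypothetical stand-ins, not Clair3's actual networks, features or parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    chrom: str
    pos: int
    counts: Dict[str, int]  # hypothetical features: read support for "ref" and "alt"


# A model maps a candidate to (genotype, confidence in [0, 1]).
Model = Callable[[Candidate], Tuple[str, float]]


def cascade_call(candidates: List[Candidate], fast_model: Model,
                 slow_model: Model, threshold: float = 0.9):
    """Call all candidates with the fast model; re-call low-confidence ones."""
    calls = []
    escalated = 0
    for cand in candidates:
        genotype, conf = fast_model(cand)
        if conf < threshold:
            # Hard candidate: spend extra compute on the more detailed model.
            genotype, conf = slow_model(cand)
            escalated += 1
        calls.append((cand.chrom, cand.pos, genotype, conf))
    return calls, escalated


def toy_pileup_model(cand: Candidate) -> Tuple[str, float]:
    """Hypothetical stand-in: genotype from the alt-allele fraction."""
    ref, alt = cand.counts.get("ref", 0), cand.counts.get("alt", 0)
    frac = alt / max(ref + alt, 1)
    if frac > 0.8:
        return "1/1", frac
    if frac < 0.2:
        return "0/0", 1 - frac
    return "0/1", 0.5  # mixed evidence: deliberately low confidence


def toy_full_alignment_model(cand: Candidate) -> Tuple[str, float]:
    """Hypothetical stand-in for an expensive model that answers confidently."""
    return "0/1", 0.95


if __name__ == "__main__":
    cands = [Candidate("chr1", 100, {"ref": 2, "alt": 28}),
             Candidate("chr1", 200, {"ref": 15, "alt": 14})]
    calls, n = cascade_call(cands, toy_pileup_model, toy_full_alignment_model)
    print(calls, "escalated:", n)
```

The threshold sets the speed/accuracy trade-off in this sketch: raising it sends more candidates to the expensive model.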

Background: Whole genome sequencing using the long-read Oxford Nanopore Technologies (ONT) MinION sequencer provides a cost-effective option for structural variant (SV) detection in clinical applications. Despite the advantages of long reads, accurate SV calling and phasing remain challenging.

Results: We introduce Duet, an SV detection tool optimized for SV calling and phasing using ONT data.

DNA sequences that are absent from the human reference genome are classified as novel sequences. Discovering these missing sequences is crucial for exploring the genomic diversity of populations and understanding the genetic basis of human diseases. However, the differing read lengths produced by different sequencing technologies can significantly affect novel sequence discovery.
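
One simple way to make "absent from the reference" operational is to flag read k-mers that never occur in the reference. The sketch below does that with an exact k-mer set; k = 5 and the toy sequences are illustrative assumptions, it ignores reverse complements and sequencing errors, and it is not the method of the study above.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_fraction(read: str, reference_kmers: set, k: int) -> float:
    """Fraction of a read's k-mers that do not occur in the reference.

    Simplification: only the forward strand is checked, and sequencing
    errors will also show up as 'novel' k-mers.
    """
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not read_kmers:
        return 0.0  # read shorter than k: nothing to compare
    missing = sum(1 for km in read_kmers if km not in reference_kmers)
    return missing / len(read_kmers)


if __name__ == "__main__":
    K = 5
    ref_kmers = kmers("ACGTACGTTTGACCA", K)
    print(novel_fraction("ACGTACGT", ref_kmers, K))  # 0.0, fully contained
    print(novel_fraction("GGGGGGGG", ref_kmers, K))  # 1.0, entirely absent
```

Longer reads contain more k-mers on either side of a novel segment, which is one intuitive reason read length affects how much novel sequence can be anchored and recovered.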

Accurate identification of genetic variants from family child-mother-father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long-reads.
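
One reason trio data help is that the child's genotype is constrained by the parents': at an autosomal site, a diploid child inherits one allele from each parent. The sketch below checks that Mendelian constraint for VCF-style genotype strings such as "0/1"; it illustrates the constraint only and is not how Clair3-Trio models trios. Missing genotypes ("./.") and non-diploid calls are not handled.

```python
from itertools import product


def parse_gt(gt: str) -> tuple:
    """Parse a diploid VCF-style genotype such as '0/1' or '1|0' (phase ignored)."""
    a, b = gt.replace("|", "/").split("/")
    return int(a), int(b)


def mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    c = sorted(parse_gt(child))
    for m_allele, f_allele in product(parse_gt(mother), parse_gt(father)):
        if sorted((m_allele, f_allele)) == c:
            return True
    return False


if __name__ == "__main__":
    print(mendelian_consistent("0/1", "0/0", "1/1"))  # True
    print(mendelian_consistent("1/1", "0/0", "0/1"))  # False: mother has no ALT allele
    print(mendelian_consistent("0/1", "0/0", "0/0"))  # False: de novo or a calling error
```

Calling each sample independently can produce combinations like the last two; a trio-aware caller can use this dependency to resolve ambiguous calls.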

Structural variation (SV) is a major cause of genetic disorders. In this paper, we show that low-depth (specifically, 4×) whole-genome sequencing using a single Oxford Nanopore MinION flow cell suffices for sensitive SV detection, particularly of pathogenic SVs, to support clinical diagnosis. When using 4× ONT WGS data, existing SV calling software often fails to detect pathogenic SVs, especially long deletions, terminal deletions, duplications, and unbalanced translocations.

Background: The application of long-read sequencing using the Oxford Nanopore Technologies (ONT) MinION sequencer is becoming increasingly diverse in the medical field. However, ONT's high sequencing error rate and the limited throughput of a single MinION flow cell limit its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants.

HKG is the first fully accessible variant database for Hong Kong Cantonese, constructed from 205 novel whole-exome sequencing datasets. There has long been a research gap in understanding the genetic architecture of southern Chinese subgroups, including Hong Kong Cantonese. HKG detected 196 325 high-quality variants with 5.

In this paper, we explore a data-centric approach to the Multiple Sequence Alignment (MSA) construction problem. Unlike the algorithm-centric approach, which reduces construction to a combinatorial optimization problem based on an abstract mathematical model, the data-centric approach uses classification models trained on existing benchmark data to guide construction. We identified two simple classification tasks that help us choose a better alignment tool and decide whether, and how much, to realign.
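
The data-centric idea of letting benchmark data pick the alignment tool can be sketched as a nearest-neighbour lookup: describe an input set of sequences by a few features, find the most similar benchmark instance, and reuse the tool that scored best there. The features, the tool names "tool_A" and "tool_B", and the benchmark records below are hypothetical placeholders; the abstract does not describe the paper's actual classifiers or features.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Features = Tuple[float, float, float]


def features(seqs: Sequence[str]) -> Features:
    """Hypothetical input features: sequence count, mean length, length spread."""
    lengths = [len(s) for s in seqs]
    return (float(len(seqs)), float(mean(lengths)), float(pstdev(lengths)))


def choose_tool(seqs: Sequence[str], benchmark: List[Tuple[Features, str]]) -> str:
    """Return the best tool of the nearest benchmark instance (1-nearest neighbour).

    Features are compared unscaled here; a real classifier would normalize them.
    """
    f = features(seqs)
    _, best_tool = min(benchmark, key=lambda rec: math.dist(f, rec[0]))
    return best_tool


if __name__ == "__main__":
    bench = [((5.0, 100.0, 2.0), "tool_A"),        # small, uniform inputs
             ((200.0, 1000.0, 300.0), "tool_B")]   # large, length-variable inputs
    print(choose_tool(["ACGT" * 25] * 5, bench))  # tool_A
```

A second classifier of the same shape could predict whether realignment is worthwhile, with the benchmark label being "realign" or "keep".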

Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation still missing from current human reference sequences is unknown. Here, we used 486 deep-sequenced Han Chinese genomes to identify 276 Mbp of DNA sequences that, to our knowledge, are absent from the current human reference.

Relation extraction (RE) is a fundamental task for extracting gene-disease associations from biomedical text. Many state-of-the-art tools have limited capacity, as they can extract gene-disease associations only from single sentences or abstracts. A few studies have explored extracting gene-disease associations from full-text articles, but there is still considerable room for improvement.

Background: Next-generation sequencing (NGS) enables unbiased detection of pathogens by mapping the sequencing reads of a patient sample to known reference sequences of bacteria and viruses. However, for a new pathogen that lacks a reference sequence from a close relative, or that carries many mutations relative to its predecessors, read mapping fails because of low similarity between the pathogen and the reference sequence, leading to insensitive and inaccurate pathogen detection.

Results: We developed MegaPath, which runs fast and provides high sensitivity in detecting new pathogens.
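
Why mapping breaks down for divergent pathogens can be seen with a seed calculation: many aligners start from exact k-mer matches, and the share of k-mers that survive intact drops quickly as divergence rises. The sketch below measures the fraction of a mutated sequence's k-mers still found in the original; k = 15, the random 10 kb sequence and the substitution rates are illustrative assumptions, and this is not MegaPath's method.

```python
import random


def kmer_set(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Apply random substitutions at the given per-base rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def shared_kmer_fraction(reference: str, query: str, k: int = 15) -> float:
    """Fraction of the query's distinct k-mers that also occur in the reference."""
    q = kmer_set(query, k)
    return len(q & kmer_set(reference, k)) / len(q) if q else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    ref = "".join(rng.choice("ACGT") for _ in range(10_000))
    for rate in (0.0, 0.01, 0.05, 0.10, 0.20):
        frac = shared_kmer_fraction(ref, mutate(ref, rate, rng))
        print(f"divergence {rate:.0%}: {frac:.2f} of 15-mers shared")
```

With independent substitutions at rate p, the expected surviving fraction is roughly (1 - p)^k, so at 10% divergence only about a fifth of 15-mers remain usable as exact seeds.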

The pathogenesis of diabetic nephropathy (DN) is accompanied by alterations in biological functions and signaling pathways regulated through complex molecular mechanisms. A number of regulatory factors, including transcription factors (TFs) and non-coding RNAs (ncRNAs, including lncRNAs and miRNAs), have been implicated in DN; however, it is unclear how interactions among these regulatory factors contribute to DN pathogenesis. In this study, we developed a network-based analysis that combines omics data with regulatory factor-target information to decipher the interplay between TFs and ncRNAs regulating DN progression.

Objective: We designed and tested a Nanopore sequencing panel for direct tuberculosis drug resistance profiling. The panel targeted 10 resistance-associated loci. We assessed the feasibility of amplifying and sequencing these loci from 23 clinical specimens with low bacillary burden.

Single-molecule sequencing technologies produce much longer reads than next-generation sequencing, greatly improving the contiguity of de novo genome assemblies. However, the relatively high error rates of long reads make it challenging to obtain high-quality assemblies, and a computationally intensive consensus step is needed to resolve discrepancies among the reads.
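
The consensus step can be illustrated in its simplest form as a column-wise majority vote over reads that are already aligned to one another, with "-" marking gaps. Real long-read polishers use far more elaborate models; the pre-aligned input and the tie-breaking rule here are simplifying assumptions.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned reads.

    '-' denotes a gap; a column whose winning symbol is a gap contributes
    nothing. Ties go to the symbol seen first in the column, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGT-A",
             "ACCTTA",   # one substitution and one insertion error
             "ACGT-A"]
    print(majority_consensus(reads))  # ACGTA
```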

To explore the potential utility of metagenomic sequencing for improving the etiologic diagnosis of infective endocarditis (IE) caused by fastidious bacteria. Plasma and heart valves of two patients, who were diagnosed with IE caused by and species, were sequenced using Illumina MiSeq and Nanopore MinION. For patient 1, was detected in the plasma pool collected 4 days before valvular replacement surgery.
