Background: Accurate data resources are essential for impactful medical research, but available structured datasets are often incomplete or inaccurate. Recent advances in open-weight large language models (LLMs) enable more accurate data extraction from unstructured text in electronic health records (EHRs) but have not yet been thoroughly validated for challenging diagnoses such as inflammatory bowel disease (IBD)-related neoplasia.
Objective: Create a validated approach using LLMs for identifying histopathologic diagnoses in pathology reports from the nationwide Veterans Health Administration database, including patients with genotype data within the Million Veteran Program (MVP) biobank.
Background: The risk of developing advanced neoplasia (AN; colorectal cancer and/or high-grade dysplasia) in ulcerative colitis (UC) patients with a low-grade dysplasia (LGD) lesion is variable and difficult to predict. This is a major challenge for effective clinical management.
Objective: We aimed to provide accurate AN risk stratification in UC patients with LGD.
Mathematical modeling of somatic evolution, a process impacting both host cells and microbial communities in the human body, can capture important dynamics driving carcinogenesis. Here we considered models for esophageal adenocarcinoma (EAC), a cancer that has dramatically increased in incidence over the past few decades in Western populations, with high case fatality rates due to late-stage diagnoses. Despite advancements in genomic analyses of the precursor Barrett's esophagus (BE), prevention of late-stage EAC remains a significant clinical challenge.
As next-generation sequencing technologies produce deeper genome coverages at lower costs, there is a critical need for reliable computational host DNA removal in metagenomic data. We find that insufficient host filtration using prior human genome references can introduce false sex biases and inadvertently permit flow-through of host-specific DNA during bioinformatic analyses, which could be exploited for individual identification. To address these issues, we introduce and benchmark three host filtration methods of varying throughput, with concomitant applications across low biomass samples such as skin and high microbial biomass datasets including fecal samples.
Nat Rev Gastroenterol Hepatol
November 2024
Digital twins provide a framework to advance the field of personalized medicine by generating clinically actionable strategies that leverage individualized data as well as current and emerging research. Strong interdisciplinary teamwork, specific funding mechanisms and integration of key biological details such as somatic evolution are necessary for the effective adoption of digital twins in medicine.
Bioinformatics
September 2023
Motivation: While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available DNA sequencing datasets at single-cell resolution enable the reconstruction of past evolution using mutational history, allowing for a better understanding of dynamics prior to detectable disease.
Autoimmunity and cancer represent two different aspects of immune dysfunction. Autoimmunity is characterized by breakdowns in immune self-tolerance, while impaired immune surveillance can allow for tumorigenesis. The class I major histocompatibility complex (MHC-I), which displays derivatives of the cellular peptidome for immune surveillance by CD8 T cells, serves as a common genetic link between these conditions.
Background: Long-term pouch surveillance outcomes for familial adenomatous polyposis (FAP) are unknown. We aimed to quantify surveillance outcomes and to determine which of selected possible predictive factors are associated with pouch dysplasia.
Methods: A retrospective analysis of collected data on 249 patients was performed, analyzing potential risk factors for the development of adenomas or advanced lesions (≥10 mm/high-grade dysplasia (HGD)/cancer) in the pouch body and cuff using Cox proportional hazards models.
Clinical archives of patient material near-exclusively consist of formalin-fixed and paraffin-embedded (FFPE) blocks. The ability to precisely characterise mutational signatures from FFPE-derived DNA has tremendous translational potential. However, sequencing of DNA derived from FFPE material is known to be riddled with artefacts.
Aliment Pharmacol Ther
April 2022
Background: Lynch syndrome (LS) is an autosomal dominant familial condition caused by a pathogenic variant (PV) in a DNA mismatch repair gene, which then predisposes carriers to various cancers.
Aim: To review the pathogenesis, clinical presentation, differential diagnosis and clinical strategies for detection and management of LS.
Methods: A narrative review was conducted, synthesising knowledge from published literature as well as current National Comprehensive Cancer Network guidelines for management of LS.
The presence and role of microbes in human cancers have come full circle in the last century. Tumors are no longer considered aseptic, but implications for cancer biology and oncology remain underappreciated. Opportunities to identify and build translational diagnostics, prognostics, and therapeutics that exploit cancer's second genome, the metagenome, are manifold, but require careful consideration of microbial experimental idiosyncrasies that are distinct from host-centric methods.
Esophageal adenocarcinoma (EAC) claims the lives of half of patients within the first year of diagnosis, and its incidence has rapidly increased since the 1970s despite extensive research into etiological factors. The changes in the microbiome within the distal esophagus in modern populations may help explain the growth in cases that other common EAC risk factors together cannot fully explain. The precursor to EAC is Barrett's esophagus (BE), a metaplasia adapted to a reflux-mediated microenvironment that can be challenging to diagnose in patients who do not undergo endoscopic screening.
Patterns of cancer incidence, viewed over extended time periods, reveal important aspects of multistage carcinogenesis. Here we show how a multistage clonal expansion (MSCE) model for cancer can be harnessed to identify biological processes that shape the surprisingly dynamic and disparate incidence patterns of esophageal squamous cell carcinoma (ESCC) in the US population. While the dramatic rise in esophageal adenocarcinoma (EAC) in the US has been largely attributed to reflux-related increases in the prevalence of Barrett's esophagus (BE), the premalignant field in which most EAC are thought to arise, only scant evidence exists for field cancerization contributing to ESCC.
Cancer screening and early detection efforts have been partially successful in reducing incidence and mortality, but many improvements are needed. Although current medical practice is informed by epidemiologic studies and experts, the decisions for guidelines are ultimately . We propose here that quantitative optimization of protocols can potentially increase screening success and reduce overdiagnosis.
Objective: Barrett's oesophagus (BE) is a known precursor to oesophageal adenocarcinoma (OAC) but current clinical data have not been consolidated to address whether BE is the origin of all incident OAC, which would reinforce evidence for BE screening efforts. We aimed to answer whether all expected prevalent BE, diagnosed and undiagnosed, could account for all incident OACs in the US cancer registry data.
Design: We used a multiscale computational model of OAC that includes the evolutionary process from normal oesophagus through BE in individuals from the US population.
Chromosomal instability (CIN) comprises continual gain and loss of chromosomes or parts of chromosomes and occurs in the majority of cancers, often conferring poor prognosis. Because of a scarcity of functional studies and poor understanding of how genetic or gene expression landscapes connect to specific CIN mechanisms, causes of CIN in most cancer types remain unknown. High-grade serous ovarian carcinoma (HGSC), the most common subtype of ovarian cancer, is the major cause of death due to gynecologic malignancy in the Western world, with chemotherapy resistance developing in almost all patients.
The desire to analyse limited amounts of biological material, historic samples and rare cell populations has collectively driven the need for efficient methods for whole genome sequencing (WGS) of limited amounts of poor quality DNA. Most protocols are designed to recover double-stranded DNA (dsDNA) by ligating sequencing adaptors to dsDNA with or without subsequent polymerase chain reaction amplification of the library. While this is sufficient for many applications, limited DNA requires a method that can recover both single-stranded DNA (ssDNA) and dsDNA.
Phys Biol
June 2019
Whether the nom de guerre is Mathematical Oncology, Computational or Systems Biology, Theoretical Biology, Evolutionary Oncology, Bioinformatics, or simply Basic Science, there is no denying that mathematics continues to play an increasingly prominent role in cancer research. Mathematical Oncology, defined here simply as the use of mathematics in cancer research, complements and overlaps with a number of other fields that rely on mathematics as a core methodology. As a result, Mathematical Oncology has a broad scope, ranging from theoretical studies to clinical trials designed with mathematical models.