Publications by Zachary B Abrams | LitMetric

Publications by authors named "Zachary B Abrams"


CeRTS: certainty retrieval token search in large language model clinical information extraction.

Lars E Schimmelpfennig, Kriti Bhattarai, Inez Y Oh, Jake Lever, Obi L Griffith, Zachary B Abrams

J Biomed Inform

August 2025

Objective: Large language models (LLMs) must effectively communicate their uncertainty to be viable in clinical settings. As such, the need for reliable uncertainty estimation grows increasingly urgent with the expanding use of LLMs for information extraction from electronic health records. Previous token-level uncertainty estimators have only used token probabilities within a single output sequence.
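The single-sequence, token-probability estimators that the abstract describes as prior work can be sketched as follows. This is only an illustration of that baseline idea, not the CeRTS method; the function name `sequence_confidence` and the example probabilities are hypothetical.

```python
import math


def sequence_confidence(token_probs):
    """Summarize per-token probabilities of ONE generated sequence.

    Returns (joint_probability, geometric_mean_probability).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    # Work in log space so long sequences do not underflow as quickly.
    log_probs = [math.log(p) for p in token_probs]
    joint = math.exp(sum(log_probs))
    # The geometric mean normalizes for length, so long and short
    # answers can be compared on the same scale.
    geometric_mean = math.exp(sum(log_probs) / len(log_probs))
    return joint, geometric_mean


# Assumed example: probabilities of the tokens in one extracted answer.
joint, geo_mean = sequence_confidence([0.5, 0.5])
```

Both scores depend only on the tokens of the one sequence that was generated, which is the limitation the abstract points out.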


Novel Recurrent Cytogenetic Abnormalities Predict Overall Survival in Tetraploid/Near-Tetraploid Myelodysplastic Syndrome and Acute Myeloid Leukemia.

Matthew R Avenarius, Zachary B Abrams, Ling Guo, James S Blachly, Cecelia R Miller

Cancers (Basel)

April 2025

Background/objectives: Tetraploidy (4n = 92 chromosomes) and near-tetraploidy (81-103 chromosomes) (T/NT) are uncommon cytogenetic events in MDS/AML (~1%). Abnormalities reported to be associated with T/NT MDS/AML include -5/del(5q), -7/del(7q), +8, and +21. However, other clinically relevant abnormalities likely remain "hidden" in long strings of ISCN cytogenetic nomenclature when evaluated visually.
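The chromosome-count ranges quoted in the abstract can be written as a small classifier. This is a minimal sketch: the thresholds come from the abstract, while the function name `classify_ploidy` and the "other" label are assumptions made for illustration.

```python
def classify_ploidy(chromosome_count):
    """Label a modal chromosome count using the ranges in the abstract.

    Hypothetical helper: 92 chromosomes is tetraploid (4n), and
    81-103 chromosomes (excluding exactly 92) is near-tetraploid.
    """
    if chromosome_count == 92:
        return "tetraploid"
    if 81 <= chromosome_count <= 103:
        return "near-tetraploid"
    # Everything else (e.g. a normal diploid count of 46) is out of scope.
    return "other"


labels = [classify_ploidy(n) for n in (46, 81, 92, 103, 104)]
```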


Transcriptome Complexity Disentangled: A Regulatory Molecules Approach.

Amir Asiaee, Zachary B Abrams, Heather H Pua, Kevin R Coombes

Int J Mol Sci

March 2025

Transcription factors (TFs) and microRNAs (miRNAs) are fundamental regulators of gene expression, cell state, and biological processes. This study investigated whether a small subset of TFs and miRNAs could accurately predict genome-wide gene expression. We analyzed 8895 samples across 31 cancer types from The Cancer Genome Atlas and identified 28 miRNA and 28 TF clusters using unsupervised learning.


SillyPutty: Improved clustering by optimizing the silhouette width.

Polina Bombina, Dwayne Tally, Zachary B Abrams, Kevin R Coombes

PLoS One

March 2025

Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered.
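The distinction between the mean silhouette width and the per-element scores can be illustrated with the standard silhouette definition, s(i) = (b - a) / max(a, b), where a is the mean distance from element i to its own cluster and b is the mean distance to the nearest other cluster. The sketch below is not the SillyPutty algorithm; the function name, the toy points, and the labels are assumptions, and Euclidean distance is assumed.

```python
from math import dist


def silhouette_widths(points, labels):
    """Return one silhouette width per point (needs at least 2 clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the closest cluster the point is NOT in.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scale = max(a, b)
        widths.append((b - a) / scale if scale > 0 else 0.0)
    return widths


# Toy data: the last point sits slightly closer to cluster 1 but is labeled 0.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (2.6, 0.5)]
labels = [0, 0, 1, 1, 0]
widths = silhouette_widths(points, labels)
mean_width = sum(widths) / len(widths)
```

In this toy example the mean width is positive, yet the last element has a negative width, which is the kind of per-element detail a global average hides.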


SillyPutty: Improved clustering by optimizing the silhouette width.

Polina Bombina, Dwayne Tally, Zachary B Abrams, Kevin R Coombes

bioRxiv

November 2023

Unsupervised clustering is an important task in biomedical science. We developed a new clustering method, called SillyPutty, for unsupervised clustering. As test data, we generated a series of datasets using the Umpire R package.


Leveraging GPT-4 for Identifying Cancer Phenotypes in Electronic Health Records: A Performance Comparison between GPT-4, GPT-3.5-turbo, Flan-T5 and spaCy's Rule-based & Machine Learning-based methods.

Kriti Bhattarai, Inez Y Oh, Jonathan Moran Sierra, Jonathan Tang, Philip R O Payne, Zachary B Abrams

bioRxiv

April 2024

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.


Electronic health record data quality assessment and tools: a systematic review.

Abigail E Lewis, Nicole Weiskopf, Zachary B Abrams, Randi Foraker, Albert M Lai

J Am Med Inform Assoc

September 2023

Objective: We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies.

Materials and Methods: We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript.


Transcriptome Complexity Disentangled: A Regulatory Molecules Approach.

Amir Asiaee, Zachary B Abrams, Heather H Pua, Kevin R Coombes

bioRxiv

March 2025

Transcription factors (TFs) and microRNAs (miRNAs) are fundamental regulators of gene expression, cell state, and biological processes. This study investigated whether a small subset of TFs and miRNAs could accurately predict genome-wide gene expression. We analyzed 8895 samples across 31 cancer types from The Cancer Genome Atlas and identified 28 miRNA and 28 TF clusters using unsupervised learning.


RCytoGPS: an R package for reading and visualizing cytogenetics data.

Zachary B Abrams, Dwayne G Tally, Lynne V Abruzzo, Kevin R Coombes

Bioinformatics

December 2021

Summary: Cytogenetics data, or karyotypes, are among the most common clinically used forms of genetic data. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses due to limitations in the ISCN text format and structure.


Simulation-derived best practices for clustering clinical data.

Caitlin E Coombes, Xin Liu, Zachary B Abrams, Kevin R Coombes, Guy Brock

J Biomed Inform

June 2021

Introduction: Clustering analyses in clinical contexts hold promise to improve the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type, clinical data.


Pattern recognition in lymphoid malignancies using CytoGPS and Mercator.

Zachary B Abrams, Dwayne G Tally, Lin Zhang, Caitlin E Coombes, Philip R O Payne

BMC Bioinformatics

March 2021

Background: There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data.


Mercator: a pipeline for multi-method, unsupervised visualization and distance generation.

Zachary B Abrams, Caitlin E Coombes, Suli Li, Kevin R Coombes

Bioinformatics

September 2021

Summary: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses.


Spatial cell type composition in normal and Alzheimers human brains is revealed using integrated mouse and human single cell RNA sequencing.

Travis S Johnson, Shunian Xiang, Bryan R Helm, Zachary B Abrams, Peter Neidecker

Sci Rep

October 2020

Single-cell RNA sequencing (scRNA-seq) resolves heterogenous cell populations in tissues and helps to reveal single-cell level function and dynamics. In neuroscience, the rarity of brain tissue is the bottleneck for such study. Evidence shows that mouse and human share similar cell type gene markers.


CytoGPS: A large-scale karyotype analysis of CML data.

Zachary B Abrams, Suli Li, Lin Zhang, Caitlin E Coombes, Philip R O Payne

Cancer Genet

October 2020

Karyotyping, the practice of visually examining and recording chromosomal abnormalities, is commonly used to diagnose diseases of genetic origin, including cancers. Karyotypes are recorded as text written in the International System for Human Cytogenetic Nomenclature (ISCN). Downstream analysis of karyotypes is conducted manually, due to the visual nature of analysis and the linguistic structure of the ISCN.


Unsupervised machine learning and prognostic factors of survival in chronic lymphocytic leukemia.

Caitlin E Coombes, Zachary B Abrams, Suli Li, Lynne V Abruzzo, Kevin R Coombes

J Am Med Inform Assoc

July 2020

Objective: Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes.


A protocol to evaluate RNA sequencing normalization methods.

Zachary B Abrams, Travis S Johnson, Kun Huang, Philip R O Payne, Kevin Coombes

BMC Bioinformatics

December 2019

Background: RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. However, sequencing technologies often unintentionally introduce experimental error into RNA sequencing data. To counteract this, normalization methods are standardly applied with the intent of reducing the non-biologically derived variability inherent in transcriptomic measurements.


Explaining Gene Expression Using Twenty-One MicroRNAs.

Amir Asiaee, Zachary B Abrams, Samantha Nakayiza, Deepa Sampath, Kevin R Coombes

J Comput Biol

July 2020

The transcriptome of a tumor contains detailed information about the disease. Although advances in sequencing technologies have generated larger data sets, there are still many questions about exactly how the transcriptome is regulated. One class of regulatory elements consists of microRNAs (or miRs), many of which are known to be associated with cancer.


Time-to-progression after front-line fludarabine, cyclophosphamide, and rituximab chemoimmunotherapy for chronic lymphocytic leukaemia: a retrospective, multicohort study.

Carmen D Herling, Kevin R Coombes, Axel Benner, Johannes Bloehdorn, Lynn L Barron, Zachary B Abrams

Lancet Oncol

November 2019

Background: Fludarabine, cyclophosphamide, and rituximab (FCR) has become a gold-standard chemoimmunotherapy regimen for patients with chronic lymphocytic leukaemia. However, the question remains of how to treat treatment-naive patients with IGHV-unmutated chronic lymphocytic leukaemia. We therefore aimed to develop and validate a gene expression signature to identify which of these patients are likely to achieve durable remissions with FCR chemoimmunotherapy.


CytoGPS: a web-enabled karyotype analysis tool for cytogenetics.

Zachary B Abrams, Lin Zhang, Lynne V Abruzzo, Nyla A Heerema, Suli Li

Bioinformatics

December 2019

Summary: Karyotype data are the most common form of genetic data that is regularly used clinically. They are collected as part of the standard of care in many diseases, particularly in pediatric and cancer medicine contexts. Karyotypes are represented in a unique text-based format, with a syntax defined by the International System for human Cytogenetic Nomenclature (ISCN).



Inferring clonal heterogeneity in cancer using SNP arrays and whole genome sequencing.

Mark R Zucker, Lynne V Abruzzo, Carmen D Herling, Lynn L Barron, Michael J Keating, Zachary B Abrams

Bioinformatics

September 2019

Motivation: Clonal heterogeneity is common in many types of cancer, including chronic lymphocytic leukemia (CLL). Previous research suggests that the presence of multiple distinct cancer clones is associated with clinical outcome. Detection of clonal heterogeneity from high throughput data, such as sequencing or single nucleotide polymorphism (SNP) array data, is important for gaining a better understanding of cancer and may improve prediction of clinical outcome or response to treatment.


Thirty biologically interpretable clusters of transcription factors distinguish cancer type.

Zachary B Abrams, Mark Zucker, Min Wang, Amir Asiaee Taheri, Lynne V Abruzzo

BMC Genomics

October 2018

Background: Transcription factors are essential regulators of gene expression and play critical roles in development, differentiation, and in many cancers. To carry out their regulatory programs, they must cooperate in networks and bind simultaneously to sites in promoter or enhancer regions of genes. We hypothesize that the mRNA co-expression patterns of transcription factors can be used both to learn how they cooperate in networks and to distinguish between cancer types.


Thresher: determining the number of clusters while removing outliers.

Min Wang, Zachary B Abrams, Steven M Kornblau, Kevin R Coombes

BMC Bioinformatics

January 2018

Background: Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier problem.


Lack of human cytomegalovirus expression in single cells from glioblastoma tumors and cell lines.

Travis S Johnson, Zachary B Abrams, Xiaokui Mo, Yan Zhang, Kun Huang

J Neurovirol

October 2017

The relationship between human cytomegalovirus (HCMV) and glioblastoma (GBM) is an ongoing debate with extensive evidence supporting or refuting its existence through molecular assays, pre-clinical studies, and clinical trials. We focus primarily on the crux of the debate, detection of HCMV in GBM samples using molecular assays. We propose that these differences in detection could be affected by cellular heterogeneity.


Plasma MicroRNA Levels Following Resection of Metastatic Melanoma.

Nicholas Latchana, Zachary B Abrams, J Harrison Howard, Kelly Regan, Naduparambil Jacob

Bioinform Biol Insights

February 2017

Melanoma remains the leading cause of skin cancer-related deaths. Surgical resection and adjuvant therapies can result in disease-free intervals for stage III and stage IV disease; however, recurrence is common. Understanding microRNA (miR) dynamics following surgical resection of melanomas is critical to accurately interpret miR changes suggestive of melanoma recurrence.
