Publications by authors named "Jignesh R Parikh"

Article Synopsis
  • Effective genetic diagnosis relies on linking genetic data to detailed clinical information, but manual data entry is time-consuming and prone to bias.
  • Natural language processing (NLP) can streamline this process, but variations in physician notes pose challenges; our methods improve NLP outputs for more accurate automatic diagnosis.
  • We developed a filtering system that improves gene prioritization by using optimized extracted terms; in 92% of cases, NLP could replace manual extraction, and in 75% of cases, the correct gene ranked higher with the filters applied.
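
To make the filtering idea concrete, here is a minimal sketch. It assumes that the NLP step emits one record per phenotype term with a negation flag and a mention count, and that genes are ranked by simple overlap with annotated phenotypes. The record fields, the filter rule, and the gene and term labels are all hypothetical and are not taken from the published system.

```python
from typing import Dict, List, Set

# Hypothetical NLP output: one record per extracted phenotype term.
terms = [
    {"id": "seizure", "negated": False, "count": 2},
    {"id": "fever", "negated": True, "count": 2},  # e.g. "no fever" in the note
    {"id": "rash", "negated": True, "count": 1},
]

# Hypothetical gene-to-phenotype annotations.
annotations: Dict[str, Set[str]] = {
    "GENE_A": {"fever", "rash"},
    "GENE_B": {"seizure"},
}


def filter_terms(records: List[dict], min_count: int = 1) -> Set[str]:
    """Keep affirmed (non-negated) terms mentioned at least min_count times."""
    return {r["id"] for r in records if not r["negated"] and r["count"] >= min_count}


def rank_genes(patient_terms: Set[str], gene_terms: Dict[str, Set[str]]) -> List[str]:
    """Order genes by how many patient terms they share; ties broken by gene name."""
    scores = {gene: len(patient_terms & ann) for gene, ann in gene_terms.items()}
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


raw = {r["id"] for r in terms}
print(rank_genes(raw, annotations))                  # ['GENE_A', 'GENE_B']
print(rank_genes(filter_terms(terms), annotations))  # ['GENE_B', 'GENE_A']
```

In this toy case, dropping negated mentions removes the spurious matches that pushed GENE_A ahead of GENE_B.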

Background: Application of novel machine learning approaches to electronic health record (EHR) data could provide valuable insights into disease processes. We utilized this approach to build predictive models for progression to prediabetes and type 2 diabetes (T2D).

Methods: Using a novel analytical platform (Reverse Engineering and Forward Simulation [REFS]), we built prediction model ensembles for progression to prediabetes or T2D from an aggregated EHR data sample.

Alternative RNA splicing (AS) regulates proteome diversity, including isoform-specific expression of several pluripotency genes. Here, we integrated global gene expression and proteomic analyses and identified a molecular signature suggesting a central role for AS in maintaining human pluripotent stem cell (hPSC) self-renewal. We demonstrate that the splicing factor SFRS2 is an OCT4 target gene required for pluripotency.

Etiologies for many inner ear disorders, including autoimmune inner ear disease, sudden sensorineural hearing loss, and Meniere's disease, remain unknown. Indirect evidence suggests an immune-mediated process involving an allergic or autoimmune mechanism. We examined whether known immunogenic proteins share sequence similarity with inner ear proteins, which may lead to cross-reactivity and detrimental immune activation.
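
One simple way to screen for shared sequence is to look for identical short peptide windows (k-mers) present in both an immunogenic protein and an inner ear protein. The sketch below is a simplified stand-in for proper alignment tools such as BLAST and is not the study's method; the sequences, protein names, and the window length k = 6 are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k windows of a protein sequence (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a: str, b: str, k: int) -> Set[str]:
    """Identical length-k peptides found in both sequences."""
    return kmers(a, k) & kmers(b, k)


def screen(immunogens: Dict[str, str], targets: Dict[str, str],
           k: int = 6) -> List[Tuple[str, str, List[str]]]:
    """Report every immunogen/target pair that shares at least one k-mer."""
    hits = []
    for (iname, iseq), (tname, tseq) in product(immunogens.items(), targets.items()):
        common = shared_kmers(iseq, tseq, k)
        if common:
            hits.append((iname, tname, sorted(common)))
    return hits


# Hypothetical toy sequences, not real proteins.
immunogens = {"antigen_1": "MKTAYIAKQR"}
targets = {"ear_protein_1": "GGKTAYIAKLL", "ear_protein_2": "PPPPPPPP"}
print(screen(immunogens, targets, k=6))
# [('antigen_1', 'ear_protein_1', ['KTAYIA', 'TAYIAK'])]
```

Exact k-mer matching misses conservative substitutions, which is one reason scoring-matrix alignment is normally preferred for this kind of screen.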

Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships.
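
A minimal way to build such a network is to connect two gene sets when they share enough members. The sketch below uses the Jaccard overlap of gene membership as a proxy for functional dependency; that choice is an assumption, since crosstalk can also be defined through interactions between member genes. Set names, genes, and the 0.2 threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared members divided by total distinct members (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_set_network(gene_sets: Dict[str, Set[str]],
                     threshold: float = 0.2) -> Dict[Tuple[str, str], float]:
    """Return weighted edges between gene sets whose overlap meets the threshold."""
    edges = {}
    for (name1, genes1), (name2, genes2) in combinations(sorted(gene_sets.items()), 2):
        weight = jaccard(genes1, genes2)
        if weight >= threshold:
            edges[(name1, name2)] = weight
    return edges


# Hypothetical gene sets standing in for curated pathways.
gene_sets = {
    "set_A": {"g1", "g2", "g3", "g4"},
    "set_B": {"g3", "g4", "g5"},
    "set_C": {"g6", "g7"},
}
print(gene_set_network(gene_sets))  # {('set_A', 'set_B'): 0.4}
```

Here set_C shares no genes with the others, so it would appear as an isolated node in the network.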

Mass spectrometry has become the method of choice for proteome characterization in biological samples, including multicomponent protein complexes (typically tens to hundreds of proteins) and total protein expression (up to tens of thousands of proteins). Qualitative sequence assignment based on MS/MS spectra is relatively well defined, whereas statistical metrics for relative quantification have not yet fully stabilized. Nonetheless, proteomics studies have progressed to the point where various gene-, pathway-, or network-oriented computational frameworks may be used to place mass spectrometry data into biological context.
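
As one example of a relative quantification metric, the normalized spectral abundance factor (NSAF) divides each protein's spectral count by its length and then normalizes across all proteins in the sample: NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j). The sketch below computes it for hypothetical counts and lengths; it illustrates one established spectral-counting approach and is not presented as the metric used in the work summarized here.

```python
from typing import Dict, Tuple


def nsaf(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """counts maps protein -> (spectral count, sequence length in residues)."""
    # Spectral abundance factor: length-corrected spectral count.
    saf = {p: spc / length for p, (spc, length) in counts.items()}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra observed in this sample")
    # Normalize so the values in one sample sum to 1.
    return {p: value / total for p, value in saf.items()}


# Hypothetical spectral counts and protein lengths.
sample = {"P1": (10, 100), "P2": (20, 400), "P3": (5, 50)}
for protein, value in nsaf(sample).items():
    print(protein, round(value, 3))
```

Length correction matters because longer proteins yield more peptides and therefore more spectra; P2 has the most spectra here but the lowest NSAF.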

Background: Motivated by the precarious state of the world's coral reefs, there is currently a keen interest in coral transcriptomics. By identifying changes in coral gene expression that are triggered by particular environmental stressors, we can begin to characterize coral stress responses at the molecular level, which should lead to the development of more powerful diagnostic tools for evaluating the health of corals in the field. Furthermore, the identification of genetic variants that are more or less resilient in the face of particular stressors will help us to develop more reliable prognoses for particular coral populations.

High-throughput gene-expression studies produce lists of differentially expressed genes. Most current meta-analyses of these lists test whether the encoded proteins are significantly overrepresented in particular signaling pathways. However, such membership enrichment algorithms do not reveal which pathways caused the genes to be differentially expressed in the first place.
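
Membership enrichment tests of this kind are commonly based on the hypergeometric distribution: with N genes measured, K of them in a pathway, and n differentially expressed genes of which k fall in the pathway, the p-value is P(X >= k) = sum_{i=k}^{min(K,n)} C(K,i) C(N-K,n-i) / C(N,n). A minimal sketch with made-up counts follows; the function name and numbers are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n draws)."""
    upper = min(K, n)
    # Sum exact integer counts first, then divide once to limit rounding error.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 20 genes measured, 5 in the pathway,
# 4 differentially expressed, 3 of which are in the pathway.
p = hypergeom_enrichment_p(N=20, K=5, n=4, k=3)
print(f"p = {p:.4f}")  # p = 0.0320
```

A small p-value says only that the pathway is overrepresented in the list; as noted above, it does not identify which pathway drove the expression changes.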

Background: Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms that assign peptide sequence to MS/MS spectra, and annotation for proteins and pathways from various database sources. Moreover, proteomics technologies and experimental methods are not yet standardized; hence a high degree of flexibility is necessary for efficient support of high- and low-throughput data analytic tasks. Development of a desktop environment that is sufficiently robust for deployment in data analytic pipelines, and simultaneously supports customization for programmers and non-programmers alike, has proven to be a significant challenge.

Proteomics-based analysis of signaling cascades relies on a growing suite of affinity resins and methods aimed at efficient enrichment of phosphorylated peptides from complex biological mixtures. Given the heterogeneity of phosphopeptides and the overlap in chemical properties between phosphorylated and unmodified peptides, it is likely that the use of multiple resins will provide the best combination of specificity, yield, and coverage for large-scale proteomics studies. Recently, titanium dioxide and zirconium dioxide have been used successfully to enrich phosphopeptides.