How can natural language processing help model informed drug development?: a review.

Roopal Bhatnagar , Sakshi Sardar , Maedeh Beheshti , Jagdeep T Podichetty

JAMIA Open

Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA.

Published: July 2022

Objective: To summarize applications of natural language processing (NLP) in model informed drug development (MIDD) and identify potential areas of improvement.

Materials And Methods: Sources included publications found on PubMed and Google Scholar, as well as websites and GitHub repositories for NLP libraries and models. Publications describing applications of NLP in MIDD were reviewed. The applications were stratified into 3 stages: drug discovery, clinical trials, and pharmacovigilance. Key NLP functionalities used for these applications were assessed. Programming libraries and open-source resources for the implementation of NLP functionalities in MIDD were identified.

Results: NLP has been used to aid various processes in the drug development lifecycle, such as gene-disease mapping, biomarker discovery, patient-trial matching, and adverse drug event detection. These applications commonly use the NLP functionalities of named entity recognition, word embeddings, entity resolution, assertion status detection, relation extraction, and topic modeling. The current state of the art for implementing these functionalities in MIDD applications is transformer models that use transfer learning for enhanced performance. Various libraries in Python, R, and Java, such as Hugging Face, Spark NLP, and koRpus, as well as open-source platforms such as DisGeNET, DeepEnroll, and Transmol, have enabled convenient implementation of NLP models in MIDD applications.

Discussion: Challenges such as reproducibility, explainability, fairness, limited data, limited language support, and security need to be overcome to ensure wider adoption of NLP in the MIDD landscape. There are opportunities to improve the performance of existing models and expand the use of NLP in newer areas of MIDD.

Conclusions: This review provides an overview of the potential and pitfalls of current NLP approaches in MIDD.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188322
DOI: http://dx.doi.org/10.1093/jamiaopen/ooac043

Similar Publications

A natural language processing pipeline for identifying pediatric long COVID symptoms and functional impacts in freeform clinical notes: a RECOVER study.

JAMIA Open

October 2025

Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, PA 19104, United States.

H Timothy Bunnell , Cara Reedy , Vitaly Lorman , Ravi Jhaveri , Andrea Rivera-Sepulveda

Objective: To develop a natural language processing (NLP) pipeline for unstructured electronic health record (EHR) data to identify symptoms and functional impacts associated with Long COVID in children.

Materials And Methods: We analyzed 48 287 outpatient progress notes from 10 618 pediatric patients from 12 institutions. We evaluated notes obtained 28 to 179 days after a COVID-19 diagnosis or positive test.

OSNRT1.1B-OSCNGC14/16-CA-OSNLP3 Pathway: Phosphorylation-Mediated Maintenance of Nitrogen Homeostasis.

Adv Sci (Weinh)

September 2025

Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.

Xiaohan Wang , Yongqiang Liu , Weiwei Li , Xiaojun Ma , Wei Wang

Nitrate, a crucial nutrient and signaling molecule, is extensively studied across plants. Although the NRT1.1-NLP-centered pathway dominates nitrate signaling in Arabidopsis and rice, it remains unclear whether there is functional interaction or co-regulation between the primary nitrate response (PNR) and long-term nitrogen utilization.

Identification and expression characteristics of NIN-like protein (NLP) gene family in cucumber plant (Cucumis sativus L.).

BMC Plant Biol

August 2025

Henan Agricultural University, Zhengzhou, Henan, 450002, China.

Yang Li , Yijing Xing , Ruohan Jin , Luyu Li , Mengwei Huang

Background: Nodule inception-like Protein (NLP) family genes, as transcription factors, play an important role in regulating physiological responses of plants to adapt to external nitrogen environment changes. However, there are few reports on NLP genes in cucumber.

Results: In this study, we identified members of the cucumber NLP gene family, conducted a comprehensive bioinformatics analysis of them, analyzed their evolutionary relationships, and predicted their potential functions.

Enhancing and Not Replacing Clinical Expertise: Improving Named-Entity Recognition in Colonoscopy Reports Through Mixed Real-Synthetic Training Sources.

J Pers Med

July 2025

Department M4-Clinical Sciences, Gastroenterology Medical VII, George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Targu Mures, 540139 Targu Mures, Romania.

Andrei-Constantin Ioanovici , Andrei-Marian Feier , Marius-Ștefan Mărușteri , Alina-Dia Trâmbițaș-Miron , Daniela-Ecaterina Dobru

In routine practice, colonoscopy findings are saved as unstructured free text, limiting secondary use. Accurate named-entity recognition (NER) is essential to unlock these descriptions for quality monitoring, personalized medicine and research. We compared NER models trained on real, synthetic, and mixed data to determine whether privacy-preserving synthetic reports can boost clinical information extraction.

Healthcare professional classification of "poor glucose control" and perinatal outcomes in pregnancies with diabetes: a retrospective cohort study.

BMJ Open Diabetes Res Care

August 2025

Obstetrics, Gynecology and Women's Health, University of Minnesota, Minneapolis, Minnesota, USA

Anwei Gwan , Isai Ortiz , Katelyn M Tessier , Renee Mahr , Anna Ayers Looby

Introduction: Early birth is often recommended for "poorly controlled" diabetes; however, no guidelines define the glycemic threshold that necessitates delivery. We use natural language processing (NLP) of electronic health records to identify individuals described by healthcare professionals as having "poor glucose control" and to examine the factors and outcomes associated with this categorization.

Research Design And Methods: We completed a retrospective cohort study of pregnant individuals with pre-existing and gestational diabetes mellitus from 2018 to 2019. NLP identified prespecified terms indicating "poor glucose control" in clinical notes, and a cohort analysis compared those with and without "poor glucose control" language.
