Background: Mortality is a critical variable in health care research, especially for evaluating medical product safety and effectiveness. However, inconsistencies in the availability and timeliness of death date and cause of death (CoD) information present significant challenges. Conventional sources such as the National Death Index and electronic health records often experience data lags, missing fields, or incomplete coverage, limiting their utility in time-sensitive or large-scale studies.
Background: The Kansas City Cardiomyopathy Questionnaire-12 (KCCQ-12), a patient-reported outcome measure for adults with heart failure, is associated with hospitalizations and mortality in clinical trials. Curated data sets from controlled trials differ substantially from pragmatic data collected from real-world settings, however, and few data exist on the KCCQ-12's predictive utility in clinical practice.
Objectives: This study sought to evaluate the predictive utility of the KCCQ-12 for hospitalizations and mortality when administered during outpatient heart failure care.
JAMA Netw Open
August 2024
Importance: The Sentinel System is a key component of the US Food and Drug Administration (FDA) postmarketing safety surveillance commitment and uses clinical health care data to conduct analyses to inform drug labeling and safety communications, FDA advisory committee meetings, and other regulatory decisions. However, observational data are frequently deemed insufficient for reliable evaluation of safety concerns owing to limitations in underlying data or methodology. Advances in large language models (LLMs) provide new opportunities to address some of these limitations.
J Am Med Inform Assoc
October 2024
Objectives: Large language models (LLMs) have demonstrated remarkable success in natural language processing (NLP) tasks. This study aimed to evaluate their performance on social media-based health-related text classification tasks.
Materials and Methods: We benchmarked 1 Support Vector Machine (SVM), 3 supervised pretrained language models (PLMs), and 2 LLM-based classifiers across 6 text classification tasks.
Stud Health Technol Inform
January 2024
Few-shot learning (FSL) is a category of machine learning methods designed for problems in which only small amounts of labeled data are available for training. FSL research progress in natural language processing (NLP), particularly within the medical domain, has been notably slow, primarily due to the greater difficulties posed by domain-specific characteristics and data sparsity. We explored the use of novel methods for text representation and encoding combined with distance-based measures for improving FSL entity detection.
J Biomed Inform
August 2023
Background: Few-shot learning (FSL) is a class of machine learning methods that require small numbers of labeled instances for training. With many medical topics having limited annotated text-based data in practical settings, FSL-based natural language processing (NLP) holds substantial promise. We aimed to conduct a review to explore the current state of FSL methods for medical NLP.
Background: The Fontan operation is associated with significant morbidity and premature mortality. Fontan cases cannot always be identified by () codes, making it challenging to create large Fontan patient cohorts. We sought to develop natural language processing-based machine learning models to automatically detect Fontan cases from free text in electronic health records and to compare their performance with code-based classification.
Background: Social media has served as a lucrative platform for spreading misinformation and for promoting fraudulent products for the treatment, testing, and prevention of COVID-19. This has resulted in the issuance of many warning letters by the US Food and Drug Administration (FDA). While social media continues to serve as the primary platform for the promotion of such fraudulent products, it also presents the opportunity to identify these products early by using effective social media mining methods.
Intimate partner violence (IPV) increased during the COVID-19 pandemic. Collecting actionable IPV-related data from conventional sources (e.g.
Intimate partner violence (IPV) is a preventable public health problem that affects millions of people worldwide. Approximately one in four women are estimated to be or have been victims of severe violence at some point in their lives, irrespective of age, ethnicity, and economic status. Victims often report IPV experiences on social media, and automatic detection of such reports via machine learning may enable improved surveillance and targeted distribution of support and/or interventions for those in need.
Proc Natl Acad Sci U S A
February 2023
Traditional substance use (SU) surveillance methods, such as surveys, incur substantial lags. Due to the continuously evolving trends in SU, insights obtained via such methods are often outdated. Social media-based sources have been proposed for obtaining timely insights, but methods leveraging such data cannot typically provide fine-grained statistics about subpopulations, unlike traditional approaches.
Americans bear a high chronic stress burden, particularly during the COVID-19 pandemic. Although social media have many strengths to complement the weaknesses of conventional stress measures, including surveys, they have been rarely utilized to detect individuals self-reporting chronic stress. Thus, this study aimed to develop and evaluate an automatic system on Twitter to identify users who have self-reported chronic stress experiences.
The COVID-19 pandemic is the most devastating public health crisis in at least a century and has affected the lives of billions of people worldwide in unprecedented ways. Compared to pandemics of this scale in the past, societies are now equipped with advanced technologies that can mitigate the impacts of pandemics if utilized appropriately. However, opportunities are currently not fully utilized, particularly at the intersection of data science and health.
Eur J Public Health
November 2022
Illicit or 'designer' benzodiazepines are a growing contributor to overdose deaths. We employed natural language processing (NLP) to study benzodiazepine mentions over 10 years on 270 online drug forums (subreddits) on Reddit. Using NLP, we automatically detected mentions of illicit and prescription benzodiazepines, including their misspellings and non-standard names, grouping relative mentions by quarter.
Healthcare (Basel)
August 2022
Pretrained contextual language models proposed in the recent past have been reported to achieve state-of-the-art performances in many natural language processing (NLP) tasks, including those involving health-related social media data. We sought to evaluate the effectiveness of different pretrained transformer-based models for social media-based health-related text classification tasks. An additional objective was to explore and propose effective pretraining strategies to improve machine learning performance on such datasets and tasks.
AMIA Jt Summits Transl Sci Proc
June 2023
We investigated the utility of Twitter for conducting multi-faceted geolocation-centric pandemic surveillance, using India as an example. We collected over 4 million COVID-19-related tweets about the Indian outbreak between January and July 2021. We geolocated the tweets, applied natural language processing to characterize the tweets (eg.
Proc (IEEE Int Conf Healthc Inform)
June 2022
Many research problems involving medical texts have limited amounts of annotated data available (e.g., expressions of rare diseases). Traditional supervised machine learning algorithms, particularly those based on deep neural networks, require large volumes of annotated data, and they underperform when only small amounts of labeled data are available.
Background: Despite recent rises in fatal overdoses involving multiple substances, there is a paucity of knowledge about stimulant co-use patterns among people who use opioids (PWUO) or people being treated with medications for opioid use disorder (PTMOUD). A better understanding of the timing and patterns in stimulant co-use among PWUO based on mentions of these substances on social media can help inform prevention programs, policy, and future research directions. This study examines stimulant co-mention trends among PWUO/PTMOUD on social media over multiple years.
Background: The behaviors and emotions associated with and reasons for nonmedical prescription drug use (NMPDU) are not well-captured through traditional instruments such as surveys and insurance claims. Publicly available NMPDU-related posts on social media can potentially be leveraged to study these aspects unobtrusively and at scale.
Methods: We applied a machine learning classifier to detect self-reports of NMPDU on Twitter and extracted all public posts of the associated users.
Front Digit Health
December 2020
As the volume of published medical research continues to grow rapidly, staying up-to-date with the best-available research evidence regarding specific topics is becoming an increasingly challenging problem for medical experts and researchers. The current COVID19 pandemic is a good example of a topic on which research evidence is rapidly evolving. Automatic query-focused text summarization approaches may help researchers to swiftly review research evidence by presenting salient and query-relevant information from newly-published articles in a condensed manner.
The capabilities of natural language processing (NLP) methods have expanded significantly in recent years, and progress has been particularly driven by advances in data science and machine learning. However, NLP is still largely underused in patient-oriented clinical research and care (POCRC). A key reason behind this is that clinical NLP methods are typically developed, optimized, and evaluated with narrowly focused data sets and tasks (eg, those for the detection of specific symptoms in free texts).
The spread of COVID-19 worldwide continues despite multidimensional efforts to curtail its spread and provide treatment. Efforts to contain the COVID-19 pandemic have triggered partial or full lockdowns across the globe. This paper presents a novel framework that intelligently combines machine learning models and the Internet of Things (IoT) technology specifically to combat COVID-19 in smart cities.
Objective: Biomedical research involving social media data is gradually moving from population-level to targeted, cohort-level data analysis. Though crucial for biomedical studies, social media users' demographic information (eg, gender) is often not explicitly known from profiles. Here, we present an automatic gender classification system for social media and we illustrate how gender information can be incorporated into a social media-based health-related study.