Importance: Limited qualitative studies exist evaluating ambient artificial intelligence (AI) scribe tools. Such studies can provide deeper insights into ambient AI implementations by capturing lived experiences.
Objective: To evaluate physician perspectives on ambient AI scribes.
With rapidly evolving artificial intelligence solutions, healthcare organizations need an implementation roadmap. A "clinical trials" informed approach can promote safe and impactful implementation of artificial intelligence. This framework includes four phases: (1) Safety; (2) Efficacy; (3) Effectiveness and comparison to an existing standard; and (4) Monitoring.
Background: Generative AI, particularly large language models (LLMs), holds great potential for improving patient care and operational efficiency in healthcare. However, the use of LLMs is complicated by regulatory concerns around data security and patient privacy. This study aimed to develop and evaluate a secure infrastructure that allows researchers to safely leverage LLMs in healthcare while ensuring HIPAA compliance and promoting equitable AI.
J Am Med Inform Assoc
February 2025
Objectives: To quantify utilization and impact on documentation time of a large language model-powered ambient artificial intelligence (AI) scribe.
Materials And Methods: This prospective quality improvement study was conducted at a large academic medical center with 45 physicians from 8 ambulatory disciplines over 3 months. Utilization and documentation times were derived from electronic health record (EHR) use measures.
J Am Med Inform Assoc
February 2025
Objective: This study evaluates the pilot implementation of ambient AI scribe technology to assess physician perspectives on usability and the impact on physician burden and burnout.
Materials And Methods: This prospective quality improvement study was conducted at Stanford Health Care with 48 physicians over a 3-month period. Outcome measures included burden, burnout, usability, and perceived time savings.
Introduction: Influenza causes significant mortality and morbidity in the U.S., yet less than half of adults receive influenza vaccination.
Importance: Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas.
Objective: To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty.
Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024.
The increasing interest in leveraging generative AI models in healthcare necessitates secure infrastructure at academic medical centers. Without an all-encompassing secure system, researchers may create their own insecure microprocesses, risking the exposure of protected health information (PHI) to the public internet or its inadvertent incorporation into AI model training. To address these challenges, our institution implemented a secure pathway to the Azure OpenAI Service using our own private OpenAI instance which we fully control to facilitate high-throughput, secure LLM queries.
JAMA Netw Open
March 2024
Importance: The emergence and promise of generative artificial intelligence (AI) represent a turning point for health care. Rigorous evaluation of generative AI deployment in clinical practice is needed to inform strategic decision-making.
Objective: To evaluate the implementation of a large language model used to draft responses to patient messages in the electronic inbox.
Background: Despite the ubiquitous utilization of central venous catheters in clinical practice, their use commonly provokes thromboembolism. No prophylactic strategy has shown sufficient efficacy to justify routine use. Coagulation factors FXI (factor XI) and FXII (factor XII) represent novel targets for device-associated thrombosis, which may mitigate bleeding risk.
Objectives: Tertiary and quaternary (TQ) care refers to complex cases requiring highly specialized health services. Our study aimed to compare the ability of a natural language processing (NLP) model to an existing human workflow in predictively identifying TQ cases for transfer requests to an academic health center.
Materials And Methods: Data on interhospital transfers were queried from the electronic health record for the 6-month period from July 1, 2020 to December 31, 2020.
Importance: There is increased interest in and potential benefits from using large language models (LLMs) in medicine. However, by simply wondering how the LLMs and the applications powered by them will reshape medicine instead of getting actively involved, the agency in shaping how these tools can be used in medicine is lost.
Observations: Applications powered by LLMs are increasingly used to perform medical tasks without the underlying language model being trained on medical records and without verifying their purported benefit in performing those tasks.
Objective: To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research.
Materials And Methods: The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA compliant secure computing infrastructure supported by in-depth user training.
The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. In this narrative review, we examine 84 foundation models trained on non-imaging EMR data (i.
Although considered "benign," mild blood count abnormalities, genetic factors imparting inconsequential thrombotic risk, and low-risk premalignant blood disorders can have significant psychological and financial impact on our patients. Several studies have demonstrated that patients with noncancerous conditions have increased levels of anxiety with distress similar to those with malignancy. Additionally, referral to a classical hematologist can be a daunting process for many patients due to uncertainties surrounding the reason for referral or misconstrued beliefs in a cancer diagnosis ascribed to the pairing of oncology and hematology in medical practice.
Front Digit Health
September 2022
Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a lack of operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration as well as a fairness audit based on summary statistics, subgroup performance and subgroup calibration.
Importance: Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied.
Among 880 healthcare workers with a positive severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) test, 264 (30.0%) infections were identified following receipt of at least 1 vaccine dose. Median SARS-CoV-2 cycle threshold values were highest among individuals receiving 2 vaccine doses, corresponding to lower viral shedding.