Data-Driven Safe Policy Optimization for Black-Box Dynamical Systems With Temporal Logic Specifications.

Chenlin Zhang, Shijun Lin, Hao Wang, Ziyang Chen, Shaochen Wang, Zhen Kan

IEEE Trans Neural Netw Learn Syst

Published: February 2025

Learning-based policy optimization methods have shown great potential for building general-purpose control systems. However, existing methods still struggle to achieve complex task objectives while ensuring policy safety during both learning and execution for black-box systems. To address these challenges, we develop data-driven safe policy optimization (D2SPO), a novel reinforcement learning (RL)-based policy improvement method that jointly learns a control barrier function (CBF) for system safety and a linear temporal logic (LTL)-guided RL algorithm for complex task objectives. Unlike many existing works that assume known system dynamics, D2SPO learns a provably safe CBF for black-box dynamical systems by carefully constructing the data sets and redesigning the loss functions; the CBF continuously evolves for improved system safety as RL interacts with the environment. To deal with complex task objectives, we exploit the ability of LTL to represent task progress and develop an LTL-guided RL policy for efficient completion of various tasks with LTL objectives. Extensive numerical and experimental studies demonstrate that D2SPO outperforms most state-of-the-art (SOTA) baselines and achieves a safety rate of over 95% and a task completion rate of nearly 100%. The experiment video is available at https://youtu.be/2RgaH-zcmkY.
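To make the idea of learning a CBF from data alone more concrete, the following is a minimal sketch of one common way to score a candidate barrier function using only sampled states and observed transitions, with no dynamics model. It is not the loss used in D2SPO, which the abstract does not specify. The function names (`cbf_hinge_loss`, `h_disc`), the discrete-time decrease condition h(x') >= (1 - alpha) * h(x), and the example data are all assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]


def cbf_hinge_loss(
    h: Callable[[State], float],
    safe: Sequence[State],
    unsafe: Sequence[State],
    transitions: Sequence[Tuple[State, State]],
    alpha: float = 0.1,
) -> float:
    """Hinge-style penalty for a candidate barrier function h (illustrative only).

    Three conditions are checked on data, without a dynamics model:
      1. h(x) >= 0 on states labeled safe,
      2. h(x) <= 0 on states labeled unsafe,
      3. h(x') >= (1 - alpha) * h(x) on observed transitions (x, x').
    A loss of zero means no sampled point violates any condition.
    """
    loss = 0.0
    # Penalize safe samples where h is negative.
    loss += sum(max(0.0, -h(x)) for x in safe)
    # Penalize unsafe samples where h is positive.
    loss += sum(max(0.0, h(x)) for x in unsafe)
    # Penalize transitions where h decays faster than the allowed rate.
    loss += sum(max(0.0, (1.0 - alpha) * h(x) - h(x_next)) for x, x_next in transitions)
    return loss


def h_disc(x: State) -> float:
    """Hypothetical candidate: positive inside the unit disc, negative outside."""
    return 1.0 - (x[0] ** 2 + x[1] ** 2)


safe_states = [(0.0, 0.0), (0.5, 0.0)]
unsafe_states = [(2.0, 0.0)]

# A transition that moves toward the center satisfies condition 3.
loss_good = cbf_hinge_loss(h_disc, safe_states, unsafe_states, [((0.5, 0.0), (0.4, 0.0))])
# A transition that jumps close to the boundary violates condition 3.
loss_bad = cbf_hinge_loss(h_disc, safe_states, unsafe_states, [((0.5, 0.0), (0.99, 0.0))])
```

In a learning setting, `h` would typically be a parameterized model and a loss of this shape would be minimized over its parameters as new transitions are collected. A zero loss on samples does not by itself certify safety; a provable guarantee, as claimed in the abstract, needs additional argument beyond this sketch.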

DOI: http://dx.doi.org/10.1109/TNNLS.2023.3339885

Top Keywords: policy optimization; complex task; task objectives; data-driven safe; safe policy; black-box dynamical; dynamical systems; temporal logic; system safety; policy

Similar Publications

Evaluating the Implementation of Online Postal Self-Sampling for Sexually Transmitted Infections in England: Multisite Qualitative Study.

J Med Internet Res

September 2025

University College London, London, United Kingdom.

Tommer Spence, Jo Gibbs, Geoff Wong, Alison Howarth, Andrew Copas

Background: Online postal self-sampling (OPSS) allows service users to screen for sexually transmitted infections (STIs) by ordering a self-sampling kit online, taking their own samples, returning them to a laboratory for testing, and receiving their results remotely. OPSS availability and use have increased both in the United Kingdom and globally over the past decade, but OPSS has been adopted in different regions of England at different times and with different models of delivery. It is not known why certain models were chosen or how implementation strategies have influenced outcomes, including the sustainability of OPSS in sexual health service delivery.


Nationwide Insights on Immunotherapy in a Low- and Middle-Income Country: Armenia's Struggle for Equitable Cancer Care in an Out-of-Pocket System.

JCO Glob Oncol

May 2025

Yeolyan Hematology and Oncology Center, Yerevan, Armenia.

Amalya Sargsyan, Gevorg Tamamyan, Arman Oganisian, Davit Zohrabyan, Liana Safaryan

Purpose: In Armenia, a lower-middle-income country, cancer causes 21% of all deaths, with over half of cases diagnosed at advanced stages. Without universal health insurance, patients rely on out-of-pocket payments or black-market channels for costly immunotherapies, underscoring the need for real-world data to inform equitable policy reforms.

Methods: We conducted a multicenter, retrospective cohort study of patients who received at least one dose of an immune checkpoint inhibitor (ICI) between January 2017 and December 2023 across six Armenian oncology centers.


Prevalence and predictors of viral load non-suppression among adolescents on dolutegravir-based antiretroviral therapy: A cross-sectional study from three urban clinics, Soroti City.

PLoS One

September 2025

School of Public Health, College of Health Sciences, Makerere University, Kampala, Uganda.

Connie Nait, Simple Ouma, Saadick Mugerwa Ssentongo, Boniface Oryokot, Abraham Ignatius Oluka

Background: Despite advances in HIV care, viral load suppression (VLS) among adolescents living with HIV (ALHIV) in Uganda continues to lag behind that of adults, even with the introduction of dolutegravir (DTG)-based regimens, the Youth and Adolescent Peer Supporter (YAPS) model, and community-based approaches. Understanding factors associated with HIV viral load non-suppression in this population is critical to inform HIV treatment policy. This study assessed the prevalence and predictors of viral load non-suppression among ALHIV aged 10-19 years on DTG-based ART in Soroti City, Uganda.


Assessing Soil Pollution Potential through Spatial Heavy Metal Bioaccessibility for Health Risk Evaluation.

Integr Environ Assess Manag

September 2025

School of Public Health, Taipei Medical University, New Taipei City, 235040, Taiwan.

Yen-Tzu Fan, Ying-Lin Wang, Ming-Chien Tsou, Zeng-Yei Hseu, Hsing-Cheng Hsi

Incorporating bioaccessibility into health risk assessments enhances the accuracy of exposure estimates for heavy metal (HM) pollution, supports targeted remediation, and informs public health and policy decisions, particularly for vulnerable populations. Because HM bioaccessibility depends on local soil and geographic characteristics, identifying its relationship with soil properties is crucial for assessing soil pollution potential. Although HM concentrations can be measured relatively easily, bioaccessibility requires complex laboratory procedures, limiting routine applications in regulatory contexts.


Multiagent Inductive Policy Optimization.

IEEE Trans Neural Netw Learn Syst

September 2025

Yubo Huang, Xiaowei Zhao

Policy optimization methods are promising for tackling high-complexity reinforcement learning (RL) tasks with multiple agents. In this article, we derive a general trust region for policy optimization methods by considering the effect of subpolicy combinations among agents in multiagent environments. Based on this trust region, we propose an inductive objective to train the policy function, which can ensure that agents learn monotonically improving policies.
