Citations

20

Article Abstract

Learning-based policy optimization methods have shown great potential for building general-purpose control systems. However, existing methods still struggle to achieve complex task objectives while ensuring policy safety during both the learning and execution phases for black-box systems. To address these challenges, we develop data-driven safe policy optimization (D2SPO), a novel reinforcement learning (RL)-based policy improvement method that jointly learns a control barrier function (CBF) for system safety and a linear temporal logic (LTL)-guided RL algorithm for complex task objectives. Unlike many existing works that assume known system dynamics, D2SPO learns a provably safe CBF for black-box dynamical systems by carefully constructing the data sets and redesigning the loss functions; this CBF continuously evolves for improved system safety as the RL agent interacts with the environment. To deal with complex task objectives, we take advantage of the capability of LTL to represent task progress and develop an LTL-guided RL policy for efficient completion of various tasks with LTL objectives. Extensive numerical and experimental studies demonstrate that D2SPO outperforms most state-of-the-art (SOTA) baselines and can achieve a safety rate of over 95% and task completion rates of nearly 100%. The experiment video is available at https://youtu.be/2RgaH-zcmkY.
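
As a rough illustration of what checking a barrier condition from data alone can look like, the sketch below evaluates a generic discrete-time CBF condition, h(x_next) - h(x) >= -alpha * h(x), on observed transitions without using a dynamics model. It is not the D2SPO loss, which the abstract does not specify; the barrier `h`, the rate `ALPHA`, the helper names, and the toy transitions are all assumptions made for this example.

```python
# Illustrative only: a generic discrete-time CBF check on sampled transitions.
# This is NOT the D2SPO loss; h, ALPHA, and the toy data are assumptions.

ALPHA = 0.5  # assumed decay rate in (0, 1]


def h(x: float) -> float:
    """Hypothetical barrier: the safe set is {x : |x| <= 1}, so h(x) = 1 - x^2."""
    return 1.0 - x * x


def cbf_violation(x: float, x_next: float, alpha: float = ALPHA) -> float:
    """Hinge-style violation of h(x_next) - h(x) >= -alpha * h(x).

    The condition is equivalent to h(x_next) >= (1 - alpha) * h(x).
    Returns 0.0 when the transition satisfies it.
    """
    return max(0.0, (1.0 - alpha) * h(x) - h(x_next))


def mean_violation(transitions):
    """Average violation over (x, x_next) pairs observed from the system."""
    return sum(cbf_violation(x, xn) for x, xn in transitions) / len(transitions)


# Toy transitions from an unknown system (made-up numbers).
safe_step = (0.5, 0.6)    # stays well inside the safe set
unsafe_step = (0.9, 1.2)  # leaves the safe set

if __name__ == "__main__":
    print(cbf_violation(*safe_step))
    print(cbf_violation(*unsafe_step))
    print(mean_violation([safe_step, unsafe_step]))
```

In a learned-CBF setting, a quantity like `mean_violation` would typically be one term of a training loss over a parameterized `h`; here `h` is fixed to keep the example minimal.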

Source
http://dx.doi.org/10.1109/TNNLS.2023.3339885

Publication Analysis

Top Keywords

policy optimization: 12
complex task: 12
task objectives: 12
data-driven safe: 8
safe policy: 8
black-box dynamical: 8
dynamical systems: 8
temporal logic: 8
system safety: 8
policy: 6

Similar Publications

Background: Online postal self-sampling (OPSS) allows service users to screen for sexually transmitted infections (STIs) by ordering a self-sampling kit online, taking their own samples, returning them to a laboratory for testing, and receiving their results remotely. OPSS availability and use have increased both in the United Kingdom and globally over the past decade, but OPSS has been adopted in different regions of England at different times, with different models of delivery. It is not known why certain models were chosen or how implementation strategies have influenced outcomes, including the sustainability of OPSS in sexual health service delivery.

Purpose: In Armenia, a lower-middle-income country, cancer causes 21% of all deaths, with over half of cases diagnosed at advanced stages. Without universal health insurance, patients rely on out-of-pocket payments or black-market channels for costly immunotherapies, underscoring the need for real-world data to inform equitable policy reforms.

Methods: We conducted a multicenter, retrospective cohort study of patients who received at least one dose of an immune checkpoint inhibitor (ICI) between January 2017 and December 2023 across six Armenian oncology centers.

Background: Despite advances in HIV care, viral load suppression (VLS) among adolescents living with HIV (ALHIV) in Uganda continues to lag behind that of adults, even with the introduction of dolutegravir (DTG)-based regimens, the Youth and Adolescent Peer Supporter (YAPS) model, and community-based approaches. Understanding the factors associated with HIV viral load non-suppression in this population is critical to informing HIV treatment policy. This study assessed the prevalence and predictors of viral load non-suppression among ALHIV aged 10-19 years on DTG-based ART in Soroti City, Uganda.

Incorporating bioaccessibility into health risk assessments enhances the accuracy of exposure estimates for heavy metal (HM) pollution, supports targeted remediation, and informs public health and policy decisions, particularly for vulnerable populations. Because HM bioaccessibility depends on local soil and geographic characteristics, identifying its relationship with soil properties is crucial for assessing soil pollution potential. Although HM concentrations can be measured relatively easily, bioaccessibility requires complex laboratory procedures, limiting routine applications in regulatory contexts.

Multiagent Inductive Policy Optimization.

IEEE Trans Neural Netw Learn Syst

September 2025

Policy optimization methods are promising for tackling high-complexity reinforcement learning (RL) tasks with multiple agents. In this article, we derive a general trust region for policy optimization methods by considering the effect of subpolicy combinations among agents in multiagent environments. Based on this trust region, we propose an inductive objective for training the policy function, which can ensure that agents learn monotonically improving policies.