Improving intelligent perception and decision optimization of pedestrian crossing scenarios in autonomous driving environments through large visual language models.

Xiao Teng, Lin Huang, Zhenjiang Shen, Wankai Li

Sci Rep

Faculty of Transdisciplinary Sciences, Institute of Philosophy in Interdisciplinary Sciences, Kanazawa University, Kanazawa, 920-1192, Japan.

Published: August 2025

This study leverages large Visual Language Models (VLMs) to develop an intelligent pedestrian crossing scenario system within autonomous driving environments. By establishing standardized checklists and prompts, the system minimizes the risks of misjudgment and omission through multimodal data processing. It offers data-driven decision-making support, presenting an innovative approach to integrating autonomous driving technology with intelligent transportation systems. The study begins by classifying pedestrian crossing scenarios based on international autonomous driving standards, distinguishing between pedestrian crossings and autonomous vehicle crossings, as well as dynamic and static entities. Next, standardized prompts derived from these standards are fed into the VLM, which generates structured scenario checklists of dynamic and static entities, output in JSON format. This systematic identification and processing of entities, such as pedestrians, vehicles, and traffic facilities, enables the construction of structured data representations for complex traffic scenarios. Building on this foundation, the VLM analyzes scenario data to predict collision risks by modeling the behaviors of both pedestrians and vehicles, supporting real-time decision-making for autonomous vehicles and road users. Furthermore, the VLM processes scene data to anticipate potential conflicts and provide actionable safety recommendations, enhancing the overall security of all traffic participants. The system achieved a perception accuracy of 93.05%, with risk prediction consistency and decision-making rule consistency rates of 85.91% and 87.72%, respectively. By constructing a VLM-based intelligent pedestrian crossing perception system, this study offers a novel technical framework for improving perception, prediction, and decision-making in autonomous driving.
Unlike traditional rule-based and deep learning approaches, which struggle with complex pedestrian behaviors and dynamic environments, our method integrates visual perception with reasoning capabilities, enabling structured, standardized, and explainable decision-making in pedestrian crossing scenarios.
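
The abstract says the VLM returns scenario checklists of dynamic and static entities in JSON format, but it does not give the schema. As a rough illustration only, the Python sketch below shows how such a response might be parsed into a structured representation. The key names `dynamic_entities`, `static_entities`, and `label`, the sample response, and the helper `parse_checklist` are all assumptions made for this example and are not taken from the paper.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    label: str  # e.g. "pedestrian" or "traffic light"
    kind: str   # "dynamic" or "static"


@dataclass
class SceneChecklist:
    dynamic: List[Entity] = field(default_factory=list)
    static: List[Entity] = field(default_factory=list)


# Hypothetical VLM response; the real schema is not given in the abstract.
SAMPLE_RESPONSE = """
{
  "dynamic_entities": [{"label": "pedestrian"}, {"label": "vehicle"}],
  "static_entities": [{"label": "crosswalk"}, {"label": "traffic light"}]
}
"""


def parse_checklist(raw: str) -> SceneChecklist:
    """Parse a JSON checklist string into a SceneChecklist.

    Raises ValueError when an expected list or label is missing, so an
    incomplete response is rejected instead of silently accepted.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    checklist = SceneChecklist()
    for kind in ("dynamic", "static"):
        key = f"{kind}_entities"  # assumed key naming
        items = data.get(key)
        if not isinstance(items, list):
            raise ValueError(f"missing or malformed '{key}' list")
        for item in items:
            if not isinstance(item, dict) or "label" not in item:
                raise ValueError(f"entity in '{key}' has no 'label'")
            getattr(checklist, kind).append(Entity(str(item["label"]), kind))
    return checklist
```

Calling `parse_checklist(SAMPLE_RESPONSE)` yields two dynamic and two static entities. Rejecting malformed output outright is one possible way to guard against the omissions the checklist approach is meant to reduce; the paper's actual validation procedure may differ.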

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC12379253
DOI: http://dx.doi.org/10.1038/s41598-025-14827-x

Publication Analysis

Top Keywords: pedestrian crossing, autonomous driving, crossing scenarios, driving environments, large visual, visual language, language models, intelligent pedestrian, dynamic static, static entities

Similar Publications

The Silent Surge: Obesity Driving a Global Cardiovascular Crisis.

Glob Heart

September 2025

Manipal Academy of Higher Education, Manipal, Karnataka, India.

Panniyammakal Jeemon, Sivasankaran Sivasubramonian

Recent global estimates indicate that more than one billion people live with obesity, a figure that has doubled since 1990. When overweight individuals are included, nearly 2.5 billion adults are affected, with high body mass index contributing to an estimated 1.

Assessment of the safety benefits of HUD warning under high-risk pedestrian crossing event in the connected environment.

Accid Anal Prev

September 2025

Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China.

Yu Zhang, Xiaohua Zhao, Yang Bian, Jianling Huang, Duan Yu

The head-up display (HUD) warning system in a connected environment is expected to improve driving behavior and enhance pedestrian crossing safety. While existing research has preliminarily examined the effectiveness of HUD warning systems in avoiding pedestrian collisions, scant attention has been given to their microscopic influence on driving behavior or to a precise quantification of their overall benefits, especially in high-risk pedestrian crossing scenarios. To investigate these influences, this study employed driving simulations to construct six connected scenarios: three warning systems (baseline, head-down display (HDD), and HUD) × two weather conditions (clear and foggy).

CLIP-based Multi-modal Feature Learning for Cloth-changing Person Re-Identification.

IEEE Trans Image Process

September 2025

Guoqing Zhang, Jieqiong Zhou, Lu Jiang, Yuhui Zheng, Weisi Lin

Contrastive Language-Image Pre-training (CLIP) has achieved remarkable results in the field of person re-identification (ReID) due to its excellent cross-modal understanding ability and high scalability. However, the text encoder of CLIP mainly focuses on easy-to-describe attributes such as clothing, and clothing is the main interference factor that reduces recognition accuracy in cloth-changing person ReID (CC ReID). Consequently, directly applying CLIP to cloth-changing scenarios may fail to adapt to such dynamic feature changes, thereby reducing identification precision.

Behavioral and psychological determinants of pedestrian collisions on arterial roads with evidence from random parameter models.

Sci Rep

August 2025

Texas A&M Transportation Institute, Roadway Safety, Bryan, TX, 77807, USA.

Ahmed Hossain, Subasish Das, Monire Jafari, Michael Starewich, Rohit Chakraborty

Arterial roads, while comprising a small percentage of total roadway mileage in the U.S., contribute disproportionately to pedestrian fatalities.

Driving Behavior of Older and Younger Drivers in Simplified Emergency Scenarios.

Sensors (Basel)

August 2025

School of Urban Construction and Transportation, Hefei University, Hefei 230606, China.

Yun Xiao, Mingming Dai, Shouqiang Xue

This study focuses on exploring the differences in driving abilities in emergency traffic situations between older drivers (aged 60-70) and young drivers (aged 20-35) in a simple traffic environment. Two typical emergency scenarios were designed in the experiment: Scenario A (intrusion of electric bicycles) and Scenario B (pedestrians crossing the road). The experiment employed a driving simulation system to synchronously collect data on eye movement characteristics, driving behavior, and physiological metrics from 30 drivers.
