Making a model accurately understand and follow natural language instructions while performing actions consistent with world knowledge is a key challenge in robot manipulation. This mainly involves reasoning over ambiguous human instructions and adhering to physical knowledge. An embodied agent must therefore be able to model world knowledge from its training data. However, most existing vision-and-language robot manipulation methods operate in unrealistic simulators and simplified language settings and lack explicit modeling of world knowledge. To bridge this gap, we introduce Surfer, a simple yet novel robot manipulation framework. Built on a world model, Surfer treats robot manipulation as a state transition of the visual scene and decouples it into two parts: action and scene. Explicitly modeling both action and scene prediction over multimodal information then improves the model's generalization to new instructions and new scenes. In addition, we built a robot manipulation simulation platform that supports physical execution based on the MuJoCo physics engine; it automatically generates demonstration training data and test data, substantially reducing labor costs. To comprehensively and systematically evaluate a manipulation model's vision-language understanding and physical execution, we also created SeaWave, a robotic manipulation benchmark containing four vision-language manipulation tasks of increasing difficulty that provides a standardized testing platform for embodied AI agents in multimodal environments. Overall, we hope Surfer can surf freely on the robotic SeaWave benchmark. Extensive experiments show that Surfer consistently and significantly outperforms all baselines on all manipulation tasks. On average, Surfer achieves a success rate of 54.74% across the four defined levels of manipulation tasks, exceeding the best baseline's 51.07%. The simulator, code, and benchmarks are released at https://pzhren.github.io/Surfer.
DOI: http://dx.doi.org/10.1109/TNNLS.2025.3594117
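The abstract above describes Surfer only at a high level, so the following is a minimal, hypothetical sketch (not the authors' implementation) of the decoupled world-model idea it mentions: a shared multimodal encoder feeding separate action-prediction and next-scene-prediction heads trained jointly. All module names, feature dimensions, and loss weights below are assumptions made for illustration.

```python
# Hypothetical sketch of a world-model-style manipulation policy that
# decouples prediction into an action head and a scene (next-state) head.
# Names and dimensions are illustrative; they are NOT taken from the paper.
import torch
import torch.nn as nn

class DecoupledWorldModel(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=512, hid_dim=512, action_dim=7):
        super().__init__()
        # Fuse visual scene features and language-instruction features.
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim + txt_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim),
        )
        # Action head: predicts the next robot action (e.g., end-effector delta).
        self.action_head = nn.Linear(hid_dim, action_dim)
        # Scene head: predicts features of the next visual scene (state transition).
        self.scene_head = nn.Linear(hid_dim, vis_dim)

    def forward(self, vis_feat, txt_feat):
        h = self.fuse(torch.cat([vis_feat, txt_feat], dim=-1))
        return self.action_head(h), self.scene_head(h)

def training_loss(model, vis_feat, txt_feat, action_gt, next_vis_feat, w_scene=1.0):
    """Joint objective: imitate the demonstrated action and predict the next scene."""
    action_pred, scene_pred = model(vis_feat, txt_feat)
    action_loss = nn.functional.mse_loss(action_pred, action_gt)
    scene_loss = nn.functional.mse_loss(scene_pred, next_vis_feat)
    return action_loss + w_scene * scene_loss
```

The intent sketched here is that supervising the scene head on the next visual state pushes the fused representation to capture state-transition (world-knowledge) structure rather than only an instruction-to-action mapping; how Surfer actually realizes this is detailed in the paper, not here.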
Microsyst Nanoeng
September 2025
Department of Ophthalmology, Key Laboratory of Precision Medicine for Eye Diseases of Zhejiang Province, Center for Rehabilitation Medicine, Zhejiang Provincial People's Hospital (Affiliated People's Hospital, Hangzhou Medical College), Hangzhou, 314408, China.
Retinal surgery is one of the most delicate and complex operations, approaching or even exceeding the physiological limits of the human hand. Robots have demonstrated the ability to filter hand tremor and to scale motion, which is promising for microsurgery. Here, we present a novel soft micron accuracy robot (SMAR) for retinal surgery that achieves more precise and safer operation.
Colloids Surf B Biointerfaces
September 2025
Nanobiosensorics Laboratory, Institute of Technical Physics and Materials Science, HUN-REN Centre for Energy Research, Budapest, Hungary; Nanobiosensorics Group, Institute of Biophysics, HUN-REN Biological Research Centre, Szeged, Hungary. Electronic address:
Nanomicro Lett
September 2025
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, People's Republic of China.
In the realm of secure information storage, optical encryption has emerged as a vital technique, particularly with the miniaturization of encryption devices. However, many existing systems lack the necessary reconfigurability and dynamic functionality. This study presents a novel approach through the development of dynamic optical-to-chemical energy conversion metamaterials, which enable enhanced steganography and multilevel information storage.
Rev Sci Instrum
September 2025
Hefei University of Technology, School of Mechanical Engineering, Hefei 230009, China.
In unstructured environments, robots struggle to grasp irregular, fragile objects efficiently and accurately. To address this, the paper introduces a soft robotic hand tailored to such settings and enhances You Only Look Once v5s (YOLOv5s), a lightweight detection algorithm, to achieve efficient grasping. A rapid pneumatic network-based soft finger structure, broadly applicable to irregularly placed objects, is designed, together with a mathematical model linking finger bending angle to input gas pressure, validated through simulation.
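The abstract gives neither the enhanced YOLOv5s details nor the calibrated pressure-bending model, so the sketch below is a generic, hypothetical illustration only: a stock YOLOv5s detector (loaded through the public torch.hub interface) supplies an object center for a grasp routine, and a simple linear pressure-to-angle mapping stands in as a placeholder for the paper's model. The function names and the constant k_kpa_per_deg are invented for illustration.

```python
# Generic sketch: off-the-shelf YOLOv5s detection feeding a (placeholder) grasp decision.
# The paper's enhanced YOLOv5s and its calibrated pneumatic model are NOT reproduced here.
import torch

# Load the stock YOLOv5s model from the Ultralytics hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def detect_grasp_target(image_path, conf_threshold=0.5):
    """Return the pixel center (cx, cy) of the most confident detection, or None."""
    results = model(image_path)
    det = results.xyxy[0]  # tensor of [x1, y1, x2, y2, confidence, class] per detection
    det = det[det[:, 4] >= conf_threshold]
    if det.numel() == 0:
        return None
    best = det[det[:, 4].argmax()]
    cx = float((best[0] + best[2]) / 2)
    cy = float((best[1] + best[3]) / 2)
    return cx, cy

def pressure_for_bend_angle(angle_deg, k_kpa_per_deg=1.2):
    """Placeholder linear mapping from desired bend angle (deg) to input pressure (kPa).
    The real relationship would come from the paper's calibrated pneumatic model."""
    return k_kpa_per_deg * angle_deg
```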