Publications by authors named "Yong-Jin Liu"

Prevailing vision-language models have achieved remarkable progress in 3D scene understanding, but they are trained in a closed-set setting with full labels. The major bottleneck for current 3D scene recognition approaches in robotics is that these models cannot recognize unseen classes beyond the training categories in diverse real-world applications such as robot manipulation and navigation. Meanwhile, state-of-the-art 3D scene understanding approaches require a large number of high-quality labels to train neural networks and perform well only in a fully supervised manner.


Current unsupervised reinforcement learning methods often overlook reward nonstationarity during pre-training and the forgetting of exploratory behavior during fine-tuning. Our study introduces Self-Reference (SR), a novel add-on module designed to address both issues. SR stabilizes intrinsic rewards through historical referencing in pre-training, mitigating nonstationarity.


Traditional orthodontic treatment relies on subjective estimations of orthodontists and iterative communication with technicians to achieve desired tooth alignments. This process is time-consuming, complex, and highly dependent on the orthodontist's experience. With the development of artificial intelligence, there's a growing interest in leveraging deep learning methods to achieve tooth alignment automatically.


Plant sensors are commonly used in agricultural production, landscaping, and other fields to monitor plant growth and environmental parameters. As an important basic parameter in plant monitoring, leaf inclination angle (LIA) not only influences light absorption and pesticide loss but also contributes to genetic analysis and other plant phenotypic data collection. The measurements of LIA provide a basis for crop research as well as agricultural management, such as water loss, pesticide absorption, and illumination radiation.
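
As a rough illustration of the quantity involved, the sketch below computes a leaf inclination angle from a leaf surface normal. It assumes the common convention that LIA is the angle between the leaf normal and the vertical (zenith) direction, so a horizontal leaf gives 0° and a vertical leaf gives 90°. The function name `leaf_inclination_angle` and the vector input are hypothetical and are not taken from the paper.

```python
import math


def leaf_inclination_angle(normal):
    """Return the inclination angle in degrees for a leaf surface normal.

    Assumption: LIA is the angle between the leaf normal and the zenith
    axis (0, 0, 1). The sign of the normal is ignored, so both sides of
    the leaf give the same angle.
    """
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("normal must be a non-zero vector")
    # Cosine of the angle to the zenith; clamp to guard against rounding.
    cos_theta = min(1.0, abs(nz) / length)
    return math.degrees(math.acos(cos_theta))
```

For example, a normal of `(0, 1, 1)` corresponds to a leaf tilted halfway between horizontal and vertical.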


The reconstruction of indoor scenes from multi-view RGB images is challenging due to the coexistence of flat and texture-less regions alongside delicate and fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel in producing complete and smooth results for floor and wall areas.


Mixed emotions have attracted increasing interest recently, but existing datasets rarely focus on mixed emotion recognition from multimodal signals, hindering the affective computing of mixed emotions. To address this gap, we present a multimodal dataset with four kinds of signals recorded while participants watched mixed and non-mixed emotion videos. To ensure effective emotion induction, we first implemented a rule-based video filtering step to select the videos that could elicit stronger positive, negative, and mixed emotions.


Applying diffusion models to image-to-image translation (I2I) has recently received increasing attention due to its practical applications. Previous attempts inject information from the source image into each denoising step for an iterative refinement, thus resulting in a time-consuming implementation. We propose an efficient method that equips a diffusion model with a lightweight translator, dubbed a Diffusion Model Translator (DMT), to accomplish I2I.


Generative Adversarial Networks have achieved significant advancements in generating and editing high-resolution images. However, most methods require either extensive labeled datasets or strong prior knowledge. It is also challenging for them to disentangle correlated attributes with few-shot data.

Article Synopsis
  • Conventional near-field acoustic holography often struggles with accurately reconstructing sound fields due to inadequate handling of block-sparse structures and mismatched block partitions.
  • This paper introduces a novel method using pattern-coupled Bayesian compressive sensing that leverages a hierarchical Gaussian-Gamma model to improve sparse reconstruction of sound fields.
  • The new approach allows for better identification of equivalent source strengths by employing hyperparameters that govern the sparsity of coefficients while considering neighboring elements, leading to enhanced reconstruction performance validated through simulations and experiments.
Article Synopsis
  • The paper introduces PCKRF, a new pipeline for 6D pose estimation that improves upon traditional methods like ICP by refining point clouds through a pose-sensitive completion network.
  • The pipeline consists of two main steps: completing input point clouds with pose information and then aligning these completed clouds to target clouds using a color-supported registration technique.
  • Results show that PCKRF enhances accuracy and stability in pose estimation, particularly with challenging objects, and can be integrated with existing methods for better overall performance.
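
For readers unfamiliar with ICP-style registration, the sketch below shows the least-squares rigid alignment step that such methods repeat once point correspondences are fixed. It is a deliberately reduced 2D toy with known correspondences, not the PCKRF pipeline or its color-supported registration; the function name `align_rigid_2d` is hypothetical.

```python
import math


def align_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source to target.

    Assumption: source[i] corresponds to target[i]. A full ICP loop would
    re-estimate correspondences (e.g. nearest neighbors) and repeat this step.
    """
    n = len(source)
    # Centroids of both point sets.
    csx = sum(p[0] for p in source) / n
    csy = sum(p[1] for p in source) / n
    ctx = sum(p[0] for p in target) / n
    cty = sum(p[1] for p in target) / n

    # Accumulate dot and cross products of the centered pairs.
    s_dot = 0.0
    s_cross = 0.0
    for (sx, sy), (tx, ty) in zip(source, target):
        ax, ay = sx - csx, sy - csy
        bx, by = tx - ctx, ty - cty
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    # Optimal rotation angle in closed form (2D case).
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    translation = (ctx - (c * csx - s * csy), cty - (s * csx + c * csy))
    return theta, translation
```

In 3D the same step is usually solved with an SVD of the cross-covariance matrix; the 2D closed form is used here only to keep the example dependency-free.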

As a significant geometric feature of 3D point clouds, sharp features play an important role in shape analysis, 3D reconstruction, registration, localization, etc. Current sharp feature detection methods are still sensitive to the quality of the input point cloud, and the detection performance is affected by random noisy points and non-uniform densities. In this paper, using the prior knowledge of geometric features, we propose a Multi-scale Laplace Network (MSL-Net), a new deep-learning-based method based on an intrinsic neighbor shape descriptor, to detect sharp features from 3D point clouds.


As communications are increasingly taking place virtually, the ability to present well online is becoming an indispensable skill. Online speakers are facing unique challenges in engaging with remote audiences. However, there has been a lack of evidence-based analytical systems for people to comprehensively evaluate online speeches and further discover possibilities for improvement.


A new species of the genus Hebius Thompson, 1913 is described from Youjiang District, Baise City, Guangxi Zhuang Autonomous Region, China, based on a single adult female specimen. It can be distinguished from its congeners by the following combination of characters: (1) dorsal scale rows 19-17-17, feebly keeled except the outermost row; (2) tail comparatively long, TAL/TL ratio 0.30 in females; (3) ventrals 160 (+ 3 preventrals); (4) subcaudals 112; (5) supralabials 9, the fourth to sixth in contact with the eye; (6) infralabials 10, the first 5 touching the first pair of chin shields; (7) preocular 1; (8) postoculars 2; (9) temporals 4, arranged in three rows (1+1+2); (10) maxillary teeth 30, the last 3 enlarged, without diastema; (11) postocular streak present; (12) dorsal background color brownish black, with a conspicuous, uniform, continuous beige stripe extending from behind the eye to the end of the tail; (13) anterior venter creamish-yellow, gradually fading toward the rear, with irregular black blotches in the middle and outer quarter of the ventrals, the posterior part almost completely black.


Benefiting from the intuitiveness and naturalness of sketch interaction, sketch-based video retrieval (SBVR) has received considerable attention in the video retrieval research area. However, most existing SBVR research still lacks the capability of accurate video retrieval with fine-grained scene content. To address this problem, in this paper we investigate a new task, which focuses on retrieving the target video by utilizing a fine-grained storyboard sketch depicting the scene layout and major foreground instances' visual characteristics (e.


3D dense captioning aims to semantically describe each object detected in a 3D scene, which plays a significant role in 3D scene understanding. Previous works lack a complete definition of 3D spatial relationships and directly integrate visual and language modalities, thus ignoring the discrepancies between the two modalities. To address these issues, we propose a novel complete 3D relationship extraction modality alignment network, which consists of three steps: 3D object detection, complete 3D relationships extraction, and modality alignment caption.


For 3D animators, choreography with artificial intelligence has attracted more attention recently. However, most existing deep learning methods mainly rely on music for dance generation and lack sufficient control over generated dance motions. To address this issue, we introduce the idea of keyframe interpolation for music-driven dance generation and present a novel transition generation technique for choreography.


Positive human-agent relationships can effectively improve human experience and performance in human-machine systems or environments. The characteristics of agents that enhance this relationship have garnered attention in human-agent or human-robot interactions. In this study, based on the rule of the persona effect, we study the effect of an agent's social cues on human-agent relationships and human performance.


Simulating liquid-textile interaction has received great attention in computer graphics recently. Most existing methods treat textiles as particles or parameterized meshes. Although these methods can generate visually pleasing results, they cannot simulate water content at a microscopic level because they lack geometric modeling of the textile's anisotropic structure.


Point cloud upsampling aims to generate dense point clouds from given sparse ones, which is a challenging task due to the irregular and unordered nature of point sets. To address this issue, we present a novel deep learning-based model, called PU-Flow, which incorporates normalizing flows and weight prediction techniques to produce dense points uniformly distributed on the underlying surface. Specifically, we exploit the invertible characteristics of normalizing flows to transform points between Euclidean and latent spaces and formulate the upsampling process as an ensemble of neighbouring points in a latent space, where the ensemble weights are adaptively learned from local geometric context.
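
The idea of upsampling as a weighted ensemble in latent space can be sketched with a toy invertible map. The example below is only an illustration under strong assumptions: a fixed elementwise `asinh`/`sinh` pair stands in for the learned normalizing flow, and the weights are passed in by hand rather than predicted from local geometric context. The names `to_latent`, `from_latent`, and `upsample_point` are hypothetical and do not come from PU-Flow.

```python
import math


def to_latent(point):
    """Toy invertible map to latent space (stand-in for a learned flow)."""
    return tuple(math.asinh(c) for c in point)


def from_latent(latent):
    """Exact inverse of to_latent."""
    return tuple(math.sinh(c) for c in latent)


def upsample_point(neighbors, weights):
    """Create one new point as a weighted ensemble of neighbors in latent space.

    Assumption: weights are non-negative with a positive sum; they are
    normalized here so that they sum to one.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    latents = [to_latent(p) for p in neighbors]
    dim = len(latents[0])
    # Weighted average of the neighbors' latent codes, one coordinate at a time.
    z_new = tuple(
        sum(w * z[d] for w, z in zip(norm, latents)) for d in range(dim)
    )
    # Map the blended latent code back to Euclidean space.
    return from_latent(z_new)
```

Because the map is invertible, putting all the weight on one neighbor reproduces that neighbor, while mixed weights yield a new point between the neighbors as measured in latent space.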


Sketch-based image retrieval (SBIR) is a long-standing research topic in computer vision. Existing methods mainly focus on category-level or instance-level image retrieval. This paper investigates the fine-grained scene-level SBIR problem where a free-hand sketch depicting a scene is used to retrieve desired images.


Recent works have achieved remarkable performance for action recognition with human skeletal data by utilizing graph convolutional models. Existing models mainly focus on developing graph convolutional operations to encode structural properties of a skeletal graph, whose topology is manually predefined and fixed over all action samples. Some recent works further take sample-dependent relationships among joints into consideration.


Despite recent advances in artificial tissue and organ engineering, generating large, viable, and functional complex organs remains a grand challenge for regenerative medicine. Three-dimensional bioprinting has demonstrated its advantages as one of the major methods for fabricating simple tissues, yet it still faces difficulties in generating vasculature and preserving cell functions in complex organ production. Here, we overcome the limitations of conventional bioprinting systems by converting a six degree-of-freedom robotic arm into a bioprinter, thereby enabling cell printing on 3D complex-shaped vascular scaffolds from all directions.


Face portrait line drawing is a unique style of art which is highly abstract and expressive. However, due to its high semantic constraints, many existing methods learn to generate portrait drawings using paired training data, which is costly and time-consuming to obtain. In this paper, we propose a novel method to automatically transform face photos to portrait drawings using unpaired training data with two new features; i.
