Publications by authors named "Shaoyi Du"

Hypergraphs, with their ability to model complex, beyond-pairwise correlations, present a significant advancement over traditional graphs for capturing intricate relational data across diverse domains. However, the integration of hypergraphs into self-supervised learning (SSL) frameworks has been hindered by the intricate nature of high-order structural variations. This paper introduces the Self-Supervised Hypergraph Training Framework via Structure-Aware Learning (SS-HT), designed to enhance the perception and measurement of these variations within hypergraphs.


Hypergraph Neural Networks (HGNNs) have attracted much attention for high-order structural data learning. Existing methods mainly rely on simple mean-based aggregation, or manually combine multiple aggregations, to capture diverse information on hypergraphs. However, those methods inherently lack continuous non-linear modeling ability and are sensitive to varied distributions.
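As a concrete picture of the mean-based aggregation the excerpt refers to, a minimal sketch of one vertex-to-hyperedge-to-vertex mean-passing round is shown below; array names and shapes are illustrative and not taken from the paper.

    import numpy as np

    def mean_hypergraph_aggregate(X, H):
        """One round of mean-based message passing on a hypergraph.

        X: (n_vertices, n_features) vertex features.
        H: (n_vertices, n_hyperedges) binary incidence matrix,
           H[v, e] = 1 if vertex v belongs to hyperedge e.
        """
        # Vertex -> hyperedge: each hyperedge takes the mean of its member vertices.
        edge_deg = H.sum(axis=0, keepdims=True)            # (1, n_hyperedges)
        edge_feat = (H.T @ X) / np.maximum(edge_deg.T, 1)  # (n_hyperedges, n_features)

        # Hyperedge -> vertex: each vertex takes the mean of its incident hyperedges.
        vert_deg = H.sum(axis=1, keepdims=True)            # (n_vertices, 1)
        out = (H @ edge_feat) / np.maximum(vert_deg, 1)    # (n_vertices, n_features)
        return out

    # Tiny example: 4 vertices, 2 hyperedges.
    X = np.arange(8, dtype=float).reshape(4, 2)
    H = np.array([[1, 0], [1, 1], [0, 1], [1, 1]], dtype=float)
    print(mean_hypergraph_aggregate(X, H))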


Background: Osteoarthritis (OA) of the hip is a progressive musculoskeletal disorder characterized by stiffness and limited passive range of motion. Hip OA patients experience mobility impairment and altered gait patterns when compared to healthy controls (HCs). Although various interventions have been designed to alleviate these symptoms, it is unclear if there is a reliable method to track biomechanical changes in patients with unilateral hip OA in a clinical setting.


The accurate diagnosis of neurodegenerative diseases (NDDs), such as Amyotrophic Lateral Sclerosis (ALS), Huntington's Disease (HD), and Parkinson's Disease (PD), remains a clinical challenge due to the complexity and subtlety of gait abnormalities. This paper proposes the Dual-Branch Attention-Enhanced Residual Network (DAERN), a novel deep learning architecture that integrates Dilated Causal Convolutions (DCCBlock) for local gait pattern extraction and Multi-Head Self-Attention (MHSA) for long-range dependency modeling. A CrossAttention Fusion module enhances feature integration, while SHapley Additive exPlanations (SHAP) and Integrated Gradients (IG) improve interpretability, providing clinically relevant insights into gait-based NDD classification.
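To make the dual-branch layout concrete, here is a minimal PyTorch sketch with a dilated causal convolution branch, a multi-head self-attention branch, and cross-attention fusion; all layer sizes, the pooling, and the classification head are assumptions for illustration, not the published DAERN.

    import torch
    import torch.nn as nn

    class DualBranchSketch(nn.Module):
        """Toy dual-branch model: dilated causal conv branch + self-attention branch,
        fused by cross-attention. Dimensions are illustrative only."""
        def __init__(self, in_ch=3, dim=64, n_heads=4, n_classes=4):
            super().__init__()
            # Local branch: dilated causal convolution (left padding keeps causality).
            self.pad = nn.ConstantPad1d((4, 0), 0.0)          # (kernel-1)*dilation = 4
            self.causal = nn.Conv1d(in_ch, dim, kernel_size=3, dilation=2)
            # Global branch: multi-head self-attention over time steps.
            self.proj = nn.Linear(in_ch, dim)
            self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            # Fusion: local features attend to global features (cross-attention).
            self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            self.head = nn.Linear(dim, n_classes)

        def forward(self, x):                  # x: (batch, time, channels)
            local = self.causal(self.pad(x.transpose(1, 2))).transpose(1, 2)
            g = self.proj(x)
            global_feat, _ = self.self_attn(g, g, g)
            fused, _ = self.cross_attn(local, global_feat, global_feat)
            return self.head(fused.mean(dim=1))  # pool over time, classify

    gait = torch.randn(2, 100, 3)              # e.g. 100 time steps of 3 gait signals
    print(DualBranchSketch()(gait).shape)      # torch.Size([2, 4])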


This paper explores a cross-modality synthesis task that infers 3D human-object interactions (HOIs) from a given text-based instruction. Existing text-to-HOI synthesis methods mainly deploy a direct mapping from texts to object-specific 3D body motions, which may encounter a performance bottleneck given the huge cross-modality gap. In this paper, we observe that those HOI samples with the same interaction intention toward different targets, e.


The goal of the hypergraph foundation model (HGFM) is to learn an encoder, based on the hypergraph computational paradigm, through self-supervised pretraining on high-order correlation structures, enabling the encoder to rapidly adapt to various downstream tasks in scenarios where no labeled data, or only a small amount, is available. Initial exploratory work has applied this paradigm to brain disease diagnosis tasks. However, existing methods primarily rely on graph-based approaches to learn low-order correlation patterns between brain regions in brain networks, neglecting the modeling and learning of complex correlations between different brain diseases and patients.


Survival prediction on histopathology whole slide images (WSIs) involves the analysis of multi-level complex correlations, such as inter-correlations among patients and intra-correlations within gigapixel histopathology images. However, current graph-based methods for WSI analysis mainly focus on the exploration of pairwise correlations, resulting in the loss of high-order correlations. Hypergraph-based methods can handle such high-order correlations, but existing ones fail to integrate multi-level high-order correlations into a unified framework, which limits the representation capability for WSIs.


We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework, which transposes visual feature maps into a semantic space and constructs a hypergraph for high-order message propagation.
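The HGC-SCS description, projecting visual features into a semantic space and building a hypergraph there, can be pictured with a simple k-nearest-neighbour hyperedge construction over flattened feature vectors. This is a common recipe assumed purely for illustration, not Hyper-YOLO's actual implementation; the subsequent high-order message propagation could then follow the vertex-to-hyperedge-to-vertex pattern sketched earlier.

    import numpy as np

    def knn_hyperedges(feats, k=4):
        """Build a hypergraph over feature vectors: every location spawns one
        hyperedge containing itself and its k nearest neighbours in feature space."""
        n = feats.shape[0]
        d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        H = np.zeros((n, n), dtype=np.float32)                        # (vertices, hyperedges)
        for i in range(n):
            H[np.argsort(d2[i])[:k + 1], i] = 1.0                     # i plus its k neighbours
        return H

    feats = np.random.rand(16, 8)        # e.g. 16 spatial locations with 8-dim semantic features
    H = knn_hyperedges(feats)
    print(H.shape, H.sum(axis=0))        # every hyperedge connects k + 1 vertices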


Objective: To address the high-order correlation modeling and fusion challenges between functional and structural brain networks.

Method: This paper proposes a hypergraph transformer method for modeling high-order correlations between functional and structural brain networks. By utilizing hypergraphs, we can effectively capture the high-order correlations within brain networks.


Elastography ultrasound imaging is increasingly important in the diagnosis of thyroid cancer and other diseases, but its reliance on specialized equipment and techniques limits widespread adoption. This paper proposes a novel multimodal ultrasound diagnostic pipeline that expands the application of elastography ultrasound by translating B-ultrasound (BUS) images into elastography images (EUS). Additionally, to address the limitations of existing image-to-image translation methods, which struggle to effectively model inter-sample variations and accurately capture regional-scale structural consistency, we propose a BUS-to-EUS translation method based on hierarchical structural consistency.

Article Synopsis
  • The study investigates the impact of orthodontic treatment on 3D facial recognition, crucial for personal identification in forensic medicine.
  • It involved 68 orthodontic patients (30 with tooth extractions, 38 without) and a control group of 30 individuals, with facial models acquired before and after treatment.
  • Results indicated that orthodontic treatment does not significantly affect 3D-3D facial recognition, with an average root mean square value showing high accuracy in identifying matches versus mismatches among individuals.

Pneumonia is a serious disease that can be fatal, particularly among children and the elderly. The accuracy of pneumonia diagnosis can be improved by combining artificial-intelligence technology with X-ray imaging. This study proposes X-ODFCANet, which addresses the issues of low accuracy and excessive parameters in existing deep-learning-based pneumonia-classification methods.


Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing one's intention. While many fruitful efforts have been made toward human motion prediction, most approaches focus on pose-driven prediction and infer human motion in isolation from the contextual environment, thus leaving the movement of the body's location within the scene behind. However, real-world human movements are goal-directed and highly influenced by the spatial layout of their surrounding scenes.


Crowd counting models in highly congested areas confront two main challenges: weak localization ability and difficulty in differentiating between foreground and background, leading to inaccurate estimations. The reason is that objects in highly congested areas are normally small, and the high-level features extracted by convolutional neural networks are less discriminative for representing such small objects. To address these problems, we propose a learning discriminative features framework for crowd counting, which is composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM).
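For illustration of what a supervised pixel-level contrastive term can look like, the following is a generic prototype-based formulation with invented tensor names; it is a sketch of the general idea, not necessarily the CLM used in the paper.

    import torch
    import torch.nn.functional as F

    def pixel_contrastive_loss(emb, fg_mask, tau=0.1):
        """Pull foreground pixel embeddings toward the foreground prototype and
        away from the background prototype (and vice versa).

        emb:     (n_pixels, dim) pixel embeddings (already flattened).
        fg_mask: (n_pixels,) boolean, True for foreground (crowd) pixels.
        """
        emb = F.normalize(emb, dim=1)
        proto_fg = F.normalize(emb[fg_mask].mean(0, keepdim=True), dim=1)
        proto_bg = F.normalize(emb[~fg_mask].mean(0, keepdim=True), dim=1)
        protos = torch.cat([proto_fg, proto_bg], dim=0)            # (2, dim)
        logits = emb @ protos.t() / tau                            # (n_pixels, 2)
        target = (~fg_mask).long()                                 # 0 = fg proto, 1 = bg proto
        return F.cross_entropy(logits, target)

    emb = torch.randn(64, 16)
    fg = torch.zeros(64, dtype=torch.bool)
    fg[:20] = True
    print(pixel_contrastive_loss(emb, fg))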

Article Synopsis
  • Action recognition in videos is crucial but limited in single-view scenarios, leading to a need for multi-view methods that use data from different angles for better accuracy.
  • The paper introduces HyperMV, a new framework for multi-view event-based action recognition that turns event data into frame-like representations (see the sketch after this list) and leverages a multi-view hypergraph neural network that enhances feature fusion through vertex attention.
  • A new dataset called THU-50 is introduced, featuring 50 actions from 6 different viewpoints, which is significantly larger than existing datasets, and HyperMV demonstrates superior performance in recognizing actions across various scenarios compared to current methods.
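A rough sketch of the first step, turning event data into frame-like representations, is a per-polarity accumulation of events into a pixel grid; the representation below is an assumption for illustration and not necessarily HyperMV's exact choice.

    import numpy as np

    def events_to_frame(events, height, width):
        """Accumulate a chunk of events into a 2-channel frame-like array.

        events: (n, 4) array of (x, y, timestamp, polarity) with polarity in {0, 1}.
        Returns an array of shape (2, height, width): per-pixel event counts,
        one channel per polarity.
        """
        frame = np.zeros((2, height, width), dtype=np.float32)
        xs = events[:, 0].astype(int)
        ys = events[:, 1].astype(int)
        ps = events[:, 3].astype(int)
        np.add.at(frame, (ps, ys, xs), 1.0)   # scatter-add event counts
        return frame

    # Toy example: 5 events on a 4x4 sensor.
    ev = np.array([[0, 0, 0.01, 1], [1, 2, 0.02, 0], [1, 2, 0.03, 0],
                   [3, 3, 0.04, 1], [2, 1, 0.05, 1]])
    print(events_to_frame(ev, 4, 4).sum())    # 5.0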

Self-supervised representation learning for 3D point clouds has attracted increasing attention. However, existing methods in the field of 3D computer vision generally use fixed embeddings to represent the latent features and impose hard constraints on the embeddings to force the latent features of positive samples to converge to identical values, which limits the ability of feature extractors to generalize across different data domains. To address this issue, we propose a Generative Variational-Contrastive Learning (GVC) model, where a Gaussian distribution is used to construct a continuous, smoothed representation of the latent features.
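The distributional view of the latent features can be sketched with the standard reparameterization trick plus a simple positive-pair alignment term; the dimensions, the cosine alignment, and the KL weighting below are assumptions for illustration, not the actual GVC objective.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GaussianLatent(nn.Module):
        """Map a feature vector to a Gaussian (mu, log_var) and sample from it."""
        def __init__(self, in_dim=128, z_dim=32):
            super().__init__()
            self.mu = nn.Linear(in_dim, z_dim)
            self.log_var = nn.Linear(in_dim, z_dim)

        def forward(self, h):
            mu, log_var = self.mu(h), self.log_var(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterize
            return z, mu, log_var

    def vc_loss(z1, z2, mu, log_var):
        """Align two sampled views of the same object + KL regularizer toward N(0, I)."""
        align = 1 - F.cosine_similarity(z1, z2, dim=1).mean()
        kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
        return align + 1e-2 * kl

    enc = GaussianLatent()
    h1, h2 = torch.randn(8, 128), torch.randn(8, 128)   # features of two augmented views
    z1, mu, lv = enc(h1)
    z2, _, _ = enc(h2)
    print(vc_loss(z1, z2, mu, lv))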


Predicting the trajectories of pedestrians in crowded scenarios is indispensable for self-driving vehicles and autonomous mobile robots, because estimating the future locations of surrounding pedestrians informs policy decisions for collision avoidance. It is a challenging problem because humans walk with different motions, and the interactions between humans and objects in the current environment, especially between humans themselves, are complex. Previous researchers focused on how to model human-human interactions but neglected the relative importance of these interactions.


In this article, we propose a novel cascaded diffusion-based generative framework for text-driven human motion synthesis, which exploits a strategy named GradUally Enriching SyntheSis (abbreviated as GUESS). The strategy sets up generation objectives by grouping body joints of a detailed skeleton that lie in close semantic proximity and then replacing each such joint group with a single body-part node. Such an operation recursively abstracts a human pose into coarser and coarser skeletons at multiple granularity levels.
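The coarsening step, replacing a group of semantically close joints with a single body-part node, can be pictured as a pooling operation over predefined joint groups; the grouping below is invented purely for illustration.

    import numpy as np

    # Hypothetical grouping of a 10-joint skeleton into 4 body-part nodes.
    JOINT_GROUPS = {
        "torso_head": [0, 1, 2],
        "left_arm":   [3, 4],
        "right_arm":  [5, 6],
        "legs":       [7, 8, 9],
    }

    def coarsen_pose(pose, groups=JOINT_GROUPS):
        """pose: (n_joints, 3) xyz coordinates -> (n_groups, 3) body-part nodes,
        each body-part node being the centroid of its member joints."""
        return np.stack([pose[idx].mean(axis=0) for idx in groups.values()])

    pose = np.random.rand(10, 3)
    print(coarsen_pose(pose).shape)    # (4, 3)

Applying the same pooling to the coarsened output with a new grouping would give the recursively abstracted, multi-granularity skeletons the excerpt describes.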


In the realm of modern medicine, medical imaging stands as an irreplaceable pillar of accurate diagnostics. The significance of precise segmentation in medical images cannot be overstated, especially considering the variability introduced by different practitioners. With the escalating volume of medical imaging data, automated and efficient segmentation methods have become imperative.


Counting objects in crowded scenes remains a challenge to computer vision. Current deep-learning-based approaches often formulate it as a Gaussian density regression problem. Such a brute-force regression, though effective, may not properly account for the annotation displacement that arises from the human annotation process and can lead to different distributions.
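For context on the Gaussian density regression formulation mentioned above, the usual regression target places a small Gaussian kernel at each annotated head position so that the map sums to the object count; a minimal sketch with a fixed kernel width (a simplifying assumption) is shown below.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def density_map(points, height, width, sigma=4.0):
        """Turn point annotations into a Gaussian density map whose sum equals
        the number of annotated objects.

        points: iterable of (x, y) head positions.
        """
        dots = np.zeros((height, width), dtype=np.float32)
        for x, y in points:
            dots[int(y), int(x)] += 1.0        # one unit of mass per annotation
        return gaussian_filter(dots, sigma)    # spread each unit with a Gaussian kernel

    heads = [(10, 12), (40, 8), (25, 30)]
    dm = density_map(heads, 64, 64)
    print(dm.sum())                            # ~3.0 (mass is preserved)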


The traditional 3D object retrieval (3DOR) task is posed under the closed-set setting, which assumes the categories of objects encountered in the retrieval stage have all been seen in the training stage. Existing methods under this setting tend to merely discriminate the seen categories rather than learn a generalized 3D object embedding. Under such circumstances, 3DOR remains a challenging and open problem in real-world applications due to the existence of various unseen categories.


In the 3D skeleton-based action recognition task, learning rich spatial motion patterns and learning rich temporal motion patterns from body joints are two foundational yet under-explored problems. In this paper, we propose two methods for addressing these problems: (I) a novel glimpse-focus action recognition strategy that captures multi-range pose features from the whole body and key body parts jointly; (II) a powerful temporal feature extractor, JD-TC, that enriches trajectory features by inferring different inter-frame correlations for different joints. By coupling these two proposals, we develop a powerful skeleton-based action recognition system that extracts rich pose and trajectory features from a skeleton sequence and outperforms previous state-of-the-art methods on three large-scale datasets.
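One schematic way to give each joint its own inter-frame correlation pattern is a per-joint temporal self-attention, sketched below; this is a generic stand-in assumed for illustration, not the published JD-TC design.

    import torch
    import torch.nn.functional as F

    def per_joint_temporal_attention(x):
        """x: (batch, frames, joints, dim). Each joint attends over frames with its
        own attention map, so inter-frame correlations differ from joint to joint."""
        q = k = v = x.permute(0, 2, 1, 3)                    # (batch, joints, frames, dim)
        attn = torch.einsum("bjtd,bjsd->bjts", q, k) / q.shape[-1] ** 0.5
        attn = F.softmax(attn, dim=-1)                       # one (frames x frames) map per joint
        out = torch.einsum("bjts,bjsd->bjtd", attn, v)
        return out.permute(0, 2, 1, 3)                       # back to (batch, frames, joints, dim)

    seq = torch.randn(2, 30, 25, 16)   # e.g. 30 frames, 25 joints, 16-dim joint features
    print(per_joint_temporal_attention(seq).shape)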


After decades of investigation, point cloud registration is still a challenging task in practice, especially when the correspondences are contaminated by a large number of outliers. Such contamination rapidly decreases the probability of generating a hypothesis close to the true transformation, leading to the failure of point cloud registration. To tackle this problem, we propose a transformation estimation method, named Hunter, for robust point cloud registration with severe outliers.
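As background on the hypothesis-generation view described above, the standard robust baseline samples minimal correspondence sets, fits a rigid transform by SVD, and keeps the hypothesis with the most inliers; the RANSAC-style sketch below illustrates that baseline, not the Hunter method itself.

    import numpy as np

    def fit_rigid(P, Q):
        """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch)."""
        cp, cq = P.mean(0), Q.mean(0)
        U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                 # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cq - R @ cp

    def ransac_register(P, Q, iters=200, thresh=0.05):
        """P, Q: (n, 3) putative correspondences (possibly outlier-contaminated)."""
        rng = np.random.default_rng(0)
        best, best_inliers = None, -1
        for _ in range(iters):
            idx = rng.choice(len(P), 3, replace=False)       # minimal 3-point hypothesis
            R, t = fit_rigid(P[idx], Q[idx])
            inliers = (np.linalg.norm((P @ R.T + t) - Q, axis=1) < thresh).sum()
            if inliers > best_inliers:
                best, best_inliers = (R, t), inliers
        return best, best_inliers

    # Toy check: recover a known rotation with 30% corrupted correspondences.
    src = np.random.default_rng(1).random((60, 3))
    Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
    dst = src @ Rz.T + np.array([0.1, 0.2, 0.0])
    dst[:18] = np.random.default_rng(2).random((18, 3))      # corrupt 18 correspondences
    (R_est, t_est), n_in = ransac_register(src, dst)
    print(n_in)                                              # close to 42 inliers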
