Publications by authors named "Benoit Liquet"

Building spatial process models that capture nonstationary behavior while delivering computationally efficient inference is challenging. Nonstationary spatially varying kernels (see, e.g.


High-dimensional datasets, where the number of variables is much larger than the number of samples, are ubiquitous and often render standard classification techniques unreliable due to overfitting. An important research problem is feature selection, which ranks candidate variables based on their relevance to the outcome variable and retains those that satisfy a chosen criterion. This article proposes a computationally efficient variable selection method based on principal component analysis tailored to a binary classification problem or case-control study.


Selecting the best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal components analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in several fields, including genomics, biology, environmental science, and engineering.

Article Synopsis
  • Two deep learning models were developed to better correct for these issues; one uses labeled data while the other is semi-supervised, both trained on known concentrations of protoporphyrin IX (PpIX).
  • Evaluations showed that these models had significantly higher correlation coefficients for PpIX concentration detection compared to classical methods, with the semi-supervised model also performing better on human data, reducing false positives by 36%.

We present a gentle introduction to elementary mathematical notation, focused on communicating deep learning principles. This is a "math crash course" aimed at quickly giving scientists an understanding of the building blocks used in many equations, formulas, and algorithms that describe deep learning. While this short presentation cannot replace solid mathematical knowledge, which takes multiple courses and years to solidify, our aim is to help nonmathematical readers overcome the hurdles of reading texts that use such mathematical notation.

Article Synopsis
  • The text discusses a method for analyzing survival data where some patients may never experience the event of interest, using a mixture cure Cox model and an accelerated failure time (AFT) model when applicable.
  • It introduces a penalized likelihood technique to estimate mixture cure semi-parametric AFT models, which accounts for various types of censored data and uses Gaussian basis functions for baseline hazard estimation.
  • The method's efficacy is validated through simulation studies and a real case study on melanoma recurrence, showcasing its advantages over existing methods like the smcure R package and making it accessible via the aftQnp R package.

Through spectral unmixing, hyperspectral imaging (HSI) in fluorescence-guided brain tumor surgery has enabled the detection and classification of tumor regions invisible to the human eye. Prior unmixing work has focused on determining a minimal set of viable fluorophore spectra known to be present in the brain and effectively reconstructing human data without overfitting. With these endmembers, non-negative least squares regression (NNLS) was commonly used to compute the abundances.
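As a rough illustration of the abundance step described above, the sketch below solves a small non-negative least squares problem in plain Python. It uses projected gradient descent as a simple stand-in, not the active-set algorithm that NNLS library routines typically implement, and the function name, endmember matrix, and measured spectrum are hypothetical toy values rather than anything taken from the article.

```python
def nnls_projected_gradient(A, y, iters=2000):
    """Approximately minimise ||A x - y||^2 subject to x >= 0.

    A is a list of rows (spectral bands x endmembers) and y is the measured
    spectrum. Assumes A is not all zeros. Illustrative only.
    """
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so 1 / that value is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(iters):
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - y[i]
            for i in range(n_rows)
        ]
        grad = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n_cols)]
    return x


# Toy example: two hypothetical endmember spectra sampled at three bands.
endmembers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spectrum = [2.0, 0.5, 2.5]  # built from abundances 2.0 and 0.5
abundances = nnls_projected_gradient(endmembers, spectrum)
print(abundances)  # close to [2.0, 0.5]
```

The non-negativity projection is what separates this from ordinary least squares: when the unconstrained fit would assign a negative abundance to an endmember, that abundance is held at zero instead.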

Article Synopsis
  • The study investigates the genetic links between breast cancer (BC) and thyroid disorders, revealing a positive correlation between BC risk and free thyroxine (FT4) levels, and a negative correlation with thyroid-stimulating hormone (TSH) levels, particularly in estrogen receptor-positive BC.
  • Polygenic risk scores indicate that higher FT4 and hyperthyroidism risks are associated with increased BC risk, while higher TSH risk is linked to decreased BC risk, highlighting the role of genetics in these diseases.
  • The research identifies 49 shared genetic loci connected to both BC and thyroid traits and suggests that certain brain and immune system-related genes play significant roles in the relationship between these conditions.

Cross-phenotype association using gene-set analysis can help to detect pleiotropic genes and inform about common mechanisms between diseases. Although there are an increasing number of statistical methods for exploring pleiotropy, there is a lack of proper pipelines to apply gene-set analysis in this context and using genome-scale data in a reasonable running time. We designed a user-friendly pipeline to perform cross-phenotype gene-set analysis between two traits using GCPBayes, a method developed by our team.


Real-time monitoring using in situ sensors is becoming a common approach for measuring water quality within watersheds. High-frequency measurements produce big datasets that present opportunities to conduct new analyses for improved understanding of water-quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between nitrate, one of the most reactive forms of inorganic nitrogen in the aquatic environment, and other water-quality variables.


Compositional data are a special kind of data, represented as proportions carrying relative information. Although this type of data is widespread, no solution exists to deal with cases where the classes are not well balanced. After describing compositional data imbalance, this paper proposes an adaptation of the original Synthetic Minority Oversampling TEchnique (SMOTE) to deal with compositional data imbalance.
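To make the idea concrete, here is a minimal sketch of one way SMOTE-style oversampling could respect the compositional constraint: synthetic points are interpolated along Aitchison geodesics (equivalent to linear interpolation in centred log-ratio space) rather than along straight lines. This is an assumption for illustration and not necessarily the adaptation proposed in the paper; all function names are hypothetical, and every part is assumed to be strictly positive.

```python
import math
import random


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(x):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return math.dist(clr(x), clr(y))


def aitchison_interpolate(x, y, t):
    """Point at fraction t along the Aitchison geodesic from x to y."""
    return closure([xi ** (1.0 - t) * yi ** t for xi, yi in zip(x, y)])


def smote_compositional(minority, n_new, k=2, rng=None):
    """Generate n_new synthetic compositions from minority-class samples.

    Needs at least two minority samples. Illustrative sketch only.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (Aitchison distance).
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: aitchison_distance(base, m),
        )[:k]
        partner = rng.choice(neighbours)
        synthetic.append(aitchison_interpolate(base, partner, rng.random()))
    return synthetic


# Hypothetical minority-class compositions with three parts each.
minority = [[0.2, 0.3, 0.5], [0.25, 0.25, 0.5], [0.1, 0.4, 0.5]]
new_points = smote_compositional(minority, n_new=4)
```

Each synthetic point is again a valid composition (positive parts summing to 1), which plain linear SMOTE also guarantees, but the geodesic version treats the data as relative rather than absolute information.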


Background: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Often integrating additional information such as gene pathway knowledge can improve statistical efficiency and biological interpretation.


In situ sensors that collect high-frequency data are used increasingly to monitor aquatic environments. These sensors are prone to technical errors, resulting in unrecorded observations and/or anomalous values that are subsequently removed and create gaps in time series data. We present a framework based on generalized additive and auto-regressive models to recover these missing data.

Article Synopsis
  • A community study, part of the HUPO Human Glycoproteomics Initiative, tested various software solutions using the same human serum datasets to see how well they perform in analyzing glycopeptides.
  • The study found that while results varied among teams, some software strategies showed high performance, leading to recommendations for improving search solutions in glycoproteomics and guiding future software development.

Background: Heterogeneous respiratory system static compliance (C) values and levels of hypoxemia in patients with novel coronavirus disease (COVID-19) requiring mechanical ventilation have been reported in previous small-case series or studies conducted at a national level.

Methods: We designed a retrospective observational cohort study with rapid data gathering from the international COVID-19 Critical Care Consortium study to comprehensively describe C (calculated as tidal volume / [airway plateau pressure − positive end-expiratory pressure (PEEP)]) and its association with ventilatory management and outcomes of COVID-19 patients on mechanical ventilation (MV) admitted to intensive care units (ICU) worldwide.
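The compliance formula stated in the Methods can be written as a small function. This is only a sketch: the function name, the units (mL and cmH2O), and the example values are assumptions for illustration, not taken from the study.

```python
def static_compliance(tidal_volume_ml, plateau_pressure_cmh2o, peep_cmh2o):
    """Respiratory system static compliance in mL/cmH2O (assumed units).

    Computed as tidal volume / (airway plateau pressure - PEEP).
    """
    driving_pressure = plateau_pressure_cmh2o - peep_cmh2o
    if driving_pressure <= 0:
        raise ValueError("plateau pressure must exceed PEEP")
    return tidal_volume_ml / driving_pressure


# Hypothetical ventilator readings, for illustration only.
compliance = static_compliance(
    tidal_volume_ml=450, plateau_pressure_cmh2o=25, peep_cmh2o=10
)
print(compliance)  # 450 / (25 - 10) = 30.0
```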

Results: We studied 745 patients from 22 countries, who required admission to the ICU and MV from January 14 to December 31, 2020, and presented at least one value of C within the first seven days of MV.


Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level.


Semi-Markov models are widely used for survival analysis and reliability analysis. In general, there are two competing parameterizations and each entails its own interpretation and inference properties. On the one hand, a semi-Markov process can be defined based on the distribution of sojourn times, often via hazard rates, together with transition probabilities of an embedded Markov chain.


An increasing number of genome-wide association study (GWAS) summary statistics are being made available to the scientific community. Exploiting these results from multiple phenotypes would permit identification of novel pleiotropic associations. In addition, incorporating prior biological information in GWAS, such as group structure information (gene or pathway), has shown some success in classical GWAS approaches.


Introduction: There is a paucity of data that can be used to guide the management of critically ill patients with COVID-19. In response, a research and data-sharing collaborative, the COVID-19 Critical Care Consortium, has been assembled to harness the cumulative experience of intensive care units (ICUs) worldwide. The resulting observational study provides a platform to rapidly disseminate detailed data and insights crucial to improving outcomes.


Anomaly detection (AD) in high-volume environmental data requires one to tackle a series of challenges associated with the typical low frequency of anomalous events, the broad range of possible anomaly types, and local nonstationary environmental conditions, suggesting the need for flexible statistical methods that are able to cope with unbalanced high-volume data problems. Here, we aimed to detect anomalies caused by technical errors in water-quality (turbidity and conductivity) data collected by automated in situ sensors deployed in contrasting riverine and estuarine environments. We first applied a range of artificial neural networks that differed in both learning method and hyperparameter values, then calibrated models using a Bayesian multiobjective optimization procedure, and selected and evaluated the "best" model for each water-quality variable, environment, and anomaly type.


Identification of biomarkers is an emerging area in oncology. In this article, we develop an efficient statistical procedure for the classification of protein markers according to their effect on cancer progression. A high-dimensional time-course dataset of protein markers for 80 patients motivates the development of the model.


Anticipating future changes of an ecosystem's dynamics requires knowledge of how its key communities respond to current environmental regimes. The Great Barrier Reef (GBR) is under threat, with rapid changes of its reef-building hard coral (HC) community structure already evident across broad spatial scales. While several underlying relationships between HC and multiple disturbances have been documented, responses of other benthic communities to disturbances are not well understood.


Background: In medical research, explanatory continuous variables are frequently transformed or converted into categorical variables. If the coding is unknown, many tests can be used to identify the "optimal" transformation. This common process, involving the problems of multiple testing, requires a correction of the significance level.


Integrative analysis of high dimensional omics datasets has been studied by many authors in recent years. By incorporating prior known relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial.
