Publications by authors named "Pierre Geurts"

Detecting skeletal or bone-related deformities in model and aquaculture fish is vital for numerous biomedical studies. In biomedical research, model fish with bone-related disorders are potential indicators of chemically induced toxins in their environment or of poor dietary conditions. In aquaculture, skeletal deformities affect fish health and cause economic losses for fish farmers.

In this article, we propose a method for evaluating feature ranking algorithms. A feature ranking algorithm estimates the importance of descriptive features when predicting the target variable, and the proposed method evaluates the correctness of these importance values by computing the error measures of two chains of predictive models. The models in the first chain are built on nested sets of top-ranked features, while the models in the second chain are built on nested sets of bottom-ranked features.
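The two chains described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the toy data, the nearest-centroid classifier used as the predictive model, and the function names `make_toy_data`, `nearest_centroid_error`, and `error_chains`. How the article summarizes the two error curves into a score is not stated in this excerpt, so the sketch only computes the curves.

```python
import random

def make_toy_data(n, rng):
    """Hypothetical toy data: feature 0 carries the class signal, features 1-3 are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([10.0 * label + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)])
        y.append(label)
    return X, y

def nearest_centroid_error(X_train, y_train, X_test, y_test, features):
    """Test error rate of a nearest-centroid classifier restricted to `features`."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    errors = 0
    for x, t in zip(X_test, y_test):
        point = [x[f] for f in features]
        pred = min(centroids,
                   key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
        errors += pred != t
    return errors / len(y_test)

def error_chains(ranking, X_train, y_train, X_test, y_test):
    """`ranking` lists feature indices from most to least important.

    Returns two error curves: models built on the k top-ranked features
    and models built on the k bottom-ranked features, for k = 1..len(ranking).
    """
    sizes = range(1, len(ranking) + 1)
    top = [nearest_centroid_error(X_train, y_train, X_test, y_test, ranking[:k])
           for k in sizes]
    bottom = [nearest_centroid_error(X_train, y_train, X_test, y_test, ranking[-k:])
              for k in sizes]
    return top, bottom

rng = random.Random(0)
X_train, y_train = make_toy_data(200, rng)
X_test, y_test = make_toy_data(200, rng)
# A ranking that correctly places the informative feature first.
top_errors, bottom_errors = error_chains([0, 1, 2, 3], X_train, y_train, X_test, y_test)
print("top-ranked chain:   ", top_errors)
print("bottom-ranked chain:", bottom_errors)
```

With a sensible ranking, the top-ranked chain should reach a low error with few features, while the bottom-ranked chain should stay near chance until the informative feature enters the set. Both chains coincide at the last step, where each model uses all features.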

In this work, we investigate multi-task learning as a way of pre-training models for classification tasks in digital pathology. This is motivated by the fact that the community has released many small and medium-sized datasets over the years, whereas no large-scale dataset comparable to ImageNet exists in the domain. We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images.

Many genomic data analyses such as phasing, genotype imputation, or local ancestry inference share a common core task: matching pairs of haplotypes at any position along the chromosome, thereby inferring a target haplotype as a succession of pieces from reference haplotypes, commonly called a mosaic of reference haplotypes. For that purpose, these analyses combine information provided by linkage disequilibrium, linkage and/or genealogy through a set of heuristic rules or, most often, by a hidden Markov model. Here, we develop an extremely randomized trees framework to address the issue of local haplotype matching.

Alzheimer's disease (AD) subtypes have been described according to genetics, neuropsychology, neuropathology, and neuroimaging. Thirty-one patients with clinically probable AD were selected based on perisylvian metabolic decrease on FDG-PET. They were compared to 25 patients with a typical pattern of decreased posterior metabolism.

In this chapter, we introduce the reader to a popular family of machine learning algorithms called decision trees. We then review several approaches based on decision trees that have been developed for the inference of gene regulatory networks (GRNs). Decision trees indeed have several properties that make them well suited to this problem: they can detect multivariate interacting effects between variables, are non-parametric, scale well, and have very few parameters.

Machine learning approaches have been increasingly used in the neuroimaging field for the design of computer-aided diagnosis systems. In this paper, we focus on the ability of these methods to provide interpretable information about the brain regions that are the most informative about the disease or condition of interest. In particular, we investigate the benefit of group-based, instead of voxel-based, analyses in the context of Random Forests.

The elucidation of gene regulatory networks is one of the major challenges of systems biology. The gene measurements exploited by network inference methods are typically available either as steady-state expression vectors or as time-series expression data. In our previous work, we proposed the GENIE3 method, which exploits variable importance scores derived from Random Forests to identify the regulators of each target gene.

The detection of anatomical landmarks in bioimages is a necessary but tedious step for geometric morphometrics studies in many research domains. We propose variants of a multi-resolution tree-based approach to speed up the detection of landmarks in bioimages. We extensively evaluate our method variants on three different datasets (cephalometric, zebrafish, and drosophila images).

We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org).

Background: Platelets have been implicated in both immune surveillance and host defense against severe infection. To date, it remains unknown whether platelet phenotype or other hemostasis components are associated with predisposition to sepsis in critical illness. The aim of this work was to identify platelet markers that could predict sepsis occurrence in critically ill injured patients.

Motivation: Collaborative analysis of massive imaging datasets is essential to enable scientific discoveries.

Results: We developed Cytomine to foster active and distributed collaboration of multidisciplinary teams for large-scale image-based studies. It uses web development methodologies and machine learning in order to readily organize, explore, share and analyze (semantically and quantitatively) multi-gigapixel imaging data over the internet.

Circulating microRNAs (miRNAs) are increasingly recognized as powerful biomarkers in several pathologies, including breast cancer. Here, their plasma levels were measured to be used as an alternative screening procedure to mammography for breast cancer diagnosis. A plasma miRNA profile was determined by RT-qPCR in a cohort of 378 women.

This paper studies the link between resting-state functional connectivity (FC), measured by the correlations of fMRI BOLD time courses, and structural connectivity (SC), estimated through fiber tractography. Instead of a static analysis based on the correlation between SC and FC averaged over the entire fMRI time series, we propose a dynamic analysis, based on the time evolution of the correlation between SC and a suitably windowed FC. Assessing the statistical significance of the time series against random phase permutations, our data show a pronounced peak of significance for time window widths around 20-30 TR (40-60 s).
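The dynamic analysis described above can be sketched as a sliding-window computation. This is a minimal illustration under stated assumptions, not the paper's pipeline: the random toy BOLD signals, the random SC weights, and the helper names `pearson` and `sliding_sc_fc` are all hypothetical, and the significance test against random phase permutations is omitted. The repetition time of 2 s is inferred from the stated equivalence of 20-30 TR and 40-60 s.

```python
import math
import random
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def sliding_sc_fc(bold, sc, width):
    """Correlation between SC and windowed FC for every window position.

    bold:  list of per-region BOLD time series (all the same length).
    sc:    dict mapping region pairs (i, j), i < j, to a structural weight.
    width: window width in TR (number of volumes).
    """
    pairs = list(combinations(range(len(bold)), 2))
    sc_vec = [sc[p] for p in pairs]
    n_time = len(bold[0])
    series = []
    for start in range(n_time - width + 1):
        stop = start + width
        # FC inside the window: correlation of the two regions' time courses.
        fc_vec = [pearson(bold[i][start:stop], bold[j][start:stop]) for i, j in pairs]
        series.append(pearson(sc_vec, fc_vec))
    return series

# Hypothetical toy input: 4 regions, 100 volumes, random signals and SC weights.
rng = random.Random(0)
bold = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(4)]
sc = {pair: rng.random() for pair in combinations(range(4), 2)}

TR_SECONDS = 2.0   # inferred from 20-30 TR corresponding to 40-60 s
WIDTH_TR = 25      # a width inside the 20-30 TR range
sc_fc_series = sliding_sc_fc(bold, sc, WIDTH_TR)
window_seconds = WIDTH_TR * TR_SECONDS
print(len(sc_fc_series), "windows of", window_seconds, "s")
```

A static analysis corresponds to the single window that spans the whole time series. Shrinking the window trades statistical stability of the FC estimate for temporal resolution.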

Networks are ubiquitous in biology, and computational approaches to their inference have been widely investigated. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes.
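The difference between the two frameworks shows up in how the training sets are built. The sketch below only constructs those training sets, under assumptions made for illustration: an undirected toy network, hypothetical node feature vectors, all node pairs treated as labeled, and a pair represented by concatenating the two nodes' features. The names `local_datasets` and `global_dataset` are not from the paper.

```python
from itertools import combinations

def local_datasets(features, edges):
    """Local approach: one training set per node.

    For node v, each other node u is an example with input features[u]
    and label 1 if the edge (u, v) is known, else 0.
    """
    edge_set = {frozenset(e) for e in edges}
    datasets = {}
    for v in features:
        others = [u for u in features if u != v]
        X = [features[u] for u in others]
        y = [int(frozenset((u, v)) in edge_set) for u in others]
        datasets[v] = (X, y)
    return datasets

def global_dataset(features, edges):
    """Global approach: a single training set whose examples are node pairs.

    A pair (u, v) is represented here by concatenating the two feature
    vectors, which is one simple choice among several.
    """
    edge_set = {frozenset(e) for e in edges}
    X, y = [], []
    for u, v in combinations(sorted(features), 2):
        X.append(features[u] + features[v])
        y.append(int(frozenset((u, v)) in edge_set))
    return X, y

# Hypothetical toy network: four nodes with 2-dimensional features, two known edges.
node_features = {
    "a": [0.1, 0.9],
    "b": [0.4, 0.5],
    "c": [0.8, 0.2],
    "d": [0.3, 0.7],
}
known_edges = [("a", "b"), ("b", "c")]

local = local_datasets(node_features, known_edges)
pair_X, pair_y = global_dataset(node_features, known_edges)
print(len(local), "local training sets;", len(pair_X), "pair examples in the global set")
```

The local approach yields as many models as nodes, each seeing single-node inputs, whereas the global approach yields one model whose inputs describe a pair. Any classifier can then be fitted to either kind of training set.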

Cephalometric analysis is an essential clinical and research tool in orthodontics for analysis and treatment planning. This paper presents the evaluation of the methods submitted to the Automatic Cephalometric X-Ray Landmark Detection Challenge, held at the IEEE International Symposium on Biomedical Imaging 2014 with an on-site competition. The challenge was set up to explore and compare automatic landmark detection methods applied to cephalometric X-ray images.

During tumour dissemination, invading breast carcinoma cells become confronted with a reactive stroma, a type I collagen-rich environment endowed with anti-proliferative and pro-apoptotic properties. To develop metastatic capabilities, tumour cells must acquire the capacity to cope with this novel microenvironment. How cells interact with and respond to their microenvironment during cancer dissemination remains poorly understood.

Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wildtype zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning.

Genome control is operated by transcription factors (TFs) controlling their target genes by binding to promoters and enhancers. Conceptually, the interactions between TFs, their binding sites, and their functional targets are represented by gene regulatory networks (GRNs). Deciphering in vivo GRNs underlying organ development in an unbiased genome-wide setting involves identifying both functional TF-gene interactions and physical TF-DNA interactions.

Gene regulatory networks (GRNs) govern phenotypic adaptations and reflect the trade-offs between physiological responses and evolutionary adaptation that act at different time-scales. To identify patterns of molecular function and genetic diversity in GRNs, we studied the drought response of the common sunflower, Helianthus annuus, and how the underlying GRN is related to its evolution. We examined the responses of 32,423 expressed sequences to drought and to abscisic acid (ABA) and selected 145 co-expressed transcripts.

The primary goal of genome-wide association studies (GWAS) is to discover variants that could lead, in isolation or in combination, to a particular trait or disease. Standard approaches to GWAS, however, are usually based on univariate hypothesis tests and therefore can account neither for correlations due to linkage disequilibrium nor for combinations of several markers. To discover and leverage such potential multivariate interactions, we propose in this work an extension of the Random Forest algorithm tailored for structured GWAS data.

One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm.

Inflammation can contribute to tumor formation; however, markers that predict progression are still lacking. In the present study, the well-established azoxymethane (AOM)/dextran sulfate sodium (DSS)-induced mouse model of colitis-associated cancer was used to analyze microRNA (miRNA) modulation accompanying inflammation-induced tumor development and to determine whether inflammation-triggered miRNA alterations affect the expression of genes or pathways involved in cancer. A miRNA microarray experiment was performed to establish miRNA expression profiles in mouse colon at early and late time points during inflammation and/or tumor growth.

Networks provide a natural representation of molecular biology knowledge, in particular to model relationships between biological entities such as genes, proteins, drugs, or diseases. Because of the effort, the cost, or the lack of the experiments necessary for the elucidation of these networks, computational approaches for network inference have been frequently investigated in the literature. In this paper, we examine the assessment of supervised network inference.

Due to the relative transparency of its embryos and larvae, the zebrafish is an ideal model organism for bioimaging approaches in vertebrates. Novel microscope technologies allow the imaging of developmental processes in unprecedented detail, and they enable the use of complex image-based read-outs for high-throughput/high-content screening. Such applications can easily generate terabytes of image data, and handling and analyzing these data becomes a major bottleneck in extracting the targeted information.
