Citations

20

Article Abstract

Background: Predicting the behavior of new chemical compounds in advance can support the design of new products by directing research toward the most promising candidates and ruling out others. Such predictive models can be data-driven, using Machine Learning, or based on researchers' experience, and they depend on the collection of past results. In either case, models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Therefore, consistent use of these predictive models shapes the dataset and causes a continuous specialization that shrinks the applicability domain of all models trained on this dataset in the future, increasingly harming model-based exploration of the space.

Proposed Solution: In this paper, we propose CANCELS (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas in the space that fall short and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. CANCELS does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain.
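The idea of finding regions that "fall short" of a smooth distribution and suggesting experiments to fill them can be illustrated with a small sketch. This is not the authors' implementation (their code is in the repository named in the abstract). It assumes a single one-dimensional descriptor, uses a fitted Gaussian as the smooth target, and picks candidates greedily from a pool. The function name `suggest_gap_fillers` and all of its parameters are hypothetical.

```python
from math import erf, sqrt

import numpy as np


def suggest_gap_fillers(dataset, pool, n_bins=10, n_suggestions=5):
    """Return indices of `pool` values that fall into under-filled bins.

    Illustrative assumption: a Gaussian fitted to `dataset` stands in for
    the desired smooth distribution; bins where the observed count is
    below the Gaussian's expected count are treated as gaps.
    """
    dataset = np.asarray(dataset, dtype=float)
    pool = np.asarray(pool, dtype=float)
    mu, sigma = dataset.mean(), dataset.std()
    if sigma == 0:
        return []  # no spread, so no meaningful bins

    # Histogram the data over mu +/- 3 sigma.
    edges = np.linspace(mu - 3 * sigma, mu + 3 * sigma, n_bins + 1)
    observed, _ = np.histogram(dataset, bins=edges)

    # Expected count per bin under the fitted Gaussian (via its CDF).
    cdf = np.array([0.5 * (1 + erf((e - mu) / (sigma * sqrt(2)))) for e in edges])
    expected = len(dataset) * np.diff(cdf)

    # Positive deficit means the bin holds fewer points than expected.
    remaining = np.clip(expected - observed, 0.0, None)

    # Map each pool value to its bin; values outside the range are skipped.
    bin_idx = np.digitize(pool, edges) - 1
    available = [i for i in range(len(pool)) if 0 <= bin_idx[i] < n_bins]

    chosen = []
    while available and len(chosen) < n_suggestions:
        # Greedily take the candidate whose bin has the largest deficit.
        best = max(available, key=lambda i: remaining[bin_idx[i]])
        if remaining[bin_idx[best]] <= 0:
            break  # no remaining candidate lands in a gap
        chosen.append(best)
        remaining[bin_idx[best]] -= 1.0  # one suggestion fills one slot
        available.remove(best)
    return chosen
```

Because candidates far outside the fitted range are never selected, the sketch mirrors the stated goal of retaining specialization to a research domain instead of covering the whole space. A real compound space is high-dimensional, so a practical version would need a suitable representation or projection first.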

Results: An extensive set of experiments on the use case of biodegradation pathway prediction reveals not only that the bias spiral can indeed be observed but also that CANCELS produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial, as it not only interferes with the continuous specialization process but also significantly improves a predictor's performance while reducing the number of required experiments. Overall, we believe that CANCELS can support researchers in their experimentation process, helping them not only to better understand their data and its potential flaws but also to grow the dataset in a sustainable way. All code is available at github.com/KatDost/Cancels.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197453
DOI: http://dx.doi.org/10.1186/s13321-023-00716-w

Publication Analysis

Top Keywords

predictive models: 8
continuous specialization: 8
potential flaws: 8
dataset: 6
specialization: 5
combatting over-specialization: 4
bias: 4
over-specialization bias: 4
bias growing: 4
growing chemical: 4

Similar Publications

Background And Purpose: To review the existing evidence on multiple timepoint assessments of optic nerve sheath diameter (ONSD) as an indicator of intraindividual variation of intracranial pressure (ICP).

Methods: A systematic search identified studies assessing intraindividual variation in ICP through multiple timepoint measurements of ONSD using ultrasonography. Meta-analysis of studies assessing intraindividual correlation coefficients between ONSD and ICP was performed using a random effects model, and we calculated the weighted correlation coefficient for the expected change in ICP associated with variations in ONSD.

Introduction: Risperidone is approved for behaviors and psychological symptoms of dementia (BPSD), despite modest efficacy and known risks. Identifying responsive symptoms, treatment modifiers, and predictors is crucial for personalized treatment.

Method: A one-stage individual participant data meta-analysis of six randomized controlled trials (risperidone: n = 1009; placebo: n = 712) was conducted.

Background: Hospital-acquired venous thromboembolism (HA-VTE) is a leading cause of morbidity and mortality among hospitalized adults. Numerous prognostic models have been developed to identify those patients with elevated risk of HA-VTE. None, however, has met the necessary criteria to guide clinical decision-making.

Investigating the early-stage emissions of formaldehyde/VOCs from building materials and their influencing factors.

Environ Technol

September 2025

School of Architecture and Urban Planning, Chongqing Jiaotong University, Chongqing, People's Republic of China.

As urbanization accelerates, the issue of pollutant discharge from building materials has become the focus of public attention. Conducted in a ventilated environmental chamber, the experiments investigated the emission characteristics of VOCs from dry and wet building materials, focusing on the influencing factors, such as temperature, relative humidity (RH), ventilation, and seasonality. The impact of influencing factors was quantified using a one-factor-at-a-time control method.

Introduction: The effect of inflammatory bowel disease (IBD) on adverse in-hospital outcomes after radical prostatectomy (RP) for nonmetastatic prostate cancer (PCa) is not well known.

Materials And Methods: Descriptive analyses, propensity score matching and multivariable logistic regression models were used within the National Inpatient Sample (2000-2019) RP patients, after stratification according to Crohn's disease (CD) vs. ulcerative colitis (UC) vs.
