Differentially private (DP) synthetic datasets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We systematically investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic dataset generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generated using AIM and MWEM PGM algorithms can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.
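The abstract does not say which fairness definitions or utility measure the study uses. As a minimal sketch only, the code below pairs one utility measure (accuracy) with two widely used group-fairness measures, demographic parity difference and equal opportunity difference. These measures, the function names, and the toy labels, predictions, and binary group attribute are all assumptions made for illustration. None of them come from the paper.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int]) -> float:
    """Fraction of predictions equal to 1; 0.0 for an empty group."""
    return sum(preds) / len(preds) if preds else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true labels (a simple utility measure)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rate between group 0 and group 1."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(positive_rate(g0) - positive_rate(g1))


def equal_opportunity_difference(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """Absolute gap in true-positive rate between group 0 and group 1."""
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    return abs(positive_rate(g0) - positive_rate(g1))


# Hypothetical toy data: true labels, model predictions, and a binary
# protected attribute. These values are illustrative, not from the paper.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

acc = accuracy(y_true, y_pred)
dp_diff = demographic_parity_difference(y_pred, group)
eo_diff = equal_opportunity_difference(y_true, y_pred, group)
print(f"accuracy={acc:.2f} dp_diff={dp_diff:.2f} eo_diff={eo_diff:.2f}")
```

To compare a model trained on real data with one trained on synthetic data, one could compute the same three numbers for each model on a shared evaluation set and look at the differences.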
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10843030 | PMC |
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0297271 | PLOS |
Probiotics Antimicrob Proteins
September 2025
Department of Microbiology, Faculty of Medicine, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
Anaerobic bacteria cause a wide range of infections, varying from mild to severe, whether localized, implant-associated, or invasive, often leading to high morbidity and mortality. These infections are challenging to manage due to antimicrobial resistance against common antibiotics such as carbapenems and nitroimidazoles. The empirical use of antibiotics has contributed to the emergence of resistant organisms, making the identification and development of new antibiotics increasingly difficult.
J Acoust Soc Am
September 2025
Department of Physics, University of Louisiana at Lafayette, Lafayette, Louisiana 70503, USA.
A method is presented for determining the significant parameters, maximum wind speed and radius of maximum wind speed, of the surface winds associated with a hurricane. The method is based on Bayesian inversion, using Markov chain Monte Carlo sampling. Underwater acoustic measurements are used to estimate parameters in the axisymmetric Holland model for hurricane surface winds.
Acta Crystallogr F Struct Biol Commun
October 2025
Science and Technology Facilities Council, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom.
Ease of access to data, tools and models expedites scientific research. In structural biology there are now numerous open repositories of experimental and simulated data sets. Being able to easily access and utilize these is crucial to allow researchers to make optimal use of their research effort.
Nat Prod Rep
September 2025
Saarland University, Department of Pharmacy, Saarbrücken, Germany.
Focus on 2004 to 2024. The rediscovery of natural products (NPs) as a critical source of new therapeutics has been greatly advanced by the development of heterologous expression platforms for biosynthetic gene clusters (BGCs). Among these, species have emerged as the most widely used and versatile chassis for expressing complex BGCs from diverse microbial origins. In this review, we provide a comprehensive analysis of over 450 peer-reviewed studies published between 2004 and 2024 that describe the heterologous expression of BGCs in hosts.
J Chem Phys
September 2025
School of Mathematical and Physical Sciences, University of Sheffield, Hicks Building, Hounsfield Road, Sheffield S3 7RH, United Kingdom.
The development of the microstructure during polymeric spinodal decomposition can be monitored in real time using small-angle scattering. Information about the microstructure can be deduced from measurements of the structure factor, a quantity directly proportional to the scattered intensity. While the time evolution of the structure factor can be measured relatively easily, modeling it has proved to be much more difficult.