Grade Inflation in Generative Models.

ArXiv

Department of Pathology and the Division of Clinical Informatics, Department of Medicine, BIDMC, and Harvard Medical School, Boston, MA 02215.

Published: January 2025


Article Abstract

Generative models hold great potential, but only if one can trust the evaluation of the data they generate. We show that many commonly used quality scores for comparing two-dimensional distributions of synthetic vs. ground-truth data give better results than they should, a phenomenon we call the "grade inflation problem." We show that the correlation score, Jaccard score, earth-mover's score, and Kullback-Leibler (relative-entropy) score all suffer grade inflation. We propose that any score that values all datapoints equally, as these do, will also exhibit grade inflation; we refer to such scores as "equipoint" scores. We introduce the concept of "equidensity" scores, and present the Eden score, to our knowledge the first example of such a score. We find that Eden avoids grade inflation and agrees better with human perception of goodness-of-fit than the equipoint scores above. We propose that any reasonable equidensity score will avoid grade inflation. We identify a connection between equidensity scores and Rényi entropy of negative order. We conclude that equidensity scores are likely to outperform equipoint scores for generative models, and for comparing low-dimensional distributions more generally.
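To make the "equipoint" scores named in the abstract more concrete, here is a minimal sketch of three of them computed on binned two-dimensional data, with the grid flattened into a list of bin counts. The function names and the specific formulas (Pearson correlation of bin values, Jaccard index of occupied bins, relative entropy in nats) are textbook choices assumed for illustration; the abstract does not give the paper's exact definitions, and the Eden score is not reproduced here because it is not defined in the abstract. The earth-mover's score is omitted for brevity. The Rényi entropy helper uses the standard formula and simply allows a negative order, summing over occupied bins only, which is also an assumption.

```python
import math


def normalize(counts):
    """Turn non-negative bin counts into probabilities that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return [c / total for c in counts]


def pearson_correlation(p, q):
    """Pearson correlation between two equally long lists of bin values."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        raise ValueError("correlation is undefined for a constant histogram")
    return cov / math.sqrt(var_p * var_q)


def jaccard_occupancy(p, q):
    """Jaccard index of the sets of occupied (non-zero) bins."""
    occupied_p = {i for i, a in enumerate(p) if a > 0}
    occupied_q = {i for i, b in enumerate(q) if b > 0}
    union = occupied_p | occupied_q
    return len(occupied_p & occupied_q) / len(union) if union else 1.0


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) in nats; eps guards against log(0)."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha != 1), over occupied bins only."""
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use another order")
    return math.log(sum(a ** alpha for a in p if a > 0)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical 2x2 grids, flattened row by row (made-up counts).
    truth = normalize([40, 30, 20, 10])
    synth = normalize([35, 35, 20, 10])
    print("correlation:", pearson_correlation(truth, synth))
    print("jaccard:", jaccard_occupancy(truth, synth))
    print("KL(truth || synth):", kl_divergence(truth, synth))
    print("Renyi entropy, order -1:", renyi_entropy(truth, -1))
```

Each of the three scores sums or compares over bins without regard to how the data are distributed across regions of different density, which is the sense in which the abstract calls such scores "equipoint".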

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11722526PMC
