Publications by Abhirup Datta | LitMetric

Publications by authors named "Abhirup Datta"


Geographical shifting of cholera burden in Africa and its implications for disease control.

Javier Perez-Saez , Qulu Zheng , Joshua Kaminsky , Kaiyue Zou , Maya N Demby , Abhirup Datta

Nat Med

August 2025

Cholera outbreaks cause substantial morbidity and mortality in Africa, yet changes in the geographic distribution of cholera burden over time remain uncharacterized. We used surveillance data and spatial statistical models to estimate the mean annual incidence of reported suspected cholera for 2011-2015 and 2016-2020 on a 20-km grid across Africa. Across 43 countries, mean annual incidence rates remained at 11 cases per 100,000 population, with 125,701 cases estimated annually (95% credible interval (CrI): 124,737-126,717) from 2016 to 2020.


Screening for Modifiable Risk Factors of Noncommunicable Diseases in Urban Young Adults, Indore, 2023-2024.

Rajesh Kothari , Vinita Kothari , Abhirup Datta , Shubhi Tiwari , Ameya Vaze

Cureus

June 2025

Introduction: Noncommunicable diseases (NCDs) lead to huge mortality in the population under 70 years of age at a global level. A national program called the National Programme for Prevention and Control of NCDs (NPNCD), targeting mainly individuals over 30 years of age, has been launched in India. Nearly 200 million Indians are young adults (ages 18-30 years).


Neural networks for geospatial data.

Wentao Zhan , Abhirup Datta

J Am Stat Assoc

June 2024

Analysis of geospatial data has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a Gaussian process covariance model, encoding the spatial dependence. While nonlinear machine learning algorithms like neural networks are increasingly being used for spatial analysis, current approaches depart from the model-based setup and cannot explicitly incorporate spatial covariance. We propose embedding neural networks directly within the traditional Gaussian process (GP) geostatistical model to accommodate non-linear mean functions while retaining all other advantages of GP, like explicit modeling of the spatial covariance and predicting at new locations via kriging.


Graph-constrained Analysis for Multivariate Functional Data.

Debangan Dey , Sudipto Banerjee , Martin A Lindquist , Abhirup Datta

J Multivar Anal

May 2025

The manuscript considers multivariate functional data analysis with a known graphical model among the functional variables representing their conditional relationships (e.g., brain region-level fMRI data with a prespecified connectivity graph among brain regions).


Assessing predictability of environmental time series with statistical and machine learning models.

Matthew Bonas , Abhirup Datta , Christopher K Wikle , Edward L Boone , Faten S Alamri

Environmetrics

January 2025

The ever increasing popularity of machine learning methods in virtually all areas of science, engineering and beyond is poised to put established statistical modeling approaches into question. Environmental statistics is no exception, as popular constructs such as neural networks and decision trees are now routinely used to provide forecasts of physical processes ranging from air pollution to meteorology. This presents both challenges and opportunities to the statistical community, which could contribute to the machine learning literature with a model-based approach with formal uncertainty quantification.


Incorrect statistical reasoning in Guyll et al. leads to biased claims about strength of forensic evidence.

Michael Rosenblum , Elizabeth T Chin , Elizabeth L Ogburn , Akihiko Nishimura , Daniel Westreich , Abhirup Datta

Proc Natl Acad Sci U S A

November 2024


Direct Bayesian linear regression for distribution-valued covariates.

Bohao Tang , Sandipan Pramanik , Yi Zhao , Brian Caffo , Abhirup Datta

Electron J Stat

August 2024

Article Synopsis

This manuscript introduces a novel method for scalar-on-distribution regression, where subject-specific distributions serve as covariates to predict a single outcome, bypassing the need for prior estimation of these distributions.
The proposed approach uses observed repeated measures directly as covariates and applies a Gaussian process prior, achieving efficient Bayesian inference without needing intermediate density estimates.
The method shows superior performance in simulation studies compared to traditional regression that requires estimating densities first, especially when there are limited repeated measures per subject, and it also accommodates various forms of data dependencies.


Visibility graph-based covariance functions for scalable spatial analysis in non-convex partially Euclidean domains.

Brian Gilbert , Abhirup Datta

Biometrics

July 2024

We present a new method for constructing valid covariance functions of Gaussian processes for spatial analysis in irregular, non-convex domains such as bodies of water. Standard covariance functions based on geodesic distances are not guaranteed to be positive definite on such domains, while existing non-Euclidean approaches fail to respect the partially Euclidean nature of these domains where the geodesic distance agrees with the Euclidean distances for some pairs of points. Using a visibility graph on the domain, we propose a class of covariance functions that preserve Euclidean-based covariances between points that are connected in the domain while incorporating the non-convex geometry of the domain via conditional independence relationships.


A causal machine-learning framework for studying policy impact on air pollution: a case study in COVID-19 lockdowns.

Claire Heffernan , Kirsten Koehler , Misti Levy Zamora , Colby Buehler , Drew R Gentner , Abhirup Datta

Am J Epidemiol

January 2025

When studying the impact of policy interventions or natural experiments on air pollution, such as new environmental policies or the opening or closing of an industrial facility, careful statistical analysis is needed to separate causal changes from other confounding factors. Using COVID-19 lockdowns as a case study, we present a comprehensive framework for estimating and validating causal changes from such perturbations. We propose using flexible machine learning-based comparative interrupted time series (CITS) models for estimating such a causal effect.


A dynamic spatial filtering approach to mitigate underestimation bias in field calibrated low-cost sensor air pollution data.

Claire Heffernan , Roger Peng , Drew R Gentner , Kirsten Koehler , Abhirup Datta

Ann Appl Stat

December 2023

Low-cost air pollution sensors, offering hyper-local characterization of pollutant concentrations, are becoming increasingly prevalent in environmental and public health research. However, low-cost air pollution data can be noisy, biased by environmental conditions, and usually need to be field-calibrated by collocating low-cost sensors with reference-grade instruments. We show, theoretically and empirically, that the common procedure of regression-based calibration using collocated data systematically underestimates high air pollution concentrations, which are critical to diagnose from a health perspective.


Evaluation of Calibration Approaches for Indoor Deployments of PurpleAir Monitors.

Kirsten Koehler , Megan Wilks , Tim Green , Ana M Rule , Misti L Zamora , Abhirup Datta

Atmos Environ (1994)

October 2023

Low-cost air quality monitors are growing in popularity among both researchers and community members to understand variability in pollutant concentrations. Several studies have produced calibration approaches for these sensors for ambient air. These calibrations have been shown to depend primarily on relative humidity, particle size distribution, and particle composition, which may be different in indoor environments.


Modeling Multivariate Spatial Dependencies Using Graphical Models.

Debangan Dey , Abhirup Datta , Sudipto Banerjee

N Engl J Stat Data Sci

September 2023

Graphical models have witnessed significant growth and usage in spatial data science for modeling data referenced over a massive number of spatial-temporal coordinates. Much of this literature has focused on a single or relatively few spatially dependent outcomes. Recent attention has focused upon addressing modeling and inference for a substantially large number of outcomes.


Improved fMRI-based pain prediction using Bayesian group-wise functional registration.

Guoqing Wang , Abhirup Datta , Martin A Lindquist

Biostatistics

July 2024

In recent years, the field of neuroimaging has undergone a paradigm shift, moving away from the traditional brain mapping approach towards the development of integrated, multivariate brain models that can predict categories of mental events. However, large interindividual differences in both brain anatomy and functional localization after standard anatomical alignment remain a major limitation in performing this type of analysis, as it leads to feature misalignment across subjects in subsequent predictive models. This article addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common latent template map.


nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes.

Lukas M Weber , Arkajyoti Saha , Abhirup Datta , Kasper D Hansen , Stephanie C Hicks

Nat Commun

July 2023

Feature selection to identify spatially variable genes or other biologically informative genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations.


Bayesian functional registration of fMRI activation maps.

Guoqing Wang , Abhirup Datta , Martin A Lindquist

Ann Appl Stat

September 2022

Functional magnetic resonance imaging (fMRI) has provided invaluable insight into our understanding of human behavior. However, large inter-individual differences in both brain anatomy and functional localization after anatomical alignment remain a major limitation in conducting group analyses and performing population level inference. This paper addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common reference map.


Identifying optimal co-location calibration periods for low-cost sensors.

Misti Levy Zamora , Colby Buehler , Abhirup Datta , Drew R Gentner , Kirsten Koehler

Atmos Meas Tech

January 2023

Low-cost sensors are often co-located with reference instruments to assess their performance and establish calibration equations, but limited discussion has focused on whether the duration of this calibration period can be optimized. We placed a multipollutant monitor that contained sensors that measure particulate matter smaller than 2.5 μm (PM2.5), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), and nitric oxide (NO) at a reference field site for one year.


An illustration of model agnostic explainability methods applied to environmental data.

Christopher K Wikle , Abhirup Datta , Bhava Vyasa Hari , Edward L Boone , Indranil Sahoo

Environmetrics

February 2023

Historically, two primary criticisms statisticians have of machine learning and deep neural models are their lack of uncertainty quantification and the inability to do inference (i.e., to explain what inputs are important).


Laboratory and field evaluation of a low-cost methane sensor and key environmental factors for sensor calibration.

Joyce J Y Lin , Colby Buehler , Abhirup Datta , Drew R Gentner , Kirsten Koehler

Environ Sci Atmos

April 2023

Low-cost sensors enable finer-scale spatiotemporal measurements within the existing methane (CH4) monitoring infrastructure and could help cities mitigate CH4 emissions to meet their climate goals. While initial studies of low-cost CH4 sensors have shown potential for effective CH4 measurement at ambient concentrations, sensor deployment remains limited due to questions about interferences and calibration across environments and seasons. This study evaluates sensor performance across seasons with specific attention paid to the sensor's understudied carbon monoxide (CO) interferences and environmental dependencies through long-term ambient co-location in an urban environment.


Countrywide Mortality Surveillance for Action in Mozambique: Results from a National Sample-Based Vital Statistics System for Mortality and Cause of Death.

Ivalda Macicame , Almamy M Kante , Emily Wilson , Brian Gilbert , Alain Koffi , Abhirup Datta

Am J Trop Med Hyg

May 2023

Sub-Saharan Africa lacks timely, reliable, and accurate national data on mortality and causes of death (CODs). In 2018 Mozambique launched a sample registration system (Countrywide Mortality Surveillance for Action [COMSA]-Mozambique), which collects continuous birth, death, and COD data from 700 randomly selected clusters, a nationally representative population of 828,663 persons. Verbal and social autopsy interviews are conducted for COD determination.


Correcting for Verbal Autopsy Misclassification Bias in Cause-Specific Mortality Estimates.

Jacob Fiksel , Brian Gilbert , Emily Wilson , Henry Kalter , Almamy Kante , Abhirup Datta

Am J Trop Med Hyg

May 2023

Verbal autopsies (VAs) are extensively used to determine cause of death (COD) in many low- and middle-income countries. However, COD determination from VA can be inaccurate. Computer coded verbal autopsy (CCVA) algorithms used for this task are imperfect and misclassify COD for a large proportion of deaths.


Multi-Cause Calibration of Verbal Autopsy-Based Cause-Specific Mortality Estimates of Children and Neonates in Mozambique.

Brian Gilbert , Jacob Fiksel , Emily Wilson , Henry Kalter , Almamy Kante , Abhirup Datta

Am J Trop Med Hyg

May 2023

The Countrywide Mortality Surveillance for Action platform is collecting verbal autopsy (VA) records from a nationally representative sample in Mozambique. These records are used to estimate the national and subnational cause-specific mortality fractions (CSMFs) for children (1-59 months) and neonates (1-28 days). Cross-tabulation of VA-based cause-of-death (COD) determination against that from the minimally invasive tissue sampling (MITS) from the Child Health and Mortality Prevention project revealed important misclassification errors for all the VA algorithms, which if not accounted for will lead to bias in the estimates of CSMF from VA.


Graphical Gaussian Process Models for Highly Multivariate Spatial Data.

Debangan Dey , Abhirup Datta , Sudipto Banerjee

Biometrika

December 2022

For multivariate spatial Gaussian process (GP) models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence among the variables. This is undesirable, especially for highly multivariate settings, where popular cross-covariance functions such as the multivariate Matérn suffer from a "curse of dimensionality" as the number of parameters and floating point operations scale up in quadratic and cubic order, respectively, in the number of variables. We propose a class of multivariate "Graphical Gaussian Processes" using a general construction called "stitching" that crafts cross-covariance functions from graphs and ensures process-level conditional independence among variables.


Non-linear probabilistic calibration of low-cost environmental air pollution sensor networks for neighborhood level spatiotemporal exposure assessment.

Andrew Patton , Abhirup Datta , Misti Levy Zamora , Colby Buehler , Fulizi Xiong

J Expo Sci Environ Epidemiol

November 2022

Background: Low-cost sensor networks for monitoring air pollution are an effective tool for expanding spatial resolution beyond the capabilities of existing state and federal reference monitoring stations. However, low-cost sensor data commonly exhibit non-linear biases with respect to environmental conditions that cannot be captured by linear models, therefore requiring extensive lab calibration. Further, these calibration models traditionally produce point estimates or uniform variance predictions, which limits their downstream use in exposure assessment.


Evaluating the Performance of Using Low-Cost Sensors to Calibrate for Cross-Sensitivities in a Multipollutant Network.

Misti Levy Zamora , Colby Buehler , Hao Lei , Abhirup Datta , Fulizi Xiong

ACS ES T Eng

May 2022

As part of our low-cost sensor network, we colocated multipollutant monitors containing sensors for particulate matter, carbon monoxide, ozone, nitrogen dioxide, and nitrogen monoxide at a reference field site in Baltimore, MD, for 1 year. The first 6 months were used for training multiple regression models, and the second 6 months were used to evaluate the models. The models produced accurate hourly concentrations for all sensors except ozone, which likely requires nonlinear methods to capture peak summer concentrations.


Hierarchical multivariate directed acyclic graph autoregressive models for spatial diseases mapping.

Leiwen Gao , Abhirup Datta , Sudipto Banerjee

Stat Med

July 2022

Disease mapping is an important statistical tool used by epidemiologists to assess geographic variation in disease rates and identify lurking environmental risk factors from spatial patterns. Such maps rely upon spatial models for regionally aggregated data, where neighboring regions tend to exhibit more similar outcomes than those farther apart. We contribute to the literature on multivariate disease mapping, which deals with measurements on multiple (two or more) diseases in each region.
