Accurately measuring vegetation height is essential for understanding ecosystem structure, carbon storage, and biodiversity, yet global height models have overwhelmingly focused on forests, excluding ecosystems with shorter herbaceous vegetation or shrubs. To address this gap in vegetation structure data, we developed the first global annual estimates of median vegetation height for 2000-2022 at 30 m resolution, using ICESat-2 satellite lidar, Landsat cloud-free composites, and other Earth Observation raster data. Thirty-two million ICESat-2 20 m segments were used within 10 independent draws to build ensemble Gradient Boosted Tree (GBT) models and estimate 90% prediction intervals.
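The abstract does not say how the 90% prediction intervals are derived from the ensemble. As a minimal sketch, assuming the interval is simply the 5th to 95th percentile range of the per-model predictions, it could be computed as follows. The function name, array layout, and synthetic values are all hypothetical.

```python
import numpy as np

def ensemble_interval(member_preds, level=0.90):
    """Summarise an ensemble of per-pixel predictions.

    member_preds: array of shape (n_models, n_pixels), predicted heights in m.
    Returns (lower, median, upper) arrays of shape (n_pixels,).
    Assumption: the interval is the empirical quantile range across models.
    """
    preds = np.asarray(member_preds, dtype=float)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(preds, alpha, axis=0)
    median = np.median(preds, axis=0)
    upper = np.quantile(preds, 1.0 - alpha, axis=0)
    return lower, median, upper

# Hypothetical predictions from 10 models (one per draw) for 5 pixels.
rng = np.random.default_rng(0)
member_preds = rng.normal(
    loc=[0.3, 1.2, 4.0, 12.0, 25.0],
    scale=[0.05, 0.1, 0.3, 1.0, 2.0],
    size=(10, 5),
)
lower, median, upper = ensemble_interval(member_preds)
```

With only 10 ensemble members the outer quantiles are interpolated between the extreme predictions, so intervals obtained this way are coarse.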
The article describes the production of a high spatial resolution (30 m), bimonthly, light use efficiency (LUE) based gross primary productivity (GPP) data set representing grasslands for the period 2000 to 2022. The data set is based on a reconstructed, globally complete and consistent bimonthly Landsat archive (400 TB of data), combined with 1 km MOD11A1 temperature data and 1° CERES Photosynthetically Active Radiation (PAR). First, the LUE model was implemented by taking the biome-specific productivity factor (maximum LUE parameter) as a global constant, producing global bimonthly (uncalibrated) productivity data for the complete land mask.
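A generic LUE model multiplies a maximum efficiency by environmental scalars, the absorbed fraction of radiation, and incoming PAR. The sketch below illustrates that general form only. The linear temperature ramp, the parameter defaults, and the input values are assumptions for illustration and are not taken from the article.

```python
import numpy as np

def gpp_lue(fapar, par, temp_c, lue_max=1.0, t_min=0.0, t_opt=20.0):
    """GPP = lue_max * f(T) * FAPAR * PAR.

    fapar   : fraction of absorbed PAR (0-1), e.g. derived from Landsat
    par     : incoming PAR (MJ m-2 day-1)
    temp_c  : temperature (deg C)
    lue_max : maximum LUE (g C per MJ); a single global constant here
    t_min, t_opt : hypothetical limits of a linear temperature ramp
    """
    temp = np.asarray(temp_c, dtype=float)
    # Temperature scalar: 0 below t_min, 1 above t_opt, linear in between.
    f_temp = np.clip((temp - t_min) / (t_opt - t_min), 0.0, 1.0)
    return lue_max * f_temp * np.asarray(fapar, float) * np.asarray(par, float)

fapar = np.array([0.2, 0.5, 0.8])
par = np.array([8.0, 8.0, 8.0])
temp = np.array([-5.0, 10.0, 25.0])
gpp = gpp_lue(fapar, par, temp)  # g C m-2 day-1 under the stated units
```

Because `lue_max` enters as a plain multiplier, an output computed with a global constant can later be rescaled per biome without rerunning the model.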
Production and validation of an open global ensemble digital terrain model (GEDTM30) and derived terrain variables on a 1 arc-s grid (~30 m spatial resolution) are described. Copernicus DEM, ALOS World3D, and object height models were combined in a data fusion approach to generate a globally consistent digital terrain model (DTM). This DTM was then used to compute 15 standard terrain variables across six scales (30, 60, 120, 240, 480 and 960 m).
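To illustrate what computing a terrain variable at several scales can look like, the sketch below derives slope at the six listed cell sizes. It assumes that coarser scales are obtained by block-averaging the 30 m grid, which the abstract does not state. The helper names and the synthetic DTM are hypothetical.

```python
import numpy as np

def coarsen(dtm, factor):
    """Block-average a 2D grid by an integer factor (edges are trimmed)."""
    rows = (dtm.shape[0] // factor) * factor
    cols = (dtm.shape[1] // factor) * factor
    trimmed = dtm[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

def slope_degrees(dtm, cell_size):
    """Slope from finite differences, in degrees."""
    dz_dy, dz_dx = np.gradient(dtm, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Synthetic DTM: an inclined plane rising 0.1 m per metre along x.
base_res = 30.0
x = np.arange(64) * base_res
dtm = np.tile(0.1 * x, (64, 1))

slopes = {}
for factor in (1, 2, 4, 8, 16, 32):  # 30, 60, 120, 240, 480, 960 m
    coarse = coarsen(dtm, factor)
    slopes[int(base_res * factor)] = slope_degrees(coarse, base_res * factor)
```

On a perfect plane every scale returns the same slope. On real terrain the coarser grids smooth out local relief, which is the reason for computing variables at multiple scales.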
Soil spectroscopy is a widely used method for estimating soil properties that are important for environmental and agricultural monitoring. However, a bottleneck to its more widespread adoption is the need to establish large reference datasets, called soil spectral libraries (SSLs), for training machine learning (ML) models. Similarly, the prediction capacity for new samples depends on the number and diversity of soil types and conditions represented in the SSLs.
The paper describes the production and evaluation of global grassland extent maps produced annually for 2000-2022 at 30 m spatial resolution. The dataset, showing the spatiotemporal distribution of cultivated and natural/semi-natural grassland classes, was produced using the GLAD Landsat ARD-2 image archive, accompanied by climatic, landform and proximity covariates, spatiotemporal machine learning (per-class Random Forest) and over 2.3 M reference samples (visually interpreted in Very High Resolution imagery).
Processing large collections of earth observation (EO) time-series, often petabyte-sized, such as NASA's Landsat and ESA's Sentinel missions, can be computationally prohibitive and costly. Despite their name, even the Analysis Ready Data (ARD) versions of such collections can rarely be used as direct input for modeling because of cloud presence and/or prohibitive storage size. Existing solutions for readily using these data are not openly available, are poor in performance, or lack flexibility.
The article presents results of using remote sensing images and machine learning to map and assess land potential based on time-series of potential Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) composites. Land potential here refers to the potential vegetation productivity in the hypothetical absence of short-term anthropogenic influence, such as intensive agriculture and urbanization. Knowledge of this ecological land potential could support the assessment of land degradation levels as well as restoration potential.
This dataset presents global soil organic carbon stocks in mangrove forests at 30 m resolution, predicted for 2020. We used spatiotemporal ensemble machine learning to produce predictions of soil organic carbon content and bulk density (BD) to 1 m soil depth, which were then aggregated to calculate soil organic carbon stocks. This was done by using training data points of both SOC (%) and BD in mangroves from a global dataset and from recently published studies, and globally consistent predictive covariate layers.
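Aggregating SOC content and bulk density into a stock follows standard unit arithmetic: SOC (%) / 100 × BD (g/cm³) × thickness (cm) gives g/cm², and 1 g/cm² equals 100 t/ha. A minimal sketch, assuming two depth layers and ignoring coarse fragments, is shown below. The layer boundaries and values are hypothetical and not taken from the dataset.

```python
import numpy as np

def soc_stock_t_ha(soc_pct, bd_g_cm3, thickness_cm):
    """Sum SOC stock over depth layers, in tonnes per hectare.

    soc_pct * bd_g_cm3 * thickness_cm equals
    (soc_pct / 100) * bd * thickness [g/cm2] * 100 [t/ha per g/cm2].
    Assumption: no correction for coarse fragments.
    """
    soc = np.asarray(soc_pct, dtype=float)
    bd = np.asarray(bd_g_cm3, dtype=float)
    thick = np.asarray(thickness_cm, dtype=float)
    return np.sum(soc * bd * thick, axis=0)

# Hypothetical profile: 0-30 cm and 30-100 cm layers (1 m total depth).
soc_pct = [6.0, 4.0]        # SOC content, %
bd = [0.7, 0.9]             # bulk density, g/cm3
thickness = [30.0, 70.0]    # layer thickness, cm
stock = soc_stock_t_ha(soc_pct, bd, thickness)  # 126 + 252 = 378 t/ha
```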
The global potential distribution of biomes (natural vegetation) was modelled using 8,959 training points from the BIOME 6000 dataset and a stack of 72 environmental covariates representing terrain and the current climatic conditions based on historical long term averages (1979-2013). An ensemble machine learning model based on stacked regularization was used, with multinomial logistic regression as the meta-learner and spatial blocking (100 km) to deal with spatial autocorrelation of the training points. Results of spatial cross-validation for the BIOME 6000 classes show an overall accuracy of 0.
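Spatial blocking keeps nearby points in the same cross-validation fold, so that spatially autocorrelated neighbours do not leak between training and validation sets. The sketch below shows one common way to do this, assuming point coordinates in a projected system with metre units and a regular 100 km grid of blocks. The function name, fold count, and random assignment are illustrative choices, not the authors' implementation.

```python
import numpy as np

def block_folds(x_m, y_m, block_size_m=100_000, n_folds=5, seed=0):
    """Assign each point to a CV fold so that all points in a block share it."""
    bx = np.floor(np.asarray(x_m, dtype=float) / block_size_m).astype(int)
    by = np.floor(np.asarray(y_m, dtype=float) / block_size_m).astype(int)
    blocks = np.stack([bx, by], axis=1)
    # One integer id per distinct (bx, by) block.
    unique_blocks, block_id = np.unique(blocks, axis=0, return_inverse=True)
    block_id = np.ravel(block_id)
    # Shuffle blocks, then deal them out to folds in round-robin order.
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(len(unique_blocks)) % n_folds
    return fold_of_block[block_id], block_id

# Hypothetical points scattered over a 1000 km x 1000 km area.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1_000_000, size=200)
y = rng.uniform(0, 1_000_000, size=200)
fold, block_id = block_folds(x, y)
```

The resulting `fold` array can be passed as the group or fold label to whatever cross-validation routine is used for model fitting.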
The article describes the production steps and accuracy assessment of an analysis-ready, open-access European data cube consisting of 2000-2020+ Landsat data, 2017-2021+ Sentinel-2 data and a 30 m resolution digital terrain model (DTM). The main purpose of the data cube is to make annual continental-scale spatiotemporal machine learning tasks accessible to a wider user base by providing a spatially and temporally consistent multidimensional feature space. This has required systematic spatiotemporal harmonization, efficient compression, and imputation of missing values.
This article describes a data-driven framework based on spatiotemporal machine learning to produce distribution maps for 16 tree species ( Mill., Mill., L.
A spatiotemporal machine learning framework for automated prediction and analysis of long-term Land Use/Land Cover dynamics is presented. The framework includes: (1) harmonization and preprocessing of spatial and spatiotemporal input datasets (GLAD Landsat, NPP/VIIRS) including five million harmonized LUCAS and CORINE Land Cover-derived training samples, (2) model building based on spatial k-fold cross-validation and hyper-parameter optimization, (3) prediction of the most probable class, class probabilities and model variance of predicted probabilities per pixel, (4) LULC change analysis on time-series of produced maps. The spatiotemporal ensemble model consists of a random forest, gradient boosted tree classifier, and an artificial neural network, with a logistic regressor as meta-learner.
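Step (3) can be illustrated with a small sketch that takes class probabilities from several base models and returns the most probable class, the combined probabilities, and the between-model variance per pixel. Note that a plain average stands in here for the logistic-regression meta-learner described above, purely to keep the example short. The function name and probability values are hypothetical.

```python
import numpy as np

def summarise_ensemble(probs):
    """probs: array of shape (n_models, n_pixels, n_classes).

    Returns (top_class, mean_prob, prob_var):
      top_class : index of the most probable class per pixel
      mean_prob : model-averaged class probabilities (simplification)
      prob_var  : variance of each class probability across models
    """
    probs = np.asarray(probs, dtype=float)
    mean_prob = probs.mean(axis=0)
    top_class = mean_prob.argmax(axis=1)
    prob_var = probs.var(axis=0)
    return top_class, mean_prob, prob_var

# Hypothetical output of three base models for two pixels and three classes.
probs = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]],
])
top_class, mean_prob, prob_var = summarise_ensemble(probs)
```

High variance for a pixel indicates that the base models disagree, which is one way to flag predictions that deserve less trust.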
Across South America, the expansion of commodity land uses has underpinned substantial economic development at the expense of natural land cover and associated ecosystem services. Here, we show that such human impact on the continent's land surface, specifically land use conversion and natural land cover modification, expanded by 268 million hectares (Mha), or 60%, from 1985 to 2018. By 2018, 713 Mha, or 40%, of the South American landmass was impacted by human activity.
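The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the landmass estimate is approximate because the 40% share is a rounded value.

```python
impacted_2018_mha = 713.0   # impacted area in 2018 (from the text)
expansion_mha = 268.0       # expansion from 1985 to 2018 (from the text)

# Implied impacted area in 1985.
impacted_1985_mha = impacted_2018_mha - expansion_mha

# Relative increase over the 1985 baseline, to compare with the stated 60%.
relative_increase = expansion_mha / impacted_1985_mha

# Landmass implied by "713 Mha, or 40%" (approximate, since 40% is rounded).
implied_landmass_mha = impacted_2018_mha / 0.40
```

The implied 1985 baseline is 445 Mha, and 268 / 445 rounds to the 60% increase given in the text.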
Soil property and class maps for the continent of Africa have so far been available only at very generalised scales, with many countries not mapped at all. Thanks to an increasing quantity and availability of soil samples collected at field point locations by various government and/or NGO funded projects, it is now possible to produce detailed pan-African maps of soil nutrients, including micro-nutrients, at fine spatial resolutions. In this paper we describe the production of a 30 m resolution Soil Information System of the African continent using, to date, the most comprehensive compilation of soil samples ([Formula: see text]) and Earth Observation data.