A taxonomy-free approach based on machine learning to assess the quality of rivers with diatoms.

Sci Total Environ

Department of Biology and Geobiotec - Geobiosciences, Geotechnologies and Geoengineering Research Centre, University of Aveiro, Campus de Santiago, 3810-193 Aveiro, Portugal.

Published: June 2020



Article Abstract

Diatoms are a compulsory biological quality element in the ecological assessment of rivers under the Water Framework Directive. Applying the current official indices requires identifying individuals to species or lower rank under a microscope, based on valve morphology. This is a highly time-consuming task and is often prone to disagreement among analysts. As an alternative, DNA metabarcoding combined with High-Throughput Sequencing (HTS) has been proposed. The sequences obtained from environmental DNA are clustered into Operational Taxonomic Units (OTUs), which can be assigned to taxa using reference databases and then used to calculate biotic indices. However, a high percentage of OTUs still cannot be assigned to species because reference libraries are incomplete. We therefore tested a new taxonomy-free approach based on diatom community samples to assess rivers. A combination of three machine learning techniques is used to build models that predict, from environmental data, the diatom OTUs expected at test sites under reference conditions. The Observed/Expected OTUs ratio indicates the deviation from reference condition and is converted into a quality class. This approach had never before been used with diatoms or with OTU data. To evaluate its efficiency, we built one model based on OTU lists (HYDGEN) and another based on taxa lists from morphological identification (HYDMORPH), and also calculated a biotic index (IPS). The models were trained and tested with data from 81 sites (44 of them reference sites) in central Portugal. Both models were considered accurate (linear regression between Observed and Expected richness: R ≈ 0.7, intercept ≈ 0.8) and sensitive to global anthropogenic disturbance (Rs > 0.30, p < 0.006). Yet the HYDGEN model, based on molecular data, was sensitive to more types of pressure (such as changes in land use and habitat quality), which makes it promising for the bioassessment of rivers.
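The Observed/Expected step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how expected OTUs are selected or where the class boundaries lie, so the probability cutoff (`min_prob`), the boundary values, and the function names below are all hypothetical. Only the five class names follow the Water Framework Directive scale.

```python
from typing import Dict, Sequence, Set


def observed_expected_ratio(
    expected_prob: Dict[str, float],
    observed: Set[str],
    min_prob: float = 0.5,  # assumed cutoff, not taken from the paper
) -> float:
    """Return O/E for one test site.

    expected_prob maps each OTU to its predicted probability of occurrence
    under reference conditions; observed is the set of OTUs actually found.
    """
    # Keep only OTUs the model predicts with at least min_prob.
    expected_otus = {otu: p for otu, p in expected_prob.items() if p >= min_prob}
    expected = sum(expected_otus.values())
    if expected == 0:
        raise ValueError("no OTU reaches the probability cutoff")
    # Count how many of the expected OTUs were actually observed.
    observed_count = sum(1 for otu in expected_otus if otu in observed)
    return observed_count / expected


def quality_class(
    oe: float,
    boundaries: Sequence[float] = (0.8, 0.6, 0.4, 0.2),  # hypothetical values
) -> str:
    """Map an O/E ratio to one of five quality classes."""
    labels = ("High", "Good", "Moderate", "Poor", "Bad")
    # Walk down the lower boundaries; the first one met gives the class.
    for label, lower in zip(labels, boundaries):
        if oe >= lower:
            return label
    return labels[-1]


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    probs = {"OTU_1": 0.9, "OTU_2": 0.6, "OTU_3": 0.5, "OTU_4": 0.2}
    found = {"OTU_1", "OTU_3", "OTU_4"}
    ratio = observed_expected_ratio(probs, found)
    print(ratio, quality_class(ratio))
```

A ratio close to 1 means the site holds about as many of the predicted OTUs as a reference site would, while lower values indicate increasing deviation from reference condition. Because OTUs are matched by identifier only, no taxonomic assignment is needed at any point.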


Source
http://dx.doi.org/10.1016/j.scitotenv.2020.137900


Similar Publications

The <3% dissimilar Amplicon Sequence Variant (ASV) clusters of the 18S-V4 barcode were used as species-proxies for the evaluation of ASV composition and ASV diversity indices characterizing the hitherto poorly investigated meiofaunal communities of the south-eastern part of the Levantine basin. Accompanied by abundance measurements, the relationships of these characteristics with sedimentary and bottom terrain parameters were interpreted. The construction of community composition profiles, namely ASVs' list and their estimated abundances, was done using our previously established procedure (Harbuzov et al.


Environmental DNA metabarcoding reveals a vast genetic diversity of marine eukaryotes. Yet, most of the metabarcoding data remain unassigned due to the paucity of reference databases. This is particularly true for the deep-sea meiofauna and eukaryotic microbiota, whose hidden diversity is largely unexplored.


Assessing the relevance of DNA metabarcoding compared to morphological identification for lake phytoplankton monitoring.

Sci Total Environ

March 2024

UMR CARRTEL, INRAE, Université Savoie Mont Blanc, 75bis av. De Corzent - CS 50511, FR - 74203 Thonon-les-Bains cedex, France; Pole R&D ECLA Ecosystèmes Lacustres, France.

Phytoplankton is a key biological group used to assess the ecological status of lakes. The classical monitoring approach relies on microscopic identification and counting of phytoplankton species, which is time-consuming and requires high taxonomic expertise. High-throughput sequencing, combined with metabarcoding, has recently demonstrated its potential as an alternative approach for plankton surveys.


Effective and standardized monitoring methodologies are vital for successful reservoir restoration and management. Environmental DNA (eDNA) metabarcoding sequencing offers a promising alternative for biomonitoring and can overcome many limitations of traditional morphological bioassessment. Recent attempts have even shown that supervised machine learning (SML) can directly infer biotic indices (BI) from eDNA metabarcoding data, bypassing the cumbersome calculation process of BI regardless of the taxonomic assignment of eDNA sequences.


Anthropogenic eutrophication is one of the most pressing issues facing lakes globally. Our ability to manage lake eutrophication is hampered by the limited spatial and temporal extents of monitoring records, stemming from the time-consuming and expensive nature of physiochemical and biological monitoring. Diatom-based biomonitoring presents an alternative to traditional eutrophication monitoring, yet it is restricted by the high degree of taxonomic expertise required.
