The National Eutrophication Survey: lake characteristics and historical nutrient concentrations
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2018-01-01 |
Series: | Earth System Science Data |
Online Access: | https://www.earth-syst-sci-data.net/10/81/2018/essd-10-81-2018.pdf |
Summary: | Historical ecological surveys serve as a baseline and provide
context for contemporary research, yet many of these records are not
preserved in a way that ensures their long-term usability. The
National Eutrophication Survey (NES) database is currently only
available as scans of the original reports (PDF files) with no
embedded character information. This limits its searchability,
machine readability, and the ability of current and future
scientists to systematically evaluate its contents. The NES data
were collected by the US Environmental Protection Agency between
1972 and 1975 as part of an effort to investigate eutrophication in
freshwater lakes and reservoirs. Although several studies have
manually transcribed small portions of the database in support of
specific studies, there have been no systematic attempts to
transcribe and preserve the database in its entirety. Here we use
a combination of automated optical character recognition and manual
quality assurance procedures to make these data available for
analysis. The performance of the optical character recognition
protocol was found to be linked to variation in the quality
(clarity) of the original documents. For each of the four archival
scanned reports, our quality assurance protocol found an error rate
between 5.9 and 17 %. The goal of our approach was to
strike a balance between efficiency and data quality by combining
entry of data by hand with digital transcription technologies. The
finished database contains information on the physical
characteristics, hydrology, and water quality of about 800 lakes in
the contiguous US (Stachelek et al., 2017, https://doi.org/10.5063/F1639MVD). Ultimately, this database could be
combined with more recent studies to generate meta-analyses of water
quality trends and spatial variation across the continental US. |
ISSN: | 1866-3508 1866-3516 |