Research Article
Corresponding author: Martina Zilioli (zilioli.m@irea.cnr.it)
Academic editor: Michele Freppaz
© 2019 Martina Zilioli, Alessandro Oggioni, Paolo Tagliolato, Alessandra Pugnetti, Paola Carrara.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Zilioli M, Oggioni A, Tagliolato P, Pugnetti A, Carrara P (2019) Feeding Essential Biodiversity Variables (EBVs): actual and potential contributions from LTER-Italy. In: Mazzocchi MG, Capotondi L, Freppaz M, Lugliè A, Campanaro A (Eds) Italian Long-Term Ecological Research for understanding ecosystem diversity and functioning. Case studies from aquatic, terrestrial and transitional domains. Nature Conservation 34: 477-503. https://doi.org/10.3897/natureconservation.34.30735
The conceptual framework of Essential Biodiversity Variables (EBVs) aims to capture the major dimensions of biodiversity change by structuring biodiversity monitoring and by regulating data collection amongst different providers. Amongst the research infrastructures adopting and implementing the EBV framework, LTER-Europe - the European node of ILTER (International Long-Term Ecological Research) - follows this approach to compare site-based biodiversity observations within and across its networks. However, a synoptic overview of the networks' contributions of EBVs-relevant data is still missing, since data are not made available for several reasons. In this paper, we assess the capacity of LTER-Italy, one of the richest and most heterogeneous networks of LTER sites in Europe, to provide data for the “Species Distribution” and “Species Abundance” EBVs without inspecting or downloading the datasets themselves. To this aim, we mine the EBVs information that LTER site managers structure and share publicly through DEIMS-SDR, the LTER-Europe online metadata repository. We classify the sites according to two types of contribution: (i) the actual contribution, based on metadata of datasets and (ii) the potential contribution, based on metadata of sites. Through these assessments, we investigate whether LTER-Italy monitoring activities can provide EBVs measures and which sites currently provide datasets. By comparing the two contributions, we pinpoint the factors hampering the accessibility of LTER-Italy data and suggest solutions to increase the discoverability and reusability of LTER-Italy EBVs measurements. The research provides the first overview of the EBVs monitored in LTER-Italy and of the corresponding data management practices, as well as an evaluation of the legal and technical interoperability of this network with other research organisations.
Essential Biodiversity Variables, LTER-Italy, DEIMS-SDR, metadata analysis, research infrastructure assessment, EML, EMF
Despite its indisputable role for human well-being and for ecosystem functioning (
Proliferation of studies is not always accompanied by integration of data in decision-making (
The conceptual framework of Essential Biodiversity Variables (EBVs) was endorsed by the CBD (
Although ongoing efforts are undertaken to align monitoring programmes to the EBV concept, the capacity of LTER networks to deliver relevant data has not been described yet, even if reported by authors of EBVs studies (
Integration of biodiversity datasets from multiple sources is one of the current challenges faced by ecological informatics. It requires the use of standardised measurement protocols, the adoption of common data standards, ontologies, the creation of controlled vocabularies (
At the same time, metadata compiled in standardised forms are fundamental for aggregation of biodiversity datasets. Metadata support different processes of data integration, by facilitating the discovery and the reuse of generated data to other scientists (
The EBVs framework is a theory-driven and academic approach to biodiversity monitoring. On the one hand, it helps to attain consensus on what is essential to monitor and where to focus the limited financial resources to assure the assessment of biodiversity change (
To be reliable, the above-mentioned description of LTER EBVs-relevant data has to pinpoint how the data can be integrated, without failing to identify any potential source within the research infrastructure considered. In fact, although facilities such as global IT aggregators (e.g. GBIF) or e-Science infrastructures (e.g. LifeWatch) widen access for different users, scientists limit access by applying restrictions to data (e.g. on commercial use), and confidential sharing practices hamper the review of dataset contents. Moreover, the lack of funding for data curation and publishing activities limits the sharing of data through digital archives.
The objective of our study is to demonstrate the capacity of LTER-Italy to provide EBVs data through the analysis of its metadata resources, by considering that: (I) data (e.g. the dataset itself) has not always been published for several reasons; (II) not all LTER sites measure the biodiversity components, but monitoring occurs according to the ecological research focus of the programme.
To free the analysis of the LTER network from data inspection and to identify the specific causes of restricted access to datasets, we examined the EBVs information structured in the metadata of LTER sites and datasets published by site managers in the Dynamic Ecological Information Management System – Site and Data Set Registry (DEIMS-SDR), which is the most comprehensive catalogue of field observation sites in environmental research networks (
Metadata analysis overview. The diagram illustrates the activities required to perform the metadata analysis. The collection of EBV information from metadata is articulated in three steps which are followed by EBVs actual and potential contribution assessment for Species Abundance (abridged as “SA”) and Species Distribution (abridged as “SD”).
The LTER-Italy network is the Italian node of LTER-Europe and consists of 104 sites registered on DEIMS-SDR. Amongst the European national LTER networks, it is the richest in number of sites and one of the most heterogeneous in monitored ecosystems (
To avoid redundancy, we also excluded from our analysis the metadata from 23 Italian macrosites, as every macrosite aggregates the metadata of the sites it groups, which are individually analysed.
Hence, we analysed the metadata related to 72 sites and, in particular, we selected only those sites for which the metadata element “eLTER Parameter” (illustrated in the following subsection), which constitutes an informative tagging of the research activities of a site, is compiled. The resulting set of 43 sites is our statistical sample.
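The two selection criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Site` record and its attributes `is_macrosite` and `elter_parameters` are hypothetical names introduced here and are not DEIMS-SDR field names, and the example sites are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical, simplified view of a site metadata record."""
    name: str
    is_macrosite: bool = False
    elter_parameters: list[str] = field(default_factory=list)


def select_sample(sites):
    """Drop macrosites, then keep sites whose eLTER Parameter element is compiled."""
    individual = [s for s in sites if not s.is_macrosite]
    return [s for s in individual if s.elter_parameters]


# Invented example records, for illustration only.
sites = [
    Site("Macrosite A", is_macrosite=True, elter_parameters=["Species abundance"]),
    Site("Site A1", elter_parameters=["Species abundance"]),
    Site("Site A2"),  # eLTER Parameter not compiled
]
sample = select_sample(sites)
print([s.name for s in sample])
```

Only the individual site with a compiled eLTER Parameter element survives both filters; the macrosite is dropped even though its own metadata are compiled, because its content is already covered by the sites it groups.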
For the purpose of the present study, we queried the metadata of LTER-Italy datasets and sites stored in DEIMS-SDR. These metadata are structured in two metadata models (DEIMS-SDR Metadata Models): the Site Metadata Model (SMM) and the Data Set Metadata Model (DSMM). Both models contain elements referable to EBVs that allow us to assess whether a site can be an EBVs data provider and whether a dataset can be reused (e.g. for its aggregation with other EBVs measurements). While the first model contains explicit references to EBVs, for the second we had to establish which elements to take into consideration in our analysis. To this aim, we followed the metadata requirements described by
EBVs information can be explicitly found from SMM in the “eLTER Parameters” element, whose content is a list of keywords from a hierarchically structured controlled vocabulary. The vocabulary is related to the LTER framework for standard observations (
Organisation of EBVs-related keywords for the metadata element “eLTER Parameter”. As an example, the Figure illustrates the tree structure for the Marine realm. The Figure shows the metadata field “Object (taxon)”, associated with the eLTER Parameters element and analysed for LTER-Italy metadata. According to the realm selected, specific taxonomic terms are exposed. Empty circles provide the branches illustrated. Light blue circles are not expanded in the Figure.
As shown in Figure
The DSMM, instead, does not contain information that explicitly refers to EBVs.
For this reason, we analysed the DSMM to identify elements suitable for the provision of the information suggested by
Table
The mapping was obtained by analysing the model and selecting suitable elements to provide the information considered and by checking them with compiled metadata.
Information suitable for building EBV data products mapped to DEIMS-SDR DSMM elements. The table illustrates elements which report information on EBV dimensions, attributes and uncertainties. The name of the related fields appears between parentheses while “ND” is used when elements to report the information are missing in the model.
EBV Attributes and Uncertainties ▼ / EBV Dimensions ► | Taxonomy | Space | Time |
---|---|---|---|
Extent (e.g. how many and which species are documented; sampling locations, satellites; length of time series, continuous recording, time period of collection of records) | Taxonomic coverage - Biological classification (field_bio_classification) OR Keywords (field_keywords_envthes) OR Title (field_title) | Geographic (field_related_sites) | Temporal extent (field_date_range) |
Resolution (e.g. species, genus, higher taxonomic level; volume, resolution of satellite sensors; time window of sampling; sampling frequency) | Keywords (field_keywords_envthes) OR Title (field_title) | Abstract (field_abstract) | Sampling time span (field_sampling_time_span) |
Measurement units (e.g. taxonomic entity for which species distribution and abundance data are sampled; metres, cubic metres, degrees; hours, days, months, years, decades) | Title (field_title) OR Abstract (field_abstract) | Method (field_related_links) | Minimum sampling unit (field_minimum_sampling_unit) |
Uncertainties (e.g. wrongly recorded coordinates; precision of time of collection; identification and observation uncertainty differences in taxon concepts) | ND | ND | ND |
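The mapping in the table above can be written as a lookup from (attribute, dimension) pairs to DSMM field names. The sketch below is illustrative only. It assumes that the "OR" alternatives belong to the Taxonomy column and that a dataset metadata record is available as a plain dictionary keyed by field name, which is a hypothetical representation rather than a DEIMS-SDR export format.

```python
# (attribute, dimension) -> DSMM fields that may carry the information.
# Assumption: the "OR" alternatives in the table belong to the Taxonomy column.
# An empty list stands for "ND" (no element available in the model).
EBV_FIELDS = {
    ("extent", "taxonomy"): ["field_bio_classification", "field_keywords_envthes", "field_title"],
    ("extent", "space"): ["field_related_sites"],
    ("extent", "time"): ["field_date_range"],
    ("resolution", "taxonomy"): ["field_keywords_envthes", "field_title"],
    ("resolution", "space"): ["field_abstract"],
    ("resolution", "time"): ["field_sampling_time_span"],
    ("measurement units", "taxonomy"): ["field_title", "field_abstract"],
    ("measurement units", "space"): ["field_related_links"],
    ("measurement units", "time"): ["field_minimum_sampling_unit"],
    ("uncertainties", "taxonomy"): [],
    ("uncertainties", "space"): [],
    ("uncertainties", "time"): [],
}


def candidate_cells(record):
    """Return the (attribute, dimension) cells with at least one non-empty mapped field."""
    return sorted(
        cell
        for cell, fields in EBV_FIELDS.items()
        if any(record.get(name) for name in fields)
    )


# Hypothetical dataset metadata record with only two fields filled in.
record = {
    "field_title": "Phytoplankton abundance",
    "field_date_range": "1990-2015",
}
print(candidate_cells(record))
```

A filled-in field is only a necessary condition: whether a title or an abstract actually states, for example, the taxonomic resolution still has to be checked by reading its content. The "uncertainties" cells can never be returned, since the model offers no element for them.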
Information suitable for building EBV data products mapped to DEIMS-SDR DSMM elements. The table associates workflow steps required to build EBV-usable datasets to DSMM elements carrying the appropriate information. The name of the related field appears between parentheses.
Workflow Step ▼ | DSMM elements ▼ |
---|---|
Identify and import raw data and associated metadata (1) | Data set Title (field_title); DOI (field_doi); Online location (field_online_locator) |
Check data-sharing agreements and licences (2) | Principal and granted permission (field_access_use_termref); Intellectual rights (field_dataset_rights) |
Check data completeness and consistency (3) | Quality assurance (field_quality_assurance) |
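As a rough illustration of how the workflow table could be applied to a single dataset metadata record, the sketch below reports which workflow steps are supported by at least one filled-in element. The field names are those listed in the table. The dictionary representation of a record and the "at least one field" rule are assumptions made for this example, not part of the DSMM.

```python
# Workflow step -> DSMM fields carrying the information needed for that step.
WORKFLOW_FIELDS = {
    1: ["field_title", "field_doi", "field_online_locator"],    # identify and import
    2: ["field_access_use_termref", "field_dataset_rights"],   # agreements and licences
    3: ["field_quality_assurance"],                             # completeness and consistency
}


def steps_supported(record):
    """Return the workflow steps for which at least one mapped field is non-empty."""
    return [
        step
        for step, fields in WORKFLOW_FIELDS.items()
        if any(record.get(name) for name in fields)
    ]


# Hypothetical record: title and permission compiled, no quality assurance information.
record = {
    "field_title": "Zooplankton abundance time series",
    "field_access_use_termref": "Free for access and use upon request",
    "field_quality_assurance": "",
}
print(steps_supported(record))  # prints [1, 2]
```

A stricter rule requiring every field of a step to be compiled would be an equally defensible choice; the looser rule is used here only to keep the example short.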
The EBV information, described in the previous section, was collected for every site of LTER-Italy and structured in a database. The steps that we followed to collect the EBV information from the metadata elements, both of the sites and of the datasets, are presented below:
1. The investigator accesses the metadata through the public web interface of DEIMS-SDR and reads the content of the selected metadata elements exposed in a human-readable format. Through DEIMS-SDR, it is possible to read sites and datasets metadata shared by the network and, in particular, the values for the elements synthesised in Table
Site Metadata | Data Set Metadata |
---|---|
Site name | Dataset title |
eLTER parameters (Biodiversity (EBV) – Object (taxon)) | Related site |
Abstract | |
Keywords | |
Access and use constraints | |
Intellectual rights | |
Online distribution | |
Geographic | |
Temporal extent | |
Taxonomic coverage |
2. The investigator records the values of the variables under consideration for every site in a database. This database constituted the groundwork from which we derived the descriptive statistics presented in this study: it is publicly available in the form of a spreadsheet (
3. The investigator uses the database to identify two lists of sites:
a) the list of sites declaring SA and SD activities, obtained from site metadata;
b) the list of sites exposing SA and SD related datasets, obtained from dataset metadata;
The total number of LTER-Italy sites in list a) is used to measure the EBVs Potential Contribution (PC) of the network; the total number of LTER-Italy sites in list b) is used to measure the Actual Contribution (AC) of the network.
We measure the potential capacity of LTER-Italy to provide SA (or SD) data as the number of sites monitoring the selected variable against the total number of sites in our sample, as formalised in the following formula:

$$PC_v(\text{LTER-Italy}) = \frac{SM_v}{S_{total}}, \quad v \in EBV$$

where $PC_v(\text{LTER-Italy})$ is the Potential Contribution of LTER-Italy to EBV variable $v$ and $EBV$ is the set of all EBVs; in our study, $v$ is limited to SA (Species Abundance) or SD (Species Distribution). $SM_v$ is the number of sites with the site metadata compiled for variable $v$. $S_{total}$ is the number of sites taken into consideration, as described in the “Case study” subsection. The ratio is reported as a percentage.
We measure the actual capacity of LTER-Italy to provide SA (or SD) usable data as the number of sites providing at least one dataset metadata compiled for the selected variable against the total number of sites in our sample ($S_{total}$):

$$AC_v(\text{LTER-Italy}) = \frac{SDM_v}{S_{total}}, \quad v \in EBV$$

where $AC_v(\text{LTER-Italy})$ is the Actual Contribution of LTER-Italy to EBV variable $v$. In our study, $v$ is limited to SA (Species Abundance) or SD (Species Distribution) amongst other possible EBVs. $SDM_v$ is the number of sites with at least one dataset metadata in which one of the elements reported in Table
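Both measures are the same ratio applied to two different lists of sites, which a short Python sketch makes explicit. The function name `contribution` and the site identifiers are hypothetical, and the toy sets below are not the LTER-Italy data.

```python
def contribution(sites_with_metadata, sample):
    """Share of the sample sites that also appear in sites_with_metadata (0 to 1)."""
    sample = set(sample)
    if not sample:
        raise ValueError("sample must not be empty")
    return len(set(sites_with_metadata) & sample) / len(sample)


# Toy data, for illustration only.
sample = {"s1", "s2", "s3", "s4"}
declares_sa = {"s1", "s2", "s3"}   # list a): site metadata declare SA activities
has_sa_dataset = {"s1"}            # list b): at least one SA dataset metadata record

pc_sa = contribution(declares_sa, sample)
ac_sa = contribution(has_sa_dataset, sample)
print(f"PC_SA = {pc_sa:.0%}, AC_SA = {ac_sa:.0%}")  # prints PC_SA = 75%, AC_SA = 25%
```

Intersecting with the sample guards against counting a site that has metadata but was excluded from the sample, so the ratio cannot exceed 1.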
For the considered LTER-Italy sites, we also imported the values of metadata elements belonging to the “Data management” and “Data sharing policies” sections, which contain additional information about data handling and sharing practices. We decided to enrich the EBV information retrieved through eLTER parameters with that describing the data management practices exposed in Site Metadata, so as to identify the researcher’s attitude towards sharing data with external users. Although these data management practices are declared by the site managers in relation to their whole activity and not specifically referred to EBVs, we consider this information suitable for describing technological characteristics of the site (e.g. storage media and formats used, web services created and general policies applied to ecological data) and helpful to explain discrepancies between PC and AC.
For this assessment, we selected the following elements of SMM:
The application of the methodology to the LTER-Italy case study resulted in the outcomes presented in this section.
The Potential Contributions from LTER-Italy are:
$PC_{SA}(\text{LTER-Italy}) = 53\%$
$PC_{SD}(\text{LTER-Italy}) = 42\%$
It is possible to group the sites in accordance to the biome they declare to monitor as explained in subsection “Mapping EBV information”. Figures
EBVs coverage for Terrestrial Biome sites of LTER-Italy (total number 8). The height of the bars represents the number of Terrestrial sites that are claimed to measure the selected variable.
EBVs coverage for Marine Biome sites of LTER-Italy (total number 13). The height of the bars represents the number of Marine sites that are claimed to measure the selected variable.
EBVs coverage for Rivers and Lakes Biome sites of LTER-Italy (total number 10). The height of the bars represents the number of Freshwaters sites that are claimed to measure the selected variable.
The figures profile each biome-specific group with respect to the whole set of EBVs and each bar counts the number of sites declaring activities related to the corresponding EBV. Hence, through this analysis of metadata, we can compare our main analysed potential contributions (for SA and SD) with other EBVs monitored by LTER-Italy sites.
For the marine biome, LTER-Italy accounts for 10 sites as potential providers of both SA and SD EBV measures, which represent 23% of the sample in both cases. For the terrestrial biome, LTER-Italy accounts for 8 and 7 sites for the SA and SD EBVs, respectively, which represent 19% and 16% of the sample; for the Rivers and Lakes biome, the network accounts for 5 (SA) and 1 (SD) sites, i.e. 12% and 2%, respectively.
SA and SD are the most measured EBVs. We can also consider each biome separately and restrict the analysis to the total number of sites in that biome. In this case, the potential contribution within each biome is: 100% for SA and 88% for SD in the Terrestrial biome; 77% for both SA and SD in the Marine biome; 50% for SA and 10% for SD in the Rivers and Lakes biome.
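The percentages above can be checked from the counts given in the text and in the figure captions (8 terrestrial, 13 marine and 10 rivers and lakes sites, in a sample of 43). The following Python sketch is only a worked arithmetic check; the variable names are introduced here for illustration.

```python
SAMPLE_SIZE = 43  # sites in the statistical sample

# biome: (sites in the biome, sites declaring SA, sites declaring SD)
COUNTS = {
    "Terrestrial": (8, 8, 7),
    "Marine": (13, 10, 10),
    "Rivers and Lakes": (10, 5, 1),
}


def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


for biome, (n_sites, sa, sd) in COUNTS.items():
    print(
        f"{biome}: SA {pct(sa, SAMPLE_SIZE)}% / SD {pct(sd, SAMPLE_SIZE)}% of the sample; "
        f"SA {pct(sa, n_sites)}% / SD {pct(sd, n_sites)}% of the biome"
    )

total_sa = sum(sa for _, sa, _ in COUNTS.values())
total_sd = sum(sd for _, _, sd in COUNTS.values())
print(f"Three biomes together: SA {pct(total_sa, SAMPLE_SIZE)}%, SD {pct(total_sd, SAMPLE_SIZE)}%")
```

Assuming that no site is counted under more than one biome, the per-biome counts add up to 23 SA sites and 18 SD sites, which reproduces the network-level potential contributions of 53% and 42%.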
Although there is a high number of biodiversity monitoring sites for both EBVs in each biome, the analysis suggests the presence of bias in the long-term monitoring of EBVs other than SA and SD. In fact, “Genetic composition” is an under-represented EBV class, as only one site in LTER-Italy provides measures for the “Allelic diversity” EBV; moreover, with respect to the six GEO BON classes that group EBVs (see “Introduction” section), while the Marine and Terrestrial biomes of Italy can be potentially described with 4 of the 5 classes of EBVs, the Rivers and Lakes biome can be potentially described by data which cover only 2 out of 5 classes. Considering SA and SD together, the monitoring sites which are potentially able to provide useful data are 72% of our sample.
The Actual Contributions from LTER-Italy are:
$AC_{SA}(\text{LTER-Italy}) = 14\%$
$AC_{SD}(\text{LTER-Italy}) = 14\%$
The two contributions are the same because all the dataset metadata refer to “Species Abundance” and thus describe measures of presence and absence of species which are also useful for “Species Distribution”.
We can expand this numeric result with some further considerations on EBV dimensions (taxonomy, space, time) and attributes (extent, resolution, measurement units, uncertainties), in order to assess the adequacy of the metadata with respect to the requirements discussed in section “Materials and Methods”.
Figure
Metadata completeness with respect to EBV dimensions (taxonomy, space and time) and attributes (extent, resolution and measurement units) as mapped in Table
Regarding the observed taxonomic groups, metadata are accessible for plankton (phytoplankton, zooplankton) and vascular plants. The orientation of the network towards these taxonomic groups is confirmed by the analysis of site metadata, from which we observe that 29 sites can provide abundance measures for phytoplankton and 13 sites for zooplankton. Even though the metadata element providing information on the taxonomic extent is compiled in 100% of cases, the terms used do not belong to the ecological controlled vocabulary: organisms are identified through free text defining heterogeneous groups of taxonomic categories. Traditional methods (e.g. vegetation surveys, cell counting) are used to provide data along different spatial and temporal extents, as described in “Materials and Methods”.
Metadata are provided for long time-series datasets covering about 25–30 years or shorter periods. For the time dimension, 78% of metadata report a sampling frequency of five months, but resolution is provided in 56% of metadata and measurement units are missing in 90%. In 100% of cases, sampling areas are carefully georeferenced through the metadata element “Geographic”, which reports the spatial extent with altitudes and bounding coordinates provided by geotagging devices. However, also for the space dimension, resolution and measurement units are provided in only 56% and 44% of metadata, respectively.
Figure
Metadata completeness with respect to the information mapped in Table
Licences and data-sharing agreements are applied to 82% of datasets through the metadata element “Principal and granted permission”. In particular, there are distinctions in licensing based on intended use of datasets (for research, for public). For research uses, the actual granted permissions are “Free for access and use upon request” and “Free for access”, while for generic public uses, the “Other restrictions according to rules defined in intellectual rights” are applied by the providers and more finely defined by the metadata field “Intellectual rights”. The field “Intellectual rights” is specified for 44% of datasets and, in the case of generic public uses, it almost always asks for “co-authorship on publications resulting from the use of dataset”. We found just one dataset with “No access” granted.
Data quality information is not provided for any dataset; hence no dataset appears to be EBV-usable at metadata analysis level.
Figure
Data management practices associated with EBVs potential contributor sites. The Figure separately illustrates the relative percentages of sites for: (A) policies applied to data; (B) request formats for releasing data to external users; (C) storage formats; (D) storage location; (E) web services used to give access to data.
Data storage location is “central” (i.e. in the server of an institution) for 79% of sites, while in 10% of sites, data are distributed amongst repositories of different institutions and, in 11% of sites, data are distributed within the same institution (i.e. multiple places for data within the organisation that maintains and manages data).
With respect to storage format, 62% of the sites organise their data in structured files or spreadsheets, while 21% of sites declare that they manage spatial datasets. Finally, proprietary dataset formats are chosen by only 7% of sites.
Services for data access are not specified by 72% of sites, while 14% exploit standard web services and 7% declare that they share their datasets through a generic “data portal”.
A general preference for the offline release of data, which explains the Actual Contribution results, is evident in the analysis of the data request format: only 10% of sites give online access to data, while 90% of sites prefer to be contacted by telephone or mail before giving access to data.
Finally, focussing on the general data policy, 52% of sites require that data usage be acknowledged through co-authorship on publications resulting from the use of datasets; a mutual agreement on reciprocal data sharing is required of data users in only 7% of cases, while no information is provided at all by 14% of sites.
Researchers and policy-makers are called to take joint actions to face biodiversity emergencies, as highlighted by the growing demand for readily accessible data that can be integrated and analysed in support of political decisions (
For the discussion of results, it is important to consider the following. First, LTER research is driven by specific scientific questions, posed by individual scientists or groups. These programmes are typically decentralised, rarely harmonised at global level and unevenly distributed geographically (
Through the analysis of EBV information derived from metadata, we described the potential and actual contribution of LTER-Italy to provide EBV related datasets for collection and mobilisation of SA and SD measures (
In fact, while 53% of sites potentially provide SA and 42% of sites SD data (Figure
LTER-Italy potential and actual contribution sites. LTER-Italy sites which potentially supply SA and SD site-based, long-term measures are represented with a placeholder in A while sites which currently provide SA and SD metadata for primary datasets are represented in B.
Our metadata analysis suggests that community-related reasons are the factors which can explain the gap between the network’s potential and actual capacity, thus providing clues to making data more accessible. Although several studies highlight that scientists often do not make their data available in digital form, for reasons including insufficient time and lack of funding (
In such a context of limited online access to data, well-compiled metadata are even more necessary. Different types of metadata can compensate for the choice to regulate access to data, by supplying information for discovering and mining EBV information.
Different from the data management workflow described in
Second, metadata can be useful to identify thematic focus of any network (not only LTER) exposing metadata in DEIMS-SDR. In fact, through metadata analysis, we assess that LTER-Italy conducts biodiversity measures through different numbers of sites in every realm. Marine and terrestrial biomes are described with a higher number of EBV classes (5 and 4, respectively) with respect to freshwaters biome (2 classes) and with different frequencies for each EBV. SA and SD are the most measured EBVs, but the analysis shows that not all the sites provide these measures. The result can direct financial resources to activate monitoring activities, at least by volunteers of local communities through citizen science projects which present several advantages over traditional in situ field surveys for the collection of SA and SD data (
The analysis of site metadata can provide spatial and temporal coverage, sampling frequency and monitored taxa, without the need for exploring related data, thus facilitating the planning of harmonised research activities at network scale. The method highlights, in fact, the capacity of the network in supplying data for taxa groups which are less monitored than invertebrates or vascular plants, towards which there is a bias described in the EBV-related literature (
With respect to other worldwide providers, we conclude that LTER-Italy can contribute to SA and SD measures and that interoperability to integrate them with other data is partly achieved at two levels (
– Legal interoperability, which occurs at metadata level, where general data policies applied from sites, principal and granted permissions, as well as intellectual rights related to datasets, are specified.
– Technical interoperability, which occurs at metadata level and is assured by the DEIMS-SDR IT infrastructure, which allows the export of EBVs metadata in standard schema.
Nevertheless, these two levels are not fully achieved because (i) LTER-Italy dataset metadata only partially report how data can be reused without directly contacting the owners and (ii) the mapping of DEIMS-SDR metadata models to standard schemas still needs to be completed. For these reasons, the next section is dedicated to suggestions for the improvement of both the IT infrastructure and the data provider support system, in order to expand the visibility of LTER sites with respect to SA and SD measures.
The EBV concept should become the window into biodiversity observation systems through which researchers, managers and decision-makers can better interact. Related web resources help to streamline the exchange of EBV information amongst different stakeholders, insofar as its discovery and reuse are assured. The synoptic, comprehensive and harmonised overview of local research that results from mining this information is of particular importance for LTER observational design, as monitoring programmes need to be better coordinated and improved through collaboration amongst sites. This paper suggests a method, based on metadata analysis, to reveal the capacities and gaps of these networks, whose ecological focuses are disparate, in providing EBVs measurements. Since the present analysis exploits metadata of field observations, harmonised through the EBV concept and described in the DEIMS-SDR repository, it can be applied to every research organisation using this information system (e.g. the Murgia Alta EcoPotential site does not belong to LTER-Italy, but its site managers can benefit from DEIMS-SDR metadata models to expose information). It thus offers an approach both to coordinate monitoring schemes for primary data collection and to evenly assess the role of Biodiversity Research Infrastructures (BRIs) (
Our results demonstrate a documented capacity to provide essential measures at two different levels of interoperability through the information system DEIMS-SDR, but underline the need to support the community and to optimise EBV information retrieval, in order to improve the assessment and hence the effectiveness of LTER as an observing system. The analysis behind this work also allows us to provide some recommendations regarding the tools proposed for the LTER network. As discussed, DEIMS-SDR can be exhaustively consulted only through the user interface and provides information on the attributes of each EBV dimension (see Table
In order to provide the same analysis for different LTER networks or for a set of sites (e.g. those based on networks or projects), we would suggest:
1. to formally structure EBVs information both in SMM and in DSMM, in order to (i) give visibility to those sites which choose to restrict online data-sharing and (ii) enable the automation of EBV information analysis through specific metadata elements. In particular, we suggest:
I. to complete the implementation of the mapping between DSMM and EML schema, following that which is described in
II. to improve the description of datasets and their discovery: the DSMM should provide a field where the corresponding EBV or eLTER parameter can be inserted, as currently happens for sites;
III. to map the values of the eLTER parameter field (field_elter_parameters) to the field observedProperty of the EF metadata exposed by DEIMS. Currently, through the EF schema, only the contents of the “Parameter” field (field_parameters_taxonomy), i.e. the description of the observed parameters and parameter groups at the site, are provided, without the hierarchy of details for methods and instrumentation provided by eLTER parameters;
2. to ensure metadata completeness through curation staff to create a legacy of well-designed and documented long-term observations. In fact, the process of creating and publishing metadata is relatively new amongst scientists despite its value in domains like ecology, where metadata improve the reusability of data. For example, protocols and instruments information are needed to assure interpretation of data over time and to allow comparisons when different methods are adopted. Metadata compilation is error prone (
The activities described in this paper have been partially funded by the Italian Flagship Project RITMARE, the eLTER H2020 project, NextData and the e-biodiversity RI LifeWatch Italy.