Research Article
Corresponding author: Klaus Henle (klaus.henle@ufz.de). Academic editor: Pavel Stoev
© 2025 Klaus Henle, Reinhard A. Klenke, M. Benjamin Barth, Annegret Grimm-Seyfarth, Diana E. Bowler.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Henle K, Klenke RA, Barth MB, Grimm-Seyfarth A, Bowler DE (2025) Challenges and opportunities for assessing trends of amphibians with heterogeneous data – a call for better metadata reporting. Nature Conservation 58: 31-60. https://doi.org/10.3897/natureconservation.58.137848
Over the last decades, the worldwide decline of amphibian populations has become a major concern of researchers and conservationists. Studies have reported a diversity of trends, with some species strongly declining, others remaining stable and still others increasing. However, only a few species have been monitored annually for a long period of time by specific monitoring programmes. Instead, there are many heterogeneous datasets that contain observations of amphibians from professional surveys as well as diverse citizen science and other voluntary surveys. The use of these data brings a number of challenges, raising concerns about their validity and use in ecological research and conservation. We assessed to what extent such heterogeneous occurrence data can provide information on the status and trends of amphibians by contrasting different approaches to overcoming challenges with the data, using the German state of Saxony as an example. We assessed the effects of data processing decisions to infer absences, the use of survey method information and the statistical model (generalised linear mixed-effect occurrence model [GLMM] versus occupancy-detection model) and compared the trends with expert opinions (Red Lists). The different data processing decisions mainly led to similar annual occupancy estimates, newts being an exception. Annual occupancy estimates were typically less certain when attempting to account for the effects of survey methods, which could be explained by many missing values on methods. Separate models for drift fence data reduced the uncertainty in the annual occurrence probability estimates of the GLMM models, but uncertainty remained high for occupancy-detection models. For both methods, strong peaks and troughs in the annual occupancy estimates occurred for several species, which were not biologically plausible. 
Some peaks align with periods of lower sampling effort and were probably caused by shifts in the sampling locations or target species amongst years. Only for three species (Bufotes viridis, Hyla arborea and Pelophylax esculentus) were the trend results consistent amongst approaches and with expert opinions. For most other species, some inconsistencies appeared amongst models or approaches, indicating that trend assessments are sensitive to analytical choices. While heterogeneous data have proved useful for other taxa, our results highlight the complexity of using them for amphibians. We strongly recommend better harmonisation of data collection and metadata documentation, including explicit absence data and, if available, abundance data, to enable more robust trend assessments in the future.
Amphibian conservation, Anura, citizen science data, data filtering, drift fence data, Generalised Linear Mixed model, Germany, occupancy-detection model, Saxony, survey methods, Urodela
Over the last three decades, the worldwide decline of amphibian populations has become a major concern of researchers and conservationists (
Many species of amphibians show substantial natural fluctuations in population size over years, which make it challenging to assess trends and isolate human impacts (
Large aggregated databases, as available for amphibians and many other taxonomic groups, are a compilation from a range of activities (citizen scientists, conservation organisations, research institutions, conservation agencies, voluntary surveys and others), usually without a common standard for data collection and documentation (
Despite these challenges, these aggregated databases often hold the most comprehensive information on the spatio-temporal patterns of species occurrences. Given the importance of knowledge on species trends to conservation decision-making, testing the use and limits of these heterogeneous data is a key research area [e.g.
In Germany, the last Red List assessment (
Location of our study region, the Federal State of Saxony, in Germany (grey region) and sampling locations within Saxony after the filtering steps of our analysis (black points, sampled at least twice since 1997).
We examined the value of the Amphibian species database of Saxony (
The study region consists of the federal state of Saxony in the central east of Germany (Fig.
We used a dataset of occurrence records collected in Saxony up to the year 2020 (Fig.
Time-series of the total number of survey visits per year (a) and split by detection method (b); note the log-scale for the bottom graph.
The data cover 14 species: the newts Lissotriton vulgaris and Triturus cristatus, and the anurans Bombina bombina, Bufo bufo, Bufotes viridis, Epidalea calamita, Hyla arborea, Pelobates fuscus, Pelophylax kl. esculentus, P. lessonae, P. ridibundus, Rana arvalis, R. dalmatina and R. temporaria. Some records were additionally available for Bombina variegata and L. helveticus, but they were excluded from the analysis due to identification uncertainty and low number of records, respectively. We also excluded Salamandra salamandra because this species never concentrates at breeding sites, which makes population assessment methodologically not comparable to the other species.
The database includes some metadata on detection/survey methods, but this was not standardised. Various methods were used to collect the data, including observation by acoustics or sight and of dead individuals, as well as various types of traps and drift fences and capture by hand. We grouped similar detection methods together to harmonise the method names and reduce the number of categories (Fig.
Some of the data rows needed to be removed for diverse reasons, often relying on specific expert knowledge of the database (LfULG, pers. comm.). We removed data with the source “LfULG: Amphibienkartierung, Zusammengefasste Nachweise” from 1997 as these indicate a duplicated summary of the previous data, which was used for compiling the publication of the latest amphibian distribution atlas for Saxony (
Overview of the complete dataset available before (in brackets) and after filtering the data for analyses.
Source | # Species | First year | Last year | Median year | # Sites | # Observations | # Survey visits |
---|---|---|---|---|---|---|---|
Official monitoring (FFH) | 14 (16) | 2004 (1993) | 2015 (2017) | 2007 (2007) | 385 (3326) | 12494 (20105) | 5367 (13939) |
Fence data | 14 (16) | 2001 (2001) | 2019 (2019) | 2014 (2015) | 115 (479) | 24176 (34073) | 9585 (15702) |
Other | 14 (16) | 1992 (1907) | 2019 (2019) | 2007 (1997) | 1395 (31355) | 52325 (144237) | 25421 (72655) |
As the data were mostly not collected as part of structured monitoring, there was no site identifier code (ID) in the dataset to identify which data were collected in the same place. However, repeated sampling is the backbone of long-term monitoring to assess change. We grouped together occurrence point data into sites, based on a cluster analysis applied to the geographic coordinates of the observations. The aim of this step was to remove uncertainty in data allocation to a particular site in the absence of IDs and to identify neighbouring sites that likely reflect combined surveys around the same natural area; for instance, wetlands or ponds. To do this, we first calculated a distance matrix between all points in metres. We then applied hierarchical clustering of the distances and cut the distances at 1000 m (diameter) reflecting the limited dispersal potential of most German amphibian species and upper limit of the likely distance of any single survey (
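The grouping of points into sites can be illustrated with a minimal sketch. It assumes complete-linkage clustering, which is one way to guarantee that no cluster exceeds the stated 1000 m diameter; the source does not name the linkage rule. The function names and the example coordinates are hypothetical.

```python
import math
from itertools import combinations


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def cluster_sites(points, max_diameter_m=1000.0):
    """Complete-linkage agglomerative clustering, stopped at max_diameter_m.

    Assumption: complete linkage (the source only says "hierarchical clustering").
    Returns one integer site label per input point.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between two clusters.
        return max(haversine_m(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_diameter_m:
            break  # any further merge would exceed the allowed diameter
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(points)
    for site_id, members in enumerate(clusters):
        for idx in members:
            labels[idx] = site_id
    return labels


# Two points about 110 m apart and a third about 2 km away (made-up coordinates).
example = [(51.3400, 12.3700), (51.3410, 12.3700), (51.3600, 12.3700)]
labels = cluster_sites(example)
```

This naive version recomputes all pairwise distances at every merge and is only meant to show the logic; a dataset of the size described here would need a precomputed distance matrix and an optimised clustering routine.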
The remaining challenges with the species data were that: (1) typically only detections of species were recorded, i.e. absences were rarely recorded and only in recent years; (2) the target species group on any given survey was unknown; hence, the lack of reporting of a species may either reflect a true absence (a species was not present during a survey), a lack of detection (false absence) or a lack of reporting (a species was detected, but the observation was not recorded); (3) sampling effort (e.g. survey duration) was not recorded; and (4), as noted above, sampling method (e.g. trap or visual) was not always known (known for 77% of the observations) and was rarely reported for the earlier years (23% in 1997 versus 84% in 2019) (Fig.
To attempt to account for these issues, we employed the following approaches and compared possible decisions about how to account for the heterogeneity of the data:
First, following others using heterogeneous databases [e.g.
Second, we inferred absences using the available presence-only data of species observations. We needed to infer absences for our models, since changes in the total number of presences may either reflect changes in sampling or changes in true species occurrence. Inferring absences is also standard in species distribution models that define pseudo-absences as a reference against which to compare presences. A commonly-used method in species distribution models is the target-group background method – using observations of non-focal species assumed to be surveyed with similar methods or by similar projects and people to identify absences with the same pattern of spatial bias as the presence data (
We used a similar target-group background method to infer absences, based on reported presences of other species that were assumed to be within the same target group. In other studies, the target group was defined by which species tend to be reported together by the same specialist recording societies (
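The target-group logic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a survey visit is taken to be identified by a `visit_id`, and the group labels (`"anuran"`, `"newt"`, `"all"`) and the function name are hypothetical.

```python
from collections import defaultdict


def infer_absences(records, target_group):
    """Turn presence-only records into detection/non-detection data.

    records: iterable of (visit_id, species) detections.
    target_group: dict mapping species -> group label (hypothetical labels).
    Returns {(visit_id, species): 1 or 0}; a 0 is an inferred absence.
    """
    detected_on_visit = defaultdict(set)
    for visit_id, species in records:
        detected_on_visit[visit_id].add(species)

    members = defaultdict(set)
    for species, group in target_group.items():
        members[group].add(species)

    out = {}
    for visit_id, detected in detected_on_visit.items():
        # A group counts as surveyed if at least one of its members was reported.
        surveyed = {target_group[s] for s in detected if s in target_group}
        for group in surveyed:
            for species in members[group]:
                out[(visit_id, species)] = 1 if species in detected else 0
    return out


records = [("v1", "Bufo bufo"), ("v1", "Rana temporaria"), ("v2", "Triturus cristatus")]
species = ["Bufo bufo", "Rana temporaria", "Hyla arborea",
           "Triturus cristatus", "Lissotriton vulgaris"]

# Variant 1: absences based on all taxa (a single target group).
all_taxa = infer_absences(records, {s: "all" for s in species})

# Variant 2: absences based on the broad taxon group (newts versus anurans).
broad = {s: "newt" for s in ("Triturus cristatus", "Lissotriton vulgaris")}
broad.update({s: "anuran" for s in ("Bufo bufo", "Rana temporaria", "Hyla arborea")})
broad_taxa = infer_absences(records, broad)
```

In this toy example, the newt-only visit `v2` yields an inferred absence of Hyla arborea under the all-taxa variant, but no information about that species under the broad-group variant.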
Third, we derived a proxy of sampling effort, since absences may still reflect lack of reporting or limited survey effort rather than true absence of a species at a site. Following others (
Fourth, in a separate analysis, we tested the value of using the metadata information on survey methods when deriving absences. This step aimed to account for the fact that species differ in how well they are sampled by the range of survey methods used for amphibians. We first calculated the main detection method of any survey visit, based on the mode detection method of all species observations on a given survey visit list. We then identified the most important detection methods for each species, based on those used in at least 5% of each species’ observations. Finally, we only inferred absences for a species when the main detection method of a survey visit was one of the important detection methods for that species. We additionally included ‘survey method’ as a covariate in the models (see below). For these analyses, we excluded observations without a reported survey method, which led to a considerable loss of data.
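The three steps of this filter can be sketched as below. The 5% threshold comes from the text; the tuple layout, function names and tie-breaking rule (the first method encountered wins when two methods are equally frequent on a visit) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def main_method_per_visit(observations):
    """observations: list of (visit_id, species, method) tuples.

    Returns the modal detection method of each visit.
    """
    methods = defaultdict(list)
    for visit_id, _species, method in observations:
        methods[visit_id].append(method)
    # most_common(1) keeps the first-encountered method on ties (an assumption).
    return {v: Counter(m).most_common(1)[0][0] for v, m in methods.items()}


def important_methods(observations, min_share=0.05):
    """Methods used in at least min_share of each species' observations."""
    counts = defaultdict(Counter)
    for _visit_id, species, method in observations:
        counts[species][method] += 1
    return {
        sp: {m for m, n in c.items() if n / sum(c.values()) >= min_share}
        for sp, c in counts.items()
    }


def absence_allowed(visit_id, species, main_method, important):
    """Infer an absence only if the visit's main method suits the species."""
    return main_method.get(visit_id) in important.get(species, set())


# Made-up observations: tree frogs mostly heard, crested newts mostly trapped.
obs = [("v%d" % i, "Hyla arborea", "acoustic") for i in range(9)]
obs += [("v9", "Hyla arborea", "sight")]
obs += [("t%d" % i, "Triturus cristatus", "trap") for i in range(30)]
obs += [("v9", "Triturus cristatus", "sight")]

main = main_method_per_visit(obs)
imp = important_methods(obs)
```

Here "sight" accounts for 10% of the Hyla arborea observations but only about 3% of the Triturus cristatus observations, so a sight-based visit would contribute inferred absences for the former species only.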
Finally, in another separate analysis, we created a subset of the dataset that included only observations collected by drift fences, based on information within the ‘project origin’ column. These analyses had to be restricted to data for the years since 2010, since drift fence surveys were not common (at least according to the available metadata) before then (Fig.
Collectively, all these decisions led to four main data subsets that are hereafter referred to as: (1) absences (based on all taxa); (2) absences (based on broad taxon group); (3) survey method and (4) drift fence data. In each case, we ran the statistical analyses discussed below.
We used two alternative approaches to analyse the occurrence data derived from the above processing steps. Each species was analysed in a separate model. Together, this meant four data subsets × two statistical models = eight analyses per species.
First, we used generalised linear mixed models (GLMM), similar to a ‘reporting rate model’ that has been used by others (
Second, we used occupancy-detection models, which have been successfully used in the analysis of similar heterogeneous data (
Note that, while we used the same covariates in the GLMM and occupancy-detection models, they differ in fundamental ways. The occupancy-detection model can be used to predict the annual occupancy probability “of a site”, accounting for imperfect detection (i.e. that sometimes species are not detected at a site, even if they are present). By contrast, the GLMM predicts species occurrence probability “on any given survey visit” (i.e. site occupancy on a given date, under a given sampling method/list length). This means that we expect the two models to differ strongly in the absolute values of their predictions – the occupancy-detection site occupancy probability will typically be higher than the survey-visit occurrence probability, since the former adjusts for imperfect detection on a visit. Typically, the probabilities predicted by the occupancy-detection model are closer to conservation questions of interest about the probability that a species is present at a site. However, both probabilities can be used to reveal changes in species occupancy/occurrence across years.
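The expected difference in absolute values can be made explicit with the standard occupancy-detection notation. The symbols below are introduced here for illustration and are not taken from the source: \psi_t is the probability that a site is occupied in year t, p_t is the probability of detecting the species on one visit to an occupied site, and n is the number of visits. The sketch ignores covariates and assumes independent visits.

```latex
\begin{align}
\Pr(\text{detected on a single visit}) &= \psi_t \, p_t \;\le\; \psi_t, \\
\Pr(\text{detected at least once in } n \text{ visits}) &= \psi_t \left[ 1 - (1 - p_t)^{n} \right].
\end{align}
```

For example, with \psi_t = 0.6 and p_t = 0.5, a visit-level occurrence probability of about 0.3 would correspond to a site occupancy of 0.6. A change in p_t alone (for instance, after a shift in survey methods) would therefore move the visit-level quantity even if occupancy stayed constant.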
Based on the annual occupancy/occurrence estimates from both models, we calculated long-term (1997–2019) and short-term trends (2010–2019) using simple linear regression models with year (as a continuous variable) as the only predictor and the occupancy/occurrence probabilities as the response, including the uncertainty of the probabilities (standard deviation of the estimate) as a measurement error term, using the brms library of R (
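As a rough illustration of how uncertain annual estimates can feed into a trend, the sketch below fits an inverse-variance weighted least-squares line. This is a simpler, frequentist stand-in and not the Bayesian measurement-error regression described above; the function name and the example values are hypothetical.

```python
def weighted_trend(years, estimates, sds):
    """Inverse-variance weighted least-squares line through annual estimates.

    years: year of each estimate.
    estimates: annual occupancy/occurrence estimates.
    sds: standard deviation of each estimate (used as known error sd).
    Returns (slope, intercept, slope_se).
    """
    w = [1.0 / sd ** 2 for sd in sds]  # precise years get more weight
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, years)) / sw
    ybar = sum(wi * y for wi, y in zip(w, estimates)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, years))
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, years, estimates))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Valid only if the sds are the true error sds and errors are independent.
    slope_se = (1.0 / sxx) ** 0.5
    return slope, intercept, slope_se


# Made-up example: a steady decline of 0.01 per year with uneven uncertainty.
years = list(range(2010, 2020))
estimates = [0.5 - 0.01 * (y - 2010) for y in years]
sds = [0.05 if y % 2 == 0 else 0.10 for y in years]
slope, intercept, slope_se = weighted_trend(years, estimates, sds)
```

Unlike a full measurement-error model, this stand-in treats the annual standard deviations as known and does not estimate any additional residual variation around the trend line, so its standard error will tend to be too small when annual estimates fluctuate strongly.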
To check the plausibility of our model predictions, we compared the consistencies and discrepancies of the trend results amongst the different models and subsets of data used. We further compared qualitative results with the most recent Red List assessment of amphibians in Saxony (
The different data processing decisions – whether using all taxa or taxa on order level (newts or anurans) to infer absences or including survey method as a covariate – mostly led to similar annual occupancy/occurrence estimates (Fig.
Predicted occurrence proportions/occupancy probability for each species in each year between 1997 and 2019, based on the GLMM model and the occupancy-detection model. The occurrence/occupancy proportion is the predicted proportion of occupied sites. Ribbons show the 95% confidence intervals. Different colours refer to different data processing decisions (see Methods section for details). The difference between using all taxa or broad taxa to infer absence was unimportant for anurans; therefore, the lines for All taxa are hidden behind the lines for Broad taxa.
Predicted long-term trend (i.e. mean annual growth rates in occurrence/occupancy) for each species between 1997 and 2019. Each point shows the mean and 95% CI. The vertical dashed line represents the line of no change: points to the left indicate declines, while those to the right indicate increases. Species are ordered by the mean of their trend estimates across models/data subsets.
Compared to the GLMM model, the occupancy-detection model led to less certain predictions of occupancy proportions, and these predictions were more similar across species. It is important to note, however, that these two models predict different responses: the occupancy-detection model predicted annual site occupancy, while the GLMM model predicted occurrence on any given survey visit. Nonetheless, in our results, the GLMM models were better able to separate common (e.g. Bufo bufo) from rare species (e.g. Epidalea calamita) than the occupancy-detection models, probably because of the extra uncertainty added by estimating detection probabilities within the occupancy-detection models.
For both methods, strong peaks and troughs in the annual occupancy estimates occurred for some species. Some peaks align with periods of lower sampling effort (e.g. compare 2006/2007 in Figs
The predictions of long-term trends in occupancy, accounting for uncertainty in annual estimates in a Bayesian framework (
The most recent expert assessment of trends of amphibians in Saxony applies to the period 1990/2000–2013 (LfULG, pers. comm.) and is called short-term trends, but covers a similar period as our long-term trends (1997–2019). This assessment did not identify any species that was increasing; P. kl. esculentus was judged as stable and the other 13 species (of the ones also assessed by us) were judged as declining (
Comparison of the model-based long-term trends in Saxony and the expert-based judgement of decline in Saxony for the period 1990/2000–2013 (
Species | Model-based trend | Expert judgment |
---|---|---|
Bombina bombina | None or increasing | Strong decline |
Bufo bufo | Increasing | Decline, but strength unknown |
Bufotes viridis | Decreasing | Strong decline |
Epidalea calamita | None or uncertain | Strong decline |
Hyla arborea | Decreasing | Strong decline |
Lissotriton vulgaris | None or increasing | Strong decline |
Pelobates fuscus | None or increasing | Decline, but strength unknown |
Pelophylax kl. esculentus | None or increasing | No trend |
Pelophylax lessonae | None | Decline, but strength unknown |
Pelophylax ridibundus | None or decreasing | Decline, but strength unknown |
Rana arvalis | None | Decline, but strength unknown |
Rana dalmatina | Decreasing or increasing | Decline, but strength unknown |
Rana temporaria | Decreasing or increasing | Decline, but strength unknown |
Triturus cristatus | Decreasing or increasing | Decline, but strength unknown |
We compared predictions from models built on fence data, the most standardised continuous source of data (
The separate models for the fence data somewhat reduced the uncertainty in the annual occupancy estimates of the GLMM models compared to models that combined all the other data (Fig.
Predicted occupancy/occurrence proportions for each species in each year between 2010 and 2019, based on the GLMM model and the occupancy-detection model. The occupancy/occurrence proportion is the predicted proportion of occupied sites. Ribbons show the 95% confidence intervals. Different colours refer to different data processing decisions (see Methods section for details).
The occupancy-detection models generally led to more uncertain trend predictions, but the direction of trends was mostly the same between the GLMM and occupancy detection models both for the fence and non-fence data (Fig.
Estimates of short-term trends in occurrence/occupancy between 2010 and 2019 when based on drift fence surveys versus other sampling methods. Each point shows the mean and 95% CI of the mean annual change estimate. The vertical dashed line represents the line of no change. Species with fewer than 20 records from drift fences were excluded. Species are ordered by the mean of their trend estimates across models/data subsets.
Given the serious decline of amphibians (
Assessing trends was partly limited by the uncertainty in the model predictions. Results for annual occupancies showed considerable uncertainty for the majority of species and subsets of data, both for GLMM and occupancy-detection models. The exceptions were the GLMM models of short-term trends for the non-fence data and, in part, for the fence data. Presumably, the large uncertainty has two causes: heterogeneity of the sampling methods that could not be completely accounted for in the models, and the considerably reduced dataset when survey methods were accounted for, because many datasets lacked this information and had to be excluded.
Another challenge affecting trend assessment was the strong interannual fluctuations. We found strong peaks and troughs in the occupancy predictions (Fig.
When occurrence databases are visualised and publicly shared as species atlas maps, they can influence the behaviour of data collectors, for example, by either targeting areas of high species richness or filling gaps in areas with no recorded presences. Recent projects have taken advantage of the potential to influence sampling site location by producing maps of sampling priorities. While there is not a consensus on the best way to define sampling priorities, this ‘adaptive sampling’ approach has been tested in simulation experiments (
Methods for monitoring amphibians have changed over time and are affected by incentives and effort (
Another source of sampling bias is monitoring peaks in years of intense targeted surveys as, for example, for the regular (every six years) reporting for the European Fauna-Flora-Habitat (FFH) Directive (Council Directive 92/43/EEC). The peaks between 2004–2006 and 2010–2012 in Fig.
More positive and fewer negative trend estimates were found with the occupancy-detection model including survey method as a covariate than in the GLMM models in the long-term assessments. The same tendency was apparent for the different approaches to infer absences and in the short-term trend assessments. For three species (Triturus cristatus, Rana dalmatina, R. temporaria), this led to contrasting trend predictions. They were positive in the occupancy-detection models that included survey method as a covariate in all three species and no trend or negative for the other models with the exception of the GLMM model with survey method included for R. temporaria. All three species were regarded as declining, with unknown strength, in the expert assessment (
The trend direction was consistent amongst models for five out of 14 assessed species: increasing for Bufo bufo, no trend for Pelophylax lessonae and Rana arvalis and declining for Bufotes viridis and Hyla arborea. For the latter two species, the experts regarded the decline as strong and for the other three species as declining of unknown strength. For B. viridis, the German-wide Red List (
The same applies to Epidalea calamita, which thus also should have shown a strong decline in our assessments. However, our model-based assessments indicated an absence of a trend. This is likely because the loss of breeding habitats (ephemeral waterbodies) usually ends the surveys of the lost sites, so that no relevant data enter the Species Record Database of Saxony, ZenA. The lack of information on the loss of habitats as a cause of the termination of time series of data is a general problem for all large-scale species distribution databases. Experts may adjust their judgement based on such experience. However, because absences are not reported, such information is not quantified, which means it is not possible to reliably account for the contribution of such losses to the decline of a species at larger scale, regardless of method. We highly recommend adapting databases to incorporate such information and encouraging users to add it. Without that, we have to wait until sufficiently accurate annual remote sensing data of habitat loss are available against which we can compare the time-series data for all monitored sites. Adjusting databases and sampling protocols so that the loss of surveyed sites must be entered into the database can be achieved more rapidly and at lower cost.
For one of the remaining species, Pelophylax esculentus, our models and expert opinion indicated no trend, except for our occupancy detection model when survey method was included as a covariate. In that case, there was a particularly strong apparent trough with a rapid drop in occupancy from almost 1 to almost 0. Presumably, this pattern was caused by shifts in survey methods that could not be accounted for despite using survey method as covariate. We do not believe that identification uncertainty contributed to any shift in estimated occupancy since, in Saxony, no pure P. lessonae populations are known (
For three other species (Lissotriton vulgaris, Bombina bombina, Pelobates fuscus), our models also indicated either no trend or an increasing trend, while expert opinion was a decline. The positive trend for the newt L. vulgaris in our models may contain upward bias as surveys of amphibians shifted towards urban environments in several databases in recent years, which tends to lead to over-optimistic estimates for species common in human settlements (
A major limitation of our trend estimates, as well as the data informing the expert-based Red Lists, is that the metric of change is based on the occupancy of (clusters of) sites, which changes more slowly than abundance and is, thus, less sensitive in revealing changes, especially declines. Some species might be declining in local density, which is, however, not yet detectable by changes in occupancy. For example, in insects, strong declines can be shown in terms of biomass (
Declines in abundance have been found also for ubiquitous amphibian species, such as B. bufo in various parts of Europe (
Furthermore, in the Saxonian and German-wide Red List assessments, occupancy was not modelled (
A model can only be as good as the data. Clearly, our results show that the lack of standardised monitoring data represents a major challenge for estimating species trends and supports the development of more standardised data collection and more precise documentation of methodology, as recommended in
Greater standardisation and method transparency would also benefit Red List assessments (
Platforms such as eBird have proved successful for bird recording, allowing individual surveyors, often citizen scientists and other volunteers, to collect data in different ways, but importantly, encouraging key features of the survey to be reported. Metadata within the ZenA database has already improved since recent records often have known survey methods (e.g. type of trap used), but this is still a problem when using older data for assessment of change. While we argue for more standardised data collection, a single survey method will not be suitable to survey the whole community. Hence, multiple survey methods are needed – see
Metadata documentation could radically help the use of these data for trend assessment (
Additionally, to document the dynamics of amphibian populations, sampling needs to be undertaken at known breeding sites as well as at sites that contain suitable habitat, but have not had any previous recorded breeding. Both these survey types can be recorded in the same database, provided the right metadata structure is available for documentation. Key metadata are: (1) ability to record zero event surveys (i.e. when a survey was undertaken and not a single individual of any target species was seen) to ensure documentation of colonisations and extinctions; (2) ability to record abundance of individuals when a site is found to be occupied and (3) ability to record life stage to document whether there are signs of reproduction at the site (
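One possible shape for a record structure that supports these three kinds of metadata is sketched below. All class and field names are hypothetical and do not describe the ZenA database or any existing standard. The set of life stages treated as signs of reproduction is likewise an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Assumption: eggs/spawn and larvae are taken as evidence of reproduction.
REPRODUCTIVE_STAGES = {"spawn", "larva"}


@dataclass
class SpeciesRecord:
    species: str
    abundance: int                    # 0 encodes an explicitly recorded absence
    life_stage: Optional[str] = None  # e.g. "spawn", "larva", "juvenile", "adult"


@dataclass
class SurveyVisit:
    site_id: str
    visit_date: date
    target_species: List[str]         # what the surveyor was looking for
    survey_method: Optional[str] = None
    records: List[SpeciesRecord] = field(default_factory=list)

    def is_zero_event(self) -> bool:
        """True if the visit took place but no target species was found."""
        return all(r.abundance == 0 for r in self.records)

    def reproduction_observed(self, species: str) -> bool:
        """True if a reproductive life stage of the species was recorded."""
        return any(
            r.species == species
            and r.abundance > 0
            and r.life_stage in REPRODUCTIVE_STAGES
            for r in self.records
        )


# A visit with no records is still stored, so the zero event is not lost.
visit = SurveyVisit("site-1", date(2019, 4, 2), ["Bufo bufo"], "sight")
empty_visit_is_zero_event = visit.is_zero_event()
visit.records.append(SpeciesRecord("Bufo bufo", 12, "spawn"))
```

The key design choice is that the visit, not the species observation, is the primary record: a visit exists in the database even when nothing was found, and it carries its own target list and method.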
We also recommend that the recording community for amphibians come together to develop harmonised survey standards and consider how reporting protocols should be tailored for amphibians where appropriate (
Large-scale, long-term presence databases collated by a range of different contributors have become increasingly available and used for assessing trends in species. Our analyses of such a database for amphibians from Saxony, Germany, showed high sensitivity of trend predictions to analytical choice: type of statistical model, methods to infer absences, sub-setting of the data and co-variables. Substantial changes in survey intensity, methods used, spatial shifts in surveys and a lack of sufficient metadata for much of the survey data create major challenges for reliable trend predictions. In this regard, amphibians may be more challenging than other taxonomic groups, such as birds or butterflies, because they are a diverse group monitored with many different methods by diverse kinds of people. Still, several of these challenges also exist for other taxonomic groups [e.g.
To the extent possible, we should even push for coordination and harmonisation of methods across regions. Databases of metadata of monitoring schemes and monitoring organisations, such as those created by the EuMon project and revised by the ADVANCE project (
We gratefully acknowledge the support of iDiv and the strategic project ‘sMon’ funded by the German Research Foundation (grant number DFG–FZT 118, 202548816). The Landesamt für Umwelt, Landwirtschaft und Geologie, Freiberg, kindly extracted the data for amphibians from their Zentrale Artdatenbank (the species distribution database for Saxony) and especially Ulrich Zöphel and Holger Lueg from the LfULG provided valuable background information about the database and about the last Red List assessment for amphibians in Saxony. We gratefully acknowledge the many contributors of data to the Zentrale Artdatenbank, as without their work, no analysis would have been possible. We also thank two anonymous reviewers for their very valuable recommendations that improved our manuscript. We also thank the language editor and the copy editor for spotting a few typing and language errors.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This work was supported by iDiv and the strategic project ‘sMon’ funded by the German Research Foundation (grant number DFG–FZT 118, 202548816).
Conceptualization: KH, AGS, DEB. Data curation: MBB. Formal analysis: DEB. Funding acquisition: RAK. Methodology: KH, RAK, DEB. Project administration: RAK, KH. Software: DEB. Validation: KH. Visualization: DEB. Writing – original draft: KH, DEB. Writing – review and editing: AGS, MBB, RAK.
Klaus Henle https://orcid.org/0000-0002-6647-5362
Reinhard A. Klenke https://orcid.org/0000-0002-6860-8085
Annegret Grimm-Seyfarth https://orcid.org/0000-0003-0577-7508
Diana E. Bowler https://orcid.org/0000-0002-7775-1668
All of the data that support the findings of this study are available in the main text or Supplementary Information.
Clustering of sites
Data type: docx
Explanation note: Example set of sampling points showing the assigned clustering of neighbouring points into clusters (indicated by colours) that was used as a site identifier in the analysis.