Research Article
Corresponding author: Shaun W. Molloy ( shaunecologist@gmail.com ) Academic editor: Bernd Gruber
© 2017 Shaun W. Molloy, Robert A. Davis, Judy A. Dunlop, Eddie J.B. van Etten.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Molloy SW, Davis RA, Dunlop JA, van Etten EJB (2017) Applying surrogate species presences to correct sample bias in species distribution models: a case study using the Pilbara population of the Northern Quoll. Nature Conservation 18: 27-46. https://doi.org/10.3897/natureconservation.18.12235
The management of populations of threatened species requires the capacity to identify areas of high habitat value. We developed a high resolution species distribution model (SDM) for the endangered Pilbara population of the northern quoll, Dasyurus hallucatus, using MaxEnt software and a combined suite of bioclimatic and landscape variables. Once common throughout much of northern Australia, this marsupial carnivore has recently declined throughout much of its former range and is listed as endangered by the IUCN. In addition to the potential threats presented by climate change and the invasive cane toad Rhinella marina (which has not yet arrived in the Pilbara), the Pilbara population is impacted by introduced predators and by pastoral and mining activities. To account for sample bias resulting from targeted surveys unevenly spread through the region, a pseudo-absence bias layer was developed from presence records of other critical weight-range non-volant mammals. The resulting model was then tested using the biomod2 package, which produces ensemble models from individual models created with different algorithms. This ensemble model supported the distribution determined by the bias compensated MaxEnt model, with a covariance of 86% between models and both models largely identifying the same areas as high priority habitat. The primary product of this exercise is a high resolution SDM which corroborates and elaborates on our understanding of the ecology and habitat preferences of the Pilbara Northern Quoll population, thereby improving our capacity to manage this population in the face of future threats.
Northern Quoll, Pilbara, MaxEnt, biomod2, Sample Bias, Cane Toad, Threatened Species
Species distribution models (SDMs) use environmental data from known locations of a species to predict places where that species could potentially occur within landscapes or regions (
The accuracy of a SDM depends on such factors as the quality and appropriateness (in regard to sample size and representativeness) of the presence and/or absence data for the target species or community, the expertise of the modeller, the selection of an appropriate modelling tool (or software package), the selection of an appropriate suite of predictive/independent variables, the quality of the variable data used, and an acknowledgement of the strengths and limitations of the SDM (
In this study we set up a SDM to determine the potential distribution (PD) of the northern quoll Dasyurus hallucatus population found in the Pilbara biogeographic region of Western Australia (
Northern quolls are a suitable subject for distribution modelling as they have a strong habitat affiliation with complex rocky areas, often in close association with permanent water (
Once widely distributed from the Pilbara region of Western Australia (WA) across northern Australia to southern Queensland (Figure
The main WA populations of northern quoll occur in two discrete mainland regions, the Kimberley and Pilbara, separated by the arid Great Sandy Desert. Both mitochondrial DNA sequences and nuclear microsatellite loci reveal clear differentiation between Kimberley and Pilbara populations and a greater distinction between these populations than those in the Northern Territory and Queensland (
Given that the Pilbara population of the northern quoll is genetically and demographically distinct from all other populations, retains its pre-European genetic diversity, is currently outside of the cane toad’s distribution, and has much of its habitat still intact, this population has been assigned a high conservation, research and management priority (
The available presence data for this population is clustered around areas of development interest to the mining industry, or where targeted surveys have occurred. This raised the question: was this an example of sample bias or a true representation of northern quoll distribution? Sample bias, where sampling has not been uniform over the project area (e.g. where only easily accessed areas or known populations have been sampled), has the potential to distort a SDM (
Our objective was to construct a high resolution SDM for the Pilbara northern quoll by applying an innovative form of bias compensation to a well proven modelling method, MaxEnt, and testing this SDM with an ensemble model.
The area modelled for this exercise is the Pilbara biogeographic region (Fig.
All presence data, both for northern quoll and surrogate species, was supplied by the West Australian Department of Parks and Wildlife NatureMap database (
To obtain optimum efficiency, minimize multicollinearity and prevent overfitting, the suite of variables used should be kept compact (preferably ≤10 in number) and be comprised of those variables best able to define the PD of the target species or community (
To avoid using unnecessary time and resources in identifying an appropriate suite of predictive variables, a two-stage process was adopted. The first stage used a series of statistical tests (described below) to halve the number of potential variables so as to quickly and effectively limit multicollinearity in the bioclimatic variables and to remove those variables for which their contribution to the model was low or counterproductive. The second stage was to use a stepwise elimination process to identify a final suite of predictive variables suitable for use in all further modelling.
Firstly, to reduce multicollinearity between scalar variables, we calculated both the Pearson and Spearman rank correlation coefficients between each pair of variables using the northern quoll presence data. This was done using the pairs function in the ‘psych’ R package (
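The pairwise screening described above can be illustrated with a minimal sketch. It is written in plain Python and does not reproduce the ‘psych’ package; the variable names, the sample values and the 0.7 cut-off are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlated_pairs(table, threshold):
    """Return variable pairs whose |Pearson| or |Spearman| reaches threshold."""
    flagged = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        rho = spearman(table[a], table[b])
        if max(abs(r), abs(rho)) >= threshold:
            flagged.append((a, b, round(r, 3), round(rho, 3)))
    return flagged


# Hypothetical values extracted at five presence points (not real data).
values_at_presences = {
    "elev": [1, 2, 3, 4, 5],
    "temp": [10, 8, 6, 4, 2],
    "rain": [3, 1, 4, 1, 5],
}
flagged = correlated_pairs(values_at_presences, threshold=0.7)  # assumed cut-off
print(flagged)
```

One variable from each flagged pair would then be a candidate for removal; which one to drop is a judgement the sketch does not make.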
The final cut was undertaken through a step-wise elimination process using MaxEnt (
For our primary modelling tool we selected MaxEnt for its capacity to produce effective SDMs using presence-only data (
Some limitations have been recognised with MaxEnt, notably a tendency for it to underperform where there is a biased sample, poorly chosen predictive variables or inadequate testing of results (
• Withholding a random 30% of presences for testing purposes over 10 bootstrapped repetitions (
• Combining all presences within a pixel (~1 km²) as a single presence. This reduced the number of presences from 1984 to 324.
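Combining presences by pixel can be sketched as follows. This is a minimal illustration rather than the workflow used in the study: it assumes coordinates in metres in a projected coordinate system, a 1000 m cell edge, and a hypothetical function name; the sample points are invented.

```python
import math


def thin_to_cells(points, cell_size):
    """Keep one record per grid cell; points are (x, y) in projected units."""
    kept = {}
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        kept.setdefault(cell, (x, y))  # the first record seen in a cell is kept
    return list(kept.values())


# Invented coordinates: two pairs of points share a cell, one point stands alone.
presences = [(100, 200), (900, 950), (1500, 200), (1600, 300), (-10, 5)]
thinned = thin_to_cells(presences, cell_size=1000)  # assumed 1 km cells
print(len(presences), "->", len(thinned))
```

Keeping the first record per cell is an arbitrary choice; any single representative per pixel gives the same presence count.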
To compensate for our limited presence data we followed
To construct this bias grid, presence records for all non-volant CWR mammals (including northern quoll) in the Pilbara were gathered from the Department of Parks and Wildlife NatureMap database (Department of Parks and Wildlife 2007-) and categorised into northern quoll presences and pseudo-absences. We note that, although this was a separate data set to the original presence data set, many presences were replicated. All records were then used to conduct a Point Density Analysis (PDA) using the Point Density function of the Spatial Analyst toolbox in ArcGIS 10.3. This function counted the total number of records for each cell within a 44-cell radius (the default radius). The resulting shapefile was then directly incorporated as a bias grid into the MaxEnt SDM.
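The idea behind the point density surface can be shown with a simplified sketch. It is not the ArcGIS Point Density tool: it only counts, for each cell, the records whose cell centre lies within a circular radius measured in cells. The grid size, the radius of 1 and the record positions are hypothetical.

```python
def point_density(records, n_rows, n_cols, radius_cells):
    """Count, for every cell, the records lying within radius_cells of it.

    records are (row, col) cell indices of survey records.
    """
    counts = [[0] * n_cols for _ in range(n_rows)]
    r2 = radius_cells ** 2
    for row, col in records:
        # Only visit the bounding box of the circle, clipped to the grid.
        for i in range(max(0, row - radius_cells), min(n_rows, row + radius_cells + 1)):
            for j in range(max(0, col - radius_cells), min(n_cols, col + radius_cells + 1)):
                if (i - row) ** 2 + (j - col) ** 2 <= r2:
                    counts[i][j] += 1
    return counts


# Invented example: three records on a 5 x 5 grid, radius of one cell.
records = [(2, 2), (2, 3), (0, 0)]
density = point_density(records, n_rows=5, n_cols=5, radius_cells=1)
for line in density:
    print(line)
```

Cells near clusters of records receive higher counts, which is how the surface expresses where survey effort was concentrated.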
As the above SDM was compiled using just one modelling tool and as different algorithms and methodologies can yield very different and often contradictory results, we opted to test the rigour of the preferred (MaxEnt) SDM using ensemble modelling techniques. This involved compiling a suite of different algorithms to construct multiple SDMs for the target species within a single platform and then combining these SDMs to produce a single ensemble, or composite, SDM (
The ensemble modelling was undertaken using the biomod2 package in the R platform (
We selected the five best performing modelling algorithms for our ensemble model. These were Generalised Linear Model (GLM), Generalised Additive Model (GAM), Generalised Boosted Model (GBM), Flexible Discriminant Analysis (FDA) and Multiple Adaptive Regression Splines (MARS). In running these, a random 30% of presences was used to calibrate the model and 70% of presences was withheld for testing. This process was then repeated 10 times to add rigour to the results. Unlike MaxEnt, the biomod2 package does not provide an option to use a bias layer to compensate for sample bias. Therefore we applied the surrogate presence data set, from which we constructed the bias layer, as a substitute for true absences in our model inputs.
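The repeated random partitioning described above can be sketched in a few lines. This is an illustration only, not the biomod2 implementation: the function name, the seed and the record count of 20 are hypothetical, and the calibration fraction is left as a parameter set here to the value stated in the text.

```python
import random


def repeated_splits(n_records, calib_fraction, n_reps, seed=0):
    """Return n_reps random (calibration, testing) index partitions."""
    rng = random.Random(seed)
    n_calib = round(n_records * calib_fraction)
    splits = []
    for _ in range(n_reps):
        indices = list(range(n_records))
        rng.shuffle(indices)
        # The first n_calib shuffled indices calibrate; the rest are withheld.
        splits.append((sorted(indices[:n_calib]), sorted(indices[n_calib:])))
    return splits


splits = repeated_splits(n_records=20, calib_fraction=0.3, n_reps=10)
calibration, testing = splits[0]
print(len(calibration), "calibration records,", len(testing), "testing records")
```

Each repetition draws a fresh partition, so evaluation statistics can be averaged over the 10 runs rather than depending on a single split.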
All outputs of all algorithms were evaluated with a True Skill Statistic (TSS) and a Receiver Operating Characteristic (ROC; a test comparable with MaxEnt’s AUC statistic). A weighting was given to each algorithm based on ROC performance, and all model outputs were combined to produce a weighted mean SDM which we used as our biomod2 output.
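Both steps can be illustrated with a short sketch. The TSS is computed here as sensitivity plus specificity minus one at a fixed threshold, and the ensemble as a mean weighted in simple proportion to each algorithm's score. The proportional weighting, the 0.5 threshold, the scores and all data values are assumptions for illustration and do not reproduce biomod2's internal weighting scheme.

```python
def true_skill_statistic(observed, predicted, threshold):
    """TSS = sensitivity + specificity - 1 at the given suitability threshold."""
    tp = fn = tn = fp = 0
    for obs, score in zip(observed, predicted):
        positive = score >= threshold
        if obs and positive:
            tp += 1
        elif obs:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def weighted_mean_ensemble(predictions, weights):
    """Cell-by-cell mean of model predictions, weighted by model score."""
    total = sum(weights[name] for name in predictions)
    n_cells = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * predictions[name][i] for name in predictions) / total
        for i in range(n_cells)
    ]


# Invented test data: 1 = presence, 0 = pseudo-absence.
observed = [1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4]
tss = true_skill_statistic(observed, predicted, threshold=0.5)

# Invented per-cell predictions and ROC scores for two algorithms.
predictions = {"GLM": [0.2, 0.8], "GBM": [0.4, 0.6]}
roc_scores = {"GLM": 0.8, "GBM": 0.9}
ensemble = weighted_mean_ensemble(predictions, roc_scores)
print(round(tss, 3), [round(v, 3) for v in ensemble])
```

Because the weights are normalised by their sum, the ensemble value in each cell stays within the range of the individual model predictions.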
Comparisons between the MaxEnt and biomod2 SDMs were again made using the pairs function in the psych R package. This was done by compiling a point data set of 10,000 random points across the study area. This point data set was then used to extract values from both SDMs, and the two resulting data sets were compared through the pairs analysis.
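A comparison of this kind can be sketched as sampling both surfaces at the same random cells and correlating the extracted values. The sketch below is not the psych pairs analysis: it uses a Pearson coefficient only, and the two grids are synthetic stand-ins (the second is a linear rescaling of the first) rather than real SDM outputs.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def compare_at_random_points(grid_a, grid_b, n_points, seed=0):
    """Sample both grids at the same random cells and correlate the values."""
    rng = random.Random(seed)
    n_rows, n_cols = len(grid_a), len(grid_a[0])
    a_values, b_values = [], []
    for _ in range(n_points):
        i, j = rng.randrange(n_rows), rng.randrange(n_cols)
        a_values.append(grid_a[i][j])
        b_values.append(grid_b[i][j])
    return pearson(a_values, b_values)


# Synthetic 20 x 20 suitability surfaces (not real model output).
grid_a = [[((i * 7 + j * 3) % 10) / 10 for j in range(20)] for i in range(20)]
grid_b = [[0.5 * value + 0.1 for value in row] for row in grid_a]
r = compare_at_random_points(grid_a, grid_b, n_points=10_000)
print(round(r, 3))
```

With real rasters the extraction step would read cell values at projected coordinates; the index-based sampling here is a simplification.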
The individual modelling packages used, their results and the results of the ensemble modelling process are given in Suppl. material
From the broad suite of variables tested (Suppl. material
Pairs analysis of predictive variables against northern quoll presences. Diagonal = variable name and histogram; left of diagonal = scatter plots and trend lines; right of diagonal = Spearman rank correlation coefficient. Axis figures represent point values of the corresponding variables.
Final suite of variables with % contribution and permutation importance as determined through step-wise MaxEnt analyses. All contribution and importance values reflect positive relationships to northern quoll presence. Source data for all variables is available in Suppl. material
Variable | % Contribution | Permutation Importance |
---|---|---|
Vegag = Department of Agriculture and Food Western Australia Vegetation Mapping (Rangelands) | 35 | 15.5 |
DEM = Digital Elevation Model | 26.2 | 37.4 |
BIO18 = Precipitation of Warmest Quarter | 15.1 | 16.3 |
Slope = Terrain slope raster produced from the DEM | 11.5 | 14 |
BIO9 = Mean Temperature of Driest Quarter | 4.7 | 3.3 |
BIO19 = Precipitation of Coldest Quarter | 4.1 | 9.3 |
Water = Euclidean Distance to Water Courses | 3.4 | 4.2 |
The bias file (Figure
Bias grid GIS file created from pseudo-absence data (presence records for critical weight range mammals). Red dots are Northern Quoll presences and yellow dots are other non-volant CWR species.
The MaxEnt SDM (Figure
The full outputs for the biomod2 modelling process are given in Suppl. material
The independent variables that we used were suitably diverse in nature with an acceptable level of covariance within the variables used in the final selection. The use of the jack-knife test to determine the final suite showed that the removal of any of the variables selected in the final suite would have compromised what was a strong model. The literature informs us that the number of variables used was appropriate for a modelling exercise of this nature (
As demonstrated (Table
The conservation of a threatened species requires good information about population locations and ecological requirements within its geographic range, particularly when threatening processes are ongoing. Applying appropriately selected surrogate species data to both the MaxEnt and biomod2 software packages has enabled us to develop SDMs which identify remarkably similar core areas of likely quoll habitat, as well as less optimal habitat that may only be occupied in favourable conditions. These apparently less favourable habitats may, however, be of high conservation value, as current information suggests that all Pilbara northern quoll populations are genetically linked and that a high level of dispersal occurs between geographically distant populations (
The SDM pairs analysis (Figure
In comparison with previous SDMs on this northern quoll population (
In this study we have demonstrated a methodology capable of addressing three of the more common problems associated with SDMs, specifically how to: 1) address bias in a high resolution SDM over a large and diverse landscape with a limited, and potentially biased, presence-only data set; 2) select an appropriate suite of predictive variables for the construction of such a model; and 3) establish a means by which the suitability and outputs of a modelling tool can be verified. We have developed an innovative approach to constructing an SDM by pre-emptively identifying problems likely to arise due to data limitations and addressing these issues by reviewing the options available and selecting a combination of responses which minimise bias effects and meet the needs and constraints of the modeller.
We note that a true comparison between a model with randomly selected pseudo-absences and a bias-corrected model using a bias layer created from surrogate presences remains the preferred way to demonstrate the usefulness of this form of bias compensation. However, in the absence of broad-scale sampling and ground truthing to test truth versus prediction (as is proposed), the demonstrated approach remains the most feasible form of bias compensation under the given circumstances.
Aware of the resource and skill limitations that prevent many conservation managers from constructing SDMs, we deliberately selected this methodology to work within these limitations. All data used was freely available and all software used was freeware, other than a commonly used GIS package that could also be substituted with freeware. Skill levels were limited to what the authors considered average for an ecological project team, i.e. no modelling, statistical, GIS or programming specialists were required for this study.
In this exercise we produced a SDM that predicted areas where new populations and sub-populations of the northern quoll might be found outside of areas currently known to be habitat. By identifying areas of high habitat value, this SDM facilitates the identification and conservation of high priority habitat areas, potential translocation sites and potential movement corridors. As a consequence of having identified a sound suite of predictive variables, our understanding of the habitat requirements of the Pilbara population of the northern quoll has been increased. Finally, by comparing the distributions identified through this exercise with proposed mining and infrastructure projects in the environmental impact assessment process, this SDM can be used to minimise impacts on this unique and important northern quoll population.
The ensemble modelling process validates our choice of the MaxEnt model with bias file as the preferred SDM. However, this is a desktop exercise derived from a relatively small and uniform sample, and the model should be validated and refined through on-ground sampling and research. Our study exemplifies a preferred practice in the use of SDMs by corroborating our findings with a control, in this case one derived through an ensemble modelling approach. We commend this practice to modellers and caution against outcomes derived from only a single modelling approach. Our study has produced the most comprehensive and refined distribution map currently known for the endangered Northern Quoll. It has confirmed their reliance on rocky upland habitats and their limited distribution within the Pilbara region. These outcomes will be of great importance to land managers when considering the impacts of planned developments within the region.
GIS data sets used in variable assessments and map of Pilbara vegetation systems
Data type: distribution data
Full readout for the MaxEnt northern quoll SDM
Data type: statistical data
Weighted mean SDMs for individual algorithms and evaluation statistics (biomod2)
Data type: statistical data