Corresponding author: Yicheng Wang (ywangqau@163.com). Academic editor: Yiannis Matsinos
© 2018 Yicheng Wang, Hayri Önal, Qiaoling Fang.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Wang Y, Önal H, Fang Q (2018) How large spatially-explicit optimal reserve design models can we solve now? An exploration of current models’ computational efficiency. Nature Conservation 27: 17-34. https://doi.org/10.3897/natureconservation.27.21642
Spatially-explicit optimal reserve design models select the best sites from a set of candidate sites to assemble nature reserves that protect species (or habitats) and that display certain spatial attributes desirable for those species. These models are formulated as linear 0–1 programmes and solved using standard optimisation software, but they have been run on different platforms, producing discrepant or even conflicting messages about their computational efficiency. A fair and accurate comparison of the performance of these models would be important for the conservation planners who use them. In this article, we considered eight models presented in the literature and tested their computational efficiency using randomly generated data sets containing up to 2000 sites. We focused on reserve contiguity and compactness, which are considered crucial to species persistence. Our results showed that two of these models, namely
nature reserve design, mixed integer programming, computational efficiency, contiguity, compactness, spatial optimisation
Anthropogenic activities have caused tremendous impacts on ecosystems all over the world. These impacts include but are not limited to environmental pollution, habitat loss and fragmentation, invasion of exotic species and climate change (e.g.
The reserve design problem is defined as selecting a subset from a larger set of candidate sites to assemble nature conservation reserves to provide adequate protection to targeted species or habitats. This problem has been studied using various approaches, including site scoring (e.g.
The early optimisation formulations of the reserve design problem adapted the set covering and maximal covering frameworks well known in the operations research literature (
Several modelling approaches have been introduced in the reserve design literature to incorporate spatial attributes in site selection. Owing to differences in structure and size, these models differ in their computational performance. Moreover, individual models were run on different platforms with different CPU speeds and solved using different MIP solvers; it is therefore unclear how the models compare in terms of computational efficiency. A fair and accurate comparison of their performance would be important for on-the-ground conservation practitioners, which is the main motivation for this article. To this end, we selected eight optimisation models that incorporate contiguity and compactness (or could easily do so with proper modification) and conducted computational experiments by running them on the same computer with the same randomly generated data sets. We aimed to answer two questions: i) how do these models perform in solving contiguity and compactness problems of different sizes and ii) what are the likely factors affecting their computational efficiency?
We carried out the tests on a Lenovo desktop computer with a 2.8 GHz Intel Pentium(R) CPU, 8.0 GB of RAM and a 64-bit Windows operating system. As the MIP solver, we used GUROBI 5.0 integrated in GAMS 23.9.
Table
Mechanisms used to ensure contiguity and algebraic formulae of the eight models used in the computational efficiency tests.
Some of the models were not originally proposed for reserve design purposes or did not consider the cost of site selection. We made necessary modifications to those models so that they are applicable to the reserve design problem we address here. The objective of all the models is stated as the minimisation of the total cost of site selection, namely:
where Ui is a binary variable which equals 1 if site i is selected and 0 otherwise and ci is the cost of selecting site i.
Models that were not originally proposed for reserve design problems are extended by adding the species coverage constraint described below:
where ei is the population size of the target species in site i and p is the targeted (minimum viable) population of the target species. This constraint is intended to ensure a long-term survival chance for the target species.
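Together, the cost objective (minimise the sum of ci Ui) and this coverage constraint (the sum of ei Ui must be at least p) form the non-spatial core shared by all eight models. As an illustration only (not part of any published model), the brute-force sketch below solves this core for a hypothetical five-site instance; all names and data are invented:

```python
from itertools import product

def min_cost_reserve(costs, populations, p):
    """Brute-force the 0-1 core problem: minimise sum(c_i * U_i)
    subject to sum(e_i * U_i) >= p. Only feasible for tiny n (2^n selections)."""
    best_cost, best_sites = None, None
    for u in product((0, 1), repeat=len(costs)):
        # species coverage constraint: total protected population >= p
        if sum(e * ui for e, ui in zip(populations, u)) < p:
            continue
        cost = sum(c * ui for c, ui in zip(costs, u))
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_sites = [i for i, ui in enumerate(u) if ui]
    return best_cost, best_sites

# hypothetical 5-site instance: costs c_i, populations e_i, target p = 6
print(min_cost_reserve([4, 2, 5, 3, 1], [3, 1, 4, 2, 2], 6))  # → (6, [2, 4])
```

In practice, the models are solved with an MIP solver (here GUROBI through GAMS) rather than by enumeration, which becomes infeasible beyond a few dozen sites.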
The
where Xij is a binary variable which equals 1 if an arc directs from i to neighbour j and 0 otherwise and Ni is the set of i’s neighbour sites. For details, see
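While the arc variables Xij enforce contiguity inside the optimisation model, contiguity of a given selection can also be verified afterwards with a simple breadth-first search over the neighbourhood sets Ni. The helper below is a hypothetical sketch of such a check, not part of any of the models tested:

```python
from collections import deque

def is_contiguous(selected, neighbours):
    """Check whether the selected sites form one connected component,
    where neighbours[i] lists the sites adjacent to site i (the set N_i)."""
    selected = set(selected)
    if not selected:
        return True
    start = next(iter(selected))
    seen, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j in neighbours[i]:
            if j in selected and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen == selected

# 2x2 grid with rook (edge) adjacency: sites 0,1 on top row; 2,3 below
nbrs = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(is_contiguous([0, 1, 3], nbrs))  # True
print(is_contiguous([0, 3], nbrs))     # False: diagonal neighbours only
```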
We first generated ten hypothetical data sets. Each data set is based on a 10*10 grid partition comprising one hundred sites. The distribution of the species over these sites was generated using a uniform distribution over the range [0, 5], meaning that the population size (ei) in each site takes an integer value between 0 and 5. We generated the selection cost (ci) similarly, using real numbers in the range [1, 10]. We assigned a different value to the target population size (p) in each data set, calculated as p=α*n, where n=100 is the number of candidate sites and α is a population coefficient ranging from 0.25 to 2.5 with an increment of 0.25 (thus, a population of between 25 and 250 individuals must be protected in the reserve; a larger α implies a larger reserve area). By designing reserves with different areas, we intended to test the impact of the number of selected sites on computational efficiency.
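A minimal sketch of this data-generation procedure (the function name and seed are our own choices, not from the original experiments) could look as follows:

```python
import random

def generate_dataset(rows, cols, alpha, seed=None):
    """Generate one hypothetical data set as described above:
    integer species counts e_i in [0, 5], real costs c_i in [1, 10],
    and target population p = alpha * n for n = rows * cols sites."""
    rng = random.Random(seed)
    n = rows * cols
    e = [rng.randint(0, 5) for _ in range(n)]       # species population per site
    c = [rng.uniform(1.0, 10.0) for _ in range(n)]  # selection cost per site
    p = alpha * n                                   # coverage target
    return e, c, p

e, c, p = generate_dataset(10, 10, alpha=0.25, seed=42)
print(len(e), p)  # → 100 25.0
```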
To eliminate the impact of input data on the test results, for each value of α we generated 50 data sets, each with a different distribution of species populations and selection costs across the 100 sites, and used each in a separate run. To save time and avoid tying up the computer for excessively long periods, we imposed a one-hour processing time limit on each run and a 2-hour total time limit for all runs (i.e. the programme was terminated if completing the 50 runs took more than 2 hours). Average CPU times over completed runs are reported.
The
Minimize Σk Σi dki Xki + Σi ci Ui (7)
where Xki is a binary variable which equals 1 if site i is selected into the reserve which is centred at site k and dki is the distance between centre site k and selected site i. The first part of (7) is the total distance between the cluster centres and selected sites and the second part is selection cost. These two components can be given different weights to reflect their importance in site selection; for computational purposes, we used the same weights (=1.0).
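For a single cluster centre k, the equal-weights objective (7) reduces to total centre-to-site distance plus total selection cost. The sketch below is purely illustrative, with hypothetical coordinates and costs, and assumes Euclidean distances (the models themselves work with precomputed distances dki):

```python
from math import hypot

def objective_value(centre, selected, coords, costs):
    """Evaluate the two-part objective (7) for a single-cluster reserve:
    total centre-to-site distance plus total selection cost, equal weights."""
    dist = sum(hypot(coords[i][0] - coords[centre][0],
                     coords[i][1] - coords[centre][1]) for i in selected)
    cost = sum(costs[i] for i in selected)
    return dist + cost

# hypothetical 2x2 grid: unit coordinates and invented costs
coords = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
costs = {0: 2.0, 1: 1.0, 2: 1.5, 3: 3.0}
print(objective_value(0, [0, 1, 2], coords, costs))  # distance 2.0 + cost 4.5 = 6.5
```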
In the
s.t.:
where Wij is a binary variable which equals 1 when sites i and j are both selected and 0 otherwise. The first part of (8) represents reserve compactness while the second part is the cost of site selection. Constraint (9) relates Wij to the site selection variables.
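Constraint (9) is a standard linearisation of the product of two binary variables, commonly written as Wij ≥ Ui + Uj − 1 (the exact algebraic form appears in the original papers). A quick illustrative check of this lower bound over the four binary cases, using a hypothetical helper:

```python
def w_lower_bound(u_i, u_j):
    """Lower bound from the linearisation W_ij >= U_i + U_j - 1.
    With minimisation pushing W_ij down, W_ij = 1 exactly when both
    sites are selected and 0 otherwise."""
    return max(0, u_i + u_j - 1)

cases = {(u_i, u_j): w_lower_bound(u_i, u_j) for u_i in (0, 1) for u_j in (0, 1)}
print(cases)  # → {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```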
The computational results of the 100-site contiguity problem are summarised in Table
Computational efficiency of alternative contiguity models in solving a 100-site problem.
Models | Population size to be covered in the reserve | | | |
---|---|---|---|---|---
 | 50 | 100 | 150 | 200 | 250
 | 136.9 | >3600.0 a | >3600.0 a | >3600.0 a | >2400.0 b
 | 9.0 | 50.8 | 20.2 | 3.0 | 0.1 c
 | 64.2 | >969.1 a | >477.1 a | 47.0 | 0.1 c
 | >2024.3 a | >3600.0 a | >3600.0 a | >3600.0 a | >2400.6 b
 | >505.6 a | >944.4 a | >824.2 a | >2405.3 a | >1224.9 b
 | >1854.8 a | >2551.1 a | >2026.5 a | >123.5 a | 0.2 c
 | >1905.5 a | >3600.0 a | >3600.0 a | >311.8 a | 0.9 c
 | 12.0 | 5.3 | 3.3 | 4.2 | 6.6 c
The models of
The
All models, except the
Having observed that, in the 100-site test problems, the
The
The
Computational efficiency of
Models | α a | n=200 (10*20) | | n=500 (20*25) | | n=1000 (25*40) | | n=1500 (30*50) | | n=2000 (40*50) |
---|---|---|---|---|---|---|---|---|---|---|---
 | | Model size b | Time (s) | Model size b | Time (s) | Model size b | Time (s) | Model size b | Time (s) | Model size b | Time (s)
 | 2.5 | 2,361; 1,682 | 0.2 c | 6,141; 4,322 | 0.5 c | 12,481; 8,742 | 1.1 c | 18,831; 13,142 | 2.4 c | 25,244; 17,594 | 4.2 c
 | 2.0 | | 4.0 | | 26.5 | | 108.3 | | 383.5 d | | e
 | 1.5 | | >552.3 d | | >2124.9 d | | >3600.0 d | | >3600.0 d | | e
 | 1.0 | | >3600.0 d | | >3600.0 d | | >3600.0 d | | >3600.0 d | | e
 | 0.5 | | >1822.0 d | | >3600.0 d | | e | | e | | e
Önal et al. ( | 2.5 | 40,001; 39,263 | >3278.5 d | 250,001; 248,093 | >3600.0 d | 1,000,001; 996,133 | e | 2,250,001; 2,244,163 | e | 4,000,001; 3,992,183 | e
 | 2.0 | | 2851.4 d | | >3600.0 d | | >3600.0 d | | e | | e
 | 1.5 | | 2608.3 d | | >3600.0 d | | >3600.0 d | | e | | e
 | 1.0 | | >2613.6 d | | >3600.0 d | | >3600.0 d | | f | | g
 | 0.5 | | 1238.5 d | | >3600.0 d | | >3600.0 d | | f | | g
In this section, we compare the computational efficiency of the
Comparison of the W-CM and Ö-CM models in solving contiguity and compactness problems of various sizes.
Models | α a | n=100 (10*10) | | n=200 (10*20) | | n=500 (20*25) |
---|---|---|---|---|---|---|---
 | | Model size a | Time (s) | Model size a | Time (s) | Model size a | Time (s)
W-CM | 2.5 | 6,091; 5,772 | 1.4 b | 22,261; 21,582 | 4.8 b | 130,891; 129,072 | 78.3 b
 | 2.0 | | 178.2 c | | >2401.6 d | | >3600.0 c
 | 1.5 | | 1692.3 c | | >3600.0 d | | >3600.0 c
 | 1.0 | | >3318.8 c | | >3600.0 d | | >3600.0 c
 | 0.5 | | 90.6 | | >3600.0 d | | >3600.0 c
Ö-CM | 2.5 | 10,001; 19,443 | 2.4 b | 40,001; 39,263 | 1474.4 b | 250,001; 248,093 | >3600.0 c
 | 2.0 | | 2.0 | | 1360.7 c | | >3600.0 c
 | 1.5 | | 1.9 | | 1135.4 c | | >3600.0 c
 | 1.0 | | 1.9 | | 1073.4 c | | >3600.0 c
 | 0.5 | | 1.7 | | 767.6 c | | >3600.0 c
The W-CM model encountered difficulty for some population settings (α) when solving the 100-site problem. For α=2.5, the solution time was only 1.4 seconds, but for α=2.0 and 1.5 it jumped to 178.2 and 1692.3 seconds, respectively. For α=1.0, only 3 runs were completed within the 2-hour total time limit. For α=0.5, however, the solution time decreased dramatically to 90.6 seconds, showing the fast-slow-fast pattern observed when solving the contiguity problems. For n=200 and 500, the model completed only 2 runs within the 2-hour total time limit for most population settings. In general, larger models require longer solution times; however, raising the population requirement improved computational efficiency. For example, for α=2.5, the average solution times were only 4.8 and 78.3 seconds for n=200 and n=500, respectively.
The Ö-CM model used only about 2 seconds in the 100-site problem for all population coefficient settings. When n=200, the model used about 20 minutes or less for all population settings, again better than the W-CM model. When n was increased to 500, this model encountered difficulty too, where only 2 runs could be completed within the 2 hour total time limit.
It is interesting that the Ö-CM model is computationally more efficient than the W-CM model although it has a larger size both in terms of the number of variables and the number of equations (see Table
The model completed only 2 runs when n=1000, 1500 and 2000. The worst performance was observed when α=1.0 and 0.5, where the model ran out of the 1-hour time limit in almost every run, even with the smallest test problem (n=200).
The
Finally, we note that the Ö-CM model (which includes an additional distance component in the objective function, see (7)) is computationally more efficient than the
Based on the computational experiments we conducted, we identified four important factors that affect the computational efficiency of MIP models for nature reserve design problems with contiguity and compactness considerations: 1) model size, 2) the share of selected sites, 3) model structure and 4) input data. We discuss these factors below.
Model size is the most important factor affecting the computational efficiency of these models. As the model size (especially the number of binary variables) increases, solution time may increase substantially. Figure
Relationship between processing time and the number of candidate sites (
Our test runs showed that the solution times also depend significantly on the share of selected sites in the optimal solution, which in turn depends on the specification of the population coefficient (α). This can be seen clearly, for instance, in the results obtained with the
The
Model structure is the third important factor that affects computational performance of the models we tested. Although the number of variables and constraints in a model may be substantial, due to the algebraic model structure, a large number of those variables and constraints may actually be redundant. Those redundant elements are eliminated during the preliminary solution procedure (Presolve in some GAMS solvers such as CPLEX and GUROBI) and the model sizes are much smaller after the Presolve. This is one of the main reasons for the superior computational advantage of the
In addition, some algebraic forms may be integer friendly and some may not. For example, the graph-theoretic contiguity model introduced by
where Xij and Uj are binary variables as defined before and Nj is the set of j’s neighbour sites. This constraint states that if site j is selected, then at most four arcs can direct to it from its neighbour sites. The computational efficiency of the model can be improved substantially by modifying this constraint as in constraints (3) and (4). A comparison of the original
Significance of model structure in affecting the model’s computational efficiency (number of sites n=36; the population to be covered equals α times n).
It is clear that replacing constraint (10) with constraints (3) and (4) slashes the solution time. We performed a paired-samples t-test for each value of α to test the significance of the difference between the two models’ processing times. The test reported a p-value of less than 0.05 for each α value, indicating that the difference in processing times is statistically significant. This result can be attributed to the fact that a constraint like (10) is integer-unfriendly, whereas constraints (3) and (4) improve the model’s computational efficiency by helping construct a unimodular constraint matrix (
Input data such as land price and species distribution across candidate sites are also crucial factors that can affect the model’s computational efficiency. Figure
Computational efficiency of the
The shortest time used by the
We should note that none of these factors alone determines the performance of a MIP model; rather, they affect the model’s performance jointly. Solution techniques have been designed to speed up the computational process, such as supplying a cut value (
We have compared the computational efficiencies of eight representative optimal reserve design models that explicitly incorporate contiguity and compactness considerations. We identified the models presented by
The current nature reserves, whether terrestrial or marine, may not be adequate to provide sufficient protection to biodiversity. For instance, only 1.89% of marine protected areas (MPAs) worldwide are exclusively no-take MPAs, far from the Convention on Biological Diversity’s (CBD) Aichi Target 11 of 10% MPA coverage by 2020 and even further from the 30% target recommended by the IUCN World Parks Congress 2014 (IUCN 2018). It is almost certain that the current systems of nature reserves will need to be expanded. Acquiring more protected areas, however, would have to be done in a world where the human economy struggles to grow while ecosystems and nonhuman species demand protection. Indeed, there is a conflict between economic growth and wildlife conservation (
The authors thank Dr. Falk Huettmann for his comments, which greatly improved the quality of this manuscript. This work was supported by the National Natural Science Foundation of China [71202096] and CREES Project [ILLU 05–0361].