Research Article 
Corresponding author: Yicheng Wang (ywangqau@163.com) Academic editor: Yiannis Matsinos
© 2018 Yicheng Wang, Hayri Önal, Qiaoling Fang.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Wang Y, Önal H, Fang Q (2018) How large spatially explicit optimal reserve design models can we solve now? An exploration of current models’ computational efficiency. Nature Conservation 27: 17–34. https://doi.org/10.3897/natureconservation.27.21642

Spatially explicit optimal reserve design models select the best sites from a set of candidate sites to assemble nature reserves that protect species (or habitats), and the resulting reserves display certain spatial attributes that are desirable for species. These models are formulated as linear 0–1 programs and solved using standard optimisation software, but they have been run on different platforms, resulting in discrepant or even conflicting messages with regard to their computational efficiency. A fair and accurate comparison of these models would be important for conservation planners who use them. In this article, we considered eight models presented in the literature and tested their computational efficiency using randomly generated data sets containing up to 2000 sites. We focused on reserve contiguity and compactness, which are considered crucial to species persistence. Our results showed that two of these models, namely
nature reserve design, mixed integer programming, computational efficiency, contiguity, compactness, spatial optimisation
Anthropogenic activities have caused tremendous impacts on ecosystems all over the world. These impacts include but are not limited to environmental pollution, habitat loss and fragmentation, invasion of exotic species and climate change (e.g.
The reserve design problem is defined as selecting a subset from a larger set of candidate sites to assemble nature conservation reserves to provide adequate protection to targeted species or habitats. This problem has been studied using various approaches, including site scoring (e.g.
The early optimisation formulations of the reserve design problem adapted the set covering and maximal covering frameworks well known in the operations research literature (
Several modelling approaches have been introduced in the reserve design literature to incorporate spatial aspects in site selection. Due to their differences in structure and size, those models differ in their computational performance. Moreover, individual models were run on different platforms with different CPU speeds and solved using different MIP solvers and, therefore, it is not clear how those models compare to each other in terms of computational efficiency. A fair and accurate comparison of these models would be important for on-the-ground conservation practitioners. This is the main motivation for this article. For this purpose, we selected eight optimisation models that incorporate (or could easily incorporate, with proper modification) contiguity and compactness and conducted computational experiments by running them on the same computer and using the same randomly generated data sets. We aimed to answer two questions: i) how do these models perform in solving contiguity and compactness problems of different sizes and ii) what are the likely factors affecting their computational efficiency?
We carried out the tests on a Lenovo desktop computer with a 2.8 GHz Intel Pentium(R) CPU and 8.0 GB of RAM, running the 64-bit Windows operating system. As the MIP solver, GUROBI 5.0 integrated in GAMS 23.9 was used.
Table
Mechanisms used to ensure contiguity and algebraic formulae of the eight models used in the computational efficiency tests.
Model number  Mechanism to ensure contiguity  Algebraic formula  References 

1  An arc can direct from node i to neighbour node j only if the value assigned to i is less than the value assigned to j, so an arc directing away from a node will not return to it and loops are therefore prevented.  (1)–(6); 

2  Two trees are formed, one in the primal graph and the other in the dual graph; the interwoven structure of these two trees prevents the formation of loops.  (1)–(6); (2)–(4) in 

3  One selected site serves as a sink, every other selected site provides one unit of supply to the flow and all flows finally reach the sink.  (1) and (2); (1)–(3) in 

4  The tail length of a selected site equals the number of arcs linking to it and that length is strictly larger than those of the site’s preceding sites.  (1)–(2); (3)–(7) in 

5  Solve a relaxed problem with no contiguity constraints first. If a cycle occurs, apply an inequality (a cut) to that cycle and solve again, until no cycle remains in the solution.  (1)–(6); (8) in 

6  An amount of flow is injected into the partition from an outside node, each selected site consumes one unit of flow and residual flow is absorbed by a predefined variable.  (1) and (2); (5)–(8) in 

7  Capital flows from the supply node to transshipment nodes and, for each selected site, capital can come into it from only one of its adjacent sites.  (1) and (2); (2)–(7) in 

8  If a site is selected, then at least one of its adjacent sites, which is nearer (structurally or functionally) to the centre, must also be selected.  Minimize; (2), (4), (6) and (8) in 
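The iterative mechanism of model 5 hinges on testing whether a trial solution is spatially feasible. The following is a minimal sketch (our own illustration, not the authors' code; function names are ours) of that feasibility test: a breadth-first search that checks whether a selected set of grid cells forms one contiguous block. In the full cut-based approach, a failed check would trigger an added inequality (a cut) and a re-solve.

```python
from collections import deque

def grid_neighbours(i, rows, cols):
    """Yield the 4-neighbours of cell i in a rows x cols grid (row-major indexing)."""
    r, c = divmod(i, cols)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr * cols + nc

def is_contiguous(selected, rows, cols):
    """Check via breadth-first search whether the selected sites form a single
    connected block -- the test a cut-based approach would run after solving
    the relaxed problem without contiguity constraints."""
    selected = set(selected)
    if not selected:
        return True
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for nb in grid_neighbours(site, rows, cols):
            if nb in selected and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == selected

# Two candidate selections on a 3 x 3 grid:
print(is_contiguous({0, 1, 4}, 3, 3))   # adjacent cells -> True
print(is_contiguous({0, 8}, 3, 3))      # opposite corners -> False
```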

Some of the models were not originally proposed for reserve design purposes or did not consider the cost of site selection. We made necessary modifications to those models so that they are applicable to the reserve design problem we address here. The objective of all the models is stated as the minimisation of the total cost of site selection, namely:
Minimize Σ_{i} c_{i}U_{i} (1)

where U_{i} is a binary variable which equals 1 if site i is selected and 0 otherwise, and c_{i} is the cost of selecting site i.
Those models not originally proposed for reserve design problems are extended by adding the species coverage constraint described below:

Σ_{i} e_{i}U_{i} ≥ p (2)

where e_{i} is the population size of the target species in site i and p is the targeted (minimum viable) population of the target species. This constraint is intended to ensure a long-term survival chance for the target species.
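To make objective (1) and constraint (2) concrete, a tiny instance can be solved by brute force (our own sketch, not one of the eight models; it ignores all spatial constraints and is only practical for a handful of sites, whereas the paper's MIP formulations solve the same core problem at scale):

```python
from itertools import combinations

def min_cost_cover(costs, pops, p):
    """Exhaustively find the cheapest subset of sites whose total species
    population meets the target p -- objective (1) subject to constraint (2)."""
    n = len(costs)
    best_cost, best_set = float("inf"), None
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if sum(pops[i] for i in subset) >= p:
                cost = sum(costs[i] for i in subset)
                if cost < best_cost:
                    best_cost, best_set = cost, subset
    return best_cost, best_set

costs = [4.0, 2.0, 5.0, 1.0, 3.0]  # c_i, selection cost of each site
pops  = [3, 1, 5, 0, 2]            # e_i, species population in each site
print(min_cost_cover(costs, pops, p=6))  # -> (7.0, (1, 2))
```

Sites 1 and 2 together hold 6 individuals at a cost of 7.0, the cheapest feasible choice here.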
The
where X_{ij} is a binary variable which equals 1 if an arc directs from i to neighbour j and 0 otherwise and N_{i} is the set of i’s neighbour sites. For details, see
We first generated ten hypothetical data sets. Each data set considers a 10×10 grid partition containing one hundred sites. The distribution of species over these sites was generated using a uniform distribution over the range [0, 5], meaning that the population size (e_{i}) in each site takes an integer value between 0 and 5. We generated the selection cost (c_{i}) similarly, using real numbers in the range [1, 10]. We assigned a different value to the target population size (p) in each data set, where p is calculated as p=α*n, n=100 is the number of candidate sites and α is a population coefficient ranging from 0.25 to 2.5 with an increment of 0.25 (thus, a population of between 25 and 250 individuals must be protected in the reserve, and a larger α implies a larger reserve area). By designing reserves with different areas, we intended to test the impact of the number of selected sites on computational efficiency.
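The data-generation procedure above can be sketched as follows (a minimal illustration with our own function and parameter names, not the authors' code, using Python's standard random module):

```python
import random

def generate_dataset(n=100, alpha=1.0, seed=None):
    """Generate one hypothetical data set in the spirit of the paper's setup:
    integer species populations e_i drawn uniformly from [0, 5], real selection
    costs c_i drawn uniformly from [1, 10], and target population p = alpha * n."""
    rng = random.Random(seed)
    e = [rng.randint(0, 5) for _ in range(n)]        # species population per site
    c = [rng.uniform(1.0, 10.0) for _ in range(n)]   # selection cost per site
    p = alpha * n                                    # target (minimum viable) population
    return e, c, p

e, c, p = generate_dataset(n=100, alpha=0.25, seed=42)
print(len(e), len(c), p)   # -> 100 100 25.0
```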
To eliminate the impact of input data on the test results, for each value of α we generated 50 data sets, each with a different distribution of species and selection costs across the 100 sites, and used each in a different run. To save time and avoid tying up the computer for long periods, in each run we imposed a one-hour maximum processing time limit. The total time for all runs was limited to 2 hours (i.e. the programme was terminated if it took more than 2 hours to complete the 50 runs). Average CPU times in completed runs are reported.
The
Minimize Σ_{k}Σ_{i} d_{ki}X_{ki} + Σ_{i} c_{i}U_{i} (7)

where X_{ki} is a binary variable which equals 1 if site i is selected into the reserve centred at site k and d_{ki} is the distance between centre site k and selected site i. The first part of (7) is the total distance between the cluster centres and the selected sites and the second part is the selection cost. These two components can be given different weights to reflect their importance in site selection; for computational purposes, we used equal weights (=1.0). In the
Minimize Σ_{i}Σ_{j>i} d_{ij}W_{ij} + Σ_{i} c_{i}U_{i} (8)

s.t.:

W_{ij} ≥ U_{i} + U_{j} − 1, for all sites i and j (9)

where W_{ij} is a binary variable which equals 1 when sites i and j are both selected and 0 otherwise. The first part of (8) represents reserve compactness, while the second part is the cost of site selection. Constraint (9) relates W_{ij} to the site selection variables.
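Constraint (9) is a standard linearisation device. Assuming it takes the usual form W_{ij} ≥ U_{i} + U_{j} − 1 (our reading, consistent with the definition of W_{ij} above), a quick enumeration confirms that the tightest value it permits coincides with the product U_{i}U_{j} for every binary combination, so the linear constraint plus the minimised distance term in (8) reproduces the nonlinear "both selected" condition:

```python
from itertools import product

# For each binary combination of U_i and U_j, the tightest W_ij allowed by
# W_ij >= U_i + U_j - 1 (with W_ij >= 0) equals the product U_i * U_j;
# minimisation of the positive distance term in (8) pushes W_ij to this bound.
for u_i, u_j in product((0, 1), repeat=2):
    w_lower = max(0, u_i + u_j - 1)   # tightest W_ij permitted by (9)
    assert w_lower == u_i * u_j
    print(u_i, u_j, w_lower)
```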
The computational results of the 100-site contiguity problem are summarised in Table
Computational efficiency of alternative contiguity models in solving a 100-site problem.
Models  Population size to be covered in the reserve  

50  100  150  200  250  

136.9  >3600.0 ^{a}  >3600.0 ^{a}  >3600.0 ^{a}  >2400.0 ^{b} 

9.0  50.8  20.2  3.0  0.1^{c} 

64.2  >969.1^{a}  >477.1^{a}  47.0  0.1^{c} 

>2024.3 ^{a}  >3600.0 ^{a}  >3600.0 ^{a}  >3600.0 ^{a}  >2400.6 ^{b} 

>505.6 ^{a}  >944.4 ^{a}  >824.2 ^{a}  >2405.3 ^{a}  >1224.9 ^{b} 

>1854.8 ^{a}  >2551.1 ^{a}  >2026.5 ^{a}  >123.5 ^{a}  0.2 ^{c} 

>1905.5 ^{a}  >3600.0 ^{a}  >3600.0 ^{a}  >311.8 ^{a}  0.9 ^{c} 

12.0  5.3  3.3  4.2  6.6 ^{c} 
The models of
The
All models, except the
Having observed that, in the 100-site test problems, the
The
The
Computational efficiency of
Models  α^{a}  n=200 (10*20)  n=500 (20*25)  n=1000 (25*40)  n=1500 (30*50)  n=2000 (40*50)  

Model size ^{b}  Time (s)  Model size ^{b}  Time (s)  Model size ^{b}  Time (s)  Model size ^{b}  Time (s)  Model size ^{b}  Time (s)  

2.5  2,361 1,682  0.2 ^{c}  6,141 4,322  0.5 ^{c}  12,481 8,742  1.1 ^{c}  18,831 13,142  2.4 ^{c}  25,244 17,594  4.2 ^{c} 
2.0  4.0  26.5  108.3  383.5 ^{d}  e  
1.5  >552.3 ^{d}  >2124.9 ^{d}  >3600.0 ^{d}  >3600.0 ^{d}  e  
1.0  >3600.0 ^{d}  >3600.0 ^{d}  >3600.0 ^{d}  >3600.0 ^{d}  e  
0.5  >1822.0 ^{d}  >3600.0 ^{d}  e  e  e  
Önal et al. ( 
2.5  40,001 39,263  >3278.5 ^{d}  250,001 248,093  >3600.0 ^{d}  1,000,001 996,133  e  2,250,001 2,244,163  e  4,000,001 3,992,183  e 
2.0  2851.4 ^{d}  >3600.0 ^{d}  >3600.0 ^{d}  e  e  
1.5  2608.3 ^{d}  >3600.0 ^{d}  >3600.0 ^{d}  e  e  
1.0  >2613.6 ^{d}  >3600.0 ^{d}  >3600.0 ^{d}  f  g  
0.5  1238.5 ^{d}  >3600.0 ^{d}  >3600.0 ^{d}  f  g 
In this section, we compare the computational efficiency of the
Comparison of the WCM model and the ÖCM model in solving contiguity and compactness problems of various sizes.
Models  α^{a}  n=100 (10*10)  n=200 (10*20)  n=500 (20*25)  

Model size ^{a}  Time (s)  Model size ^{a}  Time (s)  Model size ^{a}  Time (s)  
WCM  2.5  6,091; 5,772  1.4^{b}  22,261; 21,582  4.8 ^{b}  130,891; 129,072  78.3^{b} 
2.0  178.2^{c}  >2401.6 ^{d}  >3600.0 ^{c}  
1.5  1692.3^{c}  >3600.0 ^{d}  >3600.0 ^{c}  
1.0  >3318.8^{c}  >3600.0 ^{d}  >3600.0 ^{c}  
0.5  90.6  >3600.0 ^{d}  >3600.0 ^{c}  
ÖCM  2.5  10,001; 19,443  2.4 ^{b}  40,001; 39,263  1474.4 ^{b}  250,001; 248,093  >3600.0 ^{c} 
2.0  2.0  1360.7 ^{c}  >3600.0 ^{c}  
1.5  1.9  1135.4 ^{c}  >3600.0 ^{c}  
1.0  1.9  1073.4 ^{c}  >3600.0 ^{c}  
0.5  1.7  767.6 ^{c}  >3600.0 ^{c} 
The WCM model encountered difficulty for some population settings (α) when solving the 100-site problem. For α=2.5, the solution time was only 1.4 seconds, but for the settings α=2.0 and 1.5 it jumped to 178.2 and 1692.3 seconds, respectively. For α=1.0, only 3 runs were completed within the 2-hour total time limit. For α=0.5, however, the solution time decreased dramatically to 90.6 seconds, showing the fast-slow-fast pattern observed when solving the contiguity problems. For n=200 and 500, the model completed only 2 runs within the 2-hour total time limit for most population settings. In general, larger models require longer solution times. However, raising the population requirement level improved computational efficiency. For example, for α=2.5, the average solution times were only 4.8 and 78.3 seconds for n=200 and n=500, respectively.
The ÖCM model used only about 2 seconds on the 100-site problem for all population coefficient settings. When n=200, the model used about 20 minutes or less for all population settings, again better than the WCM model. When n was increased to 500, this model encountered difficulty too: only 2 runs could be completed within the 2-hour total time limit.
It is interesting that the ÖCM model is computationally more efficient than the WCM model although it is larger both in terms of the number of variables and the number of equations (see Table
The model completed only 2 runs when n=1000, 1500 and 2000. The worst performance was observed for α=1.0 and 0.5, where the model ran out of the 1-hour time limit in almost every run, even with the smallest test problem (n=200).
The
Finally, we note that the ÖCM model (which includes an additional distance component in the objective function, see (7)) is computationally more efficient than the
Based on the computational experiments we conducted, we identified four important factors that affect the computational efficiency of MIP models in nature reserve design problems with contiguity and compactness considerations: 1) model size, 2) the share of selected sites, 3) model structure and 4) input data. We discuss these factors below.
Model size is the most important factor affecting the computational efficiency of these models. As the model size (especially the number of binary variables) increases, the solution time may increase substantially. Figure
Relationship between processing time and the number of candidate sites (
Our test runs showed that the solution times also depend significantly on the share of selected sites in the optimal solution, which in turn depends on the specification of the population coefficient (α). This can be seen clearly, for instance, in the results obtained with the
The
Model structure is the third important factor that affects the computational performance of the models we tested. Although the number of variables and constraints in a model may be substantial, due to the algebraic model structure a large number of those variables and constraints may actually be redundant. These redundant elements are eliminated during the preliminary solution procedure (Presolve in some GAMS solvers such as CPLEX and GUROBI) and the model sizes are much smaller after Presolve. This is one of the main reasons for the superior computational advantage of the
In addition, some algebraic forms may be integer friendly and some may not. For example, the graphtheoretic contiguity model introduced by
Σ_{i∈N_{j}} X_{ij} ≤ 4U_{j} for all j (10)

where X_{ij} and U_{j} are binary variables as defined before and N_{j} is the set of j’s neighbour sites. This constraint states that, if site j is selected, then at most four arcs can direct to it from its neighbour sites. The computational efficiency of the model can be improved substantially by modifying this constraint as in constraints (3) and (4). A comparison of the original
Significance of model structure in affecting the model’s computational efficiency (number of sites n=36; the size of the population to be covered equals α times n).
It is clear that replacing constraint (10) with constraints (3) and (4) slashes the solution time. We performed a paired-samples t-test for each value of α to test the significance of the difference between the two models’ processing times. The test reported a p-value less than 0.05 for each α value, indicating that the difference between the processing times is statistically significant. This result can be attributed to the fact that a constraint like (10) is integer unfriendly, while constraints (3) and (4) improve the model’s computational efficiency by helping construct a unimodular constraint matrix (
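The effect of integer-unfriendly aggregation can be illustrated numerically. Assuming, for illustration only, that the remedy works like the common per-arc disaggregation (replacing an aggregated bound of the form Σ_{i∈N_j} X_{ij} ≤ 4U_{j} with X_{ij} ≤ U_{j} for each neighbour i; we do not claim this is exactly constraints (3) and (4)), the LP-relaxation bounds on U_{j} differ as follows:

```python
# Hypothetical sketch (our own, not the paper's formulation): compare the
# smallest fractional U_j each constraint form permits when some incoming
# arcs X_ij are used in the LP relaxation.
def min_Uj_aggregated(x_vals):
    """Aggregated form: sum of X_ij <= 4 * U_j, so U_j >= sum(X)/4."""
    return sum(x_vals) / 4.0

def min_Uj_disaggregated(x_vals):
    """Disaggregated form: X_ij <= U_j for each arc, so U_j >= max(X)."""
    return max(x_vals)

x = [1, 0, 0, 0]                # one incoming arc is used
print(min_Uj_aggregated(x))     # -> 0.25 (weak bound: U_j may stay fractional)
print(min_Uj_disaggregated(x))  # -> 1   (tight bound: U_j is forced to 1)
```

The tighter relaxation leaves the branch-and-bound search with far fewer fractional solutions to eliminate, which is one common reason a reformulation becomes "integer friendly".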
Input data such as land price and species distribution across candidate sites are also crucial factors that can affect the model’s computational efficiency. Figure
Computational efficiency of the
The shortest time used by the
We should note that none of these factors alone determines the performance of a MIP model; rather, they affect the model’s performance together. Solution techniques have been designed to speed up the computational process, such as supplying a cut value (
We have compared the computational efficiency of eight representative optimal reserve design models that explicitly incorporate contiguity and compactness considerations. We identified the models presented by
The current nature reserves, whether terrestrial or marine, may not be adequate in providing sufficient protection to biodiversity. For instance, only 1.89% of the ocean worldwide is covered by exclusively no-take marine protected areas (MPAs), far from the Convention on Biological Diversity’s (CBD) Aichi Target 11 of 10% MPA coverage by 2020 and even further from the 30% target recommended by the IUCN World Parks Congress 2014 (IUCN 2018). It is almost certain that the current systems of nature reserves will need to be expanded. Acquiring more protected areas, however, would have to be done in a world where the human economy struggles to grow and ecosystems and non-human species demand protection. Indeed, there is a conflict between economic growth and wildlife conservation (
The authors thank Dr. Falk Huettmann for his comments, which greatly improved the quality of this manuscript. This work was supported by the National Natural Science Foundation of China [71202096] and CREES Project [ILLU 05–0361].