Plant species diversity and composition in limestone forests of the Vietnamese Cat Ba National Park

Plant species diversity and composition play crucial roles in many ecosystem services and are largely influenced by environmental conditions, as well as natural and/or anthropogenic disturbances. However, our knowledge of the drivers of plant species diversity and composition in the limestone forests of Vietnam, a hotspot of biodiversity, is limited. To fill this knowledge gap, we surveyed plant species in the Cat Ba National Park (CBNP), located on a limestone archipelago. We hypothesised that: (1) topography, accessibility and spatial isolation drive the diversity and composition of plant communities in the CBNP and that (2) isolated areas contribute to high regional floristic diversity by supporting unique species assemblages. We expected high tree species diversity within the tropical limestone forests of the CBNP, but also that: (3) the abundance of non-tree species negatively affects tree regeneration diversity and abundance. Data were obtained from 90 random sample plots (500 m²) and 450 sub-sample plots (25 m²) in three areas [...] was observed in communities growing under harsh site conditions. We conclude that plant species diversity in the CBNP is high, particularly in easily accessible lowland areas where tree species contribute greatly to biodiversity. In these areas, however, non-tree species can restrict tree regeneration.


Introduction
Plant species diversity and composition are important drivers of forest ecosystem functions and services. They provide habitat and resources for different taxonomic groups and food resources (e.g. fruits) for humans (Trejo and Dirzo 2002; Behera and Misra 2005; Seta et al. 2018). Plant species richness and species co-existence in forests are influenced by topography, soil characteristics and climate variables (temperature and precipitation parameters), as well as habitat heterogeneity and habitat size (Wright 2002; Bailey et al. 2010). In addition, natural and anthropogenic disturbances can contribute both to biodiversity gain and loss and to changes in species composition (Behera and Misra 2005; Mendes et al. 2016). Each forest type has unique characteristics in terms of both plant species diversity and its most important drivers. Thus, studies of distinct forest types and forest configurations can reveal new and unexplored patterns.
According to one fundamental ecological theory, environmental heterogeneity is crucial for the diversity, distribution and growth of plant communities (Hutchinson 1957; Trichon 1997; Vivian-Smith 1997; Holl et al. 2013); differences in soil nutrients (Tateno and Takeda 2003; Yavitt et al. 2009) or micro-topographic variation (Trichon 1997; Vivian-Smith 1997; Tateno and Takeda 2003) lead to niche differentiation, which promotes species co-existence. Plant species diversity also depends on habitat configuration, including geographic isolation or fragmentation (Quinn and Harrison 1988; Martin-Queller et al. 2017; Uhl et al. 2021). Isolated habitats can be characterised by distinct habitat conditions with a high diversity of specialised species (Quinn and Harrison 1988; Martin-Queller et al. 2017; Uhl et al. 2021). In contrast, Scheffer et al. (2006) found a negative effect of small habitat size and isolation on the biodiversity of shallow lakes and ponds in Britain due to limitations in habitat size, degree of connectivity and dispersal. In another study, half of the native plant species richness was lost in small mixed dipterocarp forest fragments after a century of isolation (Turner et al. 1996). Therefore, the impact of isolation or fragmentation on plant species richness at local and regional scales must be explored in more detail for different forest types. This may be especially important for forest landscapes that are characterised by natural barriers, such as steep slopes and ridges or water bodies in island landscapes.
In addition to species diversity, interactions amongst vertical forest layers are important for forest development and functioning. The overstorey layer affects the understorey primarily via canopy cover (Berger and Puettmann 2000; Prieto et al. 2014) and litter quality (Zhang et al. 2010; Tsai et al. 2018). The overstorey layer provides favourable habitats for ferns, lianas, epiphytes, herbs and epiphyllous organisms, amongst others (Mezaka et al. 2020). The understorey contributes to the total species diversity of a forest ecosystem (Linares-Palomino et al. 2009) and affects soil properties, evaporation and humidity. Tree species regeneration as part of the understorey influences the composition and structure of the overstorey layer over the long term (Fang et al. 2014; Thrippleton et al. 2018; Pham et al. 2020). Understorey plant species assemblages may compete with tree regeneration for water, nutrients and light or may facilitate regeneration by, for example, protecting small trees from browsers (Nuttle et al. 2013). However, the relationship between overstorey and understorey depends on the forest type and on environmental conditions and may, therefore, vary with landscape heterogeneity (Laska 1997; Tardella et al. 2017) or with the history of former (natural or anthropogenic) disturbances.
Tropical and subtropical forests are known as remarkably diverse ecosystems and are considered biodiversity hotspots (Fangliang et al. 1997; Both et al. 2011). Their immense species richness makes complete sampling coverage difficult and underscores the importance of every study's contribution to the characterisation of the diversity of these forest ecosystems (Cicuzza et al. 2013). Plant species diversity and composition also play important roles in socio-economics, environmental protection and the living conditions of people who depend on forests and their products, especially in tropical forests of Southeast Asia (including Vietnam; Sodhi et al. 2009; Sun et al. 2009). In Vietnam, 11,373 plant species, belonging to 2,524 genera, 378 families and seven phyla have been identified; these include many species important for medicinal use (FSIV 2009). In 2006, the forested area was 12.874 million ha (38% forest cover), comprising 10.410 million ha of natural forests and 2.464 million ha of plantations (FSIV 2009). In addition, the Vietnamese government has established 30 national parks and protected areas to protect and preserve precious genetic resources within natural forests (Pham and Nguyen 2018). To date, floristic research in Vietnam has concentrated on a few national parks, nature reserves and forest types. Here, we focused on the Vietnamese Cat Ba National Park (CBNP). This study aims to increase our knowledge of plant species diversity and the distribution of plant species in limestone forests of Vietnam, a forest type that has been poorly studied until now. The CBNP is located on a limestone archipelago and differs greatly from the mainland national parks of Vietnam in having many isolated islands and valleys. It is not known how this landscape configuration contributes to the plant species diversity and composition of the National Park or whether the configuration affects the relationship between the overstorey and the understorey vegetation, including tree regeneration.
By surveying plant species in three different study sites of the CBNP, representing ca. 4,000 ha in total, we tested the following hypotheses. We hypothesised that: (1) topography, accessibility and spatial isolation are the main drivers of differentiation amongst plant communities in the CBNP. Further, we assumed that: (2) isolated areas contribute to high regional floristic diversity of the CBNP by supporting unique species assemblages. We also expected high diversity of tree species, but also that: (3) the abundance of non-tree species can negatively affect tree regeneration diversity and abundance. We assumed that this negative effect would be greater in habitats characterised by adequate water and nutrient availability, where competition is stronger (Holmgren et al. 1997). A negative effect of the understorey on tree species regeneration could influence tree species diversity of the CBNP in the long term.
Our study aims to contribute to the knowledge of the autecology of different plant species and plant species assemblages within limestone forests of Vietnam as a potential basis for future monitoring and conservation programmes.
The CBNP is located on a limestone archipelago that consists of 366 islands and islets (CBNP 2005). The total CBNP area is 16,197 ha with ca. 10,932 ha of terrestrial island areas and 5,265 ha of marine areas (CBNP 2005, 2007). The average elevation is between 100 and 150 m a.s.l. with the highest peak of 331 m a.s.l. and average slopes ranging from 15° to 35° (CBNP 2007). The CBNP is characterised by two distinct seasons: the rainy season (from May to October) and the dry season (from November to April) with yearly rainfall from 1,500 to 2,000 mm. Average annual temperature is 23 °C and the average air humidity is approximately 86% (CBNP 2005; Le 2006).
Figure 1. We collected data at three different study sites: MSA, a mid-slope area; LLA, a lowland area; ISA, an isolated area; 4, 5 and 6 are other restricted areas in the CBNP that were not considered for this study (Pham et al. 2020). Map data copyrighted by OpenStreetMap contributors and available from https://www.openstreetmap.org (CC BY-SA 2.0).
Various ecosystems and forest types can be found in the CBNP and these include evergreen forests on limestone, wetland forest ecosystems in steep mountain valleys, mangrove forests, coral reefs and cave systems (CBNP 2005; Le 2006). The forest ecosystems include primary (undisturbed by direct human activity) evergreen broadleaf tropical rain forests, secondary evergreen broadleaf tropical forests in the lowlands (previously disturbed by human activity) and on limestone mountains, secondary moist evergreen restoration forests on limestone mountains and valleys, restored bamboo forests, wetland forests in the limestone valleys, mangrove forests and plantation forests (Carle and Holmgren 2003; CBNP 2007).
In 2004, Cat Ba Island was designated a UNESCO biosphere reserve because of its diverse flora and fauna (CBNP 2005). In total, 1,561 vascular plant species belonging to 842 genera and 186 families have been recorded across the different terrestrial ecosystems. Amongst them, 408 tree species have been observed in the CBNP in the past years by local authorities (CBNP 2007). Twenty-nine tree species listed in the IUCN Red List were recorded in the CBNP (Le 2006; CBNP 2007). Forty-three additional tree species are listed on the Vietnam Red List in terms of conservation (Le 2006; CBNP 2007). To allow the natural development of ecosystems and to protect precious plants and animals in the CBNP, six strictly protected zones were established, totalling 4,915 ha (Fig. 1). In these protected areas, management activities, such as timber logging, hunting or slash and burn, are prohibited. For our study, we chose three of the six restricted areas, which represent the three main conservation areas of the National Park. These three areas are characterised by primary and secondary evergreen broadleaf tropical forest ecosystems (Fig. 1). These areas also represent an accessibility gradient. The lowland area (LLA) (1,916.4 ha) is in the centre of the Park and is the most easily accessible of the three. The mid-slope area (MSA) (600 ha) lies in the north-western part of the Park and is characterised by steeper slopes compared to the LLA. The third area is an isolated area (ISA) (1,557.8 ha) and is located on a separate island in the eastern part of the Park. Its island situation isolates ISA from the two other study areas (Fig. 1). Note that in Fig. 1, some local villages were included within the boundaries of MSA, making the study area larger than the actual forested area.

Data sampling
We used the simple random sampling technique (Kleinn et al. 2009) to set up sample plots (Fig. 2). Thirty strips were created in each research area. In each strip, random sample plots were generated using random numbers to determine their coordinates. Each time, two uniform random numbers $U_{1i}$ and $U_{2i}$ (each drawn from the interval 0 to 1) were used to calculate $X_i = U_{1i} \cdot X_{\max}$ and $Y_i = U_{2i} \cdot Y_{\max}$ as coordinates for a random sample plot, where $X_{\max}$ and $Y_{\max}$ were the largest coordinates of the area map (Fig. 2). If the coordinate $(X_i, Y_i)$ fell within the defined strip, this point was accepted as a sample plot point. Otherwise, the point was rejected and the procedure was repeated with two new random values (Fig. 2).
Based on this technique, a total of 90 random sample plots were created (30 plots in each of the three protected areas: LLA, MSA and ISA). Each sample plot was 500 m² in size (20 m × 25 m).
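The accept/reject procedure described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each strip is an axis-aligned rectangle and that one plot is drawn per strip; the function names and the strip representation are hypothetical and not taken from the study.

```python
import random

def sample_plot_in_strip(strip, x_max, y_max, rng, max_tries=100_000):
    """Draw one plot centre inside `strip` by rejection sampling.

    `strip` is (x_lo, x_hi, y_lo, y_hi). Treating a strip as an
    axis-aligned rectangle is an assumption made for this sketch.
    """
    x_lo, x_hi, y_lo, y_hi = strip
    for _ in range(max_tries):
        u1, u2 = rng.random(), rng.random()  # two uniform numbers in [0, 1)
        x, y = u1 * x_max, u2 * y_max        # X_i = U_1i * X_max, Y_i = U_2i * Y_max
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            return (x, y)                    # accepted: point lies inside the strip
        # otherwise the point is rejected and two new numbers are drawn
    raise RuntimeError("no point accepted; check that the strip lies inside the map")

def sample_plots(strips, x_max, y_max, seed=0):
    """Return one accepted plot centre per strip (hypothetical helper)."""
    rng = random.Random(seed)
    return [sample_plot_in_strip(s, x_max, y_max, rng) for s in strips]
```

With 30 strips per research area, a call such as `sample_plots(strips, x_max, y_max)` would return 30 plot centres, one in each strip.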

Overstorey tree layer
Within each sample plot, diameter at breast height (DBH) and height of all overstorey trees with DBH ≥ 5 cm were measured. We identified each individual tree to the species level. We defined species of the overstorey tree layer as tree species that now or in the future will be able to form the upper forest canopy. Botanists of the Northeast College of Agriculture and Forest (AFC) and from Cat Ba National Park assisted with identification. All recorded tree species were assigned to categories of threat according to the IUCN Red List categories (IUCN 2017).

Regeneration layer
We assigned individual trees, potentially able to reach the upper canopy in the future, with DBH < 5 cm to the regeneration layer. Regeneration was recorded on five subplots, which were established at five positions within each sample plot (Fig. 2) (Pham et al. 2020). Each subplot area was 25 m² (5 m × 5 m). Seedlings and saplings of all tree species were identified to the species level. Tree species found in the regeneration layer were also assigned to categories of threat following the approach for standing tree species.

Non-tree species layer
We additionally assigned plant species, other than trees and irrespective of their height, to the forest understorey and identified them to species level. We defined this layer as the non-tree species layer. Species included shrubs (even > 5 cm DBH), bamboo species, vines, medicinal plants and edible plants (MARD 2006; Sorrenti 2017). The non-tree species were recorded in the five subplots (Fig. 2) by estimating their coverage percentage. Coverage from all five subplots was then averaged to determine per-plot coverage.

Growth site conditions
To characterise the growing conditions of plant species in the different plots and study areas, we recorded variables describing topography, soil conditions and light availability, as well as former impacts by humans.
As topographic variables (T), we recorded slope (T_SL), elevation (T_Ele) and percentage of rock surface area (T_RS) per sample plot. At the centre of each sample plot, the slope was measured with an inclinometer. Longitude, latitude and elevation were measured with a Garmin GPSMAP 64st device. A visually estimated mean value of rock exposure across the five subplots resulted in values for rock surface area (T_RS) per plot (%).
Soil conditions (S) included chemical and physical properties. Soil samples were collected at the centre of each sample plot using a soil auger (diameter 10 cm). A 20 cm core of the topsoil layer was sampled to analyse absolute soil moisture content (S_SM), soil humus content (S_SH), base saturation (S_BS), pH (S_pH), hydrolytic acidity (S_HA), total cation exchange capacity (S_CEC), soil texture (S_Sand, S_Silt, S_Clay) and percentage of rock in soil (S_RS). Soil depth (S_SD) was measured with a steel rod on the five subplots; the five values were then averaged. Samples were analysed in the soil laboratory of the Vietnam National University of Forestry. For details, see Pham et al. (2020) and Pham et al. (2022).
Light availability (L) was measured with a solariscope (SOL 300B, Ing.-Büro Behling, Wedemark) as an indirect site factor (L_ISF), which is the proportion of diffuse sunlight as a percentage of open field conditions. Measurements were conducted at 2 m above the ground on three diagonal subplots across the sample plot (Fig. 2).
After the establishment of the CBNP in 1986, its board of directors tried to reduce the human impact in the core zones of the CBNP by moving people outside the core zones. To date, however, many villages are still located close to the CBNP. Hence, activities of the local people, such as illegal logging and hunting, can still be detected. To roughly quantify a possible human impact (H), we counted footpaths (H_FP), tree stumps (H_STP) and poacher traps to catch animals (H_AT) on the plots as proxies for human activities.

Data analysis
To analyse plant species composition in the CBNP and to identify different forest communities, we used hierarchical cluster analysis with Ward's method. We used the function 'vegdist()' in the 'vegan' package in R (Oksanen et al. 2019) to create a Bray-Curtis distance matrix, the function 'hclust()' to conduct the cluster analysis and the function 'cutree()' to cut the resulting community dendrogram into groups. The latter two functions are part of the 'stats' package distributed with R.
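As an illustration of the distance measure underlying the cluster analysis, the following Python sketch computes the Bray-Curtis dissimilarity between plots from abundance vectors. It is not the 'vegdist()' implementation; the function names are hypothetical and the input is assumed to be a list of equal-length abundance vectors, one per plot.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Defined as sum(|a_k - b_k|) / sum(a_k + b_k); 0 means identical
    composition, 1 means no shared species.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("dissimilarity is undefined for two empty plots")
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def distance_matrix(plots):
    """Full symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(plots)
    return [[bray_curtis(plots[i], plots[j]) for j in range(n)] for i in range(n)]
```

A matrix of this kind is the input that a hierarchical clustering routine would then group into communities.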
To display the distribution of communities in multidimensional space, we used the non-metric multidimensional scaling (NMDS) ordination method, based on abundance data. In the first step, we used the 'metaMDS()' function to run an NMDS and then used the 'envfit()' function to fit environmental variables onto the NMDS ordination, both using 'vegan' in R. We included all vegetation layers in the NMDS. Tree species of the overstorey and regeneration layers were combined to avoid duplication of species names.
To determine indicator species for identified forest communities, we used the function 'multipatt()' in the package 'indicspecies' (De Cáceres and Legendre 2009). The indicator species analysis calculates the specificity of a species as the number of occurrences of a species within a community relative to the number of occurrences across communities and the frequency of the species within a community as the relative number of occurrences per community. Multiplying specificity and frequency results in an indicator value between 0 (species not occurring in a specific forest community) and 1 (species occurring only and always in a specific community). The most significant indicator species identified amongst the tree and non-tree species were used to name the forest communities that were determined by cluster analysis.
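The indicator value as described in the preceding paragraph (specificity multiplied by frequency) can be sketched in Python as follows. This follows the verbal description only and is not the 'multipatt()' implementation, which also provides permutation tests; the function name and the presence/absence input format are assumptions made for illustration.

```python
def indicator_value(presence, groups, target):
    """Indicator value of one species for the community `target`.

    presence: list of 0/1 values, one per plot (species present or not)
    groups:   list of community labels, one per plot
    """
    in_target = [p for p, g in zip(presence, groups) if g == target]
    occ_target = sum(in_target)   # occurrences within the target community
    occ_total = sum(presence)     # occurrences across all communities
    if occ_total == 0 or not in_target:
        return 0.0
    specificity = occ_target / occ_total      # share of occurrences in target
    frequency = occ_target / len(in_target)   # share of target plots occupied
    return specificity * frequency
```

A species found in every plot of one community and nowhere else obtains the value 1; a species absent from a community obtains 0 for that community.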
To determine factors affecting community distribution in the CBNP, we correlated the NMDS axes' values with the environmental factors and human impact indicators. The function 'anova()', followed by a post-hoc Tukey test with the function 'glht()' in the 'multcomp' package, was used to compare these variables amongst communities.
We additionally applied different selection operators to build decision trees to weigh the predictors that characterised plant species communities (= the response). Forward selection of the predictors performed best. This method starts with an empty model and adds predictors to explain the response. Performance is estimated in each round using cross-validation. In each round, only the predictor yielding the highest performance increase is kept; then a new round is started. The maximum number of attributes to add was limited to seven to avoid overfitting; the iteration was aborted when performance no longer increased. This yielded a decision tree classifying the plant species communities. Classification trees represent a robust, non-parametric procedure that partitions the variation amongst communities through a series of binary splits into more homogeneous groups based on environmental factors or human impact variables (De'ath and Fabricius 2000; Cutler et al. 2007).
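The forward selection loop described above can be sketched generically in Python. The callable `score` is a hypothetical placeholder for whatever cross-validated performance measure is used (for example, the accuracy of a decision tree fitted on the given predictors); it is an assumption of this sketch and not the software used in the study.

```python
def forward_selection(predictors, score, max_features=7):
    """Greedy forward selection of predictors.

    predictors:   list of candidate predictor names
    score:        callable taking a list of predictor names and returning
                  a performance estimate (higher is better), e.g. a
                  cross-validated accuracy; hypothetical placeholder
    max_features: upper limit on the number of predictors to add
    """
    selected = []                 # start with an empty model
    best = float("-inf")
    remaining = list(predictors)
    while remaining and len(selected) < max_features:
        # evaluate each remaining predictor added to the current set
        scored = [(score(selected + [p]), p) for p in remaining]
        top_score, top_pred = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break                 # performance no longer increases: stop
        selected.append(top_pred) # keep only the best predictor of this round
        remaining.remove(top_pred)
        best = top_score
    return selected, best
```

The limit of seven predictors mirrors the cap mentioned in the text; the stopping rule ends the search as soon as no candidate improves the score.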
We contrasted plant species richness amongst the detected forest communities, species groups (tree and non-tree species) and vegetation layers (overstorey tree, regeneration, non-tree layer) at different spatial scales. For alpha diversity, we compared plot-based species richness amongst communities using the 'anova()' function, followed by a post-hoc Tukey test with the function 'glht()' in the 'multcomp' package. To compare total species diversity in the identified communities, we used the 'iNEXT' package in R (Hsieh et al. 2016) to estimate the respective gamma diversity. The 'iNEXT' package applies different orders of Hill numbers (q) and quantifies diversity by applying rarefaction and extrapolation methods (Chao et al. 2014). We considered the first three Hill numbers, referring to species richness (q = 0), Shannon diversity, which is the exponential of the Shannon Index (q = 1), and Simpson diversity (q = 2) (Chao and Jost 2012; Hsieh et al. 2016). We estimated diversity for each defined forest community separately and across communities as an estimate for the gamma diversity of the CBNP.
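For readers unfamiliar with Hill numbers, the following Python sketch computes the empirical Hill number of order q from a vector of species abundances. It shows the definition only; it does not perform the rarefaction and extrapolation carried out by 'iNEXT', and the function name is hypothetical.

```python
import math

def hill_number(abundances, q):
    """Empirical Hill number (effective number of species) of order q.

    q = 0: species richness
    q = 1: exponential of the Shannon Index (limit of the general formula)
    q = 2: inverse Simpson concentration
    """
    total = sum(abundances)
    p = [n / total for n in abundances if n > 0]  # relative abundances
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))
```

For a perfectly even assemblage, all three orders equal the number of species; the more uneven the abundances, the further the values for q = 1 and q = 2 fall below the richness.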
To investigate how much each detected forest community contributed to the gamma diversity of the CBNP, we contrasted the estimated species pool of different combinations of communities to the species pool estimated for a combination of all detected communities. We also estimated the gamma diversity of tree and non-tree species of the CBNP and of different forest communities.
The relationships between non-tree species and the overstorey tree layer and between non-tree species and the regeneration layer were examined using linear regression models. We investigated the relationships across communities and for each community separately.
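A simple linear regression of this kind, for example regeneration richness on non-tree coverage, can be sketched in Python with ordinary least squares. The sketch returns only slope, intercept and the coefficient of determination; it does not include the significance tests used in the study, and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation")
    slope = sxy / sxx
    intercept = my - slope * mx
    # share of the variance in y explained by x (0 if y is constant)
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

Fitting the model once on all plots and once per community corresponds to the two levels of analysis described above.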
All statistical analyses were conducted using the statistical software R version 3.4.2 (R Core Team 2017). The level of significance for all statistical inferences was defined as p-value < 0.05.

Forest communities of the CBNP
We identified four main communities that differ distinctly in species composition in Cat Ba National Park (Fig. 3, Appendix 1). Each community has characteristic indicator species (Table 1). The most reliable indicators (highest indicator index value) amongst the tree and non-tree species provided the community names (Table 1; species characteristics are provided in Appendix 2). We defined the Saraca dives + Calamus tetradactylus community (SCt), the Sterculia lanceolata + Chloris barbata community (SCb), the Ficus superba + Acanthus ebracteatus community (FAe) and the Clausena excavata + Desmos cochinchinensis community (CDc) (Appendix 2). Most indicator species (14 species) were associated with the SCt community, six species were associated with the SCb community and only four indicator species with the FAe community. For the CDc community, ten indicator species were identified (Table 1).
Ordination revealed that the SCt community was found mainly in the lowland area (LLA). The SCb community characterised a transition from the lowland to midslope area. The FAe community was associated with the mid-slope area and the CDc community with the isolated area (Fig. 3).

Forest structure and abiotic conditions of the forest communities
Forest structure differed amongst the four communities. On average, we found the lowest diameter at breast height (DBH), height (Ht), basal area (BA) and volume (Vol) in the FAe community, while the SCt community was characterised by the highest mean volume, DBH, basal area and height. The SCb community had the highest tree species richness, whereas the CDc community was the most species-poor in the overstorey layer (Table 2). Community comparisons, as well as the decision tree analysis, showed that the communities FAe and CDc were associated with steeper slopes and a high percentage of rock surface area (Fig. 4, Table 3, Appendix 3). In contrast, the SCb and SCt communities were found in areas with deeper soils (Fig. 4, Table 3, Appendix 3). Although soils are shallower in the habitats of the FAe and CDc communities, their soil humus content (S_SH), soil moisture (S_SM) and cation exchange capacity (S_CEC) were high (Fig. 4, Table 3, Appendix 3) in the upper soil layers. Light conditions differed little amongst the four communities. Overall, SCb received the most light (Table 3, Appendix 3). Indicators of human impact were mainly associated with the communities SCt and SCb (Fig. 4).
Differences in environmental factors amongst communities were also reflected in the differences in environmental factors amongst the study sites in which the communities were mainly located (see Figs 3, 4, Appendices 1, 4). For example, the average slopes in ISA and MSA were steeper than in LLA. As the FAe and CDc communities were associated mainly with the MSA and ISA sites, they were also characterised by higher slope values compared to the other two communities (Appendix 1). In addition, the ISA site had the highest percentage of rock surface as did the CDc community (Appendix 3).

Patterns of plant species diversity of the forest communities
Across communities, the non-tree species layer was, on average, more species-rich than the overstorey and regeneration layers, with the highest non-tree species richness found in the SCb community (Fig. 5). The overstorey tree layer was significantly more diverse in the SCt and SCb communities compared to the FAe and CDc communities (Fig. 5). A similar pattern was found for the regeneration layer (Fig. 5). Thus, the communities with the lowest plot-based species richness of the regeneration layer also had the lowest plot-based species richness in the overstorey tree layer.
Gamma diversity estimations identified significantly higher values for the communities SCt and SCb compared to the communities FAe and CDc. For the Hill number q = 2, the SCb community, which characterised the transition from the lowland to the mid-slope area, was significantly more diverse than the other three communities; confidence intervals did not overlap. For all Hill numbers, the forest community associated mainly with the isolated area (CDc) had the lowest gamma diversity (Fig. 6). When plots of all four communities were combined, gamma diversity did not exceed that of the most diverse community, indicating that community assemblages are not complementary, but exhibit nested diversity.
When investigating the importance of each community to CBNP gamma diversity, we first estimated the species pools of different combinations of communities. We found that the SCb and SCt communities were most important for regional gamma diversity. When these communities were eliminated from community combinations, gamma diversity was greatly reduced as compared to the total species pool and to other community combinations (Appendix 5). If, however, the CDc and FAe communities were not included in the estimation procedure, the estimated species pool was not significantly different from the total species pool, indicating that these communities are mainly subsets of the species-richer communities (Appendix 5).
Gamma diversity patterns of the different communities were driven in large part by tree species (Fig. 7). Tree species were significantly less diverse in the communities FAe and CDc compared to the other two communities (Fig. 7, Appendix 6). For the non-tree species, the difference between the two community groups (SCb, SCt vs. FAe, CDc) was not as pronounced for q = 0 (Fig. 7). For q = 1, the SCb community was significantly more diverse than the communities FAe and CDc. For q = 2, SCb was more diverse than all other identified communities (Fig. 7, Appendix 6).
Tree and non-tree species contributed nearly equally to the diversity of the CBNP (Table 4). However, in the SCt and SCb communities, tree species diversity was higher than non-tree species diversity, while in the other two communities, the converse was true (Table 4).
Figure 5. Boxplots of species richness of the three investigated layers in each forest community. Letters indicate statistically significant differences amongst communities. SCt: Saraca dives + Calamus tetradactylus community. SCb: Sterculia lanceolata + Chloris barbata community. FAe: Ficus superba + Acanthus ebracteatus community. CDc: Clausena excavata + Desmos cochinchinensis community.
Figure 6. Estimated gamma diversity for different Hill numbers and forest communities. q = 0: species richness; q = 1: Shannon diversity; q = 2: Simpson diversity. The red colour represents the estimated gamma diversity when all communities are considered for estimation. Yellow-green: Saraca dives + Calamus tetradactylus community (SCt). Green: Sterculia lanceolata + Chloris barbata community (SCb). Light-blue: Ficus superba + Acanthus ebracteatus community (FAe). Pink: Clausena excavata + Desmos cochinchinensis community (CDc). Graphs were extrapolated to a sample size of 90 plots.
Figure 7. Gamma diversity patterns of tree species and non-tree species of the different communities. The graphs show species richness (Hill number q = 0, including 95% confidence interval). Red: all four communities. Yellow-green: Saraca dives + Calamus tetradactylus community (SCt). Green: Sterculia lanceolata + Chloris barbata community (SCb). Light-blue: Ficus superba + Acanthus ebracteatus community (FAe). Pink: Clausena excavata + Desmos cochinchinensis community (CDc). For Hill numbers q = 1 and q = 2, see Appendix 6. Graphs were extrapolated to a sample size of 90 plots.

Relationships between vertical forest layers
We did not find a significant relationship between species richness of the non-tree layer and that of the tree regeneration layer. We also found no effect of species richness of the overstorey tree layer on species richness of the non-tree species layer or the regeneration layer (Fig. 8). Across all communities, there was also no significant effect of the coverage of the non-tree species layer on the species richness or abundance of the regeneration layer (Fig. 9a, c). However, at the community level, we found that coverage of the non-tree species layer negatively affected both abundance and richness of the regeneration layer in the SCb community and richness of the regeneration layer in the SCt community. In contrast, coverage by the non-tree species layer positively influenced the richness of the regeneration layer in the FAe community (Fig. 9b, d).
We did not observe any significant effect of species richness or coverage of the non-tree species layer on the abundance or richness of threatened tree species in the regeneration layer (Appendix 7).
Figure 9. Linear regression models of the coverage of non-tree species and species richness and abundance of the tree regeneration layer. Graphs (a) and (c) show the relationship of the coverage of non-tree species with the abundance and species richness of tree regeneration across all communities, graphs (b) and (d) for the four communities. Dashed lines show statistically non-significant, solid lines show significant correlations. To improve visibility, we did not include the confidence intervals for (b) and (d).

Topographic heterogeneity drives the spatial distribution of forest communities in the CBNP
We distinguished four forest communities in the CBNP that differ in forest structure and are characterised by different environmental conditions (Fig. 3, Tables 1, 2, Appendix 1). Topographic variables, in combination with soil conditions, turned out to be important factors influencing the differences in plant species composition across the CBNP. Topographic heterogeneity can result in different microclimatic conditions, affecting temperature and soil moisture at small scales, thus influencing the abundance and distribution of plant species (Vivian-Smith 1997; Seta et al. 2018). In addition, topographic heterogeneity indirectly acts on plant species composition and diversity through drainage and by differences in the hydraulic regime, which results in different growing conditions for plants on ridges, slopes and in valleys (Trichon 1997). These characteristics may also affect seed erosion and seed accumulation, leading to differences in plant species distributions (Vivian-Smith 1997; Holl et al. 2013).
We found that the SCt and SCb communities occurred mainly on sites characterised by deep soils, whereas the FAe and CDc communities were found on sites with rough terrain (steep slopes and high rock surface), but with relatively high soil nutrient content (high soil humus content and CEC) (Fig. 4, Table 3, Appendices 1, 4). Our environmental characterisation of the communities corresponded well to the investigated study sites. The SCt and SCb communities characterised the lowland area in transition to the mid-slope study area, while the CDc community was associated mainly with the isolated area, characterised by steep slopes. Our results are in line with Zhang et al. (2021), who identified topography as the main factor determining niche differentiation of tree species in heterogeneous tropical limestone forests of China. Topography influenced soil depth and, thereby, most likely determined water availability and drought stress, factors that may have shaped species distribution.
Topographic heterogeneity can also influence light availability, an important determinant of plant species distribution (Zhu et al. 2016). It also affects forest composition by influencing regeneration patterns (Tateno and Takeda 2003). However, in our case, the indirect site factor (L_ISF), which we used as a proxy for light availability, did not affect plant species distributions (Fig. 4, Table 3, Appendix 4). It is important to mention that the gradient of light availability recorded across our study sites was quite low, indicating a relatively homogeneous canopy cover across the CBNP.
Although indicators of human impact could not be clearly connected to plant species distribution, observed differences in the forest communities of the different study areas could have reflected human influence, which was strongest in the lowland area. In the lowland area, with the communities SCt and SCb, species richness was highest (Figs 4, 5, Table 3). In contrast, other studies found negative effects of human disturbance on species diversity; humans extract certain tree species with subsequent effects on species composition (Blanc et al. 2000;Do et al. 2017). Chazdon (2003), however, concluded that forest recovery could be rapid even after large-scale human disturbances when soil and aboveground vegetation were not heavily impacted. In addition, Dao and Hölscher (2018) found a positive correlation between the occurrence of footpaths and the density of tree species used for non-timber forest products. This suggests a positive effect of human activity on some early-successional species. Other studies have confirmed the strong potential of tropical forests to recover after historical disturbances (Lusk and Smith 1998;Bayliss-Smith et al. 2003), but also underscore the role of historical disturbances, whether anthropogenic or natural, in shaping species composition.
We identified several indicator species for the four forest communities that provided valuable information on species-environment relationships. While species of the communities SCt and SCb seemed to be restricted to deep soils, indicator species in the FAe and CDc were able to tolerate harsh soil conditions, such as a high percentage of rock surface and shallow mineral soils (Guo et al. 2017). They were also able to overcome potential dispersal filters due to the island isolation of the ISA. The bird-dispersed Clausena excavata, for example, is characterised by rapid germination, thereby avoiding the risk of desiccation in shallow soils. High seedling survival under various environmental conditions makes this species both a successful invader outside its native range and a successful coloniser (Vieira et al. 2010). Supporting results by Santo-Silva et al. (2021) demonstrated that isolation favours the abundance of disturbance-adapted pioneer tree species. The annual herb Blumea lacineata, another indicator in the CDc community, is also characterised by an effective dispersal strategy and is classified as a weed species (Wester 1992). On the other hand, Sterculia lanceolata, an indicator species of SCb, can be characterised as a mid-to late-successional species on deeper soils (Zhang et al. 2013). Its co-existence with the tropical weed species Chloris barbata (Holm et al. 1979) in the CBNP suggests that this community and its assemblage of indicator species characterise a successional stage after disturbance in lowland and mid-slope areas. Lower DBH and tree heights in this community, as compared to the SCt community of the lowlands, support this assumption. The species assemblage of the Saraca dives + Calamus tetradactylus reliably characterises lowland sites, as Saraca dives is a dominant species in foothills and in valleys with high and constant water availability in limestone forests (Zhang et al. 2021).
Thus, our study underscores the value of using identified indicator species or groups of species for an overall assessment of the environmental conditions in tropical limestone forests of Southeast Asia. By monitoring certain indicator species, shifts in environmental conditions can be reliably detected.

Differences in forest community composition do not drive the biodiversity of the CBNP
We identified a total of 302 species belonging to 112 families in the CBNP (Table 4, Appendices 5, 8) within the 90 recorded sample plots. The total species pool was estimated at 368 species. Compared with other studies, plant species diversity in the CBNP was quite high. For example, diversity was higher than that of transitional rainforest vegetation in southwestern Ethiopia (130 to 139 species; Assefa et al. 2013), neotropical primary and secondary forests (100 species on two 1 ha plots; van Andel 2001) or subtropical forests in China (240 species in 27 plots of up to 900 m²; Both et al. 2011). It was less diverse, however, than rainforests in Colombia (442 vascular plant species in 0.9 ha; Galeano et al. 1998) or seasonal dry tropical forests in Mexico (917 species in 20 representative sites of 0.1 ha; Trejo and Dirzo 2002). Numbers are comparable with forests of Dinagat Island in the Philippines, where 432 native plant species have been recorded (Lillo et al. 2019). In Malaysian tropical rain forests, 825 species have been detected in 50 ha forest inventory plots (Fangliang et al. 1997). Although differences in species numbers among the aforementioned studies may also be due to different experimental designs and scales of inventory (Ferraz et al. 2004; Cicuzza et al. 2013; Júnior et al. 2014), our results indicate that plant species diversity in the CBNP is comparable to other tropical and subtropical forests worldwide. Interestingly, non-tree species contributed roughly 50% to total species diversity (Table 4), emphasizing the importance of different growth forms for biodiversity and confirming other studies (Gentry and Dodson 1987; Linares-Palomino et al. 2009).
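The relation between observed richness and an estimated species pool can be sketched as follows. This section does not state which estimator produced the figure of 368 species, so the classic Chao2 incidence-based estimator is used here purely as an assumed example; the function name and the toy incidence counts are hypothetical. Only the two totals (302 observed, 368 estimated) are taken from the text.

```python
def chao2(incidence, n_plots):
    """Classic Chao2 estimate: S_obs + ((m - 1) / m) * Q1**2 / (2 * Q2).

    incidence maps each species to the number of plots in which it was recorded;
    Q1 and Q2 are the numbers of species found in exactly one and two plots.
    """
    s_obs = sum(1 for k in incidence.values() if k > 0)
    q1 = sum(1 for k in incidence.values() if k == 1)
    q2 = sum(1 for k in incidence.values() if k == 2)
    correction = (n_plots - 1) / n_plots
    if q2 == 0:
        # Bias-corrected form, used here to avoid division by zero.
        return s_obs + correction * q1 * (q1 - 1) / 2
    return s_obs + correction * q1 ** 2 / (2 * q2)


# Toy incidence counts (invented, NOT study data): two uniques, one duplicate.
toy_incidence = {"sp_a": 1, "sp_b": 1, "sp_c": 2, "sp_d": 5}
toy_estimate = chao2(toy_incidence, n_plots=10)

# Sampling completeness implied by the two totals reported in the text.
observed_species = 302
estimated_pool = 368
completeness = observed_species / estimated_pool
```

With the reported totals, the 90 plots captured roughly 82% of the estimated species pool.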
The forest communities FAe and CDc, characterizing rough and rocky terrain, had a higher share of non-tree species than those of the lowland and mid-slope areas, indicating that this species group is shaped by stochastic assembly processes, while the distribution of tree species is mainly driven by environmental differences among habitats (Both et al. 2011). Similarly, isolation seems to have acted as a stronger barrier for tree than for non-tree species (Hill and Curran 2003;Martin-Queller et al. 2017).
In contrast to our expectations, the heterogeneity we found in forest community composition did not appear to drive overall plant species diversity of the CBNP. The linkage between isolation and species diversity has been investigated in previous studies (Quinn and Harrison 1988; van Andel 2001; Slik et al. 2003; Scheffer et al. 2006; Chytrý et al. 2010). Isolated conditions can promote species diversity and the distinctiveness of communities (Ferraz et al. 2004), and can also drive speciation, leading to species endemism (El-Bana 2009; Chytrý et al. 2010). Our findings, however, indicate that the isolated area has the lowest species diversity in our study (Figs 6, 7, Appendix 5). This supports the conclusion by van Andel (2001) that relative isolation can also have a negative effect on species diversity. Slik et al. (2003) confirmed that diversity was negatively correlated with isolation and habitat size in North-eastern Borneo forests. Our results, therefore, suggest that rough terrain conditions in combination with the isolated location function as a strong dispersal and establishment filter that only a subset of the plant species can overcome. Such species are characterized by good dispersal ability (see above). The missing complementarity in species diversity among communities, however, indicates that the species assemblages colonizing the isolated area are instead a subset of richer species communities. Species colonizing the isolated sites can therefore be characterized as generalists growing under various site conditions (Vieira et al. 2010; Santo-Silva et al. 2021).
Our results are also consistent with the theory that larger areas support higher species diversity. The two species-rich SCt and SCb communities (Figs 3, 6, Table 4, Appendix 5) were located mostly on the main island of the CBNP; this presumably provided more habitats and better habitat connectivity with positive effects on the dispersal ability of species (MacArthur and Wilson 1967; Hill and Curran 2003; Martin-Queller et al. 2017). This agrees with Scheffer et al. (2006), who stated that local diversity is reduced in response to isolation through dispersal limitation. They also implied that larger local habitat patches and greater connectivity facilitate regional biodiversity. Turner et al. (1996) even found that a long period of isolation in lowland tropical forests in Singapore was a major contributor to species loss.
The species-rich SCb and SCt communities were associated with greater soil depth, while the percentage of rock surface and slope were correlated with species-poorer communities (FAe and CDc communities) (Fig. 4, Table 3). Slope and rock outcrops have a strong effect on soil cover in our research areas (Chytrý et al. 2010; Seta et al. 2018), influencing soil nutrients (Chytrý et al. 2010) and water availability (Zhang et al. 2021). This seems to result not only in differences in community composition, as discussed before, but also in different numbers of tree and non-tree species. Tree species diversity was more strongly influenced than non-tree species diversity by differences in topography, with more tree species colonizing the lowlands. The Simpson diversity of non-tree species was particularly high in the SCb community. Here the presence of the indicator species Chloris barbata suggested an impact of former disturbance. We do not have, however, enough information on former anthropogenic or natural disturbance events to make a definitive connection.
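For readers unfamiliar with the Simpson diversity mentioned above, a minimal sketch follows. It assumes the Gini-Simpson form, 1 - sum(p_i^2); this section does not state which variant of the index was calculated, and the function name and abundance values are hypothetical illustrations rather than study data.

```python
def simpson_diversity(abundances):
    """Gini-Simpson index 1 - sum(p_i**2) from per-species abundances or cover values."""
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one species must have a positive abundance")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Two invented plots with four species each: one even, one dominated by a single species.
even_plot = [10, 10, 10, 10]
dominated_plot = [37, 1, 1, 1]

even_value = simpson_diversity(even_plot)
dominated_value = simpson_diversity(dominated_plot)
```

Both invented plots hold four species, yet the index is much lower where one species dominates, which is why a community can score high on Simpson diversity without having the highest species count.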
Humans can have a huge influence on floristic diversity; there are numerous examples from around the world. For example, Júnior et al. (2014) found that human activities changed tree species diversity and forest structure in semi-deciduous forests in Brazil. Assefa et al. (2013) found that anthropogenic interferences reduced species diversity in southwest Ethiopian forests. In our case, indicators of human impact and species-rich forest communities were positively associated (Fig. 5, Table 2). These results may indicate that the natural recovery potential of species after historical disturbances is high (van Andel 2001; El-Bana 2009). In any case, reconstruction of former disturbances would be useful to evaluate the potential for tree species recovery as well as to understand the effects of former disturbances on current species composition. However, as differences in human impact and environmental conditions among communities and study sites interact, we cannot verify a potential impact of former human disturbance on plant species diversity and composition in the CBNP.

Relationship between vertical forest layers
Across communities, we found no interactions amongst forest layers (Both et al. 2011), but found contrasting patterns for the different communities. In the two species-rich communities (SCb and SCt) found on deep soils, we detected negative effects of non-tree species coverage on tree species richness and abundance in the regeneration layer (Figs 8, 9b). It is well known that tree regeneration competes with herbaceous species on sites with sufficient water and nutrient availability (Both et al. 2011). Interestingly, under harsh environmental conditions in the CDc communities and also in the FAe (Fig. 9d), coverage by non-tree species appeared to have neutral (CDc) to positive (FAe) effects on the species richness of tree regeneration; this supports the hypothesis that changes in water availability, such as those induced by shallower soils, can shift plant interactions from competition to facilitation (Bertness and Callaway 1994; Holmgren et al. 1997). Along the steep slopes with high rock coverage, non-tree species can improve conditions for tree regeneration, for example, by trapping litter and humus. This, in turn, can improve water and nutrient status, supporting seedling germination and survival (Yirdaw et al. 2015; Yirdaw et al. 2019). This may also explain higher soil moisture values found in the topsoils of the FAe and CDc communities as compared to SCt and SCb. In addition, litter and humus accumulation may generally improve seed storage, preventing rapid erosion.
Competition between tree regeneration and non-tree species in the SCt and SCb communities may also have resulted from former canopy disturbances. Increased light levels induce understorey growth with potential negative effects on tree regeneration. The light-demanding weed species Chloris barbata, identified as an indicator species in the SCb community, may be an indication of former disturbances as are the higher values of human impact indicators in SCt and SCb. However, the light availability measured for this study was around 9.17% ± 6.4% (mean ± sd) and homogeneous across communities.

Limitations of the study
Although we have considered many environmental factors and indicators of human impact to explain plant species composition and diversity in the CBNP, we acknowledge some shortcomings. We did not explicitly consider climatic factors (Slik et al. 2003;Cicuzza et al. 2013), such as rainfall and temperature. However, since sites were close to one another (compare Fig. 1), we did not expect great differences in climatic conditions. We also do not have detailed information on the disturbance history of the CBNP, even though evolutionary or historical disturbance processes may control the distribution of certain species in certain sites (Trejo and Dirzo 2002;Tuomisto et al. 2003;Chytrý et al. 2010). Finally, factors such as geographical distance or tectonics may also have played a role in the observed patterns of plant species composition and diversity; both of these factors might affect environmental conditions, habitat formation and floral assemblages (van Andel 2001;Slik et al. 2003;Júnior et al. 2014). In future research, it may also be worth focusing on plant functional groups, which would require more research into the autecology of the different tree and non-tree species.

Conclusions
Our study demonstrates that plant species composition and diversity in the CBNP vary strongly at the regional scale. Contrary to our expectations, the isolated island site did not contribute much to total plant species richness. Species found there seemed well adapted to the harsh conditions, but were also found at other sites and can, therefore, be classified as generalists.
We also showed that environmental factors are important drivers of plant species composition and species diversity in the CBNP with non-tree species contributing about 50% to total species richness. Our data indicate that this species group is less prone than tree species to environmental filtering. Furthermore, the coverage of non-tree species negatively impacted species richness of tree regeneration on sites with sufficient water and nutrient availability (Sterculia lanceolata + Chloris barbata and Saraca dives + Calamus tetradactylus community). Under harsh environmental conditions, non-tree species appeared to facilitate tree regeneration richness of the Ficus superba + Acanthus ebracteatus community, underscoring the important function of non-tree species to forest development in these tropical forests on limestone. The Sterculia lanceolata + Chloris barbata and the Saraca dives + Calamus tetradactylus community, found mainly in lowlands in transition to mid-slope conditions, contribute most to plant species diversity in the CBNP.
From these findings, we conclude that plant species diversity in the CBNP is high, with tree and non-tree species contributing equally to this diversity, but with higher tree species diversity in the lowlands. For future conservation management, the protection of tree species, for example, from illegal logging activities, should be one priority for managers of the CBNP. This is particularly important as the area most accessible to the local population is characterised by tree species-rich forest communities. In general, the main island contributes most to plant species richness in the CBNP due to the availability of microsites and connectivity amongst habitats. Thus, monitoring of species diversity and composition and conservation management to prevent fragmentation should focus on these areas. Here and beyond the CBNP, the dynamics of the identified indicator species groups will help to detect future changes in environmental conditions.
The identified forest communities with their indicator species assemblages provide fundamental information on the interactions between plant species distribution and environmental conditions in limestone forests of Vietnam and may also help to characterise the limestone forests beyond the borders of the CBNP.
Species: Sterculia lanceolata Family: Sterculiaceae Character: Small timber tree that grows in mountainous areas, especially on hillsides in the midlands. The tree has large green leaves all year round and is suited to moist areas. The flowers are star-shaped and red. Distributed from north to central Vietnam.
Species: Chloris barbata Family: Poaceae Character: Annual grass; found in dry places, especially coastal areas.
Species: Ficus superba Family: Moraceae Character: Small to medium-sized timber tree, 5-15 m tall. It grows well in nutrient-rich soil with good moisture and drainage, but is also drought-tolerant and undemanding with respect to soil. The species regenerates vigorously. Distributed throughout Vietnam.
Source: https://giahuygarden.vn
Table A2. Environmental and human activity characteristics of the three study sites (LLA, MSA and ISA) in Cat Ba National Park. Values are the mean and standard deviation of 30 plots per study site (90 plots in total). Lowercase letters indicate significant differences at p ≤ 0.05 between the three areas. The "multcomp" package was used to test for differences between the three study sites (Hothorn et al. 2008). The acronym column gives the abbreviation of each factor: T, terrain factors; S, soil properties; L, light availability; H, human disturbance.
Figure A3. Continued.
Figure A4. Linear models of non-tree species and threatened tree species abundance. Graphs (a) and (c) show the relationship between non-tree species and threatened tree species abundance across communities. Graphs (b) and (d) present the relationship of non-tree species to threatened tree species abundance separated by the four communities. None of the relationships is significant, as indicated by the dashed lines. SCt, Saraca dives + Calamus tetradactylus community; SCb, Sterculia lanceolata + Chloris barbata community; FAe, Ficus superba + Acanthus ebracteatus community; CDc, Clausena excavata + Desmos cochinchinensis community.