Research Article
Corresponding author: Patrizia Giangregorio ( patrizia.giangregorio@isprambiente.it ) Academic editor: Klaus Henle
© 2023 Patrizia Giangregorio, Nadia Mucci, Anita J. Norman, Luca Pedrotti, Stefano Filacorda, Paolo Molinari, Göran Spong, Francesca Davoli.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Giangregorio P, Mucci N, Norman AJ, Pedrotti L, Filacorda S, Molinari P, Spong G, Davoli F (2023) Performance of SNP markers for parentage analysis in the Italian Alpine brown bear using non-invasive samples. Nature Conservation 53: 105-123. https://doi.org/10.3897/natureconservation.53.86739
Determination of parentage provides valuable information for the conservation of wild populations, for instance by allowing the monitoring of breeding success and inbreeding. Between 1999 and 2002, nine brown bears (Ursus arctos) were translocated to augment the remnant population of a few surviving individuals in the Italian Alps, but only some of them reproduced, increasing the long-term risk of inbreeding. Currently, parentage tests in the Alpine population are performed through the analysis of 15 microsatellite loci (STRs), but the reduction of genetic variability in future generations will require additional informative markers. Single nucleotide polymorphisms (SNPs) have proven useful and reliable for individual identification and family reconstruction; moreover, they can perform well on low-quality samples. In this study, we analysed 51 SNPs to generate a SNP multilocus genotype dataset of 54 Alpine brown bears (Ursus arctos) and compared its performance in parentage analysis with that of the validated STR dataset. We found that SNPs alone are not sufficient to determine parentage relationships, but the combination of SNPs and STRs provided unambiguous parentage assignments. The combined panel also performed better than STRs when true parents were not present in the dataset and, consequently, yielded higher assignment probabilities.
Colony, FRANz, markers combination, microsatellites, monitoring, Ursus arctos arctos
Parentage determination may greatly aid the management of wild populations of conservation concern (
Microsatellite loci, also known as short tandem repeats (STRs), are multi-allelic, highly polymorphic markers that have been routinely used over the last decade for parentage analysis in wild populations, including studies based on non-invasive samples (
Single nucleotide polymorphisms (SNPs) are another type of marker of increasing popularity in conservation genetics. They are polymorphic sites dispersed throughout the genome; unlike STRs, they are easily scored and show high genotyping success and low error rates (
Furthermore, recently developed microfluidic genotyping platforms have very low detection thresholds for template copy number and are thus particularly suitable for the amplification of poor-quality DNA (
The Italian Alpine bear population (Ursus arctos) lives in two separate areas: the first subpopulation inhabits the central Italian Alps, while the second, in eastern Italy, constitutes the expansion front of the Dinaric Mountain population (Fig.
Brown bear distribution in the Italian Alps and neighbouring countries (from Skrbinšek et al. (2018)). Permanent presence, reproduction (red squares) – areas where cubs were confirmed within the last three years; permanent presence without reproduction (orange squares) – areas where bears have been present for at least three years over the last five years; sporadic presence (yellow squares) – areas where bear presence has been documented for fewer than three seasons in the last five years’ period.
Although the status of the brown bear is categorised as “least concern” in its worldwide distribution area (
Although the robust STR protocol has provided helpful information for kinship analysis (Suppl. material
Recently, a 96 × 96 SNP-chip comprising 85 autosomal SNPs, seven sex chromosome markers and four mtDNA markers was developed by
Here, we evaluate: i) the performance of the 51 SNPs in parentage analysis using non-invasive hair samples and ii) their reliability compared with the 15 STRs used to monitor the Alpine brown bear (
Hairs were collected in the Italian area of bear presence (Fig.
Sampling procedures followed the guidelines provided by the interregional action plan for the conservation of brown bears in the Italian Alps (PACOBACE) (
STR genotyping using methods developed by
DNA aliquots (n = 74) were sent to the Swedish University of Agricultural Sciences (SLU), to be amplified on the Biomark platform (Fluidigm Corporation, San Francisco, USA) with the 96 × 96 SNP panel developed by
We replicated a proportion of the samples (16 samples were replicated once, while 13 were replicated three times – see Suppl. material
Deviations from the Hardy-Weinberg equilibrium (HWE) were computed using the exact test in Genepop (
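For a biallelic SNP, an exact HWE test can be computed by enumerating every heterozygote count that is compatible with the observed allele counts. The sketch below uses the recursion of Wigginton, Cutler and Abecasis (2005). It is a standalone illustration, not Genepop's implementation, which offers its own complete-enumeration and Markov chain options. The genotype counts in the usage note are hypothetical.

```python
from typing import List


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact Hardy-Weinberg test for one biallelic locus.

    Enumerates all heterozygote counts compatible with the observed allele
    counts (Wigginton et al. 2005 recursion) and returns the summed
    probability of outcomes no more likely than the observed one.
    """
    n = n_het + n_hom1 + n_hom2               # genotyped individuals
    n_rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    if n == 0 or n_rare == 0:
        return 1.0                            # monomorphic or empty: nothing to test

    probs: List[float] = [0.0] * (n_rare + 1)
    # Start near the expected heterozygote count, with the parity n_rare allows.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk down: fewer heterozygotes, one more of each homozygote per step.
    hom_r = (n_rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        hom_r += 1
        hom_c += 1
        het -= 2

    # Walk up: more heterozygotes, one fewer of each homozygote per step.
    hom_r = (n_rare - mid) // 2
    hom_c = n - mid - hom_r
    het = mid
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        hom_r -= 1
        hom_c -= 1
        het += 2

    total = sum(probs)
    observed = probs[n_het]
    # A small tolerance keeps outcomes tied with the observed one despite rounding.
    p = sum(x for x in probs if x <= observed * (1 + 1e-9))
    return min(1.0, p / total)
```

As an example, `hwe_exact_p(0, 50, 50)` describes 100 bears at allele frequency 0.5 with no heterozygotes. It returns a p-value far below any usual threshold. A locus whose counts match Hardy-Weinberg proportions, such as `hwe_exact_p(50, 25, 25)`, returns 1.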
The reliability of parentage analysis is maximised when all individuals of the family trio (dam, sire and offspring) are sampled and fully genotyped. However, poor and degraded DNA can lead to incomplete or missing genotypes. It is therefore advantageous to develop a protocol that retains as many individuals as possible while minimising the effect of incomplete genotypes. To assess the influence of these limiting factors, we created two sample datasets. The first included samples with a SNP call rate ≥ 70%. The second contained a reduced number of individuals, namely those with a SNP call rate ≥ 90%. Parentage relationships were evaluated using FRANz v.2 (
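The two call-rate datasets amount to a simple per-sample filter. This is a minimal illustration with some assumptions:

- Genotypes are stored as allele pairs, with `None` for a failed call.
- Call rate is the fraction of SNPs successfully typed for a sample.
- The sample IDs and toy genotypes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A diploid SNP call such as ("A", "G"); None marks a failed call.
Genotype = Optional[Tuple[str, str]]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of loci with a successful call for one sample."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def filter_samples(samples: Dict[str, List[Genotype]], threshold: float) -> Dict[str, List[Genotype]]:
    """Keep only samples whose call rate is at or above the threshold."""
    return {sid: calls for sid, calls in samples.items() if call_rate(calls) >= threshold}


# Hypothetical toy data: three samples typed at ten SNPs.
ok = ("A", "G")
samples = {
    "bear_01": [ok] * 10,              # 100% call rate
    "bear_02": [ok] * 8 + [None] * 2,  # 80% call rate
    "bear_03": [ok] * 6 + [None] * 4,  # 60% call rate
}
dataset_70 = filter_samples(samples, 0.70)  # keeps bear_01 and bear_02
dataset_90 = filter_samples(samples, 0.90)  # keeps bear_01 only
```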
Moreover, kinship analysis was also tested in Colony v. 2.0.6.4 (
Parameters in FRANz v.2 and Colony v. 2.0.6.4 were set following the procedure described in
To test the reliability of parental assignments, family trios obtained with SNPs were compared with those obtained formerly with STR. This was done by calculating the number of congruent, missing and incongruent parentage assignments in addition to significance values in the detected family trios. Missing assignments caused by the lack of the true parents in the SNP dataset were also deemed “congruent”. Finally, we combined SNP and STR genotypes in a single dataset and compared the results obtained through SNPs and STRs alone.
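The congruence rules above can be written as a small classification step. This sketch makes assumptions about data layout; it is not the authors' code:

- Each offspring maps to a (dam, sire) pair.
- `None` marks an unassigned or unknown parent.
- `sampled` holds the individuals present in the SNP dataset.
- The bear IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# offspring -> (dam, sire); None means no parent assigned / known.
Parents = Tuple[Optional[str], Optional[str]]


def classify(reference: Optional[str], candidate: Optional[str], sampled: Set[str]) -> str:
    """Compare one SNP-based parent assignment with the STR reference."""
    if candidate is None:
        # A gap is not an error if the true parent is unknown or absent from this dataset.
        if reference is None or reference not in sampled:
            return "congruent"
        return "not assigned"
    return "congruent" if candidate == reference else "incongruent"


def tally(reference: Dict[str, Parents], candidate: Dict[str, Parents], sampled: Set[str]) -> Counter:
    """Count congruent / not assigned / incongruent dam and sire assignments."""
    counts: Counter = Counter()
    for offspring, ref_pair in reference.items():
        cand_pair = candidate.get(offspring, (None, None))
        for ref_parent, cand_parent in zip(ref_pair, cand_pair):
            counts[classify(ref_parent, cand_parent, sampled)] += 1
    return counts


# Hypothetical example: sire M2 of O2 was removed by the call-rate filter.
reference = {"O1": ("F1", "M1"), "O2": ("F1", "M2"), "O3": ("F2", "M1")}
candidate = {"O1": ("F1", "M1"), "O2": ("F1", None), "O3": (None, "M9")}
sampled = {"F1", "F2", "M1", "O1", "O2", "O3"}
counts = tally(reference, candidate, sampled)
```

Here the missing sire of O2 counts as congruent because M2 is absent from the dataset. O3's missing dam counts as "not assigned", and its wrong sire counts as "incongruent".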
Of the 85 autosomal SNPs on the chip, eight did not amplify, 15 were monomorphic and 11 showed unclear cluster affiliation or unusual clustering patterns, leaving 51. Of these, five showed call rates ≤ 70% and one departed from Hardy-Weinberg equilibrium; these were removed from the analyses, and the remaining 45 SNPs were retained for further analysis. Of the 74 genotyped individuals, 54 were retained because their SNP call rates were ≥ 70% (mean call rate 85%). These included five founders and three bears from the remnant Dinaric population. The other 20 were rejected. Of these 54 bears, 41 (including three founders and two Dinaric bears) had call rates ≥ 90% (mean call rate 97%). Details on genotyping success are shown in Suppl. material
Genotyping at 45 SNPs correctly identified 54 individual bears. Sex determination based on six SNPs on the sex chromosomes confirmed the STR-based results in 42 out of 54 cases (77%). The remaining 12 cases were not incongruent; rather, partially missing data at the sex-chromosome SNPs prevented sex determination. Amongst the 54 samples, 12 were replicated four times and showed 92% positive PCR amplifications amongst loci and 92% amongst samples. Allelic dropout affected only 1.6% of loci (mean value = 0.04) and 1.4% of samples (mean value = 0.02). Amongst the 45 SNPs, 27 showed all three genotypic classes (e.g. AA/AT/TT). Subsequent variability statistics and parentage analyses were computed using the reference dataset of 15 STRs, the full set of 45 SNPs and the reduced dataset of the 27 most variable SNPs. We also analysed two combined datasets: 15 STRs with 27 SNPs, and 15 STRs with 45 SNPs.
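Replicate genotypes are what make allelic dropout measurable, because an allele can only be seen to drop out where the consensus genotype is heterozygous. The function below computes one commonly used rate: dropout events divided by replicate calls at heterozygous consensus loci. The study's exact per-locus and per-sample definitions are not given here, so treat this as an illustrative assumption. The genotypes are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None = failed call


def is_heterozygous(g: Genotype) -> bool:
    return g is not None and g[0] != g[1]


def allelic_dropout_rate(consensus: List[Genotype], replicates: List[List[Genotype]]) -> float:
    """Share of replicate calls at heterozygous consensus loci that appear
    homozygous for one of the two consensus alleles."""
    events = 0
    opportunities = 0
    for replicate in replicates:
        for cons, obs in zip(consensus, replicate):
            if not is_heterozygous(cons) or obs is None:
                continue  # dropout is only observable at typed heterozygous loci
            opportunities += 1
            if obs[0] == obs[1] and obs[0] in cons:
                events += 1
    return events / opportunities if opportunities else 0.0


# Hypothetical sample: consensus at three SNPs and three replicate runs.
consensus = [("A", "T"), ("C", "C"), ("A", "G")]
replicates = [
    [("A", "T"), ("C", "C"), ("A", "A")],  # dropout of G at locus 3
    [("T", "T"), ("C", "C"), ("A", "G")],  # dropout of A at locus 1
    [("A", "T"), None, None],
]
rate = allelic_dropout_rate(consensus, replicates)  # 2 events / 5 opportunities
```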
Summary statistics for single markers are shown in detail in Suppl. material
PID and PIDsibs values for increasing numbers of combined loci in the Alpine brown bear population. Values are calculated for five marker sets:

- the 27 most variable SNPs (those showing all three genotypic classes);
- the reference dataset of 15 STRs;
- the full set of 45 SNPs;
- 15 STRs combined with 27 SNPs;
- 15 STRs combined with 45 SNPs.
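PID values such as those in the figure are usually computed per locus from allele frequencies. They are then multiplied across loci, assuming Hardy-Weinberg and linkage equilibrium. The sketch uses the standard formulas:

- PID = Σp_i^4 + Σ_{i<j}(2p_i p_j)^2
- PIDsibs = 0.25 + 0.5Σp_i^2 + 0.5(Σp_i^2)^2 - 0.25Σp_i^4

Accumulating loci from most to least informative is an assumption about how the curves were built. The allele frequencies are illustrative.

```python
from math import prod
from typing import List, Sequence


def pid_unrelated(freqs: Sequence[float]) -> float:
    """PID for one locus: sum p_i^4 + sum over i<j of (2 p_i p_j)^2."""
    homo = sum(p ** 4 for p in freqs)
    het = sum((2 * freqs[i] * freqs[j]) ** 2
              for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return homo + het


def pid_sibs(freqs: Sequence[float]) -> float:
    """PIDsibs for one locus: 0.25 + 0.5*S2 + 0.5*S2^2 - 0.25*S4."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def cumulative(values: List[float]) -> List[float]:
    """Multi-locus value after adding loci one at a time, lowest PID first
    (assumed ordering); relies on independence between loci."""
    ordered = sorted(values)
    return [prod(ordered[: k + 1]) for k in range(len(ordered))]
```

Take two independent biallelic loci with allele frequencies 0.5/0.5. Each has PID = 0.375, so the two-locus PID is 0.140625.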
Marker summary statistics for 51 brown bear samples from the Central Italian Alps. Columns give the mean number of samples typed per locus (N), the mean number of alleles per locus (Na), the mean effective number of alleles (Ne), Shannon’s information index (I), and observed (Ho) and expected (He) heterozygosity. Standard errors (SE) are in brackets.
MARKER SETS | N (SE) | Na (SE) | Ne (SE) | I (SE) | Ho (SE) | He (SE) |
---|---|---|---|---|---|---|
15 STRs | 51.00 (0.000) | 4.53 (0.291) | 3.25 (0.223) | 1.25 (0.084) | 0.73 (0.045) | 0.66 (0.040) |
27 SNPs | 47.30 (0.443) | 2.00 (0.000) | 1.74 (0.049) | 0.59 (0.023) | 0.43 (0.026) | 0.41 (0.020) |
45 SNPs | 47.18 (0.405) | 2.00 (0.000) | 1.58 (0.044) | 0.52 (0.022) | 0.38 (0.022) | 0.34 (0.019) |
27 SNPs & 15 STRs | 48.26 (0.348) | 2.90 (0.215) | 2.28 (0.141) | 0.83 (0.059) | 0.54 (0.032) | 0.50 (0.026) |
45 SNPs & 15 STRs | 47.88 (0.342) | 2.63 (0.159) | 2.00 (0.114) | 0.70 (0.049) | 0.47 (0.028) | 0.42 (0.025) |
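Each column of the table is a mean across loci of a per-locus statistic. The sketch below shows the per-locus calculations, using the standard definitions:

- Ne = 1/Σp_i^2
- I = -Σp_i ln p_i
- He = 1 - Σp_i^2, without small-sample correction

The genotypes are hypothetical. The text does not state whether the study applied a correction to He.

```python
from collections import Counter
from math import log
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None = missing call


def locus_summary(calls: List[Genotype]) -> Dict[str, float]:
    """Per-locus N, Na, Ne, I, Ho and He from diploid genotypes."""
    typed = [g for g in calls if g is not None]
    n = len(typed)
    if n == 0:
        raise ValueError("no typed individuals at this locus")
    counts = Counter(allele for g in typed for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    return {
        "N": n,                                # individuals typed at this locus
        "Na": len(freqs),                      # observed number of alleles
        "Ne": 1.0 / sum_sq,                    # effective number of alleles
        "I": -sum(p * log(p) for p in freqs),  # Shannon's information index
        "Ho": sum(g[0] != g[1] for g in typed) / n,
        "He": 1.0 - sum_sq,                    # expected heterozygosity (uncorrected)
    }


# Hypothetical SNP typed in five bears, one call missing.
s = locus_summary([("A", "A"), ("A", "T"), ("T", "T"), ("A", "T"), None])
```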
Parentage tests were performed in two independent analyses, both excluding the five founders: one on individuals with call rates ≥ 70% (n = 49, 100%) and one on individuals with call rates ≥ 90% (n = 38, 77.5%). Some true parents (detected using the STR-based reference data) were lost in the filtering: 2 out of 21 (9.5%) from the ≥ 70% dataset and 7 out of 17 (41.1%) from the ≥ 90% dataset. As a result, three (3.2%) and 30 (27.7%) assignments to the true parent could not be detected in the two analyses, respectively.
For bears with a call rate ≥ 70%, the complete set of 45 SNPs produced 8.16% missing assignments and 7.14% inconsistencies with the STR reference data. The reduced panel of 27 SNPs gave similar results, with 11.22% missing assignments and 6.12% inconsistencies. This slight difference is probably due to the greater reliability of the 27 SNPs, which all show the three genotypic classes. A different pattern emerged for samples with call rates ≥ 90%. The percentage of inconsistencies was similar (7.89% using 45 SNPs and 9.21% using 27 SNPs), but no assignments were missing.
We then combined the 15 STRs with either 45 or 27 SNPs. Amongst bears with call rates ≥ 70%, these combinations left 8.16% and 6.12% of assignments missing, respectively, and no inconsistencies were found. All parental assignments identified for bears with a call rate ≥ 90% were concordant with the STR reference data, despite the absence of a high proportion of parents from the dataset. For bears with SNP call rates ≥ 70%, the 15 STRs gave congruent results for all family trios. For bears with SNP call rates ≥ 90%, however, they produced one missing assignment (1.30%) and one incongruent assignment (1.30%). Proportions of congruent, incongruent and missing sire/dam assignments are summarised in (Table
Results of parental assignments using FRANz v.2 in the Italian Alpine brown bear population. The table gives numbers and percentages of congruent, incongruent and missing (not assigned) parental assignments for each SNP/STR marker subset. Part (a) covers 49 bear genotypes with call rates ≥ 70%; part (b) covers 38 bear genotypes with call rates ≥ 90%. The bottom row gives the total number of assignments to be determined: n = 98 for the 49 genotypes and n = 76 for the 38 genotypes.
(a) Call rates ≥ 70% | 27 SNPs | 45 SNPs | 15 STRs | 27 SNPs & 15 STRs | 45 SNPs & 15 STRs |
---|---|---|---|---|---|
Congruent | 81 (82.65%) | 83 (84.69%) | 98 (100%) | 92 (93.87%) | 90 (91.8%) |
Not assigned | 11 (11.22%) | 8 (8.16%) | 0 | 6 (6.12%) | 8 (8.16%) |
Incongruent | 6 (6.12%) | 7 (7.14%) | 0 | 0 | 0 |
TOT n assignments | 98 | 98 | 98 | 98 | 98 |

(b) Call rates ≥ 90% | 27 SNPs | 45 SNPs | 15 STRs | 27 SNPs & 15 STRs | 45 SNPs & 15 STRs |
---|---|---|---|---|---|
Congruent | 69 (90.78%) | 70 (92.10%) | 74 (97.36%) | 76 (100%) | 76 (100%) |
Not assigned | 0 | 0 | 1 (1.31%) | 0 | 0 |
Incongruent | 7 (9.21%) | 6 (7.89%) | 1 (1.31%) | 0 | 0 |
TOT n assignments | 76 | 76 | 76 | 76 | 76 |
As expected, no parents were found for the three bears of Dinaric origin, and all parental assignments were confirmed using Colony 2.0.6.4. Amongst congruent assignments, no mismatches were found, and the mean number of loci typed in common within family trios was 58.7 out of 60 (min = 56, max = 60). Individual assignments obtained with the combined SNP and STR markers are reported in Suppl. material
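Counting mismatches and loci typed in common within a trio comes down to one check: one offspring allele must be able to come from the dam and the other from the sire. The function below sketches that check. It works for SNPs and, with allele sizes written as strings, for STRs. The example genotypes are hypothetical. The check does not model genotyping error, which FRANz and Colony account for in their likelihood models.

```python
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None = missing call


def trio_mismatches(offspring: List[Genotype], dam: List[Genotype],
                    sire: List[Genotype]) -> Tuple[int, int]:
    """Return (loci typed in all three individuals, loci with a Mendelian incompatibility)."""
    common = 0
    mismatches = 0
    for o, d, s in zip(offspring, dam, sire):
        if o is None or d is None or s is None:
            continue  # compare only loci typed in the whole trio
        common += 1
        a, b = o
        # One offspring allele must come from the dam and the other from the sire.
        if not ((a in d and b in s) or (b in d and a in s)):
            mismatches += 1
    return common, mismatches


# Hypothetical trio typed at one SNP and one STR (allele sizes as strings).
dam = [("A", "T"), ("152", "156")]
sire = [("A", "A"), ("148", "160")]
offspring = [("A", "A"), ("156", "160")]
result = trio_mismatches(offspring, dam, sire)  # (2, 0)
```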
Parental assignment probabilities were calculated for bears with the 36 most reliable genotypes (the Dinaric bears being excluded) using the five marker subsets. Results are displayed in Fig.
Parental assignment probabilities using FRANz v.2. The probabilities are calculated for the 36 most reliable bear genotypes born in the Central Italian Alps (the three bears of Dinaric origin are excluded) using the five marker sets described in Table
Our study demonstrated that a combination of SNPs and STRs provided robust assessments of parentage in the Italian Alpine brown bear population. The combined panel also performed better than STRs when a high proportion of true parents was not present in the dataset. The absence of parents in the dataset simulates a common situation in long-term monitoring projects of expanding populations, in which not all individuals are usually sampled. Interestingly, two measures barely changed between the reduced set of the 27 most variable SNPs (Ho = 0.43, He = 0.41) and the complete set of 45 SNPs (Ho = 0.38, He = 0.34): the number of assignments congruent with the reference STR-based data, and the probability values.
This result indicates that 27 SNPs, in combination with 15 STRs, are sufficient to considerably enhance data reliability compared with 15 STRs alone. Combinations of SNPs and STRs have also been found to be more efficient than larger numbers of SNPs alone in other species, such as the African penguin (Spheniscus demersus;
Conversely, 45 SNPs were not sufficient to determine parentage relationships in the Alpine population, although a similar number of SNPs were found to be adequate for assigning parents by
Because young male bears disperse over wide areas, international cooperation amongst the laboratories monitoring the species is pivotal for its conservation in the Alps.
The long-term, intensive monitoring of the brown bear population in the Italian Alps allowed us to evaluate empirically how well SNP markers assess family relationships in a wild population sampled non-invasively. Such an evaluation is usually difficult because the data needed to confirm parentage assignments are rarely available. These include field data (such as telemetry and direct observations of females with cubs), multiple samples per individual and repeated amplification of STR loci over the years. In this study, reproductive data obtained from genetic and field data were available from more than a decade of research and management efforts and were always concordant with the reproductive biology of the species (see
Genetic markers may also be less variable and informative when applied to a population other than the one for which they were developed. These SNPs were selected for being informative in the Scandinavian population and, as expected, only a portion was highly variable (51/96 = 53%). This ascertainment bias likely contributed to the lower power of parentage assignment. Some of the 45 SNPs used in this study had low minor allele frequencies in the Alpine population, lowering their discriminatory power.
In addition, a substantial number of studies concluded that SNP markers are entirely appropriate for parentage analyses, but the empirical data, thus far, indicate that a suite of 100–200 SNPs is generally needed to provide resolving power equal to or better than that provided by the available STR markers for the species under consideration (
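The need for many SNPs follows from how exclusion power accumulates across loci. The sketch below uses the per-locus exclusion probabilities of Jamieson and Taylor (1997):

- P1: excluding a candidate parent when the other parent is unknown.
- P2: excluding a candidate parent when the other parent is known.

Loci are combined as 1 - Π(1 - P_l), assuming independent loci and no genotyping error. This is a back-of-the-envelope illustration, not the likelihood-based approach of FRANz or Colony.

```python
from math import prod
from typing import List, Sequence


def _moment(freqs: Sequence[float], k: int) -> float:
    return sum(p ** k for p in freqs)


def excl_one_parent(freqs: Sequence[float]) -> float:
    """P1: exclusion of a candidate parent when the other parent is unknown."""
    s2, s3, s4 = (_moment(freqs, k) for k in (2, 3, 4))
    return 1 - 4 * s2 + 2 * s2 ** 2 + 4 * s3 - 3 * s4


def excl_second_parent(freqs: Sequence[float]) -> float:
    """P2: exclusion of a candidate parent when the other parent is known."""
    s2, s3, s4, s5 = (_moment(freqs, k) for k in (2, 3, 4, 5))
    return 1 - 2 * s2 + s3 + 2 * s4 - 3 * s5 - 2 * s2 ** 2 + 3 * s2 * s3


def combined(per_locus: List[float]) -> float:
    """Multi-locus exclusion probability, assuming independent loci."""
    return 1 - prod(1 - e for e in per_locus)


def loci_needed(per_locus: float, target: float = 0.99) -> int:
    """Smallest number of equally informative loci reaching the target exclusion."""
    if not 0.0 < per_locus < 1.0:
        raise ValueError("per-locus exclusion probability must be in (0, 1)")
    n = 1
    while combined([per_locus] * n) < target:
        n += 1
    return n
```

With these formulas, a biallelic locus at MAF 0.5 gives P1 = 0.125, and `loci_needed(0.125)` returns 35 loci for a combined P1 of 0.99. At MAF 0.1 (P1 = 0.0162), about 280 loci are needed. This is why low-MAF SNPs from an ascertainment-biased panel add comparatively little power.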
The integration of an additional set of SNPs developed specifically for the Alpine population would likely improve the effectiveness of parentage analyses by addressing the low number of variable SNPs identified in this study.
In this study, when using the lower call rate threshold (≥ 70%), a few parental assignments were incongruent and a few were missed. These errors are probably due to a combination of two factors: the lower number of loci typed in common within the correct family trio and a few genotyping errors amongst SNPs. Our SNP results highlight the importance of taking special precautions when working with non-invasive samples, such as pre-selecting samples with high call rates (≥ 90%). Additionally,
We also note that SNPs have some intrinsic disadvantages, especially with non-invasive samples. Because SNPs are bi-allelic, it is not straightforward to recognise samples containing DNA from multiple individuals. In addition, rules on the number of replicates and call rates needed to obtain reliable genotypes are lacking.
More SNPs are needed to perform parentage analysis with an information content comparable or superior to that obtained through STRs (
Working with non-invasive samples can be difficult, and the panel is subject to the ascertainment bias mentioned above. Even so, our results showed that a combination of 27 SNPs and 15 STRs was an effective panel for identifying parentage relationships in an isolated brown bear population, even though several half- or full siblings are present amongst the putative parents. The use of SNPs in parentage analysis is thus promising, although multiple factors can jeopardise the reliability of the results.
Based on the data obtained in this study, we propose simple guidelines for efficient parentage analysis in wild populations using non-invasive samples with STRs and SNPs:

a) for all collected samples, first amplify an adequate number of STRs and determine sex through the amplification of sex-specific regions; STR amplification can be used to discard poor-quality samples and identify single individuals;

b) amongst multiple samples of the same individual, choose for SNP genotyping the one with the lowest genotyping error rate and the highest rate of positive STR amplifications;

c) develop a SNP genetic data bank including all putative parents, using 96 × 96, 48 × 48 or 192 × 24 plates on the Fluidigm Biomark platform depending on the number of individuals and SNP availability;

d) perform parentage analysis using FRANz, combining an appropriate number of SNP and STR markers, to allow for multi-generational analysis.
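Step b) above can be expressed as a simple selection rule. This is a hypothetical sketch. The `SampleQC` fields are assumptions. So is the priority: error rate first, with ties broken by amplification success, since the guideline does not say how the two criteria should be weighed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SampleQC:
    """Hypothetical per-sample quality record."""
    sample_id: str
    individual: str                 # identity assigned from the STR profile
    error_rate: float               # genotyping error rate across STR replicates
    positive_amplifications: float  # share of successful STR amplifications


def pick_for_snp_typing(samples: List[SampleQC]) -> Dict[str, str]:
    """Map each individual to its best sample: lowest error rate first,
    then highest share of positive amplifications (assumed priority)."""
    best: Dict[str, SampleQC] = {}
    for s in samples:
        current = best.get(s.individual)
        if current is None or (s.error_rate, -s.positive_amplifications) < (
            current.error_rate, -current.positive_amplifications
        ):
            best[s.individual] = s
    return {ind: s.sample_id for ind, s in best.items()}


# Hypothetical hair samples from two bears.
samples = [
    SampleQC("H1", "bear_A", 0.05, 0.90),
    SampleQC("H2", "bear_A", 0.02, 0.80),
    SampleQC("H3", "bear_A", 0.02, 0.95),
    SampleQC("H4", "bear_B", 0.10, 0.70),
]
chosen = pick_for_snp_typing(samples)
```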
However, our data highlight the need to use good-quality samples (e.g. call rates ≥ 90%, given our results) with a low likelihood of allelic dropout. Indeed, parental assignments are particularly vulnerable to genotyping problems, as parent-offspring pairs must share at least one identical allele at each locus (
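The pairwise rule shows why dropout is so damaging. If a heterozygous parent is mistyped as a homozygote, a true parent-offspring pair can appear to share no allele at that locus. The sketch below counts such loci for hypothetical genotypes; the helper name is illustrative.

```python
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None = missing call


def loci_without_shared_allele(parent: List[Genotype], offspring: List[Genotype]) -> int:
    """Count typed loci where parent and offspring share no allele (an apparent exclusion)."""
    count = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue
        if not set(p) & set(o):
            count += 1
    return count


# Hypothetical true parent-offspring pair typed at three SNPs.
parent_true = [("A", "T"), ("C", "G"), ("G", "G")]
offspring = [("T", "T"), ("C", "C"), ("G", "A")]
# Allelic dropout at locus 1 makes the heterozygous parent look like ("A", "A").
parent_dropout = [("A", "A"), ("C", "G"), ("G", "G")]
```

With the correct parental genotype no locus excludes the pair. The single dropout event creates one apparent exclusion.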
We acknowledge all the management authorities and people who gave major contributions to the collection of samples analysed in this study: the Autonomous Province of Bolzano, the Autonomous Province of Trento, the Autonomous Region of Friuli-Venezia Giulia, the general management of agricultural, forest, and fishing resources of the Region Friuli Venezia Giulia, the Lombardia Region, the staff of the provincial police force of Bergamo, Lecco, Brescia and Sondrio, the Veneto Region, the staff of Belluno and Verona Province. We also thank Helena Königsson (SLU) for professionally analysing the SNPs.
The authors have declared that no competing interests exist.
No ethical statement was reported.
No funding was reported.
Conceptualization: FD, GS, PG, NM. Data curation: GS, LP, PM, SF, PG, FD. Formal analysis: PG, GS. Investigation: NM, AJN, FD. Methodology: AJN, NM, GS. Project administration: FD, NM. Resources: LP, GS, SF, PM, AJN. Supervision: FD, GS, NM. Validation: NM, AJN, FD, GS. Writing - original draft: PG. Writing - review and editing: PM, LP, NM, SF, AJN, GS, FD.
Patrizia Giangregorio https://orcid.org/0000-0003-1064-4494
Nadia Mucci https://orcid.org/0000-0002-7522-7213
Anita J. Norman https://orcid.org/0000-0002-9499-758X
Stefano Filacorda https://orcid.org/0000-0002-0984-2373
Paolo Molinari https://orcid.org/0009-0009-8556-246X
Göran Spong https://orcid.org/0000-0002-1246-5046
Francesca Davoli https://orcid.org/0000-0003-3231-2415
The data that support the findings of this study are available in the supplementary material of this article.
Genetic and field data
Data type: genetic and field data (word document)
Explanation note: Sample information, parentage relationships, marker summary statistics, FRANz v. 2 and Colony v. 2.0.6.4 output data.