Genetic diversity of Castanea sativa, an endangered species in the Hyrcanian forest

Castanea sativa Mill. is one of the most endangered tree species in Iran, where it is represented by small, fragmented populations in the north of the country. Eighteen simple sequence repeat (SSR) loci (10 nuclear and 8 chloroplast) were used to evaluate the genetic diversity and population structure of C. sativa from the Hyrcanian forest. For nuclear SSRs, the number of alleles detected per locus ranged from 1 to 5 and observed heterozygosity (Ho) was between 0.125 and 1.000. Analysis of molecular variance (AMOVA) indicated a high level of variation within populations (84%) and a low level among populations (16%). Based on STRUCTURE analysis, the four studied populations were divided into two main clusters separated by a genetic distance of Fst = 0.3. The Shafaroud population formed the first cluster; Siyahmazgi, Qalehroudkhan and Veysroud were placed in the second cluster. The UPGMA analysis confirmed the results of the STRUCTURE analysis, separating the Shafaroud population from the others. The 8 chloroplast SSR loci used to screen the populations showed no polymorphism. In general, low nuclear genetic diversity, no polymorphism in cpDNA and considerable genetic differentiation among populations over short geographical distances represent a serious threat of genetic erosion for C. sativa in the Hyrcanian forest, even hinting at an ongoing extinction vortex. Therefore, given the significant decline in genetic diversity, it is essential to place the distribution areas of all four populations of this species in Iran under protection.


Introduction
The Hyrcanian forests stretch in an arc along the southern shores of the Caspian Sea from the Talish region in Azerbaijan to Golestan National Park in Iran and between latitudes in Azerbaijan Republic and in Iran (Payne and Miller 1994). The occurrence of many Arcto-Tertiary relict elements, such as Zelkova carpinifolia (Pall.) K. Koch, Parrotia persica (DC.) C.A. Mey., and Pterocarya fraxinifolia Spach, has led biogeographers to the consensus that the Caspian forest was an important refugium for temperate broad-leaved trees during the Quaternary glaciations (Tralau 1963; Zohary 1973; Probst 1981; Leroy and Arpe 2007). In terms of conservation and biodiversity, 44% of the total known plant species of Iran (3234 out of 7300 species; Akhani 2006) are present in only 6% of the Iranian surface area (the Hyrcanian forest). Only ca. 280 of the ca. 500 endemic and sub-endemic species of the Iranian vegetation are present in the Hyrcanian forest (Akhani et al. 2010; Zarafshar et al. 2010; Yousefzadeh et al. 2014).
Sweet chestnut (Castanea sativa Mill.) is one of the rare and critically endangered tree species in the Hyrcanian forest (Jalili and Jamzade 1999), reported for the first time by Jazirei (1961). Unfortunately, the number of habitats and the density of this species have been significantly reduced by blight fungus, livestock grazing, and seed collection by villagers (Yousefzadeh et al. 2014). Only four small isolated populations of chestnut now remain in the Hyrcanian forest; they are located in the areas of Veysroud (V), Shah balut Mahaleh Lahijan (S), Shafarud (Sh) and QalehRoud khan (R) (Alipoor et al. 2015). Although knowledge of genetic diversity is the first step towards appropriate strategies for the conservation of genetic resources, there is little information about chestnut in Iran, especially on its population genetic diversity.
The availability of DNA-based markers provides efficient and reliable means for evaluating biodiversity among plant genomes (Karp et al. 1996). Molecular markers such as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR) and inter-simple sequence repeat (ISSR) are frequently used for evaluating the genetic variation of plants (Baye 2012; Karp et al. 2012; Frascaroli et al. 2013; Tam et al. 2014).
Several studies have been performed on different species of the genus Castanea, including C. sativa populations, using molecular markers (Yamamoto et al. 1998; Botta et al. 1999; Gobbin et al. 2007; Lang et al. 2007; Pereira-Lorenzo et al. 2011; Mellano et al. 2012; McCleary et al. 2013). Lusini et al. (2014) estimated the genetic diversity and spatial structure of Bulgarian C. sativa populations by SSRs and found a generally high level of genetic diversity but little divergence among populations. Studies by Quintana et al. (2015) on C. sativa in Spain showed an unusual degree of genetic isolation. Fixation index estimates and AMOVA data supported an unexpectedly high level of genetic differentiation in El Bierzo, larger than that estimated in a previous study with a broader geographical scope. Torello Marinoni et al. (2013) revealed that four gene pools contributed to the formation of the population of C. sativa in northwest Italy. Mattioni et al. (2013) studied the genetic diversity of C. sativa on a large scale using SSRs, and the results showed a genetic divergence between the eastern (Greek and Turkish) and western (Italian and Spanish) populations. Two gene pools and a zone of gene introgression in Turkey were revealed. The inferred population structure showed a significant geographical correspondence with the hypothesized glacial refugia and ruled out the migration of the chestnut from Turkey and Greece to Italy. The homogeneous gene pool observed in Italy and Spain may have originated from common refugia along with human-mediated colonization. Fineschi et al. (2000) studied the haplotype diversity of C. sativa by analyzing restriction fragment length polymorphism (PCR-RFLP) of chloroplast and mitochondrial genome regions; they detected no polymorphism for the single mitochondrial region analyzed, while a total of 11 different chloroplast (cp) haplotypes were scored. The distribution of the chloroplast DNA haplotypes revealed low geographical structure of the genetic diversity throughout southern European countries.
Among marker types, SSRs are attractive because they are codominant and many loci are available in comparison with isozyme/allozyme systems (Nelson 2009). SSRs have been widely used to study the genetic diversity of various plants at low taxonomic levels (species, populations or even clones). The objective of this study was to establish management strategies for the conservation of genetic resources of C. sativa in the Hyrcanian forest by (1) evaluating the genetic diversity and differentiation of four small isolated populations of C. sativa, and (2) comparing the levels of genetic variability within and among C. sativa populations with the genetic diversity of plants with similar characteristics.

Plant material, DNA extraction, simple sequence repeat (SSR) amplification and polymerase chain reaction (PCR)
Leaf samples were collected in the Hyrcanian forest from four small isolated populations of C. sativa located in the areas of Veysroud (V), Shahbalut Mahaleh Lahijan (S), Shafarud (Sh) and QalehRoud khan (R). The names of the populations, their geographical positions, and the number of individuals sampled per population are given in Table 1 and Fig. 1. To avoid sampling clones or close relatives, sampled individuals within a population were separated by at least 30 m. The leaves were frozen in liquid nitrogen and ground to a fine powder using a pestle and mortar. Total genomic DNA was isolated from the ground powder using a protocol adapted from Murray and Thompson (1980). Eight chloroplast SSR and 10 nuclear SSR loci (Table 2) were chosen based on their level of polymorphism for evaluating the genetic diversity of C. sativa in the Hyrcanian forest.
Forward primers were labelled with a fluorochrome (6-FAM, HEX, NED or PET). Amplification of the DNA was performed using a Bio-Rad iCycler thermocycler with the following parameters: (a) initial denaturation at 94 °C for 3 min; (b) 28 cycles of denaturation at 94 °C for 30 s, primer annealing at the appropriate temperature for each primer pair for 45 s and extension at 72 °C for 1 min 30 s; (c) final extension at 72 °C for 30 min. Fragment analysis was carried out using a 3130xl Genetic Analyzer. Data processing and allele calling were performed with the GeneMapper software.

Genetic diversity, population structure and bottleneck tests
Genetic variation was assessed for individual microsatellite markers using GenAlEx 6.501 (Peakall and Smouse 2006; Peakall and Smouse 2012). Observed and expected heterozygosities (Ho and He, respectively) (Nei 1973) and the number of alleles (Na) were calculated. An exact test for Hardy-Weinberg equilibrium (HWE) was performed with GENEPOP 4.2 (Raymond and Rousset 1995; Rousset 2008). The frequency of null alleles at each locus was estimated by the square root method (Nei 1987). Analyses of molecular variance (AMOVA) were carried out in GenAlEx with two different distance measures. The first, the number of different alleles (Fst), is based on the infinite allele model. The gene flow parameter, Nm (the product of the effective population size and the rate of migration among populations), was calculated from Fst as Nm = (1 − Fst)/Fst for cpSSR data or Nm = 0.25 × (1 − Fst)/Fst for the nuclear SSR data (Hamilton and Miller 2002). The second, the Rst parameter (Slatkin 1995), was used to measure genetic differentiation so as to include molecular information on the size differences between alleles in the differentiation estimates.
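As a minimal sketch of the quantities defined above (not the GenAlEx implementation), the following Python snippet computes Ho, He and Nm for a single locus. It assumes the uncorrected form He = 1 − Σp², and it uses the two Nm scalings exactly as stated in the text. The function names and the example genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) tuples."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Ho: fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2), the uncorrected form (an assumption of this sketch)."""
    freqs = allele_frequencies(genotypes)
    return 1.0 - sum(p * p for p in freqs.values())


def gene_flow_nm(fst, nuclear=True):
    """Nm from Fst: 0.25 * (1 - Fst) / Fst for nuclear SSR, (1 - Fst) / Fst for cpSSR."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    scale = 0.25 if nuclear else 1.0
    return scale * (1.0 - fst) / fst


# Hypothetical genotypes (allele sizes in bp) for four individuals at one locus.
example = [(180, 184), (180, 180), (184, 188), (180, 184)]
ho = observed_heterozygosity(example)
he = expected_heterozygosity(example)
```

Note that a mean Nm averaged over loci will generally differ from the Nm obtained by applying the formula to the mean Fst, because the relationship is non-linear.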
Genetic distance between populations was calculated according to Nei (1978). Cluster analysis based on genetic distance was performed by UPGMA (Unweighted Pair Group Method with Arithmetic Mean), using the Statistica software (STATSOFT Inc 1993). Genetic distances (1000 bootstraps) were computed as D = (1 − proportion of shared alleles) with the Microsat software (Minch 1997).
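The shared-allele distance above can be illustrated with a short sketch. It assumes the population-level form in which the proportion of shared alleles at a locus is the sum, over alleles, of the smaller of the two population frequencies, averaged over loci; this is one common definition and is not taken from the Microsat source. The function name and the example frequencies are hypothetical.

```python
def shared_allele_distance(pop_a, pop_b):
    """D = 1 - proportion of shared alleles between two populations.

    Each population is a list with one dict per locus mapping allele -> frequency.
    Assumed definition: per-locus sharing = sum of min(p_a, p_b) over common alleles.
    """
    if len(pop_a) != len(pop_b) or not pop_a:
        raise ValueError("populations must be typed at the same, non-empty set of loci")
    shared = 0.0
    for freqs_a, freqs_b in zip(pop_a, pop_b):
        common = freqs_a.keys() & freqs_b.keys()
        shared += sum(min(freqs_a[al], freqs_b[al]) for al in common)
    return 1.0 - shared / len(pop_a)


# Hypothetical allele frequencies at two loci.
pop_1 = [{"A": 0.5, "B": 0.5}, {"C": 1.0}]
pop_2 = [{"A": 1.0}, {"C": 0.5, "D": 0.5}]
d_12 = shared_allele_distance(pop_1, pop_2)
```

Identical populations give D = 0 and populations with no alleles in common give D = 1.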
Genetic structure of populations was inferred with STRUCTURE version 2.3.1 (Pritchard et al. 2000). The admixture model was applied and allele frequencies were assumed to be correlated. Data analysis was carried out with a burn-in period of 100 000 repeats for each hypothesis and ten trials of 20 5 Monte Carlo Markov Chain (MCMC) replications.
Since a better estimator of K, the number of homogeneous gene pools of origin for the population studied, is the modal value of ΔK (Evanno et al. 2005), this parameter was calculated by Structure Harvester software (Earl and vonHoldt 2011) and used to select the optimal K value.
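For illustration, ΔK can be sketched as the absolute second-order difference of the mean log-likelihood L(K), divided by the standard deviation of L(K) across replicate runs. The version below works from per-K means, which is an assumption of this sketch, and the log-likelihood values are invented for the example. ΔK is undefined for the smallest and largest K tested, so it cannot evaluate K = 1.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K per K from replicate log-likelihoods.

    lnp_by_k maps K -> list of L(K) values from replicate runs.
    Returns K -> |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    K values without both neighbours, or with zero variance, are skipped.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical log-likelihoods from three replicate runs per K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-888.0, -880.0, -896.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```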
Evidence of recent population bottlenecks was assessed using the program BOTTLENECK 1.2 (Cornuet and Luikart 1996). We used both a strict stepwise mutation model (SMM) (Kimura and Ohta 1978) and a two-phase model (TPM) (Di Rienzo et al. 1994) in which 90% of the microsatellite mutations followed the strict SMM and 10% produced multistep changes (Estoup and Cornuet 1999). To determine whether deviations of observed heterozygosity from that expected at drift-mutation equilibrium were significant (α = 0.05), Wilcoxon signed-rank tests (Luikart et al. 1998a) were applied. As a qualitative indicator of population bottlenecks (Luikart et al. 1998b), a test for a mode-shift in the allele frequency distribution was applied.
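The qualitative mode-shift indicator can be sketched as follows. The idea is that in a population near mutation-drift equilibrium the lowest allele frequency class is the most abundant (an L-shaped distribution), whereas after a bottleneck rare alleles are lost and the mode moves to an intermediate class. The ten equal-width classes and the function names are assumptions of this sketch, not the BOTTLENECK implementation, and the frequencies are invented.

```python
def allele_frequency_classes(freqs, n_classes=10):
    """Count alleles in equal-width frequency classes over (0, 1]."""
    counts = [0] * n_classes
    for p in freqs:
        if not 0.0 < p <= 1.0:
            raise ValueError("allele frequencies must lie in (0, 1]")
        index = min(int(p * n_classes), n_classes - 1)
        counts[index] += 1
    return counts


def is_mode_shifted(freqs, n_classes=10):
    """True when some higher class holds more alleles than the lowest class."""
    counts = allele_frequency_classes(freqs, n_classes)
    return counts[0] < max(counts[1:])


# Hypothetical allele frequencies pooled over loci.
l_shaped = [0.02, 0.05, 0.03, 0.08, 0.25, 0.57]
shifted = [0.35, 0.32, 0.33, 0.45, 0.55, 0.05]
```

With so few alleles per locus, as in small populations, this indicator is coarse and is best read alongside the heterozygosity-excess tests.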

Genetic diversity based on nuclear and chloroplast simple sequence repeat (SSR)
The analysis of 10 nuclear microsatellite primers identified a total of 114 alleles. The number of effective alleles varied from 1.000 (QpZAG110 and CsCAT3 loci) to 3.555 (EMC38 locus). The mean expected and observed heterozygosities were 0.483 and 0.563, respectively. The highest expected heterozygosity (He = 0.719) was found at locus EMC38, and the lowest (monomorphic loci) at QpZAG110 and CsCAT3. The highest observed heterozygosity (Ho) was 1.000 (EMC38) and the lowest was 0.125, observed at loci CsCAT41, QpZAG110 and CsCAT3. The lowest effective number of alleles was observed in the S population (1.903) and the highest in the SH population (2.435). The average observed and expected heterozygosities in the R population were 0.538 and 0.463, respectively, and in S 0.675 and 0.559, respectively. In the SH population, these parameters were 0.675 and 0.559, and in V they were 0.475 and 0.477, respectively (Table 3). The results of the analysis of molecular variance (AMOVA; Table 4) showed that 84% of the total genetic variation was found within populations and 16% among populations. The low Φ value (0.163) related to Rst also indicates low differentiation between populations.
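The effective number of alleles and the expected heterozygosity reported above are linked by a simple identity, sketched here as a consistency check. It assumes the standard definitions Ne = 1/Σp² and He = 1 − Σp², so that He = 1 − 1/Ne when both refer to the same set of allele frequencies. The function names are hypothetical.

```python
def effective_alleles(freqs):
    """Effective number of alleles, Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def he_from_effective_alleles(ne):
    """Expected heterozygosity implied by Ne, He = 1 - 1 / Ne."""
    if ne < 1.0:
        raise ValueError("the effective number of alleles cannot be below 1")
    return 1.0 - 1.0 / ne


# Values reported in the text for locus EMC38 and for the monomorphic loci.
he_emc38 = round(he_from_effective_alleles(3.555), 3)  # 0.719
he_monomorphic = he_from_effective_alleles(1.000)      # 0.0
```

The reported pair for EMC38 (3.555 effective alleles, He = 0.719) is consistent with this identity, and an effective number of 1.000 corresponds to a monomorphic locus with He = 0.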
All 32 individuals were analysed at 8 cpSSR loci, but no polymorphism was found at any chloroplast SSR locus.

Genetic differentiation and population structure
Nei's distance calculated between sites showed the maximum value (0.330) between the R and V populations (Table 5), which means that these two populations are the most genetically distant; the minimum distance (0.010) was observed between R and SH, R and S, and SH and V, in agreement with the results provided by the Fst values. Differentiation (Fst) and gene flow calculated for each locus (Appendix 2, available as a Supplementary file at https://doi.org/10.14214/sf.1705) showed an average Fst value of 0.198 and an average gene flow (Nm) of 2.590. Locus CsCAT6 had the highest Nm (11.696) and QpZAG110 the lowest (0.153).
The 32 chestnut individuals were further studied for population stratification using the STRUCTURE program. Nuclear SSR data were analyzed with the possible number of clusters (K) ranging from 1 to 7. A sharp signal was found at K = 3, indicating that three gene pools shaped the genetic structure of the populations analyzed. To check the composition of each population and the assignment of each individual, further analysis was therefore carried out based on K = 3. STRUCTURE analysis grouped the genotypes into 3 gene pools, approximately matching the geographic areas for 3 out of 4 provenances (Fig. 2). In spite of the ability of STRUCTURE to separate 3 gene pools, the Fst value (0.3) showed little differentiation between the populations. As STRUCTURE could not provide data for K = 1, we rejected the result showing that the population was divided into 3 groups. There was one peak in the estimate of the log-likelihood of the cluster number (L(K)), since the highest likelihood was for K = 3, and L(K) either consistently increased or showed an erratic pattern with increasing variance, with all individuals admixed and the proportion of any individual assigned to each subpopulation remaining roughly similar. The Evanno criterion, ΔK (Ganopoulos et al. 2012), was not relevant as it can only be computed for K ≥ 2 and does not enable comparison with the results from K = 1. For K > 2, the value of ΔK remained close to 0 in this study.
The assignment of an individual to a specific gene pool was based on its membership probability qi (the mean proportion of ancestry). Genotypes with a membership probability lower than 70% were considered to belong to more than one gene pool. Thirteen genotypes (41%) showed a strong component derived from one specific gene pool (blue), while only 8 genotypes (25%) (green) resulted from different groups. The green gene pool included most trees from the SH population. The blue gene pool included most of the S trees and some trees from the V population. The red gene pool included individuals from the R and V populations (Table 6). The assignment proportions of each individual are reported in Fig. 3. Cluster analysis (Fig. 4) separated the 4 populations into two main clusters at a genetic distance of 0.3. The SH population was alone in the first cluster; S, R and V were placed in the second cluster. The UPGMA analysis thus confirmed the results of the STRUCTURE analysis, separating the SH population from the others.
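The 70% membership rule described above can be sketched in a few lines. The pool labels follow the colours used in the text; the function name, the "admixed" label and the example membership values are hypothetical.

```python
def assign_gene_pool(membership, threshold=0.70):
    """Assign an individual to a gene pool from its membership proportions.

    membership maps gene pool label -> proportion of ancestry (q).
    Individuals whose largest q is below the threshold are labelled "admixed",
    i.e. considered to belong to more than one gene pool.
    """
    pool, q_max = max(membership.items(), key=lambda item: item[1])
    return pool if q_max >= threshold else "admixed"


# Hypothetical membership proportions for two individuals at K = 3.
tree_a = {"blue": 0.85, "green": 0.10, "red": 0.05}
tree_b = {"blue": 0.50, "green": 0.30, "red": 0.20}
labels = [assign_gene_pool(tree_a), assign_gene_pool(tree_b)]
```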

Hardy-Weinberg equilibrium (HWE) and Bottleneck analyses
For most microsatellite loci there was no deviation from HWE (Appendix 1, available as a Supplementary file at https://doi.org/10.14214/sf.1705). Both tests (Wilcoxon and Sign) showed signs of a genetic bottleneck in the V population. In this population, the number of expected loci with additional heterozygosity was 0.08 (TPM model) and 0.034 (SMM model), which is less than the number of loci with additional observed heterozygosity in the other regions (Table 7). Therefore, the null hypothesis that the population is in mutation-drift equilibrium is rejected.

Discussion
Intra-population genetic diversity is important for the long-term persistence of a species and for reducing its extinction risk in the face of future environmental changes (Kahilainen et al. 2014). By analyzing 10 microsatellite loci we identified a total of 114 alleles, which shows that all the loci were highly polymorphic. The number of alleles per locus in our study was lower than in other related papers on Castanea species (Beccaro et al. 2012; Marinoni et al. 2003; Stilwell et al. 2003). A mean of 2.625 alleles per locus was identified from the existing literature on Castanea genetic diversity. The Na of 1.75 in our study is lower than the Na previously described in C. sativa (A = 2.05) (Huang et al. 1994). The difference is most likely due to the very small size and fragmentation of Castanea populations in Iran. The genetic consequences of this phenomenon are a lack of gene flow through seed and pollen, leading to changes in allele frequencies (genetic drift), and, as populations shrink, a reduction in alleles and genotypes (genetic erosion) (Oostermeijer et al. 2003). According to the world distribution map of chestnut, the Hyrcanian populations are far from the nearest populations (Armenia), which leads to a loss of gene exchange between sites. Along these lines, Yousefzadeh et al. (2014) hypothesized that the long isolation and absence of gene flow between the Iranian population of Castanea and its nearest population may have led to the creation of different taxa of C. sativa, at least at the subspecies level, in the north of Iran. Mean expected (He) and observed (Ho) heterozygosities were 0.483 and 0.563, rather close to the data of Boccacci et al. (2004) on C. sativa populations in Italy, where He and Ho were 0.592 and 0.667, respectively. In contrast, the Shannon-Weaver index was lower (Beccaro et al. 2012). In fact, Beccaro et al. (2012) and Huang et al. (2012) showed a high degree of diversity in Castanea populations, ranging between 0.647 and 0.721, in Southern Switzerland and China, respectively. In addition, the genetic variation of the Ozark chinkapin was found to be relatively high, with most of the heterozygosity harbored within populations (Dane et al. 1999).
Geographic separation, natural barriers and transportation of seeds by humans can also influence allele diversity (Alipoor et al. 2015). The highest genetic distance was observed between the V and SH populations and the lowest between the R and SH populations. Since Iranian chestnut pollen is dispersed only within the local population, gene flow between the different populations of chestnut is reduced or absent; this leads to population isolation and inbreeding. This factor reduces genetic diversity and increases the similarity between chestnut populations in the Hyrcanian forest. Our results agree with the study conducted by Villani et al. (1991) in Italy on the same species, which found a lack of differentiation among chestnut populations. Lusini et al. (2014) estimated the genetic diversity and spatial structure of Bulgarian C. sativa populations by SSRs and found a generally high level of genetic diversity but little divergence among populations. Similarly, very little genetic differentiation among Iranian populations of C. sativa was observed, except for the Siyahmazgi population. No polymorphism in chloroplast SSRs was detected.
The high similarity of C. sativa populations in Iran indicates that they very likely originated from a common ancestral gene pool. As a result, small populations developed in isolation and under inbreeding conditions, which has reduced the genetic diversity of chestnut stands and has maintained a substantially high similarity among the Hyrcanian populations of chestnut in the north of Iran.
On the other hand, deviations from HWE were observed at some of the microsatellite loci, especially EMC38. This may be due to an increase in heterozygosity driven by ecological mechanisms and the influence of natural selection (Freelan 2005). Both tests (Wilcoxon and Sign) showed signs of a genetic bottleneck in the Veysroud population. In this population, the number of expected loci with additional heterozygosity was 0.08 (TPM model) and 0.034 (SMM model), which is less than the number of loci with additional observed heterozygosity in the other regions.

Conclusion
The high level of admixture among Castanea populations (STRUCTURE results) and the pairwise genetic distances indicated a low level of genetic differentiation among C. sativa populations in the Hyrcanian forest, with slightly greater differentiation between the Shafaroud population and the other three populations. Given the huge loss of chestnut trees to chestnut blight disease in the north of Iran over more than 30 years, there is evidence that a genetic bottleneck has occurred. Signs of a genetic bottleneck, together with the very small size of the chestnut areas, will make the Hyrcanian chestnut populations more vulnerable. Hence, in order to better conserve the chestnut populations in Iran, it is necessary to place the areas of their distribution under protection.

Fig 4. Dendrogram depicting the distribution of genotypes of the four populations of C. sativa in the Hyrcanian forest based on the UPGMA method. R = QalehRoudkhan region; S = SiyahMazgi region; SH = Shafaroud region; V = Veysroud region.

Table 1. The geographical characteristics of four populations of C. sativa in Hyrcanian forest and the sample size in each population.

Table 2. Sequences of Castanea SSR primer pairs, PCR annealing temperature and PCR expected product size.

Table 3. Genetic variability within four C. sativa populations based on SSR markers.

Table 4. Analyses of molecular variance (AMOVA) for four populations of C. sativa from the Hyrcanian forest based on nuclear SSR (NU SSR). Statistics include sums of squared deviations (SS), mean squared deviations (MS), variance component estimates (Est. Var.), the percentage of the total variance contributed by each component, the estimator of relative genetic differentiation based on the fraction of total variance of allele size between two subpopulations (Rst), and the probability of obtaining a more extreme component estimate by chance alone.

Table 5. Pairwise estimates of Nei's genetic distance and the calculated Fst based on 10 nuclear SSR markers among four populations of C. sativa in Hyrcanian forest.

Table 6. Membership proportion of each individual of the four populations in each of the 3 gene pools identified by STRUCTURE analysis.

Table 7. Results of BOTTLENECK tests performed on the four C. sativa populations analysed at 10 polymorphic nuclear microsatellite loci.