The Polymorphism Analyses of Short Tandem Repeats as a Basis for Understanding the Genetic Characteristics of the Guanzhong Han Population

Short tandem repeat (STR) loci, both those included in the combined DNA index system (CODIS) and non-CODIS loci, are polymorphic genetic markers. Because they are highly polymorphic, STR loci are widely used in forensic DNA typing laboratories. In this study, 22 STR loci (1 CODIS and 21 non-CODIS STR loci) and the Amelogenin locus were genotyped and analyzed in 590 unrelated individuals of the Guanzhong Han population. After Bonferroni correction, none of the 22 STR loci deviated from Hardy–Weinberg equilibrium, and all the loci were in linkage equilibrium. We observed 247 alleles, with allelic frequencies ranging from 0.0008 to 0.3695 in the Guanzhong Han population. The combined power of discrimination and the cumulative exclusion probability were 0.999 999 999 999 999 999 999 999 999 346 36 and 0.999 999 999 709 74, respectively. Nei's DA genetic distances, multidimensional scaling analysis, and principal component analysis, all based on allelic frequencies of 15 STR loci shared by the Guanzhong Han and 13 reference groups, showed that the Guanzhong Han population has closer genetic affinities with the Northern Han, Chengdu Han, and Xinjiang Hui groups from China. The present results indicate that the Microreader™ 23sp ID kit includes highly polymorphic loci and is well suited to individual identification, paternity testing, and population genetics in the Guanzhong Han population.


Introduction
Short tandem repeats (STRs), molecular genetic markers of DNA length polymorphism, are widely distributed in the human genome and are often used by forensic researchers for population genetics, individual identification, and paternity testing [1,2]. The US Federal Bureau of Investigation selected 13 highly polymorphic autosomal STR loci as the core loci of the combined DNA index system (CODIS) in 1991; these are called the CODIS STR loci. Since then, many commercial STR kits containing these 13 CODIS STR loci have been manufactured, such as the AmpFℓSTR Identifiler kit and the PowerPlex® 16 System kit [3]. Researchers in different countries have used these kits in population studies for human genetic analysis and for individual identification in forensic practice [4][5][6][7][8]. To improve the power of identification and reduce the probability of mismatching, seven polymorphic STR loci were added to the 13 CODIS STR loci, giving 20 CODIS STR loci [9]. However, this set does not always meet the needs of forensic researchers, and many researchers have turned their attention to non-CODIS STR loci. Compared with the CODIS STR loci, non-CODIS STR loci have the advantage of offering a large number of highly polymorphic loci to choose from. Combining CODIS and non-CODIS STR loci can therefore improve both the efficiency of individual identification and the performance of paternity testing of the whole system.
The Han nationality is the largest of the 56 ethnic groups in China, accounting for 91.59% of the national population [3,10,11]. Han individuals are widely distributed across China, with different cultural backgrounds and genetic characteristics. Previous studies reported significant differences in the allelic frequency distributions of some STR loci among Han populations in different regions [12][13][14]. STR loci can therefore be used to study the population genetics of the Han nationality. Han populations in different regions are of interest to researchers with different research purposes. Studying them is of great significance to forensic medical research, and it also expands the available population genetic data and clarifies the genetic characteristics of, and genetic relationships among, Han populations in different regions.
The Guanzhong region is located in the central part of Shaanxi province and contains the five cities of Xi'an, Tongchuan, Baoji, Xianyang, and Weinan. It covers 55,623 square kilometers and has a permanent population of more than 23 million [15]. Xi'an, known as Chang'an in ancient times, is the birthplace of the Chinese Silk Road and one of the four ancient capitals of the world. Today, the Guanzhong region is an important passage connecting the southern and northern regions of Shaanxi province [16]. The Microreader™ 23sp ID system (Suzhou Microread Genetics, Suzhou, Jiangsu, China), which includes 21 non-CODIS STR loci, one CODIS STR locus, and one gender identification locus (Amelogenin), has proved to be a useful tool for forensic practice [17]. In this research, the Microreader™ 23sp ID system was used to study the genetic polymorphisms of 22 autosomal STR loci in the Han population of the Guanzhong region, and the effectiveness of this system was further verified. In addition, 13 groups from China were selected as reference groups on the basis of 15 shared STR loci in order to understand the genetic relationships among the groups: Xinjiang Hui [18], Northern Han [10], Southern Han [19], Chengdu Han [20], Xinjiang Uygur [21], Xinjiang Kazakh [22], Xinjiang Mongolian [23], Hainan Li [24], Zhejiang She [25], Guangdong Han [26], Hainan Han [27], Qinghai Tibetan [28], and Changsha Hui [29].

2.1. Sample Collection. Blood samples from 590 healthy volunteers in the Guanzhong region were collected and prepared as dried blood spot specimens. Before sampling, the 264 male and 326 female volunteers, who reported being unrelated for at least three generations, read and signed informed consent forms. The project was approved by the ethics committees of Xi'an Jiaotong University and Southern Medical University before it started.
2.2. Multiplex PCR and Genotyping. The 22 STR loci and the Amelogenin gender locus were amplified simultaneously and directly from all samples, without DNA extraction, using the Microreader™ 23sp ID kit. Multiplex PCR in a 25 μl reaction volume was performed on a GeneAmp® PCR 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA). The fluorescently labeled PCR products were then size-separated and detected on an ABI PRISM® 3130xL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA), and the results were analyzed with GeneMapper® ID-X 1.3 software (Applied Biosystems, Foster City, CA, USA). All PCR and capillary electrophoresis conditions and reagents were the same as previously reported [17,18].
2.3. Statistical Analyses. Allelic frequencies and forensic parameters, including probability of exclusion (PE), polymorphism information content (PIC), matching probability (MP), power of discrimination (PD), and observed heterozygosity (Ho), were calculated with the modified PowerState spreadsheet (version 1.2), which was also used to obtain the P values of the Hardy–Weinberg equilibrium (HWE) tests. ARLEQUIN software (version 3.5) [30] was used, following its specification, to calculate the expected heterozygosity (He). Linkage disequilibrium (LD) analyses of all pairwise combinations of the 22 autosomal STRs were performed with ARLEQUIN software (version 3.5) and the SHEsis online software [31]. Based on the allelic frequencies of the 15 STR loci shared by the Guanzhong Han population and the 13 reference groups, locus-by-locus P values were calculated with ARLEQUIN software (version 3.5) using the analysis of molecular variance (AMOVA) method, and Nei's DA genetic distances [32] were calculated with DISPAN software. The ggplot2 package in R statistical software (version 3.0.2) was used to draw a triangular heat map that displays the Nei's DA values more intuitively. A neighbor-joining tree (NJ-tree) and a circular phylogenetic tree were constructed with MEGA version 3.697 [33] and the iTOL online software (https://itol.embl.de/itol.cgi), respectively, based on the Nei's DA distances, to illustrate the genetic relationships among the 14 groups. In addition, an unrooted tree was plotted with PHYLIP software (version 3.69) from the allelic frequencies of the same 15 STR loci. The multidimensional scaling (MDS) plot was produced with PASW Statistics software (version 18), and the principal component analysis (PCA) plot was constructed with MVSP software (version 3.1) from the allelic frequencies of the 15 shared STR loci.
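As an illustration of how the single-locus parameters listed above are commonly defined, the following minimal Python sketch computes them from a list of genotypes. It is not the PowerState spreadsheet or ARLEQUIN. The function name `forensic_parameters`, the toy genotypes, and the particular formulas used (PIC as 1 − Σp² − ΣΣ2p²q², PE as h²(1 − 2hH²) with h the heterozygous and H the homozygous proportion, and He without a small-sample correction) are assumptions made for illustration and may differ in detail from the software used in this study.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes):
    """Single-locus forensic parameters from a list of (allele, allele) pairs."""
    n = len(genotypes)
    # Allele frequencies are estimated from the 2n gene copies.
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    # Genotype counts, ignoring the order of the two alleles.
    genotype_counts = Counter(tuple(sorted(pair)) for pair in genotypes)

    ho = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    homozygosity = 1 - ho
    sum_p2 = sum(p * p for p in freqs)
    # Matching probability: sum of squared genotype frequencies.
    mp = sum((c / n) ** 2 for c in genotype_counts.values())
    # Polymorphism information content (assumed textbook formula).
    pic = 1 - sum_p2 - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    # Probability of exclusion (assumed formula h^2 * (1 - 2 * h * H^2)).
    pe = ho ** 2 * (1 - 2 * ho * homozygosity ** 2)
    return {"Ho": ho, "He": 1 - sum_p2, "MP": mp, "PD": 1 - mp, "PIC": pic, "PE": pe}


# Toy genotypes (hypothetical, not taken from the study).
toy_genotypes = [(10, 11), (10, 10), (11, 12), (10, 12)]
toy_result = forensic_parameters(toy_genotypes)
```

With real data, the function would be called once per locus on the 590 genotypes; alleles only need to be mutually comparable values so that each genotype can be sorted.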

Forensic Efficacies of 22 STRs in the Guanzhong Han Population. All 22 STR loci were successfully genotyped in the 590 unrelated samples. The P values of the HWE tests for the 22 STR loci in the Han population residing in the Guanzhong region are presented in Table S1. Only one locus (D6S477) deviated from HWE at the uncorrected level, with a P value below the significance level of 0.05; the P values of the other loci were above this level. The P values of the LD tests are presented in Table S2; 25 of the 231 locus pairs among the 22 STR loci had P values below 0.05. The pairwise correlation coefficient (r²) values of the 22 STR loci were all less than 0.01, demonstrating that there were no strong associations between pairs of loci (Figure S1). After Bonferroni correction, there were no significant deviations from HWE (significance threshold P ≤ 0.05/22) or from linkage equilibrium (significance threshold P ≤ 0.05/231), indicating that these loci were independent of each other and could be combined in subsequent calculations using the multiplication rule. Allelic frequencies and forensic parameters of the 22 STR loci were calculated with the modified PowerState version 1.2 spreadsheet and are shown in Table 1 and Figure 1, respectively. A total of 247 alleles were detected in this study, and the corresponding allelic frequencies ranged from 0.0008 to 0.3695. Among the 22 STR loci, the D4S2366 locus was the low- […] Figure S3 showed that the polymorphisms of the 22 STR loci were relatively high in the Shaanxi Han population, followed by those of the Huaxia Platinum System [34] and the AGCU 21+1 STR system [15].
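The Bonferroni thresholds and the multiplication rule mentioned above can be sketched in a few lines of Python. The sketch assumes the usual definitions, namely a corrected threshold of α divided by the number of tests (22 single-locus HWE tests and C(22, 2) = 231 pairwise LD tests) and cumulative values of the form 1 − ∏(1 − xᵢ) over independent loci. The function names and the per-locus values in the example are hypothetical and are not taken from Table 1. `decimal.Decimal` is used because an ordinary binary floating-point number cannot represent values as close to 1 as the reported combined power of discrimination.

```python
from decimal import Decimal, getcontext
from math import comb

getcontext().prec = 50  # enough significant digits for values such as 1 - 1e-28


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return Decimal(alpha) / n_tests


def cumulative_power(per_locus_values):
    """Combine per-locus PD (or PE) values over independent loci.

    Implements 1 - prod(1 - x_i), which is the multiplication rule
    applied to the per-locus complementary probabilities.
    """
    remaining = Decimal(1)
    for value in per_locus_values:
        remaining *= 1 - Decimal(value)
    return 1 - remaining


n_loci = 22
n_pairs = comb(n_loci, 2)                         # number of pairwise LD tests
hwe_threshold = bonferroni_threshold("0.05", n_loci)
ld_threshold = bonferroni_threshold("0.05", n_pairs)

# Hypothetical per-locus PD values, passed as strings to avoid float rounding.
example_cpd = cumulative_power(["0.95", "0.90"])
```

Passing the per-locus values as strings keeps them exact; converting from Python floats would carry binary rounding error into the product.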

Analyses of Genetic Differences among 14 Groups. Locus-by-locus P values for the 15 shared loci were calculated using the AMOVA method and are presented in Table 2 to explore the differences between the Guanzhong Han and the 13 reference groups. After Bonferroni correction (0.05/105 = 0.0005), there were significant differences between the Guanzhong Han and the Xinjiang Uygur, Xinjiang Kazakh, Hainan Li, Zhejiang She, Hainan Han, Xinjiang Hui, Qinghai Tibetan, Xinjiang Mongolian, Southern Han, Northern Han, and Changsha Hui populations at 12, 11, 11, 11, 6, 3, 2, 2, 2, 1, and 1 STR loci, respectively. To explore the genetic distances among the 14 groups, the pairwise Nei's DA values are shown in Table S3 and as a heat map in Figure 2. In the triangular heat map, the gradient from white through blue to purple represents DA values from small through intermediate to large. The Hainan Li and Changsha Hui groups showed the maximum Nei's DA value (0.1136), while the Xinjiang Hui and Northern Han groups showed the minimum Nei's DA value (0.0029). The Guanzhong Han population showed the largest genetic distance from the Changsha Hui group (DA = 0.0950) and the smallest from the Northern Han population (DA = 0.0030).
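For readers unfamiliar with the distance measure, Nei's DA between two populations is usually written as DA = 1 − (1/r) Σⱼ Σᵢ √(xᵢⱼ yᵢⱼ), where r is the number of loci and xᵢⱼ and yᵢⱼ are the frequencies of allele i at locus j in the two populations. The Python sketch below implements that formula directly. It is not DISPAN, and the function name `nei_da`, the data layout (one allele-to-frequency dictionary per locus), and the toy frequencies are assumptions made for illustration.

```python
from math import sqrt


def nei_da(pop_x, pop_y):
    """Nei's DA distance between two populations.

    Each population is a list with one dict per locus that maps
    an allele label to its frequency at that locus.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("both populations must cover the same loci")
    total = 0.0
    for freqs_x, freqs_y in zip(pop_x, pop_y):
        # Alleles missing from either population contribute sqrt(0) = 0.
        shared = set(freqs_x) & set(freqs_y)
        total += sum(sqrt(freqs_x[a] * freqs_y[a]) for a in shared)
    return 1 - total / len(pop_x)


# Toy allele frequencies for two loci (hypothetical, not from the study).
pop_a = [{"10": 0.5, "11": 0.3, "12": 0.2}, {"7": 0.6, "8": 0.4}]
pop_b = [{"10": 0.4, "11": 0.4, "12": 0.2}, {"7": 0.5, "8": 0.5}]
pop_c = [{"13": 1.0}, {"9": 1.0}]  # shares no allele with pop_a

da_same = nei_da(pop_a, pop_a)
da_close = nei_da(pop_a, pop_b)
da_disjoint = nei_da(pop_a, pop_c)
```

The distance is 0 for identical frequency profiles and 1 when two populations share no alleles at any locus, so the values reported in this study (0.0029 to 0.1136) lie near the lower end of the range.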

Three Phylogenetic Trees and Multidimensional Scaling Analysis of 14 Different Groups. The genetic relationships among the groups were displayed as trees. The NJ-tree shown in Figure 3(a) was constructed with MEGA software from Nei's DA distances, and the circular phylogenetic tree presented in Figure 3(b) was drawn with iTOL software. An unrooted tree was constructed with PHYLIP software v3.69 from the allelic frequencies of the 15 shared STR loci and is shown in Figure 3(c). The NJ-tree had two major branches: one consisted of the Changsha Hui group alone, and the other contained three sub-branches. The first sub-branch was the Qinghai Tibetan group; the second included the Xinjiang Mongolian, Xinjiang Kazakh, and Xinjiang Uygur groups; and the third comprised the Han populations of six different regions in China together with the Xinjiang Hui, Hainan Li, and Zhejiang She groups. The two populations closest to the Guanzhong Han were the Northern Han and Chengdu Han populations. Moreover, among the groups from the Xinjiang region, the Xinjiang Hui group had the smallest genetic distance to the Guanzhong Han. Similar results were observed in the circular tree and the unrooted tree.
Next, to further explore the genetic relationships among the 14 populations, we constructed an MDS plot based on the pairwise Nei's DA distances (Figure 4). The Changsha Hui group was located in the lower left quadrant of the plot, far from the other groups. The Xinjiang Mongolian, Xinjiang Kazakh, and Xinjiang Uygur groups were located at the upper right of the plot, and the Zhejiang She, Hainan Li, and Hainan Han populations were in the lower right part, whereas the other seven groups (Qinghai Tibetan, Xinjiang Hui, Guanzhong Han, Chengdu Han, Northern Han, Southern Han, and Guangdong Han) lay along the middle line of the plot; the Northern Han, located above the middle line, was the closest population to the Guanzhong Han. The MDS result showed that the Guanzhong Han had relatively close genetic relationships with the Northern Han, Chengdu Han, and Xinjiang Hui groups, which was consistent with the results of the three trees.
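To make the idea behind an MDS plot concrete, the following Python sketch places populations in two dimensions from a pairwise distance matrix. It uses classical (Torgerson) MDS with a simple power-iteration eigen-solver, which is an assumed stand-in: the algorithm applied by PASW Statistics in this study is not specified and may differ. All function names and the toy distance matrix are hypothetical.

```python
from math import sqrt


def double_center(dist):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iters=500, tol=1e-12):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in nxt))
        if norm < tol:  # matrix is numerically zero in this direction
            return 0.0, [0.0] * n
        vec = [v / norm for v in nxt]
    # The Rayleigh quotient gives the eigenvalue with its sign.
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(vec, mv)), vec


def classical_mds(dist, dims=2):
    """Coordinates whose Euclidean distances approximate `dist`."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = top_eigenpair(b)
        if value <= 0:  # no further positive eigenvalue to use
            break
        for i in range(n):
            coords[i][k] = vec[i] * sqrt(value)
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy distances of a 3-4-5 right triangle (hypothetical, not study data).
toy_dist = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
toy_coords = classical_mds(toy_dist)
```

Because the toy distances are exactly Euclidean in two dimensions, the recovered coordinates reproduce them up to rotation and reflection; for genetic distances such as Nei's DA the two-dimensional layout is only an approximation, and a production analysis would normally rely on a tested numerical library rather than this power iteration.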

Principal Component Analysis among the 14 Populations. MVSP software was used to construct two PCA plots from the allelic frequencies of the 15 shared STR loci in order to demonstrate the genetic relationships among the 14 groups. The first three principal components together accounted for 47.484% of the total variance. In the two plots, different colors represented different populations, and different shapes represented different ethnic groups or regions: the solid circle represented the Han populations, the upward triangle the Tibetan group, the downward triangle the Hui groups, the hollow circle the Li group, and the diamond the ethnic groups […]
Figure 1: Forensic parameters of the 22 STR loci in the Han population living in the Guanzhong region of Shaanxi province. The abscissa shows the names of the loci, the ordinate shows the values of the forensic parameters (from 0 to 1), and the names of the forensic parameters are given at the bottom.

Discussion
Because of the complexity of population backgrounds and geographical features in China, population genetic research on the different ethnic groups in China has attracted much attention in recent years [35]. Since ancient times, the Guanzhong region has been a very important place in terms of geographical location and natural resources. The core city of the Guanzhong region is Xi'an (Chang'an in ancient times), one of the important birthplaces of Chinese civilization and the starting point of the overland Silk Road. Moreover, at least a dozen dynasties established their capitals there. As a center of ancient economic prosperity and cultural exchange, Guanzhong is therefore a region to which researchers have paid close attention, and this is why we chose the Guanzhong Han population as the subject of this research. The 22 STR loci conformed to HWE in the Guanzhong Han population, and no pair of STR loci deviated from linkage equilibrium, so the loci could be used for subsequent population genetic analyses. In addition, two STR loci are not considered linked if the physical distance between them in the human genome is more than 10 Mb [36,37]. The cumulative probabilities could be calculated because all 22 STR loci in this study were located on different chromosomes and therefore showed no linkage, and because the P values and r² values of the pairwise LD tests revealed no LD between any pair of the 22 STR loci.
A total of 247 alleles were found in the Guanzhong Han population, with allelic frequencies ranging from a minimum of 0.0008 to a maximum of 0.3695. Generally speaking, an STR locus with a PIC value greater than 0.5 may provide high genetic information in the studied group. The Ho value of an STR locus used in forensic genetic research should be more than 0.8 for forensic individual identification and paternity testing. In this study, the average PIC value was 0.7851 and the mean Ho value was 0.8088, which were similar to the values of the 13 reference groups based on the 15 shared STR loci (Figure S2). In addition, we evaluated the efficacy of this kit by comparing the forensic parameters of its 22 STR loci in the Guanzhong Han population with those of the STR loci in two commercial kits (the AGCU 21+1 STR system and the Huaxia Platinum system) reported in the Shaanxi Han population. The average PIC, Ho, PD, and PE values of the 22 STR loci in this study were higher than those of the other two kits, showing that the 22 STRs were highly polymorphic in the Guanzhong Han population and could also be better used in population genetic research. The CPD value was 0.999 999 999 999 999 999 999 999 999 346 36, which was much larger than that of the 13 CODIS STR loci (0.999 999 999 999 9851) [38], indicating that the 22 autosomal STR loci could be effectively used in individual identification and paternity testing in the Guanzhong Han population.
Allelic frequencies of the 15 STR loci were used to analyze the population differences between the Guanzhong Han and the 13 previously published reference groups using the AMOVA method in ARLEQUIN software. The results showed that the loci with significant differences were able to differentiate the Guanzhong Han population from the other 13 groups and were suitable for comparative studies among populations [15]. The heat map based on Nei's DA distances showed that the population genetically closest to the Guanzhong Han was the Northern Han (DA = 0.0030), followed by the Southern Han (DA = 0.0040) and the Xinjiang Hui (DA = 0.0043). The three trees (rooted tree, unrooted tree, and circular tree) were constructed using three different software packages, and all the results indicated that the genetic distances between the Han populations in different regions were relatively small. MDS represented the spatial distribution of population clusters and reflected, to a certain extent, the pattern of genetic differentiation among populations. The genetic relationships between populations could be read directly from the spatial distances between them in the MDS plot. In the PCA plots, the relationships between the points placed in two-dimensional space on the basis of the allelic frequencies of the 15 STR loci in each population were sufficient to reflect the genetic differentiation among populations. The results of the MDS and the two PCA plots were similar to those of the phylogenetic trees in this research.
Previous research using HLA loci explored the relationships among the Guanzhong Han population of Shaanxi province and the Northern Han and Southern Han populations and pointed out that these populations had closer genetic relationships [39]. The result of the present study was consistent with that finding. A reasonable explanation for the close genetic relationships between the Guanzhong Han population and Han populations in other regions is that the Guanzhong region was an economic and cultural center in ancient times, and many immigrants chose to live there. Moreover, intermarriage between Han individuals from different regions of China has led to widespread gene exchange among Han populations. The Han populations had close genetic relationships with the Xinjiang Hui group but were distant from the Changsha Hui group, which might be related to the complex origin of the Hui group. This phenomenon has been reported in previous studies of the Hui group [29][39][40][41][42]. The present population genetic results for the Guanzhong Han population provide valuable genetic information and reference data for subsequent studies of its genetic relationships and add novel population data to the database of the Chinese Han nationality.

Conclusion
In this research, 22 autosomal STR loci were successfully typed, and genetic information was collected from 590 Han individuals living in the Guanzhong region. The CPD and CPE values showed that the 22 STR loci could be applied to forensic individual identification and paternity testing for the Guanzhong Han population in routine forensic DNA casework. The results of the phylogenetic analyses indicated relatively close genetic relationships between the Guanzhong Han population and both the Han populations in other regions of China and the Xinjiang Hui group. To further reveal the ancestral components of the Guanzhong Han population, more genetic markers will be used for detection and analysis in the future.

Data Availability
Data supporting the results of this study are available from the corresponding author upon request.

Ethical Approval
This project was approved by the Ethics Committees of Xi'an Jiaotong University and Southern Medical University before it started, and it was carried out in strict accordance with the ethical research principles of Xi'an Jiaotong University and Southern Medical University.

Consent
Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest
There are no conflicts of interest to declare.