Haplotype Map of Sickle Cell Anemia in Tunisia

β-Globin haplotypes are important for establishing the ethnic origin of patients with sickle cell disease (SCD) and for predicting their clinical course. To determine the chromosomal background of βS chromosomes in Tunisian sickle cell patients, in this first study of its kind in Tunisia, we explored four polymorphic regions of the β-globin cluster on chromosome 11: the 5′ region of the β-LCR HS2 site, the intervening sequence II (IVSII) region of the two fetal genes (Gγ and Aγ), and the 5′ region of the β-globin gene. The results reveal a high molecular diversity of the microsatellite configurations that define the sequence haplotypes. Linkage disequilibrium analysis showed various haplotype combinations, giving 22 “extended haplotypes”. These results confirm the utility of β-globin haplotypes for population studies and contribute to knowledge of the Tunisian gene pool, as well as to establishing the role of genetic markers in the physiopathology of SCD.


Introduction
The haplotype of the β-globin gene cluster, located on chromosome 11, has been widely used to obtain information about human variation, genetic relationships, and evolutionary history. The gene responsible for sickle cell disease (SCD) [β6(A3)Glu → Val, GAG → GTG] [1] has been found to be associated with five different restriction haplotypes (HR). These haplotypes are designated the Benin (BEN), Bantu or Central African Republic (BAN or CAR), Senegal (SEN), Cameroon (CAM), and Arab-Indian (ARB) types, according to the geographical area in which they are most commonly found [2][3][4][5][6]. This geographic prevalence of the gene in association with specific haplotypes has been argued to demonstrate an independent origin of the mutation in each of these regions [5,7]. This interpretation has been rejected by others, who maintain that the mutation had a single origin [8].
Previous attempts to use these haplotypes as predictors of the clinical phenotypes observed in SCD have not been successful. We speculate that the distribution and number of RFLP sites used historically to define β-haplotypes are not sufficient to capture the full range of genetic variation in this region. To test this hypothesis, we performed a framework polymorphism genotyping analysis across the β-globin cluster.
In the present study, we report the molecular investigation of four repeat sequence configurations in βS Tunisian chromosomes carrying the five different RFLP haplotypes (HR) described previously by Imen et al. [10]: the (AT)xN12(AT)y motif within the 5′ HS2 region of the β-LCR site, the (TG)n(CG)m motif within the IVSII region of the fetal globin genes (Gγ and Aγ), and the (AT)xTy motif within the 5′ region of the β-globin gene. We also searched for associations between these genetic markers in order to identify any specificity of the Tunisian chromosome. Indeed, the "extended haplotype" (HE), which combines the RFLP haplotype and the sequence haplotype (HS), is present in each ethnic group as a chromosome-specific feature and could be involved in the phenotypic expression of the disease.

[11]. The following framework polymorphisms were investigated by polymerase chain reaction (PCR) and direct sequencing: the (AT)xN12(AT)y repeat configurations within the 5′ HS2 region of the β-LCR site [12,13], the (TG)n(CG)m configurations in the IVSII region of the fetal globin genes (Gγ and Aγ) [14], and the (AT)xTy repeat configuration in the 5′ region of the β-globin gene [15] (Figure 1), using the respective primer pairs summarized in Table 1. The Tunisian RFLP haplotypes (HR) described previously by Imen et al. [10] were used in this study.

Statistical Analysis.
Because some of our genotypes have an unknown gametic phase and include a large number of alleles, Arlequin, a program for the analysis of population genetic data, was used to apply a likelihood method to the analysis of linkage disequilibrium between the genetic marker configurations in each chromosome. Statistical significance was set at P < 0.05 [16].
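As a rough illustration of what a linkage disequilibrium test measures, the sketch below computes the classical pairwise statistics D, D′, and r² for one pair of configurations from phased chromosomes. It is not the likelihood method implemented in Arlequin, which also handles unknown gametic phase; the function name `pairwise_ld` and the toy chromosome list are assumptions made for the example, not data from this study.

```python
def pairwise_ld(chromosomes, allele1, allele2):
    """Pairwise LD between allele1 (locus 1) and allele2 (locus 2).

    chromosomes: list of (locus1_allele, locus2_allele) tuples, one per
    phased chromosome. Returns D, D', r^2 and the chi-square statistic n*r^2.
    """
    n = len(chromosomes)
    p1 = sum(1 for c in chromosomes if c[0] == allele1) / n
    p2 = sum(1 for c in chromosomes if c[1] == allele2) / n
    p12 = sum(1 for c in chromosomes if c == (allele1, allele2)) / n

    d = p12 - p1 * p2  # deviation from independent association
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    d_prime = d / d_max if d_max > 0 else 0.0

    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "D_prime": d_prime, "r2": r2, "chi2": n * r2}


# Hypothetical phased chromosomes (invented for illustration only):
# six carry L6 with B2, four carry L1 with B1.
toy = [("L6", "B2")] * 6 + [("L1", "B1")] * 4
result = pairwise_ld(toy, "L6", "B2")
print(result)
```

In this toy case the two configurations always occur together, so D′ and r² both reach their maximum of 1. With unphased genotypes the haplotype frequencies would first have to be estimated, which is the part that a dedicated program handles.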
The relationship between restriction haplotypes and genetic markers was investigated by principal component analysis (PCA). This analysis reduces a large number of variables to a few orthogonal variables called principal components (PCs), which capture the largest share of the covariance in the data, analyzed here as allele frequencies [17].
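The reduction performed by PCA can be sketched in a few lines. The example below is a minimal pure-Python version (covariance matrix, power iteration, deflation) applied to an invented allele frequency table; it is not the Statgraphics procedure used in the study, and the function names and the numbers in `toy` are assumptions for illustration.

```python
def covariance(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
           for i in range(k)]
    return cov, centered


def top_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue/eigenvector of a symmetric matrix (power iteration)."""
    k = len(matrix)
    v = [1.0 / (j + 1) for j in range(k)]  # arbitrary non-zero start
    start_norm = sum(x * x for x in v) ** 0.5
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(v[i] * mv[i] for i in range(k)), v


def pca(rows, n_components=2):
    """Return per-row scores and the fraction of variance explained per PC."""
    cov, centered = covariance(rows)
    k = len(cov)
    total_variance = sum(cov[i][i] for i in range(k))
    components, explained = [], []
    for _ in range(n_components):
        eigenvalue, vector = top_eigenpair(cov)
        components.append(vector)
        explained.append(eigenvalue / total_variance)
        # Deflation: remove the component just found before the next search.
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j]
                for j in range(k)] for i in range(k)]
    scores = [[sum(c[j] * v[j] for j in range(k)) for v in components]
              for c in centered]
    return scores, explained


# Hypothetical allele frequencies (rows: five haplotypes; columns: four
# configurations). These numbers are invented, not taken from Figure 2.
toy = [
    [0.70, 0.10, 0.15, 0.05],
    [0.20, 0.50, 0.20, 0.10],
    [0.25, 0.45, 0.20, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.30, 0.40, 0.20, 0.10],
]
scores, explained = pca(toy)
print([round(e, 3) for e in explained])
```

The two columns of `scores` are the coordinates one would plot on the first two PC axes, and the sum of `explained` corresponds to the share of total variation accounted for by those axes.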

Framework Analysis.
Molecular data and allele frequencies for the microsatellite configurations observed in βS chromosomes are described in Figure 2. The (AT)xN12(AT)y configurations of the 5′ HS2 region of the β-LCR were named L1 to L13, the (TG)n(CG)m configurations of the IVSII region of the Gγ gene were named G1 to G7, the (TG)n(CG)m configurations of the IVSII region of the Aγ gene were named A1 to A7, and the (AT)xTy configurations of the 5′ region of the β-globin gene were named B1 to B8.

(AT)xN12(AT)y Motif in the 5′ HS2 Region of the β-LCR Site.
In this study, we used the results previously published by Ben Mustapha et al. [13]. Figure 2 shows several variations in the 5′ β-LCR HS2 region of βS chromosomes; the L6 configuration, (AT)8N12GT(AT)7, was predominant in the studied sickle cell disease population, accounting for 62.11% of the total alleles.

(TG)n(CG)m Motif in the IVSII Region of the Fetal Globin (Gγ and Aγ) Genes.
Seven different microsatellite configurations of the (TG)n(CG)m motif were found among βS chromosomes in the IVSII region of the Gγ-globin gene (Figure 2). One of them, G2, is a novel sequence configuration.

Figure 2 (caption fragment): ... to G7, those of IVSII-Aγ were named A1 to A7, and those of the 5′ region of the β-globin gene were named B1 to B8. L1, G1, A1, and B1 are the reference sequence configurations from HUMHBB*. *The nucleotides are numbered according to HUMHBB (73308 bp; GenBank version U01317.1, GI:455025).

(AT)xTy Motif in the 5′ Region of the β-Globin Gene.
Several microsatellite configurations of the (AT)xTy motif were found in the 5′ region of the β-globin gene among βS chromosomes. The most frequent configuration was B2, (AT)9T4, which is specific to Tunisian chromosomes and has an allele frequency of 49.25% (Figure 2).
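The allele frequencies quoted in these sections are simple proportions: the number of chromosomes carrying a configuration divided by the total number of chromosomes typed. A minimal sketch follows, assuming one configuration label per chromosome; the function name and the toy list are hypothetical and do not reproduce the study counts.

```python
from collections import Counter


def configuration_frequencies(observed):
    """Percentage of chromosomes carrying each configuration label."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {name: 100.0 * count / total for name, count in counts.items()}


# Hypothetical sample of ten chromosomes (invented for illustration only).
toy_chromosomes = ["B2"] * 5 + ["B1"] * 3 + ["B5"] * 2
freqs = configuration_frequencies(toy_chromosomes)
for name in sorted(freqs):
    print(name, round(freqs[name], 2))
```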

Marker Combinations
Arlequin Results.
The linkage disequilibrium test performed with Arlequin showed an association between the microsatellite motifs in the 5′ HS2 region of the β-LCR, the IVSII regions of the fetal genes (Gγ and Aγ), and the 5′ region of the β-globin gene, defining sequence haplotypes (HS). In addition, the data confirmed that the HR is in strict linkage disequilibrium with the HS. Indeed, each association constitutes a haplotype that appears to be specific to the Tunisian chromosome, named the "extended haplotype" (HE). The extended haplotypes are the possible combinations obtained by the linkage disequilibrium test, which was performed first with "ARLEQUIN" and afterwards by PCA. The possible combinations grouping the HR and HS, summarized in Table 2, give 22 "extended haplotypes." For example, the Benin HR was associated with the configuration L6 in the 5′ β-LCR HS2 region, G2 in IVSII of the Gγ-globin gene, A5 in IVSII of the Aγ-globin gene, and B2 in the 5′ region of the β-globin gene (Figure 3).

PCA Results.
To verify and complete our Arlequin study, we used PCA. It was applied to visualize the distribution of the polymorphic regions according to the restriction haplotypes (HR) of SCD individuals and to condense the information adequately. It provides an overview of the correlation between the HR and the polymorphic regions, namely, the (AT)xN12(AT)y microsatellite configurations of the 5′ β-LCR HS2 region, the (TG)n(CG)m configurations of the IVSII region of the fetal genes (Gγ and Aγ), and the (AT)xTy configurations of the 5′ region of the β-globin gene (Figures 4 and 5).
The linkage disequilibrium test was performed with the "ARLEQUIN" software (Table 2) after being performed by PCA (Figure 3) to express the significance as a P value. P values < 0.05 were considered statistically significant.
The data revealed that about 66% of the total variation could be explained by the first two PCs in Figure 4. Additionally, in the PCA of Figure 5, the first two PCs together accounted for 80% of the total observed variability (data not shown), according to the "Statgraphics plus 5.0" software. The PCA of Figure 4 indicates that the Benin HR (B) lies near the L1, L6, L10, and L13 configurations of the β-LCR HS2 region. Figure 5 shows that the B haplotype is additionally near the G2 and G7 configurations of the IVSII-Gγ region, the A5 configuration of the IVSII-Aγ region, and the B1, B2, B3, B4, B5, and B6 configurations of the 5′ region of the β-globin gene. The proximity of the B haplotype to these configurations indicates a positive correlation. Afterwards, the linkage disequilibrium test was performed to deduce a unique and important association between the Benin haplotype and the microsatellite configurations, defining a combined haplotype named the "extended haplotype": "L6-G2-A5-B2" (Figure 3). Analyzing Figures 4 and 5, we observe that the A1 haplotype is near the L4, L5, and L12 configurations of the β-LCR HS2 region, G1 and G3 of the IVSII-Gγ region, A6 of the IVSII-Aγ region, and the B7 configuration of the 5′ region of the β-globin gene. Table 2, from the "ARLEQUIN" test, shows that haplotype A1 could be associated with the following HS: L12-G3 or G2-A6-B5 or B2 or B7 or B8. However, the linkage disequilibrium test, taking the "ARLEQUIN" data and the PCA correlations together, shows four possible combinations between A1 and the HS; the deduced "extended haplotypes" are L12-G3-A6-B2, L12-G3-A6-B5, L12-G2-A6-B2, and L12-G2-A6-B5 (Figure 3).
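The four A1 combinations listed above are the Cartesian product of the configurations retained in each region. The sketch below only enumerates such combinations; deciding which configurations are retained (here B2 and B5 out of the candidates in Table 2) comes from the linkage disequilibrium test and is taken as given. The function name `extended_haplotypes` is an assumption for the example.

```python
from itertools import product


def extended_haplotypes(options_per_region):
    """Join one configuration per region into every possible combined label."""
    return ["-".join(combo) for combo in product(*options_per_region)]


# Configurations retained for the A1 haplotype, in region order:
# 5' beta-LCR HS2, IVSII-G-gamma, IVSII-A-gamma, 5' beta-globin.
a1_options = [["L12"], ["G3", "G2"], ["A6"], ["B2", "B5"]]
a1_combinations = extended_haplotypes(a1_options)
print(a1_combinations)
# ['L12-G3-A6-B2', 'L12-G3-A6-B5', 'L12-G2-A6-B2', 'L12-G2-A6-B5']
```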
The A, A2, and Bantu haplotypes, which are atypical ones, lie very close to one another in the PCA results (Figure 4), since they share some configurations, such as L2, L3, L7, L8, and L11 of the β-LCR HS2 region.
Furthermore, the Bantu haplotype is relatively near the G4 and A7 configurations of the IVSII regions of the fetal genes (Gγ and Aγ), respectively, whereas none of the configurations of these regions appears to be associated with the A and A2 haplotypes.
Note that some configurations correlate with only one haplotype in the PCA, whereas the P values (P < 0.05) show them to be associated with other haplotypes as well. For example, the G2 configuration seems to be associated only with the B haplotype in the PCA (Figure 5), whereas, according to the P values, it is also associated with the atypical haplotypes A (P = 0.0050), A1 (P = 0.0373), and A2 (P = 0.0405).
Additionally, some microsatellite configurations that appear in the PCA plots (Figures 4 and 5) are not shown in Table 2, because their P values with the studied restriction haplotypes fall outside the significance threshold, so the association is not significant. According to the linkage disequilibrium test (Figure 3), some correlations can be illustrated here: between the A haplotype and the HS, the HE L2 or L8-G2-A5-B4 or B6; between the A2 haplotype and the HS, the HE L9 or L8-G2-A5-B2; and between the Bantu haplotype and the HS, the HE L6 or La or L11-G4-A7-B2 or B5 or B7.
Afterwards, the genomic relationship between the studied haplotypes was represented as a dendrogram based on the PCA data (Figure 6), to verify the previous observations and to check their agreement with the hypothesis published in our previous work, Imen et al. [10].
The neighbor-joining tree in Figure 6 was constructed according to the maximum likelihood of the microsatellite configurations at the β-globin locus (the 5′ HS2 region of the β-LCR, the IVSII regions of the Gγ- and Aγ-globin genes, and the 5′ region of the β-globin gene) among the five restriction haplotypes. This dendrogram shows three main levels. The first highlights that the B haplotype is the most distant (distance of B = 60) compared with the others (distance of A and A2 = 28, Bantu = 36, and A1 = 54), which leads us to conclude that the B haplotype is the ancestral haplotype. At the second level, the A1 haplotype could be the common ancestor of the A, A2, and Bantu haplotypes, since it sits above this group while being the most distant from it. The third level shows that the A and A2 haplotypes diversified from the same ancestor, the Bantu haplotype, and that the genomic distances between these haplotypes are very small and their branches short. Based on these findings, we suggest that these haplotypes are the outcome of crossing over.

Discussion
There are five major β-globin cluster haplotypes in the world: four in Africa (Senegal, Benin, Bantu or CAR, and Cameroon) and one in Asia (Arab-Indian). The Benin haplotype has been reported to be associated with βS genes in Algeria, Tunisia, Egypt, Jordan, Lebanon, Sicily, Greece, and Turkey [18][19][20][21], confirming a common genetic background for all the North African alleles. Historical data from these countries indicate that the Benin gene traveled from Central West Africa to North Africa and various Mediterranean countries [3].
In Tunisia, sickle cell anemia haplotypes were reported for the first time by Abbes et al. [20] and later by Imen et al. [10], who described six haplotypes: Benin, the most common; Bantu; and four atypical haplotypes.
Previously, RFLP analysis was the common approach for establishing β-haplotypes. Our prior study [10] showed that the βS chromosome background is characterized by several atypical haplotypes, suggesting that the restriction sites are not equally informative about its origin. Further analysis is therefore needed, focusing on other polymorphisms, namely, the microsatellite configurations linked to the β-globin cluster, which define the sequence haplotypes (HS).
The sequencing results revealed great molecular heterogeneity of the microsatellite repeats, with a predominance of some motifs that are specific to the βS chromosome: the L6 configuration, (AT)8N12GT(AT)7, within the 5′ HS2 region of the β-LCR site; the two newest configurations, G2, (TC)(TG)9(AG)(TG)2(CG)2, and A5, (TC)2(TG)9(CG)2CACG(TG)7, within the IVSII region of the fetal globin genes (Gγ and Aγ), respectively; and the configuration (AT)9T4 within the 5′ region of the β-globin gene, confirming the diversity of the Tunisian chromosome. Our findings therefore demonstrate a great molecular diversity of sequence haplotypes (HS) and their specificity to Tunisian chromosomes. Furthermore, to study the origin of ethnic groups, we need to correlate the HS with the HR, since each HS is represented by a major microsatellite configuration. Here we describe a discordance between the HR and the HS: the HS alone can neither establish the genetic map of sickle cell anemia in Tunisia nor describe the genetic structure of the Tunisian population. We must therefore study the "extended haplotype", obtained by combining the HS and the HR.
The atypical haplotypes A and A2 were associated with several motifs with common sequences: L3, L2, L7, L8, and L11 in the 5′ HS2 region of the β-LCR site and B2, B4, and B6 in the 5′ region of the β-globin gene. Likewise, the A1 haplotype is associated with L12 in the 5′ HS2 region of the β-LCR, with G1 and G3 in IVSII of the Gγ gene, and with B2 and B5 in the 5′ region of the β-globin gene. These results suggest that such specific genetic profiles may result from crossing over at the hot-spot region between the β-LCR site and the β-globin gene.
In fact, it has been demonstrated that a given microsatellite sequence can appear in association with different haplotypes, or can be associated with one specific haplotype. In general, involvement of the same polymorphism with more than one haplotype is most consistent with crossing over in the 5′ part of the β-globin cluster, presumably within the region of increased recombination, the hot-spot region.
The PCA correlation plots show that the A, A2, and Bantu haplotypes were associated with several shared configurations, unlike the A1 and B haplotypes, which seem to be independent. This suggests that these two latter haplotypes are specific to the Tunisian SCD chromosome and did not result from crossing over, unlike the A, A2, and Bantu haplotypes.
The neighbor-joining tree was constructed according to the maximum likelihood of the microsatellite configurations at the β-globin locus (the 5′ HS2 region of the β-LCR, the IVSII regions of the Gγ- and Aγ-globin genes, and the 5′ region of the β-globin gene) among the five restriction haplotypes. The results reveal that the B haplotype is the common ancestral restriction haplotype, that A1 is the ancestor of the A, A2, and Bantu haplotypes, and that the A and A2 haplotypes diversified from the same ancestor, Bantu, suggesting that these haplotypes are the outcome of crossing over. This finding confirms the suggestion made in our previously published article, Imen et al. [10].
In conclusion, our observations demonstrate that the traditional RFLP approach does not accurately reflect the complexity of the β-locus. Our study is the first detailed genomic analysis of this region conducted on Tunisian patients using modern genomics techniques. The results suggest that microsatellite mapping is not only a better approach for defining β-haplotypes in our Tunisian population but can also be used to correlate haplotypes with the different clinical phenotypes observed in SCD, and may have implications for basic research on globin gene regulation.
In the future, we intend to perform multifactorial analysis, namely, modifier gene analysis, of the genetic background of the sickle cell chromosome, in order to improve our understanding of the natural history of sickle cell disease in SS patients. Our data also represent the first identification of the β-haplotype distribution in Tunisian patients, which may be useful for the clinical management and outcome of sickle cell patients.