Principal Component and Cluster Analysis as a Tool in the Assessment of Tomato Hybrids and Cultivars

Determination of germplasm diversity and genetic relationships among breeding materials is an invaluable aid in crop improvement strategies. This study assessed the breeding value of tomato source material. Two commercial hybrids along with an experimental hybrid and four cultivars were assessed with cluster and principal component analyses based on morphophysiological data, yield and quality, stability of performance, heterosis, and combining abilities. The assessment of the commercial hybrids revealed a related origin and consequently does not support the identification of promising offspring from their crossing. The assessment of the cultivars discriminated them according to origin and evolutionary and selection effects. On Principal Component 1, the largest group with positive loading included yield components, heterosis, and general and specific combining ability, whereas the largest negative loading was obtained by qualitative and descriptive traits. Principal Component 2 revealed two smaller groups, a positive one with phenotypic traits and a negative one with tolerance to inbreeding. Stability of performance loaded positively for total yield and negatively for early yield. In conclusion, combining ability, yield components, and heterosis provided a mechanism for ensuring continued improvement in plant selection programs.


Introduction
Knowledge about levels and patterns of genetic diversity can be an invaluable aid in crop breeding for diverse applications [1], including analysis of genetic variability in cultivars [2,3], identifying diverse parental combinations to create segregating progenies with maximum genetic variability for further selection [4], and introgressing desirable genes from diverse germplasm into the available genetic base [5]. An understanding of the genetic relationships among lines can be particularly useful in planning crosses, in assigning lines to specific heterotic groups, and in precise identification with respect to plant varietal protection [6]. The study of genetic diversity is the process by which variation among individuals, groups of individuals, or populations is analyzed. Data often involve numerical measurements and, in many cases, combinations of different types of variables. Phylogenetic relationships based on morphophysiological data provide a way of making a relatively rapid assessment of the diversity present, so that a greater number of related operational taxonomic units (OTUs) [7] can be subsequently tested.
It is well known that maintenance or preservation of germplasm involves two principal considerations: (1) avoiding loss of genetic diversity and (2) avoiding costs [8]. Active collections are geared to meet the needs of the users of germplasm. Therefore, grow-outs of cv.s for seed increase are relatively frequent, which allows the material to be evaluated. Evaluations of germplasm collections have the highest priority among germplasm functions. Germplasm enhancement embraces those activities required to aggregate useful genes and gene combinations into usable phenotypes. These aggregates could be considered as the feedstocks for varietal development programs. In this context, the present paper supports an approach to discriminate the breeding value of tomato source material, that is, commercial single-cross hybrids or open-pollinated cultivars (cv.s), during assessment. It is based on passport data, that is, morphophysiological data, yield potential, stability of performance, heterosis, and combining ability (general, GCA, and specific, SCA), and uses cluster and principal component analyses as a means of identifying sources of yield-enhancing genes [9].

Source Material.
To assess tomato source material, the phylogenetic relationship within two different gene pool resources is suggested. The first source consists of single-cross hybrids, which have become the major segment of the modern tomato seed industry. The second source consists of open-pollinated, well-adapted cv.s, which are mainly grown in the open field under lower-input systems. For hybrids, the phylogenetic relationship was based on agronomical data, that is, yield and quality components, on morphological data, that is, secondary phenotypic traits, and on inbreeding depression, while for cv.s, it was based on agronomical data, morphological data, and diallel-cross products of the cv.s, that is, heterosis, heterobeltiosis, and GCA and SCA constants. The hybrids are represented by the commercial hybrids Iron and Sahara (Geoponiko Spiti, Greek Seed Company) and by the experimental hybrid Theodora (National Agricultural Research Foundation, NAGREF, Greece), and the open-pollinated cv.s by Artemida, Makedonia, Areti, and Olympia (NAGREF).
The hybrids Iron and Sahara were introduced for cultivation in the 1990s, and the cultivation area of these hybrids reached almost one-fifth (0.2) of the area cultivated with tomato in Greece (Geoponiko Spiti, personal communication). Theodora is a new hybrid and was developed by crossing the cv.s Artemida and Makedonia in the Agricultural Research Center of Northern Greece [10]. Makedonia is an old cv., which was developed by pure line selection from a local population of the late 1950s. The cv.s Areti, Artemida, and Olympia are new cv.s; cv. Areti was developed in 1998, and cv.s Artemida and Olympia were developed by Christakis and Fasoulas [11,12]. All the above materials are indeterminate types.

Assessment Procedure.
The experiments were conducted at the farm of the Agricultural Research Center of Northern Greece, near Thessaloniki, during 2003-2005. Randomized complete block designs (RCBDs) were used, with three replications, each consisting of 10 plants. In 2003, the hybrids Iron, Sahara, and Theodora were evaluated in comparison to their F2 generations. In 2005, the cv.s Artemida, Makedonia, Areti, and Olympia were evaluated in comparison to their simple diallel hybrids with reciprocals, which were obtained in 2004.
For each entry, yield potential, fruit quality, physiological disorders, and plant description were recorded from each plant individually. Harvested fruit was counted, graded into different classes according to quality standards and sensitivity to physiological disorders, and weighed. Fruit quality was averaged across a sample of two fruits per plant for the traits resistance to pressure, total solids (TS), total soluble solids (TSS), and pH. Reported data on plant and fruit descriptors were taken according to the International Union for the Protection of New Varieties of Plants. Hybrid assessment included the inbreeding depression of each F2 as the relative difference with reference to the hybrid [13], and the determination of undesirable traits, such as lack of stability of performance. The stability of performance was defined by the standardized mean (X/s = mean/standard deviation) of single plants [14,15]. The cv. combining the largest mean yield (X) with the largest X/s is the most productive and stable across environments [14]. For this reason, X/s is also a way of estimating genetic yield improvement [16]. Variety assessment included the estimation of heterosis and heterobeltiosis of their diallel hybrids, the determination of undesirable traits, such as lack of stability of performance, and estimation of GCA effects and SCA constants from cv. diallel crosses. Heterosis and heterobeltiosis were calculated as the F1 proportional performance compared to the average value of the parents and the best parent, respectively. The GCA and SCA were determined according to the Griffing [17] diallel-crossing system analysis Method 1, with parental values and reciprocal crosses. Crosses were considered as random effects.
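The indices defined above can be illustrated with a minimal Python sketch. The function names, the percent scaling, the sign convention for inbreeding depression ((F1 − F2)/F1), the use of the sample standard deviation, and the reading of "best parent" as the parent with the higher value are all assumptions made for illustration; the numbers are invented and are not data from this study.

```python
from statistics import mean, stdev


def stability(plant_values):
    """Standardized mean X/s: mean of single-plant values divided by
    their standard deviation (sample s, n - 1 denominator; an assumption)."""
    return mean(plant_values) / stdev(plant_values)


def inbreeding_depression(f1_mean, f2_mean):
    """Relative difference of the F2 with reference to the hybrid, in percent
    (assumed sign convention: positive when the F2 performs below the F1)."""
    return 100.0 * (f1_mean - f2_mean) / f1_mean


def heterosis(f1_mean, parent_a, parent_b):
    """F1 performance relative to the average of the two parents, in percent."""
    mid_parent = (parent_a + parent_b) / 2.0
    return 100.0 * (f1_mean - mid_parent) / mid_parent


def heterobeltiosis(f1_mean, parent_a, parent_b):
    """F1 performance relative to the best parent, in percent
    (assumes that a larger trait value is the better one)."""
    best_parent = max(parent_a, parent_b)
    return 100.0 * (f1_mean - best_parent) / best_parent


# Invented single-plant yields (kg per plant) and entry means.
example_plants = [4.0, 5.0, 6.0]
print(stability(example_plants))
print(inbreeding_depression(f1_mean=6.0, f2_mean=4.5))
print(heterosis(f1_mean=6.0, parent_a=4.0, parent_b=5.0))
print(heterobeltiosis(f1_mean=6.0, parent_a=4.0, parent_b=5.0))
```

Under these conventions, a larger X/s reflects a high mean combined with low plant-to-plant variation, which is the sense in which the text uses it as a stability criterion.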

Statistical Analyses.
All RCBD experiments were subjected to analysis of variance, with tests of significance at P < 0.05 for each trait. For the determination of inbreeding depression, heterosis, heterobeltiosis, stability of performance, and combining abilities, the variables total and early yield were used.
The phylogenetic relationships were studied by UPGMA (unweighted pair group method with arithmetic average) and by PCA (principal component analysis). Each hybrid or cv. was considered as one OTU [7]. A total of 22 traits for each hybrid (Table 5) and 35 traits for each cv. (Table 6) were transformed to standardized units. A dissimilarity matrix (DIST coefficient), based on all traits, was created for each group from the transformed data using average taxonomic distance [7]. The product moment correlation (CORR coefficient) for each group was also calculated. The DIST and CORR coefficients were calculated for all possible pairs to obtain the respective matrices and create the dendrograms. The cophenetic correlation [18] for each dendrogram was computed as a measure of goodness of fit (Mantel t-test) for the method of clustering used. Data transformations, matrices, and dendrograms were calculated and visualized using the NTSYS-pc software program [18]. Moreover, PCA [19] was applied to the data. Two and three principal components were extracted for hybrids and cv.s, respectively. The standardized data were projected onto the principal components. Two- and three-dimensional plots of the projections of hybrids and cv.s were constructed, respectively.
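For readers unfamiliar with the diallel analysis, the following Python sketch shows the usual least-squares estimates of GCA effects, SCA constants, and reciprocal effects for a full diallel with parents and reciprocals (Griffing's Method 1), computed from a table of entry means. The function name and the input layout are assumptions made for illustration, the example table is invented, and the sketch does not cover the analysis of variance or the variance components that a random-effects treatment would require.

```python
def griffing_method1(x):
    """Estimate GCA, SCA, and reciprocal effects from a full diallel table.

    x[i][j] is the mean of the cross with parent i as female and parent j
    as male; x[i][i] is the mean of parent i itself (hypothetical layout).
    """
    n = len(x)
    row = [sum(x[i]) for i in range(n)]                        # X_i.
    col = [sum(x[i][j] for i in range(n)) for j in range(n)]   # X_.j
    total = sum(row)                                           # X_..

    # General combining ability effect of each parent.
    gca = [(row[i] + col[i]) / (2 * n) - total / n**2 for i in range(n)]

    # Specific combining ability constant of each parent pair.
    sca = [[(x[i][j] + x[j][i]) / 2
            - (row[i] + col[i] + row[j] + col[j]) / (2 * n)
            + total / n**2
            for j in range(n)] for i in range(n)]

    # Reciprocal effect: half the difference between a cross and its reciprocal.
    rec = [[(x[i][j] - x[j][i]) / 2 for j in range(n)] for i in range(n)]
    return gca, sca, rec


# Invented, purely additive table: x_ij = 10 + g_i + g_j, so the estimated
# GCA effects should recover g and every SCA and reciprocal term should vanish.
g_true = [1.0, -1.0, 0.0]
x_demo = [[10.0 + gi + gj for gj in g_true] for gi in g_true]
gca, sca, rec = griffing_method1(x_demo)
print(gca)
```

The additive example is a convenient sanity check: any departure of a real table from this pattern shows up in the SCA constants and reciprocal effects.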
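The clustering steps described above can be sketched in plain Python as follows. This is not the NTSYS-pc implementation: the function names are hypothetical, the standardization uses the sample standard deviation as an assumption, the Mantel significance test is omitted, and the small trait table is invented for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def standardize(traits):
    """traits[t][k] is the value of trait t for OTU k; z-score each trait."""
    z = []
    for row in traits:
        m, s = mean(row), stdev(row)
        z.append([(v - m) / s for v in row])
    return z


def average_taxonomic_distance(z):
    """DIST coefficient: root of the mean squared trait difference per OTU pair."""
    n_traits, n_otus = len(z), len(z[0])
    d = [[0.0] * n_otus for _ in range(n_otus)]
    for j, k in combinations(range(n_otus), 2):
        d[j][k] = d[k][j] = sqrt(sum((r[j] - r[k]) ** 2 for r in z) / n_traits)
    return d


def upgma(d):
    """Average-linkage clustering; returns the merges and the cophenetic matrix."""
    n = len(d)
    clusters = [(i,) for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    merges = []

    def dist(a, b):
        # Unweighted mean of all pairwise distances between cluster members.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        height = dist(a, b)
        for i in a:
            for j in b:
                coph[i][j] = coph[j][i] = height
        merges.append((a, b, height))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges, coph


def cophenetic_correlation(d, coph):
    """Pearson correlation between original and cophenetic distances."""
    pairs = list(combinations(range(len(d)), 2))
    x = [d[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented table: 3 traits (rows) measured on 4 OTUs (columns).
traits_demo = [
    [1.0, 1.2, 2.0, 5.0],
    [10.0, 11.0, 14.0, 30.0],
    [0.5, 0.4, 0.9, 2.0],
]
d_demo = average_taxonomic_distance(standardize(traits_demo))
merges_demo, coph_demo = upgma(d_demo)
r_coph = cophenetic_correlation(d_demo, coph_demo)
for a, b, height in merges_demo:
    print(a, b, round(height, 3))
```

A cophenetic correlation close to 1 indicates that the dendrogram heights reproduce the original pairwise distances well, which is how the values reported in the Cluster Analysis section are interpreted.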

Cluster Analysis.
DIST and CORR matrices for hybrids and cv.s are presented in Tables 1 and 2, respectively. The dendrograms were created on the basis of the DIST and CORR matrices, for hybrids and for cv.s, and both matrices grouped each source similarly. The cophenetic correlation for both DIST and CORR matrices of hybrids was equal to r = 0.999, while the cophenetic correlation for cv.s was equal to r = 0.974 for the DIST and r = 0.976 for the CORR matrix. These values indicate a very good fit of the data to the clustering method used [18]. Thus, only two dendrograms are presented, one

Principal Component Analysis (PCA).
Tables 3 and 4 present the correlation of each hybrid and cv. with the two and three PCs, respectively. In the case of hybrids, PC1 had maximum correlation with them and accounted for 62.93% of total variance, whereas PC2 accounted for the rest.
According to the data, PC1 separated hybrid Theodora from hybrids Iron and Sahara. This is because the latter two hybrids correlate negatively with PC1 and their projections (Figure 3) lie almost on that axis. PC2 in turn separated hybrids Iron and Sahara. Iron is the only hybrid with a negative correlation to PC2, and its projection lies almost on that axis. In the case of the cv.s (Table 4), PC1 accounted for 49.15% of total variance, whereas PC2 and PC3 accounted for 29.63% and 21.23%, respectively. PC1 separated cv. Artemida from cv.s Makedonia, Areti, and Olympia. This is because the latter cv.s have a negative correlation to PC1 and their projections (Figure 4) lie almost on that axis. PC2 in turn separated cv. Olympia from cv.s Areti and Makedonia, which were grouped in the same subgroup (Figure 2). Cv. Olympia was the only cv. with a negative correlation to PC2 (Table 4), and its projection lies almost on that axis (Figure 4). Finally, PC3 distinguished between cv.s Makedonia and Areti of the subgroup in the DIST dendrogram (Figure 2), since cv. Areti had a negative correlation to PC3 (Table 4). In summary, the PCA confirmed in detail the grouping of the dendrograms based on either DIST or CORR matrices. Furthermore, PCA was more advantageous because it detected the most important traits for the grouping. Since PC1 and PC2 explained the whole variability in hybrids and 78.77% of the total variance in cv.s, the most important traits for the separation are those with the largest loadings on PC1 and PC2. Tables 5 and 6 present the traits that contributed to separating hybrids and cv.s, respectively. Bold and italic fonts were used to group traits with the highest positive and negative loadings, respectively. The largest group with positive loading on PC1 included yield and yield components (0.9169 to 0.9932), heterosis and heterobeltiosis (0.7970 to 0.9165), and GCA and SCA constants (0.9352 to 0.9779), whereas the largest negative loading was obtained mainly by qualitative and descriptive traits (−0.7018 to −0.9997). PC2 revealed two smaller groups of traits, one with some phenotypic traits loading positively (0.7965 to 0.9993), and a second one with tolerance to inbreeding loading negatively (−0.8991 to −0.9926). The stability of performance loaded positively in total yield (0.9700 to 0.9810), but negatively in early yield (−0.8560 to −0.8772), both for hybrids and cv.s.
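The quantities reported above (the share of variance per component and the correlation of each OTU with each component) can be illustrated with a short NumPy sketch. It assumes one possible reading of the procedure, namely that traits are standardized across OTUs and the components are extracted from the product moment correlation matrix among OTUs; under that reading n OTUs yield at most n − 1 components with non-zero variance, which is consistent with two components for three hybrids and three for four cv.s. The function name and the trait table are hypothetical, and the signs of the components are arbitrary.

```python
import numpy as np


def pca_on_otus(traits):
    """PCA of OTUs from a (n_traits, n_otus) table (hypothetical helper).

    Returns the proportion of variance per component and the correlation
    of each OTU (rows) with each component (columns).
    """
    x = np.asarray(traits, dtype=float)
    # Standardize each trait (row) across OTUs; sample standard deviation assumed.
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)
    # Product moment correlations among OTUs (columns are the variables).
    r = np.corrcoef(z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(r)
    order = np.argsort(eigval)[::-1]            # largest component first
    eigval = np.clip(eigval[order], 0.0, None)  # remove tiny negative round-off
    eigvec = eigvec[:, order]
    explained = eigval / eigval.sum()
    # For a correlation-matrix PCA, eigenvector * sqrt(eigenvalue) equals the
    # correlation between each original variable (OTU) and each component.
    otu_pc_corr = eigvec * np.sqrt(eigval)
    return explained, otu_pc_corr


# Invented table: 5 traits (rows) measured on 3 OTUs (columns).
traits_demo = [
    [62.0, 58.0, 75.0],
    [3.1, 3.0, 4.2],
    [5.2, 5.5, 4.1],
    [4.3, 4.4, 4.2],
    [7.0, 8.0, 5.0],
]
explained, otu_pc_corr = pca_on_otus(traits_demo)
print(np.round(100 * explained, 2))   # percent of total variance per component
print(np.round(otu_pc_corr, 4))
```

In this setting the first two proportions sum to 100% for three OTUs, and the sign pattern of the correlations on each component is what separates the OTUs into groups.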

Discussion
The management of genetic resources is the result of the effects of single alleles on various attributes of adaptation, yield, or quality of product, which are difficult to measure [20]. Selection recognizes those attributes that contribute to survivability and causes alleles that govern such attributes to increase in frequency over generations. On this basis, popular genetic materials could form the breeders' initial material for developing cultivars. Breeding schemes developed for allogamous species have been applied to autogamous species, such as the tomato [21, page 52]. The passport data measured and described in the present study reflect the overall genetic constitution of the hybrids or cultivars under consideration. Differences among them occur because of the original genes and past selection, which created an assemblage of genes at the higher frequencies desired in modern hybrids or cultivars. The phylogenetic relationship of the hybrids Iron and Sahara (Figures 1 and 3) revealed a related origin and, consequently, does not support the identification of promising offspring from their crossing. The hybrid Theodora was grouped separately, showing a lack of relationship with the two randomly chosen commercial hybrids. This may be an indication of a narrowing in the genetic base of commercial tomato materials, because the two hybrids cultivated in the Mediterranean region showed a close relationship.
In the case of cv.s (Figures 2 and 4), the dendrogram and the three-dimensional plot separated them according to origin, evolutionary process, and selection effects during the breeding procedure. Thus, cv. Areti, which was extracted from the old hybrid Carmello (Sluis en Groot, Enkhuizen, Holland) in the environment of Makedonia in the 1980s, was clustered together with the long-established cv. Makedonia in the same subgroup, indicating possible common ancestors and similar evolutionary processes. This may be the reason that the degree of heterosis and heterobeltiosis for total and early yield between them, reflecting differences in gene frequencies [22], is the lowest [21], that is, an indication that heterosis is positively related to genetic divergence. Cv. Olympia, which was obtained by applying honeycomb selection [23] in the F2 generation of the old hybrid Dombo (Bruinsma Seeds) in Southern Greece [12] in the same decade, was separated from the above despite the fact that it was included in the same main group, indicating different selection processes and also divergent gene pool resources. Finally, cv. Artemida, which was extracted by applying honeycomb selection in the F2 generation of the newer hybrid Vision (Enza Zaden, Seed Company) in Southern Greece [11,12] in the 1990s, was completely separate. The phenotypic and genetic distance between Artemida and the remaining cv.s, based on additive effects, leads to the assumption that the choice of this cv. as germplasm may be sound [21].
International Journal of Agronomy
PC1, which accounted for 62.93% of total variance of hybrids and for 49.15% of cultivars (Tables 3 and 4), is strongly associated with yield-related traits, such as yield components and yield stability. Heterosis and heterobeltiosis for total and early yield had a highly positive contribution to the separation of cv.s, as did GCA and SCA (Table 6). This is in accordance with Hunter [24], who reported that heterosis and combining ability are reliable scientific methods of proof of genetic distance/conformity, and with Xiao et al. [25], who reported that heterosis indicates the genetic relatedness of crossed materials. Inbreeding depression showed a highly negative loading in separating hybrids (Table 5), thus contributing to the selection of hybrids with a desirable load of genes. Morphophysiological and qualitative traits also contributed to the clustering of hybrids and cv.s. For hybrids, leaf dimensions, internode length, and fruit traits, such as equatorial and polar diameter, ribbing at peduncle end, size of blossom scar, green shoulder before maturity, intensity of green color of shoulder, firmness, locule number, pericarp thickness, puffiness, soluble and total solids, and pH, appeared to be the primary traits (Table 5). Similar traits were loaded onto the principal components for cv.s (Table 6). All diallel crosses of cv. Artemida produced highly heterotic products [21]. Heterosis probably also exists due to different allelic combinations at particular loci in each parent, so that when they are brought together in a hybrid combination, they complement each other, resulting in the expression of heterosis [26]. These loci may not directly relate to observable traits but could have an effect on the physiology of the plant. In autogamous species, such as the tomato, the genetic variance is expected to derive mainly from additive effects (Matzinger, cited in [9]). Heterosis may not be of direct interest, but heterotic crosses could produce desirable transgressive segregants. Usually, experimental evidence is needed, especially from the analysis of F2 and subsequent generations [9]. In this study, discrimination among tomato genotypes based on geographical origin, and evolutionary and selection processes, was successful in clustering into the same group the long-established cv. Makedonia with the derivatives of the old hybrids Carmello and Dombo, that is, cv.s Areti and Olympia, respectively. These data supply sufficient information to determine whether heterosis is correlated with the geographical origin of the parents.
Comparing the two source groups (although each one had few representatives), it is apparent that (a) the two commercial hybrids, randomly selected with the sole criterion being the preference of the growers in the Mediterranean environment, showed a close relationship in comparison to an unrelated hybrid and (b) the cv.s, selected with the criterion of having been developed in national research stations, that is, a narrow environment, showed broader differentiation, especially cv. Artemida, whose GCA makes it a valuable aggregate for public or private use. The remarks above bring forward the contentious issue of where selection should be carried out. All views would agree that testing must include the target environment [27]. Perhaps the use of superior germplasm in breeding strategies to increase yields, combined with improved cultural practices at the same time, would offer a potential solution to this problem [27].
In conclusion, the flux of parental material in any breeding programme (private or public) is based on a working strategy, known as the assessment of the continual turnover of the cv.s. As older parents retreat, new ones enter from locally adapted cv.s and recombinant lines resulting from the F2 of elite hybrids. The whole phylogenetic study of relationships between hybrids or cv.s showed that in an autogamous species, such as the tomato, combining ability, yield components, and heterosis were sufficient to give information about the genetic relationship among hybrids or cv.s and to elucidate the list of materials that may provide a route for developing elite breeding products.

Figure 1: Dendrogram produced by cluster analysis based on DIST (average taxonomic distance) matrix for hybrids.

Figure 3: Two-dimensional plot based on correlation of each hybrid with the two principal components.

Table 1: Matrices of average taxonomic distance (above diagonal) and product moment correlation (below diagonal) for hybrids.
for hybrids (Figure 1) and one for cv.s (Figure 2), both based on DIST matrices. The hybrids' dendrogram indicated a close relationship between the hybrids Iron and Sahara, while the hybrid Theodora was grouped individually. The cv.s' dendrogram showed a close relationship between cv.s Olympia, Areti, and Makedonia, the relationship between cv. Areti and cv. Makedonia being even closer. The cv. Artemida was grouped individually.

Table 2: Matrices of average taxonomic distance (above diagonal) and product moment correlation (below diagonal) for cultivars.

Table 3: Correlation of each hybrid with the two principal components.

Table 4: Correlation of each cultivar with the three principal components.

Table 5: Loadings of the traits onto two principal components for hybrids.
a Stability of performance (X/s).

Table 6: Loadings of the traits onto three principal components for cultivars.