This paper aims to estimate pathological subjects from a population through various physical information using a genetic algorithm (GA). For comparison purposes,
Most problems arising in nature are represented by mathematical models, and mathematical modeling is regarded as an important tool for analyzing problems in various fields of science. The advent of computers, the development of algorithms, and progress in computer programming have made intricate scientific problems easier to solve. This is also the case for problems encountered in biomechanics. Medical prediction plays a very important role in helping health providers make the best biomechanical decisions. Specifically, many researchers have concentrated on the analysis of knee motion, and many methods have been designed to describe its range of motion [
As signified in the literature [
Analysis of the tibial motion is usually difficult from a medical point of view. Although attractive studies exist in the literature, the pathological interval of the tibial rotations has not yet been optimized through physical information. Even though the conventional methods used in the assessment of the tibial rotations are still among the attractive topics in the academic community [
This paper predicts pathological subjects from a population through various physical information using the GA. The KM clustering algorithm has also been developed for the prediction, for comparison purposes. The developed GA framework is successfully applied to medical prediction problems and has achieved classification performance superior to that of its competitive counterpart, the KM clustering algorithm. The dataset, consisting of some physical factors (age, weight, and height) and tibial rotation values, was taken from the work of Sari and Cetiner [
In this study, the dataset for healthy subjects was taken from the work of Sari and Cetiner [
Scatter plot of the data, consisting of age, weight, and height parameters.
In the data, the tibial rotation values of each subject consist of 4 components: right tibial external rotation (RTER), right tibial internal rotation (RTIR), left tibial external rotation (LTER), and left tibial internal rotation (LTIR). The rotation values were divided into 3 types (Type 1, Type 2, and Type 3) according to whether they were pathological or not, as seen in Table
Type values of each rotation and number of subjects.
| | RTER | RTIR | LTER | LTIR |
|---|---|---|---|---|
| Type 1 (<20°) | 39 | 33 | 37 | 51 |
| Type 2 (20°–65°) | 391 | 423 | 357 | 414 |
| Type 3 (>65°) | 54 | 28 | 90 | 19 |
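The typing rule can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 20° and 65° bounds are taken as belonging to Type 2, and Type 1 is assumed to cover every angle below 20°, the only range that the other two bands leave open.

```python
def rotation_type(angle_deg: float) -> int:
    """Return 1, 2, or 3 for a tibial rotation angle given in degrees."""
    if angle_deg > 65.0:
        return 3  # above the 20-65 degree band
    if angle_deg >= 20.0:
        return 2  # inside the band (boundaries assumed inclusive)
    return 1      # assumed: everything below 20 degrees


def is_pathological(angle_deg: float) -> bool:
    """Type 1 and Type 3 are the pathological classes; Type 2 is not."""
    return rotation_type(angle_deg) != 2
```

For instance, `rotation_type(70.0)` falls in Type 3 and is therefore reported as pathological by `is_pathological`.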
Clusters and number of subjects.
| | Age | Weight | Height | Number of subjects |
|---|---|---|---|---|
| Cluster 1 | >30 | - | - | 52 |
| Cluster 2 | | | | 249 |
| Cluster 3 | | >60 | >1.70 | 183 |
Number of types in each cluster for every rotation type.
| Rotation | Type | Cluster 1 | Cluster 2 | Cluster 3 | Total |
|---|---|---|---|---|---|
| RTER | Type 1 | 0 | 17 | 22 | 39 |
| | Type 2 | 50 | 183 | 158 | 391 |
| | Type 3 | 2 | 49 | 3 | 54 |
| | Total | 52 | 249 | 183 | 484 |
| RTIR | Type 1 | 1 | 7 | 25 | 33 |
| | Type 2 | 48 | 223 | 152 | 423 |
| | Type 3 | 3 | 19 | 6 | 28 |
| | Total | 52 | 249 | 183 | 484 |
| LTER | Type 1 | 1 | 16 | 20 | 37 |
| | Type 2 | 47 | 160 | 150 | 357 |
| | Type 3 | 4 | 73 | 13 | 90 |
| | Total | 52 | 249 | 183 | 484 |
| LTIR | Type 1 | 3 | 14 | 34 | 51 |
| | Type 2 | 49 | 218 | 147 | 414 |
| | Type 3 | 0 | 17 | 2 | 19 |
| | Total | 52 | 249 | 183 | 484 |
The practical aim of this paper is to predict pathological subjects from a population through various physical information (age, weight, and height) using the GA. Because GA clustering offers the advantages already mentioned, such as flexibility and freedom from prior assumptions, it has been preferred for reliable data processing in this study. Additionally, the KM clustering algorithm has been used to decide which of the two is better at the prediction. Hence, this study sheds light on the capability of the GA to predict pathological subjects from the existing data by exploring the links between the inputs and outputs. Since the GA has been implemented for clustering for the first time in predicting whether subjects are pathological or not, this study is believed to be a significant contribution.
Darwin’s theory of evolution has been a source of inspiration for many researchers in various disciplines. Many evolutionary algorithms have been developed using fundamental terms such as gene, natural selection, crossover, and mutation that Darwin put forward in his theory. One of the most important of these evolutionary algorithms is the genetic algorithm (GA). First, Goldberg and Holland [
Flow diagram of the GA.
For the solution space, random chromosomes with genes are created. The number of chromosomes generated for the solution indicates the size of the population. For example, the cluster of
The display of gene, chromosome, and population.
This is the first step in which the principle of survival of the fittest is applied. At this stage, the individuals that will later be matched with each other are chosen. The strongest candidates are determined according to their fitness values. In line with the purpose of the algorithm, these candidates are matched with each other to produce a higher-quality generation. At the simplest level, if the problem is a maximization, the individuals with the greatest fitness values are taken. Conversely, if the problem is a minimization, the individuals with the smallest fitness values are taken. The population of these individuals is called the transition population.
At this stage, a new generation is produced. The high-quality individuals chosen by natural selection are treated as parents and are matched to create new individuals. The matching is carried out by exchanging gene sequences between the chromosomes of the two parents. This process is called crossover. As an example, the second genes of Chromosome 1 and Chromosome 2, each of which has 4 genes, are matched and new individuals are produced. This matching is illustrated in Figure
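The selection rule described above can be illustrated with a short Python sketch. It uses plain truncation selection (sort by fitness, keep the best few), which is one simple scheme among many and is an assumption here; the function name and parameters are hypothetical.

```python
def select_parents(population, fitness, n_parents, minimize=True):
    """Keep the n_parents fittest individuals (the transition population).

    For a minimization problem the smallest fitness values win;
    for a maximization problem the greatest ones do.
    """
    ranked = sorted(population, key=fitness, reverse=not minimize)
    return ranked[:n_parents]
```

With `minimize=True`, calling `select_parents([3, 1, 2], lambda x: x, 2)` keeps the two smallest values; with `minimize=False` it keeps the two largest.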
Sample of a crossover.
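As a rough illustration of crossover on 4-gene chromosomes, the sketch below uses single-point crossover. The cut position after the second gene and the function name are assumptions made for the example; the scheme shown in the figure may differ.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Exchange the gene tails of two parents after position `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


# Two parents with 4 genes each, cut after the second gene.
children = single_point_crossover([1, 2, 3, 4], [5, 6, 7, 8], 2)
print(children)
```

Each child keeps the first two genes of one parent and receives the last two genes of the other.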
Sometimes, some genes may remain the same even after repeated matching of the individuals. This prevents the formation of different individuals, so the best solution may not be reached. Although the probability of this situation is very low, a very small change can be made in a gene of the created individuals to prevent it. Thus, different individuals occur and future generations also become different. Two examples of mutations are shown in Figure
Examples of mutation on binary code and real code.
As can be seen from the figure, a mutation in binary code is a bit inversion: it converts 0 to 1 or 1 to 0. This means that a mutation in binary code can make a big difference in terms of gene diversity. In real-coded chromosomes, very small changes are made in the genes, depending on their values. The effect obtained with very small changes in the real code is equivalent to the large effect in the binary code.
Creating the initial population, selecting strong individuals from this population (natural selection), creating a high-quality generation by matching these strong individuals with each other (crossover), and eliminating the problem of producing the same generation from similar genes (mutation) are repeated in each iteration. The aim is to produce a better generation as a result of each iteration. When the specified number of iterations is reached, the algorithm terminates and the best value found is taken as the optimum.
The GA does not visit every point in the solution space. Because it involves randomness, as in nature, it cannot travel to every point at each step. Instead, the GA tries to find the best solution by improving a randomly determined population. More details on the GA can be found, for instance, in [
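The two kinds of mutation can be sketched as follows. The function names, the `step` size, and the uniform perturbation for real-coded genes are illustrative assumptions, not values taken from the study.

```python
import random


def mutate_binary(chromosome, index):
    """Flip one bit of a binary-coded chromosome (0 -> 1 or 1 -> 0)."""
    mutated = list(chromosome)
    mutated[index] = 1 - mutated[index]
    return mutated


def mutate_real(chromosome, index, step=0.01, rng=random):
    """Shift one gene of a real-coded chromosome by at most `step`."""
    mutated = list(chromosome)
    mutated[index] += rng.uniform(-step, step)
    return mutated
```

Both functions return a new chromosome and leave the original untouched, so a mutated child never alters its parent.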
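Putting the steps together, a minimal real-coded GA for a minimization problem might look like the sketch below. All parameter values (population size, mutation rate, bounds, and so on) and the choice of keeping the better half of the population as parents are assumptions made for illustration only.

```python
import random


def run_ga(fitness, n_genes, pop_size=30, n_iter=100, bounds=(-5.0, 5.0),
           mutation_rate=0.1, mutation_step=0.1, seed=0):
    """Minimize `fitness` over real-coded chromosomes of length n_genes."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initial population: random chromosomes inside the bounds.
    population = [[rng.uniform(lo, hi) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(n_iter):
        # Natural selection: the better half survives as parents.
        population.sort(key=fitness)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: single cut point between the two parents.
            point = rng.randint(1, n_genes - 1) if n_genes > 1 else 0
            child = a[:point] + b[point:]
            # Mutation: occasionally nudge one gene by a small amount.
            if rng.random() < mutation_rate:
                i = rng.randrange(n_genes)
                child[i] += rng.uniform(-mutation_step, mutation_step)
            children.append(child)
        population = parents + children
    return min(population, key=fitness)
```

Because the parents are carried over unchanged, the best individual can never get worse from one iteration to the next. For example, `run_ga(lambda c: sum(x * x for x in c), n_genes=2)` searches for the minimum of a simple quadratic test function.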
The GA has been implemented for solving problems in many fields ranging from medical applications [
different individuals);
The GA searches for the optimal solution by means of its own processes, namely, selection, crossover, and mutation. For clustering, as many optimum solutions are sought as there are clusters. These optimum solutions are then taken as the cluster centers, and distances are measured from them. The problem of finding the centers required by clustering algorithms is thus solved by using the GA. Although one encounters various GA clustering examples in the literature for different problems [
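One common way to let a GA search for cluster centers is to pack all center coordinates into a single chromosome and to score it by the total distance from each data point to its nearest center. The sketch below follows that idea; the encoding, the function names, and the use of the summed Euclidean distance as the fitness are assumptions, since the exact formulation is not given here.

```python
import math


def decode_centers(chromosome, n_clusters, n_features):
    """Split a flat chromosome into n_clusters centers of n_features genes."""
    return [chromosome[i * n_features:(i + 1) * n_features]
            for i in range(n_clusters)]


def clustering_fitness(chromosome, points, n_clusters):
    """Sum of distances from every point to its nearest center (to minimize)."""
    n_features = len(points[0])
    centers = decode_centers(chromosome, n_clusters, n_features)
    return sum(min(math.dist(p, c) for c in centers) for p in points)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = clustering_fitness([0, 1, 10, 1], points, 2)  # centers between the pairs
poor = clustering_fitness([0, 1, 0, 1], points, 2)   # both centers on one side
print(good, poor)
```

A chromosome whose centers sit inside the two groups of points receives a smaller (better) fitness than one that places both centers on the same side.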
The
As a clustering algorithm groups the points that are closest to each other, an objective function must be given in the KM approach, and the problem thus becomes a minimization problem. The Euclidean distance is used in the algorithm as follows [
In this study, each of the rotation values RTER, RTIR, LTER, and LTIR is divided into three types: Type 1, Type 2, and Type 3. For all types, the success of Cluster 1, Cluster 2, and Cluster 3 has been observed.
For example, the Type 1 values for RTER are 0, 17, and 22 for Cluster 1, Cluster 2, and Cluster 3, respectively, so there are 39 subjects in total. As percentages, these are 0.00%, 43.59%, and 56.41%, respectively, from Table
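For reference, the standard Euclidean distance between a point x and a center c is the square root of the sum of squared coordinate differences, which Python exposes as `math.dist`. A minimal sketch of the KM iteration built on it is given below; the function name, the user-supplied starting centers, and the stopping rule are assumptions for illustration.

```python
import math


def k_means(points, centers, n_iter=100):
    """Plain k-means: assign points to the nearest center, then re-average."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the closest center for each point.
        labels = [min(range(len(centers)),
                      key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers, labels
```

The result depends on the starting centers, which is one reason a global search such as the GA can be attractive for locating them.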
Real cluster values and percentages of all tibial rotation types.
| Rotation | Type | Cluster 1 | Percent (%) | Cluster 2 | Percent (%) | Cluster 3 | Percent (%) | Total | Percent (%) |
|---|---|---|---|---|---|---|---|---|---|
| RTER | Type 1 | 0 | | 17 | | 22 | | 39 | |
| | Type 2 | 50 | | 183 | | 158 | | 391 | |
| | Type 3 | 2 | | 49 | | 3 | | 54 | |
| RTIR | Type 1 | 1 | | 7 | | 25 | | 33 | |
| | Type 2 | 48 | | 223 | | 152 | | 423 | |
| | Type 3 | 3 | | 19 | | 6 | | 28 | |
| LTER | Type 1 | 1 | | 16 | | 20 | | 37 | |
| | Type 2 | 47 | | 160 | | 150 | | 357 | |
| | Type 3 | 4 | | 73 | | 13 | | 90 | |
| LTIR | Type 1 | 3 | | 14 | | 34 | | 51 | |
| | Type 2 | 49 | | 218 | | 147 | | 414 | |
| | Type 3 | 0 | | 17 | | 2 | | 19 | |
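The percentage shares quoted for RTER Type 1 follow directly from the cluster counts. A small Python check, with a hypothetical helper name, is sketched below; it assumes the total is nonzero.

```python
def cluster_percentages(counts):
    """Share of each cluster in percent, rounded to two decimals."""
    total = sum(counts)
    return [round(100.0 * c / total, 2) for c in counts]


# RTER Type 1: 0, 17, and 22 subjects in Clusters 1, 2, and 3 (39 in total).
print(cluster_percentages([0, 17, 22]))  # [0.0, 43.59, 56.41]
```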
Repeating this evaluation for the GA with RTER, the GA has found 0, 17, and 22 for Cluster 1, Cluster 2, and Cluster 3 of Type 1, which match the real values of 0, 17, and 22, respectively. So, that is 100.00% success as seen from Table
Results of the GA for all tibial rotation types.
| Rotation | Type | Cluster 1 | Percent (%) | Cluster 2 | Percent (%) | Cluster 3 | Percent (%) | Total | Percent (%) |
|---|---|---|---|---|---|---|---|---|---|
| RTER | Type 1 | 0 | | 17 | | 22 | | 39 | |
| | Type 2 | 30 | | 205 | | 156 | | 391 | |
| | Type 3 | 1 | | 48 | | 5 | | 54 | |
| RTIR | Type 1 | 1 | | 2 | | 30 | | 33 | |
| | Type 2 | 59 | | 226 | | 138 | | 423 | |
| | Type 3 | 2 | | 21 | | 5 | | 28 | |
| LTER | Type 1 | 1 | | 13 | | 23 | | 37 | |
| | Type 2 | 38 | | 161 | | 158 | | 357 | |
| | Type 3 | 1 | | 75 | | 14 | | 90 | |
| LTIR | Type 1 | 1 | | 17 | | 33 | | 51 | |
| | Type 2 | 9 | | 222 | | 183 | | 414 | |
| | Type 3 | 0 | | 17 | | 2 | | 19 | |
Results of the KM clustering for all tibial rotation types.
| Rotation | Type | Cluster 1 | Percent (%) | Cluster 2 | Percent (%) | Cluster 3 | Percent (%) | Total | Percent (%) |
|---|---|---|---|---|---|---|---|---|---|
| RTER | Type 1 | 0 | | 2 | | 37 | | 39 | |
| | Type 2 | 42 | | 207 | | 142 | | 391 | |
| | Type 3 | 4 | | 43 | | 7 | | 54 | |
| RTIR | Type 1 | 0 | | 1 | | 32 | | 33 | |
| | Type 2 | 36 | | 263 | | 124 | | 423 | |
| | Type 3 | 2 | | 23 | | 3 | | 28 | |
| LTER | Type 1 | 1 | | 1 | | 35 | | 37 | |
| | Type 2 | 36 | | 242 | | 79 | | 357 | |
| | Type 3 | 2 | | 67 | | 21 | | 90 | |
| LTIR | Type 1 | 1 | | 3 | | 47 | | 51 | |
| | Type 2 | 36 | | 251 | | 127 | | 414 | |
| | Type 3 | 0 | | 19 | | 0 | | 19 | |
As with all optimization algorithms, the GA requires a large number of elements to produce accurate results. The real number of RTIR-Type 2 subjects is 423. Of these, 48 subjects belong to Cluster 1, 223 to Cluster 2, and 152 to Cluster 3; in percent, these are 11.35%, 52.72%, and 35.93%, respectively. The KM has produced 36, 263, and 124 for these values, that is, 8.51%, 62.18%, and 29.31%. For Cluster 1, the KM share is therefore 8.51% while the real one is 11.35%, which gives an accuracy rate of 74.98%. For Cluster 2, the real share is 52.72% while the KM share is 62.18%, an accuracy rate of 84.79%. In the same way, the real share of Cluster 3 is 35.93% while the KM share is 29.31%, an accuracy rate of 81.58%.
If the same considerations are made for the GA, the RTIR-Type 2 values are found to be 59, 226, and 138 for Cluster 1, Cluster 2, and Cluster 3, respectively; in percent, these are 13.95%, 53.43%, and 32.62%. As seen in Table
Taken together, the GA accuracy rate of Cluster 1 for RTIR-Type 2 is 81.36%, while it is 74.98% for the KM with the same parameters (see Table
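The accuracy rates quoted for RTIR-Type 2 are consistent with dividing the smaller of the two percentage shares by the larger one. The sketch below reproduces them under that reading; the function name is hypothetical, and returning `None` when either share is zero is an assumption about how undefined cases are handled.

```python
def accuracy_rate(real_percent, predicted_percent):
    """Ratio of the smaller share to the larger one, in percent."""
    low, high = sorted((real_percent, predicted_percent))
    if low == 0:
        return None  # assumed: no rate is reported when a share is zero
    return round(100.0 * low / high, 2)


# RTIR-Type 2, real shares versus KM shares for Clusters 1, 2, and 3.
real = [11.35, 52.72, 35.93]
km = [8.51, 62.18, 29.31]
print([accuracy_rate(r, k) for r, k in zip(real, km)])  # [74.98, 84.79, 81.58]
```

Taking the ratio this way keeps the rate at or below 100% regardless of whether the algorithm overestimates or underestimates the real share.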
Comparison of the GA and the KM rates.
| Rotation | Type | Cluster 1 (GA) | Cluster 1 (KM) | Cluster 2 (GA) | Cluster 2 (KM) | Cluster 3 (GA) | Cluster 3 (KM) |
|---|---|---|---|---|---|---|---|
| RTER | Type 1 | - | - | | 11.77 | | 59.46 |
| | Type 2 | 59.97 | | | 88.40 | | 89.88 |
| | Type 3 | 50.00 | 50.00 | | 87.76 | | 42.90 |
| RTIR | Type 1 | | - | | 14.29 | | 78.13 |
| | Type 2 | | 74.98 | | 84.79 | | 81.58 |
| | Type 3 | 66.67 | 66.67 | | 82.62 | | 50.02 |
| LTER | Type 1 | | 100.00 | | 6.24 | | 57.15 |
| | Type 2 | | 76.54 | | 66.12 | | 52.68 |
| | Type 3 | 25.00 | | | 91.79 | | 61.94 |
| LTIR | Type 1 | 33.33 | 33.33 | | 21.42 | | 72.34 |
| | Type 2 | 18.33 | | | 86.85 | 80.32 | |
| | Type 3 | - | - | | 89.47 | | - |
The accuracy rates are compared in Table
As an example, in Table
For a long time, the GA has been used as a very powerful algorithm in various problems of science. To the best of the authors' knowledge, the current paper applies the GA to the tibial rotation for the first time. It was tested whether the GA would be as successful in this field as it is in a wide range of other problems. The GA has been seen to produce very effective results in predicting the tibial rotation types through the physical information. The application to the current problem helps health providers predict the type of the rotation, that is, pathological or nonpathological.
Clustering success was targeted by dividing each of the rotation values RTER, RTIR, LTER, and LTIR into pathological (Type 1 and Type 3) and nonpathological (Type 2) classes. In the present problem, the number of clusters for the genetic algorithm is given by the user. Subjects are divided into 3 clusters (Cluster 1, Cluster 2, and Cluster 3) by considering age and weight parameters. Taking these values into consideration, the effect of physical information on the tibial rotations has been investigated. The results of the GA have then been compared with those of the KM clustering algorithm. For large numbers of subjects, the GA has been found to be far more effective than the KM clustering algorithm on the current tibial problem. Note that the dataset consists mostly of subjects younger than 30 years old, so the current study may not be decisive for subjects who are older than 30.
This paper has predicted pathological subjects from a population through various physical information using the genetic algorithm. Unlike traditional approaches, the GA has managed to predict the types of the tibial rotation through several physical factors: age, weight, and height. Since the real values of each rotation type are known, the results of both the GA and the KM clustering algorithm are compared with these actual values. Clustering with the GA has been done for the first time in the prediction of tibial rotations. The simulation results have shown the superiority of the GA over its competitive counterpart, the KM clustering algorithm. The GA has been seen to be very successful in the tibial rotation data assessments with many subjects, even though the KM algorithm has a similar effect to the GA in clustering with a small number of subjects. The findings are expected to be clinically useful for health providers in organizing proper treatment programs for patients. In future research, the subjects could be divided into more clusters depending on the structure of the data; from a medical point of view, however, the structure of the current dataset limits the number of clusters. In forthcoming works, more clusterable and thus more illustrative results may be found with various datasets.
The authors declare that there are no conflicts of interest regarding the publication of this paper.