The purpose of computer simulation is to model a phenomenon or event virtually in order to predict the results of a real-life situation. It is a cost-effective and time-saving method for testing “what-if” scenarios [
In educational evaluation, simulation is used to estimate item parameters and the abilities of examinees [
The key to simulation studies is to model real-life problems as realistically as possible. To guarantee the validity and reliability of simulation results, the quality of the simulation data matters more than anything else [
As computing performance has improved, computer science techniques have been introduced into areas where simulation data is generated. To guarantee high quality in the software industry, software must be tested thoroughly with adequate test data, and the automatic generation of a test suite and its adequacy are key issues when testing a software product. Several studies in software testing have used a genetic algorithm (hereinafter GA) to generate test data automatically and to improve test performance [
The GA is a computational model based on the evolutionary process seen in the natural world. Developed by John Holland in 1975, it is a global optimization technique for solving optimization problems. The GA models biological evolution and its mechanisms in engineering terms and applies them to problem solving and learning systems.
In the field of educational evaluation, the GA is used for test-sheet composition [
The GA is known to be effective in searching for optimal solutions to NP-hard problems, and it has been used in areas dealing with item data, but not for generating item response data. In other words, previous studies have not validated generated item response data, and the GA has rarely been used for this purpose in educational evaluation. Accordingly, this paper examines whether item response data generated for simulation studies are similar to real item response data, and it proposes a method that uses the GA to generate item response data. To this end, item response data generated using Monte Carlo and the GA were compared with real item response data. Item difficulty and item discrimination, which represent the characteristics of the item response data of real examinees, were used to generate item response data. To evaluate how closely the generated item response data represent the real item response data, the item difficulty and discrimination of the generated data were compared with those of the real data.
Item response data records whether each examinee answered each item on a test sheet correctly or incorrectly. Based on test theory, item response data is used to estimate item characteristics and examinee ability. According to test theory, tests are analyzed indirectly by measuring the latent traits of people (specifically, the test takers) and of the items making up the test [
In classical test theory, analysis is based on the total test score, under the assumption that the observed score is composed of a true score and an error score. Because examinees' true scores cannot be known, the true score is estimated as the mean of the scores that would be obtained by repeating theoretically identical tests on the same examinees infinitely many times. Item difficulty and item discrimination according to classical test theory are as follows.
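The two classical statistics can be sketched in Python with NumPy. Difficulty is taken as the proportion of examinees answering each item correctly; the discrimination index shown here, the point-biserial correlation between the item score and the total score, is an assumption, and the index used in this study may differ.

```python
import numpy as np


def ctt_item_stats(responses):
    """Classical test theory item statistics for a 0/1 response matrix.

    responses: array of shape (n_examinees, n_items); 1 = correct, 0 = incorrect.
    Returns (difficulty, discrimination), one value per item.
    """
    r = np.asarray(responses, dtype=float)
    # Difficulty: proportion of examinees who answered the item correctly.
    difficulty = r.mean(axis=0)
    # Assumed discrimination index: point-biserial correlation between the
    # item score and the total test score (the item itself is included).
    total = r.sum(axis=1)
    discrimination = np.zeros(r.shape[1])
    for j in range(r.shape[1]):
        if r[:, j].std() > 0 and total.std() > 0:
            discrimination[j] = np.corrcoef(r[:, j], total)[0, 1]
        # Constant columns keep 0.0 because the correlation is undefined.
    return difficulty, discrimination


# Usage: four examinees, three items.
data = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [1, 1, 1],
                 [0, 0, 0]])
difficulty, discrimination = ctt_item_stats(data)
print(difficulty.tolist())  # [0.75, 0.5, 0.25]
```

Under this definition a higher difficulty value means an easier item, which is why random data with a 0.5 chance of a correct answer yields values near 0.50.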
Item response theory does not analyze items based on the total test score. Instead, because each item has an invariant, unique trait, it analyzes each item based on the item characteristic curve (hereinafter ICC) that describes this trait. The ICC is therefore one of the most important concepts in item response theory. The ICC is a curve indicating the probability of answering an item correctly, as shown in Figure
Item characteristic curve.
In Figure
Item response data is used for simulation studies based on item response theory (IRT). IRT is a test theory for measuring the characteristics (
In the IRT model
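Because the IRT parameters used in this study are difficulty and discrimination, the ICC can be illustrated with a two-parameter logistic (2PL) model. The sketch below assumes that model; the exact parameterization (for example, whether a scaling constant D = 1.7 is applied) is not given in this section.

```python
import numpy as np


def icc_2pl(theta, a, b, D=1.0):
    """Two-parameter logistic ICC: probability of a correct response.

    theta: ability (scalar or array); a: discrimination; b: difficulty.
    D: scaling constant; 1.0 here, 1.7 is a common alternative.
    """
    theta = np.asarray(theta, dtype=float)
    return 1.0 / (1.0 + np.exp(-D * a * (theta - b)))


# Usage with the real parameters of item 1 from the IRT table (b = 0.060, a = 0.713).
theta = np.array([-2.0, 0.060, 2.0])
probs = icc_2pl(theta, a=0.713, b=0.060)
# At theta == b the probability is exactly 0.5, and it rises with ability.
print(probs)
```

Discrimination controls how steeply the curve rises around the difficulty point, so two items with the same difficulty can still have very different ICCs.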
In contrast to deterministic algorithms, the Monte Carlo method is a randomized algorithm that uses random numbers to approximate the value of a function [
Accordingly, given a problem, the Monte Carlo method approximates the desired solution or law by using random numbers to create data and aggregating the results of a sufficiently large number of random experiments. In the field of educational evaluation, the Monte Carlo method is also used to generate item response data of examinees [
The GA is one of the techniques used for probabilistic search, learning, and optimization. It is based on two salient ideas from biology. One is Charles Darwin’s theory of survival of the fittest, that is, individuals that adapt well to nature survive and those that do not die out; the other is Mendel’s law, that is, the traits of descendants are inherited from the genes received from both parents [
To search for the optimal solution, multiple individuals, that is, candidate solutions, are first selected at random from the solution set. This initial population is then searched for a solution through iterative selection, crossover, and mutation. To select good individuals, an evaluation function measures how close each individual is to the desired solution. Selection methods include roulette wheel selection, expected-value selection, ranking selection, and tournament selection.
In a study that searches for an optimal solution with a GA, the most important problems are how to represent a solution to the problem as a single individual and how to measure how well each individual fits the desired optimal solution (i.e., how to define the evaluation function). Because the parameter values may affect the results, they are fixed in advance for the GA. The GA procedure explained so far is shown below [
Randomly select potential solutions to the problem; each becomes an individual, and together the individuals form the population.
Use the evaluation function to calculate the fitness value of each individual. If the number of iterations (
Select individuals by sampling with replacement, giving individuals with a high fitness value a higher probability of selection, to reproduce the population.
Turn the selected individuals into a population with new information through crossover and mutation.
Replace the current population with the population having the new information; then go back to Step
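The five steps above can be sketched in Python on a toy problem. The problem (OneMax: maximize the number of 1s in a bit string), the one-point crossover, and the parameter values are illustrative assumptions, not the settings of this study, whose chromosome is an item response matrix.

```python
import random


def one_max(bits):
    """Toy fitness: the number of 1s in the bit string (higher is better)."""
    return sum(bits)


def run_ga(n_bits=20, pop_size=20, generations=100, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly generate the initial population.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate every individual; stop if the optimum is reached.
        scores = [one_max(ind) for ind in pop]
        if max(scores) == n_bits:
            break
        # Step 3: sample parents with replacement, proportional to fitness.
        weights = [s + 1e-9 for s in scores]  # keeps every weight positive
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Step 4: one-point crossover on pairs, then bit-flip mutation.
        children = []
        for i in range(0, pop_size, 2):
            p1, p2 = parents[i], parents[(i + 1) % pop_size]
            cut = rng.randrange(1, n_bits)
            children.append(p1[:cut] + p2[cut:])
            children.append(p2[:cut] + p1[cut:])
        children = [[1 - g if rng.random() < p_mut else g for g in child]
                    for child in children[:pop_size]]
        # Step 5: the new population replaces the current one.
        pop = children
    return max(pop, key=one_max)


best = run_ga()
print(one_max(best), best)
```

The loop keeps no elite individual, so the best solution can occasionally be lost between generations; elitist preservation is one common remedy.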
The GA has frequently been used to search for optimal paths, to integrate data mining techniques, and to determine optimal input variables. Among these applications, studies on test data generation for software testing generate simulation data. To reduce the cost of manual software testing and, at the same time, increase the reliability of the testing process, researchers have tried to automate it [
The GA has been reported to be more efficient than random testing in generating test data, where efficiency is measured as the number of tests required to achieve full branch coverage [
In the field of educational evaluation, some studies use the GA for test sheet composition. First, Hwang et al. [
Second, Ou-Yang and Luo [
This study evaluated the validity of algorithms for generating item response data by comparing item response data generated for simulation studies with real item response data. To this end, the difficulty and discrimination of each item were first calculated from the real item response data of students and used as the criteria for comparison. The real data used in this study consist of the responses of 7,624 examinees to 36 items. To verify the effectiveness of the GA, we used four approaches to generating item response data:
RA (random method only): generating item response data using only a random method;
GARA (GA with a random initial population): randomly generating the initial population and then applying the GA, based on the real item parameters, to generate item response data;
MC (Monte Carlo only): generating item response data using only Monte Carlo, based on the real item parameters;
GAMC (GA with a Monte Carlo initial population): generating the initial population with Monte Carlo based on the real item parameters and then applying the GA to generate item response data.
The four approaches to generating item response data are shown in Figure
The four approaches to generating item response data.
To check if the generated item response data is similar to real item response data, root mean square error (hereinafter RMSE) was used as the measure for difficulty and discrimination based on classical test theory, and Kullback-Leibler divergence (hereinafter KLD) was used as the measure for difficulty and discrimination based on item response theory. The evaluation method is described in detail in Section
The purpose of this paper is to generate item response data most similar to actual data. The first step of using the GA to find the optimal solution is to define the chromosome structure. As the optimal solution found by the GA is the item response data, it must be possible to express the item response data as a chromosome structure [
Item response data and chromosome structure design.
The item response data is based on the responses made by
The fitness function is the index used to judge the quality of the item response data generated by the GA. Most studies that apply GAs to item data have used item difficulty and item discrimination, that is, information on item characteristics, in their fitness functions [
This study defined the fitness function as the sum of the discrimination error and difficulty error as follows:
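Assuming that each error term is the RMSE between the generated and the real classical item parameters, and reusing the assumed point-biserial discrimination index, the fitness can be sketched as follows. Lower values mean the generated data are closer to the real data; the function names are hypothetical.

```python
import numpy as np


def item_stats(responses):
    """Difficulty (proportion correct) and an assumed point-biserial discrimination."""
    r = np.asarray(responses, dtype=float)
    difficulty = r.mean(axis=0)
    total = r.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(r[:, j], total)[0, 1]
        if r[:, j].std() > 0 and total.std() > 0 else 0.0
        for j in range(r.shape[1])
    ])
    return difficulty, discrimination


def rmse(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def fitness(responses, real_difficulty, real_discrimination):
    """Hypothetical fitness: difficulty error + discrimination error (lower is better)."""
    difficulty, discrimination = item_stats(responses)
    return rmse(difficulty, real_difficulty) + rmse(discrimination, real_discrimination)


# Usage: data compared with its own parameters has zero error.
real = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]])
real_p, real_r = item_stats(real)
print(fitness(real, real_p, real_r))  # 0.0
```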
In general, the size of the initial population varies with the complexity of the problem to be solved. In this study, the initial population size was set to 20 sets of item response data. The two methods of setting initial chromosomes for experiments are
The algorithm for randomly generating the initial population accepts the number of items, the number of examinees, and the probability of answering an item correctly as input values. Random values are generated in a
Input: number of items, number of examinees, probability of a correct answer. Output: item response data.
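A minimal NumPy sketch of this random initialization follows. It assumes that each cell is set to 1 when a uniform draw in [0, 1) falls below the given probability of a correct answer, that each chromosome is an examinees × items matrix, and that the probability defaults to 0.5; the function names are hypothetical.

```python
import numpy as np


def random_responses(n_items, n_examinees, p_correct, rng):
    """One chromosome: a 0/1 matrix of shape (n_examinees, n_items)."""
    # Each response is correct (1) when a uniform draw falls below p_correct.
    return (rng.random((n_examinees, n_items)) < p_correct).astype(np.int8)


def random_population(pop_size, n_items, n_examinees, p_correct=0.5, seed=0):
    """Initial population of pop_size randomly generated response matrices.

    p_correct=0.5 is an assumed default, not a value stated in this section.
    """
    rng = np.random.default_rng(seed)
    return [random_responses(n_items, n_examinees, p_correct, rng)
            for _ in range(pop_size)]


# Usage with this study's sizes: 20 individuals, 36 items, 7,624 examinees.
population = random_population(20, 36, 7624)
print(len(population), population[0].shape)  # 20 (7624, 36)
```

Because every cell is drawn independently with the same probability, such data carry no link between an examinee's ability and an item's difficulty.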
The Monte Carlo method accepts the ability of examinees
Input: number of items, examinees' abilities. Output: item response data.
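A hedged NumPy sketch of Monte Carlo generation follows. It assumes a 2PL model and abilities drawn from a standard normal distribution, neither of which is specified in this section; a response is scored correct when a uniform random number falls below the model probability.

```python
import numpy as np


def monte_carlo_responses(theta, a, b, rng):
    """Simulate a 0/1 matrix of shape (n_examinees, n_items) under an assumed 2PL model.

    theta: abilities, shape (n_examinees,); a, b: discrimination and difficulty,
    shape (n_items,).
    """
    theta = np.asarray(theta, dtype=float)[:, None]  # column vector of abilities
    a = np.asarray(a, dtype=float)[None, :]
    b = np.asarray(b, dtype=float)[None, :]
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))       # P(correct) for every cell
    # Compare each probability with a uniform draw in [0, 1).
    return (rng.random(p.shape) < p).astype(np.int8)


# Usage: real IRT parameters of items 1-3; abilities assumed to be N(0, 1).
rng = np.random.default_rng(1)
theta = rng.standard_normal(7624)
a = [0.713, 0.762, 1.022]
b = [0.060, 0.826, -0.247]
data = monte_carlo_responses(theta, a, b, rng)
print(data.mean(axis=0))  # proportion correct per item
```

Unlike purely random data, the easier item (lower b) here receives a visibly higher proportion of correct answers.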
The process of the GA is shown in Figure
Flowchart of a GA.
Selection operators model natural selection: well-adapted individuals survive and produce the next generation, while ill-adapted individuals die out. Selection is based on the fitness function, and although there are several selection methods, the basic principle is that individuals with higher fitness have more opportunities to pass their genes on to the next generation. Widely used selection methods include fitness proportionate (roulette wheel) selection, expected-value selection, tournament selection, and elitist preservation selection [
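Roulette wheel selection can be sketched as follows. Because the fitness in this study is an error (lower is better), the sketch converts it to a selection weight with 1 / (1 + error); that mapping is an assumption, as the selection method used in this study is not stated in this section.

```python
import numpy as np


def roulette_wheel_select(population, errors, k, rng):
    """Fitness-proportionate selection with replacement.

    errors: one value per individual, lower is better; mapped to weights
    with 1 / (1 + error) (an assumed transformation).
    """
    weights = 1.0 / (1.0 + np.asarray(errors, dtype=float))
    probs = weights / weights.sum()
    idx = rng.choice(len(population), size=k, replace=True, p=probs)
    return [population[i] for i in idx]


# Usage: individual "A" has a far smaller error, so it dominates the sample.
rng = np.random.default_rng(0)
picked = roulette_wheel_select(["A", "B", "C"], [0.0, 1e6, 1e6], k=100, rng=rng)
print(picked.count("A"))
```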
Crossover generates new individuals by partially exchanging chromosomes between two individuals, in the expectation that it will produce individuals with better solutions. In general, crossover is performed by exchanging some genes between individuals, and it can be defined differently depending on how the genes are coded. This study adopted multipoint crossover, which uses two or more crossover points; here, two columns are randomly chosen as crossover points. Figure
Crossover operation between two chromosomes.
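Two-point crossover over item columns can be sketched as follows, assuming each chromosome is an examinees × items matrix and that the whole block of columns between the two crossover points is exchanged; the function name is hypothetical.

```python
import numpy as np


def two_point_column_crossover(parent1, parent2, rng):
    """Exchange the item columns between two random crossover points.

    parent1, parent2: 0/1 arrays of shape (n_examinees, n_items).
    """
    n_items = parent1.shape[1]
    # Two distinct cut positions in 0..n_items; columns c1..c2-1 are swapped.
    c1, c2 = sorted(rng.choice(n_items + 1, size=2, replace=False))
    child1, child2 = parent1.copy(), parent2.copy()
    child1[:, c1:c2] = parent2[:, c1:c2]
    child2[:, c1:c2] = parent1[:, c1:c2]
    return child1, child2


# Usage: an all-0 and an all-1 parent make the exchanged block easy to see.
rng = np.random.default_rng(0)
p1 = np.zeros((4, 6), dtype=np.int8)
p2 = np.ones((4, 6), dtype=np.int8)
child_a, child_b = two_point_column_crossover(p1, p2, rng)
print(child_a)
```

Swapping whole columns keeps every item's responses from a single parent, so each child inherits complete item-level response patterns.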
As the population is regenerated repeatedly, its individuals become similar to one another, so crossover may at times fail to produce new individuals. Mutation makes up for this limitation of crossover. Mutation applies a given mutation probability to the genes of each individual and changes the value of the alleles; it is a kind of local random search that generates new individuals. Specifically, mutation randomly picks a point in the chromosome and changes its value: a selected 0 becomes 1, and a selected 1 becomes 0. This is illustrated in Figure
Mutation operation.
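Bit-flip mutation with a per-gene probability can be sketched as follows. The mutation probability used in this study is not given in this section, so the value in the usage line is illustrative.

```python
import numpy as np


def bit_flip_mutation(chromosome, p_mut, rng):
    """Flip each 0/1 gene independently with probability p_mut (returns a new array)."""
    mask = rng.random(chromosome.shape) < p_mut
    return np.where(mask, 1 - chromosome, chromosome).astype(chromosome.dtype)


rng = np.random.default_rng(0)
genes = np.zeros((3, 4), dtype=np.int8)
mutated = bit_flip_mutation(genes, 0.25, rng)
print(mutated)
```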
Classical test theory uses examinees’ total scores to analyze items. The procedure is fairly straightforward and the estimates are easy to calculate, but it has a weakness: the item parameters (difficulty and discrimination) vary with the characteristics of the group of examinees. In item response theory, by contrast, the estimated item parameters do not change with the group of examinees, so the unique characteristics of each item are revealed and the precision of item analysis and of ability estimation is enhanced. Accordingly, this study used both item parameters based on classical test theory and item parameters based on item response theory to measure the accuracy of the algorithm.
Item analysis based on classical test theory and on item response theory was conducted on the item response data produced by the four methods (RA, GARA, MC, and GAMC), and the difficulty and discrimination of each item were obtained. We compared the generated item parameters with the real item parameters in two experiments to evaluate the item response data generated by the four approaches.
As a method of comparing item parameters,
We adopted RMSE to compare item parameters based on classical test theory. RMSE measures the differences between estimated values (i.e., values predicted by a model) and values observed in real-life situations. Each difference is called a residual, and RMSE aggregates the residuals into a single measure [
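For reference, the standard definition of RMSE over n items is given below, where x_i is the real parameter of item i and \hat{x}_i is the corresponding parameter of the generated data (n = 36 here). Whether this study applies exactly this normalization is not shown in this section.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^{2}}
```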
We adopted KLD as the measure for comparing item parameters based on item response theory. As with classical test theory, RMSE could also be used to compare the real and estimated values of item parameters based on item response theory. However, unlike item parameters based on classical test theory, item parameters based on item response theory define a probability of a correct answer that depends on examinee ability. Comparing the probability distributions obtained from the parameters, rather than only the parameter values, takes examinee ability into account and therefore allows a more accurate comparison. KLD is used to calculate the difference between two probability distributions [
RMSE compares the differences between two finite sets of discrete elements, so its value depends not only on the difference between each pair of elements but also on the number of elements. Because the ICC is a continuous function of ability, however, it has infinitely many points, so it must be converted into a discrete random variable for computation. The discrete approximation comes closer to the continuous curve as the number of sample points grows, but the computation cost grows proportionately. In this study, KLD was calculated at 400 sample points spaced 0.02 apart between −4 and 4 to handle the ICC as a discrete random variable. Because RMSE and KLD use different numbers of sample points, their values are not compared with each other; each measure is used only to compare relative differences from the standard value within that measure.
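How the ICCs are turned into distributions is not spelled out in this section. One reading, assumed in the sketch below, treats the real and generated probabilities of a correct answer at each of the 400 ability points as Bernoulli distributions and sums the pointwise KL divergences; a 2PL model without a scaling constant is also assumed.

```python
import numpy as np


def icc_2pl(theta, a, b):
    """Assumed 2PL ICC without a scaling constant."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def icc_kld(a_real, b_real, a_gen, b_gen, eps=1e-12):
    """Sum of pointwise Bernoulli KL divergences between two ICCs.

    The ability scale is discretized into 400 points spaced 0.02 apart on [-4, 4).
    """
    theta = np.linspace(-4.0, 4.0, 400, endpoint=False)
    p = np.clip(icc_2pl(theta, a_real, b_real), eps, 1 - eps)  # real curve
    q = np.clip(icc_2pl(theta, a_gen, b_gen), eps, 1 - eps)    # generated curve
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(kl.sum())


# Usage: item 1, real parameters versus the RA and GAMC estimates in the IRT table.
kld_ra = icc_kld(0.713, 0.060, 0.148, 0.200)
kld_gamc = icc_kld(0.713, 0.060, 0.790, 0.050)
print(kld_ra, kld_gamc)
```

Since the sum runs over all grid points, the value grows with the number of points, which is one reason KLD values should only be compared with other KLD values.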
Table
Real classical test theory parameters and parameters obtained by RA, GARA, MC, and GAMC.
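Item number | Real difficulty | Real discrimination | RA difficulty | RA discrimination | GARA difficulty | GARA discrimination | MC difficulty | MC discrimination | GAMC difficulty | GAMC discrimination |
---|---|---|---|---|---|---|---|---|---|---|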
1 | 0.50 | 0.37 | 0.49 | 0.21 | 0.49 | 0.21 | 0.51 | 0.49 | 0.50 | 0.39 |
2 | 0.38 | 0.37 | 0.50 | 0.22 | 0.50 | 0.21 | 0.32 | 0.46 | 0.36 | 0.39 |
3 | 0.57 | 0.46 | 0.50 | 0.22 | 0.50 | 0.22 | 0.59 | 0.61 | 0.56 | 0.45 |
4 | 0.51 | 0.36 | 0.49 | 0.18 | 0.49 | 0.21 | 0.52 | 0.48 | 0.51 | 0.29 |
5 | 0.45 | 0.33 | 0.50 | 0.23 | 0.50 | 0.22 | 0.43 | 0.44 | 0.45 | 0.34 |
6 | 0.74 | 0.46 | 0.50 | 0.19 | 0.50 | 0.20 | 0.80 | 0.56 | 0.77 | 0.48 |
7 | 0.72 | 0.38 | 0.49 | 0.20 | 0.49 | 0.21 | 0.80 | 0.48 | 0.75 | 0.39 |
8 | 0.62 | 0.57 | 0.51 | 0.19 | 0.51 | 0.19 | 0.64 | 0.69 | 0.61 | 0.58 |
9 | 0.42 | 0.35 | 0.50 | 0.23 | 0.50 | 0.23 | 0.39 | 0.48 | 0.42 | 0.37 |
10 | 0.75 | 0.43 | 0.50 | 0.21 | 0.50 | 0.23 | 0.81 | 0.51 | 0.75 | 0.40 |
11 | 0.67 | 0.53 | 0.49 | 0.26 | 0.49 | 0.26 | 0.70 | 0.65 | 0.66 | 0.53 |
12 | 0.66 | 0.49 | 0.50 | 0.19 | 0.50 | 0.18 | 0.70 | 0.61 | 0.67 | 0.50 |
13 | 0.66 | 0.49 | 0.49 | 0.21 | 0.49 | 0.21 | 0.70 | 0.62 | 0.65 | 0.49 |
14 | 0.59 | 0.51 | 0.49 | 0.22 | 0.49 | 0.22 | 0.61 | 0.65 | 0.59 | 0.53 |
15 | 0.62 | 0.44 | 0.50 | 0.23 | 0.50 | 0.22 | 0.66 | 0.58 | 0.62 | 0.44 |
16 | 0.68 | 0.52 | 0.49 | 0.23 | 0.49 | 0.24 | 0.73 | 0.65 | 0.69 | 0.52 |
17 | 0.75 | 0.59 | 0.50 | 0.19 | 0.50 | 0.18 | 0.78 | 0.69 | 0.73 | 0.59 |
18 | 0.63 | 0.51 | 0.50 | 0.21 | 0.50 | 0.21 | 0.66 | 0.64 | 0.63 | 0.52 |
19 | 0.43 | 0.42 | 0.50 | 0.23 | 0.50 | 0.23 | 0.41 | 0.53 | 0.43 | 0.44 |
20 | 0.79 | 0.50 | 0.49 | 0.23 | 0.49 | 0.23 | 0.83 | 0.58 | 0.81 | 0.51 |
21 | 0.46 | 0.47 | 0.49 | 0.23 | 0.49 | 0.23 | 0.44 | 0.58 | 0.44 | 0.47 |
22 | 0.41 | 0.34 | 0.49 | 0.20 | 0.49 | 0.20 | 0.38 | 0.45 | 0.41 | 0.35 |
23 | 0.58 | 0.52 | 0.50 | 0.22 | 0.50 | 0.21 | 0.60 | 0.66 | 0.58 | 0.53 |
24 | 0.73 | 0.56 | 0.50 | 0.21 | 0.50 | 0.20 | 0.77 | 0.67 | 0.72 | 0.52 |
25 | 0.76 | 0.59 | 0.49 | 0.21 | 0.49 | 0.22 | 0.79 | 0.67 | 0.61 | 0.30 |
26 | 0.52 | 0.41 | 0.50 | 0.18 | 0.50 | 0.18 | 0.52 | 0.55 | 0.52 | 0.37 |
27 | 0.76 | 0.61 | 0.50 | 0.23 | 0.50 | 0.27 | 0.78 | 0.71 | 0.74 | 0.55 |
28 | 0.48 | 0.32 | 0.49 | 0.21 | 0.49 | 0.21 | 0.47 | 0.44 | 0.47 | 0.34 |
29 | 0.57 | 0.56 | 0.49 | 0.21 | 0.49 | 0.23 | 0.59 | 0.70 | 0.57 | 0.57 |
30 | 0.73 | 0.59 | 0.49 | 0.21 | 0.49 | 0.20 | 0.76 | 0.70 | 0.72 | 0.59 |
31 | 0.67 | 0.60 | 0.50 | 0.24 | 0.50 | 0.24 | 0.69 | 0.73 | 0.67 | 0.61 |
32 | 0.77 | 0.56 | 0.49 | 0.19 | 0.49 | 0.20 | 0.81 | 0.66 | 0.77 | 0.55 |
33 | 0.58 | 0.57 | 0.49 | 0.23 | 0.49 | 0.23 | 0.60 | 0.70 | 0.58 | 0.58 |
34 | 0.66 | 0.50 | 0.50 | 0.21 | 0.50 | 0.21 | 0.69 | 0.64 | 0.66 | 0.51 |
35 | 0.58 | 0.54 | 0.50 | 0.20 | 0.50 | 0.20 | 0.60 | 0.68 | 0.58 | 0.56 |
36 | 0.52 | 0.56 | 0.50 | 0.23 | 0.50 | 0.23 | 0.53 | 0.69 | 0.53 | 0.57 |
RMSE | | | 4.95 | 9.69 | 4.95 | 9.61 | 1.07 | 4.25 | 0.41 | 0.84 |
Sum of RMSEs | | | 14.64 | | 14.56 | | 5.32 | | 1.25 | |
When Monte Carlo was used to set the initial population and the GA was then applied (GAMC), the RMSE of difficulty fell from 1.07 (MC) to 0.41, and that of discrimination fell from 4.25 to 0.84. The GA thus moved these values toward the real item parameters. In other words, starting from item response data generated by Monte Carlo, crossover and mutation produced data whose item parameters are closer to those of the real item response data.
In item response theory, the ICC can be derived from the difficulty and discrimination values. The ICC gives the probability that an examinee answers an item correctly as a function of ability, and its shape is determined by difficulty and discrimination. Figure
Real item response theory parameters and parameters obtained by RA, GARA, MC, and GAMC.
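Item number | Real difficulty | Real discrimination | RA difficulty | RA discrimination | GARA difficulty | GARA discrimination | MC difficulty | MC discrimination | GAMC difficulty | GAMC discrimination |
---|---|---|---|---|---|---|---|---|---|---|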
1 | 0.060 | 0.713 | 0.200 | 0.148 | 0.198 | 0.149 | 1.114 | 0.210 | 0.050 | 0.790 |
2 | 0.826 | 0.762 | 0.075 | 0.191 | 0.077 | 0.185 | 1.234 | 0.173 | 0.774 | 0.891 |
3 | −0.247 | 1.022 | 0.097 | 0.186 | 0.097 | 0.196 | 1.635 | 0.183 | −0.260 | 0.988 |
4 | −0.011 | 0.675 | 0.232 | 0.130 | 0.120 | 0.172 | 1.057 | 0.241 | −0.013 | 0.523 |
5 | 0.399 | 0.633 | 0.046 | 0.210 | 0.050 | 0.192 | 0.968 | 0.147 | 0.374 | 0.668 |
6 | −1.110 | 1.143 | 0.047 | 0.145 | 0.048 | 0.142 | 1.733 | 0.181 | −1.221 | 1.229 |
7 | −1.251 | 0.842 | 0.228 | 0.160 | 0.165 | 0.179 | 1.310 | 0.174 | −1.439 | 0.880 |
8 | −0.392 | 1.507 | −0.228 | 0.127 | −0.211 | 0.137 | 2.209 | 0.161 | −0.378 | 1.567 |
9 | 0.583 | 0.678 | 0.023 | 0.186 | 0.020 | 0.214 | 1.160 | 0.192 | 0.513 | 0.767 |
10 | −1.204 | 1.046 | 0.019 | 0.165 | −0.017 | 0.185 | 1.492 | 0.223 | −1.363 | 0.935 |
11 | −0.644 | 1.321 | 0.151 | 0.246 | 0.145 | 0.257 | 2.001 | 0.179 | −0.650 | 1.314 |
12 | −0.674 | 1.132 | 0.150 | 0.130 | 0.153 | 0.127 | 1.731 | 0.215 | −0.721 | 1.192 |
13 | −0.673 | 1.138 | 0.153 | 0.165 | 0.153 | 0.165 | 1.761 | 0.219 | −0.662 | 1.151 |
14 | −0.345 | 1.183 | 0.314 | 0.189 | 0.309 | 0.192 | 1.865 | 0.164 | −0.348 | 1.282 |
15 | −0.530 | 0.971 | 0.097 | 0.186 | 0.101 | 0.177 | 1.503 | 0.202 | −0.564 | 0.950 |
16 | −0.705 | 1.339 | 0.246 | 0.161 | 0.244 | 0.163 | 2.004 | 0.206 | −0.764 | 1.281 |
17 | −0.878 | 1.888 | 0.044 | 0.145 | 0.048 | 0.131 | 2.696 | 0.193 | −0.861 | 1.717 |
18 | −0.489 | 1.214 | 0.055 | 0.174 | 0.054 | 0.176 | 1.859 | 0.202 | −0.497 | 1.277 |
19 | 0.471 | 0.851 | 0.102 | 0.192 | 0.101 | 0.194 | 1.371 | 0.203 | 0.390 | 1.015 |
20 | −1.170 | 1.521 | 0.116 | 0.196 | 0.115 | 0.199 | 2.114 | 0.196 | −1.315 | 1.495 |
21 | 0.287 | 1.037 | 0.185 | 0.192 | 0.176 | 0.195 | 1.646 | 0.212 | 0.304 | 1.126 |
22 | 0.661 | 0.668 | 0.185 | 0.139 | 0.183 | 0.141 | 1.067 | 0.138 | 0.607 | 0.708 |
23 | −0.277 | 1.245 | −0.071 | 0.209 | −0.071 | 0.208 | 1.928 | 0.166 | −0.284 | 1.284 |
24 | −0.870 | 1.629 | 0.128 | 0.152 | 0.131 | 0.149 | 2.367 | 0.132 | −0.914 | 1.336 |
25 | −0.900 | 1.989 | 0.119 | 0.187 | 0.113 | 0.197 | 2.705 | 0.189 | −0.836 | 0.528 |
26 | −0.029 | 0.841 | 0.159 | 0.126 | 0.146 | 0.137 | 1.360 | 0.215 | −0.068 | 0.738 |
27 | −0.875 | 2.170 | 0.088 | 0.187 | 0.061 | 0.300 | 3.053 | 0.701 | −0.930 | 1.505 |
28 | 0.237 | 0.585 | 0.257 | 0.158 | 0.260 | 0.156 | 0.940 | 0.191 | 0.259 | 0.659 |
29 | −0.229 | 1.436 | 0.221 | 0.182 | 0.153 | 0.211 | 2.278 | 0.223 | −0.223 | 1.469 |
30 | −0.804 | 1.897 | 0.347 | 0.171 | 0.350 | 0.169 | 2.733 | 0.241 | −0.785 | 1.679 |
31 | −0.570 | 1.813 | 0.028 | 0.225 | 0.029 | 0.222 | 2.723 | 0.189 | −0.574 | 1.707 |
32 | −0.984 | 1.872 | 0.166 | 0.130 | 0.163 | 0.132 | 2.687 | 0.189 | −1.076 | 1.567 |
33 | −0.242 | 1.476 | 0.160 | 0.208 | 0.164 | 0.204 | 2.326 | 0.254 | −0.240 | 1.541 |
34 | −0.629 | 1.198 | 0.092 | 0.178 | 0.093 | 0.176 | 1.864 | 0.191 | −0.662 | 1.210 |
35 | −0.259 | 1.355 | −0.030 | 0.160 | −0.030 | 0.158 | 2.104 | 0.205 | −0.251 | 1.407 |
36 | −0.024 | 1.446 | 0.012 | 0.172 | 0.012 | 0.180 | 2.292 | 0.213 | −0.045 | 1.520 |
KLD | | | 810.08 | | 779.37 | | 88.29 | | 55.97 | |
ICC graph for item number 1.
Even when the GA was applied, a randomly set initial population (GARA) gave results barely better than RA, and these two approaches were the least accurate under both classical test theory and item response theory. If the initial population of a GA is good, the probability of finding the optimal solution increases [
Table
Variation of fitness function value.
Generation | GARA | GAMC |
---|---|---|
1 | 16.2200 | 5.2779 |
10 | 16.2180 | 2.9801 |
20 | 16.2157 | 2.0759 |
30 | 16.2124 | 1.9571 |
40 | 16.2108 | 1.8969 |
50 | 16.2072 | 1.8909 |
60 | 16.2051 | 1.8890 |
70 | 16.2024 | 1.8890 |
80 | 16.2011 | 1.8890 |
90 | 16.2008 | 1.8890 |
100 | 16.1974 | 1.8890 |
Fitness value change by generation.
In practice, examinees' scores typically follow an approximately normal distribution: many examinees score near the mean, and fewer examinees obtain scores far above or below it. The real item response data therefore contain examinees with diverse scores. However, if the population is initialized randomly, the responses are drawn from a uniform distribution, which increases uncertainty. Saroj et al. [
The purpose of this study is to demonstrate the effectiveness of the GA in generating item response data. To this end, we compared item response data generated with the conventional Monte Carlo method with item response data generated with the GA. As comparison methods, we used RMSE for item parameters based on classical test theory and KLD for item parameters based on item response theory, in order to compare the differences between probability distributions.
The experimental results showed that the GA can effectively generate item response data whose item parameters are similar to the real item parameters. However, when the initial population was set up randomly, convergence was slow even with the GA. Because the random method does not guarantee the diversity of genes in two-dimensional item response data, the running cost of finding optimal solutions increases. When the GA is applied to generating item response data, setting up the initial population with Monte Carlo and then applying the GA proved most effective. In other words, the GA can be viewed as refining the item response data generated by Monte Carlo toward an optimal solution. In this study, data closest to the real item response data were obtained by using Monte Carlo to generate the initial population and the GA to refine it. This study is meaningful in showing that the GA contributes to generating more realistic data for simulation.
The authors declare that there is no conflict of interests regarding the publication of this paper.