Evaluating measurement equivalence (also known as differential item functioning (DIF)) is an important part of the process of validating psychometric questionnaires. This study aimed to evaluate the multiple indicators multiple causes (MIMIC) model for DIF detection when the latent construct distribution is nonnormal and the focal group sample size is small. In this simulation-based study, Type I error rates and power of the MIMIC model for detecting uniform-DIF were investigated under different combinations of reference to focal group sample size ratio, magnitude of the uniform-DIF effect, scale length, number of response categories, and latent trait distribution. Moderate and high skewness in the latent trait distribution led to decreases of 0.33% and 0.47%, respectively, in the power of the MIMIC model for detecting uniform-DIF. Increasing the scale length, the number of response categories, and the magnitude of DIF improved the power of the MIMIC model by 3.47%, 4.83%, and 20.35%, respectively, and decreased its Type I error by 2.81%, 5.66%, and 0.04%, respectively. This study revealed that the power of the MIMIC model was at an acceptable level when latent trait distributions were skewed. However, the empirical Type I error rate was slightly greater than the nominal significance level. Consequently, the MIMIC model is recommended for detection of uniform-DIF when the latent construct distribution is nonnormal and the focal group sample size is small.
In recent years, differential item functioning (DIF) analysis, also referred to as the assessment of measurement equivalence, has been widely used to validate psychological assessment instruments such as quality of life [
Several methods have been developed for identifying DIF in test items. DIF detection methods are either parametric or nonparametric. Mantel-Haenszel, standardization, and the simultaneous item bias test are important nonparametric methods, while item response theory, logistic and ordinal logistic regression, multiple-group analysis, and multiple indicators multiple causes (MIMIC) are important parametric methods for DIF testing [
Multiple-group analysis and MIMIC are two structural equation modeling approaches that have been widely used to assess DIF by many applied researchers [
Previous simulation studies have investigated some properties of the MIMIC model in DIF detection, including the structure of the data, the scale of the response categories, the DIF pattern, and differences in the mean and variance of the latent trait distribution [
Typically, in DIF testing of medical studies, two groups are labeled as the reference and focal groups, and patients are often placed in the latter group. A common problem in medical and psychological studies is small sample size, particularly in the focal group, because access to patients, especially those with rare diseases, is difficult. Furthermore, a small sample size saves time and money [
Skewness of the latent trait (also referred to as the latent construct) distribution is an important point that needs to be considered in DIF detection [
A Monte Carlo simulation study is an essential tool for assessing the behavior of the MIMIC model under various conditions. This study is the first simulation-based investigation to assess the MIMIC model for DIF detection when the latent construct distribution is nonnormal and the focal sample size is small. We discuss the advantages and disadvantages of the model through a series of simulations.
Two types of DIF can be identified and denoted as uniform and nonuniform [
Uniform-DIF detection with the MIMIC model is performed by regressing the potential DIF items and the latent variable (
A MIMIC model for detecting uniform-DIF in a 5-item scale. Rectangles are observed variables; circles are latent variables;
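A common way to write the uniform-DIF MIMIC model described above is sketched below. The notation is an assumption and is not taken from the original figure: y*_ij is the continuous latent response underlying person i's ordinal answer to item j, eta_i is the latent trait, z_i is a 0/1 group indicator (reference versus focal), lambda_j is a factor loading, and epsilon_ij and zeta_i are residuals.

```latex
\begin{align}
y^{*}_{ij} &= \lambda_j \, \eta_i + \beta_j \, z_i + \varepsilon_{ij}, \\
\eta_i     &= \gamma \, z_i + \zeta_i .
\end{align}
```

Under this formulation, item j shows uniform-DIF when \beta_j \neq 0, that is, when group membership shifts the item response even after conditioning on the latent trait, while \gamma captures the group difference in the latent mean.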
Ordinal responses were generated from the graded response model (GRM) [
In this study, we assumed two groups, labeled as the reference and focal groups. Five factors were investigated in this simulation study: reference to focal group sample size ratio, magnitude of the uniform-DIF effect, scale length, number of response categories, and latent trait distribution.
The sample size ratio between the reference and focal groups was set at R100/F100, R200/F100, R300/F100, R400/F100, and R500/F100. Medium and severe uniform-DIF were also simulated by adding 0.5 and 1 to
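The data-generation step can be illustrated with a minimal sketch, assuming the usual logistic form of the graded response model and uniform-DIF introduced as a constant shift of all thresholds for the focal group (the approach the Discussion describes). The discrimination value, the thresholds, the seed, and the function names are hypothetical choices for illustration and are not the parameters used in the study.

```python
import math
import random


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    Uses P(Y >= k) = 1 / (1 + exp(-a * (theta - b_k))) for ascending
    thresholds b_k, then differences adjacent cumulative probabilities.
    """
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def simulate_item(thetas, a, thresholds, dif_shift=0.0, rng=random):
    """Draw one ordinal response per person; dif_shift is added to every threshold."""
    shifted = [b + dif_shift for b in thresholds]
    responses = []
    for theta in thetas:
        probs = grm_category_probs(theta, a, shifted)
        responses.append(rng.choices(range(len(probs)), weights=probs)[0])
    return responses


rng = random.Random(2017)                 # arbitrary seed
thresholds = [-1.5, -0.5, 0.5, 1.5]       # hypothetical thresholds, 5 categories
focal_thetas = [rng.gauss(0.0, 1.0) for _ in range(100)]

no_dif = simulate_item(focal_thetas, 1.5, thresholds, 0.0, rng)
medium_dif = simulate_item(focal_thetas, 1.5, thresholds, 0.5, rng)
print(sum(no_dif) / len(no_dif), sum(medium_dif) / len(medium_dif))
```

In this sketch a positive shift makes the higher categories harder to endorse for the focal group at the same trait level, which is the sense in which the DIF is uniform.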
Distributions intended for the latent trait in the reference and focal groups.
Condition  Ability distribution (reference group)  Ability distribution (focal group)
1  N (0, 1)  Beta (1, 4)
2  N (0, 1)  Beta (0.5, 4)
3  N (0, 1)  Beta (4, 1)
4  N (0, 1)  Beta (4, 0.5)
5  Beta (1, 4)  Beta (1, 4)
6  Beta (0.5, 4)  Beta (0.5, 4)
7  Beta (4, 1)  Beta (4, 1)
8  Beta (4, 0.5)  Beta (4, 0.5)
9  Beta (1, 4)  Beta (4, 1)
10  Beta (0.5, 4)  Beta (4, 0.5)
11  Beta (4, 1)  Beta (1, 4)
12  Beta (4, 0.5)  Beta (0.5, 4)
13  N (0, 1)  N (0, 1)
In total, we generated 780 (
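The count of 780 is consistent with a full crossing of the levels of the five factors listed above. The short sketch below enumerates that grid; the variable names are illustrative, and the factor levels are those stated in the text and tables.

```python
from itertools import product

sample_size_ratios = ["R100/F100", "R200/F100", "R300/F100", "R400/F100", "R500/F100"]
dif_magnitudes = [0.5, 1.0]                     # medium and severe uniform-DIF
scale_lengths = [5, 10]                         # number of items
response_categories = [3, 5, 7]
distribution_conditions = list(range(1, 14))    # conditions 1 to 13

design = list(product(sample_size_ratios, dif_magnitudes, scale_lengths,
                      response_categories, distribution_conditions))
print(len(design))
```

Holding the DIF magnitude at a single value leaves 5 x 2 x 3 x 13 = 390 cells, which matches the number of scenarios reported for the small-DIF runs.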
Nonconvergence is one of the most common problems during estimation of the MIMIC model. Small sample size, matrices that are not positive definite, and out-of-bounds estimates are three important causes of nonconvergence in the MIMIC model [
Statistical power was defined as the proportion of replications in which DIF was correctly identified by the MIMIC method. For calculating the power, we assumed that item 1 has uniform-DIF. The Type I error rate, also referred to as the false positive rate, was assessed by the proportion of times that DIF was incorrectly identified in the 1000 replications [
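These two definitions amount to simple proportions over replications, as the following sketch shows. The flag counts are made up for illustration and are not results from the study. The Monte Carlo standard error uses the usual binomial formula and is an addition here, not something the study reports; it only indicates how much a rate estimated from 1000 replications can fluctuate.

```python
import math


def rejection_rate(flags):
    """Proportion of replications in which the item was flagged as showing DIF."""
    return sum(flags) / len(flags)


def monte_carlo_se(rate, n_replications):
    """Binomial standard error of a rejection rate estimated from replications."""
    return math.sqrt(rate * (1.0 - rate) / n_replications)


# Hypothetical outcomes of 1000 replications (illustrative counts only).
dif_item_flags = [True] * 842 + [False] * 158     # item simulated with uniform-DIF
clean_item_flags = [True] * 54 + [False] * 946    # item simulated without DIF

power = rejection_rate(dif_item_flags)
type_one_error = rejection_rate(clean_item_flags)
se_at_nominal = monte_carlo_se(0.05, 1000)
print(power, type_one_error, round(se_at_nominal, 4))
```

With 1000 replications the standard error at a true rate of 0.05 is about 0.007, so small departures of an empirical Type I error rate from 0.05 are within sampling noise.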
The catIrt and lavaan packages in R (version 3.21) were used to generate data from the GRM and to fit the MIMIC model for DIF testing, respectively [
As the sample size increased, the power of the MIMIC model increased systematically; however, Type I error showed no pattern. The results of this study showed that when the latent trait distribution in the reference group was standard normal, or when the latent trait distributions in the reference and focal groups were the same, a total sample size of 500 for graded items with 3 ordered response categories (R400/F100) and 300 for items with 5 and 7 response categories (R200/F100) suffices. Refer to Tables
When other circumstances stayed fixed, an increase in the magnitude of DIF improved the power of the MIMIC model in detecting uniform-DIF: 20.35% in total, and 24.28% and 16.42% when the magnitude of DIF increased from medium to severe in the 5-item and 10-item scales, respectively. In this situation, Type I error did not change significantly.
Increasing the scale length from 5 to 10 items increased the power of the MIMIC model for detecting uniform-DIF by approximately 3.47%. According to our results, increasing the number of items from 5 to 10 improved the power of the MIMIC model for detecting medium uniform-DIF: 6.79% in total, 8.78% for the 3-point response scale, 5.90% for the 5-point response scale, and 5.71% for the 7-point response scale. In this situation, the Type I error rate changed slightly (2.76%).
Increasing the number of items from 5 to 10 decreased the Type I error rate of the MIMIC model for detecting severe uniform-DIF: 2.87% in total, 1.56% for the 3-point response scale, 1.01% for the 5-point response scale, and 6.03% for the 7-point response scale. In this circumstance, the power changed by about 0.15%.
When other conditions remained constant, an increase in the number of response categories improved the power of the MIMIC model in detecting uniform-DIF: 4.83% in total, and 5.66%, 1.52%, and 7.33% when the number of response categories increased from 3 to 5, from 5 to 7, and from 3 to 7, respectively.
At the same time, when other conditions were fixed, increasing the number of response categories decreased the Type I error of the MIMIC model in detecting uniform-DIF: 5.66% in total, and 2.73%, 5.47%, and 8.80% when the number of response categories increased from 3 to 5, from 5 to 7, and from 3 to 7, respectively.
Skewness in the latent trait distribution led to a slight change in the Type I error and power of the MIMIC model for detecting uniform-DIF. When latent trait distributions were normal (condition 13), moderately skewed (conditions 1, 3, 5, 7, 9, and 11), and highly skewed (conditions 2, 4, 6, 8, 10, and 12), the mean powers of the MIMIC model to detect uniform-DIF were 0.920, 0.917, and 0.915, and the corresponding Type I error rates were 0.054, 0.059, and 0.069, respectively. For medium uniform-DIF, the mean powers under normal, moderately skewed, and highly skewed distributions were 0.842, 0.837, and 0.835, with Type I error rates of 0.054, 0.059, and 0.069, respectively. For severe uniform-DIF, the mean powers were 0.998, 0.997, and 0.995, with Type I error rates of 0.054, 0.060, and 0.069, respectively.
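The labels "moderate" and "highly skewed" can be related to the Beta distributions in the conditions table through the closed-form skewness of a Beta(a, b) distribution. The sketch below evaluates that standard formula; the function name is illustrative, and the paper itself does not report these skewness values.

```python
import math


def beta_skewness(a, b):
    """Skewness of a Beta(a, b) distribution (standard closed form)."""
    return 2.0 * (b - a) * math.sqrt(a + b + 1.0) / ((a + b + 2.0) * math.sqrt(a * b))


for a, b in [(1, 4), (0.5, 4), (4, 1), (4, 0.5)]:
    print(f"Beta({a}, {b}): skewness = {beta_skewness(a, b):+.2f}")
```

Beta (1, 4) and Beta (4, 1) have skewness of about +1.05 and -1.05, while Beta (0.5, 4) and Beta (4, 0.5) have about +1.79 and -1.79, which agrees with the grouping of conditions into moderately and highly skewed.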
In most scenarios, when the latent trait in the reference group was normally distributed or the latent trait distributions in the reference and focal groups were the same (all conditions except 10 and 12), Type I error was less than 0.06 and the power of the MIMIC model was at an acceptable level (greater than 80%). Therefore, we can conclude that the MIMIC model was robust to skewness in the latent trait. In conditions 10 and 12, where the latent trait distribution was highly positively skewed in one group and highly negatively skewed in the other, the MIMIC model had its lowest power and its greatest Type I error in detecting uniform-DIF.
We also ran the simulation for all 390 scenarios with a small magnitude of DIF (0.25). Even under the best circumstances, with the largest sample size (R500/F100), the 10-item scale, 7-point ordinal responses, and normal latent trait distributions in both groups, power and Type I error were 0.489 and 0.055, respectively. Because the MIMIC model was not appropriate for detecting small uniform-DIF, we refrained from describing these results.
All 1000 replications met the convergence criteria whether the latent trait had a normal or a skewed distribution. In all scenarios, goodness-of-fit indices such as the Root Mean Square Error of Approximation (RMSEA), Root Mean Squared Residual (RMR), Tucker–Lewis Index (TLI), Comparative Fit Index (CFI), and Goodness-of-Fit Index (GFI) were at an acceptable level. Space constraints prevented us from presenting the goodness-of-fit results for all simulations in detail.
Tables
The mean of Type I error rates and power of the MIMIC model for detecting medium uniform-DIF in the 5-item scale.
R: latent trait distribution in the reference group; F: latent trait distribution in the focal group; Alpha: empirical Type I error rate.
Sample size  Power (3 categories)  Alpha (3 categories)  Power (5 categories)  Alpha (5 categories)  Power (7 categories)  Alpha (7 categories)
Condition 1 (R: N (0, 1); F: Beta (1, 4))
R100/F100  0.637  0.086  0.688  0.065  0.755  0.057
R200/F100  0.701  0.051  0.800  0.061  0.837  0.041
R300/F100  0.750  0.057  0.857  0.052  0.861  0.055
R400/F100  0.794  0.062  0.866  0.047  0.886  0.046
R500/F100  0.784  0.065  0.889  0.056  0.917  0.057
Condition 2 (R: N (0, 1); F: Beta (0.5, 4))
R100/F100  0.615  0.061  0.711  0.057  0.719  0.064
R200/F100  0.699  0.085  0.812  0.076  0.853  0.064
R300/F100  0.747  0.080  0.837  0.064  0.876  0.070
R400/F100  0.786  0.069  0.855  0.056  0.890  0.060
R500/F100  0.789  0.075  0.886  0.069  0.901  0.041
Condition 3 (R: N (0, 1); F: Beta (4, 1))
R100/F100  0.638  0.058  0.698  0.059  0.759  0.060
R200/F100  0.748  0.059  0.845  0.064  0.874  0.055
R300/F100  0.809  0.063  0.884  0.060  0.887  0.064
R400/F100  0.831  0.044  0.871  0.047  0.916  0.058
R500/F100  0.819  0.053  0.910  0.060  0.922  0.040
Condition 4 (R: N (0, 1); F: Beta (4, 0.5))
R100/F100  0.658  0.063  0.744  0.061  0.790  0.058
R200/F100  0.779  0.075  0.833  0.074  0.873  0.072
R300/F100  0.801  0.057  0.874  0.060  0.905  0.058
R400/F100  0.808  0.069  0.894  0.059  0.911  0.072
R500/F100  0.846  0.064  0.910  0.059  0.926  0.056
Condition 5 (R: Beta (1, 4); F: Beta (1, 4))
R100/F100  0.616  0.053  0.725  0.050  0.710  0.057
R200/F100  0.729  0.047  0.820  0.056  0.866  0.069
R300/F100  0.752  0.046  0.858  0.046  0.860  0.048
R400/F100  0.775  0.057  0.875  0.050  0.877  0.051
R500/F100  0.815  0.050  0.883  0.046  0.899  0.035
Condition 6 (R: Beta (0.5, 4); F: Beta (0.5, 4))
R100/F100  0.612  0.045  0.690  0.063  0.723  0.067
R200/F100  0.738  0.050  0.825  0.056  0.832  0.054
R300/F100  0.757  0.046  0.852  0.057  0.876  0.046
R400/F100  0.775  0.051  0.862  0.053  0.882  0.039
R500/F100  0.785  0.044  0.878  0.050  0.895  0.042
Condition 7 (R: Beta (4, 1); F: Beta (4, 1))
R100/F100  0.639  0.049  0.731  0.061  0.750  0.057
R200/F100  0.761  0.053  0.842  0.057  0.870  0.066
R300/F100  0.775  0.053  0.893  0.050  0.886  0.046
R400/F100  0.802  0.048  0.888  0.055  0.907  0.050
R500/F100  0.833  0.051  0.895  0.059  0.938  0.045
Condition 8 (R: Beta (4, 0.5); F: Beta (4, 0.5))
R100/F100  0.663  0.044  0.743  0.044  0.772  0.062
R200/F100  0.754  0.054  0.865  0.063  0.848  0.041
R300/F100  0.811  0.057  0.902  0.058  0.905  0.057
R400/F100  0.805  0.057  0.907  0.046  0.923  0.039
R500/F100  0.845  0.046  0.908  0.053  0.933  0.047
Condition 9 (R: Beta (1, 4); F: Beta (4, 1))
R100/F100  0.608  0.073  0.723  0.054  0.739  0.069
R200/F100  0.748  0.069  0.835  0.060  0.869  0.070
R300/F100  0.767  0.071  0.856  0.072  0.871  0.058
R400/F100  0.778  0.083  0.875  0.079  0.897  0.075
R500/F100  0.836  0.077  0.877  0.074  0.920  0.055
Condition 10 (R: Beta (0.5, 4); F: Beta (4, 0.5))
R100/F100  0.650  0.089  0.709  0.070  0.735  0.080
R200/F100  0.738  0.115  0.848  0.079  0.836  0.062
R300/F100  0.779  0.120  0.868  0.084  0.900  0.074
R400/F100  0.784  0.114  0.866  0.084  0.902  0.058
R500/F100  0.809  0.101  0.878  0.092  0.909  0.084
Condition 11 (R: Beta (4, 1); F: Beta (1, 4))
R100/F100  0.582  0.054  0.685  0.073  0.711  0.071
R200/F100  0.708  0.085  0.803  0.074  0.839  0.074
R300/F100  0.724  0.075  0.845  0.062  0.868  0.055
R400/F100  0.754  0.091  0.858  0.062  0.881  0.050
R500/F100  0.763  0.087  0.872  0.073  0.902  0.050
Condition 12 (R: Beta (4, 0.5); F: Beta (0.5, 4))
R100/F100  0.587  0.078  0.654  0.080  0.699  0.080
R200/F100  0.685  0.100  0.769  0.084  0.803  0.062
R300/F100  0.716  0.103  0.819  0.106  0.853  0.070
R400/F100  0.728  0.119  0.854  0.082  0.857  0.070
R500/F100  0.744  0.131  0.849  0.096  0.885  0.069
Condition 13 (R: N (0, 1); F: N (0, 1))
R100/F100  0.609  0.058  0.708  0.055  0.770  0.066
R200/F100  0.705  0.052  0.828  0.049  0.855  0.053
R300/F100  0.785  0.060  0.852  0.057  0.896  0.048
R400/F100  0.830  0.045  0.878  0.067  0.895  0.043
R500/F100  0.803  0.060  0.881  0.062  0.900  0.046
The mean of Type I error rates and power of the MIMIC model for detecting severe uniform-DIF in the 5-item scale.
R: latent trait distribution in the reference group; F: latent trait distribution in the focal group; Alpha: empirical Type I error rate.
Sample size  Power (3 categories)  Alpha (3 categories)  Power (5 categories)  Alpha (5 categories)  Power (7 categories)  Alpha (7 categories)
Condition 1 (R: N (0, 1); F: Beta (1, 4))
R100/F100  0.968  0.084  0.994  0.064  0.998  0.057
R200/F100  0.982  0.055  0.998  0.061  0.999  0.042
R300/F100  0.991  0.056  1.000  0.052  1.000  0.055
R400/F100  0.990  0.062  0.998  0.046  1.000  0.046
R500/F100  0.989  0.066  1.000  0.056  1.000  0.059
Condition 2 (R: N (0, 1); F: Beta (0.5, 4))
R100/F100  0.963  0.060  0.992  0.055  0.991  0.063
R200/F100  0.980  0.084  1.000  0.076  1.000  0.065
R300/F100  0.986  0.078  1.000  0.065  1.000  0.070
R400/F100  0.979  0.069  0.999  0.054  1.000  0.059
R500/F100  0.987  0.074  0.999  0.070  1.000  0.041
Condition 3 (R: N (0, 1); F: Beta (4, 1))
R100/F100  0.990  0.059  0.997  0.058  1.000  0.061
R200/F100  0.999  0.059  1.000  0.062  1.000  0.057
R300/F100  0.998  0.063  1.000  0.059  1.000  0.063
R400/F100  0.999  0.045  1.000  0.047  1.000  0.059
R500/F100  0.998  0.052  1.000  0.060  1.000  0.040
Condition 4 (R: N (0, 1); F: Beta (4, 0.5))
R100/F100  1.000  0.065  0.997  0.061  0.998  0.058
R200/F100  0.999  0.073  1.000  0.074  1.000  0.071
R300/F100  0.999  0.057  1.000  0.060  1.000  0.058
R400/F100  0.999  0.069  1.000  0.060  1.000  0.072
R500/F100  1.000  0.064  1.000  0.060  1.000  0.056
Condition 5 (R: Beta (1, 4); F: Beta (1, 4))
R100/F100  0.972  0.049  0.997  0.048  0.998  0.056
R200/F100  0.990  0.047  0.999  0.057  0.999  0.067
R300/F100  0.990  0.050  0.999  0.046  1.000  0.048
R400/F100  0.998  0.057  1.000  0.052  1.000  0.052
R500/F100  0.997  0.050  0.999  0.046  1.000  0.035
Condition 6 (R: Beta (0.5, 4); F: Beta (0.5, 4))
R100/F100  0.966  0.042  0.989  0.063  0.997  0.067
R200/F100  0.987  0.050  0.998  0.056  0.996  0.055
R300/F100  0.986  0.046  0.999  0.056  1.000  0.046
R400/F100  0.986  0.051  0.999  0.054  0.999  0.038
R500/F100  0.984  0.042  1.000  0.047  1.000  0.042
Condition 7 (R: Beta (4, 1); F: Beta (4, 1))
R100/F100  0.988  0.048  0.998  0.061  0.998  0.058
R200/F100  0.998  0.052  1.000  0.058  1.000  0.067
R300/F100  0.997  0.055  1.000  0.049  0.999  0.045
R400/F100  0.998  0.048  1.000  0.055  1.000  0.050
R500/F100  0.998  0.051  1.000  0.059  1.000  0.045
Condition 8 (R: Beta (4, 0.5); F: Beta (4, 0.5))
R100/F100  0.986  0.046  0.999  0.044  0.997  0.061
R200/F100  0.996  0.056  1.000  0.062  1.000  0.041
R300/F100  0.999  0.057  1.000  0.058  1.000  0.056
R400/F100  0.998  0.057  1.000  0.046  1.000  0.039
R500/F100  0.996  0.047  1.000  0.052  1.000  0.047
Condition 9 (R: Beta (1, 4); F: Beta (4, 1))
R100/F100  0.991  0.071  0.996  0.054  0.999  0.069
R200/F100  0.998  0.069  1.000  0.060  1.000  0.071
R300/F100  0.998  0.071  0.999  0.072  1.000  0.059
R400/F100  1.000  0.082  1.000  0.077  1.000  0.076
R500/F100  1.000  0.077  1.000  0.074  1.000  0.055
Condition 10 (R: Beta (0.5, 4); F: Beta (4, 0.5))
R100/F100  0.994  0.092  0.997  0.074  0.999  0.080
R200/F100  0.998  0.114  1.000  0.080  1.000  0.062
R300/F100  1.000  0.121  0.999  0.084  1.000  0.073
R400/F100  0.999  0.114  1.000  0.084  1.000  0.056
R500/F100  1.000  0.103  1.000  0.092  1.000  0.083
Condition 11 (R: Beta (4, 1); F: Beta (1, 4))
R100/F100  0.958  0.054  0.992  0.073  0.993  0.070
R200/F100  0.981  0.084  0.999  0.076  1.000  0.075
R300/F100  0.978  0.076  0.999  0.063  1.000  0.055
R400/F100  0.991  0.093  0.999  0.061  1.000  0.050
R500/F100  0.990  0.088  0.999  0.073  1.000  0.050
Condition 12 (R: Beta (4, 0.5); F: Beta (0.5, 4))
R100/F100  0.939  0.079  0.986  0.081  0.995  0.077
R200/F100  0.966  0.099  0.996  0.086  0.999  0.060
R300/F100  0.972  0.104  1.000  0.106  1.000  0.069
R400/F100  0.982  0.122  0.998  0.081  0.999  0.071
R500/F100  0.970  0.131  0.999  0.095  1.000  0.068
Condition 13 (R: N (0, 1); F: N (0, 1))
R100/F100  0.979  0.058  0.997  0.054  1.000  0.063
R200/F100  0.996  0.051  1.000  0.049  1.000  0.053
R300/F100  0.998  0.057  0.999  0.060  1.000  0.048
R400/F100  0.995  0.045  1.000  0.068  1.000  0.044
R500/F100  0.999  0.061  0.999  0.061  1.000  0.045
The mean of Type I error rates and power of the MIMIC model for detecting medium uniform-DIF in the 10-item scale.
R: latent trait distribution in the reference group; F: latent trait distribution in the focal group; Alpha: empirical Type I error rate.
Sample size  Power (3 categories)  Alpha (3 categories)  Power (5 categories)  Alpha (5 categories)  Power (7 categories)  Alpha (7 categories)
Condition 1 (R: N (0, 1); F: Beta (1, 4))
R100/F100  0.665  0.051  0.740  0.072  0.780  0.052
R200/F100  0.777  0.054  0.884  0.074  0.885  0.055
R300/F100  0.827  0.050  0.894  0.054  0.927  0.053
R400/F100  0.833  0.056  0.911  0.054  0.937  0.053
R500/F100  0.842  0.062  0.929  0.043  0.949  0.061
Condition 2 (R: N (0, 1); F: Beta (0.5, 4))
R100/F100  0.671  0.068  0.768  0.061  0.803  0.062
R200/F100  0.777  0.079  0.883  0.066  0.881  0.068
R300/F100  0.817  0.084  0.894  0.062  0.920  0.051
R400/F100  0.832  0.061  0.913  0.073  0.946  0.066
R500/F100  0.839  0.072  0.930  0.071  0.950  0.050
Condition 3 (R: N (0, 1); F: Beta (4, 1))
R100/F100  0.710  0.053  0.776  0.065  0.790  0.049
R200/F100  0.823  0.049  0.901  0.066  0.902  0.057
R300/F100  0.857  0.056  0.907  0.062  0.938  0.054
R400/F100  0.877  0.061  0.925  0.054  0.949  0.058
R500/F100  0.901  0.060  0.946  0.049  0.956  0.050
Condition 4 (R: N (0, 1); F: Beta (4, 0.5))
R100/F100  0.743  0.058  0.789  0.061  0.829  0.062
R200/F100  0.840  0.065  0.888  0.069  0.917  0.074
R300/F100  0.868  0.065  0.914  0.058  0.937  0.061
R400/F100  0.867  0.081  0.947  0.068  0.952  0.077
R500/F100  0.906  0.056  0.940  0.059  0.966  0.063
Condition 5 (R: Beta (1, 4); F: Beta (1, 4))
R100/F100  0.708  0.058  0.756  0.051  0.807  0.058
R200/F100  0.793  0.042  0.870  0.062  0.880  0.052
R300/F100  0.837  0.047  0.929  0.050  0.929  0.054
R400/F100  0.841  0.050  0.911  0.057  0.925  0.044
R500/F100  0.879  0.068  0.934  0.064  0.958  0.059
Condition 6 (R: Beta (0.5, 4); F: Beta (0.5, 4))
R100/F100  0.673  0.059  0.780  0.048  0.812  0.062
R200/F100  0.786  0.053  0.862  0.049  0.881  0.053
R300/F100  0.837  0.049  0.891  0.049  0.913  0.057
R400/F100  0.833  0.045  0.921  0.052  0.938  0.055
R500/F100  0.853  0.043  0.919  0.043  0.940  0.048
Condition 7 (R: Beta (4, 1); F: Beta (4, 1))
R100/F100  0.701  0.067  0.783  0.051  0.825  0.057
R200/F100  0.806  0.044  0.882  0.056  0.917  0.056
R300/F100  0.868  0.037  0.934  0.053  0.929  0.048
R400/F100  0.847  0.055  0.931  0.046  0.948  0.042
R500/F100  0.889  0.056  0.935  0.065  0.961  0.064
Condition 8 (R: Beta (4, 0.5); F: Beta (4, 0.5))
R100/F100  0.703  0.052  0.788  0.053  0.835  0.048
R200/F100  0.824  0.059  0.896  0.061  0.909  0.052
R300/F100  0.879  0.054  0.926  0.053  0.931  0.053
R400/F100  0.879  0.047  0.940  0.053  0.962  0.043
R500/F100  0.886  0.050  0.948  0.064  0.958  0.046
Condition 9 (R: Beta (1, 4); F: Beta (4, 1))
R100/F100  0.701  0.060  0.754  0.064  0.813  0.064
R200/F100  0.809  0.076  0.871  0.071  0.893  0.065
R300/F100  0.857  0.075  0.928  0.057  0.932  0.053
R400/F100  0.833  0.081  0.920  0.066  0.940  0.058
R500/F100  0.896  0.094  0.923  0.080  0.954  0.074
Condition 10 (R: Beta (0.5, 4); F: Beta (4, 0.5))
R100/F100  0.687  0.086  0.773  0.077  0.815  0.066
R200/F100  0.813  0.127  0.880  0.083  0.893  0.092
R300/F100  0.842  0.121  0.908  0.081  0.924  0.081
R400/F100  0.845  0.109  0.929  0.080  0.946  0.084
R500/F100  0.872  0.113  0.936  0.092  0.959  0.079
Condition 11 (R: Beta (4, 1); F: Beta (1, 4))
R100/F100  0.680  0.088  0.739  0.053  0.798  0.054
R200/F100  0.755  0.068  0.849  0.063  0.865  0.060
R300/F100  0.793  0.056  0.910  0.067  0.915  0.061
R400/F100  0.799  0.079  0.903  0.076  0.913  0.055
R500/F100  0.842  0.093  0.917  0.080  0.939  0.075
Condition 12 (R: Beta (4, 0.5); F: Beta (0.5, 4))
R100/F100  0.627  0.095  0.707  0.078  0.779  0.063
R200/F100  0.738  0.098  0.816  0.075  0.860  0.073
R300/F100  0.802  0.133  0.871  0.096  0.891  0.082
R400/F100  0.797  0.135  0.883  0.081  0.906  0.086
R500/F100  0.816  0.108  0.890  0.083  0.917  0.072
Condition 13 (R: N (0, 1); F: N (0, 1))
R100/F100  0.696  0.052  0.786  0.054  0.817  0.058
R200/F100  0.785  0.055  0.855  0.054  0.900  0.050
R300/F100  0.860  0.054  0.910  0.048  0.926  0.057
R400/F100  0.869  0.050  0.929  0.062  0.954  0.045
R500/F100  0.870  0.052  0.922  0.059  0.971  0.055
The mean of Type I error rates and power of the MIMIC model for detecting severe uniform-DIF in the 10-item scale.
R: latent trait distribution in the reference group; F: latent trait distribution in the focal group; Alpha: empirical Type I error rate.
Sample size  Power (3 categories)  Alpha (3 categories)  Power (5 categories)  Alpha (5 categories)  Power (7 categories)  Alpha (7 categories)
Condition 1 (R: N (0, 1); F: Beta (1, 4))
R100/F100  0.975  0.051  0.998  0.071  1.000  0.051
R200/F100  0.991  0.054  1.000  0.075  1.000  0.055
R300/F100  0.992  0.052  0.999  0.054  1.000  0.053
R400/F100  0.997  0.056  1.000  0.054  0.999  0.053
R500/F100  0.993  0.062  1.000  0.043  1.000  0.061
Condition 2 (R: N (0, 1); F: Beta (0.5, 4))
R100/F100  0.969  0.066  0.997  0.061  0.999  0.060
R200/F100  0.986  0.079  0.998  0.065  1.000  0.069
R300/F100  0.992  0.085  0.999  0.062  1.000  0.051
R400/F100  0.990  0.061  1.000  0.073  1.000  0.065
R500/F100  0.990  0.071  1.000  0.071  1.000  0.050
Condition 3 (R: N (0, 1); F: Beta (4, 1))
R100/F100  0.989  0.053  0.998  0.064  0.999  0.049
R200/F100  0.998  0.050  1.000  0.065  1.000  0.057
R300/F100  1.000  0.056  1.000  0.062  1.000  0.054
R400/F100  0.999  0.062  1.000  0.054  1.000  0.056
R500/F100  1.000  0.061  1.000  0.049  1.000  0.050
Condition 4 (R: N (0, 1); F: Beta (4, 0.5))
R100/F100  0.997  0.057  1.000  0.062  0.999  0.062
R200/F100  1.000  0.065  1.000  0.069  1.000  0.074
R300/F100  1.000  0.065  1.000  0.057  1.000  0.061
R400/F100  1.000  0.080  1.000  0.068  1.000  0.077
R500/F100  1.000  0.056  1.000  0.060  1.000  0.063
Condition 5 (R: Beta (1, 4); F: Beta (1, 4))
R100/F100  0.991  0.058  0.997  0.051  0.998  0.059
R200/F100  0.989  0.044  0.999  0.062  1.000  0.051
R300/F100  0.998  0.048  1.000  0.051  0.999  0.054
R400/F100  0.994  0.050  1.000  0.057  1.000  0.045
R500/F100  0.998  0.068  1.000  0.064  1.000  0.059
Condition 6 (R: Beta (0.5, 4); F: Beta (0.5, 4))
R100/F100  0.977  0.059  0.995  0.048  0.997  0.061
R200/F100  0.991  0.053  1.000  0.049  0.999  0.053
R300/F100  0.994  0.050  1.000  0.049  0.999  0.057
R400/F100  0.991  0.044  1.000  0.052  1.000  0.055
R500/F100  0.996  0.043  0.999  0.043  1.000  0.048
Condition 7 (R: Beta (4, 1); F: Beta (4, 1))
R100/F100  0.992  0.067  0.998  0.049  1.000  0.057
R200/F100  1.000  0.043  1.000  0.056  1.000  0.056
R300/F100  0.998  0.037  1.000  0.053  1.000  0.048
R400/F100  0.997  0.054  1.000  0.046  1.000  0.042
R500/F100  1.000  0.056  1.000  0.065  1.000  0.064
Condition 8 (R: Beta (4, 0.5); F: Beta (4, 0.5))
R100/F100  0.998  0.053  1.000  0.054  1.000  0.049
R200/F100  0.999  0.057  1.000  0.061  1.000  0.051
R300/F100  0.998  0.054  1.000  0.053  1.000  0.053
R400/F100  0.997  0.046  1.000  0.053  1.000  0.043
R500/F100  0.998  0.050  0.999  0.063  1.000  0.046
Condition 9 (R: Beta (1, 4); F: Beta (4, 1))
R100/F100  0.997  0.059  0.996  0.063  1.000  0.065
R200/F100  1.000  0.078  1.000  0.070  1.000  0.065
R300/F100  1.000  0.076  1.000  0.056  1.000  0.054
R400/F100  0.999  0.081  1.000  0.066  1.000  0.059
R500/F100  1.000  0.094  1.000  0.079  1.000  0.074
Condition 10 (R: Beta (0.5, 4); F: Beta (4, 0.5))
R100/F100  0.996  0.086  1.000  0.078  1.000  0.066
R200/F100  0.999  0.127  1.000  0.083  1.000  0.092
R300/F100  1.000  0.121  1.000  0.081  1.000  0.081
R400/F100  1.000  0.110  1.000  0.080  1.000  0.084
R500/F100  1.000  0.114  1.000  0.092  1.000  0.079
Condition 11 (R: Beta (4, 1); F: Beta (1, 4))
R100/F100  0.978  0.087  0.994  0.052  0.999  0.056
R200/F100  0.979  0.066  0.997  0.064  1.000  0.060
R300/F100  0.988  0.055  0.999  0.068  1.000  0.061
R400/F100  0.984  0.080  0.998  0.075  1.000  0.055
R500/F100  0.990  0.093  0.998  0.079  1.000  0.075
Condition 12 (R: Beta (4, 0.5); F: Beta (0.5, 4))
R100/F100  0.955  0.094  0.989  0.080  0.995  0.063
R200/F100  0.973  0.099  0.997  0.076  0.999  0.072
R300/F100  0.977  0.132  0.997  0.097  0.998  0.082
R400/F100  0.972  0.135  0.999  0.080  1.000  0.085
R500/F100  0.984  0.108  0.999  0.083  1.000  0.072
Condition 13 (R: N (0, 1); F: N (0, 1))
R100/F100  0.993  0.052  0.998  0.056  1.000  0.059
R200/F100  0.996  0.056  0.999  0.054  1.000  0.050
R300/F100  0.998  0.055  1.000  0.048  1.000  0.057
R400/F100  0.997  0.049  1.000  0.061  1.000  0.045
R500/F100  0.996  0.052  1.000  0.059  1.000  0.056
In this section, we present a real-data example to assess the effect of small sample size on the measurement equivalence of a psychometric questionnaire using the MIMIC model.
The 12-item General Health Questionnaire (GHQ-12) is an appropriate instrument to assess Minor Psychiatric Disorders (MPD) during the previous month [
Of the 269 men participating in the study, 100 men were randomly selected. From the 502 women, samples of size 100, 200, 300, 400, and 500 were randomly chosen.
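The subsampling scheme can be sketched as follows. The text does not say whether the same 100 men were reused in every comparison or whether the women's samples were nested, so this sketch assumes one fixed draw of men and independent draws of women; the participant IDs, the seed, and the function name are hypothetical.

```python
import random


def draw_subsamples(men_ids, women_ids, n_men=100,
                    women_sizes=(100, 200, 300, 400, 500), seed=0):
    """Draw one sample of men and one sample of women per target size."""
    rng = random.Random(seed)
    men = rng.sample(men_ids, n_men)             # sampling without replacement
    return {f"M{n_men}/F{n}": (men, rng.sample(women_ids, n))
            for n in women_sizes}


men_ids = list(range(269))                       # placeholder IDs for the 269 men
women_ids = list(range(1000, 1502))              # placeholder IDs for the 502 women
subsamples = draw_subsamples(men_ids, women_ids)
for label, (men, women) in subsamples.items():
    print(label, len(men), len(women))
```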
The results of fitting the MIMIC model to detect uniform-DIF are shown in Table
Detection of uniform-DIF for GHQ-12 with the MIMIC model.
Item  GHQ-12 item content  M100/F100  M100/F200  M100/F300  M100/F400  M100/F500
Item 1  Able to concentrate  −  −  −  +  + 
Item 2  Lost much sleep  −  −  −  −  − 
Item 3  Playing a useful part  −  −  −  −  − 
Item 4  Capable of making decisions  −  −  −  −  − 
Item 5  Under stress  −  −  −  −  − 
Item 6  Could not overcome difficulties  −  −  −  −  − 
Item 7  Enjoy your daytoday activities  −  −  −  −  − 
Item 8  Face up to problems  −  −  −  −  − 
Item 9  Feeling unhappy and depressed  −  −  −  −  − 
Item 10  Losing confidence  −  −  −  −  − 
Item 11  Thinking of self as worthless  −  −  −  −  − 
Item 12  Feeling reasonably happy  +  +  +  +  + 
A plus sign indicates an item with DIF and a minus sign indicates a DIF-free item; the letter “M” means male and the letter “F” means female.
The present study provided a simulation-based framework to determine the statistical properties of the MIMIC model when the latent trait distribution was nonnormal and the sample size was small.
Up to now, in most simulation studies, item responses have been produced using the GRM with a normally distributed latent trait. However, in many psychological studies, the assumption of normality of the latent construct is frequently violated in practice [
Under various combinations of latent trait distributions, the power of the MIMIC model increased as the reference group sample size increased, but Type I error did not follow a specific pattern. This finding is consistent with those of previous studies, which demonstrated that the power for detecting DIF increased with sample size [
The results of this study indicated that increasing the number of items could improve the power and decrease the Type I error rate of the MIMIC model for detecting uniform-DIF. In this respect, our results were in line with the results of several studies [
When the magnitude of uniform-DIF was increased, the performance of the MIMIC model improved; that is, the power increased and Type I error was reduced. This was an expected result, and similar results were reported in other studies [
Another important feature considered in this study was the number of response categories, which could affect the power of the MIMIC model for detecting DIF. Our study shows that increasing the number of response categories resulted in a systematic increase in the power of the MIMIC model for detecting uniform-DIF. Increasing the number of response categories from 5 to 7 improved the power of the MIMIC model for detecting uniform-DIF by just 1.52%. A larger number of response categories can create problems for participants with low levels of education; hence, we suggest a 5-point response scale, which is easier to interpret and more suitable for such participants. Allahyari et al. recommended the minimum number of response categories for DIF analysis to be five [
Our study showed that the number of converged MIMIC models did not depend on the degree of skewness in the latent construct distribution. In numerical analysis, convergence could be affected by the method used for parameter estimation [
The MIMIC model uses a single latent covariance matrix for parameter estimation. Hence, this model assumes that the variance of the latent factor is equal across the groups. Carroll concluded that violating the homogeneity of variance assumption could lead to inflated Type I error in DIF detection and increased bias in estimating the factor loadings and the latent group mean difference [
There are many different methods for generating DIF items. The most common technique, which was used in this study, is adding a certain amount to all thresholds for the focal group. Although this issue is controversial, some authors point out that adding or subtracting a value asymmetrically to the threshold parameters could affect the performance of the model for DIF detection [
Finally, this study had some limitations which need to be taken into account. Previous simulation studies have shown that power of MIMIC model could be affected by the number of DIF items [
Our findings showed that increasing the number of response categories, the number of items, the magnitude of DIF, and the sample size could increase the power of the MIMIC model for uniform-DIF detection. This study revealed that the MIMIC model was fairly robust to departures from the normal latent trait distribution assumption in the detection of uniform-DIF. When latent trait distributions were skewed, the power of the MIMIC model in detecting uniform-DIF was at an acceptable level. However, the empirical Type I error rate was slightly greater than the nominal significance level of 0.05. Consequently, this technique is appropriate for uniform-DIF detection when the latent trait distribution is nonnormal and the focal group sample size is small. Because increasing the number of response categories from 5 to 7 had only a negligible effect on power, we recommend a 5-point response scale for uniform-DIF detection using the MIMIC model, especially for participants with low levels of education. The results obtained from this study provide an appropriate guideline for further research. We recommend further studies to investigate the effect of the number of items with DIF and the type of DIF on the power of the MIMIC model when the latent trait is skewed.
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.
This work was extracted from the Ph.D. thesis of Jamshid Jamali and was supported by Grant no. 9410488 from Shiraz University of Medical Sciences, Shiraz, Iran. The authors would like to thank Ms. Narges Roustaei and Mr. Saeid Ghanbari for their valuable and constructive comments. Editing services of the Shiraz University of Medical Sciences Research Consultation Centre (RCC) are acknowledged.