IRT models are widely used but often rely on distributional assumptions about the latent variable. For a simple class of IRT models, the Rasch models, conditional inference
is feasible. This enables consistent estimation of item parameters without reference to the distribution of the latent variable in the population. Traditionally, specialized software has been needed for this, but conditional maximum likelihood estimation can be done using standard software for fitting generalized linear models. This paper describes a SAS macro %rasch_cml that fits polytomous Rasch models. The macro estimates item parameters using conditional maximum likelihood (CML) estimation and person locations using maximum likelihood (MLE) and Warm's weighted likelihood (WLE) estimation. Graphical presentations are included: plots of item characteristic curves (ICCs) and a graphical goodness-of-fit test.
1. Introduction
Item response theory (IRT) models were developed to describe probabilistic relationships between correct responses on a set of test items and continuous latent traits [1]. In addition to educational and psychological testing, IRT models have also been used in other areas of research, for example, in health status measurement and evaluation of Patient-Reported Outcomes (PROs) such as physical functioning and psychological well-being. Traditional applications in education often use dichotomous (correct/incorrect) item scoring, but polytomous items are common in other applications.
Formally, IRT models deal with the situation where several questions (called items) are used to order a group of subjects with respect to a unidimensional latent variable. Before the subjects can be ordered in a meaningful way, a number of requirements must be met.
Items should measure only one latent variable.
Items should increase with the underlying latent variable.
Items should be sufficiently different to avoid redundancy.
Items should function in the same way in any subpopulation.
These requirements are standard in educational tests where (i) items should deal with only one subject (e.g., not being a mixture of math and language items), (ii) the probability of a correct answer should increase with ability, (iii) items should not ask the same thing twice, and (iv) the difficulty of an item should depend only on the ability of the student, for example, an item should not have features that make it easier for boys than for girls at the same level of ability.
Let θ denote the latent variable, and let X¯=(Xi)i=1,…,I denote the vector of item responses. The first two requirements can be written as follows.
θ is a scalar.
θ↦E(Xi∣θ) is increasing for all items i.
One would expect two similar items to be highly correlated, with an even higher correlation than the underlying latent variable accounts for, and it is usual to impose the requirement of local independence
(iii)P(X¯=x¯∣θ)=∏i=1IP(Xi=xi∣θ), for all θ.
This requirement is related to the requirement of nonredundancy. The fourth requirement can be written as
(iv) P(Xi=xi∣Y,θ)=P(Xi=xi∣θ) for all items i and all variables Y.
The requirements (i)–(iv) are referred to as unidimensionality, monotonicity, local independence, and absence of differential item functioning (DIF), respectively. Fitting observed data to an IRT model enables us to test if these requirements are met. Evaluation of model fit is crucial and many fit statistics exist [2], but the issue of fit can also be addressed graphically.
This paper describes a SAS macro %rasch_cml that fits an IRT model, the polytomous Rasch model [3, 4]. The SAS macro is available from
biostat.ku.dk/~kach/index.html#cml.
It estimates item parameters, plots item characteristic curves, estimates person locations, and produces graphical tests of fit.
2. The Polytomous Rasch Model
Consider I items, where item i has mi+1 response categories represented by the numbers 0,…,mi. Let Xi be the response to item i with realization xi. For items i=1,…,I, the polytomous Rasch model is given by probabilities
(1)P(Xi=xi∣θ)=exp(xiθ+ηixi)Ki-1,
where η¯i=(ηih)h=1,…,mi is the vector of item parameters for item i, ηi0=0, for all i, and
(2)Ki=Ki(θ,η¯i)=∑l=0miexp(lθ+ηil)
is a normalizing constant. An alternative parameterization is in terms of the thresholds βik=-(ηik-ηik-1) for i=1,…,I and k=1,…,mi. These are easily interpreted, since βik is the location on the latent continuum where the probability, for item i, of choosing category k-1 equals the probability of choosing category k. This model was originally proposed by Andersen [5], see also [6]. Masters [7] called this model the Partial Credit Model and derived the probabilities (1) from the requirement that the conditional probabilities P(Xi=k∣Xi∈{k-1,k};θ), for k=1,…,mi, fit a dichotomous Rasch model:
(3)P(Xi=k∣Xi∈{k-1,k};θ)=exp(θ-βik)1+exp(θ-βik).
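To make the threshold parameterization concrete, the category probabilities (1)–(2) can be computed directly from the βik's. The following is a minimal Python sketch (an illustration only, not part of the SAS macro; the function name pcm_probs is ours):

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities P(X_i = k | theta), k = 0..m_i, for one item.

    thresholds: the beta_{ik}, k = 1..m_i. Uses eta_{i0} = 0 and
    eta_{ik} = eta_{i,k-1} - beta_{ik}, i.e., beta_{ik} = -(eta_{ik} - eta_{i,k-1}).
    """
    etas = [0.0]
    for b in thresholds:
        etas.append(etas[-1] - b)
    weights = [math.exp(k * theta + etas[k]) for k in range(len(etas))]
    K = sum(weights)  # normalizing constant (2)
    return [w / K for w in weights]
```

At θ=βik the probabilities of categories k-1 and k coincide, which is the defining property of the thresholds.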
Using the assumption (iii) of local independence, the vector X¯=(Xi)i=1,…,I with realization x¯=(xi)i=1,…,I has distribution given by
(4)P(X¯=x¯∣θ)=exp(∑i=1I(xiθ+ηixi))K(θ)-1=exp(rθ)exp(∑i=1Iηixi)K(θ)-1,
where r=∑i=1Ixi and K(θ)=∏i=1IKi(θ,η¯i). By Neyman’s factorization theorem, it is clear from (4) that the sum of item responses R=∑i=1IXi is sufficient for θ. The joint log likelihood for a sample of v=1,…,N persons is given by
(5)l(η¯1,…,η¯I;θ¯)=∑v=1NRvθv+∑v=1N∑i=1Iηixvi-∑v=1NlogK(θv),
where θ¯=(θ1,…,θN)T. Jointly estimating all parameters from (5) does not provide consistent estimates, since the number of parameters increases with the sample size. If our interest is in estimating the item parameters, the person parameters can be interpreted as incidental or nuisance parameters [8].
3. Conditional Maximum Likelihood Estimation
The joint log likelihood function (5) can be written as
(6)l(η¯1,…,η¯I;θ¯)=∑v=1NθvRv+∑i=1I∑h=1miηihCih-logK,
where K=∏v=1N∏i=1IKi(θv,η¯i), rv=∑i=1Ixvi is the total score observed for person v=1,…,N, and (Cih)i=1,…,I;h=1,…,mi defined by
(7)Cih=∑v=1N1(Xvi=h)
are the item margins. Note that from (5) it can be seen that the total score Rv is sufficient for the person location θv and that for each i=1,…,I the item margin (Cih)h=1,…,mi is sufficient for the item parameter ηi.
Restrictions are needed to ensure that the model (5) is identified, since from (1) it is clear that for all (θ,η¯i)
(8)P(Xvi=xvi∣θ,η¯i)=P(Xvi=xvi∣θ*,η¯i*)
for (θ*,η¯i*) defined by θ*=θ-k and
(9)ηih*=ηih+kh,
for h=1,…,mi.
To obtain consistent item parameter estimates, marginal [9] or conditional [10] maximum likelihood estimation is used. The marginal approach to item parameter estimation assumes that the latent variables are sampled from a population and introduces an assumption about the distribution of the latent variable. The sufficiency property can also be used to overcome the problem of inconsistency of item parameter estimates. This can be done by conditioning on the sum Rv of the entire response vector X¯v=(Xv1,…,Xvk), yielding conditional maximum likelihood (CML) inference. For a vector X¯v=(Xv1,…,Xvk) from the Rasch model, the distribution of the score Rv=∑i=1kXvi is given by the probabilities
(10)Pr(Rv=r∣θ)=∑x¯∈X(r)Pr(X¯=x¯∣θ)=∑x¯∈X(r)exp(rθ+∑i=1kηixi)∏i=1kKi(ηi,θ),
where summation is over the set X(r) of all response vectors x¯=(x1,…,xk) with ∑i=1kxi=r. The probability can be written as
(11)Pr(Rv=r∣θ)=exp(rθ)∏i=1kKi(ηi,θ)∑x¯∈X(r)exp(∑i=1kηixi).
Let the last sum be denoted by
(12)γr=γr(η1,…,ηk)=∑x¯∈X(r)exp(∑i=1kηixi).
The score is sufficient for θ, and the item parameters can be estimated consistently using the conditional distribution of the responses given the scores. The conditional distribution of the vector X¯v=(Xv1,…,Xvk) of item responses given the score is given by the probabilities
(13)Pr(X¯v=x¯v∣Rv=r,Θv=θv)=exp(∑i=1kηixvi)γr.
These do not depend on the value of θv, and the conditional likelihood function is the product
(14)LC(η1,…,ηk)=∏v=1nexp(∑i=1kηixvi)γrv.
Again, a linear restriction on the parameters is needed in order to ensure that the model is identified. Maximizing this likelihood yields item parameter estimates that are conditionally consistent. If, for each possible response vector x¯=(x1,…,xk), we let n(x¯) denote the number of persons with this response vector and, for each possible score r, let n(r) denote the observed number of persons with this value of the score, this likelihood function can be written as
(15)LC(η1,…,ηk)=∏x¯exp(n(x¯)∑i=1kηi,xi)∏rγrn(r)
and using the indicator functions (Ivih)v=1,…,n;i=1,…,k;h=1,…,m, this likelihood function can be rewritten as
(16)LC(η1,…,ηk)=∏v=1nexp(∑i=1k∑h=1mXvihηih)∏rγrn(r),
yielding the conditional log likelihood function
(17)lC(η1,…,ηk)=∑i=1k∑h=1mX·ihηih-∑r=0kmn(r)log(γr),
where X·ih=∑v=1nXvih are the sufficient statistics for the item parameters. These sufficient statistics, called item margins, are the number of persons giving the response h to item i. The item parameters in this model can be estimated by solving the likelihood equations that equate the sufficient statistics (X·ih)i=1,…,k;h=1,…,m to their expected values conditional on the observed value r¯=(r1,…,rn) of the vector R¯=(R1,…,Rn) of scores. These expected values have the form
(18)E(X·ih∣R¯=r¯)=∑v=1nPr(Xvi=h∣Rv=rv),
and for an item i, these can be written in terms of the probabilities of having a score of r-h on the remaining items yielding
(19)E(X·ih∣R¯=r¯)=exp(ηih)∑r=0kmn(r)γr-h(i)γr.
Because these likelihood equations have the same form as those in a generalized linear model [11–13], the item parameters can be estimated using standard software like SPSS [14] or SAS [15].
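The γ-polynomials (12) and the conditional probabilities (13) can be illustrated by brute-force enumeration of response vectors, which is feasible for a small number of items. The following Python sketch (an illustration only, not part of the macro; function names are ours) makes explicit that (13) does not involve θ:

```python
import itertools
import math

def gamma_polys(etas):
    """Brute-force gamma_r of (12): sum over all response vectors with score r.

    etas: one entry per item, etas[i] = [eta_{i0}=0, ..., eta_{i m_i}].
    Returns a dict mapping each possible score r to gamma_r.
    """
    gam = {}
    for x in itertools.product(*[range(len(e)) for e in etas]):
        w = math.exp(sum(e[k] for e, k in zip(etas, x)))
        gam[sum(x)] = gam.get(sum(x), 0.0) + w
    return gam

def conditional_prob(x, etas):
    """Conditional probability (13) of response vector x given its score."""
    num = math.exp(sum(e[k] for e, k in zip(etas, x)))
    return num / gamma_polys(etas)[sum(x)]
```

By construction the conditional probabilities of all response vectors with the same score sum to one, whatever the value of θ.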
4. Estimation of Person Locations
There are various ways of estimating the person locations. An important feature of the Rasch model is that the sum score R=∑i=1IXi is sufficient for θ and consequently that the likelihood function for estimating θv is proportional to the probabilities
(20)P(R=r∣θ)=∑x¯∈X(r)P(X¯=x¯∣θ)=exp(rθ)K(θ)-1∑x¯∈X(r)exp(∑i=1Iηixi),
where, as before, summation is over the set X(r)={x¯∣∑i=1Ixi=r} of all response vectors with the sum r. Now, define the γ-polynomials
(21)γr=γr(η¯1,…,η¯I)=∑x¯∈X(r)exp(∑i=1Iηixi)
to obtain the expression
(22)P(R=r∣θ)=exp(rθ)γrK(θ)-1.
Note from this that the normalizing constant K(θ) can be written as a function of the γ’s
(23)K(θ)=∏i=1I∑k=0miexp(kθ+ηik)=∑r=0Rexp(rθ)γr.
Calculation of the γ's is thus essential for estimation of the person locations. A recursion formula is described in what follows. Let γr(i) denote the γ-polynomial based on the first i items. It is then possible to calculate γr(i+1) by the recursion formula
(24)γr(i+1)=∑xexp(ηi+1,x)γr-x(i)
since a total score of r on the items 1,…,i+1 must be obtained by scoring x on item i+1 and r-x on the items 1,…,i. The values of x in the summation must be chosen in such a way that the score r-x on the first i items is at most ∑k=1imk and the score x on item i+1 is at most mi+1. That is,
(25)γr(i+1)=∑x=max(0,r-∑k=1imk)min(mi+1,r)exp(ηi+1,x)γr-x(i).
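As an illustration of the recursion (24)–(25), the following Python sketch (not part of the macro; names are ours) builds the γ-polynomials item by item. For I dichotomous items with all ηih=0, the γ's reduce to binomial coefficients, which provides a simple check:

```python
import math

def gamma_recursive(etas):
    """gamma_0, ..., gamma_R via the recursion (24)/(25).

    etas: one entry per item, etas[i] = [eta_{i0}=0, ..., eta_{i m_i}].
    """
    gam = [1.0]  # zero items: only the empty response, gamma_0 = 1
    for e in etas:
        max_prev = len(gam) - 1
        new = [0.0] * (max_prev + len(e))  # scores 0..max_prev + m_{i+1}
        for x, eta_x in enumerate(e):      # score x on the new item
            w = math.exp(eta_x)
            for r_prev in range(max_prev + 1):
                new[r_prev + x] += w * gam[r_prev]
        gam = new
    return gam
```

The recursion needs only O(R·mi) operations per item, whereas direct summation over X(r) grows exponentially with the number of items.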
Person locations can be estimated using maximum likelihood estimation or Bayes modal estimation. A special case of the latter is the so-called weighted likelihood estimation. Since the γ's do not depend on θ, (22) is an exponential family where the likelihood equation for estimating θ is
(26)R=∂∂θ(log[∑r=0Rexp(rθ)γr])=E(R∣θ),
and the maximum likelihood estimator (MLE) θ^ can be obtained by the Newton-Raphson algorithm. The probabilities (22) show that the expected score is an increasing function of θ. For individuals who have obtained a score of zero or the largest possible score R=∑i=1Imi, the probabilities (22) attain their maximum when θ is -∞ or ∞, respectively. The Bayes modal estimator (BME) of θ is obtained by choosing a prior density g for the latent parameter and then maximizing the posterior density
(27)gx(θ)=P(X=x∣θ)g(θ)P(X=x)∝P(X=x∣θ)g(θ)
with respect to θ, keeping item parameters and the observations fixed. The MLE described above is a special case corresponding to g≡1. Choosing the prior as the square root of the Fisher information
(28)g(θ)=ℐ(θ)1/2
results in the weighted maximum likelihood estimator (WLE) [16]. With this prior, one obtains an estimator with minimal bias and the same asymptotic distribution as the MLE. The equation to be solved in order to obtain the WLE is
(29)R=∂∂θ(log[∑r=0Rexp(rθ)γr]-12logℐ(θ)),
and the Newton-Raphson algorithm can be used for this.
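The likelihood equation (26) can be solved with a few lines of code. The following Python sketch (an illustration only; the function name is ours) implements the Newton-Raphson step θ ← θ + (r − E(R∣θ))/Var(R∣θ) using the γ-polynomials; the WLE equation (29) can be solved in the same way after adding the derivative of the penalty term:

```python
import math

def theta_mle(r, gam, tol=1e-10, max_iter=100):
    """Solve (26), r = E(R | theta), by Newton-Raphson.

    gam: gamma_0, ..., gamma_R (e.g., from the recursion (24)).
    Not defined for the extreme scores r = 0 and r = R, where the
    likelihood is maximized at -infinity and +infinity, respectively.
    """
    R = len(gam) - 1
    if not 0 < r < R:
        raise ValueError("MLE does not exist for extreme scores")
    theta = 0.0
    for _ in range(max_iter):
        w = [g * math.exp(s * theta) for s, g in enumerate(gam)]  # (22), unnormalized
        tot = sum(w)
        mean = sum(s * ws for s, ws in enumerate(w)) / tot        # E(R | theta)
        var = sum(s * s * ws for s, ws in enumerate(w)) / tot - mean ** 2
        step = (r - mean) / var
        theta += step
        if abs(step) < tol:
            break
    return theta
```

Because the score distribution (22) is an exponential family, the log likelihood is concave in θ and the iteration converges quickly from the starting value θ=0.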
5. Implementation in SAS
The SAS macro %rasch_cml uses PROC GENMOD to estimate item parameters and PROC NLMIXED to estimate person locations. It writes person locations estimated by maximum likelihood estimation (MLE) and by weighted likelihood estimation (WLE), along with their asymptotic standard errors, to a data set. Furthermore, a copy of the input data set with an added column containing the maximum likelihood estimates is created.
6. Simulation
Evaluation of model fit can be done by comparing what has been observed with simulated data describing what could have been observed under the model. The SAS macro %rasch_cml simulates data sets under the model. These are obtained by first simulating N person scores from the empirical score distribution and then simulating item responses. Let L denote the set of possible scores, and for r∈L define Vr={v∣Rv=r}⊂{1,…,N}. Let Nr=♯Vr denote the number of persons with score r. First simulate R1(S),…,RN(S) with probabilities P(Rv(S)=r)=Nr/N, and next simulate a data matrix (Xvi(S)) using the probabilities
(30)P(Xvi(S)=·∣Rv=Rv(S)).
This procedure is repeated a number of times yielding data matrices (Xvi(S)), S=1,2,…
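The simulation scheme can be sketched in Python as follows (an illustration, not the macro's implementation; for a small number of items the conditional distribution (30) can be obtained by enumerating all response vectors with the drawn score):

```python
import itertools
import math
import random

def simulate_dataset(scores, etas, rng=random):
    """Simulate one data matrix under the model, conditional on the scores.

    scores: observed total scores R_1, ..., R_N (the empirical distribution).
    etas:   one entry per item, etas[i] = [eta_{i0}=0, ..., eta_{i m_i}].
    Draws each simulated person's score from the empirical score
    distribution and then a response vector with the conditional
    probabilities (13), enumerating all vectors with that score.
    """
    by_score = {}  # group all possible response vectors by total score
    for x in itertools.product(*[range(len(e)) for e in etas]):
        by_score.setdefault(sum(x), []).append(x)
    sim = []
    for _ in scores:
        r = rng.choice(scores)  # score from the empirical distribution
        vecs = by_score[r]
        wts = [math.exp(sum(e[k] for e, k in zip(etas, x))) for x in vecs]
        sim.append(rng.choices(vecs, weights=wts)[0])
    return sim
```

Repeating the call yields the data matrices (Xvi(S)), S=1,2,…, used for the goodness-of-fit graphics below.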
7. Graphics
Three graphical representations are made by the SAS macro %rasch_cml: item characteristic curves (ICCs) that display the response probabilities along the latent continuum and two item fit plots. Let Nihr denote the number of persons with total score r giving the answer h to item i. For each combination (i,h)∈{1,…,I}×{0,1,…,mi}, the macro plots the observed proportion
(31)r↦NihrNr
as solid black dots and the expected proportions (the probabilities)
(32)r↦P(Xvi=h∣Rv=r)
as solid blue lines along with 95% confidence limits as dashed green lines. These plots are illustrated in Figure 2 and are closely related to plots of the ICCs
(33)θv↦P(Xvi=h∣θv),
because Rv is sufficient for θv.
The observed mean score function for item i is
(34)r↦1Nr∑v∈VrXvi.
The simulated mean score function is obtained by simulating item responses X1i(S),…,XNi(S) as described in Section 6 and calculating
(35)r↦1Nr(S)∑v∈Vr(S)Xvi(S),
where Vr(S)={v∣∑i=1IXvi(S)=r} and Nr(S)=♯Vr(S).
8. The SAS Macro
The Hospital Anxiety and Depression Scale (HADS) was designed as a brief instrument used to assess symptoms of anxiety and depression [17] and contains 14 items often scored as two seven-item subscales: “depression” (even numbered items) and “anxiety” (odd numbered items). The SAS macro is illustrated using data reported by Pallant and Tennant [18]. The first step is to create a data set
data inames;
input item_name $ item_text $ max Group @@;
cards;
AHADS1 anx1 3 1 AHADS3 anx3 3 2
AHADS5 anx5 3 3 AHADS7 anx7 3 4
AHADS9 anx9 3 5 AHADS11 anx11 3 6
AHADS13 anx13 3 7
;
run;
that describes the items: item_name contains the names of the items, item_text contains text strings attached to the items, max is the maximum score for each item, and Group contains integers defining groups of items that share the same item parameters. Thus, all HADS items are scored 0, 1, 2, 3, and each has its own vector of item parameters. The macro is called using the statement
%rasch_cml(DATA=work.HADS, ITEM_NAMES=
inames, OUT=HADSTEST);
where DATA= specifies the data set to be analyzed, ITEM_NAMES= is the data set that describes the items and OUT= specifies a prefix for all output data sets generated by the macro (the default value is CML).
The SAS macro creates six data sets. The data set CML_logl contains the maximum value of the conditional log likelihood function. The data sets CML_par and CML_par_ci contain item parameter estimates, and the difference between them is illustrated by the (edited) output
item beta1 beta2 beta3
AHADS1 −3.75 0.17 0.82
:
AHADS9 −0.93 1.51 2.43
from CML_par; Table 1 shows the corresponding output from CML_par_ci. Note that the threshold parameters (β's) are the same.
Item parameter estimates from the data set CML_par_ci.

Item     Label    Cat  Estimate  Lower CL  Upper CL
AHADS1   eta11    1      3.75      3.15      4.35
AHADS1   eta12    2      3.58      2.95      4.21
AHADS1   eta13    3      2.76      2.12      3.39
:
AHADS9   eta71    1      0.93      0.64      1.22
AHADS9   eta72    2     −0.58     −1.01     −0.14
AHADS9   eta73    3     −3.00     −3.66     −2.35
:
AHADS1   beta11   1     −3.75     −4.35     −3.15
AHADS1   beta12   2      0.17     −0.14      0.48
AHADS1   beta13   3      0.82      0.45      1.19
:
AHADS9   beta71   1     −0.93     −1.22     −0.64
AHADS9   beta72   2      1.51      1.13      1.89
AHADS9   beta73   3      2.43      1.75      3.11
The data sets CML_pp_regr and CML_regr are copies of the input data set with added variables useful for latent regression [19]. The data set CML_theta contains MLE and WLE estimates of person locations and their standard errors.
Further options can be specified: ICC=YES yields a plot of the item characteristic curves for each item. The ICCs for HADS item 9 are shown in Figure 1. Specifying plotcat=YES creates plots of observed and expected item category frequencies stratified by the total score. This yields mi plots for item i as exemplified in Figure 2.
Item characteristic curves (ICCs) for HADS item 9 plotted with option ICC=YES. The curves intersect at the thresholds β91=-0.93, β92=1.51, and β93=2.43.
Observed and expected item category frequencies stratified by the total score plotted with option plotcat=YES.
Using the option plotmean=YES makes the macro plot item means against raw scores as solid black lines along with item means simulated under the model plotted as gray-dashed lines. The default number of simulations is 30, but this can be changed using the NSIMU= option. Figure 3 shows an example.
Item fit plot for HADS item 9 plotted with option plotmean=YES.
The plot shows the mean scores to be increasing with the total score, in accordance with requirement (ii), and that the variation observed in the data is well within the range of what would be expected under the model.
9. Discussion
Several proprietary software packages for fitting Rasch models exist, the most widely used being RUMM [20], ConQuest [21], and WINSTEPS [22]. With the increasing use of IRT and Rasch models in new research areas where access to specialized proprietary software is limited, it is important to provide implementations in standard statistical software such as R and SAS. The R package eRm [27] is a flexible tool for these analyses, and SAS macros for Rasch models already exist: the macros %anaqol [23] and %irtfit [24] encompass a wide range of IRT models. The SAS macro %anaqol computes Cronbach's coefficient alpha [25], produces several useful graphical representations, and estimates the parameters of any of five IRT models (the dichotomous Rasch model [3, 4], the Birnbaum (2PL) model [26], OPLM, the partial credit model [7], and the rating scale model [6]) using marginal maximum likelihood; it is very useful, but some features are only available for dichotomous items, and the implemented plots of empirical and theoretical ICCs do not show confidence limits. The SAS macro %irtfit produces a variety of indices for testing the fit of IRT models to dichotomous and polytomous item response data; it does not estimate item parameters but requires that these have been estimated using other IRT software.
It has previously been discussed how to implement a conditional estimation in SAS [15], but no software was provided. The macro described in this paper uses these ideas to provide a user-friendly tool for item analysis, with focus on graphics.
Because the macro uses the contingency table of item responses, no responses may be missing; if the estimation procedure fails to converge, a warning or error message is printed. The plots of observed and expected counts in each score group can be interpreted as empirical versions of the item characteristic curves. However, when many score groups are small, as is often the case in applications, these plots are not helpful. Therefore, the macro produces a single item-level goodness-of-fit plot. Furthermore, it extends previously implemented macros in that the output and features are the same for dichotomous and polytomous item response formats and in that it presents more graphics, specifically a new goodness-of-fit plot where observed item means are compared to item means simulated under the model.
References
[1] Hambleton, R. K., and van der Linden, W. J.
[2] Glas, C. A. W., and Verhelst, N. D. Tests of fit for polytomous Rasch models. In Fischer and Molenaar (eds.).
[3] Rasch, G.
[4] Fischer, G. H., and Molenaar, I. W.
[5] Andersen, E. B. Sufficient statistics and latent trait models.
[6] Andrich, D. A rating formulation for ordered response categories.
[7] Masters, G. N. A Rasch model for partial credit scoring.
[8] Neyman, J., and Scott, E. L. Consistent estimates based on partially consistent observations.
[9] Bock, R. D., and Aitkin, M. Marginal maximum likelihood estimation of item parameters: application of an EM algorithm.
[10] Andersen, E. B. Conditional inference for multiple-choice questionnaires.
[11] Kelderman, H. Loglinear Rasch model tests.
[12] Tjur, T. A connection between Rasch's item analysis model and a multiplicative Poisson model.
[13] Agresti, A. Computing conditional maximum likelihood estimates for generalized Rasch models using simple loglinear models with diagonals parameters.
[14] TenVergert, E., Gillespie, M., and Kingma, J. Testing the assumptions and interpreting the results of the Rasch model using log-linear procedures in SPSS.
[15] Christensen, K. B. Fitting polytomous Rasch models in SAS.
[16] Warm, T. A. Weighted likelihood estimation of ability in item response theory.
[17] Zigmond, A. S., and Snaith, R. P. The hospital anxiety and depression scale.
[18] Pallant, J. F., and Tennant, A. An introduction to the Rasch measurement model: an example using the Hospital Anxiety and Depression Scale (HADS).
[19] Christensen, K. B., Bjorner, J. B., Kreiner, S., and Petersen, J. H. Latent regression in loglinear Rasch models.
[20] Andrich, D., Lyne, A., Sheridan, B., and Luo, G.
[21] Wu, M. L., Adams, R. J., Wilson, M. R., and Haldane, S. A.
[22] Linacre, J. M.
[23] Hardouin, J. B., and Mesbah, M. The SAS macro-program %AnaQol to estimate the parameters of item responses theory models.
[24] Bjorner, J. B., Smith, K. J., Stone, C., and Sun, X.
[25] Cronbach, L. J. Coefficient alpha and the internal structure of tests.
[26] Birnbaum, A. Latent trait models and their use in inferring an examinee's ability. In Lord, F. M., and Novick, M. R.
[27] Mair, P., and Hatzinger, R. Extended Rasch modeling: the eRm package for the application of IRT models in R.