A Protocol for Advanced Psychometric Assessment of Surveys

Background and Purpose. In this paper, we present a protocol for advanced psychometric assessments of surveys based on the Standards for Educational and Psychological Testing. We use the Alberta Context Tool (ACT) as an exemplar survey to which this protocol can be applied. Methods. Data mapping, acceptability, reliability, and validity are addressed. Acceptability is assessed with missing data frequencies and the time required to complete the survey. Reliability is assessed with internal consistency coefficients and information functions. We take a unitary approach to validity, accumulating evidence based on instrument content, response processes, internal structure, and relations to other variables. We also address assessing performance of survey data when aggregated to higher levels (e.g., nursing unit). Discussion. In this paper we present a protocol for advanced psychometric assessment of survey data using the Alberta Context Tool (ACT) as an exemplar survey; application of the protocol to the ACT survey is underway. Psychometric assessment of any survey is essential to obtaining reliable and valid research findings. This protocol can be adapted for use with any nursing survey.


The Alberta Context Tool
Organizational context is "…the environment or setting in which people receive healthcare services, or in the context of getting research evidence into practice, the environment or setting in which the proposed change is to be implemented" ([1], page 2). Context is believed to influence the successful implementation of research evidence by nurses in healthcare settings internationally. However, there is little empirical evidence to support this claim. One reason for this is the absence of a robust measure of organizational context in healthcare. The Alberta Context Tool (ACT) was developed in 2006 to address this gap.
Underpinned by the Promoting Action on Research Implementation in Health Services (PARiHS) framework [1,2] and related literature [3,4], the ACT was constructed to measure healthcare providers' and care managers' perceptions of modifiable dimensions of organizational context; their responses can then be aggregated to provide nursing unit and/or organizational (e.g., hospital, nursing home, or home care office) estimates of context. Three principles informed the development of the ACT: (1) use of the PARiHS framework and related literature to identify a comprehensive set of contextual concepts, (2) brevity (it could be completed in 20 minutes or less), and (3) a focus on modifiable (and therefore researchable) elements of context [5]. The survey now exists in four versions (acute adult care, pediatrics, long-term care, and home care) and six forms: (1) regulated nursing care providers (registered nurses and licensed practical nurses); (2) unregulated nursing care providers (healthcare aides); (3) allied health providers; (4) physicians; (5) practice specialists (e.g., clinical educators); and (6) unit care managers. It is being used in eight countries (Canada, United States, Sweden, Netherlands, United Kingdom, Republic of Ireland, Australia, and China) and is available in five languages (English, Dutch, Swedish, Chinese, and French). The index version of the survey (English, acute care regulated nurses) contains 56 items representing eight dimensions and 10 concepts: (1) leadership, (2) culture, (3) evaluation, (4) social capital, (5) informal interactions, (6) formal interactions, (7) structural and electronic resources, and (8) organizational slack (representing three subconcepts: staff, space, and time). Definitions of the eight dimensions, and a description of their operationalization, are presented in Table 1.
Content validity (i.e., the extent to which the items adequately represent the content domain of the concept) was established by members of the research team who were responsible for its development and who have expertise in the context field. No quantification of content validity (e.g., a content validity index) has been performed to date. The instrument was originally developed for acute (adult) care and modified for use in pediatrics, nursing homes, and home care. Response processes validity (i.e., how respondents interpret and expand on item content) was assessed in all four settings [6][7][8].

Traditional Psychometric Assessment of the Alberta Context Tool
To date, two preliminary traditional psychometric assessments of the ACT have been published [5,9]. The first assessment used scores obtained from pediatric nurse professionals enrolled in a national, multisite study [5]. In that analysis, a principal components analysis (PCA) indicating a 13-factor solution was reported. Bivariate associations between instrumental research utilization (which the ACT was developed to predict) and a majority of ACT factors as defined by the PCA were statistically significant at the 5% level. Each ACT factor also showed a trend of increasing mean scores from the lowest to the highest level of instrumental research use, providing additional validity evidence. Adequate internal consistency reliability was reported, with Cronbach's alpha coefficients ranging from 0.54 to 0.91 [5]. A subsequent validity assessment was conducted on responses obtained from healthcare aides (i.e., unregulated nursing care providers) in residential long-term care settings (i.e., nursing homes) [9]. The overall pattern of the ACT data (assessed using confirmatory factor analyses) was consistent with the hypothesized structure of the ACT. Additionally, eight of the ten ACT concepts were related, at statistically significant levels, to instrumental research utilization, supporting its validity. Adequate internal consistency reliability was again reported, with alpha coefficients for eight of ten concepts exceeding the accepted standard of 0.70 [9]. Additional details on both of these preliminary assessments are available elsewhere [5,9]. There are now sufficient ACT data collected from nursing care providers (i.e., registered nurses, licensed practical nurses, and healthcare aides) and allied healthcare professionals across a variety of healthcare settings to conduct advanced psychometric assessments on scores obtained with the instrument.
This will allow researchers and decision makers to use the survey with greater confidence to inform the design and evaluation of context-focused interventions as a means of improving research use by nursing care and allied providers. In this paper, we present a protocol for advanced psychometric assessments of surveys that is based on the Standards for Educational and Psychological Testing (i.e., the Standards). We use the ACT, for which this protocol was developed, as an exemplar survey to which this protocol can be applied. Application of the protocol to the ACT is currently underway.

A Protocol for Advanced Psychometric Assessment
The Standards, considered best practice in the field of psychometrics [10], closely follows the work of American psychologist Samuel Messick [11][12][13], who viewed validity as a unitary concept with all validity evidence contributing to construct validity. Validation, in this framework, involves accumulating evidence from four sources (content, response processes, internal structure, and relations to other variables) to provide a strong scientific basis for proposed score interpretations. It is these interpretations of scores, not the instrument itself, that are evaluated for validity. The source(s) of evidence sought for any particular validation are determined by the desired interpretation(s) [14]. Content evidence refers to the extent to which the items included in an instrument adequately represent the content domain of the concept of interest. Response processes evidence refers to empirical evidence of the fit between the concept under study and the responses given by respondents on the item(s) developed to measure the concept. Internal structure evidence examines the relationships among the items in an item set. Relations to other variables evidence examines relationships between the concept of interest (e.g., the 10 concepts in the ACT) and external variables (e.g., research utilization in the case of the ACT) that it is expected to predict or not predict, as well as relationships to other scales hypothesized to measure the same concept(s) [15]. Our psychometric protocol specifically addresses: (1) data preparation (it is often necessary to reconfigure and merge multiple datasets to conduct advanced and rigorous psychometric analyses, yet there is little guidance in the literature on how to do this) and (2) advanced psychometric data analyses that are in line with the Standards. Robust psychometric analysis of survey data should involve examining the data for: (1) validity, (2) reliability, and (3) acceptability [16][17][18].
Therefore, this protocol includes each of these components. Validity refers to the extent to which a measure achieves the purpose for which it is intended, and is determined by the "degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" ([15], page 9). Reliability refers to the consistency of measurement obtained when using an instrument repeatedly on a population of individuals or groups [15]. Acceptability refers to ease of use of an instrument [17]. While multiple reports and descriptions of these analyses can be located in the literature [15][16][17], several limitations are noted. First, there has been no attempt to synthesize the information into a usable protocol. Second, few reports mention acceptability, which is a core component of psychometrics. Third, most current psychometric literature in nursing and health services research describes analyses that are based solely on Classical Test Score Measurement Theory and that are "exploratory" in nature. For example, few reports explore alternatives to traditional (Cronbach's alpha) reliability testing. A rigorous assessment of reliability should go beyond Cronbach's alpha and also include an assessment of variances or standard deviations of measurement errors and of item and test/scale information functions (using Item Response or Modern Measurement Theory). Finally, with respect to validity, most publications limit their discussion to "types" of validity and report methods of limited robustness such as correlations and principal components analysis; little attention is given to rigorous multivariate assessments such as regression and structural equation modeling. A central reason we chose the Standards as the guiding framework for our protocol is that it provides a contemporary view of validity.
Traditionally, three types of validity are often discussed: content validity, criterion-related validity (which includes concurrent and predictive validity), and construct validity. This "holy trinity" conceptualization of validity, as labeled by Guion [19], has dominated nursing and health-related research methods textbooks. While this way of conceptualizing validity has been useful, it has also caused problems and confusion. For example, it has led to compartmentalized thinking about validity, narrowing and limiting it to a checklist type of approach. It has made it "easier" to overlook the fact that construct validity is really the whole of validity theory, that is, that validity is a unitary concept. It has also resulted in validity being viewed as a property of the measure (instrument) rather than a property of the scores obtained from a measure when it is used for a specific purpose with a particular group of respondents. Therefore, in the psychometric protocol presented next, we take a unitary approach to validity assessment.

Methods
The psychometric protocol presented in this paper addresses all three core components of survey psychometrics: acceptability, reliability, and validity. We focus on advanced aspects of validity (i.e., internal structure and relations to other variables validity evidence) in order to construct robust validity arguments for survey data. The protocol is divided into two phases: (1) data preparation and (2) data analysis. These phases are applicable to psychometric assessment of all multi-item survey instruments.

Phase I: Data Preparation.
Robust psychometric assessment often requires combining multiple data collections. We will conduct a psychometric analysis of ACT data across seven unique data collections (see Table 2). The data comprise: (1) various provider groups (healthcare aides, licensed practical nurses, registered nurses, and allied healthcare professionals); (2) settings (adult hospitals, pediatric hospitals, nursing homes, and community care); and (3) survey administration modes (pen and paper, online, and computer assisted personal interview). In addition to data on the ACT, some of these collections also contain data on knowledge translation (defined as research utilization, which the ACT was developed to predict), individual factors (e.g., attitude towards research), care provider outcomes (e.g., burnout), and patient/resident outcomes (e.g., number of falls), which context (through research utilization) is hypothesized to predict. These additional variables are necessary to perform advanced psychometric analyses on the ACT. Demographic data files accompany all seven data collections. Collections 1-6 include items on knowledge translation; collections 1-4 include items on care provider outcomes; and collections 1-4 include data on patient/resident outcomes. The first phase of completing a comprehensive psychometric assessment using survey data from multiple sources is "data preparation". Substantive work is often required to reconfigure multiple data collections for psychometric analysis. In the case of the ACT, we needed to merge data by provider subgroup to allow for separate (homogeneous) analyses for healthcare aides, nurses, and allied healthcare professionals. This work involves detailed "mapping" of survey elements across all data files to link items (including lead-ins, stems, and examples of concepts where they exist) and response scales across each data file by provider subgroup, setting, and survey administration mode.
The research team needs to meet regularly to discuss the mapping and to address any concerns regarding where items can and cannot be combined, so that the data files can be merged into a single file from which the psychometric analyses can be conducted.
With the ACT, survey elements mapped included: interviewer instructions (where a computer assisted interview was undertaken in data collection), lead-in statements (e.g., In answering the following, please focus on….), stems (the standard introduction to the items), examples (e.g., number of resident falls is an example of the context concept of evaluation), survey items, response options, skip pattern instructions, and the order of items within an item set for a concept.
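The mapping step can be pictured as a lookup table that the team agrees on and then applies mechanically to each data file. The sketch below is a minimal illustration, assuming each collection arrives as a list of dictionaries and that response-scale differences can be expressed as a simple recode; every collection label, variable name (for example `lead_q1` or `C1`), and the reverse-coding of collection 2 are hypothetical and not taken from the ACT data.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mapping table agreed on by the research team:
# (collection, source variable) -> (harmonized item ID, optional response recode).
# All collection labels, variable names, and recodes here are invented for illustration.
ItemMap = Dict[Tuple[str, str], Tuple[str, Optional[Dict[int, int]]]]

ITEM_MAP: ItemMap = {
    ("collection_1", "lead_q1"): ("leadership_1", None),
    ("collection_1", "cult_q1"): ("culture_1", None),
    ("collection_2", "L1"): ("leadership_1", None),
    # Assumed: collection 2 coded this 5-point item in reverse order.
    ("collection_2", "C1"): ("culture_1", {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}),
}


def harmonize(collection: str, records: List[dict],
              item_map: ItemMap) -> Tuple[List[dict], Set[str]]:
    """Rename and recode mapped items; return harmonized rows and unmapped variables."""
    rows: List[dict] = []
    unmapped: Set[str] = set()
    for record in records:
        row = {"source": collection}  # keep provenance for setting/mode comparisons
        for var, value in record.items():
            target = item_map.get((collection, var))
            if target is None:
                unmapped.add(var)  # no agreed mapping: flag for team discussion
                continue
            item_id, recode = target
            if recode is not None and value is not None:
                value = recode[value]  # out-of-range codes raise KeyError on purpose
            row[item_id] = value
        rows.append(row)
    return rows, unmapped


# Usage: harmonize two collections, then stack them into one analysis file.
rows_1, unmapped_1 = harmonize("collection_1", [{"lead_q1": 4, "cult_q1": 2}], ITEM_MAP)
rows_2, unmapped_2 = harmonize("collection_2", [{"L1": 3, "C1": 1, "X9": 2}], ITEM_MAP)
merged = rows_1 + rows_2
```

Returning the set of unmapped variables, rather than silently dropping them, keeps the "where items can and cannot be combined" decisions visible to the team.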

Phase II: Data Analysis.
All initial analyses described next will, in the case of the ACT, be conducted for each provider subgroup: regulated nursing care providers (registered nurses, licensed practical nurses), unregulated nursing care providers (healthcare aides), and allied healthcare professionals. Subsequent analyses will be informed by initial analyses and may vary by provider group. Our aims with respect to psychometric assessment of the ACT (which frame our protocol) are as follows.
(1) To assess advanced psychometric properties of the ACT for regulated and unregulated nursing care providers and allied health providers by: (a) setting (adult and pediatric hospitals, nursing homes, home care), and (b) mode of administration (pen and paper, online, computer assisted personal interview); (2) To test the theoretical model underpinning the ACT; and (3) To assess performance of the ACT when data are aggregated to higher (e.g., nursing unit and organizational/hospital) levels.
These aims are applicable to psychometric assessment of most survey instruments.

Objective 1: To Assess the Psychometric Properties of the ACT by Provider Subgroup, Setting, and Mode of Administration
Acceptability. We will assess acceptability of the ACT by examining missing data frequencies for all items and subscales (concepts). We will also assess, where available, the time taken to complete each subscale and the full survey [17,18,20].
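As a rough sketch of these acceptability summaries, the functions below compute item-level and scale-level missing data percentages and a median completion time. They assume responses are stored as dictionaries with `None` marking a missing answer, and they define a scale as "missing" when at least one of its items is unanswered; that definition, the item IDs, and the data are all illustrative assumptions rather than the protocol's specification.

```python
from statistics import median
from typing import Dict, List, Optional

Response = Dict[str, Optional[int]]  # item ID -> response; None marks a missing answer


def item_missing_pct(responses: List[Response], items: List[str]) -> Dict[str, float]:
    """Percentage of respondents with no answer on each item."""
    n = len(responses)
    return {item: 100.0 * sum(r.get(item) is None for r in responses) / n for item in items}


def scale_missing_pct(responses: List[Response],
                      scales: Dict[str, List[str]]) -> Dict[str, float]:
    """Percentage of respondents missing at least one item of each scale (concept)."""
    n = len(responses)
    return {
        name: 100.0 * sum(any(r.get(i) is None for i in items) for r in responses) / n
        for name, items in scales.items()
    }


def median_minutes(times: List[Optional[float]]) -> float:
    """Median completion time, ignoring respondents without a recorded time."""
    return median([t for t in times if t is not None])


# Hypothetical responses from four respondents on two scales.
responses = [
    {"lead_1": 4, "lead_2": None, "cult_1": 5},
    {"lead_1": 3, "lead_2": 4, "cult_1": None},
    {"lead_1": None, "lead_2": 2, "cult_1": 4},
    {"lead_1": 5, "lead_2": 5, "cult_1": 5},
]
scales = {"leadership": ["lead_1", "lead_2"], "culture": ["cult_1"]}
item_pct = item_missing_pct(responses, ["lead_1", "lead_2", "cult_1"])  # 25.0 each
scale_pct = scale_missing_pct(responses, scales)  # leadership 50.0, culture 25.0
minutes = median_minutes([12.5, 18.0, None, 15.0, 22.0])  # 16.5
```

The median is used here because completion times are often right-skewed; a mean could be reported alongside it.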

Reliability.
Reliability information may be reported in terms of variances or standard deviations of measurement errors, in terms of item response theory test/item information functions, or more commonly, in terms of one or more coefficients. We will assess reliability by calculating internal consistency and information functions. We will calculate three internal consistency coefficients: (1) Cronbach's alpha; (2) Guttman split-half reliability; and (3) Spearman-Brown reliability. Internal consistency coefficients are indexes of reliability associated with the variation accounted for by the true score of an "underlying concept" [17], in our case, each ACT concept. Coefficients can range from 0 to 1; a coefficient of 0.70 is considered acceptable for newly developed scales while 0.80 or higher is preferred and indicates the items may be used interchangeably [17,20]. Information functions are a function of discrimination and item thresholds in item response theory; they present the amount of information provided by an item at a given trait level [21].

Internal Structure Validity.
We will conduct (1) item-to-total correlations on each ACT concept, (2) item-total statistics on each ACT concept (see Table 1 for the number of items in each ACT concept), and (3) confirmatory factor analyses (CFA) on each ACT concept and on all ACT items combined.
From the item-to-total correlations, items will be flagged for discussion and further evaluation if an item correlates with its scale (concept) score below 0.30 [20]. From the item-total statistics, items whose removal causes a substantial change in the scale's Cronbach's alpha will also be evaluated further and considered for future revision [22].
In developing the ACT, items were chosen to reflect coordinated and meaningfully similar dimensions, but were intentionally chosen to be non-redundant. Hence, the ACT does not exactly match the unidimensional causal requirement of the factor model (tested by CFA). However, the coordination or clustering of meaningfully similar items by substantive similarity, and their relevance to potential interventions, make factor specifications the closest statistical model for testing the ACT's internal structure. Further, the similarity of items within each contextual dimension (e.g., leadership, culture, evaluation) makes the CFA approach appropriate for a Standards assessment. We will therefore use CFA to determine how well the defined measurement models for each ACT concept (and all ACT items combined) fit our observed data. A 4-step approach will be used as follows.
(1) Model specification (the proposed measurement model for each ACT concept and the complete ACT will be specified), (2) parameter estimation (maximum likelihood estimation will be used), (3) assessment of model fit, and (4) model modification and retesting (as appropriate). With respect to model fit, we will evaluate parameter estimates for direction, magnitude, and significance of effects. Recent discussions of structural equation model testing [23,24] state that chi-square is the only appropriate model test and have questioned the justifiability of fit indices such as the root mean square error of approximation (RMSEA), the standardized root mean squared residual (SRMSR), and the comparative fit index (CFI). While we are inclined to agree with the critiques of the indices, we are hesitant to disregard them entirely because of their past popularity and use [18,25,26]. Given the shifting statistical view of indices, we will report relevant index values in addition to chi-square to assist comparison with published measurement assessments, but we will be cautious about basing conclusions on fit indices.
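For readers comparing reported indices, the functions below compute point estimates of RMSEA, CFI, and SRMR from quantities that CFA software reports (model and baseline chi-square values, degrees of freedom, sample size, and observed versus model-implied correlations). This is a sketch of commonly used formulas, not the output of any particular package: some programs use N rather than N − 1 in RMSEA, and SRMR variants differ in how residuals are standardized. The chi-square test itself would use the chi-square survival function (for example `scipy.stats.chi2.sf(chi2, df)`), which is omitted here to keep the example dependency-free.

```python
import math
from typing import List


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """CFI relative to the independence (baseline) model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def srmr(observed: List[List[float]], implied: List[List[float]]) -> float:
    """Root mean square of residual correlations over the lower triangle incl. diagonal."""
    p = len(observed)
    resid = [(observed[i][j] - implied[i][j]) ** 2 for i in range(p) for j in range(i + 1)]
    return math.sqrt(sum(resid) / len(resid))


# Hypothetical CFA output: chi-square 150 on 50 df, N = 201, baseline 1050 on 60 df.
print(round(rmsea(150, 50, 201), 3))     # 0.1
print(round(cfi(150, 50, 1050, 60), 3))  # 0.899
```

Both indices are simple transformations of the chi-square statistic, which is one reason the critiques cited above treat chi-square as the primary evidence and the indices as supplementary.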

Relations with Other Variables Validity.
Prior to using modeling techniques to test the theoretical model underpinning the ACT (Objective 2), we will examine each ACT item (by scale) for its association with our demographic and dependent variables in the respective datasets (e.g., with research utilization and outcome variables such as healthcare provider health status and burnout). The statistical measure used will depend on the measurement level of the other variable (e.g., a correlation coefficient will be used to examine associations between ACT items and research use). Items within the same scale should correlate at similar magnitudes with the other variables being assessed. Items within a scale that display a pattern uncharacteristic of the other items in the same scale will be further scrutinized with respect to their relations with additional variables.

Objective 2: To Test the Theoretical Model Underpinning the ACT
The ACT was developed based on the premise that a more favorable context leads to higher research use and improved health outcomes of healthcare providers and, consequently, improved patient and resident health outcomes (through research use). We will empirically test this theoretical premise using regression and structural equation models. We will construct a series of regression models that examine the relationships between the dimensions of the ACT as independent variables, and research utilization and other outcomes (e.g., care provider burnout) as dependent variables. We will then test a series of structural equation models (SEM) to empirically validate the theoretical (latent-level) model underpinning the ACT. This will allow us to advance our psychometric assessment by simultaneously assessing both the measurement and the latent structures of the ACT.
Our SEM models will be specified for each provider subgroup and tested according to the various (a) settings (adult hospitals, pediatric hospitals, nursing homes, and home care) and (b) survey administration modes (where sample size is sufficient). The models will include demographic variables (as exogenous variables), ACT variables (as endogenous variables), and outcome variables, for example, research utilization (as final endogenous variables). We will follow the same 4-step approach previously identified for CFA: (1) model specification (the proposed measurement model for each ACT concept and the complete ACT will be specified), (2) parameter estimation (maximum likelihood estimation will be used), (3) assessment of model fit, and (4) model modification and retesting (as appropriate).

Objective 3: To Assess the Performance of the ACT with Data Aggregated by Provider Subgroup to Care Unit and Organizational Levels
When developing the ACT, items within the various scales were constructed to direct respondents' attention to common experiences on a particular nursing unit or organization (hospital, nursing home, or residential home/office depending on the context of their care delivery) in order to ensure that the ACT was meaningful at these levels. As a final test of reliability and validity, we will assess performance of the ACT scales when aggregated to the nursing unit and organizational level by calculating four indices: ICC(1), ICC(2), η², and ω². One-way analysis of variance (ANOVA) will be performed on each ACT scale (concept) using the unit as the group variable. The source table from the one-way ANOVA will be used to calculate the four standard aggregation indices [27]. ICC(1) is a measure of individual score variability about the subgroup mean. ICC(2) is an overall estimate of the reliability of group means and provides an index of mean rater reliability of the aggregated data [27]. η² and ω² are measures of validity, also known as measures of "effect size" in ANOVA. An effect size is a measure of the strength of the relationship between two variables and thus illustrates the magnitude of the relationship. η² denotes the proportion of variance in the individual variable (in each ACT concept) accounted for by group membership (e.g., by belonging to a specific nursing unit) [28]. This value is equivalent to the R-squared value obtained from a regression model and, where group sizes are large, to ICC(1) [29]. Omega squared (ω²) measures the relative strength of aggregated data as an independent variable. It is also an estimate of the amount of variance in the dependent variable (e.g., in each ACT concept) accounted for by the independent variable (i.e., by group membership, such as belonging to a specific nursing unit) [30]. Larger values of η² and ω² indicate stronger effect sizes and relationships between variables.
As a result, larger values of η² and ω² also indicate stronger "relations to other variables" validity evidence (as described in the Standards validation framework) and thus contribute to overall construct validity.
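The four aggregation indices can all be derived from the one-way ANOVA source table, as the text describes. The sketch below uses the commonly cited formulas ICC(1) = (MSB − MSW) / (MSB + (k − 1)MSW), ICC(2) = (MSB − MSW) / MSB, η² = SSB / SST, and ω² = (SSB − (J − 1)MSW) / (SST + MSW), where J is the number of units. It assumes k is the mean group size, which is only an approximation when units differ in size (adjusted group-size formulas exist). The unit labels and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def aggregation_indices(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """ICC(1), ICC(2), eta squared, and omega squared from a one-way ANOVA."""
    scores = [x for g in groups.values() for x in g]
    n, j = len(scores), len(groups)
    grand = mean(scores)
    means = {unit: mean(g) for unit, g in groups.items()}
    ss_between = sum(len(g) * (means[u] - grand) ** 2 for u, g in groups.items())
    ss_within = sum((x - means[u]) ** 2 for u, g in groups.items() for x in g)
    ss_total = ss_between + ss_within
    ms_between = ss_between / (j - 1)
    ms_within = ss_within / (n - j)
    k = n / j  # mean group size; an approximation when units differ in size
    return {
        "ICC(1)": (ms_between - ms_within) / (ms_between + (k - 1) * ms_within),
        "ICC(2)": (ms_between - ms_within) / ms_between,
        "eta2": ss_between / ss_total,
        "omega2": (ss_between - (j - 1) * ms_within) / (ss_total + ms_within),
    }


# Hypothetical scores on one ACT concept from respondents on two nursing units.
indices = aggregation_indices({"unit_A": [1, 2, 3], "unit_B": [4, 5, 6]})
```

For these toy data SSB = 13.5, SSW = 4, MSB = 13.5, and MSW = 1, so ICC(1) = 12.5/15.5, ICC(2) = 12.5/13.5, η² = 13.5/17.5, and ω² = 12.5/18.5. As expected, ω² is smaller than η² because it corrects for the upward bias of η² in samples.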

Conclusion
Assessment of the psychometric properties of scores obtained with a survey is critical to obtaining reliable and valid research findings. In this paper, we present a protocol for advanced psychometric assessments of surveys that is based on the Standards for Educational and Psychological Testing (the Standards), considered "best practice" in instrument development and psychometrics [10]. We believe this protocol can be applied to all nursing and related surveys that contain Likert-type multi-item scales. Knowing the psychometric properties of a survey will, in turn, allow researchers to have greater confidence in their findings and to use them to inform the design and evaluation of subsequent phases of their research, such as interventions to improve nursing care and patient outcomes. In this paper, we illustrated the newly developed psychometric protocol using the Alberta Context Tool (ACT) as an exemplar survey to which it can be applied; application of the protocol to the ACT survey is currently underway.

Ethical Approval
Ethical approval to conduct the analyses outlined in this protocol was provided by the University of Alberta Research Ethics Board.

Conflict of Interests
The authors declare that they have no conflict of interests.