Psychometric Properties and Validation of the Arabic Academic Performance Rating Scale

1 Dental Public Health, Department of Quality Assurance, College of Dentistry, Jazan University, Jazan, Saudi Arabia
2 Department of Dental Public Health, Universiti Sains Malaysia (USM), Penang, Malaysia
3 National Accreditation Authority for Translators and Interpreters, Sydney, NSW, Australia
4 Department of Quality Assurance, College of Medicine, Jazan University, Jazan, Saudi Arabia
5 Department of Quality Assurance, College of Applied Medical Sciences, Jazan University, Jazan, Saudi Arabia
6 Department of Health Promotion, Maastricht University, Maastricht, Netherlands


Background
Education is an integral part of a country's skill-based and economic development [1]. Assessment lies at the core of the teaching and learning experience, because it ascertains students' achievement in each subject or course in a curriculum [2]. The standard test score (STS) is one of the most popular methods institutions use to judge a student's academic performance. In Saudi Arabia, assessment is mostly done through standard tests that follow a predesigned curriculum [3]. All test takers are required to answer a similar set of questions derived from a common question bank [3][4][5]. Standard tests therefore yield scores in a consistent manner through a standardized process. This type of objective assessment is intended to give a fair analysis of each student's achievement in the respective courses or subjects. However, it fails to evaluate the short-term and long-term needs of a child in a particular course or subject; that is, it does not report on educational experience in a broader perspective [5][6][7]. It further fails to identify children with learning difficulties so that the nature of the support and assistance they need can be ascertained. It also fails to assist in designing and implementing appropriate strategies matched to the specific difficulties that students may have encountered [6,8]. Most importantly, some researchers have also raised concerns over its reliability [4].
Thus, traditional standard test scores (STS) alone are not enough, and a subjective analysis should be added to complete the overall assessment [9]. Through this, teachers will be able to observe and report on a more differentiated sample of academic content than traditional standard tests cover. This will also assist in providing social validity data for each student [10,11]. A few teacher rating scales have been successfully tested and validated in different domains of school experience. Among these are the children's behavior rating scale [12], classroom adjustment rating scale [13], teacher child rating scale [14], social skills rating system [15], and behavior and emotional rating scale [16]. These scales are psychometrically sound but are limited in monitoring and reporting academic skill deficits. In addition, they do not allow academic work completion and accuracy rates to be calculated for each course or subject. To overcome this, an Academic Performance Rating Scale (APRS) was developed and validated successfully by a group of researchers [17,18].
The APRS (original English version) reflects teachers' perception of a child's academic performance. Initially, thirty items were generated based on suggestions provided by several classroom teachers, school psychologists, and clinical child psychologists. From the original thirty, 19 items were retained based on feedback obtained from another set of teachers, principals, and school and child psychologists regarding item content validity, clarity, and importance [17]. The final version of the APRS comprised items focusing on one global construct, that is, academic performance, through a series of questions (items) covering work performance of each child in different subjects (e.g., estimate the written math work completed relative to classmates); academic success (e.g., what is the quality of the child's reading skills?); academic behavior (e.g., how often does the child begin written work prior to understanding the directions?); and academic attention (e.g., how often is the child able to pay attention without you prompting?) [17].
Higher educational institutions and universities in the Middle East are gradually shifting to English as their medium of instruction following new educational policies. However, owing to the belief in maintaining and passing on culture and tradition to coming generations, most national schools (primary and secondary) still use Arabic as their medium of instruction. A search of the published literature shows that there is currently no subjective assessment scale in Arabic for assessing the academic experience of schoolchildren. A questionnaire-based assessment, if made available, would not only reveal the academic/learning experience of Arab children in a broader perspective but also support large-scale analysis providing quantitative as well as qualitative evidence of school academic performance in the region. With this intention, the current study aimed to provide a validated version of the Academic Performance Rating Scale in Arabic.

Study Setting.
This study was conducted in Jazan city, Saudi Arabia, which is situated at the southern tip of the Arabian Peninsula bordering Yemen. The infrastructure of Jazan is still in a development phase, unlike Riyadh and Jeddah, which are considered the major cities of Saudi Arabia. The medium of instruction in the schools selected for this study is Arabic.

Arabic Translation.
The translation process involved a qualified bilingual translator who carried out the initial translation of the questionnaire into Arabic (Arabic-APRS-19, A-APRS-19) while maintaining the conceptual meaning of each question. A second bilingual translator then back-translated the Arabic version into English. A cross examination of the Arabic to English translation found no discrepancies.

Content Validation.
Validation was done to determine whether the questions fully assess the intended outcome (single global construct). A rational analysis of the questionnaire was done by the experts, focusing specifically on readability, clarity, comprehensiveness, and level of agreement using a Likert scale. Three school principals were asked to rate the A-APRS-19 on the importance of each item (4-point scale: 0 = not at all; 1 = low; 2 = somewhat; 3 = high) and on the consistency between the items (4-point scale: 0 = not at all; 1 = weak; 2 = adequate; 3 = strong) [19].
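As a minimal sketch of how such expert ratings might be screened, the snippet below averages the three raters' importance scores per item and flags items whose mean falls below 2.5, the cut-off reported in the Validity results. The item names and ratings are hypothetical and are not taken from the study data.

```python
from statistics import mean

# Hypothetical importance ratings (0-3 scale) from three raters per item.
ratings = {
    "item_1": [3, 3, 2],
    "item_2": [3, 2, 3],
    "item_3": [2, 2, 3],
}

CUTOFF = 2.5  # mean rating cut-off reported in the Validity results


def flag_items(item_ratings, cutoff=CUTOFF):
    """Return the names of items whose mean expert rating is below the cut-off."""
    return sorted(
        name for name, scores in item_ratings.items() if mean(scores) < cutoff
    )


flagged = flag_items(ratings)
print(flagged)
```

Items returned by `flag_items` would be candidates for rewording or review; the study itself reports that no major corrections were needed.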

Ethical Consideration and Questionnaire Administration.
A convenience sample of 100 school children was identified from four different schools in Jazan city, Saudi Arabia. Permission was obtained from the review board at Jazan University and then from school authorities before recruiting the study participants. A face-to-face interview was held with, and written consent obtained from, both the school teachers and at least one parent of each child. Only healthy children of Arab nationality were included to maintain the homogeneity of the sample. Classroom teachers and children were categorized into 7th grade, 8th grade, and 9th grade. None of the teachers or parents refused to take part in the study.

Statistical Analysis.
In order to assess the reliability of A-APRS, internal consistency and test-retest reliability were computed. Internal consistency was measured using Cronbach's α coefficient. The Arabic version of APRS (A-APRS) was considered internally consistent if it achieved an α coefficient of at least 0.70 [20]. To investigate the stability of A-APRS over time, a test-retest reliability analysis was performed and the intraclass correlation coefficients (ICC) were calculated using Pearson's r (ICC agreements: <0.40, poor to fair; 0.41-0.60, moderate; 0.61-0.80, good; >0.80, excellent) [21]. The distribution of the A-APRS across three different educational levels (7th grade, 8th grade, and 9th grade) was tested with a nonparametric test (Kruskal-Wallis) to explore discriminant validity and confirm differences.
Exploratory factor analysis (EFA) using principal component analysis and Rasch analysis were conducted to report on the validity of the questionnaire. As the applicability of the Rasch model depends on the assumption of unidimensionality (i.e., items defining a single global construct: academic performance), an EFA was first conducted to identify latent constructs (dimensions) in the A-APRS [22]. EFA factor structures were then evaluated using a Scree plot and standard criteria, including an eigenvalue greater than 1.0 [23]. Average measures and step measures (+SE) were ordered and the mean-square outfit statistic for each category was also evaluated [24]. All data entry and statistical analysis were done using SPSS version 22 (IBM, USA).
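For reference, the standard definition of Cronbach's alpha for a scale of k items can be written as below. The symbols are introduced here for illustration and do not appear in the original text: k is the number of items, \sigma_i^2 is the variance of item i across respondents, and \sigma_X^2 is the variance of the total (summed) score.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}}\right)
```

Alpha rises as the items covary more strongly, because the total-score variance then grows relative to the sum of the individual item variances.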
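The eigenvalue-greater-than-1.0 rule can be illustrated with a short sketch. It assumes the eigenvalues of the item correlation matrix are already available (the study obtained them in SPSS); the values below are hypothetical and chosen only so that they sum to the number of items, as eigenvalues of a correlation matrix do.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Count components with eigenvalue above the threshold and
    return each component's share of the total variance."""
    total = sum(eigenvalues)
    shares = [ev / total for ev in eigenvalues]
    retained = sum(1 for ev in eigenvalues if ev > threshold)
    return retained, shares


# Hypothetical eigenvalues for a 5-item correlation matrix (they sum to 5).
eigenvalues = [3.5, 0.6, 0.4, 0.3, 0.2]
n_factors, shares = kaiser_retained(eigenvalues)
print(n_factors, [round(s, 2) for s in shares])
```

With these illustrative numbers a single component passes the threshold, which is the pattern a one-factor solution would show on a Scree plot.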

Results
The total sample size was 100, of whom 40.6% were male and 59.4% were female. The mean age of study participants was 9.59 ± 1.38 years. In terms of location, 62.5% resided in urban suburbs, whereas 37.5% belonged to rural suburbs. The previous semester's standard test scores (STS) were also obtained in order to check their relation with the A-APRS. These showed that 56.3% of school children were below average (below the optimum requirement). A cross tabulation of A-APRS with other measured variables (parents' education, location, etc.) was performed and the results are reported in Table 1. The students who scored lower in the traditional STS also scored lower in the subjective analysis (p < 0.05).

Reliability.
The nineteen-item Arabic version of APRS (A-APRS-19) was subjected to reliability tests by assessing internal consistency through Cronbach's alpha. The value obtained was 0.68 (Table 3), which was less than the expected value of 0.70. On examining the item correlation matrix (matrix of Pearson-type correlations), it was seen that four of the nineteen items were not in concordance with the rest of the items. The linear influence of these variables was corrected by taking them out of the matrix. After removing them and recalculating Cronbach's alpha, a value of 0.90 (Table 3) was obtained, indicating that the fifteen-item A-APRS-15 was internally consistent. Descriptive statistics of both A-APRS-19 and A-APRS-15 are displayed in Table 2.
To measure the intraclass correlation (ICC), test-retest reliability was performed. A-APRS-15 was administered twice to the study sample in two consecutive weeks. Care was taken while repeating the measurement, as the original scale (APRS) indicates that the rating for each student refers only to the previous week [17]. A correlation value of 0.91 showed that there was excellent agreement between the repeated administrations [25].
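The effect of dropping poorly aligned items on Cronbach's alpha can be sketched as follows. This is an illustration using the standard alpha formula and a small hypothetical data set of three items and five respondents; it is not the study's SPSS procedure or data.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores: items 1 and 2 agree, item 3 does not follow them.
items = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [5, 1, 4, 2, 3],
]
print(round(cronbach_alpha(items), 3))
print([round(a, 3) for a in alpha_if_item_deleted(items)])
```

In this toy example alpha is low with all three items and rises sharply once the third item is left out, mirroring the kind of change reported between A-APRS-19 and A-APRS-15.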
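A minimal sketch of the test-retest check is given below: it computes Pearson's r between two administrations and maps the result onto the agreement bands listed in the Statistical Analysis section. The scores are hypothetical, and the handling of values that fall exactly on or between band edges (for example 0.40 or 0.605) is an assumption, since the published bands leave small gaps.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def agreement_band(r):
    """Map a coefficient onto the agreement bands (edge handling is an assumption)."""
    if r > 0.80:
        return "excellent"
    if r > 0.60:
        return "good"
    if r > 0.40:
        return "moderate"
    return "poor to fair"


# Hypothetical total scores for five children in week 1 and week 2.
week1 = [40, 55, 62, 48, 70]
week2 = [42, 53, 65, 47, 68]
r = pearson_r(week1, week2)
print(round(r, 2), agreement_band(r))
```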

Validity.
Currently there is no other Arabic academic rating scale against which the A-APRS could be compared in measuring the same outcome. Administration of A-APRS-19 and A-APRS-15 to the same population revealed a significant correlation. Also, no significant differences in A-APRS-15 across the category of educational levels were seen (p > 0.05). A-APRS-15 scored consistently with the obtained standard test scores (p = 0.01) (Table 1). Content validation through experts revealed no major corrections based on a mean rating cut-off of 2.5 or higher [19]. Category function statistics revealed a positive analysis of the structural validity of the A-APRS five-point Likert scale; that is, the function of assessment was acceptable, as each category count was equal to or more than 10.
The model's unidimensionality was confirmed by examining the explained and unexplained variance. Exploratory factor analysis (EFA) revealed that A-APRS-15 across its items measured the outcome (single global construct, that is, academic performance) successfully, thus representing a one-factor structure. Through pattern matrix analysis, it was seen that the items were significantly correlated with each other, and a chi square goodness of fit value of 0.02 was obtained. Sampling adequacy was assessed with the Kaiser-Meyer-Olkin measure, which revealed a value of 0.78, indicating that the sample used was nearly adequate. Bartlett's test of sphericity significance value of 0.00 indicated equal variances (homogeneity) across the sample.
A single-factor solution was reported using the criterion of eigenvalue larger than 1 (Figure 1). Examining the "percent of variance explained" suggested that this one-factor solution explained 84% of variance. The Scree plot illustrated a pronounced drop after the first factor, further confirming a strong one-factor solution (Figure 1). Unidimensionality and redundancy of the items in the questionnaire were evaluated in a Rasch analysis using the Partial Credit Model, as displayed in tabulated format (Table 4). For the sake of improvement, the misfitting items as well as items minimizing overlap in the level of difficulty represented in the scale were omitted. They were classified as redundant if the infit/outfit mean square (MNSQ) was less than 0.50 or more than 1.37. Results supported the interpretation of adequate fit of items, since the infit ZSTD (1.55-0.46), outfit ZSTD (1.59-0.99), infit MNSQ (0.74-1.47), and outfit MNSQ (0.50-1.37) for the items were within acceptable ranges (Table 4).
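The comparison across the three grade levels can be sketched with a plain implementation of the Kruskal-Wallis H statistic. This is an illustration rather than the SPSS routine used in the study: it omits the tie correction, the p value uses the chi-square approximation in a closed form that holds only for three groups (2 degrees of freedom), and the scores are hypothetical.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic for a list of groups (no tie correction applied)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def p_value_three_groups(h):
    """Chi-square upper tail with 2 degrees of freedom (three groups only)."""
    return exp(-h / 2)


# Hypothetical scale totals for 7th, 8th, and 9th grade.
grades = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(grades)
print(round(h, 2), round(p_value_three_groups(h), 4))
```

A p value above 0.05 from such a test would correspond to the finding reported here of no significant difference across educational levels.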
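The infit and outfit mean squares can be illustrated with their usual definitions: outfit is the mean of the squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The sketch below assumes the expected scores and variances have already been produced by a fitted Rasch model, which it does not estimate; the numbers are hypothetical, and the 0.50 and 1.37 bounds are the ones stated in the text.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit MNSQ, outfit MNSQ) for one item across persons."""
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(observed)
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def is_misfitting(mnsq, low=0.50, high=1.37):
    """Flag a mean square outside the bounds used in the text."""
    return mnsq < low or mnsq > high


# Hypothetical responses to one item with model expectations and variances.
observed = [1, 0, 1, 1]
expected = [0.5, 0.5, 0.5, 0.5]
variance = [0.25, 0.25, 0.25, 0.25]
infit, outfit = fit_mean_squares(observed, expected, variance)
print(infit, outfit, is_misfitting(infit), is_misfitting(outfit))
```

Values near 1.0 indicate that the observed responses vary about as much as the model predicts.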

Discussion
This is the first study that attempted to provide a reliable and valid subjective Academic Performance Rating Scale in Arabic. The overall result of both the exploratory factor analysis and Rasch analysis of the data indicated A-APRS-15 (Appendix B in Supplementary Material available online at https://doi.org/10.1155/2017/1504701) as a highly reliable single construct scale. Exploratory factor analysis also provided preliminary evidence of high internal reliability, as evaluated by a high alpha statistic. Person and item reliability statistics confirmed the highly reliable single-factor structure of A-APRS-15. Initial analysis of A-APRS-19 revealed that items 13, 15, 18, and 19 were not in alignment with the rest of the items. The overall outcome measure was not reliable, as Cronbach's alpha was less than the expected 0.70. Thus, it was necessary to remove these items in order to achieve a more reliable scale in which all items contribute towards measuring one outcome (academic performance). Test-retest of A-APRS-15 also revealed a high correlation value, which means that A-APRS-15 yielded consistent results over repeated administrations. Rasch analysis was then run on the modified scale, termed A-APRS-15, indicating this to be a validated and reliable single construct scale.
Feedback from the classroom teachers through a verbal survey suggested that most teachers found A-APRS-15 to be a potentially useful tool to measure the overall academic performance of a child. The average time required for each teacher to complete the A-APRS-15 was around 8-10 minutes per student. Some teachers also suggested that, apart from assessing academic experience in major subjects like math, science, and literature, other important subjects/courses should be included when developing and validating new academic scales.
Considering the limitations of the current subjective analysis of academic performance, the authors admit that there could be a possible inherent bias. As the data were not normally distributed, the range of the correlation statistics may be restricted. The current study does not propose the subjective assessment as the sole measure of a child's academic experience. Rather, this subjective assessment is intended to help in better understanding the areas where the child faces difficulties during his or her learning experience in major subjects so that proper measures can be taken, thus lending a helping hand towards a brighter future. The A-APRS-15 measure of academic performance is an easy assessment that can be used in future studies to examine its relation with other variables such as general health and oral health.

Figure 1: Scree plot created through factor analysis. An eigenvalue of 1.0 was considered as the cut-off value for obtaining the Scree plot.

Table 1: Background characteristics of study population.
* Significant with chi square value less than 0.05; NS: not significant.
a Single global construct; Partial Credit Model was used for Rasch analysis.