An Adaptation and Validation of Students’ Satisfaction Scale: The Case of McGraw–Hill Education Connect

This study aims to adapt and validate an Arabic version of the students' satisfaction scale. It measures students' satisfaction with the McGraw-Hill Education Connect platform in Saudi Arabia and provides Saudi and Arab academics with a valid instrument for further studies and interventions to improve students' learning and environments. The study examined items to establish content, construct, convergent, and discriminant validity. It used two-phase Chemistry 101 student samples (N = 50 and N = 193). Exploratory factor analysis (EFA) using the maximum likelihood extraction method and the Promax rotation method was used to explore the survey's constructs in the pilot phase; it supported the five-factor construct of the survey. Three competing construct models were investigated using confirmatory factor analysis in the main phase. The model that fitted the study data and satisfied reliability and validity standards was a second-order model identifying two primary constructs distinctively: satisfaction (N = 3, α = 0.912) and utility (N = 19, α = 0.965). The utility scale was composed of four subscales: understanding (N = 5, α = 0.913), studying (N = 3, α = 0.896), preparation (N = 4, α = 0.893), and usability (N = 7, α = 0.913). The results indicated that students' overall satisfaction with MGHE Connect was significantly met (M = 3.52, SD = 0.176). Students were also significantly satisfied with the MGHE Connect utility (M = 3.51, SD = 0.221). The highest level of satisfaction was with understanding (M = 3.60, SD = 0.170), and the lowest was with preparation for classes (M = 3.23, SD = 0.259). Students were equally satisfied with using MGHE Connect to understand the materials, study and review for exams, and its user-friendliness.


Introduction
Nowadays, blended learning (b-learning) has become a simplified form of learning and the new traditional approach among higher education institutions [1][2][3]. B-learning can be seen as a combination of traditional teaching and the e-learning environment, based on the principle that face-to-face and online activities are optimally integrated into a unique learning experience [4][5][6][7].
Moreover, educators' calls, together with the advancements in computer technology and the Internet, have led textbook publishers to increase the incorporation of pedagogically related technological supplements [8]. Text technology supplements (TTSs) are considered specific technologies within the broader category of computer-assisted learning [8]. TTSs have become more prolific in higher education as complementary tools to assist student learning [9].
Many publishers and researchers have claimed that textbook supplement products improve learning efficiency, time management, in-class discussions, student engagement, personalized learning experiences, exam scores, course grades, and overall satisfaction with the course and coursework [2,[10][11][12]. However, others disagreed [13][14][15]. Moreover, these systems provide just-in-time feedback to students and let instructors intervene at the right time to support students [16,17]. McGraw-Hill Education (MGHE) Connect is one of those supplements; it uses interactive learning technology to enable a more personalized learning experience by enhancing students' engagement with the course content and learning activities [12].
Learners' satisfaction with b-learning plays a crucial role in evaluating its effectiveness and measuring such programs' quality [1,2,[18][19][20][21]. Institutions implement b-learning to meet learners' needs; thus, it is equally important to measure their perceived satisfaction to determine programs' effectiveness [2]. Evaluating learning effectiveness and learners' satisfaction are interconnected [1,[22][23][24]. Learners' satisfaction is positively correlated with the quality of learning outcomes, and studies have established a relationship between students' perception of satisfaction, their learning environment, and their quality of learning [25][26][27]. Learners' satisfaction is critical for learners to continue using blended learning [25]. That is why institutions involved in blended learning should be concerned about increasing learning satisfaction. Chen and Tat Yao summarized that it is essential to understand learners' attitudes, perceptions, acceptance, and satisfaction to evaluate the success of technology-based instructional design [2].
Moreover, institutions can intentionally provide learning environments with appropriate supplements when the factors influencing students' satisfaction are identified [25]. Understanding the factors influencing student satisfaction with blended learning can help design a learning environment and positively impact the student learning experience [25]. Standard measures of learners' satisfaction in blended courses use students' overall satisfaction with the experience, perceived quality of teaching and learning, and ease of use of technology [20,21]. Although students' satisfaction is not necessarily associated with achievement, satisfied students are more likely to accomplish their cognitive goals [27].
Although students' satisfaction concerns institutions seeking to provide quality education, the field remains at a preliminary stage where more valid and reliable instruments are needed [28]. Also, there is a need to deeply understand the components of perceived satisfaction and quality of blended learning [27]. Interventions based on reliable data that support students' learning are critical [29]. Accurate data are necessary to support learning improvement and measure progress toward the goal [29]. Survey results are usually used to make recommendations for intervention curricula, faculty training, and products directed at developing teaching methods. That is why it is essential to rely on well-designed and validated instruments [29,30].
Instruments that were initially developed in a particular language for use in some contexts can be made appropriate for use in one or more other languages or contexts [31,32]. In such cases, the translation/adaptation process aims to produce an instrument with psychometric qualities comparable to the original by following a specific procedure [30,31,[33][34][35], and the instrument developer should evaluate the instrument's validity for the target population [31].
Validity is an instrument's ability to measure the latent construct it is supposed to measure [36] (p. 55). It can be established by examining content, construct, convergent, and discriminant validity [36] (p. 55) [37]. Content validity can be established when subject matter experts examine the constructs, including the definitions and items for each construct [28]. Once content validity is established, the instrument is implemented to examine construct validity. The purpose of construct validity is to determine whether the constructs being measured are a valid conceptualization of the phenomena being tested [28]. If given items do not load on the intended construct, they should be eliminated, as they are not an adequate measure of that construct [37]. In confirmatory factor analysis (CFA), construct validity is achieved when the fitness indices for a construct are satisfied [30].
Convergent validity is achieved when all items in a measurement model are significantly correlated with their respective latent constructs [30,35,38]. It can also be verified using the average variance extracted (AVE) for every construct. The AVE estimate is the average amount of variation that the latent construct can explain in the observed variables to which it is theoretically related [37,39] (pp. 600-638). The AVE of all latent constructs should be above 0.5 to establish convergent validity [40]. Discriminant validity indicates that the measurement model of a construct is free from redundant and unnecessary items. It checks whether items within a construct intercorrelate more highly than they correlate with items from other constructs to which they are theoretically not supposed to relate [30,37,38,41].
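As a minimal numerical sketch of these criteria, the AVE and the Fornell-Larcker comparison of sqrt(AVE) against interconstruct correlations can be computed as follows. The loadings and correlation below are illustrative values, not the study's data:

```python
import math

def average_variance_extracted(loadings):
    """AVE: the mean of the squared standardized loadings of a construct's items."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """Discriminant validity check: the square root of each construct's AVE
    must exceed the correlation between the two constructs."""
    return math.sqrt(ave_a) > abs(corr_ab) and math.sqrt(ave_b) > abs(corr_ab)

# Hypothetical standardized loadings for two constructs
ave_a = average_variance_extracted([0.85, 0.80, 0.90])        # ≈ 0.724 (> 0.5, convergent validity met)
ave_b = average_variance_extracted([0.75, 0.70, 0.80, 0.78])  # ≈ 0.575 (> 0.5)

print(fornell_larcker_ok(ave_a, ave_b, 0.60))  # True: sqrt(AVE) exceeds 0.60 for both
```

When the interconstruct correlation rises above either square root of the AVE, the same check returns False, which mirrors the discriminant validity failures discussed later in the study.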
This study aims to adapt and validate an Arabic version of the students' satisfaction scale. It aims to measure students' satisfaction with the MGHE Connect platform in Saudi Arabia and provide Saudi and other Arab academics with a valid instrument for further studies and interventions to improve students' learning and environments.

Materials.
The study started with a survey proposed by Gearhart [8]. Gearhart's survey consisted of 30 items covering four general categories of perception: satisfaction (N = 5, α = 0.87), utility (N = 12, α = not reported), usability (N = 9, α = 0.87), and perceived value (N = 4, α = 0.91). Utility-scale items consisted of three subscales: understanding (N = 4, α = 0.66), studying (N = 4, α = 0.73), and preparation (N = 4, α = 0.87). Gearhart described these categories as follows [8]: (i) satisfaction concerns whether the tool generally met the needs of the students; (ii) utility relates to how students used the technology, and it includes three subscales: (iii) understanding reflects the degree to which students thought Connect helped them to comprehend the material better; (iv) preparation measures the students' use of Connect to introduce course content before discussions and lectures; (v) studying assesses the use of technology to review for exams; (vi) usability gauges student perceptions about access and user-friendliness; and (vii) perceived value indicates whether the tool is worth it (p. 13). The survey's response scaling ranged from 1 = strongly disagree to 5 = strongly agree. Two items from the satisfaction subscale, items 3 and 5, were negatively worded and reverse-coded before analysis. Although the students' perceptions survey developed by Gearhart consisted of 30 items [8], when the researcher contacted the author to obtain permission and the items list, the researcher received a list of 34 items classified as follows: satisfaction (N = 5), utility (N = 16), usability (N = 9), and perceived visibility (N = 4) [42]. The satisfaction and usability scales were identical in both versions, while the utility scale was not. In [8], the utility-scale items (N = 12) were categorized into three subscales: understanding, studying, and preparation, while in [42], all items (N = 16) were grouped under one scale.
Thus, the researcher started the investigation with more items to reach the final validated scale.
Nevertheless, since the licenses of access for students in this study were free according to arrangements between Yanbu Industrial College and the McGraw-Hill company agent, the perceived visibility scale was excluded from this adaptation study. Thus, the researcher started the adaptation work using 30 items, where five represented satisfaction, 16 represented utility, and nine represented usability (see Appendix A).
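The negatively worded satisfaction items mentioned above (items 3 and 5) must be reverse-coded before any reliability or factor analysis. A minimal sketch of the standard transformation on a 5-point Likert scale:

```python
def reverse_code(response, scale_min=1, scale_max=5):
    """Reverse-code a Likert response so that 1<->5, 2<->4, and 3 stays 3."""
    return scale_max + scale_min - response

# Example: reverse-coding a negatively worded item's raw responses
print([reverse_code(r) for r in [1, 2, 3, 4, 5]])  # [5, 4, 3, 2, 1]
```

After this transformation, higher scores on every item consistently indicate higher satisfaction, which is required for the item-total correlations and alpha computations used later.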

Procedure.
The researcher followed the International Test Commission (ITC) Guidelines and other literature for translating and adapting tests [30,31,33,[43][44][45][46][47][48][49][50]. The 30-item survey was translated from English into Arabic by two bilingual experts, and the translations were discussed with the researcher to consolidate them into one version. Then, the translated version was translated back into English by two other bilingual experts. Semantic adaptation and some corrections and discussions were made by the researcher and the other two experts to reach consensus on the translated survey's initial version. The translated version was then sent to seven professional subject matter experts in educational technology, e-learning, computer science, and chemistry to review the content relevance to constructs and the items for each construct. One item from the usability scale was deleted (item 22) since 71% of reviewers disagreed that it was relevant to usability or other scales.
After verifying content validity, the survey was administered to a 50-participant pilot sample to examine the instrument and its items empirically using EFA. The researcher also interviewed ten respondents to check whether they had any questions, concerns, or comments about the survey. The survey was then ready for further empirical investigation in the primary phase using CFA.

Participants.
The study used two samples in two phases. In the pilot study phase, a cluster sample was used by randomly selecting two sections out of the ten sections (each with an average of 25 students) of Chemistry 101 offered at Yanbu Industrial College, located in the western region of Saudi Arabia, in the Fall semester of 2019. The two-section sample was composed of 55 students. The sample students in that semester used MGHE Connect for the whole semester as a supplement platform in addition to the face-to-face method. After using MGHE Connect for 15 weeks and before the final exams started, the students were invited to respond to that phase's survey. The number of students who responded was 50, and the participants' ages ranged between 19 and 21.
In the next semester, Spring 2020, there were also ten Chemistry 101 sections (each with an average of 23 students) whose students used MGHE Connect in the same way. After 15 weeks of using MGHE Connect in their learning, they were invited to respond to the survey. This final implementation sample was composed of 193 students whose ages were between 19 and 22 years.
The AVE of each construct should be higher than its maximum shared variance (MSV) with any other construct [62,63]. The shared variance (SV) is represented by the square of the correlation between any two constructs [37].
The study used Cronbach's alpha (α) to assess reliability (α should be >0.7) [61,64] and the item-total correlation between each item and its construct (r should be >0.4) [65]. The composite reliability (CR) of each latent variable was also estimated because it is a more suitable indicator of reliability than Cronbach's alpha [40,66]. MaxR(H), which refers to McDonald's construct reliability, was also estimated. The coefficient (H) describes the relationship between the latent construct and its measured indicators [40]. Means and parametric tests were used to describe Likert scale responses and test the significance of differences [67].
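The reliability statistics described above can be sketched in a few lines. The response matrix below is hypothetical, not the study's data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the total score
    return (k / (k - 1)) * (1 - item_vars / total_var)

def corrected_item_total(items, i):
    """Correlation of item i with the sum of the remaining items in its construct."""
    items = np.asarray(items, dtype=float)
    rest = np.delete(items, i, axis=1).sum(axis=1)
    return np.corrcoef(items[:, i], rest)[0, 1]

# Hypothetical 5-respondent, 3-item construct
data = [[4, 5, 4], [3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(data), 3))  # 0.904 (> 0.7, acceptable reliability)
```

Items whose corrected item-total correlation falls below the 0.4 cutoff cited above would be candidates for removal, as was done in the pilot phase.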

Pilot Study Results.
The study started with the three-construct version suggested by the original author [42], and Cronbach's alpha values were as follows: satisfaction (N = 5, α = 0.785), utility (N = 16, α = 0.935), and usability (N = 8, α = 0.851). The item analyses produced the item-total correlations and alpha-if-item-deleted values shown in Table 1.
The results indicated that the three proposed constructs were internally consistent. However, three items (3, 20, and 25) had low item-total correlation coefficients (<0.4) and contributed negatively to Cronbach's alpha. Thus, they were removed from the suggested survey item list. Then, EFA was conducted to explore the preliminary constructs of the survey. The Kaiser-Meyer-Olkin measure of sampling adequacy (KMO = 0.772) and Bartlett's test of sphericity (approx. chi-square = 1054.012, df = 325, p ≤ 0.001) indicated that factor analysis was appropriate to use. The EFA solution using Kaiser's criterion retained five factors. The total variance explained (sum of squared loadings) was 65.32%, and the five extracted factors and item loadings are shown in Table 2.
The EFA solution, shown in Table 2, supported the five-factor construct. Three items (9, 18, and 29) showed cross-loadings on factors, and two items (10 and 21) loaded less than 0.4 on the factors.
Thus, items 10 and 21 were removed, and the EFA was conducted again, yielding the solution in Table 3.
The EFA solution again supported the five-factor construct. All items loaded adequately on their respective factors (>0.40). Items 11 and 27 were swapped between constructs.
The internal consistency of all constructs/subconstructs was improved, and there was no indication that any further modification was required at this phase.
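The sampling-adequacy screening used before the EFA above can be illustrated for Bartlett's test of sphericity, which checks whether the correlation matrix differs significantly from an identity matrix. This is the standard textbook formula; the data below are simulated, not the study's responses:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """Bartlett's test of sphericity for an (n x p) data matrix."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # Test statistic follows chi-square with p(p-1)/2 degrees of freedom
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)

# Simulated items driven by a common factor: correlations are strong,
# so factor analysis should be judged appropriate (small p-value)
rng = np.random.default_rng(0)
factor = rng.normal(size=(100, 1))
items = factor + 0.4 * rng.normal(size=(100, 4))
chi2, df, p = bartlett_sphericity(items)
print(df, p < 0.001)  # 6.0 True
```

A significant result, like the one reported in the pilot phase (approx. chi-square = 1054.012, p ≤ 0.001), indicates the items are sufficiently intercorrelated for factoring.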

Main Study Results.
In this phase, the survey was implemented to examine construct, convergent, and discriminant validity. Based on the survey's theoretical background [8] and the empirical findings of the EFA in the pilot phase, three proposed underlying construct/subconstruct models of the survey were examined (see Figures 1-3). Model 1: first-order three-factor constructs. This model represented the proposed constructs sent by Gearhart through personal communication, in which the survey items were categorized into three constructs: satisfaction (N = 4), utility (N = 13), and usability (N = 7) [42]. In this model, the utility construct was not divided into subconstructs as Gearhart suggested in [8].
Model 2: first-order five-factor constructs. This model represented what was suggested by the EFA findings of the pilot study. In this model, the five correlated constructs were satisfaction (N = 4), understanding (N = 5), studying (N = 4), preparation (N = 4), and usability (N = 7).
Model 3: three-factor first-order with higher-order factor constructs. This model was proposed based on Gearhart's inputs in [8] and the pilot study's EFA findings. In this model, the first-order three-factor constructs were understanding (N = 5), studying (N = 4), and preparation (N = 4). These three factors were placed under a higher-order factor known as utility (N = 13), which was correlated with two other factors: satisfaction (N = 4) and usability (N = 7).
The three proposed models were tested using CFA with the maximum likelihood estimation method. The Kaiser-Meyer-Olkin measure of sampling adequacy (KMO = 0.959) and Bartlett's test of sphericity (approx. chi-square = 4107.110, df = 276, p ≤ 0.001) showed that factor analysis was appropriate to use. The fit indices of the CFA conducted for the three proposed models are shown in Table 4. The CFA solutions showed poor fit indices for model 1 (CFI < 0.90, GFI < 0.90, TLI < 0.90, RMSEA > 0.08), while both model 2 and model 3 fitted the study data adequately (CFI > 0.90, TLI > 0.90, SRMR < 0.08, RMSEA < 0.08). Only GFI was below the cutoff (0.90). Thus, model 2 and model 3 were considered to achieve the construct validity criteria, with a slight advantage for model 2.
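The RMSEA used in these fit comparisons follows directly from the model chi-square. A minimal sketch of the standard formula, with illustrative inputs:

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation from chi-square, df, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# With chi2/df = 2.000 and N = 193, RMSEA = sqrt(1/192) regardless of df,
# matching the 0.072 reported for the final model in this study:
print(round(rmsea(40.0, 20, 193), 3))  # 0.072
print(rmsea(15.0, 20, 193))            # 0.0 (chi-square below df is floored at zero)
```

Values below 0.08 indicate adequate fit under the cutoffs used throughout this section.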

Reliability and Validity Evidence.
The study started by investigating model 2 to examine construct reliability, convergent validity, and discriminant validity; the results are shown in Table 5. The results indicated that the composite reliability (CR) and McDonald's construct reliability (MaxR(H)) were high (>0.7), establishing the reliabilities of all five constructs suggested in model 2. Also, the AVE values of all constructs were above 0.5, establishing the constructs' convergent validity.
However, MSV values were higher than AVE values in all constructs, and the square root of the AVE value, shown in boldface on the diagonals, was not consistently higher than the rest of the interconstruct correlations. This finding indicated that the model did not achieve discriminant validity [40]. Thus, the researcher moved to model 3 to investigate whether it would satisfy the required validities.
When model 3 was examined, the results were as shown in Table 6. The composite reliability (CR) and McDonald's construct reliability (MaxR(H)) were satisfied (>0.7), and all values of the average variance extracted (AVE) of higher-order constructs in model 3 were above 0.5; thus, the convergent validity of these constructs was established. However, MSV values were higher than AVE values, and the square root of the AVE value, shown in boldface on the diagonals, was not consistently higher than the rest of the higher-order interconstruct correlations, as shown in Table 6. These findings indicated that model 3 achieved construct reliability, construct validity, and convergent validity successfully but had a discriminant validity problem.

Figure 1: First-order three-factor construct (model 1).
Neither model 2 nor model 3 was acceptable because of the lack of discriminant validity. Thus, further modifications and investigations were needed to resolve the discriminant validity issue and reach a better solution.

Alternative Models.
A minimal modification was conducted to achieve discriminant validity while maintaining the content validity that had already been established. The research literature suggests that grouping highly correlated constructs, using higher-order constructs, and eliminating some items can resolve the discriminant validity challenge [37]. The researcher selected model 3 to modify because it already had higher-order constructs. The highest correlation coefficient between constructs in this model was between the utility and usability constructs (r = 0.981).
Thus, the researcher grouped the usability construct with the utility construct to propose a new model 4, as shown in Figure 4.
In model 4, the correlation between the satisfaction and utility constructs was 0.860. Since the square root of the AVE of the satisfaction construct (0.807) was less than the correlation between the satisfaction and utility constructs, this model still lacked discriminant validity. Gaskin and Lim suggested deleting item 13 to improve the model [68]. After removing item 13, the fit indices of the new model were χ²/df = 2.000 (<3), CFI = 0.941 (>0.9), SRMR = 0.048 (<0.08), and RMSEA = 0.072 (<0.08), indicating that the data fitted the proposed model better. AVEs were 0.652 and 0.929 for the satisfaction and utility constructs, respectively. The square of the correlation coefficient between the satisfaction and utility constructs was 0.741, which was higher than the AVE of the satisfaction construct (0.652), indicating that the model still lacked discriminant validity.
One more step that could help achieve discriminant validity is to improve the satisfaction construct [69]. The item loadings on the satisfaction construct showed that item 5 had the lowest loading on its respective construct. When item 5 was deleted, AVEs were 0.780 and 0.865 for the satisfaction and utility constructs, respectively. The square of the correlation coefficient between satisfaction and utility was 0.734, which was less than the AVE values of the satisfaction and utility constructs, indicating that the model achieved discriminant validity. The fit indices of this model were also adequate.

Figure 3: Higher-order construct (model 3).

This finding indicated that this model successfully achieved construct reliability, construct validity, and convergent validity in addition to discriminant validity. The model constructs, loadings, and variance explained for the optimal model are shown in Figure 5. When the three proposed models were tested using the primary study sample, the results supported the five-factor constructs (model 2 and model 3) rather than the three-factor construct (model 1). CFA findings consistently supported the construct reliabilities, construct validity, and convergent validity of the proposed models. However, none of them achieved discriminant validity. It was clear that the survey items and dimensions were highly correlated, making it challenging to achieve discriminant validity. Farrell and Rudd in [37] and Yale et al. in [70] suggested solving the challenge using higher-order grouped constructs or increasing the satisfaction construct's AVE. Finally, the study reached the model that satisfied the discriminant, construct, and convergent validity.

Even though the high associations among items and constructs make the survey highly internally consistent and reliable, they raised the challenge of achieving discriminant validity. The survey's lack of discriminant validity indicated that the total scores of different constructs could not be interpreted clearly. The study tried to detect distinct constructs as much as possible, and it reached the two higher-order constructs solution shown in model 5. These study findings, to some extent, supported the construct validity suggested by Gearhart [8]. The survey measures overall satisfaction and the utility dimension, which has four subscales measuring understanding, studying, preparation, and usability.
The t-tests for paired differences between utility subscales are shown in Table 8.

Figure 6: The subscale means of students' satisfaction (error bars: 95% CI).

Results indicated that the differences in satisfaction means between understanding and preparation (ΔM = 0.36917, SD = 0.6804, t(192) = 7.538, p ≤ 0.001), studying and preparation (ΔM = 0.34672, SD = 0.91129, t(192) = 5.286, p ≤ 0.001), and preparation and usability (ΔM = −0.34845, SD = 0.63893, t(192) = −7.576, p ≤ 0.001) were significant. In all comparisons, the satisfaction level for preparation was significantly lower than the others. Students were equally satisfied with using MGHE Connect to understand materials, study for exams, and its usability.
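Paired comparisons like those in Table 8 can be reproduced with a standard dependent-samples t-test. The data below are simulated to mimic the reported pattern of a lower preparation mean; they are not the study's raw responses:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 193  # same sample size as the main study

# Hypothetical per-student subscale scores
understanding = rng.normal(3.60, 0.60, size=n)
# Simulate preparation scores roughly 0.37 points lower for the same students
preparation = understanding - 0.37 + rng.normal(0.0, 0.50, size=n)

t, p = stats.ttest_rel(understanding, preparation)
diff = understanding - preparation
print(f"dM = {diff.mean():.2f}, t({n - 1}) = {t:.2f}, p = {p:.2e}")
```

With a mean difference of this size the paired t-statistic is large and the p-value is far below 0.001, mirroring the significant understanding-versus-preparation contrast reported above.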
In sum, the study indicated overall student satisfaction with MGHE Connect, which means that the tool generally meets student needs. It indicated that MGHE Connect is adequate in helping students understand and comprehend the study material. Also, it helped them study and review for exams. Students thought MGHE Connect helped them prepare for classes and study course content ahead of lectures and class discussions. However, students thought that MGHE Connect was significantly less helpful for preparation compared to understanding and studying. The study showed that students were as satisfied with MGHE Connect's usability as with understanding and studying.
Unlike studies that found MGHE Connect ineffective in improving student academic performance measures [13][14][15], this study showed that student perceptions of MGHE Connect were positive.
These study findings showed that it is essential to consider both direct measures, such as exam scores, and indirect measures, such as surveys, to assess blended learning programs [14,15,[71][72][73]. Using both types of measures might resolve the ambiguity in assessing such programs' effectiveness [70]. The study findings also agreed with [27,70], which stated that student self-reports of learning have no relationship with actual learning. Students might perceive tools as substantially impacting their learning while they have no impact on direct learning measures.

Conclusion
This study aimed to adapt and validate a scale to assess students' satisfaction with MGHE Connect in Saudi Arabia. It aimed to provide a valid instrument measuring students' satisfaction with MGHE Connect for further studies and interventions to improve students' learning and environments. The study followed a well-established procedure to translate the survey and establish content validity. It examined survey items to establish construct, convergent, and discriminant validity, as well as composite reliability. The only model that fitted the data and satisfied all reliability and validity standards was a second-order model. Two primary constructs were distinctively identified: satisfaction and utility. The utility scale was composed of four subscales: understanding, studying, preparation, and usability. The survey constructs were strongly associated, and it was challenging to establish discriminant validity for the proposed models. Higher-order grouped constructs allowed achieving discriminant validity in addition to the already established reliability and validity coefficients. The study's final version of the survey was reliable and valid (see Appendix B), and it can be used in further studies and interventions. The study showed that MGHE Connect, in general, met the students' needs and satisfaction. MGHE Connect significantly helped students comprehend course materials better, study before exams to get better scores, and prepare for class discussions in advance. However, students' satisfaction with using MGHE Connect for preparation was significantly lower than for understanding and studying. Also, students were satisfied with MGHE Connect's ease of use and friendliness. The study showed how useful MGHE Connect was based on students' perceptions in Saudi Arabia. However, this study is limited since the sample used may not represent the population as a whole.

Data Availability
The data used to support the findings of this study are available upon request to the author.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.