Dimensionality Reduction of Social Media Application Attributes for Ubiquitous Learning Using Principal Component Analysis

Ubiquitous learning is anywhere and anytime learning using e-learning and m-learning platforms. Learning takes place regularly on mobile devices. School-based instructors and learners have capitalised on ubiquitous learning platforms in unprecedented times such as COVID-19. There has been a proliferation of social media applications for ubiquitous learning. A vast number of attributes of a social media application must be considered for it to be deemed suitable for education. Further to this, mobile and desktop accessibility criteria must be considered. The aim of this research study was to determine the highest-impact and most pertinent criteria for evaluating social media applications for school-based ubiquitous learning. Data was collected from 30 experts in the field of teaching and learning who were asked to evaluate 60 criteria. Principal Component Analysis (PCA) was the method employed for the dimensionality reduction. PCA was implemented using singular value decomposition (SVD) in R-Studio. The results showed loading values from principal component one for the top 40 educational requirements and technology criteria of the 60 criteria used in the study. The implications of this research study will guide researchers in the field of Educational Data Mining (EDM) and practitioners on the most important dimensions to consider when evaluating social media applications for ubiquitous learning.


Introduction
Rapid developments in Information and Communication Technology (ICT) and the emergence of numerous applications have led to the establishment of digitalized learning environments. Additionally, there has been a rise in demand for traditional and unconventional learning systems along with lifelong education due to the development of a connectivism-based society. Ubiquitous computing refers to a world where invisible devices assist people in daily activities, affording unlimited access to learning resources anywhere and anytime [1,2]. In an environment of secondary education, instructors and learners can capitalise on novel trends in ubiquitous computing, using ubiquitous technologies and devices in the learning space. Generally, and according to the literature, the youth carry mobile devices anytime and anywhere and relish playing with new contraptions [2,3].
Mobile learning (m-learning) is deemed either an extension of e-learning or a subset of e-learning [2,4,5]. In m-learning, information is retrieved at any time (synchronous and asynchronous interactions), at any location (spatial mobility), and by anyone (collaboratively or individually) [2,6]. Commercialisation and wide access to mobile Internet services based on wireless broadband in every part of society have been propelled by reduced cost and high-speed downlink packet access (HSDPA). This has resulted in the convergence of e-learning, m-learning, wireless technologies, social networking technologies, and mobile devices such that ubiquitous learning (u-learning) is available to learners irrespective of time or location [2]. The communication between the embedded computers in the environment and devices allows learners to learn in a realm that they are interested in whilst moving into spaces such as social media (SM), hence connecting learners to their learning environment [7]. Each predominant SM application suited to u-learning has various features and characteristics, which can be broadly categorised as technology criteria and educational requirements that need to be probed by the instructor for the application to be aligned with the outcomes of the lesson [8]. U-learning is no longer considered merely a teaching and learning support but is increasingly being relied upon as a conventional teaching and learning platform in an era of student-centred learning. Furthermore, the diffusion of SM in the schooling system has increased. Therefore, ample scope exists for this research. Numerous attributes can be used to measure educational requirements and technology criteria that evaluate SM applications' suitability for u-learning. PCA is a nonparametric algorithm which forms the basis for dimensionality reduction of educational requirements and technology criteria [9].
The underlying goal in dimensionality reduction is that the reduced representation should have a dimensionality that corresponds to the intrinsic dimensionality of the data, and the reduced parameters must account for the observed properties of the data. The literature abounds with dimensionality reduction techniques other than Principal Component Analysis to achieve this goal. The literature shows that nonlinear techniques perform well on artificial tasks but do not achieve the same good performance on real-world tasks. Traditional dimensionality reduction methods such as factor analysis and classical scaling do not handle nonlinear data adequately. In [10], Van der Maaten et al. investigated the performance of 12 nonlinear dimensionality reduction techniques, such as diffusion maps, Sammon mapping, and multilayer autoencoders, and found that nonlinear techniques, despite their large variance, are not capable of outperforming principal component analysis. The main objectives of this research study are as follows: (1) identify the educational requirements and technology criteria for evaluating SM applications for u-learning; (2) visualize the principal components (PCs) that have high variance and the most impact on the dataset using PCA; (3) deliver more information to practitioners in the educational terrain so that they are able to make informed decisions on u-learning deployment, and so that scientific researchers acquire valuable insight to make better research decisions for emerging trends.
Section 2 discusses the literature review of attributes that assess educational requirements and technology criteria for evaluating SM applications' aptness for u-learning. Section 3 presents the materials and methods in this research study, while Section 4 discusses the results of PCA used in this study. The study concludes in Section 5.

Related Works
This section compiles a selection of attributes for evaluating SM applications' suitability for u-learning by reviewing a large number of related research studies. The criteria are broadly categorised into educational requirements and technology criteria.
In the study by Meyliana et al. [11], students' social media preference was analysed to increase student engagement with the university. Data was collected from 1021 students from fifty-eight Indonesian universities using questionnaires [11,12]. Entropy was used to process data and assign criteria weights for social media preference. The identified educational requirements were as follows: information quality, which included relevance, timeliness, accuracy, comprehensiveness, and usefulness; and learner engagement, which comprised encouraging student-faculty contact, cooperation among students, active learning, giving prompt feedback, time on task, communicating high expectations, and respecting diverse talents and ways of learning. The technology criterion identified was service quality, which consisted of efficiency, system availability, fulfilment, and privacy. Using the TOPSIS (Technique for Order Preference by Similarity to the Ideal Solution) method, it was established that the implementation of social media was more dependent on information quality than on service quality. However, while comprehensiveness and usefulness of information were highly essential to students, they also valued system availability, efficiency, and fulfilment as these directly impacted their expectations and active learning process [11,12].
Most m-learning applications are created for the formal education and learning environment. These applications are characterised by the enhancement of the interaction between instructors and learners to offer high flexibility and interaction in the learning process. Making an accurate decision on which m-learning application to select can therefore be challenging. The paper by Sarrab et al. [13] presented system quality characteristics for choosing m-learning applications centred on the outcome of a systematic review. These technology criteria were usability, performance, functionality, availability, dependability, service quality, and information quality. Criteria were derived from a research literature review, the resulting information was summarised, and quantitative representations of the quality characteristics were produced [14]. The research paper by Torun and Tekedere [15] involved the development of an e-learning environment for teacher candidates studying the Scientific Research Methods course. The course contents were aligned to the 5E constructivist approach model, and a usability analysis was conducted to reveal the usability of the e-learning environment. The research comprised 42 teacher candidates and used three different data collection tools to measure the technology criteria, namely, efficiency, effectiveness, and satisfaction, which are founding attributes of usability [15].
Since the advent of the first Social Networking Site (SNS) as a novel way of communicating with other people, many research studies have tried to theoretically and empirically identify the history of, the impact on, and the characteristics of the relationship between users and an SNS.
There exists a research gap in behavioural studies on the reasons why users join and participate in SNSs [12,16]. The study by Rad et al. [16] explored the influential factors causing users to adopt an SNS. A multicriteria decision-making (MCDM) tool and the fuzzy AHP (Analytical Hierarchy Process) were used to evaluate the level of importance that literature-derived educational requirements such as performance expectancy, effort expectancy, social influence, self-efficacy, perceived enjoyment, and attitude toward technology, and technology criteria such as facilitating conditions, trust, privacy, and security, had on the adoption of an SNS. Data was collected from 291 university students in the field of the SNS via questionnaires, and the findings of the study were that trust, performance expectancy, and security were critical influential factors in SNS adoption [12,16]. The study by Woodcock et al. [17] investigated the efficiency of an online synchronous platform employed to train preservice teachers using a blended learning approach. In a mixed method approach, quantitative survey data and qualitative interview data were collected from 53 students who used the platform, and the data was analysed using statistical analysis and thematic content analysis, respectively. One of the findings of the study was that preservice teachers' ability to learn and apply e-learning for students was reliant on the technology criterion, namely, ease of use, and the educational requirements, namely, psychologically safe environment, e-learning self-efficacy, and competency [17].
A historical overview of online distance learning along with definitions and classifications of key concepts was provided in the paper by Kaplan and Haenlein [18]. The target population, which included teaching professors and students, was discussed in great detail to propose parallel frameworks influencing intrinsic student motivation and for selecting an effective online teacher. The benefits of distance learning were reviewed, along with the specific relation between social media and online distance learning. Distinguished educational requirements were student commitment, challenge, control, competition, contemporaneous, student assessment, return on investment, teaching staff charisma, competence, constancy, compensation, contribution, learning goals, and quality assurance. Technology criteria were the digital and social media use policy [18].
Forty-two published papers, which appeared in 33 international conferences and academic journals between 2001 and 2015, were reviewed in the paper by Zare et al. [19] to attain a comprehensive review of multicriteria decision-making techniques in e-learning. This gave rise to significant criteria for appraising e-learning, such as interactivity, which is an educational requirement [19], and technology criteria, which included the following: usability, response, web and course design, accessibility, reliability, cost-effectiveness, functionality, security, stability, trust, accuracy, flexibility, interoperability, and continuity [19].
SM has afforded new opportunities for learners and instructors to interact, but there exists a need to investigate the factors that influence SM adoption by instructors and students. The study by Elkaseh et al. [20] examined the use of SM among students and instructors in Libyan higher education using the Technology Acceptance Model (TAM). Data was collected via a survey method from a sample population of instructors and students of four Libyan universities, and Structural Equation Modeling (SEM) was conducted to investigate the proposed factors' predictive behaviour. Educational requirements were perceived ease of use, perceived usefulness, and attitude towards use [20].
It is apparent that technology has changed the landscape of the learning environment, and the way in which school learners learn is enhanced by different modes of education. Classroom technology incorporates interactive learning technology such as e-book technology. In Malaysia, the acceptance of novel technology such as e-book technology by school children is important [12,21]. The study by Elyazgi et al. [21] identified the interface factors of Children Computer Interaction (CCI) and the determinants of usability guiding e-book behavioural acceptance by 417 school learners. By combining the TAM (Technology Acceptance Model) with a review of e-book technology-related literature, the research hypotheses were established from the interrelationship of a detailed set of constructs. The research hypotheses built the measurement framework, which was quantified by a structured questionnaire comprising a five-point Likert scale [21]. Using the questionnaire and TOPSIS, the importance of interface factors was deduced. The educational requirements were usability, perceived enjoyment, perceived usefulness, perceived ease of use, and behaviour intention towards the use of Information Technology (IT). Technology criteria were usability, interface, and Child Computer Interaction (CCI). The combination of CCI and TAM factors afforded results that showed that school learners accepted the use of e-books. The highest ranking was awarded to perceived ease of use, whilst the lowest ranking was behaviour intention. The former was attributed to the functions and features of e-books, which seemed to be easy to use. However, it was concerning that the e-book technology usability scale was lower than the interface scale, which implied that school learners' e-book technology acceptance will improve if it is viewed as championing an elevated level of interactivity [12,21].
The study by Debattista [22] reviewed various instruments and rubrics in higher education to suggest a more inclusive rubric that comprised a fusion of best practice approaches in some higher education institutions in the e-learning field. The findings of the study were that the suggested inclusive rubric supported the development, integration, sharing, and remixing of online courses by affording a single reference point with a wide range of pedagogical facilities, approaches, and tools for e-learning [22]. Educational requirements were, namely, instructional design, course opening and closing, assessment, interaction and community, instructional resources, learner support, course evaluation, and instructional design cycle [22]. Technology criteria were, namely, course opening and closing (technology competences and issue resolution), instructional resources, technology design, and instructional design cycle [22]. The study by Anstey and Watson [8] served to create a rubric for evaluating e-learning tools in higher education by sifting for the best criteria from extant literature. Educational requirements identified were social presence, teaching presence, and cognitive presence [8]. Technology criteria were, namely, functionality, accessibility, user accountability, diffusion, technical mobile design and privacy, and data protection and rights [8].
Most colleges and universities subscribe to online education being critical in their long-term strategy. Studies have shown that online courses are best implemented when engineered to exploit the learning opportunities offered by the online technologies.
The study by Sadiku et al. [23] recognised the following educational requirements, namely, encouragement of student participation, cooperation, active learning, and reflection; prompt feedback; time on task; high expectations; respect for and addressing of diverse talents, ways of learning, and individual differences; motivation; avoiding information overload; and creating a real-life context [23]. The paper by Kanagarajan and Ramakrishnan [24] investigated the numerous smartness levels incorporated into u-learning environments (ULE). A review of infrastructural developments for u-learning was reported to address different open requirements and problems, potential improvements, and technology challenges within the scope of ULE and the Ambient Intelligence-Assisted Learning Environment (AmIALE). The identified educational requirements were, namely, adaptive learning, context-aware services, supervision and coordination of intelligent environment, enhanced learner experience, learner's behaviour, learner autonomy, flexibility, and collaborative learning, and the technology criteria were cost-effectiveness and audio and visual synchronisation [24]. The study also recommended a ULE enabled by the Internet-of-Things (IoT) for delivering smarter levels such as connectivity, energy efficiency, special needs, self-discovery, self-optimisation, and multimodal human-computer interaction in an effective manner. Kanagarajan and Ramakrishnan [24] believed that their work would elicit thinking along different dimensions to find solutions for numerous other current issues.

Dataset.
Data was obtained purposively from 30 experts for this study.
The criteria used to select experts were as follows: the expert must be a higher education or school-based practitioner, must have 3 years or more of experience, and must have a university degree in teaching and learning. The composition of the dataset is shown in Table 1.

Questionnaire.
The dataset was gathered using a survey questionnaire sent via a link to the experts' mobile devices. Online questionnaires on Microsoft Forms afforded time-efficient and cost-effective data collection. The questionnaire consisted of closed-ended questions using a rating scale from one to five, with one being the least important and five being the most important attribute. The participant selected one suitable answer for each question. The first part of the questionnaire comprised questions pertaining to the respondents' demographic data. The subsequent questions explored technology criteria and educational requirements derived from the extant literature that were significant to the management of SM diffusion in school-based education. The values obtained from the Likert scale are discrete but are subsequently treated as continuous values. Technology criteria and educational requirements were fine-tuned with the use of the multivariate data analysis method called PCA. The reliability of the questionnaire was tested using Cronbach's alpha score. In a research instrument, an alpha value of 0.7 and above is considered reliable. An alpha value of 0.8 is considered moderate reliability, and an alpha value of 0.9 and closer to 1 is considered high reliability [25].
There were 60 items in the questionnaire, and the overall reliability score (Cronbach's alpha) was 0.949, confirming high reliability of the questionnaire. The data was analysed and synthesised using PCA.
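As a minimal sketch of how a Cronbach's alpha score of this kind can be computed, the snippet below applies the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), to a small table of ratings. The function name `cronbach_alpha` and the toy ratings are hypothetical illustrations, not the study's data or code.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """ratings: list of respondent rows, each a list of k item scores."""
    k = len(ratings[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 5 respondents x 4 items on a 1-5 scale.
toy_ratings = [
    [4, 5, 4, 5],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(toy_ratings)
print(round(alpha, 3))
```

Applied to a 30 x 60 matrix of expert ratings, the same function would produce a single overall score comparable to the one reported above.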

Principal Component Analysis.
PCA is a mathematical algorithm that reduces the dimensionality of a dataset while retaining as much of its variability as possible [26]. Considering the large number of educational requirements (36) and technology criteria (24), the most significant measures for gauging the suitability of SM applications for u-learning are unclear. A cursory look at the array of data can cloud and confound the most essential criteria [26]. A more powerful analytical method is needed to make sense of the data. PCA is a linear feature extraction algorithm that adapts to the distribution of the data and diagonalizes the covariance matrix in a low-dimensional subspace. It is an empirical method that uses analytical tools from linear algebra to ensure that the number of variables gauging the suitability of SM applications for u-learning is not unwieldy or even deceptive. In this case, PCA was used to identify, in an unsupervised manner, the principal axes of this multidimensional dataset that describe the variation in the measurements, while the linearly correlated variables are transformed into linearly uncorrelated ones. An advantage of PCA is that each dimension is quantified to describe the variability of the dataset. Each quantified score provides a means for understanding how important each dimension is in relation to the others [26]. In this study, PCA is utilized from the perspective of multivariate data analysis to extract meaning from a 60-dimensional dataset. The dimensions with higher scores (principal components) provide a better portrayal of the criteria used to measure the suitability of SM applications for u-learning than the dimensions with lower scores [26]. The first principal component (PC1) accounts for the most variation in the sample criteria; it is the direction along which the samples show the largest variation and is the strongest underlying trend in the feature set [26].
The second principal component (PC2) accounts for the second highest variation in the sample, that is, the direction uncorrelated to the first component along which the samples show the largest variation, and so on for all the other PCs. The PCA method is based on the principle that when components are analysed, the component with the greatest variation, which is normally component 1, can explain more of the variation in the data than a component with lesser variation [27].
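The verbal definitions of PC1 and PC2 can be stated compactly. The following is a standard textbook formulation rather than notation taken from the study; the symbols Z (the centred, standardised n x p data matrix), S, w_k, and lambda_k are introduced here for illustration only.

```latex
% Sample covariance matrix of the centred data Z (n rows, p columns)
S = \frac{1}{n-1} Z^{\top} Z

% PC1 is the unit direction of maximal variance
w_1 = \arg\max_{\lVert w \rVert = 1} \; w^{\top} S \, w

% PCk maximises the same quantity subject to being orthogonal to w_1, ..., w_{k-1};
% the solutions are eigenvectors of S, ordered by eigenvalue
S \, w_k = \lambda_k \, w_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% Share of the total variance accounted for by PCk
\frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
```

Under this formulation, the scores on different components are uncorrelated, which is the sense in which PC2 is "uncorrelated to the first component".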
PCA was implemented in R-Studio using functions built into the R stats package. The function prcomp() performed PCA on the data matrix, first to generate graphs that help show whether the samples are related to each other and then to reduce the dimensionality of the variables. The function prcomp() returns three components of interest, namely: (i) "x" contains the principal component scores, that is, the centred and scaled data multiplied by the rotation matrix; (ii) "sdev" contains the standard deviations of the PCs, which are the square roots of the eigenvalues of the covariance matrix, so the square of sdev gives the variation in the original data that each PC accounts for; (iii) "rotation" is the matrix of loading scores, whose columns are the unit-length eigenvectors, so each PC has its own loading score for every variable. From the rotation matrix, we can determine which of the criteria have a positive or negative loading score.
The PCA plot is drawn using base graphics and ggplot2 in R-Studio [28].
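To make the three quantities described above concrete, here is a dependency-free sketch that computes standard deviations, loading vectors, and scores for a small centred matrix. It is not the study's code: prcomp() uses SVD, whereas this sketch swaps in power iteration with deflation on the covariance matrix, which yields the same components up to sign. The names `covariance_matrix`, `leading_eigenpair`, and `pca`, and the toy data, are hypothetical.

```python
import math

def covariance_matrix(z):
    """Sample covariance of the columns of z (z must already be centred)."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(c, iters=500):
    """Power iteration on a symmetric matrix: dominant eigenvalue and unit eigenvector."""
    p = len(c)
    # Arbitrary start vector; assumed not orthogonal to the dominant eigenvector.
    v = [float(i + 1) for i in range(p)]
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(val * val for val in w))
        v = [val / norm for val in w]
    cv = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, cv)), v  # Rayleigh quotient, eigenvector

def pca(z, n_components):
    """Return (sdev, rotation, x), loosely mirroring the names prcomp() uses."""
    c = covariance_matrix(z)
    p = len(c)
    sdev, rotation = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(c)
        sdev.append(math.sqrt(max(lam, 0.0)))
        rotation.append(v)  # one loading vector per component
        # Deflation: remove the component just found before searching for the next.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores: each centred observation projected onto each loading vector.
    x = [[sum(a * b for a, b in zip(row, v)) for v in rotation] for row in z]
    return sdev, rotation, x

# Hypothetical centred data: 4 observations of 2 variables.
z = [[3.0, 1.0], [-1.0, -3.0], [1.0, 3.0], [-3.0, -1.0]]
sdev, rotation, x = pca(z, n_components=2)
print([round(s ** 2, 3) for s in sdev])  # variance along each component
```

The sketch is not numerically robust (for example, it assumes the requested number of components does not exceed the rank of the data), so a library routine remains the better choice for real analyses.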

Data Standardisation.
Data standardisation is of great importance to data summarisation. This is called scaling in PCA, where the dataset was transformed using equation (1):
$$Z_{ij} = \frac{X_{ij} - X_m}{\sigma} \qquad (1)$$
This means that the mean of the attribute becomes zero, and the resultant distribution has a unit standard deviation.
In equation (1), i = 1, 2, 3, ..., 30 (expert no.) and j = 1, 2, 3, ..., 60 (criteria no.), $X_{ij}$ represents the original value of the ith expert's rating of the jth criterion, $Z_{ij}$ is the corresponding standardised value, $X_m$ is the mean, and σ represents the standard deviation of the series formed by values of the ith expert for all 60 criteria [28]. The R function scale() was used to standardize the data. It takes a numeric matrix as an input and performs the scaling on the columns [28].
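The standardisation step can be sketched as follows. The snippet scales each column of a matrix, matching the column-wise behaviour attributed to scale() above, and uses the sample standard deviation (n - 1 denominator). The function name `standardise_columns` and the toy ratings are hypothetical.

```python
from statistics import mean, stdev

def standardise_columns(matrix):
    """Z-score each column: subtract its mean, divide by its sample SD."""
    cols = list(zip(*matrix))
    means = [mean(c) for c in cols]
    # Sample SD (n - 1 denominator); a constant column would give SD 0 and fail.
    sds = [stdev(c) for c in cols]
    return [[(val - m) / s for val, m, s in zip(row, means, sds)]
            for row in matrix]

# Hypothetical ratings: 4 experts (rows) x 3 criteria (columns).
ratings = [[5, 3, 4], [4, 2, 5], [3, 4, 3], [4, 3, 4]]
z_scores = standardise_columns(ratings)
for row in z_scores:
    print([round(val, 3) for val in row])
```

After this transformation every column has mean zero and unit standard deviation, so criteria rated on a wider spread do not dominate the principal components.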

Results and Discussion
This section presents the results of 30 experts' ratings of 60 criteria for the evaluation of SM applications for u-learning. In terms of PCA, an analysis was conducted using 60-dimensional data with observations by 30 experts [9]. Table 2 lists the 60 criteria as extracted from the extant literature for the evaluation of SM applications. The table is organised into educational requirements and technology criteria [8]. Criteria are represented by Column IDs.
A load plot is used to show the influence of the original variables on the PCs [9]. Figure 1 shows the load plot of PC1 and PC2 given in the current dataset.
In Figure 1, the scatter plot shows the data points in a 2-dimensional space, namely, PC1 and PC2. PC1 is on the x-axis because it is the first column in x, while PC2 is on the y-axis because it is the second column in x [26].
To meaningfully explain the clusters, we used the square of each PC's standard deviation to calculate how much of the variation in the original data that PC accounts for [28]. We then converted these values to the percentage of variation each PC accounts for, as shown in the scree plot in Figure 2.
In Figure 2, the x-axis represents the PCs and the y-axis shows the percentage variation. PCA redistributes total variance in such a way that the first component explains as much of the total variance as possible [9,27]. The graph shows that PC1 accounts for the largest variation in the data, while PC2 accounts for the second largest variation in the data, and so on. PC1 is the direction along which the samples show the largest variation [26][27][28]. The figure shows that PC1 accounts for more than 38.7% of the variation in the data and indicates that there are very big differences between the clusters represented on the scree plot.
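The percentages plotted on a scree plot follow directly from the standard deviations of the PCs: each is squared to give a variance and divided by the total. A minimal sketch, with a hypothetical function name and made-up standard deviations rather than the study's values:

```python
def percent_variance(sdev):
    """Percentage of total variance per PC, given the SDs of ALL components."""
    variances = [s ** 2 for s in sdev]
    total = sum(variances)
    return [100 * v / total for v in variances]

# Hypothetical standard deviations for three components.
example_sdev = [3.0, 2.0, 1.0]
pct = percent_variance(example_sdev)
print([round(p, 1) for p in pct])
```

The list passed in must cover every component; if only the leading few are supplied, the percentages are relative to their sum rather than to the total variance of the data.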
Coordinates are projected onto a two-dimensional score plot having two orthogonal principal components, namely, PC1 and PC2 [9]. In Figure 3, the x-axis shows the percentage of variation in the original data that PC1 accounts for, while the y-axis shows the percentage of variation in the original data that PC2 accounts for [9]. Labels of the criteria are plotted rather than dots. The actual criteria that the labels represent are shown in Table 2 as a column ID. Figure 3 shows that criteria clustered along PC1 have more positive loading scores and are more highly ranked than criteria along PC2, which have more negative loading scores.
Dimensionality can be reduced by projecting the data points on PC1 [9,26]. Of the 60 criteria, the 40 that contributed most to PC1 were selected. In the top 40, 18 criteria emerged from educational requirements and 22 criteria emerged from the list of technology criteria. The absolute value of the loading score of each feature that contributed to PC1 gave the magnitude which provided the ranking score [29].
Figure 4 gives the ranking scores for the top 18 educational requirements from the top 40 criteria. Figure 4 shows that instructor facilitation was the educational requirement ranked highest by experts, and learner support was ranked the lowest. Instructor facilitation describes an SM application that has easy-to-use features that would significantly improve an instructor's ability to be present with learners via active management, monitoring, engagement, and feedback [22]. Ownership of learning describes an SM application that gives learners the opportunity to meet their own learning goals. This was the second highest-rated educational requirement by experts.
Adaptability of the social media application to learners' changing lives was ranked the third highest educational requirement. Figure 5 gives the ranking scores for the top 22 technology criteria from the top 40 criteria.
Operation stability means that systems are designed in a manner that processing of day-to-day transactions is performed efficiently, and the integrity of the transactional data is preserved [20].
This criterion had the highest loading score from all the technology criteria. Fault tolerance of technology, which ensures error prevention, stability, accuracy, flexibility, interoperability, and continuity, had the second highest loading score in the category of technology criteria. Multimedia controls such as audio readings and sound control, clarity of all images and graphics, control of audio or video clips, adjustment with final display process, and optimized size for multimedia contents had the third highest loading score from all the technology criteria.
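The ranking procedure used in this section (take the absolute value of each criterion's PC1 loading and keep the largest) can be sketched as follows. The function name `top_k_by_abs_loading`, the criterion IDs, and the loading values are hypothetical and are not taken from Table 2 or the figures.

```python
def top_k_by_abs_loading(loadings, k):
    """loadings: dict of criterion ID -> PC1 loading. Returns the k largest by magnitude."""
    ranked = sorted(loadings.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]

# Hypothetical PC1 loadings for five criteria.
pc1_loadings = {"E1": 0.21, "E2": -0.35, "T1": 0.05, "T2": -0.12, "T3": 0.30}
top3 = top_k_by_abs_loading(pc1_loadings, 3)
print(top3)
```

Ranking on the absolute value means that a strongly negative loading counts as much as a strongly positive one; the sign only indicates the direction of the criterion's association with the component.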

Conclusions
Despite the widespread use of PCA in dimensionality reduction, several limitations still exist. Firstly, the original dataset is lost as PCA turns the original data into PCs. Secondly, it is difficult to relate PCs to the original features. Thirdly, data standardisation must take place before PCA can be applied. Finally, the number of PCs must be selected with care as some features can be omitted from the original list of features [26]. However, compared to other methods, the advantages far outweigh the disadvantages.
Out of the 60 attributes for evaluating SM applications' suitability for u-learning, the 40 most significant were revealed in this study. While SM applications are dynamic and change with the times, the criteria used to measure their appropriateness for education remain consistent. To our knowledge, no previous study has identified the most impactful factors of SM applications for u-learning. Considering the findings of the study, it is recommended that decision makers in school-based learning apply the top technology criteria and educational requirements as a basis for decisions on which SM application is best suited to e-learning, m-learning, and u-learning in general. Future studies will focus on Intelligent Decision Support Systems and EDM using the dimensionality-reduced criteria, thereby reducing overfitting. Reduced dimensions will also improve the performance of machine learning algorithms and intelligent decision support algorithms.
Data Availability
The data collected from respondents and code in R-Studio are available from the corresponding author upon request.

Conflicts of Interest
The authors have no conflicts of interest regarding the publication of this paper.