The Design of Academic Programs Using Rough Set Association Rule Mining

Program accreditation is important for determining whether or not a program or institution meets quality standards. It helps employers evaluate programs and the qualifications of their graduates, and it helps institutions achieve their strategic goals and continuous improvement plans. Preparing for accreditation requires extensive effort. One of the required documents is the program's self-study report (SSR), which includes the PEO-SO map (which allocates the program's educational objectives (PEOs) to student learning outcomes (SOs)). This map influences program structure design, performance monitoring, assessment, and continuous improvement. Professionals in each academic engineering program have designed their PEO-SO maps according to their own experience. The problem with an incorrectly designed map is that SOs are either missing altogether or cannot be assigned to the correct PEOs. The objective of this work is to use a hybrid data mining approach to design the correct PEO-SO map. The proposed hybrid approach utilizes three different data mining techniques: classification to find the similarities between PEOs, crisp association rules to find the crisp rules for the PEO-SO map, and rough set association rules to find the rough association rules for the PEO-SO map. The work collected 200 SSRs of engineering programs accredited by ABET-EAC. The paper presents the different phases of the work: data collection and preprocessing; building of three data mining models (classification, crisp association rules, and rough set association rules); and analysis of the results and comparison with related work. The validation of the obtained results by fifty different specialists from the academic engineering field, together with their recommendations, is also presented. The comparison with other related works showed that the proposed approach discovers the correct PEO-SO maps with higher performance.


Introduction
In designing academic programs, more emphasis is placed on improving students' knowledge and skills; in undergraduate programs this is accomplished through a series of courses in the subject area. General courses provide core foundational knowledge, and each of these courses provides a range of activities to enhance students' skills in the knowledge, cognitive, communication, leadership, teamwork, presentation, technical writing, and psychomotor domains. The life cycle of the program includes various phases such as design and specification, implementation, and continuous updating, as shown by the authors in [1][2][3]. PEOs are specified by various stakeholders and components and are defined as a set of skills that relate to the knowledge, skills, and attitudes that learners are likely to demonstrate. PEOs are then mapped to the expected student learning outcomes in each course. Figure 1 illustrates a hierarchical structure of academic program design. The design of the PEO-SO map is a core phase of academic program design. In terms of student performance, PEOs are assessed 4-5 years after graduation based on the academic model presented by the authors in [4]. The ABET-EAC (Accreditation Board for Engineering and Technology, Engineering Accreditation Commission) had accredited 380 computer engineering programs by 2019. The ABET-EAC criteria for engineering programs relate to the knowledge, skills, and behaviors that students acquire during the program. The direct and indirect PEO-SO assessments presented by the authors in [5] are used to ensure the achievement of program objectives. ABET's guide to accreditation policies and procedures describes the PEOs in Criterion 2 and the student learning outcomes in Criterion 3 (see Figure 2, Appendix A); Figure 2 shows the ABET SOs in both their old and new updated versions. One of the most common problems encountered in engineering program design is the incorrect mapping of PEOs to SOs.
This influences the program design, implementation, assessment, and accreditation processes. It also increases the burden of the accreditation process and decreases the quality of the program. Therefore, a robust design of the PEO-SO map is critical to avoid these problems and to minimize the effort required to prepare the accreditation documentation. Conversely, a proper PEO-SO map design improves SO selection and allocation, improves student performance, enhances graduate skills, and increases the satisfaction of the professional community and society with respect to creativity, professionalism, ethical issues, the achievement of the program's goals, economic issues, and local and international competition. The work in this paper introduces a hybrid of data mining techniques (three different data mining models) to discover the correct PEO-SO map and thereby avoid the problems of incorrectly mapping PEOs to their SOs. The paper uses rough set association rule mining to eliminate confusion in the association of PEOs with SOs, to minimize the number of SOs associated with each PEO, and to eliminate ambiguous associations, so that only the rules that are certain to yield a correct mapping of PEOs to SOs are retained.
This in turn minimizes the effort required to design academic engineering programs and prepare them for accreditation. In this work, a dataset of 200 SSRs of academic engineering programs was used to develop and validate the proposed model. The remainder of this paper is organized as follows: Section 2 summarizes related work using data mining techniques in education and higher education. Section 3 presents the proposed approach. Section 4 presents the different stages of data acquisition, preprocessing and representation, experimental design, results, discussion, and analysis. Finally, Section 5 presents the conclusion and future work.

Related Work
Many researchers have used artificial intelligence and machine learning algorithms, as well as statistical theories and techniques, to discover key patterns in educational datasets. Their goals are to support academic program design and assessment, accreditation and reaccreditation processes, and decision-making, and to improve program performance in higher education institutions. Data mining techniques help remove noise and inconsistencies so that large datasets can be analyzed well and the hidden knowledge in data captured by various information systems can be discovered. Various researchers have developed data mining models that summarize, classify, and cluster data and that extract association rules and other patterns from datasets in educational data mining (EDM). The data mining techniques in [6] detect the correspondence between course content learning objects and program level. In [7], two different datasets of academic courses are analyzed using graphical, statistical, and quantitative techniques to select an appropriate ensemble learner from combinations of six candidate learning algorithms. In [8], relevant studies from 2000 to 2017 on computer-supported learning analytics (CSLA), computer-supported predictive analytics (CSPA), computer-supported behavior analysis (CSBA), and computer-supported visualization analysis (CSVA) are presented. Predicting student performance using educational data mining was presented in [9], where the base classifiers were random tree, J48, KNN, and naïve Bayes. The use of decision trees and hierarchical linear models with data from the Spanish PISA 2015 on high- and low-effectiveness schools is presented in [10]. In [11], an assessment framework for capstone courses is presented that evaluates students' performance and project quality by assessing student learning outcomes.
In [12], a methodology for collecting information and measuring student learning outcomes in the ABET accreditation preparation process is presented to assist in completing the ABET SSR. The Apriori technique for identifying association rules was used in [13] to create the PEO-SO map that describes the relationship between the program's educational objectives and student learning outcomes; there, the Apriori algorithm generated the association rules describing the mapping of PEOs to SOs. The problem with this approach is that the number of rules is very high and the rules are not specific. In [14], a tool for computational methods of information was introduced that covers rough sets, incomplete information, data mining, granular computing, extraction of association rules, and new mathematical frameworks. The tool is named RNIA (rough nondeterministic information analysis). In [15], an assessment and evaluation strategy for the ABET student outcomes (SOs) of computer science and computer information systems programs is presented, in which the assessment is carried out through direct and indirect methods. The quality of education through accreditation, teaching, and learning in nursing programs is discussed in [16]. A framework with work phases is developed to collect and document the selection of ABET curriculum requirements for assessment; the work developed a tool called ABETAS to automate this framework [17], which helps institutions prepare for ABET accreditation while minimizing the burden of assessing student outcomes. The relationship between teaching quality and accreditation is presented in [18], which illustrates how accreditation helps programs maintain, improve, or achieve educational quality. The approach in [19] used rough set theory to adapt the association rule model to discover customer preferences and presented its analysis.
A modification mechanism for rough set attributes and association rules was presented and applied to e-commerce platforms to categorize rough recommendations. An investigation of mining class association rules using the rough set approach is presented in [20]. In [21], an algorithm for finding the finest class rules is presented; it adapts the Apriori association rule algorithm using rough set theory to compute the support and confidence of the elementary sets of lower approximation concepts. Rough set theory is also used in association rule discovery: as introduced in [22], it simplifies traditional association rule mining and avoids redundant rules when determining rough set rules. In the work presented in [23], a collection methodology was presented for mapping PEOs to SOs derived from the SSRs of 32 engineering programs accredited by ABET; it reduces effort and time-consuming processes. An association rule mining algorithm based on the properties of rough set theory is said to improve the Apriori algorithm for association rule mining based on a decision table. Assessment methods for the ABET-CAC accreditation criteria for undergraduate computer science programs are presented in [24]; the authors adopted a set of student outcomes for the computer science program to meet the ABET program outcomes and PEOs. The work in [25] uses rough set theory and its properties to discover information more simply than the standard Apriori association rule mining method; it reduces the attributes in the dataset and yields a simpler data mining model.
The review above covers many research papers that focus on the accreditation process and its requirements. They made good contributions in different areas related to the design of PEOs and SOs, but they did not pay attention to the size of the rule sets governing the relationship between PEOs and SOs in PEO-SO maps. The PEO-SO maps they discovered were large, highly ambiguous, and of low confidence, which leads to many errors in program design and accreditation processes. The approach proposed in this paper aims to discover the correct PEO-SO map with a minimal set of accurate rules for the relationship between PEOs and SOs. It uses three different data mining techniques as follows:
(i) The decision tree (J48) classifier is a popular method that represents the data as a tree similar to a set of rules. Its goal here is to discover the similarities and dissimilarities between the different PEO categories. It reveals the confusion among different PEOs, which helps eliminate this confusion during the PEO-SO map design phase, and its results can be used as a guide for PEO-SO map design.
(ii) The Apriori algorithm for association rules is used to determine the crisp rules describing the relationship between PEOs and SOs.
(iii) The adaptation of rough set theory to Apriori association rules is used to determine the rough set association rules. These consist of lower bound rules (describing the safe region, i.e., high conf.% rules) and upper bound rules (describing the uncertain region, i.e., both low and high conf.% rules). The goal is to select the lower bound (high conf.%) rules, avoid redundancy, eliminate the ambiguity between different PEO rules, and simplify the associations.
The resulting PEO-SO maps are evaluated by 50 independent academic professionals to obtain their assessment and feedback. Figure 3 illustrates the complete structure of the proposed approach and shows the different phases of the work.

The Proposed Approach
This section presents the theoretical basis of the machine learning algorithms used to develop the proposed hybrid approach. We start with the bagged J48 decision tree algorithm, then the Apriori algorithm for association rules, and finally the rough set Apriori algorithm for rough association rules (upper and lower bounds).

Bagged J48 Machine Learning Algorithm.
Classification is the process of recognizing, understanding, and grouping objects into predefined classes using training datasets. Machine learning software uses different types of algorithms to assign future data elements to the correct categories. A decision tree is a classification algorithm that uses a divide-and-conquer strategy and consists of decision nodes and leaf nodes: a decision node specifies a test on one of the attributes, and a leaf node represents a class value [26]. The classification error is the percentage of misclassified cases [27]. In practice, training datasets are usually large, which leads to more branches and layers in the generated decision tree. When there are more class categories in the decision tree, the classification accuracy decreases significantly. There are various decision tree generation algorithms, such as ID3, J48, FT, BF Tree, and LMT. Performance is evaluated using the F-measure. Using machine learning algorithms automates the proposed work, which increases the accuracy of the results [28,29]. J48 is the Weka implementation of the C4.5 algorithm, which was proposed and developed by Quinlan in 1993. In this work, the J48 classifier serves two purposes: first, to validate the process of data collection and representation, and second, to discover how the classifier confuses the different PEO categories in the SSRs of different academic programs. We used the J48 algorithm because of its higher accuracy. To further increase the accuracy of the classifier, an ensemble technique can be used: combining the decisions of different classifiers into a single classifier greatly improves the classification performance. The J48 algorithm is used with two ensemble learning approaches, bagging and boosting, which are applied to five traditional classifiers.
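The bagged J48 model itself runs in Weka and is not reproduced here. As a minimal sketch of the bagging idea only, the Python below trains several one-level decision stumps on bootstrap samples of binary SO vectors and combines them by majority vote. Each stump picks its split by information gain; it is a hypothetical stand-in for a full J48 (C4.5) tree, which grows deeper trees and uses the gain ratio. The function names and the toy data are illustrative assumptions.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def fit_stump(rows, labels):
    """Fit a one-level tree: choose the binary feature with the highest
    information gain and store the majority class for each feature value."""
    base = entropy(labels)
    best_gain, best_feature = -1.0, 0
    for f in range(len(rows[0])):
        gain = base
        for value in (0, 1):
            subset = [y for x, y in zip(rows, labels) if x[f] == value]
            if subset:
                gain -= len(subset) / len(labels) * entropy(subset)
        if gain > best_gain:  # strict '>' keeps the first feature on ties
            best_gain, best_feature = gain, f
    overall = Counter(labels).most_common(1)[0][0]
    leaves = {}
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[best_feature] == value]
        # Fall back to the overall majority if a value never occurs in the sample.
        leaves[value] = Counter(subset).most_common(1)[0][0] if subset else overall
    return best_feature, leaves


def bagging_fit(rows, labels, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Combine the stumps by majority vote."""
    votes = Counter(leaves[row[feature]] for feature, leaves in models)
    return votes.most_common(1)[0][0]
```

Bagging mainly reduces variance, which is why it is usually paired with unstable learners such as decision trees; a single stump trained on an unlucky bootstrap sample is outvoted by the rest of the ensemble.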

Apriori Association Rule.
The association rules are formally described as introduced in [22,30]. Let $Z = \{Z_1, Z_2, \ldots, Z_m\}$ be the set of attributes and let T be an instance in the dataset S, where T ⊆ Z. Each instance in S is identified by a TID. For X ⊆ Z and Y ⊆ Z with X ∩ Y = ∅, the implication X ⟶ Y is called an association rule. If s% of the instances in S contain both X and Y, the support of the rule X ⟶ Y is s%:
$$s\% = \mathrm{support}(X \rightarrow Y) = P(X \cup Y)$$
where P(X ∪ Y) denotes the fraction of instances that contain every item of X ∪ Y. If c% of the instances that contain X also contain Y, the confidence of the rule X ⟶ Y is c%:
$$c\% = \mathrm{confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{P(X \cup Y)}{P(X)}$$
Given the user-specified thresholds minsup and minconf, a rule with s% ≥ minsup and c% ≥ minconf is called a strong association rule. The Apriori algorithm is updated in [31,32]. To reduce bias, the lift measure is defined in [33,34] as
$$\mathrm{lift}(X \rightarrow Y) = \frac{c(X \rightarrow Y)}{s(Y)}$$
Table 1 (see Appendix B) illustrates an explanatory example of the calculation of support % and conf. % using the Apriori algorithm.
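The definitions above can be checked on a toy example. The sketch below assumes that each instance is represented as a set of items (for instance, a PEO symbol plus the labels of the SOs assigned to it) and computes support, confidence, and lift exactly with `fractions.Fraction`; the item names are hypothetical.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """conf(X -> Y) = support(X ∪ Y) / support(X); X must occur at least once."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """lift(X -> Y) = conf(X -> Y) / support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)


def is_strong(x, y, transactions, minsup, minconf):
    """A rule is strong if it reaches both the support and confidence thresholds."""
    return (support(set(x) | set(y), transactions) >= minsup
            and confidence(x, y, transactions) >= minconf)
```

With the transactions {P, a, c, k}, {P, a, c}, {P, a, e}, and {S, c, k}, the rule P ⟶ c has support 1/2, confidence 2/3, and lift 8/9; a lift below 1 means that c appears with P slightly less often than independence would predict.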

Rough Set Theory.
Rough set theory enables a sound quantitative analysis of vague, uncertain, and imperfect information [35,36]. The universe (all instances in a dataset) is partitioned into blocks of indiscernible objects, which are called elementary (basic) sets. This indiscernibility is related to the granularity of the available information [37,38]. Pawlak's rough set model is a basis for formal reasoning, data analysis, and autonomous decision-making [39]. Rough set theory classifies uncertain information expressed in terms of empirical data. A set of indiscernible objects is called an elementary set, which is a fundamental atom of knowledge. Any union of elementary sets is called a crisp set, while other sets are rough sets. Each rough set has boundary-line elements that can belong either to the set or to its complement, as shown in [40,41]. The approximation properties of a rough set are described by three regions: the upper approximation or upper bound (BU), the lower approximation or lower bound (BL), and the boundary region (BU - BL), as shown in Figure 4.
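The lower approximation, upper approximation, and boundary region can be computed directly from the elementary sets. In this minimal sketch, objects are programs described by binary SO attributes and the target set is a hypothetical group of programs (for example, those sharing a PEO category); the function names and data are illustrative, not taken from the paper.

```python
from collections import defaultdict


def elementary_sets(objects, attributes):
    """Group objects that are indiscernible, i.e. have identical values on `attributes`."""
    blocks = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        blocks[key].add(obj_id)
    return list(blocks.values())


def approximations(target, objects, attributes):
    """Return (lower BL, upper BU, boundary BU - BL) of the set `target`."""
    lower, upper = set(), set()
    for block in elementary_sets(objects, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper, upper - lower
```

For objects 1 and 2 with (a, c) = (1, 1), object 3 with (1, 0), and object 4 with (0, 0), the target {1, 3} has lower approximation {3}, upper approximation {1, 2, 3}, and boundary region {1, 2}: objects 1 and 2 are indiscernible, so the target cannot be described exactly with these two attributes.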

Rough Set Association Rules (RSARs).
The algorithm for generating rough set association rules is denoted by R_Apriori and was presented in [18]. It modifies the Apriori algorithm, which generates frequent itemsets and the rules derived from them. It consists of three phases: the first phase computes the support of each 1-itemset (containing only one element), the second phase computes the support of each 2-itemset (containing two elements), and the third phase computes the support of each 3-itemset (containing three elements). The steps of the RSAR algorithm are described with an explanatory example in Table 2 (see Appendix C). The example uses instances from the collected SSR dataset for the PEO-SO map.
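The three phases described above follow the level-wise pattern of Apriori. The sketch below shows only that plain level-wise support counting (1-itemsets, then 2-itemsets, then 3-itemsets) with standard Apriori candidate pruning; the rough set modifications of R_Apriori are not reproduced, and the function name and the `minsup` value are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup, max_size=3):
    """Level-wise search: phase k counts the support of candidate k-itemsets
    for k = 1..max_size and keeps those with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    for k in range(1, max_size + 1):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsup}
        if not level:
            break
        frequent.update(level)
        if k == max_size:
            break
        # Join frequent k-itemsets into (k+1)-candidates; prune any candidate
        # with an infrequent k-subset (Apriori property) before counting it.
        next_candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return frequent
```

On the toy transactions {P, a, c, k}, {P, a, c}, {P, a, e}, and {S, c, k} with `minsup=0.5`, the candidate {P, c, k} is pruned without being counted because its subset {P, k} is infrequent, while {P, a, c} survives with support 0.5.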

Dataset Description and Preprocessing.
The raw data used in this work were collected manually using the Google search engine and the websites of accredited academic programs. From 200 SSRs of accredited engineering programs, the work aims to discover robust and correct PEO-SO maps.
Therefore, we focus only on the map section of the SSR documents. Each academic engineering program should illustrate the mapping between its PEOs and a set of SOs (11 outcomes, a to k; see Figure 2, Appendix A). This is one of the requirements of ABET-EAC, presented in subsection B of Section 3 (the outcomes subsection). Therefore, the data were extracted from these subsections of all the SSRs in the dataset. Then, we created a table of the 200 PEO-SO maps extracted from the 200 SSRs. Each PEO-SO map consists of a predefined number of entries, which were first collected as symbols and words and then converted into numerical entries (with 8 entries per map on average, there are about 200 × 8 = 1600 entries in total, and each entry is represented by the values of the 11 student learning outcomes; some entries were omitted because they were incomplete or missing). The data are represented in several steps. First, we encode the set of PEOs with a set of symbols similar to those in [13]; Table 3 shows a symbolic form of the common set of PEOs for all engineering programs. Second, the PEOs were presented in text form, and in many cases a PEO was confusingly written (not specific and/or merging two or more PEOs); e.g., "excel in industrial or graduate work in computer engineering and related fields" represents two PEOs, graduate studies and career development. To solve this problem, an essential word processing step was performed by implementing a software program similar to the Word2Vec model that converts each PEO into an 11-dimensional word vector (each word representing a single PEO) using the collected SSR dataset; Table 4 shows an example of the output. The third step is the conversion, normalization, and representation of the PEO-SO map datasets, which is essential for the subsequent data mining steps because it strongly affects the output of the data mining models; Tables 5-8 show examples of these preprocessing steps.
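The Word2Vec-like program mentioned above is not described in enough detail to reproduce. As a much simpler, hypothetical substitute (plain keyword matching, not word embeddings), the sketch below splits a merged PEO into the PEO categories whose keywords it mentions; the category keyword lists are illustrative assumptions, and only the symbols C_D and G_S come from the paper.

```python
import re

# Hypothetical keyword lists; the paper's actual vocabulary is not given.
CATEGORY_KEYWORDS = {
    "C_D": {"career", "industrial", "industry", "employment", "professional"},
    "G_S": {"graduate", "postgraduate", "advanced"},
}


def split_peo(text, keywords=CATEGORY_KEYWORDS):
    """Return the sorted PEO categories whose keywords occur in `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, kws in keywords.items() if words & kws)
```

Applied to the example PEO quoted above, `split_peo` returns `["C_D", "G_S"]`. Keyword matching misses paraphrases that an embedding-based model may capture.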
The following remarks illustrate these steps:
(i) Table 3 shows an example of the PEO-SO map for academic program A. PEO1 is divided into two PEO categories, career development and graduate studies, which are represented symbolically as C_D and G_S, respectively. The last column lists the student outcomes assigned to these two PEO categories, where "x" means that the SO is not assigned to the PEO and "√" means that it is assigned.
(ii) Table 4 shows another example of the assignment of PEOs and SOs, for academic program B. It illustrates how the representation of PEOs and the assignment of SOs differ from one program to another.
(iii) [...] Tables 3 and 4 into binary forms suitable for input to the different data mining models in the next steps.
(iv) Table 6 illustrates the final binary representation of the mapping of PEOs and SOs. Each row contains 13 features: the pattern ID, the PEO class, and the 11 student outcomes a to k.

Figure 2 (Appendix A), new updated ABET-EAC student outcomes (1-7):
1. an ability to identify, formulate, and solve complex engineering problems by applying principles of engineering, science, and mathematics.
2. an ability to apply engineering design to produce solutions that meet specified needs with consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors.
3. an ability to communicate effectively with a range of audiences.
4. an ability to recognize ethical and professional responsibilities in engineering situations and make informed judgments, which must consider the impact of engineering solutions in global, economic, environmental, and societal contexts.
5. an ability to function effectively on a team whose members together provide leadership, create a collaborative and inclusive environment, establish goals, plan tasks, and meet objectives.
6. an ability to develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions.
7. an ability to acquire and apply new knowledge as needed, using appropriate learning strategies.
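The 13-feature row format of Table 6 can be produced mechanically from the "√"/"x" marks of Tables 3 and 4. This is a minimal sketch under the assumption that each map entry arrives as a pattern ID, a PEO symbol, and 11 marks in the order a to k; the function name is hypothetical.

```python
SO_LABELS = "abcdefghijk"  # the 11 old ABET-EAC student outcomes


def encode_row(pattern_id, peo_class, marks):
    """Build one 13-feature row: pattern ID, PEO class, then outcomes a..k
    as 1 (assigned, "√") or 0 (not assigned, "x")."""
    if len(marks) != len(SO_LABELS):
        raise ValueError("expected one mark per student outcome a..k")
    bits = []
    for mark in marks:
        if mark == "√":
            bits.append(1)
        elif mark == "x":
            bits.append(0)
        else:
            raise ValueError(f"unexpected mark: {mark!r}")
    return [pattern_id, peo_class] + bits
```

For example, `encode_row(1, "C_D", list("√x√x√√xx√√√"))` yields `[1, "C_D", 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1]`. Rejecting unexpected marks, instead of silently mapping them to 0, helps surface incomplete entries such as those omitted from the dataset.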

Result Analysis of Applying the Bagged J48 Machine Learning Algorithm.
This section presents the interpretation and analysis of the results obtained by applying the bagged J48 classifier to the dataset. The obtained results are presented in Table 9, Table 10, and [...] (see Appendix D) shows the summary of the results obtained using the bagged J48 decision tree classifier, which performs best on the datasets and provides more details, while Table 11 (see Appendix D) shows the detailed accuracy of each PEO class. It shows the percentage of correct [...]

Table 16: The alignment of ABET 11 old SOs (a-k) to 7 new updated SOs (1-7). [Columns: new outcomes 1-7 (7); old outcomes a-k (11).]
[Table columns: PEO category | Symbol | Program educational objectives (PEOs) | Student outcomes a, b, c, d, e, f, g, h, i, j, k.]

Applied Computational Intelligence and Soft Computing

[...] interpretation and analysis of the upper bound rules obtained by applying the RSAR algorithm to the dataset. The obtained results are presented in Table 12, which shows the upper bound rough set association rules that govern the relationship between PEOs and SOs. To interpret the rules shown in Table 12, we consider the first row, P, of the PEO-SO map as follows.
For a given PEO, an SO with only "1" entries exists in the rule, and an SO with only "0" entries does not exist. The upper bound rule for P also covers the gray area, that is, SOs that may or may not exist, with an average confidence of 0.75. The obtained upper bound rules, shown in Table 12, can be interpreted as follows: E, S, and C have the highest confidence levels, which means that these PEOs are well defined in the proposed PEO-SO map, while G_S and T_C have the lowest confidence levels, which means that the mapping of these two PEOs is unclear. Each PEO is defined by a rule associating it with all groups of student outcomes, and each upper bound association rule must be defined twice because some of its student outcomes take both values, 1 and 0. The upper bound rules therefore include SOs that may or may not be present in the low-confidence PEO-SO rules (see Tables 13 and 14).

Lower Bound Apriori Rough Set Association Rule Mining Algorithm.
This section presents the interpretation and analysis of the lower bound Apriori rough set association rules (safe region) obtained by applying the RSAR algorithm to the dataset. Table 15 lists the lower bound RSAR rule sets that govern the relationship between PEOs and SOs. To interpret the rules in Table 15, which govern the PEOs with the 11 old SOs, we consider the first row, P, of the PEO-SO maps. P is a lower bound (safe region) rule that includes only SOs with a single "1" or "0" entry, meaning that each SO either exists or does not exist in the rule: SOs a, c, e, f, i, j, and k exist in rule P, while SOs b, d, g, and h do not. Therefore, the lower bound rule for P is defined by only one rule, not two, with an average confidence of 0.79, which is higher than the 0.75 average confidence of the upper bound rule. The results can be summarized as follows: the lower bound RSAR is more concise and robust, because each rule is described once rather than twice as in the upper bound RSAR. Figure 5 compares the upper bound and lower bound RSAR conf. %. It shows that the average confidence of the lower bound rules is higher than that of the upper bound rules for all PEO domains, indicating that they are more concrete and robust than the Apriori association rules.
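The RSAR formulas behind Tables 12 and 15 are not recoverable from this section, so the following is a hypothetical reading of the description above, not the authors' algorithm. For one PEO class, an SO whose majority value is shared by at least a fraction `threshold` of the class instances is placed in the lower bound (safe region) rule with that value; every other SO stays in the boundary, where it can be 1 or 0 as in the upper bound rules. The `threshold` parameter corresponds to the confidence threshold named as a limitation in the conclusion, and its default value is an assumption.

```python
def rough_rule(rows, threshold=1.0):
    """Split the SOs of one PEO class into a lower bound rule and a boundary.

    rows: list of dicts mapping SO label -> 0/1, one dict per class instance.
    Returns (lower, boundary, avg_conf):
      lower    -- dict SO -> value for SOs whose majority share >= threshold
      boundary -- SOs left uncertain (they may be 1 or 0)
      avg_conf -- mean majority share over the SOs in the lower bound rule
    """
    n = len(rows)
    lower, boundary, shares = {}, [], []
    for so in rows[0]:
        ones = sum(r[so] for r in rows)
        # Majority value of this SO within the class (ties go to 1).
        value, share = (1, ones / n) if ones * 2 >= n else (0, (n - ones) / n)
        if share >= threshold:
            lower[so] = value
            shares.append(share)
        else:
            boundary.append(so)
    avg_conf = sum(shares) / len(shares) if shares else 0.0
    return lower, boundary, avg_conf
```

With four P instances in which SO a is always 1, b is 0 in three cases, and c is split evenly, a threshold of 0.75 puts a = 1 and b = 0 in the lower bound rule (average confidence 0.875) and leaves c in the boundary.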

Result Mapping to the New Updated ABET-EAC SOs.
In this section, we map the results obtained with the proposed approach to the new updated ABET-EAC SOs listed in Figure 2 (see Appendix A). We mapped the 11 old SOs (a-k) listed in Figure 2 (see Appendix A) to the new updated SOs, as shown in Table 16.
The mapping between the old and updated SOs was created using a software program similar to the Word2Vec model that maps the 11 old SOs to the seven new updated SOs.
Each new SO is assigned one or more of the 11 old SOs; e.g., outcome "#" is assigned both old outcomes "a" and "e," while "#5" is assigned only one old outcome, "d." Table 17 illustrates the assignment of outcomes from the old to the new updated ABET-EAC SOs, together with the rules that govern the PEOs with the seven new SOs, which are interpreted as follows.
For example, consider the first line, P, of the PEO-SO maps, which is a lower bound (safe area) rule. It contains only SOs with "1" or "0" entries, meaning that each SO either exists or does not exist in the rule: SOs #1, #2, #3, #4, and #5 exist in rule P, while SOs #5 and #6 do not exist in rule P (see Tables 18 and 19).
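The old-to-new conversion can be applied to each binary row. This sketch assumes that a new SO is marked as present when any old SO mapped to it is present; the paper does not state its aggregation rule, and the mapping dictionary has to be filled in from Table 16 (only "d" to #5 is given explicitly in the text above). The function name is hypothetical.

```python
def to_new_outcomes(old_bits, old_to_new, n_new=7):
    """Collapse an 11-outcome (a..k) binary row into n_new outcomes.

    old_bits:   dict old SO label -> 0/1, e.g. {"a": 1, "b": 0, ...}
    old_to_new: dict old SO label -> new outcome number (1..n_new), to be
                taken from Table 16; old labels missing from it are ignored.
    Assumption: a new outcome is 1 if ANY old outcome mapped to it is 1.
    """
    new_bits = {k: 0 for k in range(1, n_new + 1)}
    for old, bit in old_bits.items():
        new = old_to_new.get(old)
        if new is not None and bit:
            new_bits[new] = 1
    return new_bits
```

Keeping the mapping as an explicit argument makes it easy to swap in the full Table 16 alignment, or a different aggregation rule, without touching the encoded dataset.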

Questionnaires, Analysis, and Feedback about the Obtained Results.
This section presents the validation and evaluation of the obtained results by 50 independent professionals and experts (lecturers and professors) from different academic engineering disciplines. The evaluation was performed by distributing a survey among these experts and collecting their feedback for further analysis. The survey template was designed to determine the level of satisfaction with the PEO-SO maps obtained by the proposed approach. The survey and the results of the analysis are presented in Table 20 and Figure 6 (see Appendix E); the survey includes 9 questions and five satisfaction levels: strongly agree, agree, neutral, disagree, and strongly disagree. The average column is calculated by taking the average of three boxes: strongly agree, agree, and neutral. The survey showed 97% agreement with the results obtained by the proposed approach.

Comparison with the Related Work Presented.
In this section, we compare the proposed approach with the work presented in [13], which uses the Apriori association rule mining technique to discover the rules describing the relationship between PEOs and SOs, whereas our approach uses the rough set Apriori association rule mining technique. Our approach also uses the bagged J48 classifier to detect and resolve confusion in the association of different PEOs with SOs, and it uses text processing software similar to the Word2Vec model to preprocess merged PEOs in the PEO-SO maps automatically, whereas this was done manually in [13]. Figure 5 graphically compares the conf. % of our lower bound and upper bound rules with that of the Apriori association results in [13]. It shows that our approach achieves a higher conf. % for both the upper bound and lower bound rules. We conclude that our approach develops a PEO-SO map with a higher conf. %, fewer and smaller rules, and greater robustness. The work presented in [13] describes the PEO-SO map with 22 rules, whereas ours uses only 11 lower bound rules (a 50% reduction in the number of rules). In addition, our results were evaluated through a questionnaire survey of 50 different specialists, which was not the case in [13].

Conclusion and Future Extensions
This paper presents the framework of the proposed hybrid approach to support the design and accreditation process of academic engineering programs; the approach uses three different data mining techniques to determine the best PEO-SO design. It can minimize the effort required to design a PEO-SO map and the time required to prepare for accreditation. The proposed approach succeeded in designing a correct PEO-SO assignment map with a minimal set of rules and small rule sizes. The paper presented the different phases of the development of the proposed approach, including dataset collection and preprocessing, data mining model construction, modeling, analysis, interpretation, and evaluation of the obtained results. The questionnaire feedback supported the validity of the proposed approach, with 97% agreement on the design and on minimizing the effort required to meet accreditation requirements. The comparison with related methods showed that the proposed approach performs better, with higher confidence and fewer rules. A limitation of the presented work is the choice of the confidence threshold that distinguishes between lower and upper bounds. Future extensions of this work include the use of natural language processing (NLP) techniques in the preprocessing phase of PEOs, the generalization of the proposed approach to all academic courses, the development of an interactive web-based system for the proposed approach, and the collection and preprocessing of additional datasets for use as a benchmark dataset.
s% and c% of class association rules are computed directly, and S and C are calculated by the following two equations: [...] BL is defined as the lower bound approximation in rough set theory, and BL(i) ∪ BL(y) indicates the number of times i occurs in conjunction with y across the dataset in the [...]

Data Availability
The dataset is not available because it was collected manually and preprocessed to be suitable for further processing.

Conflicts of Interest
The author declares no conflicts of interest.