Knowledge Graph Enhanced Intelligent Tutoring System Based on Exercise Representativeness and Informativeness

Presently, knowledge graph-based recommendation algorithms have garnered considerable attention among researchers. However, these algorithms consider only knowledge graphs with a single relation type and do not effectively model rich exercise features such as representativeness and informativeness. Consequently, this paper proposes the Knowledge-Graph-Exercise Representativeness and Informativeness Framework to address these two issues. The framework consists of four components and a novel cognitive diagnosis model, the Neural Attentive cognitive diagnosis model. The components are the informativeness component, the exercise representation component, the knowledge importance component, and the exercise representativeness component. The informativeness component evaluates the informational value of each question and identifies the candidate question set with the highest exercise informativeness. The skill embeddings are then used as input to the knowledge importance component, which transforms a one-dimensional knowledge graph into a multi-dimensional one through four relation classes and calculates skill importance weights based on novelty and popularity. Subsequently, the exercise representativeness component applies exercise-weight knowledge coverage to select questions from the candidate question set into the tested question set. Finally, the cognitive diagnosis model leverages the exercise representations and skill importance weights to predict student performance on the test set and estimate their knowledge state. To evaluate the effectiveness of our selection strategy, extensive experiments were conducted on two publicly available educational datasets. The results demonstrate that our framework can recommend appropriate exercises to students, leading to improved student performance.


Introduction
Online education has emerged as a significant supplementary learning strategy for students [1, 2]. Many students rely on online exercise recommendations to support their learning. With the vast amount of educational material available, the challenge lies in recommending appropriate exercises for effective learning [3-6]. Exercises play a crucial role in personalized educational services by serving as a powerful tool to assess students' mastery of concepts [7]. However, given the abundance of exercise resources, it is nearly impossible for students to complete them all within a limited time. Therefore, helping students find suitable exercises becomes a significant problem. Exercise recommendation systems have been proposed to address this issue by leveraging students' historical answer sequences [8-10].
Knowledge graphs (KGs), also known as cognitive maps, provide graphical representations in which concepts or words are organized into nodes connected by edges representing relationships. The application of knowledge graphs to learning the order of skills has shown promising results (e.g., [11-14]). The arrangement of concepts in a knowledge graph significantly impacts learning ability [15-18].
The cognitive diagnosis model is an effective method for discovering the knowledge state of students by analyzing their past interactions. As illustrated in Figure 1, the student first selects some exercises (e.g., e_1, e_2, and e_3). Then, the student's response to each exercise is obtained. Finally, the cognitive diagnosis model analyzes the responses to estimate the student's knowledge state on each concept.
Researchers have recognized that exercise and skill features in knowledge graphs greatly influence the quality of learning when recommending exercises [19, 20]. Various methods have been developed to learn the features of knowledge graphs and recommend appropriate exercises, resulting in improved student performance [21-24]. These methods effectively explore skill and exercise features to enhance learning efficiency. Additionally, high-quality exercises contribute to learners' comprehension of the learning material. Consequently, the research community strives to create high-quality exercise sets to enhance e-learners' performance. Previous research [11, 25] applied KGs to consider the dependencies of learning objects in exercise recommendation. However, these works only used basic relationships to establish links in the KGs, without further investigating exercise features during the recommendation process. As a result, these methods fall short of the requirements of modern e-learning.
This paper presents an innovative framework called Knowledge Graph Importance-Exercise Informativeness and Representativeness (KI-EIR) to address diverse learning needs based on KGs. To recommend exercises with high learning quality, the KI-EIR framework combines multidimensional KGs with exercise features to define the recommendation goal and enhance exercise quality. The KI-EIR framework consists of four components and a novel cognitive diagnosis model called the Neural Attentive Cognitive Diagnosis (NACD) model, which facilitates exercise recommendation toward that goal. Specifically, the NACD model estimates the student's knowledge state by analyzing past interactions. Then, according to the cognitive diagnosis results, different types of students can be correctly distinguished. Finally, the intelligent tutoring system recommends the proper exercises to students and improves their knowledge proficiency. The four components are the informativeness component, the exercise representation component, the knowledge importance component, and the exercise representativeness component. The recommendation goal is to recommend exercises with high informativeness and representativeness.
Specifically, the informativeness component selects exercises with high informativeness from the untested exercise set (E_U) into the candidate exercise set (E_C). The exercise representation component incorporates a graph neural network with two types of attention mechanisms to generate exercise and skill embeddings. The knowledge importance component uses an innovative knowledge point extraction algorithm that incorporates the skill embeddings to extract knowledge points based on the multidimensional KG; five skill features of these knowledge points are discussed to generate skill importance weights. Subsequently, the exercise representativeness component selects exercises with high knowledge coverage from the candidate exercise set (E_C) into the tested exercise set (E_T) to achieve the representativeness objective. Finally, the NACD model predicts student performance on the tested exercise set and estimates their current knowledge state.
The main contributions of this paper can be summarized as follows: (1) We propose a novel exercise recommendation method, KI-EIR, which selects exercises with high informativeness and representativeness. By incorporating the structural information of knowledge concepts, KI-EIR recommends exercises to students, thereby improving their overall cognitive level during the recommendation process.
The rest of the paper is structured as follows. Section 2 provides an overview of related work on cognitive diagnosis models, relation modeling, and exercise recommendation. Section 3 presents important terminologies and defines the goal and problem statement of this study. Section 4 describes the proposed methods. Section 5 presents the experimental evaluation of our framework using two different metrics. Section 6 concludes the paper and discusses future research directions.

International Journal of Intelligent Systems

Related Work

Cognitive Diagnosis Models.
Cognitive diagnosis has been applied in several areas [26], including medical diagnosis [27] and especially education [28]. The primary objective of cognitive diagnosis is to discover the latent trait characteristics of learners based on their testing records. These discovered characteristics have been applied in tasks such as resource recommendation [29] and performance prediction [30]. Early approaches to cognitive diagnosis mainly depended on psychological evaluation [31]. The two most traditional cognitive diagnosis models, namely the Item Response Theory (IRT) [32] and the Deterministic Input, Noisy And Gate (DINA) model [33], model the response generated by a learner answering an item as the interaction between the learner's trait features and the item. Ackerman [34] extended the characteristic features into a multidimensional space by proposing the Multidimensional Item Response Theory (MIRT). In recent years, deep learning has been incorporated into cognitive diagnosis by several researchers [35, 36]. Wang et al. [30] introduced NeuralCD, which utilizes neural networks to autonomously learn the interaction function. However, these cognitive diagnosis models overlook the deep relations between exercises, skills, and students when estimating students' knowledge states.

Relation Modeling.
Based on psychological research, the relationship between exercises and skills has been extensively explored in numerous studies that measure students' knowledge levels (e.g., [37, 38]). Many researchers employ Q-matrices to model the relationship between exercises and skills, where exercises related to the same knowledge concept are considered connected in the Q-matrix. Additionally, researchers investigate the relationship between two exercises or skills based on exercise embeddings (e.g., [39, 40]). Semantic similarity scores of exercises are computed from prior interactions to model the significance of these interactions. However, these relation modeling methods do not consider the heterogeneous interactions between students, exercises, and skills. Therefore, this paper incorporates multidimensional knowledge graphs (KGs) and Graph Convolutional Networks (GCNs) to establish exercise and skill relations, and delves into exercise features such as informativeness and representativeness to recommend the proper exercises to students.

Exercise Recommendation.
Traditional recommendation methods include nearest-neighbor approaches such as user-based [41] and item-based collaborative filtering [42]. Model-based collaborative filtering involves mining hidden or explicit features to mitigate data sparsity and achieve good scalability [43]. When applying traditional recommendation methods to exercise recommendation in the educational field, students are treated as users and exercises as items. Thus, nearest-neighbor collaborative filtering can be further classified into exercise-based and student-based variants. Considering the impact of multidimensional knowledge graphs, recent research has proposed knowledge graph-based recommendation methods for exercise recommendation (e.g., [21, 25]).
Recent exercise recommendation methods that leverage knowledge graphs help mitigate misunderstandings in learning content descriptions. Inspired by this idea, Wan and Niu [44] introduced a learner-oriented exercise recommendation method in which knowledge concepts are represented as nodes and the relationships between them as edges in knowledge graphs. Ouf et al. [45] developed exercise recommendation methods by combining knowledge graphs with the semantic web to merge personalized concepts. To organize learning resources sequentially, Shmelev et al. [46] proposed a method that integrates evolutionary methods and knowledge graph technology. Chu et al. [47] created an e-learning system based on a concept map that can generate learning paths using the connections in the map. Recognizing the need for diverse learning paths in different settings, Zhu et al. [25] presented a method for recommending learning paths using prebuilt learning scenarios; their approach requires the definition of starting and ending nodes to construct learning paths. However, all of these works neglect the rich exercise features contained in the KG. Therefore, the KI-EIR model is proposed to combine the multidimensional KG with exercise features, including representativeness and informativeness, to recommend exercises that meet specific students' needs.

Preliminaries
This section is divided into three parts. The first part presents the problem addressed in this paper. The second part provides definitions for several terminologies used in this paper.

Terminologies
Definition 1 (informativeness). In general, a valid exercise is expected to reduce the level of uncertainty in an examinee's knowledge state. Thus, the informativeness of an exercise can be defined as the amount of information that the underlying cognitive diagnosis model (M) can acquire from the exercise to update the estimate of the knowledge state. Selecting the most informative exercises is an effective approach to achieving the informativeness goal. After the student completes the test, the performance of the student with M on the entire tested exercise set is predicted, and the performance is evaluated using a metric such as the Area Under the Curve (AUC), denoted as Inf(S).

Proposed Method
The MPC function calculates the probability of correctly answering an exercise, which can be predicted by the cognitive diagnosis model. Let ΔM(r_ij) represent the parameter change in our model when adding the record r_ij = ⟨e_i, q_j, a_ij⟩. Here, θ(R_i) represents the parameters obtained from the current interactions R_i of student e_i, and θ(R_i ∪ r_{i,j}) represents the parameters after adding the interaction. For each exercise e_j, the MPC function is defined as follows: ΔM(r_ij) is approximated by the gradient caused by r_ij. This approach is particularly effective for models trained with gradient-based methods, such as neural models.
Based on the MPC score function, we select exercises from the untested exercise set to form the candidate exercise set E_C. We calculate the MPC for each exercise and select the top-K exercises with the highest informativeness.
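As a minimal sketch of this selection step, assuming a toy one-parameter IRT model in place of the NACD model (the function names and the expected-gradient scoring are our own illustration, not the paper's exact MPC formula):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_norm(theta, b, a):
    # |dL/d theta| of the logistic log-loss for one record <e, q, a>
    # under a toy 1-parameter IRT model p = sigmoid(theta - b);
    # this stands in for Delta-M(r_ij), the parameter change the record causes
    return abs(sigmoid(theta - b) - a)

def select_candidates(theta, difficulties, k):
    """Score each untested exercise by its expected gradient impact and
    keep the indices of the top-k as the candidate set E_C."""
    scores = []
    for b in difficulties:
        p = sigmoid(theta - b)
        # expectation of the gradient magnitude over the two answers a in {0, 1}
        scores.append(p * grad_norm(theta, b, 1) + (1 - p) * grad_norm(theta, b, 0))
    order = np.argsort(-np.asarray(scores))  # descending informativeness
    return order[:k].tolist()
```

Under this toy model, exercises whose difficulty sits closest to the student's ability produce the largest expected parameter change and are selected first.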
This component also provides the KI-EIR framework with exercise informativeness interpretability: the framework only needs to select exercises with high informativeness and representativeness when recommending exercises.

Exercise Representation Component.
The exercise representation component is the second step of the KI-EIR framework, in which we extract exercise embeddings (e*) and skill embeddings (s*) by considering the heterogeneous interactions between students, exercises, and skills.
We employ the Graph Convolutional Network (GCN) model to generate embedding representations of exercises and skills, capturing their static relationships. Before applying the GCN model, we define the neighbors of exercises and skills based on three meta-relationships: exercise-student-exercise (eSe), exercise-skill-exercise (eKe), and skill-exercise-skill (kEk). In the eSe and eKe relationships, the neighbors of an exercise are the exercises answered by the same student or covering the same skill. In the kEk relationship, the neighbors of a skill are the skills contained in the same exercise. To propagate information in the GCN, we use two matrices: the exercise relation matrix (R_E) and the skill relation matrix (R_S), which capture the high-order information. We then apply the GCN model to generate the hidden embedding representations of exercises (ẽ) and skills (s̃).
Each convolutional layer in the GCN model updates the nodes based on their own state and the states of their nearest neighbors. Let node_i denote the state of an exercise or skill and Node(i) denote the set of neighbors of node_i. The exercise representation at the l-th layer can be computed as follows: where w_l and b_l represent the weight matrix and bias of the GCN layer, respectively, and ReLU(·) denotes the activation function used in the GCN model. The hidden embedding representations obtained from the GCN model capture the static relationships between exercises and skills. However, they do not consider the similarity among exercises and skills when generating their embeddings. To incorporate the deep semantics of exercises and skills, we use exercise-level and skill-level attention mechanisms. These attention mechanisms learn the semantic relationships between students and exercises, generating the final embedding representations e* and s*. The formulation for e* is as follows: where R_E is the exercise relation matrix, the scaling factor normalizes the attention logits, and W_K, W_Q, and W_V are projection matrices.
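The layer update described above can be sketched as follows, assuming mean aggregation over a node and its neighbors (the extracted text does not preserve the exact normalization, so this is one plausible reading):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(H, A, W, b):
    """One convolution step: every node is updated from its own state and the
    states of its neighbours, given by the 0/1 relation matrix (R_E or R_S).
    Shapes: H (n, d_in), A (n, n), W (d_in, d_out), b (d_out,)."""
    deg = A.sum(axis=1, keepdims=True) + 1.0  # +1 accounts for the self-loop
    agg = (H + A @ H) / deg                   # mean over the node and its neighbours
    return relu(agg @ W + b)
```

Stacking several such layers lets high-order relations (e.g., exercises two hops apart through a shared student) flow into the hidden representations ẽ and s̃.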
The process for obtaining s* is similar to that of e*. The difference lies in using the hidden embedding representations of skills as input to the attention mechanism, and the skill relation matrix is used instead of the exercise relation matrix when calculating β_E.
At the same time, the connections between students, exercises, and skills are represented by the heterogeneous graph. The information contained in the heterogeneous graph is then extracted by a graph neural network, which produces the corresponding skill and exercise embeddings. As a result, the KI-EIR framework enables interpretation of the learning interactions.

Knowledge Importance Component.
After selecting exercises from the untested exercise set into the candidate exercise set, the knowledge importance component computes the knowledge importance weight W_K as input for the next selection procedure, the representativeness component. Previous studies [48] have shown that organizing educational resources into different classes helps students understand learning profiles and enables them to logically organize and recall knowledge. Therefore, in our knowledge graph, we separate knowledge concepts into different classes to learn the weights of knowledge concepts (KCs). The knowledge graph may also contain incomplete or outdated information, which leads to suboptimal recommendation results. We therefore invited two domain experts to correct and validate the information in the knowledge graph to reduce the impact of this data problem.
We categorize learning objects into three classes: (1) Subject knowledge: this class contains KCs at the subject level, such as "math," "physics," and "biology," supporting basic knowledge areas like "Ratio," "Geometry," and "Standard Form." (2) Basic knowledge: this class is the core of the framework. It includes specific knowledge fields, such as "Proportion" and "Negative Numbers," that are essential for solving specific tasks. (3) Task: this class encompasses practical educational problems like "Factorising into a Single Bracket" and "Expanding Single Brackets." The task level is the bottom level of our knowledge graph framework.
Figure 4 presents a visual representation of the multidimensional KG framework employed in this paper. Each class in this framework consists of a hierarchy and associated learning object instances. The learning objects represent meta-learning resources that are incorporated into the hierarchy and connected by semantic relationships, while the hierarchy reflects the knowledge structure of the current class. We establish various relationships, divided into intraclass relationships and interclass interactions, to illustrate the semantic connections between learning objects. Intraclass relationships link learning objects within a class, while interclass relationships link educational resources from different classes (see Table 2). Our knowledge graph expands the accessibility of learning objects across classes and strengthens connections between cross-class learning objects. This knowledge graph showcases how learned information can be practically applied, deepening e-learners' understanding of the studied material and helping them comprehend how theoretical knowledge can be used in practical scenarios.

Knowledge Point Path Extraction Algorithm.
To determine the relevance of KCs, we explore all possible learning paths through the target learning object and the learning need of the e-learner. We designed the knowledge point path extraction algorithm, based on the multidimensional knowledge graph, to accomplish this task. The algorithm consists of two phases.
The first phase calculates the relationship constraints φ based on the learning need. The getRelation() function is used to determine the relationship constraints φ = (α, β, γ, ...) that meet the specific student's requirements.
In the second phase, a learning path is constructed using the relationship constraints. Starting with the target learning object, the algorithm generates the learning path by searching for the next learning object associated with a relationship that satisfies the constraints; the associated learning object serves as the continuation of the search. The initial learning object of the current learning route is the chosen target learning object. If the current learning object has no related learning objects, the path consists of only one KC. The algorithm performs a greedy search starting from the target learning object.
For a detailed description of the algorithm, refer to Algorithm 1.
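Since Algorithm 1 is not reproduced here, the following is a hedged sketch of the greedy second phase, assuming `neighbours` maps each KC to its outgoing (relation, KC) edges and `allowed_relations` plays the role of the constraint set φ:

```python
def extract_path(start, neighbours, allowed_relations):
    """Greedy knowledge point path extraction (a sketch of Algorithm 1's
    second phase). Starting from the target learning object, repeatedly
    follow the first edge whose relation satisfies the constraints, and
    stop when no admissible continuation remains."""
    path = [start]
    current = start
    visited = {start}
    while True:
        nxt = None
        for relation, kc in neighbours.get(current, []):
            if relation in allowed_relations and kc not in visited:
                nxt = kc
                break
        if nxt is None:
            break  # a learning object with no admissible successor ends the path
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path
```

A target object with no related learning objects yields a one-element path, matching the description above.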

Knowledge Importance Weight Extraction Algorithm.
The knowledge importance weight extraction algorithm extracts the weights of KCs based on five skill features. In previous work on quantifying algorithms [49], the feature set F of KCs was proposed to select important KCs, including the level (f_1), frequency (f_2), connection (f_3), similarity (f_4), and difficulty (f_5) of the corresponding KCs.
(1) The level feature (f_1) extracts the level of a KC. By applying the knowledge point path extraction algorithm (KPE), which transforms the one-dimensional knowledge graph into a multidimensional one, the levels of KCs in all related learning paths can be extracted. For example, if the output of the KPE is "A-B-C," where the levels of A, B, and C are 0, 1, and 2, respectively, the different knowledge levels of KCs can be extracted from different learning paths. The level of a particular KC within all learning paths that encompass this KC can be defined as follows: where N_C is the number of learning paths that contain the KC. (2) The frequency feature (f_2) extracts the frequency of a KC across all learning paths. A greedy algorithm is used to search for the frequency of the KC across all learning paths. The frequency of the KC can be defined as follows: where N indicates the total number of learning paths. (3) The connection feature (f_3) considers the connections between KCs. When KCs occur in the same learning path, they are considered connected. For example, in a learning path "A-B-C," skill A is connected to skills B and C. The connectivity can be calculated as follows:

where ConnectSet represents the list of connected KCs and K is the total number of KCs. (4) The similarity feature (f_4) explores the similarity between each pair of skills. It utilizes the skill representations (s*) and calculates their dot product to measure similarity. (5) The difficulty feature (f_5) measures the cognitive difficulty of a KC, where A_{s=0} represents the set of exercises containing the KC that the student answered incorrectly. The cognitive difficulty of the KC is divided into five levels if a learner has made fewer than five attempts to answer the exercise. The average cognitive difficulty over different learners on the KC is defined as the difficulty feature: where N_S is the number of students and N_T is the time consumption.

In this paper, we consider the novelty and popularity of KCs. To satisfy different learning preferences, we apply a weighted method (W) to combine the five skill features using equation (12). Each weight (w_i) in our learning preference options corresponds to a particular feature (f_i). The weight distributions are as follows. Novelty: when considering the novelty of exercises, we set w_1 = 0.5, w_2 = 0, w_3 = 0, w_4 = 0, and w_5 = 0.5; the level and difficulty of skills represent the inherent novelty of exercises. Popularity: when considering the popularity of exercises, we set w_1 = 0, w_2 = 0.6, w_3 = 0.1, w_4 = 0.3, and w_5 = 0; the frequency, connection, and similarity of skills model the popularity of skills.
The weighted method calculates the weight of novelty (W_nov) and popularity (W_pop) for each KC.
Finally, by combining novelty and popularity, the skill importance weight (W_K) is obtained. The formulation is as follows: where tanh(z) = (e^z − e^{−z})/(e^z + e^{−z}).
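Putting the stated weight settings together, this can be sketched as follows; the exact fusion of W_nov and W_pop is not preserved in the extracted text, so the tanh over their sum below is our assumption:

```python
import numpy as np

# feature order: level f1, frequency f2, connection f3, similarity f4, difficulty f5
NOVELTY_W    = np.array([0.5, 0.0, 0.0, 0.0, 0.5])
POPULARITY_W = np.array([0.0, 0.6, 0.1, 0.3, 0.0])

def skill_importance(features):
    """Combine the five skill features f1..f5 of one KC into W_K.
    W_nov and W_pop are the stated weighted sums; fusing them with tanh
    keeps W_K bounded in (-1, 1)."""
    f = np.asarray(features, dtype=float)
    w_nov = float(NOVELTY_W @ f)
    w_pop = float(POPULARITY_W @ f)
    return np.tanh(w_nov + w_pop)
```

A KC that scores high on level and difficulty gains weight through the novelty branch, while a frequent, well-connected KC gains it through the popularity branch.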
The knowledge importance weight extraction algorithm allows us to determine the weights of KCs based on their skill features, incorporating novelty and popularity considerations. These weights play a crucial role in assessing the importance of KCs within the learning context.
As a result, we further discuss the relationships between skills, which contributes to developing the exercise factor in the cognitive diagnosis model and the representativeness evaluation metrics.
At the same time, the interpretability of the knowledge importance component can be discussed from two aspects. The first aspect is domain knowledge interpretability: as mentioned above, two experts in this field from our university labeled the knowledge graph to update any inaccurate or missing information. The second aspect is skill importance interpretability: we examine the five qualities of skills (level, frequency, connection, similarity, and difficulty), and these five metrics are interpretable in the knowledge importance component. Therefore, we believe the knowledge importance component is interpretable, combining knowledge graph interpretability with skill interpretability.

Exercise Representativeness Component.
After collecting a candidate set E_C of highly informative exercises and obtaining skill importance weights and exercise embeddings, this section designs the exercise representativeness component. The goal is to select exercises from E_C into the tested exercise set E_T that exhibit high representativeness. To assess the representativeness of exercises, a novel scoring function is proposed to evaluate the knowledge coverage of E_T. An approach is then devised to gradually add exercises to E_T until it achieves the highest coverage score.
The knowledge coverage of the tested exercise set E_T can be estimated by checking whether the corresponding KCs exist in E_C. Therefore, a straightforward knowledge coverage function, denoted as SKC, is designed as follows: where Cov(KC, Q_C) = 1 indicates that the KC is involved in E_C. However, SKC has two obvious flaws. First, it considers all KCs equally and fails to distinguish the importance of each KC. Second, the value of Cov is binary and does not reflect the number of exercises. For example, if a math quiz focuses on "Fractions" rather than "Real Numbers," it is more appropriate to select more fraction-related problems rather than simply covering both topics equally; yet under SKC, choosing nine exercises about "Real Numbers" and one about "Fractions" is equivalent to choosing five of each.
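The SKC function can be sketched directly from its definition (equal weighting, binary coverage):

```python
def skc(tested_kcs, all_kcs):
    """Simple knowledge coverage: the fraction of KCs that appear at least
    once in the tested set. Cov is binary, and every KC counts equally,
    which is exactly the pair of flaws noted above."""
    covered = sum(1 for kc in all_kcs if kc in tested_kcs)
    return covered / len(all_kcs)
```

Note that any selection touching both topics scores identically, regardless of how many exercises cover each.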
To address these flaws, the exercise weight knowledge coverage function (EWKC) is proposed to calculate the knowledge coverage of the tested exercise set E_T. Specifically, the EWKC function incorporates the number of exercises when generating the knowledge coverage of E_C. Moreover, to account for the importance of exercises and KCs, the skill importance weights obtained from the knowledge importance component are incorporated. The EWKC function is defined as follows: where W_K represents the importance weight for concept k, as discussed in the knowledge importance component. The ECov function counts the occurrences of a KC in E_T and applies a sigmoid function so that the coverage value lies within the range of 0 to 1. Finally, the EWKC function calculates the weighted average knowledge coverage over all KCs, with weights determined by the importance of the corresponding skills. However, the EWKC function considers only the impact of skills and ignores the influence of exercises. To better define the representativeness of exercises, the response matrix (P_n) and the dissimilarity matrix (Ẽ) are introduced.
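A hedged sketch of EWKC, assuming ECov maps a positive occurrence count c to sigmoid(c) and an uncovered KC contributes zero (the exact formula is lost in the extracted text):

```python
import numpy as np
from collections import Counter

def ewkc(tested_exercise_kcs, skill_weights):
    """Exercise-weight knowledge coverage: count how often each KC occurs in
    E_T, squash positive counts with a sigmoid so coverage stays in (0, 1),
    and average the per-KC coverage weighted by the skill importance W_K.
    `tested_exercise_kcs` is a list of KC lists, one per selected exercise."""
    counts = Counter(kc for kcs in tested_exercise_kcs for kc in kcs)
    total_w = sum(skill_weights.values())
    score = 0.0
    for kc, w in skill_weights.items():
        c = counts.get(kc, 0)
        ecov = 1.0 / (1.0 + np.exp(-c)) if c > 0 else 0.0  # sigmoid-squashed count
        score += w * ecov
    return score / total_w
```

Unlike SKC, adding a second exercise on an important KC now raises the score, but with diminishing returns as the sigmoid saturates.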

Response Matrix.
The response matrix P_n of size |S| × N_e is designed, where each element is defined as follows: The matrix P_n stores the probability of each student answering the next exercises correctly. |S| represents the number of students, |C| represents the number of exercises completed by each student, and N_e represents the total number of exercises. If a student has not attempted an exercise, the corresponding columns are filled with zeros. These columns correspond to the N_e − |C| hypothetical exercises that students cannot answer correctly and will be replaced by other exercises in the future.
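Constructing P_n with zero-filled columns for unattempted exercises can be sketched as (the dict-based input format is our own convention):

```python
import numpy as np

def build_response_matrix(prob_by_student, n_students, n_exercises):
    """Build P_n (|S| x N_e): entry (s, e) is the predicted probability that
    student s answers exercise e correctly. Exercises a student has not done
    stay zero, standing in for the hypothetical exercises the student cannot
    answer correctly."""
    P = np.zeros((n_students, n_exercises))
    for (s, e), p in prob_by_student.items():
        P[s, e] = p
    return P
```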

Dissimilarity Matrix.
To consider exercise representativeness, the dissimilarity between exercises, denoted as Ẽ, is defined as follows: where e* represents the exercise representation from the exercise representation component.
The final knowledge coverage combines skill features and exercise features to measure the representativeness of exercises. It is defined as follows: where α_1, α_2, and α_3 are hyperparameters of the model. The representativeness of exercises is evaluated as the weighted sum of the knowledge coverage of KCs in E_T, the probabilities stored in the response matrix P_n, and the dissimilarity matrix Ẽ. In summary, the exercise representativeness component selects exercises from the candidate set E_C into the tested exercise set E_T with high representativeness. The knowledge coverage of E_T is evaluated using the EWKC function, which incorporates skill importance weights and exercise counts; the response matrix P_n and dissimilarity matrix Ẽ capture exercise features; and the hyperparameters α_1, α_2, and α_3 control the relative importance of these features.
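The weighted combination can be sketched as follows; the per-exercise aggregation of the response and dissimilarity signals (means below) is our assumption, since the text only states a weighted sum with α_1, α_2, and α_3:

```python
import numpy as np

def representativeness(ewkc_score, response_col, dissim_col, alphas=(1.0, 1.0, 1.0)):
    """Score one candidate exercise by combining three signals:
    - ewkc_score:   knowledge coverage of E_T if the exercise is added (EWKC)
    - response_col: the exercise's column of the response matrix P_n
    - dissim_col:   its dissimilarities to already-selected exercises (from E~)
    Returns alpha_1 * coverage + alpha_2 * mean response + alpha_3 * mean dissimilarity."""
    a1, a2, a3 = alphas
    return (a1 * ewkc_score
            + a2 * float(np.mean(response_col))
            + a3 * float(np.mean(dissim_col)))
```

Greedy selection then repeatedly moves the highest-scoring candidate from E_C into E_T.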
This component also provides the KI-EIR framework with a second aspect of interpretability, exercise representativeness interpretability: the KI-EIR model can evaluate the quality of an exercise based on the informativeness and representativeness metrics. Then, according to exercise quality, different exercises are recommended to different types of students.

Cognitive Diagnosis Model.
This section introduces a novel cognitive diagnosis model, NACD, within the KI-EIR framework. The NACD model estimates the knowledge state of students and makes predictions on the tested exercise set. To achieve accurate diagnosis, the NACD model incorporates student factor modeling and exercise factor modeling. The student factor modeling captures students' behavior during exercise training, specifically their slipping and guessing behavior. The exercise factor is modeled based on the output e* generated by the exercise representation component; it explores the relationship between exercises and skills and uses the exercise-skill relation matrix as input to a relative-distance attention mechanism to generate the exercise factor representation.

Exercise Factor.
To model the relationship between exercises and skills, an exercise-skill relation matrix Q is constructed to map exercises to skills. To account for the relationships between skills, the skill importance weight W_K obtained from the knowledge importance component is used when generating the exercise-skill relation matrix. Specifically, when an exercise e_i contains a knowledge point k, the corresponding entry is replaced by W_K, which reflects the popularity and novelty of skills. Based on the exercise-skill relation matrix, the knowledge point vector of exercise e can be obtained as follows: where Q ∈ R^{N_e×K} and x_e ∈ {0, 1}^{N_e×1} represents the one-hot representation of exercises. The exercise embedding is then used as input to the relative-distance attention mechanism. The relative distance between input positions x_i and x_j is encoded by edge vectors a^K_{i,j} and a^V_{i,j}. To prevent unbounded values, the edge vectors are clipped using the function clip(x, k) = max(−k, min(k, x)), where k represents the maximum absolute distance; the associated relative position representations for W_K and W_V are defined accordingly. The edge vectors are then used as input to the attention mechanism, and the attention weights a_{i,j} are calculated based on the relative distances e_{i,j}. Finally, the relative position attention mechanism outputs the exercise factor representation F_E. In these equations, W_Q, W_K, and W_V represent the query, key, and value matrices, respectively, and d_{F_E} denotes the dimension of F_E.
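The clipping of relative distances can be illustrated directly; `relative_position_index` is a hypothetical helper showing how clipped distances index a table of 2k + 1 learned edge vectors:

```python
import numpy as np

def clip_distance(x, k):
    """clip(x, k) = max(-k, min(k, x)): bounds a relative distance so the
    number of distinct edge vectors stays finite."""
    return max(-k, min(k, x))

def relative_position_index(n, k):
    """Matrix of clipped pairwise distances j - i for a length-n sequence,
    shifted into [0, 2k] so each entry can index an embedding table of
    2k + 1 edge vectors (one per distinct clipped distance)."""
    idx = np.arange(n)
    rel = idx[None, :] - idx[:, None]   # rel[i, j] = j - i
    return np.clip(rel, -k, k) + k
```

With k = 1, all positions two or more steps apart share the same edge vector, which is the point of the clipping.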

Student Factor.
The student factor, denoted F_S, models the representation of students based on their knowledge proficiency vectors when diagnosing their states. The student representation is formulated as H_S = x_S^T A, where x_S ∈ {0, 1}^{S×1} is the one-hot encoding of students and A is a trainable matrix within the framework. Next, we introduce two factors related to student behavior: the slipping factor and the guessing factor. The slipping factor captures situations where a student attempts an exercise but provides an incorrect answer due to a careless mistake. The guessing factor represents instances where a student guesses an answer without having fully mastered the corresponding skills. They are formulated as H_Slipping = x_S^T B and H_Guessing = x_S^T C, where B and C are trainable matrices.
To generate the student factor representation F_S, we incorporate H_S, H_Slipping, and H_Guessing as inputs to a two-layer linear network. The input for the network is X = [H_S, H_Slipping, H_Guessing], and X then passes through the network to yield F_S, where σ(·) represents the sigmoid activation function.
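Putting these pieces together, the student factor can be sketched as follows. The hidden width and the presence of bias terms `b1`/`b2` are assumptions not stated in the text; the matrix names `A`, `B`, `C` follow the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def student_factor(x_s, A, B, C, W1, b1, W2, b2):
    """F_S from a one-hot student vector x_s; A, B, C are the trainable
    matrices behind the proficiency, slipping, and guessing factors."""
    H_s = x_s @ A          # knowledge proficiency vector
    H_slip = x_s @ B       # slipping: careless wrong answers
    H_guess = x_s @ C      # guessing: lucky correct answers
    X = np.concatenate([H_s, H_slip, H_guess])
    # two-layer network with sigmoid activations
    return sigmoid(W2 @ sigmoid(W1 @ X + b1) + b2)
```

Because the final activation is a sigmoid, every entry of F_S lies in (0, 1), matching its reading as a proficiency-like vector.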

Student Performance Prediction.
To predict student performance on a tested exercise set E_T, we combine the student factor F_S and the exercise factor F_E as p = σ(W_s F_S + W_e F_E + b_p), where W_s and W_e are weight matrices, b_p is the bias vector, and p represents the likelihood that the student will answer the subsequent interaction exercise, e_{N_e+1}, correctly.
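The prediction layer, together with the cross-entropy objective mentioned later in the experiments, can be sketched as below; the vector shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(F_s, F_e, W_s, W_e, b_p):
    # p = sigma(W_s F_S + W_e F_E + b_p): probability of a correct answer
    return sigmoid(float(W_s @ F_s + W_e @ F_e + b_p))

def bce(p, y):
    # binary cross-entropy between the prediction p and the 0/1 response y
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Training minimizes `bce` over observed responses, which is exactly the cross-entropy loss named in the experimental setup.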

Experiments
In this section, we conduct experiments on two public educational datasets, the Assistment dataset and the Eedi dataset, to investigate the performance of our selection strategy, KI-EIR. The experiments are organized into five aspects. First, we compare the novel cognitive diagnosis model, NACD, with baseline models in terms of the AUC and ACC metrics to validate the effectiveness of the NACD model. Second, we compare the performance of our selection strategy with the random strategy and the EM strategy using the informativeness metric. The experimental results demonstrate that our strategy outperforms the other selection strategies. Third, we discuss the performance of our strategy compared with the other strategies using the representativeness metric. Next, we visualize the recommendation process of the KI-EIR, EM, and random strategies using heatmaps to highlight the excellent performance of the KI-EIR strategy. Finally, we further explore the key components of the KI-EIR method on the Eedi dataset.
Dataset Descriptions.
We use two datasets in this paper: the Assistment (ASSIST) dataset and the Eedi dataset. The Assistment dataset is generated by collecting information from the Assistment Online Tutoring Systems. It is an open-source dataset that researchers use for cognitive diagnosis tasks. The experiments in this paper are conducted on the problem bodies of this dataset. The Eedi2020 dataset is obtained from the NeurIPS platform, which collected 1,380,000 records from 4918 students. Each student participated in an average of 280 workouts. For this study, we use problems 3 and 4 from the NeurIPS dataset to compare the performance of our model.
The statistical information of the Assistment and Eedi2020 datasets is shown in Table 3.

Baselines and Selection Strategies.
To evaluate the KI-EIR method, we test it with four standard cognitive diagnosis models: IRT, MIRT, NCDM, and KaNCDM. The details of these models are as follows: (1) IRT [50]: This is the most popular CDM in computerized adaptive learning. IRT and conventional approaches focus on developing and applying multi-item scales to assess "latent variables" (hypothetical constructs). (2) MIRT [51]: MIRT is a multidimensional extension of IRT that demonstrates its potential for estimating several characteristics of ability. IRT-based methods have also been expanded to accommodate MIRT.
(3) NCDM [30]: This cognitive diagnosis model is the most standard model in the educational data mining field. The NCDM model employs neural networks to learn the complex relationships among exercises in order to produce accurate and understandable diagnosis results. (4) KaNCDM [52]: This framework is developed on top of NCDM to estimate the current knowledge state of students. KaNCDM improves upon NCDM in terms of feasibility, generality, and extensibility. Extensibility is further discussed from two aspects: content-based extension and knowledge-association-based extension.
We also apply two selection strategies to compare against our selection strategy. The details of these selection strategies are as follows: (1) Random strategy (RM): This strategy serves as the baseline for the selection strategies. It randomly selects exercises from the exercise set without considering overall performance. (2) Expectimax strategy [53] (EM): Expectimax is a tree-based, brute-force MDP search algorithm that determines the expected utility of each action. It assumes that the agent always chooses the option that maximizes utility and that, after an action is taken, the environment generates the subsequent state through a stochastic process. Specifically, we treat the exercises as states and the knowledge concept importance as the reward when recommending exercises to students.
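For reference, the expectimax recursion that the EM baseline is built on can be sketched as follows. The state, action, and reward interfaces here are illustrative, not the paper's implementation.

```python
def expectimax(state, depth, actions, transition, reward):
    """Expected-utility search: the agent maximizes over actions, while the
    environment samples the next state stochastically.
    transition(state, action) -> [(prob, next_state), ...]"""
    if depth == 0:
        return reward(state)
    best = float("-inf")
    for a in actions(state):
        # expectation over the stochastic outcomes of action a
        value = sum(p * expectimax(s2, depth - 1, actions, transition, reward)
                    for p, s2 in transition(state, a))
        best = max(best, value)
    return best
```

The brute-force nature is visible here: the recursion branches over every action and every stochastic outcome, which is why treating each exercise as a state becomes expensive on large exercise banks such as ASSIST.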
Considering the volume of the datasets and the overall structure of the KI-EIR framework (Figure 3), we can see that the KI-EIR framework involves two types of networks: the graph neural network in the exercise representation component and the fully connected network in the cognitive diagnosis model. The Eedi dataset has a large volume of data, containing 1.38 million student records. Therefore, to run the KI-EIR framework efficiently in the intelligent tutoring system, the following optimization tricks are introduced. The first trick is based on the idea of exchanging space for time: we pretrain the exercise representation component and store the exercise and skill embeddings, rather than running this component in every epoch when training the KI-EIR framework. The second trick involves integrated training and unified optimization for all components.
The framework settings for the KI-EIR framework are described in this part, as illustrated in Table 4.

Results and Discussion
In this paper, we evaluate the prediction task of the cognitive diagnosis model based on whether an exercise is answered successfully in the next interaction. We use the Area Under the Curve (AUC) and Accuracy (ACC) metrics to measure prediction performance. A higher AUC or ACC value indicates better cognitive diagnosis performance, while a value of 0.5 suggests random selection. The cross-entropy loss function is used.
A binary value represents the effectiveness of an exercise recommendation. We measure the performance of our selection strategy with two metrics: the informativeness metric and the coverage metric. The informativeness metric Inf(s) measures the informativeness of the selection strategy in exercise recommendation; the AUC metric is adopted to indicate informativeness (equation (25)). The coverage metric Cov(s) measures the representativeness of the selection strategy and is computed as the percentage of knowledge concepts covered by the strategy-selected exercises. To discuss the performance in depth, the train-test curve of the NACD model is introduced. As illustrated in Figure 5, the vertical axis shows the AUC of the NACD model in the corresponding epoch, and the horizontal axis shows the epoch of the training or testing process. We observe that the model's performance is relatively stable on both the training and test sets. On the training set, the model's accuracy remains around 84.5% and 80.3% on ASSIST and Eedi, respectively; on the test set, it fluctuates around 77.2% and 77.5%, respectively. Based on the training and test performance, we can also conclude that the NACD model does not suffer from overfitting.
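The coverage metric Cov(s) can be computed directly from the exercise-skill (Q-matrix) information. A small sketch, with illustrative names:

```python
def coverage(selected, exercise_skills, all_skills):
    """Cov(s): fraction of knowledge concepts covered by the exercises a
    strategy selected. `exercise_skills` maps an exercise id to the set of
    skills it involves (its Q-matrix row)."""
    covered = set()
    for e in selected:
        covered |= exercise_skills.get(e, set())
    return len(covered & set(all_skills)) / len(all_skills)
```

A strategy that keeps selecting exercises over the same few skills scores low here even if each exercise is individually informative, which is exactly the behavior the representativeness component is designed to avoid.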

Informativeness Comparison.
In this part, we compare the informativeness performance of the strategies using the AUC metric (equation (25)). Figures 6 and 7 present the results at the middle (step t = 10) and final (step t = 20) stages of the tests. We compare the KI-EIR strategy with the EM strategy and the random strategy under different cognitive diagnosis models: NACD, IRT, and MIRT.
The random strategy performs the worst among all strategies on the Eedi dataset and provides the baseline accuracy for the experiment. The EM strategy, which uses a Markov decision process, outperforms the random strategy on the Eedi dataset by considering the impact of interactions between exercises and students. However, the EM strategy performs worse than the IRT model on the ASSIST dataset because of the large number of exercises, which leads to inaccurate predictions when each exercise is treated as a state. The KI-EIR strategy, which incorporates exercise and skill features from the knowledge graph, outperforms all models on both datasets, indicating its effectiveness in achieving the informativeness goal. The specific reasons are as follows. First, two innovative exercise evaluation metrics, representativeness and informativeness, are designed; by analyzing these two metrics, the quality of exercises is modeled correctly so that proper exercises are recommended to students. Second, the KCs in the knowledge map are comprehensively explored to generate the skill importance weight. Finally, the KI-EIR strategy makes the recommendation process more flexible without requiring modifications to the general methodology. Therefore, the KI-EIR framework outperforms the other selection strategies on both datasets.

Representativeness Comparison.
The representativeness comparison explores the performance of the different selection strategies in terms of the coverage metric. As illustrated in Figures 8-10, the EM strategy performs better than the random strategy on the Eedi dataset because it considers the impact of behavior when selecting exercises. However, the EM strategy performs worse than the random strategy on the ASSIST dataset because too many states lead to inaccurate predictions by the Markov decision process.
Compared with the previous two selection strategies, KI-EIR explicitly measures the quality of exercises by defining two exercise quality metrics, the representativeness and informativeness metrics. The framework also considers the correlation between KCs in the multidimensional knowledge graph to recommend related exercises. Therefore, the KI-EIR framework combines the exercise quality metrics with the KCs to recommend exercises and shows significant improvements in the coverage metric compared with the other strategies, reaching close to 0.8 and 1 on Eedi and ASSIST, respectively. Figure 11 depicts the differences in performance of the selection strategies (KI-EIR, EM, and random) through color changes. The vertical dimension represents the selection strategies, while the horizontal dimension represents the testing phases from 0 to 19. The color of the heatmap represents student performance when recommended appropriate exercises, with stronger colors indicating a greater impact of the selection strategy.
According to Figure 11, the random strategy performs worse than the other selection strategies and is treated as the baseline. The conclusions drawn from the experiments are as follows. Firstly, the individual components of informativeness, exercise representativeness, and knowledge importance do not yield satisfactory outcomes when used alone; the performance gradually improves as more components are incorporated into the KI-EIR method. Secondly, when the exercise representativeness component (ER) is included in the KI-EIR framework, the performance exceeds that of IF alone, increasing to 65.6% and 67.2%. According to these experimental results, the exercise representativeness component (ER) is therefore more important than the informativeness component (IF). Thirdly, since the knowledge importance component (KI) provides the skill weights for ER, removing the ER component also removes the KI component. Consequently, including the KI component leads to a greater performance improvement than the IF component.
Table 6 presents the results of the ablation study of the KI-EIR model based on the IRT model on the Eedi and ASSIST datasets. The results in Table 6 demonstrate that the KI-EIR model outperforms the ablated versions in terms of AUC on both datasets, and including all components leads to the best performance. The ablation study confirms the importance of the informativeness, exercise representativeness, and knowledge importance components, with their combined effect resulting in improved performance.
To validate the performance of these optimization tricks for the key components of the KI-EIR framework, we test the time consumed in recommending exercises for a student. The details are shown in Figure 12.
The conclusions drawn from the experiments are as follows. Firstly, the response times of the KI-EIR, EM, and RM strategies are all below 1 s, well within the maximum user-tolerable page loading time of 2 s. This demonstrates that the optimization tricks efficiently reduce the complexity of the KI-EIR framework. Secondly, all three selection strategies spend more time on the ASSIST dataset than on the Eedi dataset because the ASSIST dataset contains far more exercises than the Eedi dataset.

Conclusion and Future Work
In this paper, we proposed a comprehensive framework, the KI-EIR (Knowledge Graph-Enhanced Exercise Item Recommendation) model, to address the challenge of providing informative and representative exercises in cognitive diagnosis tasks. The KI-EIR model consists of four key components: informativeness, exercise representation, knowledge importance, and exercise representativeness. We can also observe that KI-EIR possesses two types of scalability. The first is model scalability: IRT or MIRT can be used to recommend exercises, even though these cognitive diagnosis models do not deliver excellent performance in the informativeness and representativeness comparison experiments. The second is dataset scalability: we compare the large Eedi2020 dataset with the small ASSIST dataset to validate our performance. The results indicate that our framework is extensible to both types of datasets.
The informativeness component estimates the informativeness of each exercise and moves exercises with high informativeness from the untested exercise set into the candidate exercise set. The exercise representation component utilizes the Graph Convolutional Network (GCN) model and two types of relation attention mechanisms to generate skill embeddings and exercise embeddings. The knowledge importance component applies the knowledge point path extraction algorithm and the knowledge importance weighting algorithm to calculate the skill importance weights. Finally, the exercise representativeness algorithm combines the skill importance weights, exercise weights, knowledge coverage, response matrix, and dissimilarity matrix to select highly representative exercises from the candidate exercise set into the tested exercise set. The NACD model is then employed to accurately estimate the state of students based on the selected exercises. The recommendation process is also interpretable, and four aspects of the interpretability of the KI-EIR framework can be discussed. The first aspect is domain knowledge interpretability. The knowledge graph is labeled by two domain experts at our university to correct outdated or incomplete information (Section 4.3). The domain knowledge in the validated knowledge graph is inherently interpretable because we can easily observe the specific knowledge concepts and the relationships between them. The second aspect is exercise informativeness and representativeness interpretability. To measure the quality of exercises and make the recommendation process interpretable, two metrics, informativeness and representativeness, are proposed in Sections 4.1 and 4.4. The exercise recommendation process of KI-EIR is therefore interpretable because the framework simply selects the exercises with high informativeness and representativeness. The third aspect is learning interaction
interpretability. The heterogeneous graph in Section 4.2 provides insight into the relationships between students, exercises, and skills. The graph neural network is then applied to extract the information contained in the heterogeneous graph and to generate the corresponding skill and exercise embeddings. Therefore, the modeling of the learning interactions is interpretable within the KI-EIR framework. The final aspect is skill importance interpretability. We further explore the five properties of skills in Section 4.3.2, including the level, frequency, connection, similarity, and difficulty. These five properties of skills are interpretable. As a result, the KI-EIR framework, which demonstrates interpretability from these four aspects, can provide interpretable and transparent recommendations to students.
The KI-EIR model demonstrates promising results in improving cognitive diagnosis and exercise recommendation in educational settings. By leveraging the power of knowledge graphs and incorporating multiple components, our framework provides accurate and informative exercise recommendations for students, thereby enhancing their learning experience and academic performance. The proposed framework opens up new avenues for research and development in the field of educational data mining and cognitive diagnosis. However, the KI-EIR framework still has some limitations in recommending exercises to students. The first limitation is that some metrics crucial for a recommendation system, such as user satisfaction, user engagement, and long-term learning outcomes, are difficult for the KI-EIR framework to consider at the current stage. The reasons are as follows. First, these metrics require a large number of real users once the platform is put into practice. Second, data from other recommendation systems cannot be referenced directly, because different platforms exhibit different user satisfaction, engagement, and long-term learning outcomes. The second limitation is that some nuanced factors, such as the pedagogical methods of different teachers and individual learning styles, are valuable for KI-EIR but hard to quantify from the datasets.

International Journal of Intelligent Systems
In future research, we suggest exploring the application of reinforcement learning techniques, such as the Deep Q-Network (DQN), to further improve the selection of exercises with high representativeness and informativeness. This approach can help reduce the time required for the selection phase and enhance the efficiency and effectiveness of the cognitive diagnosis process. At the same time, we also need to further discuss optimization methods for our intelligent tutoring system to reduce the time consumed by the recommendation process. Finally, we will put the KI-EIR framework into practice and collect valuable metrics such as user satisfaction, engagement, and long-term learning outcomes to further improve our framework.

Figure 4: One-dimensional KG framework used in previous studies (left) and the multidimensional KG framework employed in our paper (right). Dotted lines represent links between classes, solid lines denote interactions within classes, and nodes of various colors represent learning items in different classes. This transformation process is assisted by two domain experts at our university.

Figure 5: The train-test curve of the NACD model on ASSIST and Eedi in terms of the AUC metric.

Figure 11: Heatmap illustrating the performance of selection strategies on the Eedi2020 dataset.
Toy example of cognitive diagnosis. The student first chooses three exercises to test. Then, according to the test results, the cognitive diagnosis model estimates the student's proficiency on each knowledge concept.
Figure 2: Toy example illustrating the heterogeneous interaction between students, exercises, and skills.

Table 2: Designed relationships in the knowledge graph.
Algorithm 1: Knowledge point path extraction algorithm.
Input: Students' historical response dataset D = {s_1, s_2, ..., s_N}, s_i = (e_i, s_i, t_i); the knowledge level graph G.
Output: All possible learning paths P.
while findAllPaths(KC_u) do
    R = getRelations(KC_u)  // find the relations connected with KC_u
    if R = ∅ then
        P.addPath(p)  // add the new path into the path set
    else
        for all r ∈ R do
            list(KC_u) = getConnectObject(KC_u)  // obtain the objects connected to KC_u by r
            p.addElement(list(KC_u))  // put the element in the learning path
            recursively apply findAllPaths(KC_u)
        end
    end
end
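Algorithm 1 amounts to a recursive depth-first enumeration of paths through the knowledge-level graph: a path is complete when a concept has no outgoing relations. A Python sketch under that reading (the adjacency-dict representation and names are illustrative):

```python
def find_all_paths(graph, start):
    """Enumerate every learning path beginning at concept `start`.
    `graph` maps a knowledge concept to the concepts its relations reach."""
    paths = []

    def dfs(node, path):
        successors = graph.get(node, [])
        if not successors:        # no outgoing relations: the path is complete
            paths.append(path)
            return
        for nxt in successors:    # follow each relation recursively
            dfs(nxt, path + [nxt])

    dfs(start, [start])
    return paths
```

On an acyclic knowledge graph this terminates and returns one list per maximal path; cycles would need a visited-set guard.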
(5) The difficulty feature (f_5) leverages students' interactions to indicate the difficulty of skills. It models the cognitive difficulty of skills based on students' behavior when they attempt exercises containing the same skill at different timestamps. The cognitive difficulty of a skill set for each student S_i at timestamp t is represented by π_{S_i,KC,t}:

Table 5 presents a comparison of the results of the baseline models with the Neural Attentive Cognitive Diagnosis (NACD) model. The NACD model outperforms all baseline models in terms of AUC and ACC on both the ASSIST and Eedi datasets. The MIRT model demonstrates better performance than the IRT model.

Table 3: Statistics of the Assistment and Eedi datasets.

Table 4: The framework settings for the KI-EIR framework.

Table 5: Comparison of results of baseline models with the Neural Attentive Cognitive Diagnosis (NACD) model.

Table 6: Ablation study of the KI-EIR model based on the IRT model on two datasets.
Figure 12: The time consumption comparison on Eedi and ASSIST.