English Grammar Error Correction Algorithm Based on Classification Model



Introduction
English grammar error correction is an important task in natural language processing: it uses computer programs to automatically recognize and correct the grammar, spelling, word order, and punctuation errors in English text written by nonnative learners [1]. Because of the huge vocabulary of a language, the complexity of its grammar rules, and the ambiguity of its semantics and speech, computational analysis of these problems is needed to improve language processing and recognition and to move natural language processing machines toward human-like intelligence [2]. There are currently three methods of grammar error correction. The first is rule-based: it formulates specific correction rules for specific error types, so it depends largely on the quality of the rules and can only correct certain types of errors. The second is statistics-based: it uses information such as words to obtain text features, models the language, and selects appropriate statistical models to correct errors in the text. The third is deep learning-based: it uses word vector representations to build a deep neural network that corrects errors end to end, regardless of specific error types [3]. Therefore, a more accurate and efficient algorithm for correcting English grammar is required, which is the main topic of this paper [4].
Classification is a core problem in machine learning and data mining, and there are many classification models, such as neural networks, decision trees, genetic algorithms, and support vector machines; in recent years, extracting information and structure from reliable data classifiers has gradually become a hot research topic [5]. Traditional classification methods produce crisp results, but applying such crisp classification to fuzzy objects does not reflect reality. Most classification models assume that the distribution of classes in the data stream is roughly balanced, and the designer usually assumes that the classes in the training data set contain roughly the same number of samples [6]. This basic assumption is applied to many real data streams, and the system mainly uses the rules provided by the above modules to complete automatic grammar checking and correction. Any automaton constructed according to certain rules defines a language, which consists of all the sentences that the automaton can recognize; grammars and automata thus express a language from the equivalent perspectives of generation and recognition [7]. With this classification algorithm, as training samples accumulate and learning proceeds, the classification accuracy of the resulting classifier continues to improve, which reduces running time and saves storage space [8].
On the basis of summarizing and analyzing previous research, this paper expounds the research status and significance of English grammar error correction algorithms; elaborates the development background, current status, and future challenges of classification models; introduces the methods and principles of feature extraction and the dynamic residual structure; constructs a basic model for English grammar error correction based on a classification model; analyzes the classification model and translation model of English grammar error correction; proposes an English grammar error correction algorithm based on the classification model; analyzes the model architecture and model optimizer of the algorithm; and finally conducts a simulation experiment and analyzes its results. The results of this study provide a reference for further research on English grammar error correction algorithms based on classification models. The rest of the paper is organized as follows: Section 2 introduces the methods and principles of feature extraction and the dynamic residual structure; Section 3 constructs a basic model for English grammar error correction based on a classification model; Section 4 presents the English grammar error correction algorithm based on the classification model; Section 5 conducts a simulation experiment and analyzes its results; and Section 6 concludes the paper.

Feature Extraction Method.
The mathematical system can give each node a minimum value; that is, a damping coefficient is added to the previous formula. In the resulting formula for the node value, $Q_i(x)$ is the total number of nodes in the network; $d_i$ is the set of adjacent nodes of node $i$; $r_i$ is the out-degree of node $i$; $n$ is the total number of nodes; $A_i(x)$ is the directed edge weight of node $i$; and $B_i(x)$ is the set of nodes with directed edges pointing to node $i$. Back-propagation calculates the error by comparing the prediction of the forward pass with the label of the training sample. For example, with the squared error loss function, for a multiclass recognition problem with $j$ categories and $m$ training samples, the final output error function of the network is defined with the following quantities: $W_j$ is the mapping matrix; $a_j$ is the $j$th dimension of the label corresponding to the $m$th sample; $b_j$ is the $j$th output predicted by the network for the $m$th sample; $k$ is the error propagated to the previous layer; and $c$ is the previous layer output.
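The node value described above, with a damping coefficient that guarantees each node a minimum value, resembles a PageRank-style update on a weighted directed graph. The following is a minimal sketch under that assumption, not the paper's own formula; the function name `node_scores`, the default `damping=0.85`, and the fixed iteration count are illustrative choices.

```python
def node_scores(edges, damping=0.85, iterations=50):
    """PageRank-style scores for a weighted directed graph (illustrative sketch).

    edges maps (source, target) pairs to positive edge weights.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_weight = {node: 0.0 for node in nodes}
    incoming = {node: [] for node in nodes}
    for (source, target), weight in edges.items():
        out_weight[source] += weight
        incoming[target].append((source, weight))

    # Start from a uniform distribution over the nodes.
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each in-neighbor passes on a share proportional to the edge weight.
            received = sum(score[src] * w / out_weight[src] for src, w in incoming[node])
            # The (1 - damping) / n term is the minimum value every node receives.
            updated[node] = (1.0 - damping) / len(nodes) + damping * received
        score = updated
    return score
```

In this sketch a node without outgoing edges simply passes nothing on, so the scores sum to 1 only when every node has at least one outgoing edge.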
Assume the sample vector $X = (x_1, x_2, \ldots, x_n)$. The correlation degree $E(X)$ of the vector $X$ is defined in terms of $R(X)$, the expectation of $X$; the greater the value of $E(X)$, the greater the correlation of $X$. In particular, if $x_1, x_2, \ldots, x_n$ are independent of each other, then $E(X) = 1$; therefore, $E(X)$ can determine the weight coefficient in the above formula, but $E(X)$ needs to be estimated from the sample. According to different methods, each basic classifier $t_i(x)$ is given a weight $y_i(x)$, and the basic classifiers are then combined into an overall classifier $T_i(x)$ to predict and classify the data in $n$. When the class labels of the data in $n$ are unknown, the ensemble classifier $T_i(x)$ assumes that the class distribution of $n-1$ is the closest to the class distribution of $n$.
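The combination of weighted basic classifiers into an overall classifier can be illustrated with a weighted vote. This is a minimal sketch assuming that the overall prediction is the label with the largest total weight; the function name `ensemble_predict` and the representation of classifiers as plain callables are assumptions, and the source does not specify how the weights are estimated.

```python
from collections import defaultdict


def ensemble_predict(classifiers, weights, sample):
    """Return the label with the largest total weight among the basic classifiers."""
    votes = defaultdict(float)
    for classifier, weight in zip(classifiers, weights):
        # Each basic classifier votes for one label with its own weight.
        votes[classifier(sample)] += weight
    return max(votes, key=votes.get)
```

With this rule a single heavily weighted classifier can outvote several lightly weighted ones, which is why the choice of weights matters.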
When integrating the projection index values, the local projection points of the projection value $U(i)$ should be as dense as possible, preferably condensed into several clusters, while the projection points as a whole should be scattered as much as possible. Therefore, the projection index function is expressed using the following quantities: $U(i)$ is the average value of the sequence; $C(x)$ is the window radius of the local density; $D(x)$ is the distance between samples; $p_i$ is a unit step function; $s_i$ is the number of indexes of the sample; and $g_i$ is the sequence normalized to the characteristic value of the indicator.

Dynamic Residual Structure.
The sample set of the index values is taken to be $\{x(i, j) \mid i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m\}$, where $x(i, j)$ is the $j$th sample of the $i$th index value and $n$ and $m$ are the numbers of samples and indexes, respectively. To eliminate the dimension of each index value and unify its range of variation, extreme value normalization is applied, where $l_{ij}$ and $m_{ij}$ are the maximum and minimum values of the index; $\alpha_{ij}$ is the normalized sequence of the index eigenvalues; $\beta_{ij}$ is the standard deviation of the projection value; and $c_{ij}$ is the local density of the projection value. The mapping process is the density distribution formed by unsupervised clustering, and the mapping relationship is then formed through linear combination. The pattern vector uses nonlinear radial basis functions to obtain the competing attribution of each cluster center. In the most common form of the radial basis function, $V_i$ is the output value of middle layer $i$; $e$ is the pattern vector; $w_i$ is the pattern vector of the cluster center corresponding to the middle layer; and $u_i$ is the distribution width of the cluster center points; a sample can generally be attributed to a group according to $V_i$ and the training samples. Information entropy represents the uncertainty of statistical characteristic quantities. In its definition, $N$ is the training data set; $k \in \{1, 2, \ldots, K\}$ is the behavior category; $q_i$ is the proportion of the $i$th type of behavior data in the training data set $N$; $n$ is the number of thresholds; and $h$ is the step size. The adjacency matrix is used as the input of the algorithm, and the adjacency matrix vector of $n$ nodes is used as the $n$-dimensional feature of the node.
According to the distances between a node and the $i$ cluster centers $x_i$, the final division result is obtained, where $f(x_i)$ is the distance of point $x_i$; $z_i$ is the internal degree of node $i$; and $t_i$ is the dot product of the output of each module and the weight.
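The extreme value normalization, radial basis function, and information entropy named in this subsection can be sketched in their textbook forms. These are assumptions about the intended definitions (min-max scaling, a Gaussian radial basis function, and Shannon entropy in bits), not reconstructions of the paper's equations, and the function names are illustrative.

```python
import math


def min_max_normalize(values):
    """Scale values to [0, 1] using their minimum and maximum."""
    low, high = min(values), max(values)
    if high == low:
        # A constant index carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def gaussian_rbf(pattern, center, width):
    """Gaussian radial basis response of a pattern vector to a cluster center."""
    squared_distance = sum((p - c) ** 2 for p, c in zip(pattern, center))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def entropy(proportions):
    """Shannon entropy (in bits) of a list of class proportions."""
    return -sum(q * math.log2(q) for q in proportions if q > 0)
```

The response of `gaussian_rbf` is largest when the pattern coincides with the center and decays with distance at a rate set by `width`, which matches the role of the distribution width described above.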
In the preprocessing stage, all text is converted to lowercase, and the training set and test set are retained without stemming. The feature extraction step represents each word by a weight, where $G_i(x)$ is the frequency of word $x$ in document $i$; $f_i(x)$ is the total number of documents in the training set; $o_i(x)$ is the total number of words in the training set; and $h_i(x)$ is the number of documents in which word $x$ appears.
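The quantities listed above (word frequency in a document, total number of documents, and number of documents containing the word) are the ingredients of a TF-IDF style weight. The sketch below assumes that weighting scheme, which the source does not name explicitly; whitespace tokenization and the function name `tf_idf` are also assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dictionary per document (illustrative sketch)."""
    # Lowercase and split on whitespace; no stemming, as in the preprocessing above.
    tokenized = [document.lower().split() for document in documents]
    document_count = len(tokenized)

    # Number of documents in which each word appears.
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(document_count / document_frequency[word])
            for word, count in counts.items()
        })
    return weights
```

A word that occurs in every document receives weight zero under this scheme, so very common words contribute nothing to the representation.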

Classification Model.
In the extraction part of the rule base, the rule base is combined with the corpus to filter out rules for grammar errors and store them, providing supplementary rules for the subsequent automatic grammar checking and correction process. The grammar error correction model quickly finds the list of documents containing a given word by establishing a mapping between words and documents, and the inverted indexes are built offline in advance [9]. At present, an inverted index usually indexes not only the words in a document but also the document itself. Therefore, it stores both the positions of the words within a document and the storage location of the document. When words and documents are indexed, the words contained in each document are automatically extracted to form a word dictionary, and the positions of each word in the document are placed in a fixed-format inverted file. The checking and correction part mainly uses the rules provided by the above modules, together with the corpus and the limited back-off algorithm, to complete automatic grammar checking and correction. This part is the core function of the system, and the system also improves its performance through the cyclic checking module. In the process of automatic grammar checking, the scope of errors is determined from the different collocation relationships in the dependency analysis results (Figure 1).
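The word-to-document mapping with stored word positions can be sketched as a small in-memory inverted index. This is a minimal illustration assuming whitespace tokenization and dictionary storage; the names `build_inverted_index` and `lookup` are hypothetical, and the fixed-format inverted file mentioned above is not modeled.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to {document id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for document_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word][document_id].append(position)
    return index


def lookup(index, word):
    """Return the documents containing the word, with the word's positions."""
    postings = index.get(word.lower(), {})
    return {document_id: list(positions) for document_id, positions in postings.items()}
```

Because the index is built once in advance, a lookup only touches the postings of the queried word rather than scanning every document.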
In fact, the global model of distributed, streaming big data should be a generalization of the local stream data distributed across multiple nodes. Its main task is to classify and predict streams of data of unknown type; that is, it is a shared prediction model for the streaming data of multiple distributed nodes. In the global classifier, the basic operators cover the storage of local patterns, the generation of global training sample data, and the update and maintenance of the global ensemble classifier.
That is, through ensemble learning, when the local patterns of all nodes in a history window reach the central node, the global pattern can be incrementally updated by triggering the global classifier. In theory, the reconstructed training sample set should be equivalent to the original data set, or should at least preserve the important statistical parameter values. In the local classifier, the basic operators include the acquisition of the current data block, the mining of the local pattern, and the update and maintenance of the local pattern. With block-to-block mining, the local pattern can be incrementally updated whenever a new data block arrives by calling the local classifier. The purpose is to keep the weak classifiers as different from one another as possible, which improves the coverage of the samples and the learning efficiency of the ensemble classifier [10].
Validity means that a rule added to the rule base is valid if it causes an erroneous sentence to be detected. To test validity, the model can use as many descriptions as possible to express the scenarios in which the rule applies and then evaluate the test results. Misjudgment verification differs from validity verification in that it estimates more: it checks not only whether errors are ignored but also whether the errors that are pointed out are correct. The rules use the characteristics of each grammar level to establish an order and are then stored in that order. The reason for this method is that the system is organized in the order of the file-name frame, so the system reads one rule at a time. Each rule base in the layered rule model is composed of multiple rules, with one section storing one rule; it is also a readable file, and the rule base index table and the system depend on each other. The system retrieves the index table in the order of its index items. When the search reaches the end, it immediately determines that the index table is empty; at the same time, the level numbers of both are passed as the algorithm's input parameters.

Translation Model.
There is a direct relationship between the core words in the main sentence. They should be put together to form a condensed clause, and the modifiers of each core word should be grouped around that core word to form a condensed subclause. If there are redundant components, they are placed on the next level, recursively forming a hierarchical structure. The words grouped according to their grammatical relationship form a new clause; the words within a clause constrain one another, and a lower clause supplements and modifies the layer above it. Different weights are used because different clauses influence the entire sentence differently. For example, the main part of a sentence should contribute more to the correctness of the whole sentence than its main modifier. The error detection and correction module for general grammar errors provides alternatives for each word and then uses the language model to select the best choice (Figure 2). The core of grammar error correction is to use the language model to decode the sentence under test. The correction of English gerund collocation errors is another application scenario for testing hierarchical language models. Gerund collocation is a kind of dependency, so applying this language model to the correction of gerund collocation errors has certain advantages. The output of fuzzy comprehensive evaluation no longer states that an object clearly belongs or does not belong to a certain category; instead, the object belongs to each category with a certain degree of membership. Such results are often more realistic and carry more information, but sometimes the actual application requires a crisp classification judgment.
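The step in which alternatives are proposed for each word and a language model selects the best sentence can be sketched with a toy bigram model. This is an illustration only: the add-one smoothing, the exhaustive enumeration of candidates, and the names `BigramModel` and `correct` are assumptions, and the hierarchical language model described above is not reproduced.

```python
import math
from collections import Counter
from itertools import product


class BigramModel:
    """Bigram language model with add-one smoothing (toy example)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocabulary_size = len(self.unigrams)

    def log_prob(self, tokens):
        """Log probability of a token list, including sentence boundaries."""
        padded = ["<s>"] + list(tokens) + ["</s>"]
        total = 0.0
        for previous, word in zip(padded, padded[1:]):
            numerator = self.bigrams[(previous, word)] + 1
            denominator = self.unigrams[previous] + self.vocabulary_size
            total += math.log(numerator / denominator)
        return total


def correct(tokens, confusion_sets, model):
    """Pick the candidate sentence with the highest language model score."""
    # Each word keeps itself as the only option unless a confusion set lists alternatives.
    options = [confusion_sets.get(token, [token]) for token in tokens]
    candidates = (list(candidate) for candidate in product(*options))
    return max(candidates, key=model.log_prob)
```

Enumerating every combination grows exponentially with sentence length, so a practical decoder would use beam search or a similar restriction; the exhaustive version is kept here only because it is short.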
In this case, the object can be classified according to its degree of membership in each category. The deep model is essentially a deep mapping network structure. The input signal is successively decomposed and represented through layer-by-layer mapping in the network and finally forms a multilayer representation of the target; its main feature is that the representation does not need to be manually selected but is obtained through automatic machine learning. Some features that contribute greatly to the classification of rare categories become unimportant when considered in the global context; the words that appear in a rare category are likely to be uncommon in the document set and are therefore likely to be filtered out [11]. The classification accuracy of that category is affected as a result. In the first stage, increasing the feature dimension adds classification information and so improves classification performance; in the second stage, too many features add noise along with the classification information, and classification becomes worse. In a typical generative adversarial network, the generator by default models a mapping in a continuous space. In the grammar error correction task, however, the input and output of the generator are natural language sentences, which are essentially sequences of discrete symbols. The typical generative adversarial network cannot be applied directly to mapping in a discrete space because the optimization algorithm cannot make small updates to the generator parameters as it can in a continuous space. The entire generative adversarial network is composed of a generator and a discriminator; the generator is a neural grammar error correction model, an encoder-decoder model that uses a sequence-to-sequence framework.
The discriminator is a two-dimensional convolutional neural network, and such a training strategy is fragile [12]. This classification model is responsible for distinguishing whether a corrected sentence was manually annotated or produced by the generator. Once the generator accidentally fails on some samples, so that the discriminator almost perfectly distinguishes its corrected sentences from the manually annotated ones, the generator is immediately lost. From the reward fed back by the discriminator, the generator knows that it has produced a poor-quality sequence, but it does not know the correct direction in which to update its parameters.

English Grammar Error Correction Algorithm Based on the Classification Model

4.1. Model Architecture.
Misjudgment verification judges, on the one hand, whether an error was missed by the rules and, on the other hand, whether a rule fails to indicate an error correctly. Misjudgment is a problem often encountered in rule models, where conflicts between rules are difficult to avoid. A complete solution cannot be based on the rule model itself, because that often leads to interlocking, mutually exclusive rules that make the model too complicated. After the rule model detects an error, the model compares the sentence probability before and after the modification and determines whether to apply it. Similarly, if the system then finds that the corrected error is an article or preposition error, it calls the corresponding subsystem to judge the sentence again. If the judgments are consistent, the correction is considered credible, as previous experiments have shown that both the article correction system and the preposition correction system have good recall. If right and wrong can be verified before grammar error correction, it is easier to improve the error correction capability of the system (Figure 3). The technique initializes the model parameters with random values and then iteratively optimizes them to bring them closer to the optimal solution. These iterative optimization techniques ensure that each optimization step moves closer to the optimal value, but they may not provide a way to determine whether the optimal value has been reached [13].
After a sentence is analyzed by the dependency parser, the result is a set of dependent word pairs that represent the dependencies between the words in the sentence; this set is also the set of semantic relations between the words. Dependency relationships between words are widespread in the sentence structure of a text. The semantic relationships obtained from the dependency parse of a sentence reflect well the contextual structure between its words. Performing dependency parsing on all sentences in a text yields a set of semantic relations between words that reflects the structural information of the text. The model uses the dependency relationships between feature words, obtained after word segmentation and dependency parsing of all sentences in the text, to construct a text network. In a text network constructed this way, adjacent feature word nodes have definite semantic connections. Moreover, such a network not only reflects the contextual information of the feature words but also avoids introducing unrelated edges. In addition, the dependency relationships between feature words provide directed connections between the feature word nodes, so the text network becomes a more accurate weighted directed network.
In compound sentence modeling, the task is divided according to the number of related words into modeling compound sentences with double related words, with a single related word, and without related words. For compound sentences with double or single related words, only the conditional probability of the corresponding related words needs to be substituted; for the case without related words, which involves semantic research that is currently difficult to advance, a simple division at commas is used here. Prepositions indicate the relationship between nouns, pronouns, and similar words and the other words in a sentence, and they cannot serve as sentence components on their own [14]. A preposition and its object constitute a prepositional phrase, which is used as an adverbial, predicative, complement, attributive, or prepositional object in a sentence. In future research, the model can study the correction of preposition errors from the perspective of sentence composition and even sentence semantics. Automatic correction of omission errors, such as missing articles and missing prepositions, is difficult to perform accurately. Adding a library of error instances is one solution to this problem, but a large instance library reduces efficiency. Such methods can achieve high performance when sufficient annotated data is available, but problems such as data sparseness and space overhead may affect the performance of grammar error correction, and statistical models require large corpora for training.

Model Optimizer.
Before training starts, the corpus preprocessing script is used to regularize the corpus, and model initialization and training are then performed. When an exception occurs in any step of corpus preprocessing, model initialization, or model training, a training exception notification is sent to the administrator so that the system can be repaired. After training is over, the model is evaluated with the evaluation script from the earlier grammar error correction experiments. If the error correction performance has improved, the model in the grammar error correction module is updated; otherwise, the process ends directly. As with grammar error correction, the design of feedback suggestions is approached from two aspects [15]. One is the feedback filtering interface itself, for which a workflow chart is provided; the other is the call flow between modules, which is explained using sequence diagrams. The functional modules of the system are abstracted, extracted, and built into independent remote services, so that other business systems can call them and new business functions and remote service modules can be added (Figure 4). These services effectively decouple the system from its modules and make it easy to extend the system with new business. This module mainly implements service cluster deployment and shields the application servers: proxy servers accept the requests, which makes the services transparent, isolates the real servers in the internal private network, and improves network security.
When solving the error polynomial equation, the Chien search method is used to solve the equation obtained in the second step; the solutions of this equation are the error positions, and the corresponding error bits are corrected to realize the error correction function of decoding. The error polynomial equation is a polynomial equation that characterizes the positions of errors: each of its solutions is a position where an error occurs. Therefore, to realize error correction, the error polynomial equation must be obtained first; this mode does not need to support other parallel modes, or even a serial mode, as the syndrome computation does [16]. At the same time, after the Chien search finds an error location, it automatically corrects the data at that location. The sequence alignment algorithm aims to determine by comparison whether two or more sequences are similar or even homologous: it aligns the sequences into an alignment table and then compares the base information in each column in turn. Grammar error correction based on common structural features works similarly and achieves better accuracy, but for large data sets, sequence alignment is a relatively complex problem, and some sequences are discarded when similar structural features are searched for, so the result of grammar error correction may also suffer from data loss.
Although this design makes full use of context information, which greatly improves the translation quality, its disadvantage is that no matter how long the preceding context is and how much information it contains, it is eventually compressed into a vector of a few hundred dimensions. The longer the context, the more information is lost in the final state vector, so as the input sentence grows longer, the final decoded translation becomes significantly worse. In the basic model, every English word contributes equally to the model, which seems unreasonable. That is to say, a model without attention has no problem when the input sentence is relatively short, but when the input sentence is relatively long, all of its semantics are represented by a single intermediate semantic vector, and the information of the individual words disappears, so a lot of detailed information is lost. This is an important reason why the attention model is introduced. Its geometric meaning is a weighted summation of the information at all positions of the input sequence, which dynamically obtains the most relevant subsequence information from the input sequence at any time step of the output sequence; this is the attention mechanism mentioned above [17].
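The weighted summation over all input positions can be sketched with dot-product attention. This is a minimal illustration assuming dot-product scoring and softmax normalization, which the source does not specify; the names `softmax` and `attend` and the use of plain Python lists are illustrative.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # Subtracting the maximum avoids overflow in exp.
    exponentials = [math.exp(score - peak) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def attend(query, encoder_states):
    """Return attention weights and the weighted sum of the encoder states."""
    # Score each input position by its dot product with the decoder query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dimension = len(encoder_states[0])
    context = [
        sum(weight * state[d] for weight, state in zip(weights, encoder_states))
        for d in range(dimension)
    ]
    return weights, context
```

Because the weights are recomputed for every output step, each decoded word draws on a different mixture of input positions instead of a single fixed vector.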

Simulation Design.
This study uses a grammar error correction test set covering all types of English grammar errors as the experimental object; the dataset contains 1568 sentences and 32 types of grammar errors. Two authoritative English-speaking annotators independently marked the grammar errors and provided corrections. A community discovery result that uses no prior information depends on the selection of the initial cluster centers, and when the network structure is complex, the accuracy of the division is greatly reduced. Therefore, some semisupervised community discovery algorithms have been proposed; semisupervised learning uses both labeled and unlabeled data, which can greatly improve learning efficiency. However, existing semisupervised algorithms require a large amount of labeled data, cannot really change the structure of complex networks, and still suffer from low efficiency. The labeled data is added outside the complete algorithm framework, whereas a complete algorithm includes multiple iterations, so this way of adding labels does not make full use of each step of the iterative process. Prior information intervenes in the division of communities through random labeling, which requires a large amount of prior information, but manually labeled prior information is usually difficult and expensive to obtain. Even if a large amount of prior information is added, it still cannot really guide the correct division of the network; that is, the randomly added labels are not the part of the network that the division needs most.
When a corpus is applied to different tasks, it sometimes needs to be preprocessed, for example by using natural language processing techniques to tag it. When the corpus is small, some complex annotations, such as morphological and semantic annotations, can be performed on the entire corpus. More information can be obtained from annotated corpora, which are widely used in computational linguistics, speech recognition, machine translation, and other fields. The model extracts relevant information from the cleaned web pages, such as the keywords in the page content and their positions, then uses a relevance algorithm to calculate the relevance of each web page to each keyword in the page content and hyperlinks, and finally uses this information to build a web index database [18]. Nouns and verbs are not as limited in number as articles and prepositions, but each noun or verb has corresponding inflected forms. Therefore, in this paper, a change table is established for noun errors, verb form errors, and subject-predicate agreement errors by manual induction and expansion; the size of this table is not fixed and gradually grows with manual expansion. Since the training corpus contains few article errors and many correct sentences, the extracted positive and negative samples are extremely imbalanced, and this imbalance has a considerable impact on the results of the recognition model.

Result Analysis.
In the training phase, the feature vector is multiplied by the classification matrix, and the category with the highest score is taken as the classification result. When the perceptron classifies a sample incorrectly, the weight corresponding to the correct category is increased and the weight corresponding to the wrong category is reduced. When the perceptron classifies a sample correctly, the matrix does not need to be modified. Finally, all the classification matrices produced during training are summed and averaged to obtain the final weight matrix. Compared with the voted perceptron, the averaged perceptron only records one additional cumulative value at any time, so it is relatively efficient in memory usage and training speed [19]. Compared with the statistical classification model, the deep classification model automatically learns the context information of the target word with a neural network, so complex feature engineering is not required (Figure 5). However, when the deep context model processes the context information, its input includes not only the original words of the sentence but also, for some error types, the part of speech of each word, which gives the entire model good grammar error correction ability. A confusion network is a weighted directed graph in which every path from the start node to the end node passes through all other nodes. The deep model uses a neural network to extract the context information of the target word and does not require the complex manual feature engineering of the previous classification model.
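The training procedure at the start of this paragraph (score each category, raise the correct category's weights and lower the wrongly predicted category's weights on a mistake, then average all weight matrices) can be sketched as a multiclass averaged perceptron. The function names, the dictionary-of-lists weight layout, and the default number of epochs are assumptions made for illustration.

```python
def predict(weights, features):
    """Return the category whose weight vector gives the highest score."""
    def score(label):
        return sum(w * f for w, f in zip(weights[label], features))
    return max(weights, key=score)


def train_averaged_perceptron(samples, labels, classes, epochs=5):
    """Train a multiclass perceptron and return the averaged weights."""
    dimension = len(samples[0])
    weights = {label: [0.0] * dimension for label in classes}
    totals = {label: [0.0] * dimension for label in classes}
    steps = 0
    for _ in range(epochs):
        for features, gold in zip(samples, labels):
            guess = predict(weights, features)
            if guess != gold:
                # Raise the correct category and lower the wrongly predicted one.
                for i, value in enumerate(features):
                    weights[gold][i] += value
                    weights[guess][i] -= value
            # Accumulate the current weights so they can be averaged at the end.
            for label in classes:
                for i in range(dimension):
                    totals[label][i] += weights[label][i]
            steps += 1
    return {label: [total / steps for total in totals[label]] for label in classes}
```

Keeping only the running totals, rather than every intermediate weight matrix, is what makes the averaged variant cheap in memory compared with storing each matrix for a vote.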
All operations in the feature space are implemented implicitly through an inner-product kernel function in the original space. That is, any algorithm that uses only inner products can operate implicitly in a high-dimensional feature space through a kernel function. Therefore, this idea can be used to implement algorithms that are linear in the feature space yet nonlinear with respect to the original space.
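A small sketch may make the kernel idea concrete. It assumes a homogeneous degree-2 polynomial kernel and a kernel perceptron as the "linear algorithm"; both are illustrative choices, not the kernel or learner used in the paper. The first part checks that the kernel equals an inner product in an explicit feature space; the second shows a learner that touches the data only through the kernel and still separates XOR-style data that is not linearly separable in the original space.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original space."""
    return dot(x, z) ** 2

def explicit_features(x):
    """Explicit feature map for 2-D input whose inner product is poly_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def decision(alpha, X, y, kernel, x):
    """Linear decision function in feature space, written with kernels only."""
    return sum(a * y_j * kernel(x_j, x) for a, x_j, y_j in zip(alpha, X, y))

def train_kernel_perceptron(X, y, kernel, epochs=10):
    """Labels are +1 or -1; alpha[i] counts the mistakes made on example i."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, (x_i, y_i) in enumerate(zip(X, y)):
            if y_i * decision(alpha, X, y, kernel, x_i) <= 0:
                alpha[i] += 1
    return alpha
```

The training loop never builds `explicit_features`; swapping in a different kernel function changes the feature space without changing the algorithm.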
This will greatly improve the efficiency of learning algorithms, improve existing algorithms, and raise the recognition rates of pattern recognition tasks (Figure 6). For the base classifiers, the framework is transparent: they are only responsible for training on their corresponding sample set and producing an output during testing. How the data sets are generated and how the outputs are combined is not the concern of the base classifiers [20]. Therefore, each base classifier can be weighted in the decoding output process: a loss over the weight vector is defined based on structural risk and regularization, and an optimal weight coefficient is obtained by optimizing this loss function. This method can use kernel function techniques to achieve nonlinear grammar error correction. In the training phase, each classifier is not optimized in the direction of the least overall loss; that is, the loss of each classifier is inconsistent with the overall multiclass classification loss [21]. Grammar error checking takes spell checking as its premise: after the input text passes through the spelling processing module, the spelling errors in its words have already been corrected. When the confusable words in a confusion set share the same part of speech, context word features are more effective than lexical collocation features for real-word disambiguation. Conversely, when the parts of speech differ, lexical collocation features are more effective. In some situations, if both types of features are used for real-word disambiguation at the same time, problems of feature coverage and conflict may occur (Figure 7). If the correlation between a feature and the head word is relatively high, the head word is more likely to appear when the feature appears [22].
Here, mutual information is used to measure the two correlations; after the features are selected, weights are assigned according to each feature's contribution. In addition, if the current feature is also highly correlated with other confusable words in the confusion set containing the head word, its validity for distinguishing the confusable words is relatively low, so both correlation and discrimination validity are used here to determine the weight of the feature. The distinguishing characteristic of this method is that it combines the advantages of statistical methods and rule-based methods and automatically corrects major grammar errors in English text without manual intervention.
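The feature weighting described above can be sketched with pointwise mutual information. The paper does not give the exact formula, so everything below is an assumption made for illustration: the PMI estimate from raw counts, the convention that unseen pairs score 0, and the choice to subtract the strongest positive correlation with a competing confusable word.

```python
import math
from collections import Counter

def make_pmi(samples):
    """samples: list of (head_word, iterable_of_context_features)."""
    n = len(samples)
    word_count, feat_count, joint_count = Counter(), Counter(), Counter()
    for word, feats in samples:
        word_count[word] += 1
        for f in set(feats):
            feat_count[f] += 1
            joint_count[(f, word)] += 1

    def pmi(f, word):
        joint = joint_count[(f, word)]
        if joint == 0:
            return 0.0  # assumption: unseen pairs are treated as uncorrelated
        return math.log(joint * n / (feat_count[f] * word_count[word]))

    return pmi

def feature_weight(pmi, f, head, confusion_set):
    """Correlation with the head word, discounted by the strongest positive
    correlation with any other word in the same confusion set."""
    relevance = pmi(f, head)
    others = [w for w in confusion_set if w != head]
    competing = max((pmi(f, w) for w in others), default=0.0)
    return max(0.0, relevance - max(0.0, competing))
```

Under this scheme, a feature that co-occurs with only one member of a confusion set keeps its full weight, while a feature that points more strongly at a competing word is driven to zero.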

Conclusions
This paper constructed a basic model for English grammar error correction based on a classification model, analyzed the classification model and translation model of English grammar error correction, proposed an English grammar error correction algorithm based on the classification model, analyzed the model architecture and model optimizer of the algorithm, and finally conducted a simulation experiment and analyzed its results. The inspection and correction part mainly uses the rules provided by the above modules together with the corpus, combined with the limited back-off algorithm, to complete the automatic grammar inspection and correction function. In the grammar error correction task, the input and output of the generator are natural language sentences, which are essentially sequences of discrete symbols; system performance is also improved through the cycle inspection module. The typical generative adversarial network cannot be applied directly to mapping modeling in a discrete space because, unlike in a continuous space, the optimization algorithm cannot make minor updates to the generator parameters. For the base classifiers, the framework is transparent: they are only responsible for training on their corresponding sample set and producing an output during testing. The study results show that, with the continuous increase of training samples and the continuous progress of the learning process, the proposed English grammar error correction algorithm based on the classification model will continue to increase its classification accuracy, further refine its recognition rules, and gradually improve correction efficiency, thereby reducing processing time, saving storage space, and streamlining the processing flow. The results of this paper provide a reference for further research on English grammar error correction algorithms based on classification models.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.