A Text Detection and Recognition Algorithm for English Teaching Based on Deep Learning

Traditional English teaching cannot make effective use of the various available resources, and its scheduling ability is poor. Learners cannot accurately obtain the information in English textbook texts during the learning process, so some of them cannot properly learn and master the English language. To address this problem, this study adopts a deep learning algorithm and establishes an English teaching text algorithm based on associated semantic rules to mine the features between sentences and phrases in the texts provided by English teachers. The proposed algorithm completes the feature extraction of English teaching texts and also analyzes the semantic associations within them; in essence, it derives English teaching association rules on the basis of information theory. By combining these rules with semantic similarity information, English teaching texts can be accurately detected and identified. The simulation results show that the proposed algorithm can accurately extract English teaching text information, and its accuracy and convergence speed during extraction are higher than those of competing algorithms.


Introduction
At present, the rapid development of network and computer technologies promotes the emergence of all kinds of network media, and the number of English teaching materials continues to increase. However, English texts appear in all kinds of news, media, and English teaching materials, making it increasingly difficult for people to read them. People are eager to extract the text information of English teaching materials, understand the English content, and comprehensively improve the quality of English teaching materials in terms of both reading and writing. Based on artificial intelligence and deep learning, this study uses a text detection algorithm based on PSE and a semantic rule mining algorithm to extract the semantic rule information of English teaching materials. This can help to deeply mine the keyword features in the English teaching materials and to combine the semantic analysis method for text information extraction and data mining. These text detection algorithms can help readers accurately identify the text information of English teaching materials and play a guiding role in English teaching, which is conducive to the overall improvement of English teaching quality. Machine and deep learning techniques, including neural networks, are widely used in prediction scenarios for real-world problems. This study detects and identifies the text information in English textbooks, realizes the sharing of English textbook text resources, and can also effectively improve the quality of English teaching. Its innovations are as follows. (1) A large amount of text data in English teaching can be deeply extracted corresponding to the information in the textbook text, so as to improve the quality and efficiency of teachers' English teaching to a certain extent. Subsequently, English learners can accurately grasp the feature distribution in English teaching texts and better manage and classify them.
(2) By detecting and identifying English teaching texts, we can better optimize the teaching quality and improve the speed at which people read English teaching texts. This study makes full use of the concept of deep learning to extract English teaching text materials and formulates an English teaching text algorithm based on associated semantic rules to mine the features between different sentences and phrases in English teaching material texts, so as to better extract the keyword semantic association data in the English teaching text. The main contributions of this study are as follows. (1) We formulate an English teaching text algorithm based on associated semantic rules. (2) We adopt a deep learning-based algorithm that mines the features between sentences and phrases in English teaching texts. (3) By combining with semantic similarity information, English teaching text can be accurately detected and identified. The rest of the study is organized as follows. In Section 2, we offer an overview of the related work. Section 3 covers machine learning, in particular deep neural networks, i.e., ANN and CNN. In Section 4, we design a deep learning-based algorithm. In Section 5, the experimental setting and results are discussed, along with experimental details and performance evaluation metrics. Finally, Section 6 concludes this study and outlines several directions for further research and investigation.

Related Work
With the advent of the era of artificial intelligence, computer vision has become a major branch of the field [1]. Early in the development of the field, a large number of scholars and experts focused on text detection. OCR was first used to detect text information, but its performance could not meet the requirements. Since the rapid development of deep learning, more scholars have used convolutional neural networks (CNNs) for text detection and have explored the field through deep learning algorithms [2,3]. From 2016 to 2017, a variety of different algorithms were proposed. When formulating target detection schemes, the fully convolutional network (FCN) was used to strengthen innovation, which greatly improved the level and ability of text detection [4].
In addition, some scholars introduced the end-to-end concept when training models, so as to comprehensively improve the effect of text detection and recognition [5]. Ge et al. proposed the CRNN algorithm, which makes full use of the concept of indefinite-length sequences from speech recognition.
This method is similar to speech recognition: a text recognition model is established to solve the problem of phrase or word recognition [6]. Xing et al. trained a background/text classifier based on a support vector machine (SVM), selected raw pixels as local features, classified all pixels, and extracted the text information and text region from the text confidence map [7]. Derendarz et al. suggested a symmetric text detector based on symmetric text features. First, the symmetric text detector is used to extract a multiscale sliding-window text saliency map from the input image. Then, the saliency maps are fused according to manual features, and finally a convolutional neural network (CNN) classifier is used to remove false detections to complete the text detection [8]. Zhou et al. proposed a model that generates word-level candidates by using edge boxes and a trained aggregate channel feature (ACF) detector and then trained a random forest binary classifier based on histogram of oriented gradients (HOG) features to remove false detections and complete the text localization [9].
Dewi et al. used a network similar to YOLO to extract word-level candidate boxes and filtered the text with a CNN classifier [10]. Jiang et al. transformed text detection into image segmentation based on the fully convolutional network (FCN) model; the authors segmented the text regions of the image and then extracted the final text region by using local character features and global features [11]. AA et al. proposed the R2AM algorithm, which first uses a CNN to extract features from the input image, then inputs the features into a character-level RNN to complete the decoding operation, and then outputs the labels. This algorithm uses a character attention mechanism to obtain the feature data, and it can recognize text without a dictionary [12]. Guo et al. put forward the focusing attention network (FAN), which adds local supervision information to the attention module and aligns the actual tag sequence with the attention features. The proposed attention network can effectively deal with the problem of low attention accuracy in the attention module [13].

Artificial Neural Network (ANN).
The principle of an artificial neural network is similar to the information processing of the human brain; the network is formed by a large number of interconnected artificial neurons. The artificial neuron is the basic unit for processing information in an artificial neural network. Figure 1 shows the basic processing flow of input signals to an artificial neuron. The output of an artificial neuron for the input signal $a = [A_1, A_2, \ldots, A_n]^T$ is $t = f(u + b)$, and $u$ is calculated by the following formula:

$$u = \sum_{i=1}^{n} \omega_i A_i,$$

where $f$ represents the excitation function, $\omega_i$ represents the synaptic weights of different neurons, $A_i$ represents the different components of the input signal $a$, and $b$ represents the neuron bias parameter. In general, the excitation function is a nonlinear function, such as the Tanh function or the Sigmoid function.
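The single-neuron computation described above can be sketched in a few lines; the input values, weights, and bias below are arbitrary illustrative numbers.

```python
import numpy as np

def neuron_output(a, w, b, f=np.tanh):
    """Single artificial neuron: t = f(u + b), where u = sum_i w_i * A_i."""
    u = np.dot(w, a)   # weighted sum of the input components
    return f(u + b)    # nonlinear excitation function

# Example: a 3-component input with the Tanh excitation function
a = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, 0.1])
b = 0.05
t = neuron_output(a, w, b)
```

Swapping `f` for a sigmoid or ReLU changes only the excitation stage; the weighted-sum structure stays the same.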
Training a neural network means that the network readjusts its parameters under the stimulus of the external environment and adopts other methods to handle the environment after conversion. In neural network prediction and classification, a supervised learning mode is used: a certain amount of sample data is selected for learning, and different variables of the network are adjusted to bring the results closer to the real values.

Convolutional Neural Network (CNN).
The working mode of a convolutional neural network is similar to that of a feedforward neural network, and its structure resembles that of a traditional neural network [14]. Three-dimensional neuron arrangements can be used to reduce the number of parameters in the network, which greatly enhances its operational efficiency. The components of a convolutional neural network include the input layer, the hidden layers, and the output layer. The hidden layers include the convolution layer, the pooling layer, and the excitation function layer.

e Convolution Layer.
A certain number of convolution kernel parameters together form a complete convolution, and the corresponding convolution kernel is determined by the convolution bias term and weight vector. Various feature maps are generated by the convolution kernels, and the number of convolution layers has a decisive impact on the number of convolution kernels. The receptive field corresponding to a convolution kernel represents all the information involved in the convolution; a large receptive field means that more contextual semantic information is available to the network. At the same time, the information in the text image corresponds to the information collected by the convolution kernel. Figure 2 is the convolution layer diagram.
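A minimal single-channel convolution (cross-correlation form, "valid" padding) illustrates how a kernel slides over its receptive field and accumulates products into a feature map; the image and kernel values are arbitrary.

```python
import numpy as np

def conv2d_single(image, kernel, bias=0.0):
    """Valid 2D convolution of one channel: slide the kernel over the
    image and accumulate the element-wise products plus a bias term."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # 2x2 receptive field
feat = conv2d_single(image, kernel)           # 3x3 feature map
```

A multichannel convolution repeats this per channel and sums the results, which is the accumulation across channels described later in this section.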

e Pool Layer.
The basic principle of pooling is to reduce the amount of information and the number of variables and to speed up the operation of the network. The maximum pooling used in this study divides the input image into several rectangular areas and uses a fixed-scale filter to compute the maximum value of each subarea. Figure 3 shows the maximum pooling operation process; the connecting lines in the figure indicate the subareas of the maximum pooling operation.

Excitation Function Layer.
There are many different types of neurons in a complete network. After continuous development, a large number of excitation functions [15] have been created. The more widely used function types now include the Tanh function, the Sigmoid function [16], the upgraded PReLU function, and the ReLU function. In the past, the Tanh and Sigmoid functions were selected as the excitation functions of convolutional neural networks; however, both functions suffer from vanishing gradients. At present, the ReLU function is selected for deep training. Figure 4 shows the ReLU function.
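The three excitation functions mentioned above differ in how they transform inputs; a quick comparison (input values arbitrary):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid squashes inputs into (0, 1) but saturates for large |x|."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """ReLU passes positive inputs unchanged and zeroes negative ones,
    avoiding the vanishing gradients of Tanh/Sigmoid in deep networks."""
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
y_tanh, y_sig, y_relu = np.tanh(x), sigmoid(x), relu(x)
```

Tanh and Sigmoid flatten out for large |x| (gradients near zero), while ReLU keeps a constant gradient of 1 on the positive side, which is why it is preferred for deep training.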
In image processing by a neural network, the image is usually multichannel. Each layer of the network is composed of multiple convolution kernels and outputs a multichannel feature image. There are also multiple channels in each convolution kernel. The corresponding channels are convolved by the multichannel convolution kernel, and then the pixel values at the corresponding pixel positions are accumulated. After the convolution operation, a large number of feature images are output. Therefore, the number of channels of an n-layer convolution kernel is equal to that of the input feature image.

Text Detection Algorithm Based on PSE.
This study adopts a deep learning algorithm in the process of English teaching text detection and recognition to achieve integrated detection and recognition [17]. The text detection module is built on the PSE detection framework, and the text recognition module is implemented on the CRNN recognition framework. The English teaching text information is input into the detection framework, where the PSE text framework detects the input text information to obtain the corresponding text boxes; the text boxes containing English teaching text are then input into the CRNN recognition framework. The English teaching text content in each text box is further identified, and finally the result information after word segmentation is output. Figure 5 shows the structure of English teaching text detection and recognition.
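The two-stage pipeline (PSE detection followed by CRNN recognition) can be sketched as below. The functions `pse_detect` and `crnn_recognize` are hypothetical placeholders standing in for the actual trained models, with hard-coded outputs for illustration only.

```python
def pse_detect(image):
    """Placeholder for the PSE detection stage: returns text boxes as
    (x, y, w, h) tuples. A real model would run a segmentation network
    and expand text kernels progressively; values here are dummies."""
    return [(10, 20, 80, 16), (10, 44, 60, 16)]

def crnn_recognize(image, box):
    """Placeholder for the CRNN recognition stage: returns the string
    read inside one detected box (CNN features + RNN decoding)."""
    return "sample text"

def detect_and_recognize(image):
    # Stage 1: detect text boxes; Stage 2: recognize text inside each box.
    return [(box, crnn_recognize(image, box)) for box in pse_detect(image)]

results = detect_and_recognize(image=None)  # a real call would pass a pixel array
```

The point of the two-stage design is that detection and recognition can be trained and improved independently while sharing a simple box-based interface.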
At present, the most common problems in the field of text detection include missed detection and false detection.
This study focuses on the extraction and fusion of text feature information in English teaching. The loss function and network structure have a decisive impact on model quality and are the core of deep learning. Taking the accurate acquisition of text-line position information as the starting point, this study improves the network framework based on FPN feature extraction and feature fusion, then introduces an atrous spatial pyramid pooling (ASPP) module, compares the experimental results obtained after introducing this module, and judges its feature extraction effect. Then, starting from the loss function, the Dice coefficient loss function is analyzed and used to compare the performance of different models under various balance coefficients, and the best value for each model is selected, so as to improve the accuracy of the detection results.
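The Dice coefficient loss mentioned above has a standard form, 1 - 2|X∩Y|/(|X| + |Y|); a minimal sketch with a smoothing term (the example masks are arbitrary):

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """Dice coefficient loss for segmentation-style text detection:
    1 - 2|X∩Y| / (|X| + |Y|), with a smoothing term for stability
    when both masks are empty."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(pred) + np.sum(target) + smooth)

pred = np.array([1.0, 1.0, 0.0, 0.0])    # predicted text mask (flattened)
target = np.array([1.0, 0.0, 0.0, 0.0])  # ground-truth text mask
loss = dice_loss(pred, target)           # partial overlap -> nonzero loss
```

Unlike per-pixel cross-entropy, Dice loss measures region overlap directly, which helps when text pixels are a small fraction of the image.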

Extracting Features of the English Semantic Rules.
This study designs an English teaching text extraction algorithm based on semantic rule mining. Based on this algorithm, the features between sentences and phrases in English teaching texts are deeply mined, so as to achieve the purpose of relevance analysis of keyword semantics in the English teaching text. In order to complete the feature extraction, which can greatly improve the quality of English teaching, a Boolean weighting method is used: the weight of a word in a document is 1 if the word appears in the document; otherwise, the weight is 0. The basic formula is

$$w_{ij} = \begin{cases} 1, & f_{ij} > 0, \\ 0, & f_{ij} = 0, \end{cases}$$

where $f_{ij}$ represents the frequency of word $i$ in document $d_j$ and $w_{ij}$ represents the weighted vocabulary result. The weight level indicates the frequency of the text data, which also directly reflects its importance. A weight calculation method based on information theory is then applied, and the text summary extraction algorithm is based on the entropy weight. During the extraction and recognition of English teaching texts, the numerical processing should be completed in a standardized way. By classifying and measuring document keywords on the premise of standardized processing, the importance of all keywords can be evaluated so as to identify the keywords.
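The Boolean weighting is fully specified above; for the entropy-based weight, the paper does not show the exact formula, so the sketch below uses a common information-theoretic variant (normalized keyword entropy over documents) as an explicit assumption. The term-frequency matrix is arbitrary example data.

```python
import numpy as np

# Term-frequency matrix f[i, j]: frequency of word i in document j (example data).
f = np.array([[2, 0, 1],
              [0, 3, 0],
              [1, 1, 1]], dtype=float)

# Boolean weighting: w_ij = 1 if the word appears in the document, else 0.
w_bool = (f > 0).astype(float)

# Entropy-style weighting (an assumed, standard variant -- not necessarily the
# paper's exact formula): words spread evenly across all documents carry little
# information; words concentrated in few documents carry more.
p = f / np.maximum(f.sum(axis=1, keepdims=True), 1e-12)  # P(doc | word)
logp = np.where(p > 0, np.log(np.maximum(p, 1e-12)), 0.0)
n_docs = f.shape[1]
entropy = -(p * logp).sum(axis=1) / np.log(n_docs)       # 0 = concentrated, 1 = uniform
word_weight = 1.0 - entropy                              # higher = more informative keyword
```

Here word 1 (all its occurrences in one document) gets weight 1, while word 2 (spread evenly) gets weight 0, matching the intuition that evenly spread words are poor keywords.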
Based on information theory, this study effectively distinguishes semantic rules and association rules in English teaching texts and then obtains the final text mining results. The maximum semantic correlation similarity is obtained by characterizing the correlation between the maximum correlation coefficient and the number of texts, where $k$ represents the number of semantic categories and $L_{ac}$ represents the extracted feature of semantically related information, that is, the keyword entropy of each text; 1 is the standard constant coefficient. Here, $\mathrm{coff}_{\mathrm{const}}$ represents the fixed coefficient distribution of $\mathrm{IDF}_{\mathrm{const}}$, and $\mathrm{coff}_1$ represents the fixed coefficient of the semantic feature $\mathrm{IDF}_1$.
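The semantic similarity combined with the association rules is not given in closed form here; a common concrete choice, used below purely as an illustrative assumption, is the cosine similarity between keyword-weight vectors of two texts.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two keyword-weight vectors: a standard
    proxy for semantic similarity between texts (an assumption here,
    not necessarily the paper's exact measure)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

doc_a = np.array([1.0, 0.0, 2.0])   # keyword weights of text A (example values)
doc_b = np.array([2.0, 0.0, 4.0])   # text B: same direction -> similarity 1
sim = cosine_similarity(doc_a, doc_b)
```

Texts with proportional keyword profiles score 1, texts with disjoint keywords score 0, which gives the bounded similarity scale the correlation analysis needs.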

Accurate Extraction of English Teaching Text Information.
Based on the actual distribution of the probability density, the extraction rates of different phase points in the corresponding region can be obtained. The related overview function is a set describing the possibility of various proportions relative to the total points. The maximum vector difference of the similarity between two different English teaching texts can be obtained by expressing the degree of similarity between the two texts with a norm. The logarithmic form of the spatial feature distribution of semantic relevance is obtained, and the corresponding English textbook text results are extracted based on the concept of relevance semantics. The confidence space corresponding to the distribution effectiveness of the characteristic points of the time-frequency distribution of English teaching texts satisfies

$$\sum_{i=1}^{c} \mu_{ik} = 1, \quad k = 1, 2, \ldots, n. \qquad (11)$$

Therefore, the clustering function of English teaching text information is finally optimized to complete the extraction of English teaching text information based on the location of the semantic feature points of the English teaching text summary.
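The constraint in (11) is the standard fuzzy-clustering membership constraint; a sketch of the standard FCM membership update (an assumption, since the paper's exact clustering update is not shown) illustrates how it is satisfied. The sample points and centers are arbitrary.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard fuzzy c-means membership update: mu_ik in [0, 1] with
    sum_i mu_ik = 1 for every sample k, matching constraint (11).
    mu_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    # d[i, k]: distance from center i to sample k
    d = np.linalg.norm(centers[:, None, :] - X[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)              # avoid division by zero
    ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)        # shape (c, n)

X = np.array([[0.0], [1.0], [10.0]])      # three 1-D sample points
centers = np.array([[0.5], [9.5]])        # c = 2 cluster centers
mu = fcm_memberships(X, centers)          # columns each sum to 1
```

Each column of `mu` sums to 1, so every text sample distributes its full membership across the c semantic clusters, which is exactly what (11) requires.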

Constructing a Semantic Ontology Model for English Teaching Text.
This study uses feature-oriented extraction and a semantic ontology model to extract information from the English teaching text database, effectively optimize the classification of English teaching text articles, strengthen teachers' class scheduling ability in the process of English teaching, and improve students' ability to read English textbook text information. Here, we need to establish a semantic ontology distribution structure model based on English textbook text analysis. The English textbook text distribution nodes form an undirected graph model, which hides the weight index of the English textbook text distribution. The attribute set $A = \{a_1, a_2, \ldots, a_n\}$ represents the confidence interval corresponding to the English teaching text.
Through the integration of the feature extraction module, the hidden information transmission channel, the query interface and query vector set, and the English teaching text query information output module, the distributed storage of English teaching texts is established, and the English teaching text model is built from ternary components. Based on the English teaching text ontology model, the Wigner-Ville spatial distribution of the English text summary database is established [18]. The control library can obtain the information flow characteristics under various sampling intervals and reasonably divide the English teaching text set x into c types. The semantic ontology vectors in the English teaching text can be divided into M sets in different directions. After preprocessing the English teaching text and screening its information, we can obtain the corresponding English teaching text extraction structure and complete the algorithm design. At this point, we need to establish the English teaching text information flow model, modulate and demodulate the English teaching text classification information based on this information flow, complete the design of the anti-interference filter, and accurately extract the English teaching texts and classification information.

A Text Information Process Model for English Teaching.
Based on the English teaching text scheduling model saved above, the English teaching text information flow model is established [19]; the information and teaching text are extracted by the information processing method, and a semantic feature base is established to explain the semantic information of the English teaching text. The semantic feature base has n vector attribute sets, which indicate that the text module is a relatively complex envelope mode, where $f(t)$ represents the time-varying nonstationary parameters of different frequency components, $b(\tau, \phi)$ represents the extension function of the English teaching text, and $\tau$ indicates the time delay when extracting the English teaching text. In the window function, $N(z)$ is the frequency resolution of the text semantic feature distribution, based on the ontology model of the English teaching text; its upper zero resolution is the lowest in the frequency domain, and $D(z)$ represents the scale factor. The English teaching text information stored in the semantic feature ontology model belongs to a scalar time series. The English teaching text information is fused based on the Fourier transform. In the Fourier transform of the extracted English teaching text information flow, $\rho(a, b)$ represents the time-frequency combination, $b$ represents the Fourier spectrum window, and $a$ represents the scale parameter. There are two Fourier spectrum windows in the English teaching text, controlled by feedforward and by the edge state function of the semantic feature distribution of the English teaching text, where $u_i \in \mathbb{R}^m$ represents the state space distribution function and $x_i \in \mathbb{R}^n$ represents the state vector distribution of the English teaching text.
Based on this, nonlinear feature sequence analysis and semantic association feature extraction are used to extract and mine English teaching text information.

Experimental Environment.
In this study, the performance of the algorithm in the process of English teaching text extraction is simulated. The hardware of the experimental environment is an Intel i5 2.5 GHz processor, 18 GB of memory, and the Windows 7 64-bit operating system. The simulation software is MATLAB 2016b. The basic parameters are G_max = 30, D = 12, c = 3, NP = 30, F = 0.5, CR = 0.1, and m = 2. The initial English teaching text sampling frequency f1 is set to 2.2 Hz, the termination sampling frequency f2 to 0.24 Hz, and the adaptive weight coefficient ω to 0.56.

Results.
According to the simulation parameters discussed in Section 5.1, we compared the proposed algorithm with the traditional FCM algorithm [20]. Figure 6 shows the extraction accuracy of the English teaching text. According to the results in Figure 6, the accuracy of the English teaching text extracted by the proposed method is higher than that of the FCM algorithm. Further analysis shows that the accuracy continues to improve as the number of iterations increases. We observed that the accuracy of the proposed algorithm can reach 100% after 35 iterations, whereas the FCM algorithm needs about 50 iterations. At 10 iterations, the accuracy of the proposed method is about 78%, while that of the FCM algorithm is only 57%. In addition, as the number of iterations continues to increase, the accuracy of the proposed algorithm also rises steadily and is more stable than that of the FCM algorithm. The second test index is the speed of extracting English teaching text; the simulation results are shown in Figure 7. According to the results in Figure 7, the proposed algorithm converges in a short time and has strong real-time performance.
Through the above analysis, it is concluded that the English teaching text recognition and detection algorithm based on deep learning proposed in this study can extract the information in English teaching texts very accurately. The extracted information has high accuracy and strong convergence, which enables the rapid detection and recognition of English teaching texts and improves their comprehensive application. It can also be an important tool for English teachers to improve teaching texts.

Conclusions and Future Work
In recent years, text detection methods based on deep learning have developed rapidly. Compared with traditional text detection methods, the advantage of deep learning algorithms is that they abandon manual feature extraction; instead, they learn and complete feature extraction from a large number of samples, so as to obtain abundant feature information with high accuracy. In this study, we adopted a deep learning algorithm and established an English teaching text algorithm based on associated semantic rules to mine the features between sentences and phrases in the texts provided by English teachers.
Through experiments, we evaluated its detection speed and accuracy, which were noted to be much higher than those of traditional algorithms. The proposed algorithm can also be used for image detection in order to identify future trends in the fields of image processing and computer vision.
In the future, we will develop more sophisticated algorithms for text detection, which can result in higher accuracies. However, deep learning methods are quite time consuming, and model training may take significant time.
Therefore, methods such as aggregation can be used to improve the performance of the algorithm in terms of training and prediction time. Apart from CNNs, other methods such as the graph convolutional network (GCN), LSTM, ResNet, and attention networks can also be explored in the near future. The results demonstrated in this work are based on the ReLU function; however, other functions would have different outcomes, so we will investigate various functions. Studying the limitations of the network depth, kernel size, and filter number can also be a good option for further improvement of this research.

Data Availability
The data used to support the findings of the study can be obtained from the corresponding author upon request.