Application of Convolutional Neural Network Based on Deep Learning in College English Translation Teaching Management

Introduction
English classroom teaching is one of the tasks that must be broken through. In today's international financial and commercial exchanges, the requirements for comprehensive English application ability are constantly increasing. Among them, college English translation ability plays a key role. In recent years, the demand for English translation talents has been continuously increasing, and the problem of translation education has gradually attracted the attention of academic circles. In college English education, translation education has always received attention, but its educational effect is not ideal.
As a result, English teachers and scholars are concerned with how to improve students' translation abilities. This study suggests that English majors be taught to translate using a college English translation approach based on CNNs. NMT has brought benefits to the field of MT; however, NMT converts between natural languages using only a single NN, which has a number of flaws. This research uses an MT model based on CNNs to increase MT performance. Translation is designated as a compulsory course for English majors and is considered to be a highly practical one. The cultivation of translation ability is one component of comprehensive language ability. With the constant development of computer information technology, according to Yao, people's lives have been significantly altered. Computer-aided translation (CAT) technology is currently being used as an important supplementary tool, including attempts to adapt it to the teaching process [1]. Wang studied the creation of English translation training using computer-aided translation software in depth. With the help of corpus statistics software, he performed a complete data analysis of the translation and the original text [2]. Najjar discussed the translation of the Qur'an into Chinese and English and studied the morphological transformation of hyperbolic patterns. He sought to determine the impact of translation strategies on the quality of the translation of research data [3]. In cultural translation, Zhang found that people see translation as a cross-cultural communicative activity. In diverse cultural situations, translation thought likewise shifts dramatically. The translation industry places a high value on cultural differences, as evidenced by recent translation studies [4]. Fitriani's research tried to classify and describe grammatical faults in English-translated sentences in terms of syntax and morphology. The surface strategy taxonomy is used to guide the analytical approach [5].
Susini found that implicit meaning is one of the linguistic phenomena that must be overcome in translation. His research investigates the structures in which implicit meaning is realized in English [6]. Although scholars agree that English translation is very important, there is a lack of research on how to improve the effect of college English translation.
MT is the process of converting sentences from a source language to a target language using a computer. MT helps people communicate across a variety of languages. Chen proposed the Residual Encoder-Decoder CNN (RED-CNN), inspired by deep learning. His RED-CNN model produced excellent results [7]. Kruthiventi noted that understanding and predicting human visual attention mechanisms has received attention in neuroscience. A large receptive field can capture semantics at multiple scales while taking the global context into account [8]. Lu created a deep learning technique that uses graph-cut refinement to segment the liver in CT scans automatically. For precise refinement of the initial segmentation, it leverages graph cuts and previously learned probability maps [9]. Deep learning is widely employed in a variety of fields and has produced numerous results.
In recent years, scholars have begun to reconsider, from the new perspective of deep learning, many of the main modules retained in MT tasks. Many studies have shown that deep learning can effectively resolve various bottlenecks of previous MT methods. As a result, NMT received attention immediately after it was proposed, and its performance improved greatly in just a few years. The innovation of this paper is to analyze how the CNN plays a role in MT technology, thereby improving the teaching quality of college English translation. Experimental analyses of deep learning-based recurrent networks and CNNs are carried out.

CNN MT Technology Based on Deep Learning
The ultimate goal of language learning is communication, and translation is one of the most important forms of cross-cultural communication for foreign language students. As a basic skill for comprehensive language use, translation ability is essential for English majors. Because college students' future employment will require them to work in various fields, to learn about the latest achievements of other countries, or to present China's latest achievements to the world, college English translation is crucial for them.
MT is one of the boldest ideas conceived by scientists in the early days of computing. It has gained new vitality under the urgent needs of world economic development and cultural exchange [10]. MT has gone through a tortuous development path and has made great progress. Current NN MT technology includes recurrent NNs, feedforward NNs, and CNNs. However, at present there are still many problems in MT that need to be solved as soon as possible [11]. The structure of NN MT is shown in Figure 1.
Deep learning is a relatively recent research area in machine learning. It was added to machine learning to help it achieve its original goal of artificial intelligence. As Figure 1 suggests, deep learning has advanced rapidly in the disciplines of image processing and speech recognition since 2013. Deep learning has recently been found to be more effective at mitigating the problems of linear inseparability, lack of correct semantic representation, and feature design. It can help statistical MT overcome the challenges of fully utilizing nonlocal context, data sparseness, and error propagation. MT is a popular topic in academia right now [12].

Algorithm Based on Recurrent NN.
In the traditional feedforward NN model, adjacent layers are fully connected, while the nodes within each layer are disconnected. A recurrent NN is a network that contains both feedforward and feedback pathways. Its feedforward pathway is similar to the traditional feedforward NN model. The feedback pathway feeds the output of some neurons back to themselves as the input at a later time step [13]. The schematic diagram of the recurrent NN is shown in Figure 2.
As demonstrated in Figure 2, feedforward NNs are the simplest type of artificial NN. According to the number of layers, a feedforward NN can be divided into single-layer and multilayer variants. The recurrent NN, unlike the standard feedforward NN, attends to the temporal order of the input. It adds a cyclic connection to the hidden layer, which signifies that the hidden layer's output from the previous moment is used as part of the hidden layer's current input. The hidden state h_t of the recurrent NN is calculated as follows:

h_t = f(U x_t + W h_{t-1} + b),

where x_t is the input at time t, U and W are the input and recurrent weight matrices, b is a bias, and f is a nonlinear activation such as tanh. This way of connecting the recurrent NN has obvious advantages for sequence input. The network can better capture timing information and obtain context information. Therefore, the recurrent NN is especially suitable for the fields of speech, text, and video processing that focus on timing: the speech field includes speech recognition and language conversion, the text domain includes text classification and MT, and the video domain includes video recognition and video classification [14].
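As an illustrative sketch, the recurrent computation described above can be written in NumPy; the dimensions, random initialization, and tanh nonlinearity are assumptions for demonstration, not the paper's exact configuration:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """Vanilla recurrent update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Illustrative (hypothetical) sizes: 3-dim inputs, 4-dim hidden state.
rng = np.random.default_rng(0)
W_xh = 0.1 * rng.normal(size=(4, 3))
W_hh = 0.1 * rng.normal(size=(4, 4))
b_h = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a toy sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # each step feeds back the last output
```

The loop makes the timing dependence explicit: the hidden state at each moment mixes the current input with the previous moment's output.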
It is precisely because of this special connection pattern that the recurrent NN has a serious problem: it cannot handle long-term dependencies. This is due to the repeated multiplication of gradients in the recurrent NN when the error is backpropagated. Therefore, when the gradient factor is less than 1, the gradient tends to vanish, and when it is greater than 1, the gradient tends to explode [15]. The Long Short-Term Memory (LSTM) network was created to overcome the recurrent NN's long-term dependence problem and enhances the classic recurrent NN. LSTMs have a wide range of applications in science and technology, and many sequence tasks can be learned by LSTM-based systems. Figure 3 depicts the LSTM network. The LSTM network differs from a standard recurrent NN in that the original recurrent unit is changed into a CEC (constant error carousel) memory unit, as shown in Figure 3.
To tackle the problem of gradient dispersion, the CEC memory unit's additive update allows the gradient to be preserved while the error is propagated. Three gates are introduced in the LSTM network, and the forget gate determines whether or not the input should be forgotten [16, 17]. The forget gate's job is to decide which parts of the long-term memory (the output from the previous unit module) to keep and which to discard. The input gate i_t, forget gate f_t, and output gate o_t are as follows:

i_t = σ(W_i [h_{t-1}, x_t] + b_i),
f_t = σ(W_f [h_{t-1}, x_t] + b_f),
o_t = σ(W_o [h_{t-1}, x_t] + b_o).

Here, i is the input gate, f is the forget gate, o is the output gate, and σ is the sigmoid function. After the three gates are computed, the memory unit is updated. Following the widespread use of LSTM networks, many variants of the traditional recurrent NN have appeared, including GRU networks [18].
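A minimal sketch of one LSTM step, assuming the standard gate formulation with sigmoid gates and a tanh candidate; the stacked weight layout and toy dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W stacks the input (i), forget (f), candidate (g),
    and output (o) transforms over the concatenation [h_{t-1}; x_t]."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i = sigmoid(z[0:H])           # input gate i_t
    f = sigmoid(z[H:2 * H])       # forget gate f_t
    g = np.tanh(z[2 * H:3 * H])   # candidate memory
    o = sigmoid(z[3 * H:4 * H])   # output gate o_t
    c_t = f * c_prev + i * g      # additive CEC update keeps gradients alive
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Illustrative sizes: 3-dim input, 4-dim hidden and memory state.
rng = np.random.default_rng(1)
W = 0.1 * rng.normal(size=(16, 7))  # 4 gates x 4 hidden units, over [h; x] of size 7
b = np.zeros(16)
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h, c = lstm_step(x_t, h, c, W, b)
```

The additive form of the memory update c_t = f ⊙ c_{t-1} + i ⊙ g is what distinguishes the CEC unit from the plain recurrent cell.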
GRU is a well-performing variant of the LSTM network. It has a simpler structure than an LSTM network and produces excellent results. Unlike the LSTM network, which contains three gates, the GRU contains two: an update gate and a reset gate. The update gate determines the proportions of new input and past memory in the hidden layer's output [19]. The update gate z_t and reset gate r_t are calculated by

z_t = σ(W_z [h_{t-1}, x_t]),
r_t = σ(W_r [h_{t-1}, x_t]),

and the hidden state is updated as h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ tanh(W [r_t ⊙ h_{t-1}, x_t]). Nevertheless, recurrent MT still exhibits many defects, such as missing translation, mistranslation, and overtranslation, which lead to the deterioration of translation results [20].
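The GRU update with its two gates can be sketched as follows; the omitted bias terms and the toy dimensions are simplifying assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step (bias terms omitted for brevity)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                                      # update gate z_t
    r = sigmoid(Wr @ hx)                                      # reset gate r_t
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                    # mix memory and input

# Illustrative sizes: 3-dim input, 4-dim hidden state.
rng = np.random.default_rng(2)
Wz = 0.1 * rng.normal(size=(4, 7))
Wr = 0.1 * rng.normal(size=(4, 7))
Wh = 0.1 * rng.normal(size=(4, 7))
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h = gru_step(x_t, h, Wz, Wr, Wh)
```

Note that the GRU folds the LSTM's separate memory cell into the hidden state itself, which is why it needs one fewer gate.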

MT Based on CNNs.
CNNs are a type of feedforward NN with deep structure and convolutional computation, and they are among the most representative deep learning algorithms.
A CNN-based MT model is then proposed to address the defect in long-sentence translation, as shown in Figure 4.
As shown in Figure 4, CNNs are similar to regular NNs in that they are made up of neurons with learnable weights and bias constants. CNN models are usually built on feedforward NN models. The difference is that the "hidden layers" of traditional NNs are replaced by "convolutional layers," "pooling layers," and "fully connected layers." This unique structure enables CNNs to perform exceptionally well, as shown in Figure 5.
As shown in Figure 5, the convolution layer is followed by the activation layer, which plays a role similar to that of the activation function in an ordinary NN. It generally uses the ReLU activation function, although other activation functions such as sigmoid are also possible. The calculation formula of a convolutional neuron is as follows:

y = f(θ * x + b),

where θ denotes the parameters of the convolution kernel, x the input within the neuron's receptive field, b the bias, and f the activation function.
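A minimal sketch of a one-dimensional convolutional neuron with shared kernel parameters θ and a ReLU activation; the kernel values and input sequence are hypothetical:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv1d(x, theta, bias):
    """Slide the kernel theta over sequence x; each output neuron computes
    relu(theta . window + bias) with the same shared parameters theta."""
    k = len(theta)
    return np.array([relu(theta @ x[t:t + k] + bias)
                     for t in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy input sequence
theta = np.array([-1.0, 0.0, 1.0])       # hypothetical kernel parameters
out = conv1d(x, theta, bias=0.0)
# Each window computes x_{t+2} - x_t = 2, so out == [2, 2, 2].
```

Weight sharing is the key point: the same θ is applied at every position, which is what lets the network detect a local pattern wherever it occurs.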
Some researchers proposed a new NMT model in 2016 and successfully applied CNNs to MT. It is similar in concept to the Encoder-Decoder model based on the recurrent NN.
One CNN of the model is responsible for Encoder encoding, converting the source sentence into an intermediate vector. Another CNN is responsible for Decoder decoding, decoding the intermediate vector to generate the target sentence. The high-level CNN abstractly extracts long-distance information and realizes NMT. It has achieved outstanding results, bringing significant development to the field of CNN-based MT.

Encoder Coding Based on CNN.
Translation is divided into two parts: understanding and expression. Encoder encoding is responsible for understanding, and Decoder decoding is responsible for expression. The input layer represents the input to the model. The input of the Encoder model is the sequence of vocabulary indices of the words obtained after segmentation and generalization of the source sentence, as shown in Figure 6.
As shown in Figure 6, the vocabulary is constructed from the training data set. The embedding layer refers to the process of representing words as vectors. After the embedding layer, each input index is converted into a vector of a specified dimension. The embedding layer output e_{e,t} is calculated as follows:

e_{e,t} = E[a_t],

where E is the embedding matrix and a_t is the index of the word at position t. The embedding layer uses random initialization to assign initial values to the word vectors; after training, the required word vectors are obtained, realizing the representation of words as vectors. The hidden layer was introduced along with the concept of multilayer networks, mainly to solve linearly inseparable problems. The bidirectional hidden layer actually contains two networks: a forward LSTM network and a reverse LSTM network. The bidirectional LSTM network aims to capture the input sentence information more comprehensively from the two perspectives of forward order and reverse order, so as to achieve a better understanding of the input sentence.
To aid in generating the output of the current moment, the output of the LSTM network at the previous instant is added to the current moment's calculation. The forward network captures the preceding context h_{f,t} in positive order:

h_{f,t} = LSTM_f(e_{e,t}, h_{f,t-1}).

The reverse network captures the following context in reverse order:

h_{b,t} = LSTM_b(e_{e,t}, h_{b,t+1}).

The output h_{f,t} of the forward LSTM network is fused with the output h_{b,t} of the reverse LSTM network to obtain the output of the bidirectional hidden layer. It captures the context information and yields the final vector representation:

h_t = f(h_{f,t}, h_{b,t}),

where f is the concatenation function. This vector h_t serves as the input to the next hidden layer.
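The embedding lookup and bidirectional fusion described above can be sketched as follows; the forward and reverse LSTM outputs are replaced by random stand-in vectors, and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab_size, dim = 10, 4                       # hypothetical vocabulary and vector sizes
E = 0.1 * rng.normal(size=(vocab_size, dim))  # randomly initialised embedding matrix

token_ids = np.array([2, 7, 5])  # source sentence as vocabulary indices
e = E[token_ids]                 # embedding layer: each index becomes a vector

# Stand-ins for the forward and reverse LSTM outputs at one time step;
# a real model would compute these with the recurrences described above.
h_fwd = rng.normal(size=6)
h_bwd = rng.normal(size=6)
h = np.concatenate([h_fwd, h_bwd])  # fusion f = concatenation
```

The concatenation doubles the state dimension, so each position carries both its left context and its right context.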
In the discipline of natural language processing, deep networks attempt to extract abstract high-level features of language in order to gain a deeper comprehension of it. The s-th hidden layer is calculated as follows:

h_{s,t} = LSTM_s(h_{s-1,t}, h_{s,t-1}).

In practical applications, the number of hidden layers can be increased further to obtain higher-level features according to requirements.

Decoder Decoding Based on CNN.
The Encoder is pretrained through a NN to determine the initial value of W; its goal is to make the input value equal the output value. After Encoder encoding, the sequence of hidden-layer outputs corresponding to the source sentence input at each moment is obtained, and the attention mechanism network uses it to compute contextual information. Encoder encoding also yields the final cell state of the hidden layer.
The Decoder part of the model is shown in Figure 7, which shows that the input of the Decoder model differs between the training phase and the testing phase. During model testing, because the correct target sentence is unknown, the input at the first moment is a special word that marks the beginning of the decoding process. The input at every other moment is the final output of the network at the previous moment.
To a certain extent, the embedding layer performs dimensionality reduction, whose principle is matrix multiplication; in convolutional networks, it can be understood as a special fully connected layer operation. The embedding layer represents words as vectors, just like the embedding layer of the Encoder network, but corresponds to a different vocabulary. After the embedding layer, each input index is converted into a vector of a specified dimension. The embedding layer output e_{d,t} is calculated as follows:

e_{d,t} = E_d[y_{t-1}],

where E_d is the Decoder embedding matrix and y_{t-1} is the index of the previous output word. In the hidden layer, a forward LSTM network is employed, and the number of layers can be adjusted depending on the situation. Two LSTM layers form the hidden layer of the Decoder network in the block-principle-based MT model. The LSTM network's state is initialized from the state of the Encoder network's second hidden layer.
Attention models are commonly employed in natural language processing, image recognition, and speech recognition, among other deep learning applications, and are an important technology worth studying in depth. The attention mechanism network resides in the Decoder network's hidden layer. The output of the Decoder's hidden layer at the previous moment serves as the basis for the attention mechanism, which continually combines the outputs of the Encoder network to compute context information appropriate for the present state of the Decoder network.
For the calculation of the current context information, an alignment score is first computed between the Decoder state at the previous moment and the Encoder output at each source position:

e_{tj} = score(s_{t-1}, h_j).

All e_{tj} are then normalized to obtain the contribution α_{tj} of the Encoder output at each moment to the calculation of the Decoder's hidden layer at the current moment:

α_{tj} = exp(e_{tj}) / Σ_k exp(e_{tk}).

According to these contributions, the context information c_t corresponding to the current moment is obtained:

c_t = Σ_j α_{tj} h_j.

Together with the output of the embedding layer at the current moment, the hidden-layer output is calculated as

s_t = LSTM(e_{d,t}, c_t, s_{t-1}).

Each layer of the multilayer LSTM network has the same structure; the difference lies in its specific inputs and outputs, and the output of the final hidden layer is passed to the output layer. The normalized exponential function is commonly known as the softmax function. In multiclass classification, it is a generalization of the binary classification function sigmoid; its goal is to express the results of multiclass classification as probabilities. The output layer is a fully connected layer whose output dimension equals the size of the target-language vocabulary. The final output is activated by the softmax function:

y_t = softmax(W_o s_t + b_o).

After Encoder encoding and Decoder decoding, backpropagation updates the network parameters, fits the training data, and captures MT features.
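The attention step described above can be sketched as follows, assuming a dot-product score function (the paper does not specify the exact score; the dimensions are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention(s_prev, H_enc):
    """Dot-product attention over encoder outputs (rows of H_enc are h_j)."""
    scores = H_enc @ s_prev  # alignment scores e_tj
    alpha = softmax(scores)  # normalised contributions, summing to 1
    c_t = alpha @ H_enc      # context vector for the current decoder step
    return c_t, alpha

# Illustrative sizes: 5 source positions, 4-dim hidden states.
rng = np.random.default_rng(4)
H_enc = rng.normal(size=(5, 4))  # encoder outputs h_1..h_5
s_prev = rng.normal(size=4)      # decoder hidden state at t-1
c_t, alpha = attention(s_prev, H_enc)
```

Because the weights α sum to 1, the context vector c_t is a convex combination of the encoder outputs, weighted toward the source positions most relevant to the current decoding step.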

Cross Entropy Loss.
The cross-entropy loss function is frequently utilized in classification problems, and cross-entropy is a common loss function for NN classifiers. In machine learning and deep learning, entropy is a measure of the uncertainty of a random variable. Assume a is a discrete random variable with a finite number of potential values and probability distribution P(a = a_i) = p_i. Then, the entropy of the random variable a is calculated as follows:

H(a) = − Σ_i p_i log p_i.

Here, the base of the logarithm is 2 or the natural constant, and if p_i is 0, the value of 0 log 0 is defined as 0. Cross-entropy measures the distance between two probability distributions.
Assuming that random variable A takes values in the set U and that p and q are two distributions over U, the cross-entropy of the two distributions is determined as follows:

H(p, q) = − Σ_{u∈U} p(u) log q(u).

In deep learning, cross-entropy is frequently used to measure the similarity between the network output and the ground truth. The network computes the cross-entropy loss to determine the current error and adjusts its parameters accordingly.
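The entropy and cross-entropy definitions above can be checked numerically (base-2 logarithms, with the 0 log 0 = 0 convention):

```python
import math

def entropy(p, base=2):
    """H(a) = -sum_i p_i log p_i, taking 0 log 0 as 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2):
    """H(p, q) = -sum_u p(u) log q(u); equals H(p) exactly when q == p."""
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)

fair_coin = [0.5, 0.5]
skewed = [0.9, 0.1]
h = entropy(fair_coin)                      # 1 bit of uncertainty
ce_same = cross_entropy(fair_coin, fair_coin)
ce_diff = cross_entropy(fair_coin, skewed)  # larger, since the distributions differ
```

The inequality H(p, q) ≥ H(p), with equality only when q = p, is what makes cross-entropy usable as a training loss: minimizing it pulls the network output toward the ground-truth distribution.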

BLEU Score. A variety of automatic evaluation standards for translation technologies have been proposed.
At present, the widely used and recognized evaluation standard is to use the BLEU algorithm for scoring and discrimination.
The BLEU algorithm is calculated as follows:

BLEU = BP · exp(Σ_{n=1}^{N} w_n log p_n),

where N is the number of n-gram models, w_n is the weight of the corresponding n-gram model (usually 1/N), and p_n is the matching accuracy of the corresponding model. The BP term is given by

BP = 1, if c > r;  BP = exp(1 − r/c), if c ≤ r,

where c is the length of the translation to be evaluated and r is the length of the reference translation. The length penalty factor BP depends on the relationship between c and r and is a piecewise function.
If no n-gram of some order matches, the corresponding p_n is 0 and the BLEU value is 0, which is meaningless. Therefore, the BLEU algorithm is not suitable for measuring the translation of a single sentence but is suitable for evaluating the translation of many sentences.
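A single-reference, sentence-level BLEU sketch with uniform weights w_n = 1/N, including the brevity penalty and the zero-match case discussed above:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, N=4):
    """Sentence-level BLEU against one reference, weights w_n = 1/N.
    Returns 0 when any n-gram precision p_n is zero."""
    log_sum = 0.0
    for n in range(1, N + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        clipped = sum(min(c, ref[g]) for g, c in cand.items())  # clipped matches
        total = sum(cand.values())
        if total == 0 or clipped == 0:
            return 0.0                # any p_n = 0 makes the whole score 0
        log_sum += (1.0 / N) * math.log(clipped / total)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - r / c)  # brevity penalty BP
    return bp * math.exp(log_sum)

same = "the cat sat on the mat".split()
```

A candidate identical to its reference scores exactly 1.0, while a candidate sharing no unigrams with the reference scores 0, which is the single-sentence degeneracy noted above.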

Comparative Experiment of CNN and Other English Translation Models. The experiment employs Ubuntu 14.04 LTS as the experimental environment and LuaJIT for NN initialization and NMT model development. The NMT model, which is based on a CNN, is then tuned and built accordingly. This section mainly compares and examines the outcomes of related experiments. The application data come from the machine text translation public dataset of the Global AI Challenge. The original dataset contains a total of 10 million Chinese-English parallel sentence pairs. Source- and target-language sentences in the training data are limited to no more than 60 words. The two MT models were trained on different sentence lengths, and their efficiency on the data set was compared while keeping the training parameters unchanged. The comparison experiment of the two models is shown in Figure 8.
As illustrated in Figure 8, MT technology based on the recurrent NN performs well in short-sentence translation. However, the translation quality for longer sentences is unsatisfactory. MT technology based on recurrent NNs has severe difficulty with long-sentence translation. This has a significant impact on MT quality and is an area that needs to be improved.
According to the findings of the experiments, the CNN described in this study can partially solve the problem that NMT is sensitive to sentence length. Simultaneously, the multisequence coding method incorporates lexical and syntactic information into the NN to further guide the generation of translations by using the CNN to encode related sequences other than the source-language sentences in parallel. Thus, translation performance is improved to a certain extent.
In the experiment, the recurrent NN and the CNN-based translation model are compared, and the experimental data are used to identify each model's relative benefits and characteristics, as shown in Table 1. Table 1 shows the following: because of the recurrent NN's sequential mechanism, the calculation of the output at the present moment depends on the output at the previous moment. Stacking context information over long sequences is prone to information loss and information disturbance, resulting in low-quality translation results. The statements are dynamically segmented according to the specific conditions of each source statement. Compared with the source sentence, the sentence block successfully removes redundant information interference, and the sentence length is shortened. The network can thus better understand the meaning that the source sentence intends to express, improving the effect of MT. Identical English texts are utilized as the test set when the experimental models are created. Corresponding model tests were carried out using the different experimental models, and the quality and performance of each model are analyzed by comparing the different translations generated. In this paper, the quality of sentences at different length levels is analyzed, and semantic extraction, semantic expression, and sentence fluency are compared, as shown in Figure 9.
As shown in Figure 9, judging from the translation effects of the two translation models, the English translation model based on the CNN shows better translation quality than the other model in terms of semantic extraction and semantic expression. Moreover, it also shows certain advantages in capturing the contextual relationships of related words in the sentence and in the choice of translation. It largely guarantees the fluency of sentences while conveying the semantics correctly, illustrating the capability of the LSTM for deep semantic encoding and long-term memory.

Comparison Experiment of MT Technology.
This study reports the translation results of the MT models using BLEU scores.

Mobile Information Systems
Overfitting can be avoided by using dropout (the probability of dropping neurons), and the most appropriate dropout value depends on the data and application context. As shown in Tables 2 and 3, the model's cross-entropy loss and BLEU score on the training dataset over the first 5 iterations were compared with dropout set to 0.2, 0.5, and 0.7.
As shown in Tables 2 and 3, dropout can help avoid overfitting to a certain extent. Since the number of hidden layers is set to two, the number of model layers is small. Based on a comparison of experimental results, model training time, and overfitting risk, dropout is set to 0.2, which limits overfitting to a certain degree. Table 4 shows the experimental findings of the MT model comparison in this paper: it lists the cross-entropy loss of each translation model on the training dataset, although the fit is imperfect. The BLEU scores on the training dataset were similar when each model stopped training. The MT model based on the CNN provides a better translation effect than the recurrent NN model. The experimental findings objectively indicate the effectiveness of the CNN-based MT model.
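A sketch of inverted dropout, the common formulation in which surviving activations are rescaled by 1/(1 − p); the rescaling detail is an assumption, since the paper only reports the dropout probabilities compared:

```python
import numpy as np

def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each activation with probability p during
    training and rescale the survivors by 1/(1-p) so the expected
    activation is unchanged; at test time the input passes through."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(5)
acts = np.ones(10000)
dropped = dropout(acts, p=0.2, rng=rng)  # p = 0.2, the value chosen in this paper
kept_fraction = np.mean(dropped > 0)     # close to 0.8
```

Because each training step drops a different random subset of neurons, the network cannot rely on any single co-adapted pathway, which is what limits overfitting.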

Effect of CNN in College English Translation Teaching Management.
This paper conducts experiments on two English-major classes, A and B. There are 40 students in each class, the time period is half a semester, and the experimental comparison results are the midterm exam results. Class A is taught by traditional manual translation, and class B is taught with the MT teaching management mode based on the CNN.
This paper surveys the students of the two classes on their preference for the MT teaching management mode versus traditional manual translation teaching management, as shown in Tables 5 and 6.
As shown in Tables 5 and 6, for the MT teaching management mode, 35 students like it very much, accounting for 43.75%. Only 3 students expressed dislike, accounting for 6.0%. For the traditional teacher-translation teaching management mode, however, only 10 students said they like it very much, accounting for 12.50%, while 14 students said they dislike it very much, accounting for 17.50%. It can be seen that traditional teacher-translation teaching management is not popular. This article also compares the scores of the two classes after one semester, as shown in Figure 10.
As shown in Figure 10, at the beginning of the semester the average grades of Class A and Class B were compared, and the students of both classes were found to be at the same level of translation. After a semester of study, however, their translation skills improved considerably. The application of different teaching methods is the reason for the difference between the classes. Therefore, it can be concluded that, through the application of this method in teaching, students' translation level has been fully improved.

Conclusions
With society's increasing demand for professional English translation knowledge, college English translation education is becoming more important at major colleges and universities.
The use of neural networks in English translation systems has been found to improve the quality and accuracy of English translation, whereas traditional statistical machine translation can no longer match the needs of modern society. Through these analyses, CNNs have been found to help improve the accuracy of MT. An experimental comparison between the CNN and the recurrent MT model is carried out in the experimental section, and the translation ability of the CNN is found to be stronger across different sentence lengths. Finally, the CNN-based translation model is applied in the English translation class, and a teaching comparison is carried out; the CNN-based translation model is found to deliver good teaching quality. Due to the author's limitations, there are still flaws in many aspects, and the author will strive to do better in future work.

Data Availability
The data used to support the findings of this study are available from the author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest.