English Language Learning Pattern Matching Based on Distributed Reinforcement Learning

The rapid development of a new generation of information technology, the promotion of network technology, and the emergence of complex and diverse requirements for control objects make the structure of language learning models more and more distributed. Distributed learning theory emphasizes the central position of learners in the learning process and the universality of learning scenes. This paper explores the significance and value of various learning modes to improve students' learning effect. By analyzing the research data and explaining various effective language learning models, this paper aims to establish a theoretical framework of English language learning models and explore more effective language model matching schemes. This paper analyzes the adaptive multiagent, reward function, Markov model, probability function model, etc. and conducts experiments on the basis of the designed model. The linear correlation parameters of the model and the English language pattern matching efficiency are analyzed and judged on several important indicators. Because the algorithm designed in this paper has a good effect on the control of error, the error reduction rate has reached 85.6%.


Introduction
Cooperation is an important trend in the development of education in the future, and collaborative learning has gradually become the main way for educators to carry out teaching and learning activities. In this context, it has quickly attracted widespread attention from experts and scholars in various fields [1]. In the cultivation of the four basic abilities of English listening, speaking, reading, and writing, the effect of English language teaching has always lagged behind that of other subjects [2]. From the perspective of the teaching process, teaching is dominated by the language communication between teachers and students in a suitable educational environment, so the way and frequency of interaction between teachers and students play a vital role in oral English teaching [3]. With the in-depth development of technical education applications, the scale and complexity of information-based learning resources and learning systems are increasing. We regard distributed learning as a learning method that breaks the boundaries of time and space through computer networks and information technology and provides learners with rich information-based learning resources and a good network learning support environment [4]. Its technical means, design ideas, and system architecture have undergone profound changes, and the teaching system is developing in a distributed, collaborative, and intelligent direction [5]. On the one hand, with the help of technology, the scale of learning systems is expanding day by day, and they have distinct distributed characteristics in structure; on the other hand, people hope to realize the unified sharing, reuse, and interoperability of these distributed learning resources and systems. The ultimate goal of English teaching is to cultivate students' comprehensive language application ability.
In fact, the individual's cognition needs to be continuously developed over a long time: learning is completed through assimilation or adaptation to external stimuli, finally reaches a state of equilibrium, and then begins a new stage [6]. The core of this view is its emphasis on cognitive conflict. In teaching, it is particularly important for teachers to trigger students' cognitive conflict. In language teaching, we usually measure the development level of students' language ability by their four skills of listening, speaking, reading, and writing [7]. To study the effect of students' English language learning, improve learning efficiency, construct a dynamic knowledge structure, and measure the role of each mode, a modeling method for distributed learning systems is needed. This modeling method is a set of systematic engineering methods that analyzes the requirements of the distributed learning system, designs the distributed learning system, and establishes its software model from the two aspects of model abstraction and representation [8]. The distributed learning system modeling method is based on the object-oriented method, the constructivist learning environment design method, and the software process method.
Since entering the 21st century, with the continuous development and maturity of computer science and technology and digital media technology, new technologies such as mobile Internet, artificial intelligence, big data, and cloud computing are strongly changing the social culture and economic form at an unprecedented speed and momentum [9]. Distributed learning environment is a kind of learning environment based on distributed cognitive theory, which aims to call all technical means to provide the same learning place and communication place for geographically dispersed learners. Learning is any improvement in a system that makes the system do better or more efficiently when repeating the same work or doing similar work [10]. Reinforcement learning is a learning mechanism for matching learning through the interaction between agent and dynamic environment. It is a trial and error learning method. Trial and error and delayed reward are the two most important features of reinforcement learning [11]. Generally, the states and actions of the reinforcement learning system are considered as discrete and finite sets, and the value function can be expressed by look-up table method [12]. In practical applications, there are a large number of system states or actions that are continuous or both [13]. This paper will adopt the research method of combining qualitative research and quantitative research. Through the research, the connotation of English language learning and the research status of mobile learning are sorted, analyzed, and refined. This paper discusses the significance and value of various learning modes to improve students' learning effect. The innovative contribution of analysis and research lies in establishing the theoretical framework of English learning model and exploring more effective language pattern matching schemes. Various effective language learning models are explained based on the data level.
Reinforcement learning technology and agent technology are introduced into the research of adaptive system. Based on the dynamic binding mechanism, the adaptive mechanism based on reinforcement learning is proposed, and the corresponding learning algorithm is proposed to support the learning process of adaptive agent.

Research on Distributed and Reinforcement Learning
As a cognitive theory including cognitive subject and environment, distributed cognition advocates placing individual cognitive activities in context and social culture and emphasizes that cognitive phenomena are widely distributed within individuals, between individuals, media, learning environment, social culture, and time, that is, the analysis element system covering cognitive subject and environment and all things involved in cognition. Distributed learning is a teaching mode. It allows teachers, students, and content to be distributed in different noncentral places.
This makes teaching and learning independent of time and space. Therefore, distributed learning seems to have similar characteristics to, or some connection with, distance education. Distributed learning is not a new term to replace "distance learning." It comes from the concept of "distributed resources." Therefore, distributed learning is a teaching model based on noncentral storage of learning resources. Its pancentralization reflects the independence of teaching and learning, so that learning is not confined to a single form and shows a trend of diversification. Teaching interaction is the interaction between learners and the learning environment in the process of learning. At present, the combination of reinforcement learning and other technologies is also one of the focuses of research [14]. In the single agent environment, the most common technologies combined with reinforcement learning are genetic algorithms and neural networks. The reason is that genetic algorithms and neural networks also have strong self-adaptability, so they are easier to combine with agents that emphasize initiative [15, 16]. Figure 1 shows the basic model of reinforcement learning.
Learning algorithms are generally divided into three categories: unsupervised learning, supervised learning, and reinforcement learning [17]. Unsupervised learning is usually likened to Pavlov's conditioning principle: the learning system adjusts its parameters and distribution characteristics according to the data provided to it, and it is not a closed loop [18]. Supervised learning is a learning method with a feedback mechanism, as shown in Figure 2. Generally, error signals express the feedback content [19]. In reinforcement learning, learning is mainly driven by the signal provider: signals generated by the environment evaluate the quality of the generated actions, and these signals are usually listed as standard parameters.
In addition to the agent and the environment, a comprehensive reinforcement learning system should also have other components, such as reward and value functions and a model of the environment [20]. The value of a state $s_t$ is the expectation of the accumulated reward obtained by executing action $a_t$ and then following strategy $\pi$, and it is generally expressed as $V(s_t)$. Here $r_t = R(s_t, a_t)$ is the reward at time $t$. Then, for any strategy $\pi$, the value function is defined as the expected value
$V^{\pi}(s_t) = E_{\pi}\left[\sum_{k=0}^{\infty} c^{k} r_{t+k}\right]$,
where $r_t$ and $s_t$ are the immediate reward and state at time $t$, respectively, and the decay coefficient $c$ ($c \in [0, 1]$) makes near-term rewards more important than future rewards. Figure 3 is a schematic diagram that gives a high-level abstract view of an adaptive multiagent system. An adaptive multiagent system is generally composed of multiple adaptive agents. Each adaptive agent is relatively independent and resides in the corresponding environment [21]. Adaptive agents interact with each other. For example, when sharing a common resource, adaptive agents negotiate with each other.
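As a hedged worked step, the value function defined earlier in this section can be unrolled by one time step. The derivation below assumes the standard discounted formulation with decay coefficient $c$ and an environment with the Markov property; it is a sketch of the usual recursion, not a result stated in the original text.

```latex
\[
\begin{aligned}
V^{\pi}(s_t)
  &= E_{\pi}\left[\sum_{k=0}^{\infty} c^{k} r_{t+k}\right] \\
  &= E_{\pi}\left[r_t + c \sum_{k=0}^{\infty} c^{k} r_{t+1+k}\right] \\
  &= E_{\pi}\left[r_t + c\, V^{\pi}(s_{t+1})\right].
\end{aligned}
\]
```

The last line shows why a small $c$ emphasizes the immediate reward $r_t$, while a $c$ close to 1 gives the value of the successor state almost equal weight.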

Analysis of English Language Matching Models
The language matching system is a typical distributed control system. Generally, each language matching system is considered to use local and neighbor information to design a coordinated control protocol, so that multiple language matching systems can achieve specific collective behaviors, such as formation keeping and assembly [22]. Based on different needs and applications, learning systems are rich in types and forms, and the specific technologies that they rely on are also different. For the distributed linear control system with unknown disturbance, firstly, the distributed output cooperative control protocol is designed when the system matrix is known, and then a performance index to be minimized is introduced to obtain the distributed optimal solution combined with the optimization method [23]. Then, the output coordinated optimal control problem with unknown system matrix is considered. The interactive process of collaborative learning in a distributed learning environment reflects a multidimensional interaction, which can be summarized as two aspects: the interaction between subjects and the interaction between subjects and objects [24]. The collaborative interaction skills that run through them are a key point that cannot be ignored. Consider a multiagent system consisting of $N$ linear subsystems, in which each matching system is represented as a subsystem $A_i$, regarded as a tracker, and described by dynamic equations in which $x_i \in \mathbb{R}^{n_i}$, $u_i \in \mathbb{R}^{m_i}$, and $y_i(t) \in \mathbb{R}^{p_i}$ represent the state, input, and output of subsystem $A_i$, together with its initial state. An additional autonomous system can then be described. The autonomous system is used to generate the external disturbance to be suppressed and the rated reference output signal $y_{ri}(t) \in \mathbb{R}^{p_i}$ of subsystem $A_i$. When $E_i = 0$, subsystem $A_i$ is not subject to disturbances from the external system.
At this point, a multigroup interactive data table about distributed reinforcement learning can be obtained (Table 1). The task of the agent is to learn a strategy $\pi: S \to A$ that maximizes the return value of the actions that the agent selects in the environment. How to quantitatively define the learning strategy $\pi$ is the primary factor that agents need to consider when learning. In addition, agents should also consider the long-term impact of choosing actions, that is, whether the actions of agents are optimal in the long run. Therefore, three objective functions of reinforcement learning need to be obtained:
$V^{\pi}(s_i) = E_{\pi}\left[\sum_{k=0}^{\infty} c^{k} r_{i+k}\right]$, (8)
$V^{\pi}(s_i) = E_{\pi}\left[\sum_{k=0}^{h} r_{i+k}\right]$, (9)
$V^{\pi}(s_i) = \lim_{h \to \infty} E_{\pi}\left[\frac{1}{h} \sum_{k=0}^{h} r_{i+k}\right]$. (10)
The objective function $V^{\pi}(s_i)$ in (8) is called the discounted cumulative return, and $c$ is the discount factor. The value of $c$ reflects the degree of importance that the agent places on the future: the larger the value is, the more important the future return is. The objective function in (9) is the finite-horizon return; only the cumulative return of a finite number $h$ of future steps is considered. The objective function in (10) represents the average return; the average return over the whole cycle is considered [25].
For individuals, language has dual functions. On the one hand, individuals can communicate with the outside world through language. On the other hand, language can promote the development of the individual's own internal language. Therefore, language unifies the individual's social development and thinking development [26]. When the learner produces the correct words, reinforcement is carried out; when the words produced by the learner are wrong, they are corrected. Reinforcement and correction are actually the purpose of feedback, but the behavioral view ignores the cognitive function of the learners themselves. Analyzing the environment of the adaptive agent is an important part of describing the behavior of the adaptive agent [27]. The environment change of the adaptive agent is a continuous process.
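The three objective functions discussed above (discounted, finite-horizon, and average return) can be compared on a single observed reward sequence. The following is a minimal sketch, not the paper's implementation: the function names and the sample rewards are hypothetical, each function works on one sampled sequence rather than an expectation, and the average return uses the observed steps as a finite stand-in for the long-run average.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], c: float) -> float:
    """Discounted cumulative return: sum over k of c**k * rewards[k]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("discount factor c must lie in [0, 1]")
    total = 0.0
    # Work backwards so each step applies G = r + c * G exactly once.
    for r in reversed(rewards):
        total = r + c * total
    return total


def finite_horizon_return(rewards: Sequence[float], h: int) -> float:
    """Finite-horizon return: undiscounted sum of rewards[0] .. rewards[h]."""
    if h < 0:
        raise ValueError("horizon h must be non-negative")
    return float(sum(rewards[: h + 1]))


def average_return(rewards: Sequence[float]) -> float:
    """Average return over the observed steps (finite approximation)."""
    if not rewards:
        raise ValueError("need at least one reward")
    return float(sum(rewards)) / len(rewards)


# Hypothetical reward sequence observed from one state onward.
sample_rewards = [1.0, 0.0, 2.0, 1.0]

print(discounted_return(sample_rewards, c=0.5))
print(finite_horizon_return(sample_rewards, h=1))
print(average_return(sample_rewards))
```

With a small discount factor the late reward of 2.0 contributes little to the discounted return, whereas the average return weights every step equally; this is the trade-off that the discount factor controls.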
In order to simplify the description, this paper regards the environment as a series of discrete finite states; namely,
$S = \{s_1, s_2, \ldots, s_n\}$, (11)
where $S$ represents the state set of the environment, which includes all kinds of environments. Setting this parameter helps this paper make a reasonable and accurate judgment on the environmental changes of the adaptive agent. A distributed system integrates several application systems with limited functions into a more powerful application system to meet application requirements that no single application system can meet. Because of this modularity, the structure of the whole system is very flexible and the system has strong adaptability. It can be flexibly combined, constructed, and even automatically adjusted according to the needs of the actual system, which brings great convenience to the functional design and implementation of the system. Assuming that there are $c_i$ kinds of changes in the state component $sc_i$, the environment in which the agent is located can be divided into $c_1 c_2 \cdots c_m$ kinds of states. If the different states of each state component are represented by integers in $[0, c_i - 1]$, then the state set of the environment can be represented as follows:
$S = \{(sc_1, sc_2, \ldots, sc_m) \mid sc_i \in \{0, 1, \ldots, c_i - 1\},\ i = 1, \ldots, m\}$. (12)
The above state components are related to specific applications. When designing a specific matching system, it is necessary to define the corresponding state components and to define functions that collect their current state, so that the data used in subsequent calculations are correct and come from a reliable source. The reinforcement function defines the quantitative evaluation of agent actions by the environment. The design of the reinforcement function is based on the specific application. Generally, actions that have a positive impact on the learning process are given a larger return value, and actions that have a negative impact are given a smaller return value.
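The state set built from state components, as described above, can be sketched as a mixed-radix encoding: each component $sc_i$ takes an integer in $[0, c_i - 1]$, and the full state maps to a single index in $[0, c_1 c_2 \cdots c_m - 1]$ that can address a look-up table. The function names and the component counts below are hypothetical and only illustrate the idea.

```python
from itertools import product
from math import prod
from typing import List, Sequence, Tuple


def enumerate_states(counts: Sequence[int]) -> List[Tuple[int, ...]]:
    """List every state (sc_1, ..., sc_m) with sc_i in [0, counts[i] - 1]."""
    return list(product(*(range(c) for c in counts)))


def encode_state(state: Sequence[int], counts: Sequence[int]) -> int:
    """Map a state tuple to a unique integer index (mixed-radix encoding)."""
    if len(state) != len(counts):
        raise ValueError("state and counts must have the same length")
    index = 0
    for value, c in zip(state, counts):
        if not 0 <= value < c:
            raise ValueError("state component out of range")
        index = index * c + value
    return index


def decode_state(index: int, counts: Sequence[int]) -> Tuple[int, ...]:
    """Inverse of encode_state: recover the state tuple from its index."""
    values = []
    for c in reversed(counts):
        index, value = divmod(index, c)
        values.append(value)
    return tuple(reversed(values))


# Hypothetical example: three state components with 2, 3, and 2 possible values.
component_counts = (2, 3, 2)
states = enumerate_states(component_counts)
print(len(states), prod(component_counts))
print(encode_state((1, 2, 0), component_counts))
print(decode_state(10, component_counts))
```

A single integer index per state keeps a tabular value function or reward table compact, at the cost of a table whose size grows with the product of the component counts.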
The adaptive agent will gradually tend to choose the actions with a larger return value in the learning process. In a reinforcement learning system, the key assumption for English language matching is that the interaction between agent and environment can be regarded as a Markov matching model. Therefore, on the basis of the above research, this paper continues to analyze the processing method for the Markov problem and denotes a time point in the time series as $t$. The Markov matching process can then be described by five elements:

$\langle S, A(s), P^{a}_{ss'}, R^{a}_{ss'}, V \mid s, s' \in S, a \in A(s) \rangle$. (13)
When the system is in state $s$ at matching time point $t$ and action $a$ is executed, the system moves to the next state $s'$ with probability $P^{a}_{ss'}$. Over all actions, these probabilities form a transition probability matrix, which can be expressed as
$P^{a}_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}$. (14)
When the system is in state $s$ at matching time point $t$, after executing action $a$, the system receives an immediate reward $R^{a}_{ss'}$, which is generally given by a reward function:
$R^{a}_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}$.
If the transition probability function $P^{a}_{ss'}$ and the reward function $R^{a}_{ss'}$ do not depend on the matching time $t$, these quantities do not change from one time point to the next, and the process is in a stable state. The environment is said to have the Markov property if, for every next state $s'$, reward $r$, and history of earlier states, actions, and rewards,
$\Pr\{s_{t+1} = s', r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \ldots, r_1, s_0, a_0\} = \Pr\{s_{t+1} = s', r_{t+1} = r \mid s_t, a_t\}$. (17)
At this time, the comparison parameter tables under different models can be obtained, as shown in Table 2.
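When the transition probabilities and rewards do not depend on the time point, they can be estimated from logged interactions by simple counting. The sketch below is an illustration under that stationarity assumption, not the procedure used in the paper: the function name `estimate_model`, the state and action labels, and the logged transitions are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Transition = Tuple[Hashable, Hashable, float, Hashable]  # (s, a, r, s_next)
Key = Tuple[Hashable, Hashable, Hashable]                # (s, a, s_next)


def estimate_model(
    transitions: Iterable[Transition],
) -> Tuple[Dict[Key, float], Dict[Key, float]]:
    """Estimate P[s, a, s'] and R[s, a, s'] as empirical frequencies and means."""
    visit_counts: Dict[Key, int] = defaultdict(int)
    pair_counts: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
    reward_sums: Dict[Key, float] = defaultdict(float)

    for s, a, r, s_next in transitions:
        visit_counts[(s, a, s_next)] += 1
        pair_counts[(s, a)] += 1
        reward_sums[(s, a, s_next)] += r

    # Relative frequency of s' among all visits to (s, a).
    P = {k: n / pair_counts[(k[0], k[1])] for k, n in visit_counts.items()}
    # Mean reward observed on each (s, a, s') transition.
    R = {k: reward_sums[k] / n for k, n in visit_counts.items()}
    return P, R


# Hypothetical logged interactions.
log = [
    ("s0", "a", 1.0, "s1"),
    ("s0", "a", 0.0, "s0"),
    ("s0", "a", 3.0, "s1"),
    ("s1", "b", 2.0, "s0"),
]
P, R = estimate_model(log)
print(P[("s0", "a", "s1")], R[("s0", "a", "s1")])
```

Because the estimates pool all time steps together, they are only meaningful when the process is stationary; a time-varying environment would need separate estimates per time window.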
Many teaching aids in traditional classrooms are visualization tools, such as chalk and paper. Students can use visualization tools to extract information from images, understand the deeper meaning of words, and exchange ideas with each other. Taking activity theory, distributed cognitive theory, and conversation theory as the theoretical basis and taking into account the core idea of situational cognitive theory, this paper proposes a conversation activity distributed theoretical model, called CAD for short. The three theories support CAD from different angles. Conversation theory and activity level can provide the basis for the concrete interaction of CAD, and activity theory can provide support for system design based on CAD and for making the system more effective. Creating a harmonious, reciprocal, and orderly group culture in a distributed English language environment is a prerequisite for the effective occurrence and development of collaborative English language interaction. At the same time, it plays a vital role in narrowing the matching relevance between English language learners and between students and teachers, stimulating students' initiative to participate in interactive communication, and maintaining the orderly development of the collaborative interaction process. Rich and colorful extracurricular activities, such as English speech competitions, should also be carried out. Combining the two teaching channels is an effective way to improve the effect. In short, in the teaching of new

Result Analysis and Discussion
In order to establish a scientific, feasible, and high-efficiency English language matching model, based on the above research and analysis, this paper carries out further experimental analysis to confirm the reliability of the pattern design and to observe whether the model can match English language to the corresponding pattern in practice. To this end, this paper analyzes and judges several important indicators: distributed self-adjustment efficiency, interactive correlation degree of reinforcement learning, linear correlation parameters of the Markov model, and English language pattern matching efficiency. This paper selects three English learning sample sets from ordinary colleges and universities for efficiency analysis. Figures 4 and 5 show the distributed self-adjustment efficiency and the interactive correlation of reinforcement learning on sample sets A, B, and C. It can be observed that both indicators have a good impact on the self-adjustment and correlation of English language patterns, which provides reliable data and correlation coefficients for the subsequent matching analysis. For the distributed self-adjustment efficiency, the model shows a good test effect on sample set C, and the overall trend shows that the efficiency gradually improves as the value on the quantization axis increases, which reflects the advantages of the model in data processing and indicates that it also analyzes large amounts of data well. The interactive relevance of reinforcement learning is an important indicator of the entire English language learning model and is also of great significance to the matching model, because the interactive relevance has a decisive influence on whether the model pushes a reasonable and correct English language learning model.
It can also be found in the experiment that the test results on sample set C have consistently been good. This is because sample set C also has an advantage in distributed self-adjustment and therefore also achieves good results in terms of interaction. When measuring the linear correlation parameters of the Markov model and the efficiency of English pattern matching, two further sample sets, Q1 and Q2, are used. Figures 6 and 7 show the linear correlation parameters of the Markov model and the English pattern matching efficiency on sample sets Q1 and Q2. These two parameters provide a direct comparison of the model. The experiment shows that the linear correlation parameters of the Markov model are generally stable in Q1 but relatively large in Q2. This may be because the samples in Q1 are concentrated and the correlation in the English language is strong, which leads the Markov model to judge that the linear correlation is high, so the resulting curves are stable; in Q2, by contrast, the language samples lack concentration, so the resulting curves are divergent and unstable. In the efficiency of English language pattern matching, Q1 and Q2 show an assimilation phenomenon in the last 4-5 stages. This is because the algorithm designed in this paper has a good effect on the control of errors, and the error reduction rate has reached 85.6%, which has a great impact on the results. Although the results deviate somewhat from previous experiments because of the difference of sample sets and the inconsistency of calculation flow, the error function is applied at the end, so even data that originally deviated far away can still provide a certain reference value.

Conclusions
The development of students' language ability is a gradual, spiraling process. Within it, autonomous learning is a process in which students exert their subjective initiative to learn actively, and it is also a teaching process in which teachers play an important role. By analyzing the relevant theories of learning and learning environments, this paper establishes the abstract "role activity environment" model of a distributed learning system and establishes the problem framework of system modeling along the two dimensions of system elements and modeling level, that is, according to the matrix model of "role activity environment" and "theoretical basis analysis method design model." Reinforcement learning technology and agent technology are introduced into the research of adaptive systems. On the basis of the dynamic binding mechanism, an adaptive mechanism based on reinforcement learning is proposed for adaptive systems in uncertain environments, together with the corresponding learning algorithm to support the learning process of the adaptive agent. By constructing the environment model between the agents through the shared experience strategy, the reinforcement learning algorithm can reduce training and speed up the learning process. Finally, the experimental simulation in the grid environment shows that the algorithm is effective and convergent. Because the algorithm designed in this paper has a good effect on the control of error, the error reduction rate has reached 85.6%. However, some problems remain in this research. For example, we need to return to the goal of reinforcement learning and find better decisions. It is necessary to jump out of the decisions and data that have been tried before, so that a better decision can be found. This needs further modification in future research.

Data Availability
The data used to support the findings of this study are available from the author upon request.

Conflicts of Interest
The author declares that he has no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.