Research on Intelligent Retrieval Method of Teaching Resources on Large-Scale Network Platform

With the growth of information on cloud computing platforms, there are more and more teaching documents and videos, which provide ample resources for learning. Facing large-scale digital teaching resources, how to retrieve the required content quickly and accurately has become an important research direction in the information field. In particular, for the heterogeneous, dynamic, and large-scale teaching resources stored on cloud computing platforms, traditional cloud resource retrieval performs poorly and is inefficient. To solve this problem, a cloud computing platform retrieval method based on the genetic algorithm is proposed for the intelligent retrieval of teaching resources. First, the teaching resource storage system based on the cloud computing platform is analyzed, and the overall system architecture and the network topology of cloud storage data are given. Then, a resource retrieval method suited to the cloud computing platform is designed using the genetic algorithm, and the convergence performance of the genetic algorithm is improved with the ant colony algorithm. Finally, the selection operator of the genetic algorithm is optimized by using random numbers and increasing the number of cycles. The experimental results show that the proposed intelligent retrieval method greatly improves Recall and Precision compared with traditional retrieval methods.


Introduction
With the growth of information on cloud computing platforms, there are more and more teaching documents and videos. Unlike local storage, storing data in the cloud lets users greatly improve work efficiency and reduce hardware investment costs [1][2][3][4]. Many cloud-based products and services are constantly being introduced, and the scale and fields of the computer industry keep expanding. "Cloud Computing-Aided Instruction" (CCAI) has become a new means for colleges and universities to deliver modern teaching [5][6][7] and has effectively improved teaching quality from a technical standpoint. The CCAI mode facilitates the information management of teaching, reduces capital investment and maintenance costs, improves network security, and helps build a personalized teaching environment.
However, as digital teaching resources on cloud computing platforms keep growing, how to retrieve the required content quickly and accurately has become an important research direction in the information field [8,9]. In most cases, these network resources are unorganized, or each has its own organizational structure, which makes it hard for users to find resources. Although search engines have eased this difficulty, most of them are unsatisfactory in recall and precision, and users often cannot find the resources they need among a large number of results. The user experience is therefore poor, and users are still burdened by information overload. Web information retrieval belongs to the broader category of information retrieval and is an important stage in its development.
The genetic algorithm [10,11] is a globally optimizing, probabilistic intelligent search algorithm inspired by the natural selection and genetic evolution mechanisms of organisms. It is an effective method for finding the optimal solution in a large solution space. Searching for an optimal query in a large-scale information retrieval system can also be regarded as searching for the optimal solution in a large solution space. Therefore, the key research question of this paper is how to apply a genetic-algorithm-based retrieval method to the cloud computing teaching platform to improve retrieval effectiveness. The results show that the genetic algorithm is effective for query optimization and can overcome the low Recall and Precision of the retrieval system, so that users can obtain the required network teaching resources accurately and efficiently.

Related Works
At present, the resource retrieval methods of cloud computing teaching platforms are mostly based on manual classification or keyword matching [12,13]. Neither method optimizes the user's query, which leads to unsatisfactory retrieval results on these teaching platforms.
Retrieval based on manually classified catalogs has many disadvantages. The first is inefficiency: administrators of resource management systems must upload resources according to manually categorized directories, and whenever there is any objection to the classification catalog, the administrator has to be contacted to modify it. The second is poor compatibility: resources in one system are hard to reuse in another, and to use them, the administrator has to enter them one by one into the other system. Overcoming these shortcomings requires a unified resource storage method, and storage based on a cloud computing platform is a good solution.
Retrieval based on keyword matching is very limited in revealing the semantics of information, and it is difficult for it to guarantee the accuracy and precision of results. The retrieval system simply matches the keywords entered by the user, so many resources that should be retrieved are missed, while resources that should not be retrieved are returned. This calls for query optimization. Global analysis is an early query optimization method with practical value. Roul [14] proposed a global analysis method based on Latent Semantic Indexing, which achieved effective semantic clustering and topic ranking of web documents. However, when the document set is very large, building a global dictionary of word relations is often infeasible in time and space, and updating it after the document set changes is very costly.
At present, the popular local analysis methods mainly include Relevance Feedback and Pseudo Feedback; Pseudo Feedback was developed on the basis of Relevance Feedback.
Relevance Feedback is a very important query optimization mechanism in information retrieval, and because of its notable effect it has been widely applied and studied. Zhang et al. [15] proposed a method that improves query effectiveness using relevance feedback; it expands and shrinks the query at the same time, thereby obtaining a high recall rate. Pseudo-relevance feedback requires no interaction with users: it directly treats the first N documents returned by the initial query as relevant and optimizes the query on that basis. Wang et al. [16] proposed a pseudo-relevance feedback framework for information retrieval that combines relevance matching and semantic matching. However, keyword selection is critical in Pseudo Feedback. Generally, keywords with higher weights are selected for query expansion. This ensures that the selected keywords are important, but it does not guarantee that they are related to the query topic.
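The pseudo-relevance feedback step described above can be sketched as follows. This is a minimal illustration, not the method of [15] or [16]: it assumes documents are already tokenized and ranked by the initial query, and it scores candidate expansion terms with a simple TF × IDF weight summed over the top-N documents. The function name `pseudo_relevance_expand` and its parameters are hypothetical.

```python
import math
from collections import Counter

def pseudo_relevance_expand(query_terms, ranked_docs, top_n=3, k_terms=2):
    """Expand a query with the highest-weighted terms of the top-N documents.

    ranked_docs: token lists already ranked by the initial query (assumed).
    The first top_n documents are treated as relevant without asking the user.
    """
    feedback = ranked_docs[:top_n]
    # Document frequency over the whole ranked list, used for an IDF-style weight
    df = Counter()
    for doc in ranked_docs:
        df.update(set(doc))
    n_docs = len(ranked_docs)
    weights = Counter()
    for doc in feedback:
        for term, count in Counter(doc).items():
            weights[term] += count * math.log((n_docs + 1) / df[term])
    # Keep the heaviest terms that are not already in the query
    candidates = [t for t, _ in weights.most_common() if t not in query_terms]
    return list(query_terms) + candidates[:k_terms]
```

As the paragraph notes, the heaviest terms are important but not necessarily on-topic: nothing in this weighting checks that an added term relates to the subject of the query.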
Although information retrieval technology has made some progress, retrieval engines on large-scale network platforms still fall short of users' expectations. Because the retrieval data sets are huge and the factors affecting retrieval efficiency are diverse and complex, the above optimization techniques are not ideal in practice. The genetic algorithm offers a new way to solve information retrieval problems. Therefore, a cloud computing platform retrieval method based on the genetic algorithm is proposed. The main innovations and contributions are as follows: (1) the genetic algorithm, which is suited to finding the best solution in a large space, is applied to retrieval optimization, and a resource retrieval method suitable for the Spark platform is designed to overcome the low Recall and Precision of the retrieval system; (2) the ant colony algorithm is used to improve the convergence performance of the genetic algorithm, and the selection operator of the genetic algorithm is optimized by using random numbers and increasing the number of cycles.

Cloud Computing Theory and Related Technologies.
Cloud computing is currently a research hotspot in computer science and technology; it has attracted the attention of many enterprises and Internet experts and is an important trend in the future development of computer network technology. The concept of cloud computing was first put forward by Eric Schmidt, then CEO of Google Inc., at the Search Engine Strategies Conference in 2006. A typical cloud computing platform needs (1) a gridded data storage matrix network, (2) firewall equipment, and (3) computing resource equipment, which allow users to remotely use expandable cloud storage space by leasing it and thereby realize cloud application services [17], as shown in Figure 1.
A complete cloud computing architecture should include an access layer, a core layer, a resource convergence layer, an API layer, and an application layer [18], as shown in Figure 2.

Cloud Storage System Network Topology.
The teaching resource system under the CCAI mode must be available at all times, from all locations, and over all connections. In this paper, the client/server (C/S) mode [19] is adopted to construct the service architecture for network teaching resources, and all data are stored on the data server, as shown in Figure 3. Teachers upload resources to, or access, the online teaching server from their offices through the campus network. Students on campus can access learning resources through the campus network from dormitories or the library. Off-campus users can also access the training and learning resources remotely through the Internet. This enables efficient sharing of limited teaching resources, removes geographical limitations, and reduces manpower and material costs.
At present, there are many excellent learning resource banks, some of which are fully open, and their construction has laid a foundation for improving network education. However, these learning resources have one drawback: they are hard to make compatible with each other. Different systems hold different learning resources built to different standards, so they cannot share resources. To use the resources of another system in one system, the resources must be rebuilt according to that system's construction scheme. In this situation, the learning resource pool is not truly shared.

Intelligent Retrieval of Teaching Resources Based on Genetic Algorithm

Design of Resource Retrieval Method Based on Genetic Algorithm.
As mentioned above, traditional cloud resource retrieval performs poorly and is inefficient for the heterogeneous, large-scale teaching resources stored on the cloud computing platform. Therefore, this paper uses the genetic algorithm to retrieve cloud computing resources. First, suppose that the resource retrieval task involves m hosts H on which n virtual machines V are installed, and that each genetic individual is encoded as $k_0, k_1, \ldots, k_{n-1}$ through a coding mapping [20][21][22]. For example, Figure 4 shows a mapping between virtual machines and hosts with sequence length 5, in which virtual machines $V_0$ to $V_4$ are assigned to hosts {1, 0, 2, 0, 2}; each number in the sequence is the index of a host H. The population is then initialized. Let the total number of constituent objects of a retrieval task be N and the fitness of the i-th object be $f_i$. Then, the probability of the i-th object being selected for evolution is as follows:
$$P_i = \frac{f_i}{\sum_{j=1}^{N} f_j} \qquad (1)$$
Let the position change of the retrieval task over a certain period of time be δ(H), and let $P_c$ and $P_m$ be the crossover and mutation probabilities of the genetic algorithm, respectively. Then, the expected value of the next generation in the dynamic process of the retrieval task is as follows:

Mathematical Problems in Engineering
where $O(H)$ is the dynamic order of the task [23], L is the longest transmission distance, and m(H, t) is the number of objects in the next generation that need to be transmitted by the retrieval task. f(H, t) and f(t) are the fitness and the average fitness of the next-generation objects to be retrieved.
During the retrieval task, to preserve the integrity of the objects and prevent local data loss caused by changes in the retrieval task, the crossover probability must satisfy the following formula: Then, according to formulas (2) and (3), we obtain the following: In formula (4), the value of $P_m$ is generally very small, so formula (4) can be further simplified as follows: If f(H, t)/f(t) > C (C is a constant), the operation has not yet reached the optimal solution of the algorithm. Let K be given as follows: If K > 1, then: From this, we can recursively obtain the following: After the objects of the retrieval task are iteratively computed by the genetic algorithm, the position change of the resource objects required by the retrieval task over a certain period of time can be obtained. During this training process, the ant colony algorithm is used to improve the convergence performance of the genetic algorithm.
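The symbols defined above (δ(H), O(H), L, P_c, P_m, m(H, t), f(H, t), f(t)) are those of Holland's classical schema theorem. For reference, the textbook form of that bound and the exponential-growth argument built on it can be written as below. This is the standard statement, with δ(H) playing the role of the schema's defining length and L the string length; it is offered as a sketch and is not necessarily identical to formulas (2) to (8) of this paper.

```latex
% Classical schema theorem (expected number of copies of schema H)
m(H, t+1) \ge m(H, t)\,\frac{f(H, t)}{f(t)}
  \left[1 - P_c\,\frac{\delta(H)}{L - 1} - O(H)\,P_m\right]

% Assume f(H, t)/f(t) \ge C for every t, with C a constant, and define
K = C\left[1 - P_c\,\frac{\delta(H)}{L - 1} - O(H)\,P_m\right]

% If the bracket is non-negative, then m(H, t+1) \ge K\,m(H, t), so by induction
m(H, t) \ge K^{t}\, m(H, 0)
```

When K > 1 the lower bound grows geometrically in t, which is the usual argument that short, low-order, above-average schemata spread through the population; with a small P_m, the term O(H)P_m is often dropped.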

Genetic Algorithm after Ant Colony Optimization.
Let the number of ants in the nest be R and the set of elements to be optimized be D, where $D_{\varphi_i}$ denotes its i-th element. To generate the initial population, the number of parameters to be optimized in this paper is n. Assuming that each element $\varphi_i$ has K possible values, $\zeta_j(D_{\varphi_i})(0)$ is the pheromone of the j-th value under the initial condition.
According to formula (9), the t-th ant computes the probability of each possible value of its parameters [24][25][26].
Then, elements with high probability are selected from the set $D_{\varphi_i}$, and the pheromone is adjusted according to the following formula: where $\Delta\zeta_j(D_{\varphi_i})$ is the pheromone increment on element $\varphi_i$, i.e., the sum of the pheromones left by all ants passing through this element. It is calculated as follows: The above process is repeated until the maximum allowed number of iterations is reached or all ants converge to the same element, which yields the optimized parameters of the initial population.
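The pheromone-driven choice and update described above can be sketched as follows. Formulas (9) to (11) are not reproduced in this text, so the sketch assumes the common ant-colony forms: each ant picks value j of a parameter with probability ζ_j / Σ_k ζ_k, and the pheromone is updated as (1 - ρ)ζ_j + Δζ_j, where the evaporation rate ρ is an assumption the paper does not mention. The scoring callback `score` and all function names are hypothetical.

```python
import random

def choice_probabilities(pheromone):
    """Probability of each candidate value: zeta_j / sum_k zeta_k."""
    total = sum(pheromone)
    return [z / total for z in pheromone]

def ant_choose(pheromone, rng):
    """One ant draws a value index with probability proportional to its pheromone."""
    return rng.choices(range(len(pheromone)), weights=pheromone, k=1)[0]

def update_pheromone(pheromone, deposits, rho=0.1):
    """Evaporate with an assumed rate rho, then add Delta zeta_j left by all ants."""
    return [(1.0 - rho) * z + d for z, d in zip(pheromone, deposits)]

def aco_initial_parameters(n_params, n_values, score, n_ants=10, max_iter=50,
                           rho=0.1, seed=0):
    """Pick one value index per parameter; score(values) must return a positive number."""
    rng = random.Random(seed)
    tau = [[1.0] * n_values for _ in range(n_params)]  # initial pheromone, all equal
    best, best_score = None, float("-inf")
    for _ in range(max_iter):
        deposits = [[0.0] * n_values for _ in range(n_params)]
        chosen = set()
        for _ in range(n_ants):
            values = [ant_choose(tau[p], rng) for p in range(n_params)]
            chosen.add(tuple(values))
            s = score(values)
            if s > best_score:
                best, best_score = values, s
            for p, v in enumerate(values):
                deposits[p][v] += s  # each ant leaves pheromone equal to its score
        tau = [update_pheromone(tau[p], deposits[p], rho) for p in range(n_params)]
        if len(chosen) == 1:  # all ants agreed on the same values: stop early
            break
    return best
```

For example, `aco_initial_parameters(2, 3, lambda v: 1.0 + sum(v))` favors the largest value index for both parameters, because those choices leave the most pheromone.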
After the initial population is generated by the ant colony algorithm, the genetic operations continue. They mainly consist of the selection, crossover, and mutation operators. Because the traditional genetic process tends to converge prematurely, this paper improves the selection operator to speed up the convergence of the algorithm and obtain a better solution. The improvement is based on the traditional roulette method. In the improved roulette method, the selection operator still cycles m times, but the loop condition is changed to whether m chromosomes have been selected: if so, the marked chromosomes form the next generation; otherwise, the wheel keeps turning. Therefore, the required individuals are produced only after m random numbers have been generated in each cycle, which preserves the diversity of the next-generation population and increases the chance of selecting the best chromosome.
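A minimal sketch of the host-index encoding (as in the Figure 4 example) and of the modified roulette loop follows. It assumes that "selected" means m distinct chromosomes, which is one reading of the loop condition described above; the function names, the `max_spins` guard, and the requirement of non-negative fitness values are assumptions, not part of the paper.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def random_chromosome(n_vms, m_hosts, rng):
    """Gene k_i is the index of the host that virtual machine V_i is placed on."""
    return [rng.randrange(m_hosts) for _ in range(n_vms)]

def decode(chromosome, m_hosts):
    """List, for every host, the virtual machines mapped to it."""
    placement = [[] for _ in range(m_hosts)]
    for vm, host in enumerate(chromosome):
        placement[host].append(vm)
    return placement

def spin(cumulative, rng):
    """One roulette spin: index i is returned with probability f_i / sum(f)."""
    r = rng.random() * cumulative[-1]
    return bisect_right(cumulative, r)  # zero-fitness slots are never hit

def improved_roulette(fitness, m, rng, max_spins=100_000):
    """Keep spinning until m distinct chromosomes have been marked (hypothetical reading)."""
    if any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative")
    if m > sum(1 for f in fitness if f > 0):
        raise ValueError("m exceeds the number of chromosomes with positive fitness")
    cumulative = list(accumulate(fitness))
    marked, seen = [], set()
    for _ in range(max_spins):
        if len(marked) == m:  # loop condition: have m chromosomes been selected?
            break
        i = spin(cumulative, rng)
        if i not in seen:
            seen.add(i)
            marked.append(i)
    if len(marked) < m:
        raise RuntimeError("max_spins reached before m chromosomes were selected")
    return marked
```

A plain roulette wheel spun exactly m times can return the same chromosome several times; forcing distinct picks keeps more variety in the next generation, at the cost of extra spins when a few chromosomes dominate the fitness total. With the Figure 4 chromosome, `decode([1, 0, 2, 0, 2], 3)` gives `[[1, 3], [0], [2, 4]]`.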

Design of Fitness Function.
To balance performance (i.e., reduce energy consumption) while improving work efficiency, this paper combines a quality-of-service (QoS) constraint and an energy consumption constraint to construct the fitness function. The total QoS violation $Q_{\text{total}}$ of the virtual machines is calculated as follows: where $\mathrm{MIPS}^{L}_{\text{total}}$ and $\mathrm{MIPS}^{M}_{\text{total}}$ are the total allocated million instructions per second (MIPS) and the MIPS not allocated on time, respectively. The total system energy consumption E is calculated as follows: where $\mathrm{Host}_i$ is the energy consumption of the i-th host in cloud computing retrieval. A two-index constraint composed of quality and cost is adopted as the fitness function, which is defined as follows: where a and b are the weights of the QoS violation and the total energy consumption, respectively.
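Because the formulas for Q_total, E, and the fitness function are not reproduced in this text, the following is only a hypothetical sketch of a two-index fitness of this kind. It assumes that Q_total is the fraction of allocated MIPS not delivered on time, that E is the sum of per-host energy, and that the cost a·Q_total + b·E is mapped to 1 / (1 + cost), so that lower cost gives higher fitness and roulette selection receives positive values. `Host`, `qos_violation`, `total_energy`, and `fitness` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Host:
    energy: float  # energy consumed by this host (unit unspecified, as in the text)

def qos_violation(mips_allocated, mips_not_on_time):
    """Assumed form of Q_total: share of allocated MIPS not delivered on time."""
    if mips_allocated <= 0:
        raise ValueError("allocated MIPS must be positive")
    return mips_not_on_time / mips_allocated

def total_energy(hosts):
    """E as the sum of the energy consumed by every host."""
    return sum(h.energy for h in hosts)

def fitness(mips_allocated, mips_not_on_time, hosts, a=0.5, b=0.5):
    """Weighted two-index cost a*Q_total + b*E, turned into a score to maximize."""
    cost = a * qos_violation(mips_allocated, mips_not_on_time) + b * total_energy(hosts)
    return 1.0 / (1.0 + cost)
```

Since Q_total is a ratio while E carries energy units, the weights a and b also act as scale factors; normalizing E (for example, by a reference consumption) before weighting is a common choice.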

Experimental Setup.
To evaluate the proposed genetic-algorithm-based retrieval method, it is compared with Pseudo Feedback and with a query expansion method based on local context analysis (LCA). The experimental data come from the CISI test collection, an information science collection consisting of 1460 documents and 112 queries, available at http://www.dcs.gla.ac.uk/idom/ir_resources/test_collections/. The collection contains the full text of the documents, the text of the queries, and a list of relevance judgments that gives the relevant documents for each query.
Each document and query is preprocessed to remove stop words. A stemming algorithm is applied to extract word stems and build a keyword dictionary; keywords are then taken from the dictionary and their weights are computed. At the same time, each document and query is represented as a vector. Cosine similarity is used to compute the similarity between the initial query and each document, and the documents are sorted in descending order of similarity, so the higher a document ranks, the closer it is to the query. In genetic algorithms, a larger initial population can handle more candidate solutions at once, which makes it easier to find the global optimum; the disadvantage is that it increases the time required for each generation [27,28], so the population size is generally 20-100. During optimization, the crossover operator plays a dominant role among the genetic operations, and the crossover probability controls how often it is applied. The higher the crossover probability, the more new individuals each generation produces, the better the population diversity, and the faster the search converges toward the optimal region. However, an excessively high crossover probability may also lead to premature convergence, so it is generally set between 0.4 and 0.9. When the maximum number of generations is used as the termination condition of the genetic algorithm, it is generally set between 100 and 500. In the experiments of this paper, the genetic algorithm parameters were set as follows: initial population size 30, crossover probability 0.4, mutation probability 0.3, and maximum number of generations 100.
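The preprocessing and ranking steps above can be sketched as follows. The paper does not name its stop-word list, stemmer, or term-weighting scheme, so this sketch assumes a tiny stop-word set, a crude suffix-stripping stemmer standing in for a real one such as Porter's, and smoothed TF-IDF weights; all names are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on"}  # tiny illustrative list

def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokens(text):
    """Lowercase, split into words, drop stop words, then stem."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (dicts) plus the IDF table built from the collection."""
    docs = [Counter(tokens(t)) for t in texts]
    df = Counter(term for d in docs for term in d)
    n = len(docs)
    idf = {term: math.log((n + 1) / (c + 1)) + 1.0 for term, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs], idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, documents):
    """Document indices sorted by descending cosine similarity to the query."""
    vecs, idf = tfidf_vectors(documents)
    q = {t: tf * idf[t] for t, tf in Counter(tokens(query)).items() if t in idf}
    scores = [cosine(q, d) for d in vecs]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

For instance, with the documents "retrieval of teaching resources", "cooking recipes and kitchens", and "teaching videos on cloud platforms", the query "teaching resource retrieval" ranks them in the order 0, 2, 1.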

Evaluation Indicators.
Recall and Precision are widely used criteria for evaluating the effectiveness of Web information retrieval [29]. Recall is the ratio of the number of relevant documents retrieved to the number of all relevant documents in the collection, and Precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved. In search engines, the first 10 or 20 documents usually correspond to the first page and the first two pages of results. Therefore, this paper uses the Recall and Precision of the first 10 or 20 retrieved documents as evaluation indicators. Figures 5 and 6 show the Recall and Precision (first 10 documents) of 10 different queries for the three algorithms, respectively. Note that only the first 10 retrieved documents are counted here. Table 1 shows the comparison of the retrieval performance of the three algorithms.
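For one query, the cutoff measures described above can be computed as below, and averaged over a set of queries. This is a minimal sketch; it divides Precision by the cutoff k even when fewer than k documents are returned, which is one common convention, and the function names are hypothetical.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k=10):
    """Precision@k and Recall@k for one query.

    ranked_ids: document ids in ranked order; relevant_ids: judged-relevant ids
    (for CISI, taken from the relevance-judgment list).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for d in ranked_ids[:k] if d in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def mean_over_queries(runs, k=10):
    """Average Precision@k and Recall@k over (ranked_ids, relevant_ids) pairs."""
    pairs = [precision_recall_at_k(ranked, rel, k) for ranked, rel in runs]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n
```

Using k=10 and k=20 corresponds to the first page and the first two pages of results mentioned above.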

Result Analysis.
As can be seen from Table 1, compared with Pseudo Feedback and the query expansion method based on local context analysis (LCA), the retrieval method based on the optimized genetic algorithm achieves higher Recall and Precision, that is, better retrieval performance. Pseudo Feedback returns a very large number of documents, of which relevant ones make up a relatively small proportion; that is, its Precision is relatively low. This retrieval mode gives a poor user experience, because users must pick out the information they need from a large number of results. In contrast, most documents returned by the proposed algorithm are related to the query topic; that is, most of the results are information the user needs, so users do not have to spend much time sifting through them.

Conclusions
To address the poor performance of traditional cloud computing resource retrieval, this paper proposes a cloud computing platform retrieval method based on the genetic algorithm. The genetic algorithm is used to design a resource retrieval method suited to the cloud computing platform, so as to overcome low Recall and Precision. In addition, the ant colony algorithm is used to improve the convergence performance of the genetic algorithm, and the selection operator of the genetic algorithm is optimized by using random numbers and increasing the number of cycles. The proposed method achieves good retrieval performance in Recall and Precision, which verifies its feasibility. However, because the system currently contains a limited amount of teaching resources, the performance gain remains modest. If more types of teaching resources, such as video and audio resources, were provided, the Recall and Precision of the proposed retrieval method should improve noticeably.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.