Corpus-Based Japanese Reading Teaching Database Cloud Service Model

With the widespread application of mobile Internet technology, big data, and artificial intelligence, society's demand for computing resources is growing rapidly. A database cloud service is an online managed, highly available database service built on a cloud computing platform. It can serve as a basic service of the cloud computing platform and is characterized by high service availability and high data reliability. The purpose of this paper is to design a corpus-based Japanese reading teaching database cloud service system and to integrate a machine learning-based Structured Query Language (SQL) injection detection function into the system. The proposed work is as follows: (1) Design the basic framework of the database cloud service. The modules of the cloud database service are introduced, and the implementation logic of the main system functions is analyzed. (2) A SQL injection classifier based on feature engineering and machine learning is designed for the database cloud service, which can determine whether an input SQL command is an injection statement. The classifier is deployed between the database instance and the front-end server to integrate SQL injection detection into the cloud database instance. The support vector machine algorithm is then used to run simulation experiments on the classifier model and compare its classification performance under different kernel functions. It is found that, with a classification decision threshold of 0.5, classifiers using the linear kernel function or the radial basis kernel function perform better on SQL statements.


Introduction
Cloud service providers are progressively investing in research and development as cloud computing technology continues to advance. The database (DB) is a fundamental component of many applications and thus plays a prominent role in the application system [1]. In recent studies, the well-known information technology (IT) consulting firm Gartner predicted that cloud services will dominate the database business in the future: by 2022, 75 percent of databases will be installed on or transferred to the cloud platform, with just 5 percent considered for local deployment [2,3]. This report shows that research on database cloud services has broad prospects; since the database is the foundation of many applications, such research is of great significance. Database cloud service is the trend of future database technology development. After studying a large number of Japanese reading teaching database materials, this paper proposes corpus-based research on the cloud service model of a Japanese reading teaching database. Deploying the database in a virtual computing environment increases the database's storage capacity while removing the need to repeatedly configure personnel, hardware, and software [4,5]. To some degree, traditional databases meet the demands of user data management. However, as a result of their intrinsic flaws and the fast growth of information technology, particularly in the context of the administration and use of vast data on cloud computing platforms, their downsides are becoming more apparent. Database cloud service development has progressively become the path of a new generation of databases. Compared with conventional databases, it tackles the issues of high hardware cost, poor scalability, high management difficulty, sluggish business response, and low resource usage [6,7].
In a database cloud service application, users can use database resources on demand without needing to know the implementation details of the database; the experience is like using a database on a single server. Database cloud services have greatly changed the data management mode of enterprise managers. Small- and medium-sized businesses may swiftly construct numerous database-related applications on the web to accomplish full enterprise administration [8], while large businesses may reduce their data management resources by using a database cloud service. Data management has seen a shift from on-premises data centers to cloud-based database services [9]. The wide range of possibilities for database cloud services has spurred innovation and progress in the sector. Database as a service (DBaaS) has become more popular with the rise of cloud service models. Because users are freed from having to worry about database setup and administration, conventional IT departments and business departments may operate at higher levels of productivity [10]. In Section 2, the review of the literature highlights related works in the field. Section 3 completes the design of the database cloud service system based on an analysis of system requirements; it introduces the design and implementation of all system submodules and comprehensively analyzes the implementation process of the system functions. Section 4 presents the performance test of the DB cloud service system designed in the previous sections and the support vector machine (SVM) two-class test in SQL injection detection. Finally, Section 5 concludes the study.

Related Work
Corpus construction is a basic project to provide various knowledge resources for all aspects of language information processing [11]. The establishment of a high-quality, deeply processed corpus is a basic condition for the continuous development of natural language processing technology. However, corpus construction is a huge project: it is necessary to establish and maintain large-scale language data resources, together with unified and standardized management measures and high-end management techniques [12]. Because our country's Japanese reading teaching databases are not ideal, it is critical to build a corpus-based Japanese reading teaching database. According to the needs of enterprises and individuals, database cloud services fall into the following three categories. The first category is Microsoft's relational database cloud service SQL Azure, which can be used to store business data, consumer data, or system data [13,14]. SQL Azure is the first cloud service model that fully supports relational databases; its bottom layer uses the latest stable version of the SQL Server database as the engine [15]. SQL Azure faces the challenge of balancing the goal of optimizing resource allocation efficiency with a positive user experience: understanding when cloud database customers use their database instances and when the instances are idle, so as to develop idle profiles for customer databases. The literature [16] realized the optimal allocation of database resources by predicting the usage of Azure SQL user database resources across data centers. The second category is "SimpleDB" products. Amazon's "SimpleDB" is a web service that provides core database functions, such as searching and querying structured data quickly and in real time [17].
"SimpleDB" is considered to be the most valuable data storage solution for building new web applications, because it usually performs many daily tasks related to database management and scalability and eliminates the burden of developers considering data modeling, maintaining indexes and performance adjustments or data operations, and it is also an ideal solution for existing applications [18]. e third category is Oracle's "ExaData" database machine, whose core part is composed of a database server and a storage server and directly provides users with a database machine [19]. Oracle's "ExaData" comes with its own CPU and memory, which realizes the transfer of query processing to the storage system and greatly reduces the amount of data sent to the server. But in terms of technology, because "ExaData" is not a true share-nothing MPP architecture, it does not support unlimited expansion of system resources. When a large number of parallel queries and data loading processes are processed, it must wait for other processing to release resources, resulting in a decrease in parallel processing performance [20]. "ExaData" series versions can provide large enterprises with a complete set of safe and reliable cloud database services through continuous optimization and upgrading of software and hardware configuration and storage services [21]. Domestic research on database cloud services by Alibaba cloud, Tencent cloud, and Baidu cloud has also achieved rapid development and significant results, especially the Alibaba cloud database which is the most prominent. Relational cloud database products and nonrelational cloud database products have been released one after another. e database cloud service abstracts the underlying relational database into a series of service nodes and gives customized SQL for database access. 
It is mentioned in [22] that using database management systems (DBMSs) to schedule database cloud resources cannot effectively improve resource utilization and performance. To solve the problem of effectively allocating virtual services to physical resources and configuring all components, that work designs a resource allocator. Multiple tenants use multitenant services through application interfaces, and each database service calculates and analyzes cost, workload, and related resource models through the resource allocator to improve the performance of cloud database services.
Although database cloud service technology is developing rapidly, most of the database cloud products currently on the market are commercial products for the public cloud environment; there are few database cloud service system solutions for the private cloud environment for internal use by enterprises or institutions. This paper studies the current status of database cloud services at home and abroad and the related technologies, proposes a solution for building a database cloud service system on a private cloud platform, and studies a corpus-based Japanese reading teaching database cloud service model to meet the needs of related enterprises or institutions for private cloud databases.

Method
On the basis of the system requirements analysis, this section completes the design of the database cloud service system. It first introduces the design and implementation of each submodule of the system and then analyzes the implementation process of the system functions in detail. The basic functions of this system are realized by three main modules: the back-end management module, the task management module, and the instance proxy module. Among them, the instance proxy module is the most complex module in the system.

Back-End Console Module.
The main function of the back-end console module is to process requests that the database cloud service system receives from users or from other modules within the system and to handle the business logic. Table 1 defines 14 user interfaces, which accept user requests and complete system functions. Table 2 defines the 6 most important internal interfaces. Internal interfaces are open to other modules of the system or to other systems within the cloud computing platform, and they are used to perform logical interactions between internal modules. The separate design of user interfaces and internal interfaces prevents illegal external access and reasonably divides the categories of access authority management. When a user interface is accessed, the cloud authentication system verifies the user's identity and key, while an internal interface verifies the identity and key of the internal module that initiated the request. Managing the two identities separately clarifies the logic of the cloud computing platform's authentication module, which is convenient for development and maintenance.
Figure 1 shows the information flow of all virtual machine resources that the back-end console uses to manage the system. System tasks can be divided into two types: synchronous tasks and asynchronous tasks. When the system creates, deletes, upgrades, or automatically repairs an instance, it executes asynchronous tasks.

Task Management Module.
The task management module is an important part of the database cloud service system; its role is to execute the system's asynchronous tasks. It receives task information initiated by the back-end console from the message queue and generates a configuration file of the task information for use by the corresponding task execution script, then calls the task script to execute the task and manages its execution status; finally, it backfills the relevant task information to the system database. The task management module realizes unified management, concurrent scheduling, and status monitoring of the task execution programs, so that the task execution programs are decoupled from the front end and development efficiency is improved. Multiple task management modules can be deployed at the same time to monitor and manage tasks. After the task management module starts, it spawns a task monitoring subprocess and binds the monitoring port to perform task monitoring. As soon as a task message is received and processed, a configuration file for that task is created and a task execution subprocess is started, which waits for the task to complete. Data produced during task execution is written to a data file, and log information to a log file. If the task executes successfully, the data and log information are backfilled to the system database, the task status is changed to "executed successfully", and the configuration, data, and log files are deleted. If the task fails, the data and log information are backfilled to the system database and the task status is changed to "execution failed".
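The receive-execute-backfill flow described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: `run_task`, the in-memory `SYSTEM_DB`, and the task fields are hypothetical placeholders.

```python
import json
import os
import tempfile

# Stands in for the system database that task status is backfilled to.
SYSTEM_DB = {}

def run_task(task):
    """Placeholder for the task execution script."""
    if task["type"] == "backup":
        return "backup-ok"
    raise ValueError("unknown task type")

def handle_task_message(task):
    """Sketch of the described flow: write a configuration file for the
    task, execute it, then backfill status; clean up only on success."""
    cfg = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
    json.dump(task, cfg)
    cfg.close()
    try:
        result = run_task(task)
        SYSTEM_DB[task["id"]] = {"status": "success", "result": result}
        os.unlink(cfg.name)  # config/data/log files removed on success
    except Exception as exc:
        # On failure, status and logs are backfilled but files are kept.
        SYSTEM_DB[task["id"]] = {"status": "failed", "error": str(exc)}
    return SYSTEM_DB[task["id"]]
```

A failing task type leaves a "failed" record behind, mirroring the "execution failed" backfill path in the text.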

Instance Proxy Module.
The role of the instance proxy module is to receive requests from other modules, such as the back-end console and the task management module, and to operate on the instance's active and standby virtual machines and on the database engine in the instance. All operations on the instance by the user or by other modules of the system must go through the instance proxy module. The functions of the instance proxy module fall into two categories: functions that interact with modules external to the instance, and management functions internal to the instance.
(1) The functions that interact with modules external to the instance can be divided into remote request processing functions and monitoring functions. Remote request processing: Because the other modules in the database cloud service system are deployed separately and therefore cannot conveniently operate on instances, and to facilitate unified management of database cloud service instances, the instance proxy module implements a Hypertext Transfer Protocol (HTTP) server that exposes a Representational State Transfer (RESTful) style Application Programming Interface (API) for other modules to call. The remote requests sent by other modules mainly include SQL commands and invocations of custom task scripts. The monitoring function consists of two parts: monitoring data collection and monitoring data push. Monitoring data collection gathers, at regular intervals, operating status information for the virtual machines, engine, and components of the instance's primary library and writes the collected data to a file. Monitoring data push sends the latest monitoring data to the cloud monitoring system at regular intervals. Users can view the monitoring information of a database cloud service instance on the front-end interface, track the instance's running status, and set alarm policies.
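The proxy's request handling can be sketched as a dispatch table mapping HTTP method and path to a handler, omitting the actual network layer. The routes `/sql` and `/task` are illustrative assumptions; the paper does not specify the real API paths.

```python
# Illustrative sketch of RESTful request dispatch in the instance proxy.
ROUTES = {}

def route(method, path):
    """Register a handler for one (method, path) pair."""
    def register(fn):
        ROUTES[(method, path)] = fn
        return fn
    return register

@route("POST", "/sql")
def execute_sql(body):
    # In the real module this would forward the command to the engine.
    return {"status": "ok", "echo": body["command"]}

@route("POST", "/task")
def run_custom_task(body):
    # Stands in for invoking a custom task script on the instance.
    return {"status": "ok", "task": body["name"]}

def dispatch(method, path, body):
    """Route a request to its handler, or report an unknown endpoint."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return {"status": "error", "code": 404}
    return handler(body)
```

In a real deployment these handlers would sit behind an HTTP server bound on the instance, as the text describes.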

Mathematical Problems in Engineering
(2) The functions for internal instance management can be divided into timed task scheduling and log management. Timed task scheduling executes custom tasks on a schedule, such as custom task script update tasks and automatic database backup tasks. This function repeatedly checks the configuration file of the scheduled tasks and executes the corresponding custom task script according to the scheduling strategy in the configuration. Log management includes log output, log segmentation, and log file cleaning. The log output function defines the log category, formats the output content, and defines the path of the output log file. If all log information were written to the same log file, the file would grow larger and larger as the instance runs, the I/O performance of the log file would be severely reduced, and log transfer would become inconvenient. Therefore, the log file needs to be divided regularly, splitting a single log file into multiple log files named by time. To reduce unnecessary consumption of instance disk space, log files should be uploaded to the cloud storage platform regularly and the old logs on the instance disk should be cleaned up. In summary, the instance proxy module is divided into four submodules: the remote request processing submodule, the timed task scheduling submodule, the log management submodule, and the monitoring submodule.

Table 1: User interface definition table.

Number  Interface name                 Category
1       Instance creation              Instance management
2       Instance list query            Instance management
3       Instance information query     Instance management
4       Instance restart               Instance management
5       Instance deletion              Instance management
6       Instance mating                Instance management
7       Management account creation    Account management
8       Management account list query  Account management
9       Management account deletion    Account management
10      Backup creation                Data management
11      Backup file list query         Data management
12      Backup and restore             Data management
13      Backup strategy modification   Data management
14      Whitelist modification         Data management
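The time-based log segmentation described above can be sketched with Python's standard logging library. This is a minimal illustration under assumed settings (daily segments, seven kept); the `backupCount` cleanup stands in for the upload-then-clean policy in the text.

```python
import logging
import logging.handlers
import os
import tempfile

def make_instance_logger(log_dir):
    """Sketch of log output plus time-based segmentation: a new log file
    per day, with only the most recent segments retained."""
    logger = logging.getLogger("instance-agent")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, "instance.log"),
        when="midnight",  # split the log into files named by time
        backupCount=7,    # clean up segments older than the last 7
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

# Example usage: write one record into a temporary log directory.
log_dir = tempfile.mkdtemp()
logger = make_instance_logger(log_dir)
logger.info("instance started")
```

Rotated segments get a date suffix (e.g. `instance.log.2024-01-01`), matching the "log files named by time" convention.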

SQL Injection Detection of Database Cloud Service Instance
Data security is one of the important concerns in evaluating the pros and cons of a database service, and it must be considered when designing a database cloud service. Database security issues include the control of database access permissions, the security auditing of database operations, fine-grained permission control over database objects, and the detection and filtering of SQL injection attacks. The web application attack statistics report released by the network security organization OWASP in 2017 shows that injection attacks rank first, and SQL injection is the most common type of injection attack. Therefore, designing SQL injection defenses for database instances is of great practical significance. This paper designs a SQL injection detection system integrated into the cloud database instance. It first introduces the design and deployment of the detection system, then uses the support vector machine algorithm to train a SQL classifier and evaluate its performance, and finally optimizes the classification threshold of the classifier.

Introduction to SQL Injection.
SQL is the abbreviation of Structured Query Language, a language used to manipulate databases. SQL injection refers to the use of HTTP requests to make the web server execute malicious SQL commands. The specific method is to embed a crafted SQL fragment in an HTTP request and then send the request to the web server to deceive it into executing malicious SQL commands, thereby stealing, tampering with, or destroying data. Web applications can usually be abstracted into three layers: the presentation layer, the logic layer, and the storage layer. The presentation layer is the front-end client, including browsers on PCs and mobile devices and mobile apps with online service functions. The client handles the interaction between the web application and its users: it sends HTTP requests to the web server and receives HTTP responses, realizing front-end and back-end interaction. The logic layer sits between the storage layer and the presentation layer; it executes the business logic of the web application and returns results to the client. The storage layer provides data storage services for the web service. For data security, the logic layer and storage layer of modern web services are usually deployed on different servers. The web server uses SQL commands to read and write data in the database to complete its business. If there are security holes in the web server's logic, then after receiving malicious HTTP requests from the client, it may generate unexpected SQL commands and send them to the back-end database. If the database cannot determine the safety of a SQL command, it may execute the malicious SQL and cause data loss.
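The mechanism can be seen in a small sketch (illustrative table and values, using SQLite only as a stand-in for the back-end database): when a request parameter is concatenated directly into the SQL text, a crafted value rewrites the query's logic.

```python
import sqlite3

# Toy back-end database with two users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "s3cret"), ("bob", "hunter2")])

def lookup_unsafe(name):
    """Vulnerable pattern: the HTTP parameter is pasted into the SQL
    text, so the parameter can change the statement's structure."""
    sql = "SELECT secret FROM users WHERE name = '%s'" % name
    return conn.execute(sql).fetchall()

# A normal parameter returns one row; the classic injection payload
# "x' OR '1'='1" turns the WHERE clause into a tautology and leaks
# every row in the table.
normal_rows = lookup_unsafe("alice")
leaked_rows = lookup_unsafe("x' OR '1'='1")
```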

The Defense Method of SQL Injection.
Traditionally, protection against SQL injection has been carried out in two stages. The first stage is protection while writing the web service source code. Developers follow empirical guidelines for preventing SQL injection and use methods such as checking parameter formats, filtering special characters, and binding parameters to make SQL injection fail. However, the effectiveness of these methods depends on how comprehensive the developers' security knowledge is, and human oversight often leaves undetected loopholes. The second stage is protection while the web service is running, that is, deploying an intrusion detection system. There are two traditional kinds of SQL injection detection systems: those based on program analysis and those based on feature matching.
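Of the first-stage methods, binding parameters can be sketched as follows (again with SQLite as a stand-in and an illustrative table): the driver transmits the value separately from the SQL text, so the value can never alter the statement's structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "s3cret"), ("bob", "hunter2")])

def lookup_safe(name):
    """Bound parameter: the '?' placeholder keeps the value out of the
    SQL text, so an injection payload is just an unmatched name."""
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

# The injection payload now matches no row instead of every row.
normal_rows = lookup_safe("alice")
blocked_rows = lookup_safe("x' OR '1'='1")
```

As the text notes, this only helps where developers remember to apply it, which motivates the runtime detection stage.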
(1) A SQL injection detection system based on program analysis analyzes the web service program to find vulnerabilities and then defends against SQL injection by patching them. Program analysis includes static analysis and dynamic analysis. Static analysis examines the source code to find potential security vulnerabilities. Dynamic analysis is done while the program is running: the program is exercised with specific test cases to uncover security vulnerabilities. The disadvantage of the program analysis approach is that the source code of the web application must be known, the analysis coverage cannot be guaranteed, and tests are easily missed. (2) SQL injection detection based on feature matching includes two methods. The first method establishes the characteristic syntax tree of the current database service, extracts the legal SQL command syntax trees from it, and then performs pattern matching; if a detected SQL command does not conform to a legal pattern, a warning is generated and the command is blocked. However, this method requires the source code to be known in advance. The second method analyzes a large number of SQL injection instances, generates a matching model of illegal SQL commands, and then performs pattern matching on the SQL statements actually being run; if a detected SQL statement conforms to the illegal model, it is judged to be SQL injection. This method does not require knowledge of the application's source code. However, both matching methods require a great deal of manpower and time for analysis and are prone to missed detections. The SQL injection detection system designed in this article defends against SQL injection during the running phase of web services.
Unlike traditional string matching, this detection system extracts the lexical features of the original SQL statement, uses machine learning to train a classification model, and then deploys the classification model to the cloud database instance to determine whether the original SQL received by the instance is SQL injection and to intercept or release it accordingly.
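A lexical feature extractor of this kind might look like the following sketch. The paper does not list its actual feature set, so these five features are assumptions chosen only to illustrate the idea of turning a raw SQL string into a numeric vector for the classifier.

```python
import re

def lexical_features(sql):
    """Map a raw SQL statement to a small, illustrative feature vector.
    The specific features are assumptions, not the paper's feature set."""
    s = sql.lower()
    return [
        len(s),                                 # statement length
        s.count("'"),                           # quote characters
        len(re.findall(r"\b(or|and)\b", s)),    # boolean connectives
        len(re.findall(r"union\s+select", s)),  # classic injection token
        s.count("--") + s.count("#"),           # inline comment markers
    ]

normal = lexical_features("SELECT name FROM users WHERE id = 42")
inject = lexical_features("' OR '1'='1' --")
```

Vectors like these would then be fed to the SVM classifier described in the next subsection.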

Introduction to Machine Learning Algorithms.
Many machine learning algorithms are suitable for two-class classification, such as the support vector machine (SVM) algorithm, described next. SVM uses the strategy of maximizing the margin to find the separating hyperplane that divides the two classes of samples in the feature space. For linearly inseparable problems, a kernel function is needed to map the samples into a high-dimensional space in which a separating hyperplane can be found. The following kernel functions are commonly used:

K(x, z) = x · z, (1)
K(x, z) = (x · z + 1)^p, (2)
K(x, z) = exp(-||x - z||^2 / (2σ^2)), (3)
K(x, z) = tanh(β(x · z) + θ). (4)

Formula (1) gives the linear kernel function, formula (2) the polynomial kernel function, formula (3) the radial basis kernel function, and formula (4) the Sigmoid kernel function. Different kernel functions suit different classification samples, and the classification performance of the different kernel functions needs to be evaluated in testing.
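The four kernels can be written directly from their standard forms (reconstructed here; the paper's exact parameterization of formulas (1)-(4) is not shown, so the hyperparameters p, σ, β, θ carry assumed defaults):

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    """Formula (1): K(x, z) = x · z."""
    return dot(x, z)

def poly_kernel(x, z, p=2):
    """Formula (2): K(x, z) = (x · z + 1)^p."""
    return (dot(x, z) + 1) ** p

def rbf_kernel(x, z, sigma=1.0):
    """Formula (3): K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))

def sigmoid_kernel(x, z, beta=1.0, theta=0.0):
    """Formula (4): K(x, z) = tanh(beta (x · z) + theta)."""
    return math.tanh(beta * dot(x, z) + theta)
```

An SVM library evaluates one of these on pairs of feature vectors; the experiments below compare the resulting classifiers.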
The decision tree is a learning method for classification and regression. The algorithm requires relatively little data and is easy to understand and explain. Decision trees generally use information gain or the Gini index to measure the purity of a sample set. Assuming that the proportion of the i-th class in the sample set S is w_i (i = 1, 2, ..., n), the information entropy is

Ent(S) = -Σ_{i=1}^{n} w_i log2 w_i.

Dividing the samples by a feature f of the sample, the information gain obtained by the division is defined as

Gain(S, f) = Ent(S) - Σ_v (|S^v| / |S|) Ent(S^v),

where S^v is the subset of S taking value v on feature f. The greater the information gain, the greater the increase in purity obtained by dividing the samples on feature f, and the better the division. Decision trees are prone to overfitting, which can be mitigated by training multiple decision trees with ensemble learning methods.
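The two quantities above translate directly into code; a minimal sketch (samples represented as dicts of feature values, an assumption for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(S) = -sum_i w_i * log2(w_i), with w_i the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(samples, labels, feature):
    """Gain(S, f) = Ent(S) - sum_v (|S^v|/|S|) * Ent(S^v):
    entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    groups = {}
    for sample, y in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A feature that splits the set into pure subsets achieves the maximum gain, i.e. the full entropy of the original set.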

Performance Evaluation Criteria of the Classifier.
To judge the performance of a classifier, comparison methods and performance indicators borrowed from medical testing are often used. The method is to compute the confusion matrix from the classification results and the true labels.
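Computing the confusion matrix and the derived true-positive and false-positive rates is straightforward; a sketch using the usual convention that label 1 is the positive class (note the experiments below invert this convention, treating SQL injection as the positive class with label 0):

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FN, FP, TN) for two-class labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

def tpr_fpr(y_true, y_pred):
    """TPR = TP/(TP+FN); FPR = FP/(FP+TN) — the two ROC coordinates."""
    tp, fn, fp, tn = confusion_matrix(y_true, y_pred)
    return tp / (tp + fn), fp / (fp + tn)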

Experiment and Analysis
The main content of this section is the performance test of the database cloud service system designed and implemented above and the SVM two-class test in SQL injection detection.

Performance Test.
The performance of the database cloud service system and of the MySQL database instance is tested below. The cloud database service system is mainly tested for response time and the number of concurrent users. The MySQL database instance is mainly tested for transactions per second (TPS), queries per second (QPS), CPU computing performance, disk I/O performance, and so on. Siege is an open-source HTTP/FTP load testing and benchmarking tool. It can simulate N users visiting a given URL in R cycles and return the test result in JSON format; the important parameter is the elapsed time, i.e., the test time. In this paper, Siege is used to simulate 10,000 users accessing the system simultaneously, once per loop, to obtain a graph of how the system response time changes. The system performance test response time is shown in Figure 2. It can be seen that the system performs well when the number of concurrent users is within 5,000.
Sysbench is an open-source multithreaded performance testing tool that supports various tests. Its main purpose is to run OLTP benchmark tests against the MySQL database to test its load behavior under different parameters. In addition, sysbench supports CPU computing performance tests, disk I/O read and write performance tests, and so on. Next, sysbench is used to stress test MySQL. A sysbench stress test has three phases: prepare, run, and cleanup. The prepare phase creates the test data, the run phase executes the test commands, and the cleanup phase removes the invalid data generated by the test. In this test, the stress test script is "oltp_read_write.lua", a Lua script with a read/write ratio of roughly 7:3. Three tables are created, each with 500,000 rows, and then the database is connected. After data preparation is complete, the run command is used to perform the stress test. The stress test uses 64 threads, runs for 60 s, and reports at 6 s intervals.
This test selects transactions per second (TPS) as the indicator of the performance of the MySQL cloud database instance. Comparing the TPS of the database cloud service system instance with that of a local database instance, the result presented in Figure 3 shows that the database cloud service system instance performs better.
The performance of the cloud database instance has reached a good level and can meet users' performance requirements for cloud database instances.

Support Vector Machine Two-Classification Test.
The training data set is used to train the SVM two-class classifier, and the test data set is input to the classifier for testing. For each input, the classifier outputs a probability in the range [0, 1]. The default classification threshold of 0.5 is selected: when the score is greater than the threshold, the sample is judged as 1, meaning negative (normal SQL); when the score is less than the threshold, it is judged as 0, meaning positive (SQL injection). Finally, the classification results are compared with the true labels of the test set, and the ROC curves of the SVM two-class classifiers using different kernel functions are drawn, giving Figures 4 and 5. In the figures, the x-axis is the FPR (the larger the value, the worse the classification performance) and the y-axis is the TPR (the larger the value, the better the classification performance). From the ROC graphs, it can be judged that the SVM classifier using the Sigmoid kernel function classifies SQL statements poorly, while the other three kernel functions classify them well.
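The thresholding step and one (FPR, TPR) point of the ROC curve can be sketched as follows, using the labeling convention above (1 = normal SQL, 0 = SQL injection, with injection treated as the positive class); the score values are illustrative.

```python
def classify(scores, threshold=0.5):
    """Apply the decision threshold: a score above it is labeled 1
    (normal SQL), a score below it 0 (SQL injection)."""
    return [1 if s > threshold else 0 for s in scores]

def roc_point(y_true, scores, threshold=0.5):
    """One (FPR, TPR) point of the ROC curve, with injection (label 0)
    as the positive class, following the convention above."""
    y_pred = classify(scores, threshold)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fp / (fp + tn), tp / (tp + fn)
```

Sweeping the threshold from 0 to 1 and collecting these points traces the ROC curves compared in Figures 4 and 5.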

Conclusion
The database industry will be dominated by cloud computing in the future, and ever more databases will be deployed on or migrated to cloud platforms. Because of technical advancement and budgetary concerns, this is unavoidable. Because of the lower costs, small- and medium-sized businesses may choose public cloud databases, while major organizations and institutions concerned about data protection will build private cloud databases. Cloud database service system solutions for the present private cloud environment are few, and a corpus-based Japanese reading teaching database is even more difficult to find. Based on the laboratory's private cloud platform, this paper proposes the construction of database cloud services on a private cloud platform and designs and implements a database cloud service system. The proposed work is as follows: (1) Research the technical background of database cloud services, analyze the key characteristics of the database cloud service system, and clarify the system requirements. (2) Design and implement the database cloud service system: divide its components and design the task execution mode. Tests show that the system can provide highly available database cloud services.
Data Availability

The datasets used during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The author declares that there are no conflicts of interest.