Workload-Aware Performance Tuning for Multimodel Databases Based on Deep Reinforcement Learning

Multimodel databases are widely used in modern applications, but their default configuration often fails to achieve the best performance, and efficiently managing and tuning their performance remains an open problem. Therefore, in this study, we present MMDTune+, a configuration parameter tuning tool for ArangoDB. First, configuration parameters are selected through feature selection based on the random forest algorithm. Second, a workload-aware mechanism based on k-means++ and the Pearson correlation coefficient detects workload changes and matches the empirical knowledge of historically similar workloads. Finally, the ArangoDB configuration parameters are optimized with an improved TD3 algorithm. The experimental results show that, compared with OtterTune and CDBTune, MMDTune+ recommends higher-quality configuration parameters for ArangoDB in different scenarios.


Introduction
With the rapid development of Internet of Things technology and network applications, the scale of data is growing explosively, and the types of data are becoming richer [1,2]. For example, applications such as social commerce and smart water conservancy usually include both structured relational data and unstructured graph data. Traditional relational databases have difficulty meeting the needs of storing and querying such diverse data structures. The emergence of the multimodel database [3] (MMDB) provides a new solution that effectively addresses the shortcomings of traditional databases. As a new trend in the field of database management systems [4][5][6], a multimodel database can store data of various structural forms in a single engine, without the need to deploy different databases for differently structured data. Multimodel databases are also considered the next generation of data management systems, combining flexibility, scalability, and consistency [7,8].
Currently, multimodel databases are widely used in modern applications [9,10], but their default configuration often cannot achieve the best performance. Multimodel databases face the problem of configuration parameter optimization, which generally needs to be carried out according to the actual workload and application configuration [11]. Configuration parameter tuning has always been a key challenge and the subject of significant research in the database field. Tuning is usually performed by a database administrator (DBA) with extensive tuning experience. However, tuning by DBAs has some limitations. First, tuning configuration parameters is an NP-hard problem [7]. There are hundreds of parameters in a database system, with dependencies among them, which makes it difficult for DBAs to tune all of these configuration parameters. Second, a parameter configuration scheme cannot be reused across database systems deployed in different environments (such as local hosts, clouds, or memory) [12][13][14], so it is difficult for DBAs to achieve high database performance through efficient parameter tuning under changing scenarios. Finally, DBAs are usually good at tuning the systems they are familiar with but struggle with unfamiliar systems. Research on configuration parameter tuning for the multimodel database, as a new kind of database, is scarcer than for traditional databases, and the tuning experience accumulated on traditional databases cannot be fully transferred to multimodel databases.
In our previous work [15], we proposed MMDTune, a parameter-tuning method for multimodel databases based on deep reinforcement learning. However, MMDTune suffers from long training times and low computational efficiency, so it cannot adapt dynamically in practical applications.
To solve the above problems, we improve MMDTune by introducing a workload-aware mechanism. We propose MMDTune+ (multimodel database tune plus), a configuration parameter tuning tool for ArangoDB [16] that includes three modules: a configuration parameter selection module, a workload-aware mechanism, and a tuning algorithm. The configuration parameter selection module uses the random forest algorithm for feature selection, extracting the configuration parameters that are highly correlated with the current tuning indicators. The random forest algorithm can effectively handle high-dimensional, noisy, and correlated data while avoiding overfitting and providing a measure of variable importance. The workload-aware mechanism is based on k-means++ and the Pearson correlation coefficient; it detects changes in workload and matches the empirical knowledge of historically similar workloads. The advantage of using k-means++ and Pearson correlation coefficients for matching similar workloads lies in their ability to effectively cluster workloads and measure the similarity of workload characteristics, improving the accuracy and efficiency of workload matching [17]. Finally, ArangoDB configuration parameter tuning is implemented based on an improved TD3 algorithm [18]. TD3 can handle high-dimensional parameter spaces, optimize performance over long training horizons, and achieve strong performance on a variety of benchmark tasks [18]; we further optimize TD3 to make it converge faster. Considering that a benchmarking tool is needed in the tuning experiments to generate workloads that simulate real environments and to measure system performance [19,20], we design and implement a benchmarking tool, MMDBench (multimodel database benchmarking), which includes a workload generator and a metric collector.
The contributions of our work are summarized as follows: (1) we propose MMDTune+, an automatic configuration parameter tuning tool for the multimodel database ArangoDB based on a workload-aware mechanism and deep reinforcement learning. (2) We develop a database benchmarking tool to provide comprehensive performance testing of our proposed architecture. (3) Based on this benchmarking tool, we verify our proposed method under different scenarios, loads, and query and insert modes. Compared with the existing automatic database tuning methods OtterTune [21] and CDBTune [22], our method achieves a better tuning effect and offers advantages in resource consumption.
The subsequent sections are organized as follows: In Section 2, we introduce related work on database tuning. In Section 3, we present the proposed MMDTune+ and MMDBench. In Section 4, we describe our experimental setup and results. In Section 5, we draw conclusions and point out some directions for future work.

Related Work
The workload is an important criterion for database performance tuning [23], which requires that automatic tuning be able to identify workload changes. Therefore, the first step in realizing automatic tuning of configuration parameters is to accurately classify the workload. Workloads are varied, and different times and scenarios may cause them to change. At present, workload-aware mechanisms mainly fall into two categories: supervised learning-based methods and unsupervised learning-based methods.
Supervised learning-based workload-aware methods are limited by their need for a large amount of labeled training data. Zewdu et al. [24] chose two algorithms, hierarchical clustering and classification and regression trees, to realize awareness of the database workload. Their experiments on TPC benchmark query- and transaction-type workloads verified that this method could effectively predict the workload category. Elnaffar and Martin [25] proposed the Psychic-Skeptic Prediction (PSP) framework to realize self-optimization of a DBMS, in which workload identification is performed using a decision tree classification algorithm on two different workload sets, TPC-H + TPC-C and TPC-W. The workload is divided into three types, OLTP, DSS, and hybrid, and the advantages of PSP classification are verified.
Labeled workload datasets are rare in practice, so most database workload-aware methods use unsupervised learning. The work in [26] uses only SQL query structures for similarity matching; that is, the authors map different SQL query structures to vectors and then use BetaCV, the Dunn index, and other measures to standardize these structural data pairs, which improves the accuracy of clustering-based algorithms such as Aligon and Aouiche and verifies the effectiveness of the standardization method. Takahashi [27] proposed a perception method based on the k-means clustering algorithm and a clustering validity index, which took into account various data profiles such as heavy load and light load and divided all sensor data into multiple clusters. In [21,28], factor analysis (FA) and k-means are used to reduce the dimensionality of the internal state features of the database to improve execution efficiency, and the Euclidean distance over these features is then calculated to match similar workloads. Generally, system tuning tasks rely on the rich experience and professional knowledge of database experts to achieve optimal system performance. A database self-tuning system can exploit this as well: by searching historical tuning experience for the empirical knowledge of a similar workload, the tuning system can draw on that knowledge and improve optimization efficiency. Therefore, methods for sensing workload changes have been studied and applied so that, after a workload change is perceived, the tuning system can learn from historical experience and efficiently recommend parameters for the system.

Configuration Parameter Tuning.
Most previous work on automatic database tuning has focused on optimizing the physical design of the database [29], such as selecting indexes [30,31], partitioning schemes [32], or materialized views [33]. At present, database configuration parameter tuning methods can be divided into two representative lines of work according to whether learning is used: rule-based and learning-based configuration parameter tuning [34,35].
The rule-based approach selects database management system knobs according to a predefined set of rules or heuristics, where the knobs are the configuration parameters of the database system. In most cases, this approach is designed for a specific database and only for a specific set of configuration parameters. IBM DB2 [36] released a self-tuning memory manager to provide adaptive tuning of the database memory heaps and cumulative database memory allocation. This technology combines control theory, runtime simulation modeling, cost-benefit analysis, and operating system resource analysis to provide memory-tuning technology in the form of heuristics. The authors in [7] propose a recursive tuning method, BestConfig, which divides the high-dimensional parameter space into subspaces and uses a restricted derivation principle to search for the optimal configuration from the given configuration resources.
Deep learning has been successfully applied to computationally intensive learning tasks in many fields [37][38][39][40][41]. Zheng et al. [42] propose a self-tuning method using deep neural networks; the authors suggest using statistical methods to identify key system parameters and neural networks to match configurations to specific workloads. Xiong et al. [43] propose a multiobjective tuning framework, MQTuner, which considers not only the performance indicator (throughput) that most studies focus on but also latency. Their framework uses an artificial neural network (ANN) to learn the mapping between configuration parameters and performance, feeds the predicted performance and configuration into a genetic algorithm, and then uses the genetic algorithm to search for a globally optimal solution. IBTune [44] is a cache tuning framework for large-scale cloud databases. The authors design a two-layer deep neural network that predicts the upper bound of the request response time from the characteristics of the measured instance; the size of the target buffer pool is adjusted only when the predicted response time is within a safety limit. However, these methods still have limitations: they struggle to achieve optimal performance with limited training samples and are prone to overfitting [45][46][47].
Reinforcement learning, a research hotspot in machine learning, has been used in several studies [22,28,48] to optimize the configuration parameters of database systems [49]. Zhang et al. [22] designed CDBTune, an end-to-end cloud database automatic tuning system based on the deep reinforcement learning algorithm DDPG (deep deterministic policy gradient). The system uses DDPG to learn the mapping between database state features and high-dimensional configuration features. QTune [48] builds a double-state deep deterministic policy gradient (DS-DDPG) algorithm that combines neural networks and deep reinforcement learning. It feeds the structure and internal characteristics of database queries into DS-DDPG to perform tuning, taking into account the rich features of SQL queries, and the model can use query features to predict changes in the system's internal state. However, the system needs to collect a large number of samples generated in real environments to train the neural network, which is a time-consuming process. The authors in [28] extend GPR, deep neural networks (DNN), DDPG, and an improved DDPG+ for configuration tuning in a real production scenario of an international bank. Due to the overestimation problem of the DDPG algorithm, errors accumulate during training, which may negatively affect the results.
In summary, previous studies have been able to optimize database performance to some extent, but they have limitations. First, tuning with traditional machine learning and deep learning methods relies on a large number of high-quality training samples, which are often the experience data accumulated by DBAs; for multimodel databases, such DBA experience is relatively scarce. Second, databases often have hundreds of interrelated parameters, and simple regression methods are far from sufficient to reach the optimum. Finally, as a new trend in the database field, research on configuration parameter tuning of multimodel databases is scarcer than for traditional databases, and previous tuning experience cannot be directly transferred to multimodel databases. Configuration parameter tuning based on DDPG can address the first two problems: in the absence of high-quality empirical data, it reduces the difficulty of data acquisition through trial and error and performs well in high-dimensional configuration spaces, but the overestimation problem remains.

Framework of MMDTune+
To realize automatic tuning of multimodel databases, we propose MMDTune+, a tuning tool for multimodel databases. Figure 1 shows the overall process of parameter tuning, which consists of four parts: the configuration parameter selection module, the workload-aware mechanism, the tuning algorithm, and the benchmarking tool. A feature analysis model based on the random forest algorithm is proposed for database configuration parameter selection. The core of the workload-aware mechanism is to detect workload changes based on k-means++ and the Pearson correlation coefficient, calculate the similarity between the current workload and the historical workload database, match the historical workload, and then provide the tuning experience of similar workloads to the tuning algorithm. The tuning algorithm itself is based on an improved TD3 algorithm to provide automatic database tuning.

Configuration Parameter Selection Module.
A multimodel database management system provides hundreds of configurable parameters to meet various requirements, which greatly increases the difficulty of tuning database configuration parameters. However, only a few of these configuration parameters may have a significant impact on performance, and tuning unnecessary ones wastes resources. Therefore, we propose a feature analysis model based on the random forest (RF) algorithm, which selects the features most important to database performance. Random forest can handle large datasets with many features, as is common in databases; it is robust to the noisy data that arise from incomplete or inconsistent records; it can identify complex relationships between features, which is important for identifying the key features related to performance issues; and it provides a ranking of feature importance that highlights the parameters most worth tuning. Overall, random forest is a powerful and flexible algorithm for database tuning feature selection, so we use it to select the tuning parameters of the multimodel database. With configuration parameters and tuning performance indicators as inputs, the large set of configuration parameters is screened in the initial stage and sorted according to the weights of the feature variables; finally, the sorted configuration parameter labels are output. Compared with other feature selection methods, the RF algorithm has the advantages of fast training, good robustness, and high accuracy.
The main applications of the RF algorithm are classification and regression. It is composed of multiple decision trees, and each node in a tree is a condition on a certain input feature. For regression, the final result is the mean value over the decision trees. Commonly used objective functions for fitting RF regression tasks include the mean square error (MSE) and the mean absolute error (MAE); MSE is used as the objective function in this paper. The specific process is shown in Algorithm 1.
After the model is trained, a feature correlation parameter feature_importance is generated. The module sorts configuration parameters according to feature_importance; the impact of a configuration parameter on performance is positively correlated with this value. The sorted configuration parameters are saved in the database, and one can set the number of configuration parameters to select those most relevant to the tuning indicator.
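The ranking step above (steps (6)–(8) of Algorithm 1) amounts to sorting knob labels by their learned importance and keeping the top few. A minimal sketch in Python, assuming the importances have already been produced by a trained random forest; the importance values and ArangoDB option names below are hypothetical placeholders, not output of MMDTune+:

```python
def rank_knobs(feature_importances, knob_labels, top_k=None):
    """Sort configuration knobs by descending feature importance and
    optionally keep only the top_k most performance-relevant ones."""
    # zip importances with labels, then sort pairs so the largest weight comes first
    agg = sorted(zip(feature_importances, knob_labels), reverse=True)
    ranked = [label for _, label in agg]
    return ranked[:top_k] if top_k is not None else ranked

# Hypothetical feature_importance values for four example ArangoDB knobs
importances = [0.05, 0.40, 0.25, 0.30]
knobs = ["query.memory-limit", "cache.size",
         "rocksdb.write-buffer-size", "rocksdb.block-cache-size"]
print(rank_knobs(importances, knobs, top_k=2))
# prints ['cache.size', 'rocksdb.block-cache-size']
```

Setting `top_k` corresponds to choosing the number of configuration parameters handed to the tuner, as described above.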

Workload-Aware Mechanism Algorithm.
Workloads in a database environment are constantly changing over time, and high-quality configuration parameters for a previous workload can degrade system performance when applied to the current target workload. Therefore, workload changes should be monitored before configuration parameter tuning so that the tuning system can sense them and recommend high-quality configurations for different workloads. At the same time, configuration parameter adjustment strategies for similar workloads share certain similarities; if the tuning algorithm can be fine-tuned based on the experience of similar workloads, tuning efficiency can be improved to a certain extent.
The workload data in this research are not labeled, so supervised learning algorithms cannot be used. To solve the above problems, we propose a workload-aware mechanism for ArangoDB. The k-means++ clustering algorithm is used to divide the historical workloads into categories. k-means++ is a popular clustering algorithm in machine learning, and for multimodel database workload classification it offers several advantages. First, it is highly scalable and efficient, so it can be applied to large volumes of data. Second, it is less sensitive to initialization conditions, which enhances its stability and performance. Finally, it can produce more accurate clusterings, especially when the data are distributed nonuniformly. These advantages make k-means++ a viable option for multimodel database workload classification. If, after classification, the new workload differs from the previous workload, the current workload has changed. The system then automatically uses Pearson correlation coefficients to match the most similar workloads of the new type from the history library and tunes for the new workload.
As a clustering algorithm [50], k-means++ can classify samples well without labeled training data, and it is fast and robust. Specifically, k-means++ is a variant of k-means. The k-means algorithm itself is a distance-based clustering algorithm: clustering is based on the similarity between data, so that similar items are grouped into a class, but k-means uses a random rule to determine the initial cluster centers, and a poor initial choice of centers can badly affect the results. In view of this, k-means++ improves the original k-means initialization of cluster centers; the basic idea is that the initial cluster centers should be as far away from each other as possible.
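The "centers far apart" idea can be made concrete: the first center is drawn uniformly, and each subsequent center is drawn with probability proportional to the squared distance to the nearest already-chosen center. A short NumPy sketch of this seeding step, on synthetic two-blob data standing in for workload feature vectors:

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """k-means++ seeding: pick k initial centers spread far apart."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center chosen uniformly at random
    for _ in range(k - 1):
        # squared distance from each point to its nearest already-chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()       # far-away points get proportionally higher weight
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)

rng = np.random.default_rng(0)
# two synthetic "workload" clusters around (0, 0) and (5, 5)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
centers = kmeanspp_init(X, 2, rng)
```

With this seeding, the two initial centers almost surely land in different blobs, which is exactly the behavior that makes k-means++ stable on nonuniformly distributed workload data.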
First, the internal state data vectors are Z-score standardized so that the variables are on the same order of magnitude, and the important information is then extracted with the principal component analysis (PCA) method. Next, k-means++ is used to detect the change in workload, and the Pearson correlation coefficient is used to calculate the similarity between the new workload and the historical workloads of that workload type. The Pearson correlation coefficient measures the linear correlation between two variables X and Y; it is a widely used statistic for evaluating the strength and direction of the relationship between two continuous variables. Its value ranges from −1 to +1, where +1 indicates a perfect positive correlation, 0 indicates no correlation, and −1 indicates a perfect negative correlation. A positive correlation means that as one variable increases, the other also tends to increase, while a negative correlation means that as one variable increases, the other tends to decrease. The workload-aware module selects previous samples with high similarity as the initial model parameters of the TD3 algorithm, which are then fine-tuned to improve tuning efficiency and the rate of performance improvement. The whole process requires no human intervention: after detecting a workload change, the tuning system automatically completes the subsequent tuning work.
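The matching step can be sketched as follows: compute the Pearson coefficient between the current workload's metric vector and each historical workload, then pick the best match. This sketch omits the Z-score and PCA preprocessing described above, and the metric vectors and workload ids are hypothetical:

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def match_workload(current, history):
    """Return the id of the historical workload most correlated with the current one."""
    scores = {wid: pearson(current, vec) for wid, vec in history.items()}
    return max(scores, key=scores.get)

current = [120, 0.8, 3500, 45]        # hypothetical internal-metric vector
history = {
    "wl-A": [118, 0.9, 3400, 44],     # close to the current workload
    "wl-B": [10, 0.1, 900, 300],      # very different profile
}
print(match_workload(current, history))
# prints wl-A
```

The matched workload's tuning experience is then used to warm-start the TD3 model, as described above.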
3.3. Tuning Method. MMDTune+ implements ArangoDB configuration parameter tuning based on the deep reinforcement learning algorithm TD3. To make TD3 more suitable for database configuration parameter tuning, its network structure is improved. In addition, the tuning algorithms SVR, GPR, DNN, and DDPG are included in MMDTune+ for tuning ArangoDB parameters and evaluating the effectiveness of the TD3 algorithm.
TD3 is a deep reinforcement learning algorithm that has shown promising results in multimodel database tuning and is particularly useful when the objective function is nondifferentiable and noisy. One key advantage of TD3 is its ability to handle continuous action spaces, which is particularly relevant to database tuning, where tuning parameters typically vary continuously; TD3 uses a deterministic policy, which enables it to optimize continuous action spaces effectively. Additionally, TD3 uses a twin network structure, consisting of two separate Critic networks, to reduce overestimation of the value function. This twin network structure enables TD3 to learn more accurate value estimates and improves the stability of the learning process. Another advantage of TD3 is its effective handling of the exploration-exploitation tradeoff. In multimodel database tuning, it is important to balance the exploration of new parameter configurations with the exploitation of already learned information. TD3 uses a replay buffer to store past experiences and a target policy network to estimate the value of future states; by combining these two techniques, it can explore the search space while also exploiting previously learned information. Moreover, TD3 is a model-free algorithm, meaning that it does not require a priori knowledge of the underlying system dynamics. This makes it particularly useful for database tuning, where the relationship between tuning parameters and system performance can be complex and difficult to model accurately; without a model, TD3 learns directly from the data and adapts to changes in the environment. These features make TD3 a powerful tool for optimizing the performance of multimodel databases.

(1) Input: configuration parameter set X = {a_1, a_2, ..., a_n};
(2)     performance indicator Y = {m}; number of trees n; feature labels labels
(3) Output: sorted configuration parameter labels F = {a_1', a_2', ..., a_n'}
(4) RFModel.construct()
(5) RFModel.train()
(6) let coefs = RFModel.feature_importance
(7) let agg_feature_coefs = zip(coefs, feature_labels)
(8) sorted(agg_feature_coefs)
ALGORITHM 1: Configuration parameter selection module.
TD3 consists of two kinds of networks, an Actor and two Critics, each with a corresponding target network. The Actor network takes as input the internal state indicators s obtained after the environment executes the workload and outputs actions in the range 0 to 1. MMDTune+ then maps the actions to configuration parameters and overwrites the original configuration of the environment. As shown in Figure 2, the target network μ′ has the same structure as the current network μ.
In the network, the activation functions LeakyReLU and Tanh are used to capture the nonlinear relationships between variables. The mean output of Tanh is close to zero, which benefits the learning of neurons in the next layer; LeakyReLU alleviates vanishing and exploding gradients and speeds up model convergence to a certain extent. Finally, a dropout layer is added to the network to prevent overfitting and to increase exploration of the configuration space.
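A minimal NumPy sketch of such an Actor forward pass: a LeakyReLU hidden layer followed by a Tanh output layer. Since Tanh produces values in [−1, 1] while the actions are in [0, 1], the sketch rescales the output; that mapping, the layer sizes, and the random weights are our own illustrative assumptions, and dropout is omitted:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """LeakyReLU keeps a small gradient for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def actor_forward(state, W1, b1, W2, b2):
    """One-hidden-layer Actor sketch: LeakyReLU hidden, Tanh output."""
    h = leaky_relu(state @ W1 + b1)
    a = np.tanh(h @ W2 + b2)
    return (a + 1.0) / 2.0  # rescale Tanh's [-1, 1] to the action range [0, 1]

rng = np.random.default_rng(1)
state = rng.normal(size=5)                       # 5 hypothetical internal-state metrics
W1, b1 = rng.normal(size=(5, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)) * 0.1, np.zeros(3)
actions = actor_forward(state, W1, b1, W2, b2)   # 3 knob actions, each in [0, 1]
```

Each action component is then mapped to a concrete knob value within that knob's valid range.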
TD3 has two Critic networks, Q_1 and Q_2, and two corresponding target Critic networks, Q_1′ and Q_2′. Their network structures are identical, and they share the experience replay pool. The design of the network structure is similar to that of the Actor policy network, except that the weight update frequency differs from the Actor's. The Critic networks evaluate the value of state-action pairs and guide the Actor's behavior through value feedback.
The Critic network structure is shown in Figure 3. Both Critic networks take the internal state s of ArangoDB and the action a output by the Actor network as input. After the fully connected layers learn the relationship between state and action, the final output is the evaluation Q value of state s and action a.
The reward function is crucial in reinforcement learning because it determines the feedback between the agent and the environment. The goal of the agent is to maximize the total return it receives, and the reward function must drive the agent toward the tuning goal while maximizing that return.
First, the rates of change of performance with respect to the initial performance and the previous time step are calculated according to equations (1) and (2). Based on these, the reward function is given in equation (3), where Δ_{t−1,t} denotes the difference between the current performance and the ArangoDB performance under the default configuration, and Δ_{t,0} denotes the difference between the current performance and the historical optimal performance of ArangoDB.
According to equation (3), a nonnegative reward is obtained only when the performance of the database is better than both the initial state and the historical optimum. Considering that the ultimate goal of tuning is to achieve better performance than the initial setup, the impact of intermediate tuning steps on the reward should be reduced.
Therefore, when Δ_{t−1,t} is positive and Δ_{t,0} is negative, the reward is set to 0.
Different tuning tasks may use different tuning metrics. Therefore, a weight coefficient ω_i is assigned to each tuning indicator to indicate the tuning direction, so that the tuning system can tune multiple indicators simultaneously. Thus, the final reward function is defined in equation (4), where the weights ω sum to 1. In this study, throughput and latency are weighted equally, both 0.5.
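Since equations (1)–(4) are not reproduced here, the following is one plausible reading of the description above, sketched in Python: each metric's reward compares current performance against the initial configuration and the historical best, a zero reward is given when the run beats the start but not the best, and per-metric rewards are combined with weights summing to 1. The exact functional form is an assumption, as are the numbers:

```python
def delta(perf_now, perf_ref, higher_is_better=True):
    """Relative performance change with respect to a reference value."""
    d = (perf_now - perf_ref) / perf_ref
    return d if higher_is_better else -d  # for latency, a decrease is an improvement

def reward_one_metric(perf_t, perf_init, perf_best, higher_is_better=True):
    d_init = delta(perf_t, perf_init, higher_is_better)  # vs. default configuration
    d_best = delta(perf_t, perf_best, higher_is_better)  # vs. historical optimum
    if d_init > 0 and d_best < 0:
        return 0.0  # better than the start but not than the best: neutral reward
    # otherwise reward follows the improvement (or regression) over the initial config;
    # a simplified stand-in for the paper's equation (3), which is not reproduced
    return d_init

def total_reward(metrics, weights):
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # the weights must sum to 1
    return sum(weights[m] * reward_one_metric(*metrics[m]) for m in metrics)

metrics = {
    "throughput": (1200.0, 1000.0, 1100.0, True),  # current, initial, best
    "latency":    (45.0, 50.0, 48.0, False),       # lower latency is better
}
r = total_reward(metrics, {"throughput": 0.5, "latency": 0.5})
```

With these illustrative numbers, both metrics beat the initial and best values, so the combined reward is positive.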
Algorithm 2 describes the proposed tuning algorithm. First, the Actor network, the two Critic networks, and their corresponding target networks are initialized according to the hyperparameters; the structure and parameters of each target network are the same as those of its current network. Then, before tuning, the ArangoDB runtime environment is initialized, a benchmark is run with ADBench under the initial configuration, and the initial state of the database and the external metrics are obtained. In the iterative training stage, the Actor network generates the next action according to the current state and exploration noise. This strategy gives the Actor network a stronger exploration ability, ensures the diversity of the generated samples, and reduces the chance of the algorithm falling into a local optimum. Finally, when the number of samples in the experience pool exceeds R, mini-batches are sampled to train the networks; the parameters of the Critic networks are updated separately according to the gradients, and the corresponding parameters are updated using the optimizer.
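Two of TD3's characteristic update rules can be shown in a few lines of plain Python: the clipped double-Q target, which takes the smaller of the two target-Critic estimates to curb overestimation, and the Polyak soft update of the target networks (the actor itself is only updated every d steps, the "delayed" part of TD3). Parameters are represented as bare floats purely for illustration:

```python
def td3_target(r, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: use the smaller of the two target-critic estimates."""
    return r + gamma * min(q1_next, q2_next)

def soft_update(target_params, current_params, tau=0.01):
    """Polyak averaging: slowly track the current network with coefficient tau."""
    return [tau * c + (1 - tau) * t for t, c in zip(target_params, current_params)]

y = td3_target(r=1.0, q1_next=10.0, q2_next=12.0)   # 1.0 + 0.99 * 10.0 = 10.9
theta_target = soft_update([0.0, 0.0], [1.0, 2.0])  # [0.01, 0.02]
```

In the full algorithm, both Critics regress toward y, while the Actor and all target networks are updated only every d-th step (d = 2 in the experiments below).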

Benchmarking Platform.
The benchmarking tool plays an important role in configuration parameter tuning: the status data and performance indicator data used during tuning are obtained while benchmarking ArangoDB [51]. Database benchmarking tools focus on simulating production workloads as realistically as possible and on accurately measuring the indicators that reflect database performance. As a result, excellent database evaluation tools such as TPC-C [52], TPC-DI [53], YCSB [54], Sysbench [55], and UniBench [56] have been created, but they have some limitations.
Among them, YCSB is a relatively complete benchmark, but it cannot monitor system resource utilization indicators, and its workload types are relatively simple. UniBench is designed for multimodel databases, but it does not support long-running or multithreaded execution, and only one indicator can be monitored. To solve these problems, we implement MMDBench, a benchmarking tool for ArangoDB that performs stress tests and measures performance indicators. It consists of two main parts: the load generator and the metric collector. The load generator generates the appropriate workload for the tuning task and issues it to ArangoDB; the workloads include simple read and write operations, complex graph operations, and aggregation operations. The metric collector gathers statistics that reflect database performance and integrates Prometheus to monitor cloud server resource utilization in real time.
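The core loop of such a load generator/metric collector pair reduces to timing each issued operation and deriving throughput and latency from the measurements. A minimal single-threaded sketch (MMDBench's actual implementation is not shown in the paper; the stand-in operation below replaces a real ArangoDB query):

```python
import time

def run_workload(operation, n_ops):
    """Execute an operation n_ops times, collecting per-op latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_ops):
        t0 = time.perf_counter()
        operation()                                   # one benchmark operation
        latencies.append(time.perf_counter() - t0)    # per-operation latency
    elapsed = time.perf_counter() - start
    return {
        "throughput_ops_per_s": n_ops / elapsed,
        "avg_latency_s": sum(latencies) / n_ops,
    }

# stand-in for a real query issued by the load generator against ArangoDB
stats = run_workload(lambda: sum(range(1000)), n_ops=50)
```

These two metrics are exactly the external indicators the reward function weighs; the real tool additionally samples resource utilization via Prometheus.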

Experiments
In Section 4.1, the relevant experimental environment and datasets are introduced. Then, the various components of MMDTune+ are tested and analyzed in Section 4.2. In Section 4.3, we compare MMDTune+ with other existing works under different execution workloads.

Experimental Environment and Dataset.
We utilize four Aliyun Cloud servers to build the experimental environment: three act as servers forming a three-node cluster, and the remaining one serves as the client that accesses the database cluster. Table 1 shows the cloud server configuration.
In this study, we design related multimodel database operations on two datasets: a social commercial network dataset and a hydrology-related dataset. The data structure and size of each dataset are shown in Table 2.
The databases built from these two datasets both contain multiple data models. The social business network includes collections of customers, orders, stamp posts, products, invoices, and evaluation feedback, together with the graph formed by the relationships between these collections. The hydrologic dataset mainly contains geographic location information and sensor data, covering provinces, cities, monitoring stations, and sensor readings.
Finally, the multimodel database operations designed in MMDBench based on the above two datasets are shown in Table 3. Due to space limitations, only the operations required in the experiments are listed.

Tuning Experiment and Result Analysis.
In this section, the effect of MMDTune+ tuning on ArangoDB is evaluated, and experiments are carried out on each of its three modules to verify their effectiveness and necessity.

Parameter Settings.
To verify the generalization ability of the TD3 algorithm in MMDTune+, we select different workloads to execute. The workloads involved in the following experiments and their corresponding execution parameters are shown in Table 4. [Algorithm 2's pseudocode is interleaved here in the original: initialize the replay buffer R; load the model if one exists, otherwise initialize the Actor network μ and Critic network Q with weights θ_μ and θ_Q; then, for each step, configure the multimodel database with C_t, perform workload q, observe the new state s_{t+1} ← cost(C_t, q) and reward r_t ← reward(s_{t+1}), push (s_t, s_{t+1}, a_t, r_t) into R, sample a random mini-batch (s_i, s_{i+1}, a_i, r_i) from R, update the Critic networks, update the target networks every d steps, set s_t ← s_{t+1}, and finally output the recommended configuration P ← a_T.] Considering the limited number of CPU cores in the system, a larger number of threads does not necessarily yield higher throughput. Therefore, a uniform thread count of 10 is used, which can be adapted to the CPU of the machine. The parameters of the data request mode are selected according to the scenario. The execution parameters of the tuning algorithm TD3 are set as follows: the maximum step length of offline training is 1000, the size of the experience replay pool is 10000, and the batch size is 16. We choose Adam as the optimizer for model training, with a learning rate of 0.0005 for the Actor network and 0.0001 for the Critic network. The discount factor is 0.99, and the exploration noise parameter is set to 0.2. The update frequency of the target network is 2, and the soft update coefficient is 0.01.
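The TD3 execution parameters listed above can be gathered into a single configuration sketch (the key names are our own labels; the values are those reported in the text):

```python
# TD3 hyperparameters as reported for the tuning experiments.
TD3_CONFIG = {
    "max_offline_steps": 1000,    # maximum step length of offline training
    "replay_buffer_size": 10000,  # experience replay pool size
    "batch_size": 16,             # samples drawn per update
    "optimizer": "Adam",
    "actor_lr": 5e-4,             # Actor network learning rate
    "critic_lr": 1e-4,            # Critic network learning rate
    "gamma": 0.99,                # discount factor
    "exploration_noise": 0.2,     # exploration strategy parameter
    "policy_delay": 2,            # target network update frequency
    "tau": 0.01,                  # soft update coefficient
}
```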

Confguration Parameter Selection Experiment.
To verify the effectiveness of the RF-based feature ranking method for tuning in MMDTune+, we conduct a set of experiments on workload W1 and tune ArangoDB while increasing the number of configurations ranked by the RF algorithm.
As shown in Figure 4, as the number of configuration parameters increases, the throughput improvement rate rises while latency decreases continuously. Compared with the default configuration, performance is significantly improved. Additionally, for the same number of configurations, the improvement from configurations selected by the RF algorithm is generally higher than that from randomly selected configurations. By ranking parameters by feature importance, the RF algorithm selects configurations highly correlated with performance, making it easier for ArangoDB to reach optimal performance. Figure 5 shows the convergence rate of the TD3 network model under the different configuration counts selected by the RF algorithm in the previous experiment. As the number of configurations increases, the number of training iterations also increases: more configurations mean more model parameters and a more complex structure, so the model takes longer to converge. When the number of configuration parameters ranges from 60 to 75, adding further parameters does not significantly improve the tuning effect, because the additional parameters have little impact on performance. Therefore, to improve tuning efficiency, one can select an appropriate number of configurations once the desired tuning effect is met. In the subsequent throughput and latency tuning experiments, the tuning algorithm operates on the 75 configuration parameters selected by the RF algorithm.
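The RF-based selection step can be sketched with scikit-learn's impurity-based feature importances on synthetic data. This is only an illustration: the data here are made up (knob 0 is constructed to dominate the response), whereas MMDTune+ ranks real ArangoDB configuration knobs against benchmarked performance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 benchmark runs over 10 candidate configuration
# knobs, where knob 0 dominates the throughput response.
X = rng.uniform(size=(200, 10))
y = 5.0 * X[:, 0] + 0.2 * X[:, 3] + rng.normal(scale=0.05, size=200)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Rank knobs by feature importance and keep the top k for tuning.
k = 3
ranked = np.argsort(rf.feature_importances_)[::-1]
selected = ranked[:k]
```

In the paper's setting, k is chosen around 75, the point beyond which Figure 4 shows diminishing returns.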

TD3 Ofine Training Tuning Experiment.
In this part, the TD3 algorithm in MMDTune+ is used to tune the 75 configuration parameters selected above in the offline phase, further verifying the effectiveness of the TD3 algorithm for ArangoDB configuration parameter tuning. At the same time, the tuning data are collected for use in the online tuning phase. First, we select workloads W2, W6, W7, and W9 as target workloads for offline training. Table 5 shows the performance changes of these four workloads after tuning. Under all four workloads, both throughput and 99th-percentile operation latency improve to different degrees. Therefore, the TD3 algorithm can be applied to ArangoDB configuration parameter tuning, and the performance improvement is considerable. This is because the TD3 algorithm adapts well to the high-dimensional configuration space and recommends high-quality continuous configuration parameters for the system. Meanwhile, its trial-and-error strategy is similar to a DBA's tuning strategy: it continuously explores the configuration space and reduces the possibility of falling into local optima.

Workload-Aware Mechanism Tuning Experiments.
Without a workload-aware mechanism, the model training or tuning process can hardly use historical experience and requires the user to specify which workload to learn from. However, manually selecting a similar workload for multimodel database applications is complex. Therefore, MMDTune+ uses the k-means++ algorithm to cluster workloads and detect workload changes. The final clustering result is shown in Figure 6: historical workloads can be roughly divided into seven types, and k-means++ groups similar workloads into the same class by distance.
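The k-means++ seeding that drives this clustering can be sketched from scratch in NumPy (an illustration of the seeding rule itself; a library implementation would normally be used in practice, and the toy workload vectors below are made up):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
# Three well-separated blobs of toy workload feature vectors.
X = np.vstack([rng.normal(loc=m, scale=0.1, size=(30, 2)) for m in (0.0, 5.0, 10.0)])
centers = kmeans_pp_init(X, 3, rng)
```

Because distant points are favored, the initial centers tend to land in distinct clusters, which is why k-means++ converges faster and more reliably than random seeding on grouped workload data.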
Workload changes are detected as follows: when MMDTune+ finds that the type of a newly arrived workload differs from the current workload type, it uses the similarity calculation to match the pretrained model of the most similar historical workload and fine-tunes its parameters, finally recommending high-quality configuration parameters. Next, a new workload, W3, is used to validate the actions taken after the workload-aware mechanism detects a change. Figure 7 shows the correlation coefficients between the new workload W3 and the historical workloads, in order. The workload most similar to the target workload is Q25, which consists of multiple queries. The pretrained model of Q25 is then migrated to the current workload W3 to tune its parameters and verify whether the tuning process can exploit historical learning to improve tuning efficiency.
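The matching step can be sketched as follows: among the historical workloads, pick the one whose metric vector has the highest Pearson correlation with the new workload. The metric vectors below are made-up stand-ins; only the workload name Q25 comes from the experiment.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical normalized metric vectors (e.g., throughput, latency, CPU, IO).
history = {
    "Q12": [0.9, 0.1, 0.4, 0.2],
    "Q25": [0.2, 0.8, 0.7, 0.6],
    "Q31": [0.5, 0.5, 0.5, 0.4],
}
new_workload = [0.25, 0.75, 0.65, 0.55]

# The best match's pretrained model is the one migrated to the new workload.
best = max(history, key=lambda q: pearson(history[q], new_workload))
```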
In the offline phase, without a pretrained model, the TD3 algorithm requires over 500 iterations to converge. For workload W3, the results of fine-tuning the pretrained model matched by the workload-aware mechanism are shown in Figure 8: the new workload reaches good performance within 5 tuning steps. Compared with the previous results, the overall tuning efficiency is greatly improved, and the tuning effect continues to increase with the number of tuning steps.
The following conclusions can be drawn from the above experiments. First, MMDTune+, based on the deep reinforcement learning algorithm TD3, can learn and gain decision-making experience in complex environments. Second, the tuning algorithm can learn from past experience to improve tuning efficiency, and the workload-aware mechanism module of MMDTune+ accurately matches the empirical knowledge of historical workloads.
During configuration tuning, the results of each workload execution are stored in the data warehouse by MMDTune+. As the tuning system continues to run, the historical data grow, which makes online tuning even more effective.

Comparison Experiment with Existing Tuning Algorithms.
In this section, we fully evaluate the effect of all modules of MMDTune+, including the workload-aware mechanism, on multiple different workloads and compare the TD3 algorithm with existing tuning algorithms. The measured tuning time covers data preprocessing, the workload-aware mechanism, configuration parameter generation, deployment and execution of the benchmark evaluation, and model weight updates. The experimental results are shown in Table 6. The time consumed by data preprocessing and by the workload-aware mechanism is the same for every algorithm, so it is not discussed further. The time complexity of machine learning algorithms is typically expressed in big O notation, which bounds the growth rate of the computational resources required as the problem size grows. SVR is a kernel-based regression algorithm that fits a hyperplane within an ε-insensitive margin of the training data. The time complexity of SVR is typically O(n³), where n is the number of training samples; this makes SVR relatively fast for small datasets, but it can become slow for larger ones. GPR is a nonparametric regression algorithm that models the relationship between inputs and outputs using a Gaussian process. Its time complexity is also typically O(n³) in the number of training samples, with the same trade-off between small and large datasets. DNNs are composed of multiple layers of artificial neurons and are used for a wide range of tasks, including image classification, speech recognition, and natural language processing. The time complexity of DNNs is typically O(mn²), where m is the number of training samples and n is the number of neurons; this makes DNNs relatively slow for small datasets, but they scale to large ones. DDPG is a reinforcement learning algorithm used to train agents in control tasks. Its time complexity depends heavily on the complexity of the environment and on the number of interactions required to train the agent.
As a result, the time complexity of DDPG can vary widely and can be difficult to estimate. TD3 is a variant of DDPG, also used to train agents in control tasks, so its time complexity is likewise highly dependent on the environment and the number of interactions and is similarly difficult to estimate. The actual time complexity of all these algorithms can vary widely depending on the specific implementation, the size and structure of the data, and the hardware used. At the millisecond level, the TD3 algorithm used by MMDTune+ is second only to DDPG in tuning efficiency, but the difference is not significant, and it is faster than the other algorithms. SVR, GPR, and DNN are supervised learning methods: at each step they must train the model to convergence and add Gaussian noise for exploration to meet the tuning requirements, and as the number of training samples grows, their time consumption gradually increases.
Meanwhile, we compare the tuning results of the different algorithms with the proposed method. First, experiments are carried out on workload W4, which is composed of multiple queries; it covers three data models and diverse data operations, and thus better represents the characteristics of a multimodel database. As can be seen from Figure 9, all the tuning algorithms, including extended algorithms from related studies, optimize performance well. On workload W4, throughput improves by more than 132.54%, and 99th-percentile operation latency is reduced by more than 7%. Among them, DDPG and TD3 outperform the regression-based SVR, GPR, and DNN, and TD3 is the best. Related experiments were performed on write workload W5 and transaction workload W8. As shown in Figure 10, the throughput of workload W5 increased by 31.1% after TD3 tuning, and the 99th-percentile operation latency decreased by 25.55%. The other tuning algorithms also achieved good results, though slightly weaker than TD3, and the results on W8 were similar to those on W5.
The tuning results based on the TD3 algorithm were the best on most workloads, and its speed was close to that of the most efficient algorithm, DDPG, while being higher than that of the supervised learning methods. This is because reinforcement learning balances exploration and exploitation through the agent's interaction with the database environment, which not only exploits the capability of the model but also explores promising configurations that have never been tried, reducing the possibility of falling into local optima. Although DDPG is also a deep reinforcement learning algorithm, its accumulated estimation errors negatively affect the tuning result, making it inferior to TD3. As supervised learning algorithms, SVR, GPR, and DNN predict performance from configurations by regression, which makes them rely on a large amount of high-quality training data.

Limitations.
MMDTune+ realizes performance optimization for ArangoDB, but some problems still need further study. First, the MMDBench benchmarking tool needs improvement: it currently tests only ArangoDB and simulates a limited number of real-world scenarios. Likewise, MMDTune+ has so far been applied to only one tuning target. As the number of data models and storage engines increases, the search space of tuning parameters grows ever larger, making it more challenging to find an optimal solution. The performance of MMDTune+ may also depend on the specific characteristics of the database, such as the number and distribution of data types, the size of the database, and the complexity of queries. Different query types and data access patterns may require different tuning parameters, and it may be difficult to identify a single set of parameters that works well across all possible workload scenarios.

Conclusion
The emergence of multimodel databases provides a new solution that effectively addresses the deficiencies of traditional databases. ArangoDB is a widely used multimodel database, but it suffers from a difficult configuration parameter optimization problem: parameters must be tuned for the actual workload or application. Moreover, tuning targets not only a single workload but also dynamically changing workloads, so sensing workload changes and exploiting historical experience to improve the tuning effect is particularly important. Against this background, we study configuration parameter tuning for the multimodel database ArangoDB.
To address the ArangoDB configuration parameter tuning problem, we propose MMDTune+, an ArangoDB tuning tool combined with a workload-aware mechanism, which comprises a configuration parameter selection module, the workload-aware mechanism, and the tuning algorithm. The configuration parameter selection module uses the random forest algorithm to select configuration parameters highly correlated with the tuning indicator, meeting the performance improvement target while improving tuning efficiency. The workload-aware mechanism uses k-means++ and Pearson correlation coefficients to detect workload changes, match historical empirical data or pretrained models of similar workloads, and migrate them to the current task to improve performance gains and efficiency. Finally, based on our improved TD3 algorithm, we perform database tuning for ArangoDB.
The limitations of our existing work include an insufficient number of workloads, a single tuning target, and a lack of targeted optimization for different tuning objectives in the TD3 algorithm. In future work, we plan to apply the configuration parameter tuning algorithm to other multimodel databases, build more workloads with more diverse datasets, and simulate more realistic, complex scenarios.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.