Applying Bayesian Optimization for Machine Learning Models in Predicting the Surface Roughness in Single-Point Diamond Turning Polycarbonate

Faculty of Mechanical Engineering and Mechatronics, PHENIKAA University, Yen Nghia, Ha Dong, Hanoi 12116, Vietnam
PHENIKAA Research and Technology Institute (PRATI), A&A Green Phoenix Group JSC, No. 167 Hoang Ngan, Trung Hoa, Cau Giay, Hanoi 11313, Vietnam
School of Mechanical Engineering, Hanoi University of Science and Technology, Dai Co Viet Street 1, Hanoi, Vietnam
Faculty of Engineering, Vietnam National University of Agriculture, Gia Lam, Hanoi 100000, Vietnam
Faculty of Mechanical Engineering, University of Economics and Technology for Industries, 456 Minh Khai Str., Hai Ba Trung District, Hanoi 100000, Vietnam


Introduction
The elements used in optical applications are mostly made of glass, crystalline materials, polymers, or plastic materials. Properties such as degree of transparency, refractive index, and spectrum are the deciding factors in the choice of optical materials [1]. In recent years, polymer optics have become increasingly widely used. They offer advantages over traditional glass materials in different applications due to their light weight, low cost, high impact resistance, and flexibility in complex geometric design, despite their lower optical quality in comparison with glass [2]. Recently, various methods have been developed to mass-manufacture polymers for optical applications. One such study predicted roughness and vibration during the turning of AISI 5140 within an accepted error range of 10%. They also found that the feed rate was the parameter that most increased the surface roughness, which is in accordance with Krolczyk et al. [10] in turning steel. For the machining of alloys, Gupta et al. [12] constructed an optimization procedure to estimate the machining responses of a nickel-based superalloy. The optimization problem was solved using a combination of response surface methodology, particle swarm optimization, and teaching-learning-based optimization techniques. Different parameters were considered, including cutting speed, feed rate, and cutting tool angle, whereas the machining responses were cutting force, potential tool wear, surface roughness, and the length of tool-chip contact.
Recent research has suggested that machine learning may accurately predict surface roughness in the turning process. Eser et al. [13] estimated the surface roughness of AA6061 alloy in milling using artificial neural networks and response surface methodology. It should be noted that, in that study, the impact of the cutting parameters on the prediction was characterized using variance analysis [13]. In another work, Elangovan et al. [8] built a Multiple Linear Regression (MLR) model to predict surface roughness on the basis of the input parameters: feed rate, depth of cut, spindle speed, flank wear, and vibration signal. The artificial neural network (ANN) method has been applied to predict the roughness for different cutting parameters (cutting speed, depth of cut, and feed rate) [14][15][16][17]. Özgören et al. [18] employed the ANN technique to predict the power and torque values obtained from a beta-type Stirling engine. The best ANN architectures, 5-13-9-1 and 5-13-7-1, respectively, were determined using the Levenberg-Marquardt learning algorithm [18]. In recent years, besides ANN modeling, some new ML methods have been introduced: Support Vector Regression (SVR), Gradient Boosting Regression (GBR), Linear Regression (LR), and Random Forest Regression (RFR) [19][20][21]. Pimenov et al. [22] tested different machine learning models, such as random forests, standard multilayer perceptrons, regression trees, and radial basis functions, for the prediction of surface roughness deviations in face milling processes. It is worth noting that, in [22], the final machine learning prediction model was developed in an automatic, real-time manner for the machining processes. In addition, comparative research into ML models has been carried out to find the best one for output prediction.
The results for the SVR, polynomial regression, and ANN models indicated that ANN performs best in predicting lifetime but worst in predicting cutting force and Ra [21]. This shows not only that the regression of various machine learning models yields different results, but also that a specific model has a certain effect on one type of output. Therefore, this research sets out to investigate the application of SVR, CatBoost Regressor (CAT), XGBoost Regressor (XGB), Decision Tree Regressor (DTR), GBR, and ANN to surface roughness prediction. The paper is organized as follows. Section 2.1 introduces the database used in this study, while Section 2.2 outlines the research methodology. Section 2 also gives a brief overview of all the machine learning models used in this work. Finally, Section 3 provides the results and discussion.

Data Collection and Analysis.
In this study, the results of surface roughness measurements for 35 experiment runs were harvested from the available literature (Bolat [23], published in open access mode). In Bolat [23], the PC sheet was cut into workpieces of 30 mm diameter and 10 mm thickness. All workpieces were numbered as shown in Figure 1(a). Then, each workpiece was mounted inside a fixture whose material was not affected by temperature changes (shown in Figure 1(b)). During the machining process, a mixture of kerosene and air was used as cutting lubrication. The monocrystalline diamond tool S95843 was mounted on a tool holder (Figure 1(c)). According to the tool designation C0.5mLG for monocrystalline diamond, the cutting tool is a controlled-waviness tool with a nose radius of 0.5 mm, a top rake angle of 0°, and a front clearance angle of 10°. To implement the experiments, the Precitech Freeform 700 U four-axis diamond turning machine tool was used for 35 experiment runs, with the feed rate, spindle speed, and depth of cut being 1-12 μm/rev, 1000-2250 rpm, and 3-50 μm, respectively, as shown in Table 1.
The surface roughness prediction dataset consists of two subdatasets, which were combined to form the prediction data. Table 1 shows all 35 measurement results, with seven attributes. These attributes comprise six independent variables (feed rate, depth of cut, spindle speed, and the respective vibrations of the X-, Y-, and Z-axes) and one dependent variable: surface roughness.
To measure the roughness of the finished workpiece, the white light interferometry method was used with a Zygo NewView 5000 device with a vertical resolution of 0.1-0.2 nm. The surface roughness of the finished workpiece was measured at three different positions. The measured values of surface roughness shown in Table 1 are the root mean square of the three results obtained from the interferometer measurements.

Methodology.
The data used in this study were examined using exploratory data analysis (EDA). The resulting dataset was standardized by the log(1 + x) function to ensure a consistent scale and distribution for all variables. The input data were divided into training and testing datasets representing 78% and 22% of the data, corresponding to 27 trials for training and 8 trials for testing. The training covered six regression methods: multilayer perceptron neural network (MLP-NN), SVR, CAT, XGB, DTR, and GBR. The testing dataset was used to validate the models. To assess the performance of the proposed models, various error metrics (RMSE, MAE, and R²) were employed. Figure 2 presents the flowchart of the model used in this paper: dataset extraction, feature selection, and different combinations of the dataset made to predict surface roughness. The figure also shows how the model parameters are fine-tuned using the Bayesian optimization algorithm to seek the best parameters for each model and then to determine the best model for predicting Ra. Finally, the results and observations are discussed in Section 3.
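The preprocessing steps described above can be sketched as follows. The synthetic arrays, ranges, and random seed below are illustrative assumptions standing in for the 35-run dataset of Table 1, not the study's actual values.

```python
import numpy as np

# Illustrative stand-in for the 35-run dataset (6 inputs, 1 output);
# the real values come from Table 1 of the source study.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(35, 6))   # feed rate, depth, speed, VX, VY, VZ
y = rng.uniform(0.1, 2.0, size=35)        # surface roughness Ra

# Standardize with the log(1 + x) transform used in the study
X_scaled = np.log1p(X)

# 78%/22% split: 27 training trials, 8 testing trials
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:27], idx[27:]
X_train, y_train = X_scaled[train_idx], y[train_idx]
X_test, y_test = X_scaled[test_idx], y[test_idx]
```

With real data, the same transform and split indices would simply be applied to the measured Table 1 columns.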

Artificial Neural Network (ANN).
ANNs are complicated computational models inspired by biological neural networks, which are capable of regression, classification, and pattern recognition.
There are different ANN-type algorithms, such as backpropagation neural networks [24], probabilistic neural networks [25], convolutional neural networks [26], time-recurrent neural networks [27], and long short-term memory networks [28]. Owing to its straightforward and well-established characteristics, the multilayer perceptron (MLP) model has been chosen in this study. An MLP consists of three main, fully connected layers: the input layer, hidden layer, and output layer [29]. Thanks to these properties, the MLP has been used to predict tool flank wear and surface roughness [30][31][32].
The main advantages of the ANN model are its capability to work with any type of input data (complete or incomplete) [33], the fact that information is stored across the entire network rather than in a database, and its capability for parallel computing, which helps to reduce computational time. On the other hand, some drawbacks of this approach should be mentioned, such as its hardware dependence, which requires substantial computer resources when the input data are large. In addition, the behavior and training duration of the network are hard to control, which requires many trials.
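A minimal MLP of the kind described above can be set up with scikit-learn. The (32, 16) hidden-layer sizes mirror those used later in the paper's activation-function study; the noiseless synthetic linear data and the "lbfgs" solver are assumptions chosen so the sketch trains reliably, not the study's settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 3))                        # toy inputs (e.g., feed, depth, speed)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2]    # noiseless linear target

# Two fully connected hidden layers (32, 16) feeding a single output
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), activation="identity",
                   solver="lbfgs", max_iter=5000, random_state=0)
mlp.fit(X, y)
```

On this easy target the network essentially recovers the linear map; real Ra data would of course require a train/test split and tuning.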

Support Vector Machine (SVM)
Support Vector Machine (SVM) theory was developed by Cortes and Vapnik [34], and a version of the SVM for regression was introduced in 1997 [35]. The Support Vector Machine family comprises different branches, of which Support Vector Regression (SVR) is an important application. Many ML algorithms follow the principle of empirical error minimization, while SVR follows the principle of structural risk minimization across a restricted range of learning patterns, so it can achieve better generalization [36]. SVR is a distinguished analytical tool: it uses linear optimization techniques to find optimal solutions to nonlinear prediction problems in a higher-dimensional feature space. Therefore, it has been widely used for forecasting in fields such as finance, agriculture, hydrology, and the environment, and especially in mechanical machining [21,44,45]. Consequently, SVR is an appropriate model for predicting Ra in the ultraprecision machining of PC for optical applications.
In terms of advantages, SVM works well with high-dimensional input spaces and is relatively memory efficient. In terms of drawbacks, unlike ANN, SVM is not suitable for large datasets, and it does not perform well on every type of data (for example, noisy datasets).
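The kernel-based treatment of nonlinear problems described above can be illustrated with scikit-learn's SVR. The sine target and the C and epsilon values below are illustrative choices, not the study's tuned settings.

```python
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0.0, 2.0 * np.pi, 80).reshape(-1, 1)
y = np.sin(X).ravel()                    # nonlinear target

# The RBF kernel implicitly maps the input into a higher-dimensional
# feature space, where the regression problem becomes (near-)linear
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01)
svr.fit(X, y)
```

The epsilon parameter sets the width of the insensitive tube around the fit, and C trades off flatness against training error.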

Cat Boost Regression (CAT).
Most popular implementations of gradient boosting use decision trees as base predictors. It is convenient to use decision trees for numerical features, but, in practice, many datasets embody categorical features that are vital for prediction. CAT is a novel gradient boosting technology developed by Yandex. It provides an improved implementation within the gradient-boosted tree algorithmic framework. This framework relies on a symmetric decision tree algorithm with few parameters, support for categorical variables, and high accuracy [46]. CAT improves the accuracy of the algorithm and its generalizability [47]. It has been successfully applied in many fields, such as weather forecasting, media popularity prediction, evapotranspiration, and biomass estimation [48,49]. For this reason, the model is applied here to predict the performance of ultraprecision machining. The main advantage of gradient boosting approaches is their predictive accuracy compared to other machine learning models. These approaches offer a lot of flexibility: they can be optimized with different loss functions and provide several hyperparameter tuning options that make the function fit flexible. They can work directly with the input data, meaning that no data preprocessing is required. In terms of drawbacks, gradient boosting approaches are usually computationally expensive. The minimization of errors in these approaches can cause overfitting, and the parameters have a heavy influence on the behavior of the approach.

Decision Tree Regression (DTR).
Decision trees (for classification and regression) are classic ML algorithms. As a group, their learning ability is not outstanding, but they are well known for their generalizability and feature filtering. When used for regression tasks, they are called regression trees [50]. As the number of iterations increases, the model continues to learn. Training stops when a criterion set by hyperparameters is triggered, such as the number of selected features, the maximum depth of the tree, or the minimum sample size of branches.
Compared to other algorithms, which require data preprocessing, the decision tree algorithm requires less effort in this process: a decision tree does not require normalization or scaling of the data, and the algorithm can also work with incomplete data. In terms of disadvantages, a small change in the data can cause large changes in the structure of the decision tree, which can eventually cause instability. The computational cost of a decision tree is often high, especially when training the model.
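The stopping hyperparameters mentioned above (maximum tree depth, minimum leaf size) appear directly in scikit-learn's regression-tree API; the piecewise-constant toy target below is an illustrative assumption, chosen because a tree represents it exactly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = np.where(X.ravel() < 0.5, 0.0, 1.0)    # step function: ideal for a tree

# Growth stops once max_depth is reached or a split would leave
# fewer than min_samples_leaf samples in a leaf
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=2, random_state=0)
tree.fit(X, y)
```

Note that no scaling of X was needed, matching the preprocessing advantage described above.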

Gradient Boosting Regression (GBR).
Gradient Boosted Trees (GBTs) are an ensemble of DTs whose result combines the predictions of the base models. DT-based ensembles like GBT have often been used in regression and classification problems, as they perform well [51]. GBT is an iterative algorithm, which means that each tree can take account of the error of the previous one. The final result of the GBT is the mean of the predicted results from all trees.
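The iterative error-correcting behavior described above can be observed through scikit-learn's staged predictions, where each stage adds one more tree to the ensemble. The synthetic data and hyperparameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 2))
y = X[:, 0] ** 2 + np.sin(3.0 * X[:, 1])

gbt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                max_depth=2, random_state=0)
gbt.fit(X, y)

# Each successive tree corrects the residual error of the ensemble so far,
# so the training error shrinks as stages accumulate
errors = [mean_squared_error(y, pred) for pred in gbt.staged_predict(X)]
```

Plotting `errors` against the stage index would show the characteristic monotone decay of boosting on training data.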

Extreme Gradient Boosting Regression (XGB).
XGB is among the most popular ML algorithms, developed in 2015. Regardless of the data type, it is well known to provide better solutions than many other ML algorithms because of its speed, efficiency, and scalability [52,53]. It has been the focus of research in various fields [54][55][56]. In particular, in mechanical machining [52,57,58], XGB is a good choice for predicting tool wear and surface roughness. XGB is used for supervised learning problems, where training data (with multiple features) x_i are used to predict a target variable y_i. The analytical formulas for the regression on which XGB is based are given below [52,59].
Let D = {(x_i, y_i)} be a dataset composed of n samples and m features, with |D| = n, x_i ∈ R^m, and y_i ∈ R. A tree ensemble model uses K additive functions f_k to evaluate the prediction function φ(x_i), which can be expressed as [52]

φ(x_i) = Σ_{k=1}^{K} f_k(x_i), f_k ∈ F, (1)

where K represents the number of trees, x_i is the i-th training sample, and f_k denotes the decision rules of the tree and the weights of the leaf scores. The regression space F can be expressed as follows [52]:

F = {f(x) = w_q(x)}, q: R^m → T, w ∈ R^T, (2)

with q, T, and w representing the tree structure, the number of leaf nodes, and the corresponding leaf weights, respectively. The errors of the model can be minimized using a regularized objective function [52]:

L(φ) = Σ_i ψ(y_i, ŷ_i) + Σ_k Ω(f_k), (3)

with ψ being a differentiable convex loss function, which for regression can be the mean squared error, while Ω is the regularization term that penalizes the complexity of the model to avoid overfitting, defined as [52]

Ω(f_k) = γT + 0.5λ‖w‖². (4)

Here γ denotes the complexity penalty, and λ is a constant coefficient. Since XGB is an additive algorithm, the prediction F_k(x_i) for the i-th instance at the k-th iteration can be expressed as follows [52]:

F_k(x_i) = F_{k−1}(x_i) + f_k(x_i). (5)

The main objective of XGB is to determine the additive functions f_k that minimize the objective function using the gradient descent optimization algorithm.
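As a quick numeric illustration of the regularization term and the additive prediction, consider the toy numbers below; they are assumptions chosen for clarity, not values from the study.

```python
# Regularization of one tree: Omega = gamma * T + 0.5 * lambda * ||w||^2
gamma_, lambda_ = 1.0, 0.5
leaf_weights = [0.2, -0.1, 0.4]          # w: scores of the T leaves of one tree
T = len(leaf_weights)                    # number of leaf nodes
omega = gamma_ * T + 0.5 * lambda_ * sum(w * w for w in leaf_weights)

# Additive prediction: F_k(x) = F_{k-1}(x) + f_k(x)
F_prev = 0.8                             # ensemble prediction after k-1 trees
f_k = 0.05                               # correction contributed by the k-th tree
F_k = F_prev + f_k
```

Larger gamma_ or lambda_ penalizes deeper trees and larger leaf scores, which is how the objective discourages overfitting.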

Bayesian Algorithm Optimization.
In XGB and CAT, the main goal of hyperparameter optimization (i.e., tuning) is to minimize the regularized objective function defined above.
There are two popular hyperparameter optimization methods: random search (RS) and Bayesian optimization (BO). In random search, the hyperparameters are randomly chosen from the predefined search domain, and the search is independent of the previous boosting results [60,61]. The main advantage of RS is that it can be applied to high-dimensional problems. Bayesian optimization can be considered a probabilistic approach, using probability theory to optimize the hyperparameters [62]. In this work, we have chosen Bayesian optimization because of its performance, which has been demonstrated in previous studies in the literature [63,64]. For simplicity's sake, in the rest of the paper, we use XGB_opt and CAT_opt to denote the XGB and CAT models built using BO, respectively. Before GBR and XGB training, we need to initialize the models' hyperparameters. However, selecting hyperparameters based on experience, or by exhaustive searches such as GridSearchCV and RandomizedSearchCV, requires a large number of attempts. To optimize the models' performance and reduce computational time, the hyperparameter screening process itself must be optimized: this is the main purpose of using BO. The Bayesian optimization framework utilizes historical data to narrow the search domain and constantly update the posterior information [65]. In particular, suppose that we have a functional relation between the hyperparameters and the loss function:

p* = arg min_{p∈P} loss(p), (6)

where P is the set of all hyperparameter combinations, p* is the optimal parameter combination obtained from the final optimization, and loss(p) is the objective function. In the proposed model, the hyperparameters are the maximum tree depth (D), the number of nodes in each tree (c), the number of trees (K), the learning rate (η), the regularization parameter (λ), and the number of samples (N), as introduced in equations (1)-(4).
The loss function is defined by the RMSE as [65]

loss(p_j) = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i(p_j))² ), (7)

where p_j is the j-th hyperparameter combination, y_i is the true value, and ŷ_i(p_j) is the model output obtained using the j-th hyperparameter combination p_j. The next step of BO is to construct the dataset D = {(x_i, y_i)}, where x_i is the i-th set of hyperparameters and y_i is the corresponding error of the model output [65]. The posterior probability of the loss given the observations is P(y|x, D). A Gaussian process is used as the surrogate model M, whose mean and covariance are denoted µ and K, respectively; the specific functional form of M is obtained by fitting the dataset D [65]. Based on M, the next observation is selected by maximizing the acquisition function α(p) [65]. In Bayesian decision theory, the acquisition function works by calculating the expected loss over the hyperparameter space. In each iteration, the dataset D is updated with the parameters and losses from the previous one. The main characteristic of BO is model construction based on historical data to optimize the hyperparameters for each model [66].
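The BO loop described above (fit a surrogate, maximize an acquisition function, update the dataset) can be sketched in plain NumPy. The squared-exponential Gaussian-process kernel, the expected-improvement acquisition, and the one-dimensional quadratic stand-in for loss(p) are all simplifying assumptions for illustration; real libraries handle many dimensions and noisy objectives.

```python
import numpy as np
from math import erf, sqrt, pi

def loss(p):                       # toy stand-in for the RMSE objective loss(p_j)
    return (p - 2.0) ** 2 + 1.0

def rbf(a, b, ls=0.7):             # squared-exponential GP kernel
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(p_obs, y_obs, p_cand, jitter=1e-6):
    m = y_obs.mean()               # constant prior mean
    K = rbf(p_obs, p_obs) + jitter * np.eye(len(p_obs))
    Ks = rbf(p_obs, p_cand)
    mu = m + Ks.T @ np.linalg.solve(K, y_obs - m)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):   # acquisition alpha(p), minimization form
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.array([erf(v / sqrt(2.0)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * Phi + sigma * phi

cand = np.linspace(0.0, 5.0, 101)            # discretized search domain
p_obs = np.array([0.0, 5.0])                 # initial observations
y_obs = loss(p_obs)
for _ in range(12):                          # BO iterations: fit, acquire, update D
    mu, sigma = gp_posterior(p_obs, y_obs, cand)
    p_next = cand[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    p_obs = np.append(p_obs, p_next)
    y_obs = np.append(y_obs, loss(p_next))
p_star = p_obs[np.argmin(y_obs)]             # best hyperparameter found
```

After a handful of evaluations the loop concentrates its samples near the minimum at p = 2, which is the behavior that makes BO cheaper than grid or random search on expensive objectives.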

Performance Assessment Criteria.
Three statistical metrics have been used in this study to assess the performance of the proposed AI models in predicting surface roughness: root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). These metrics are defined as follows [67][68][69][70]:

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² ),
MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|,
R² = 1 − Σ_{i=1}^{N} (y_i − ŷ_i)² / Σ_{i=1}^{N} (y_i − ȳ)²,

where y_i and ŷ_i are the measured and predicted values, respectively, and N is the total number of predicted data points. Higher values of R² indicate better model performance, whereas better performance is reflected by lower values of RMSE and MAE. When contrasting the metric values, we prioritize RMSE as the evaluation criterion, because it is more suitable than MAE when the model error follows a normal distribution. Moreover, RMSE has the advantage over MAE of avoiding the absolute value, which is undesirable in many mathematical calculations [71]. Therefore, when comparing the prediction accuracy of various regression models, RMSE is a good choice, as it is simple to calculate and differentiable. Moreover, a higher value of R² is considered desirable.
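The three metrics can be written directly from their definitions; the sample values below are made-up numbers used only to check the formulas.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def r2(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
    return float(1.0 - ss_res / ss_tot)

y_true, y_pred = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
```

For this toy pair, only the last prediction is off by 1, giving RMSE = 0.5, MAE = 0.25, and R² = 0.8.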

Prediction Accuracy of Various Baseline Models.
The performance of the MLP-NN, SVR, CAT, XGB, DTR, and GBT baseline regression models in predicting Ra for diamond ultraprecision turning is reported in this section. Table 2 shows the results of the various models on the training and testing datasets, sorted in ascending order of RMSE for the testing dataset. It can be seen that the predicted Ra varies considerably from one model to another. On the testing dataset, XGB exhibits the best performance in terms of all error metrics: it yields the highest value of R² and the smallest values of RMSE and MAE. On the training dataset, XGB exhibits similar performance, with the highest R². The GBT model performs best in terms of RMSE and MAE on the training dataset. By contrast, DTR exhibits the worst overall performance of all the models.

Mathematical Problems in Engineering
In this work, a comparison between different activation functions for the MLP-NN model was performed with the initial parameters: hidden layers of sizes (32, 16), the "Adam" optimizer, and the activation functions "relu", "identity", "sigmoid", "tanh", and "logistic". Table 3 shows the results of the different activation functions on the assessment criteria for the training and testing datasets, respectively. It should be noted that, in the present study, the parametric study was conducted only on the activation functions.
As shown in Table 3, the "identity" activation function exhibits the best performance in terms of all error metrics: this variant yields the highest value of R² and the smallest values of RMSE and MAE. Figure 3 illustrates the performance of the six models on these metrics for the training dataset. Each bar color corresponds to one metric, and the ranking of the models is marked above each bar. As can be seen from Figure 3, there is a slight difference between the rankings for the training and testing datasets. On the training dataset, the accuracy ranking with respect to RMSE is GBT, XGB, DTR, CAT, MLP-NN, and SVR, and with respect to MAE the order is the same: GBT, XGB, DTR, CAT, MLP-NN, and SVR. The ranking with respect to R² is XGB, GBT, DTR, CAT, MLP-NN, and SVR.
However, Figure 4 shows that there is a small difference in the performance ranking when using the testing dataset. Furthermore, the ranking varies as a function of the error metric. The accuracy ranking by RMSE, in increasing order, is XGB, CAT, GBT, MLP-NN, SVR, and DTR; by MAE, it is XGB, GBT, CAT, DTR, MLP-NN, and SVR; and by R², it is XGB, CAT, MLP-NN, GBT, SVR, and DTR. Considering the performance on both the training and testing datasets, it can be concluded that XGB yielded the best performance, whereas DTR exhibited the worst performance of these models. The line and scatter plots of the measured Ra and the values predicted by the six ML baseline models on the training and testing datasets are presented in the corresponding figures. However, as discussed above, RMSE is the preferred criterion for selecting the better models on the testing dataset. Therefore, to continue improving the prediction accuracy for Ra, we choose two models, XGB and CAT, for the next procedure, in keeping with the approach illustrated in Figure 2.

Description of the Optimization Problems.
The XGB and CAT methods require hyperparameter tuning to prevent overfitting and improve model performance. Table 4 presents the hyperparameters of the XGB and CAT models. As mentioned above, XGB and CAT are regression methods with a large number of hyperparameters. The values of these parameters are crucial and must therefore be carefully selected. However, to date, only heuristic methods have been put forward.
As discussed in Section 2, overfitting in XGB and CAT is mitigated by optimizing their hyperparameters using the Bayesian optimization approach. To save computational time, this paper focuses only on the hyperparameters that have a significant effect on model performance, as found in previous studies.
The two optimization problems (one for XGB and one for CAT) of this study are described in Table 4. For XGB, the decision variables are learning_rate, max_depth, subsample, colsample_bytree, reg_alpha, max_leaves, gamma, and min_child_weight. For CAT, the decision variables are learning_rate, depth, bagging_temperature, and num_leaves. The lower and upper bounds of these variables are also given in Table 4. In this work, the objective of the optimization problems is to minimize the RMSE between the predicted and experimental data points.
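A BO search space over the decision variables listed above might be written as the dictionaries below. The bound values are hypothetical placeholders: the study's actual lower and upper bounds are those given in Table 4.

```python
# Hypothetical (min, max) bounds; the study's true bounds are in Table 4.
xgb_space = {
    "learning_rate":    (0.01, 0.3),
    "max_depth":        (2, 10),
    "subsample":        (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
    "reg_alpha":        (0.0, 1.0),
    "max_leaves":       (2, 64),
    "gamma":            (0.0, 5.0),
    "min_child_weight": (1, 10),
}
cat_space = {
    "learning_rate":       (0.01, 0.3),
    "depth":               (2, 10),
    "bagging_temperature": (0.0, 1.0),
    "num_leaves":          (2, 64),
}
```

A BO routine would sample candidate combinations from these ranges, evaluate the RMSE objective for each, and update its surrogate model accordingly.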

Hyperparameter Tuning with Bayesian Optimization.
The search domain settings of the Bayesian optimization algorithm are derived from historical data and from initial tests [72][73][74]. For other hyperparameters, such as "n_estimators" (number of boosted trees), "min_child_samples" (minimum number of data points needed in a leaf), and "subsample_for_bin" (number of samples for constructing bins), the default settings in Python have been applied [54]. Table 5 presents the evolution of the optimization procedure for XGB, for instance, whereas Table 6 indicates the best value found for each hyperparameter of the two methods, XGB and CAT, respectively.
As discussed above, the objective of this study is to find the best model for predicting surface roughness. Therefore, XGB and CAT have been selected for performance comparison, with the Bayesian optimization algorithm applied to each model. Table 7 compares the RMSE, MAE, and R² of XGB and CAT using the best hyperparameters found by BO against the defaults used in the baseline models. This comparison reveals the effect of the hyperparameters on the models' metrics. XGB_opt achieves the lowest RMSE and MAE, as shown in Figures 7 and 8. Moreover, its R² increases from 0.9999 to 1 on the training dataset and from 0.7227 to 0.8924 on the testing dataset. The line and scatter plots of the measured Ra and the values predicted by the optimized models XGB_opt and CAT_opt on the training and testing datasets are presented in Figures 9 and 10 for each trial, respectively. The different colors of the scattered points in the figures represent the values predicted by the different models. On both the training and testing datasets, the predictions of the XGB_opt model are the closest to the original measured values of Ra, while the values predicted by the CAT model do not correspond as closely to the measured Ra in either dataset.

Discussion
With the rise of modern technology, the equipment in the production of PC for optical applications by single-point diamond turning maintains a stable operating state, so the production process is also carried out stably. Therefore, for given strip specifications, most turning processes will produce relatively stable datasets without wide variability. The key point of prediction models is to improve prediction performance. The main goal of this paper is to determine the optimal predictive model for surface roughness by comparing the performances of different models. In addition, in order to preserve the principal characteristics of the considered models, we have not coupled those models with other optimization algorithms in this paper. We then performed hyperparameter tuning using BO to find the best model.
By executing XGB, we can determine which features contribute most heavily to the result; in short, the importance of each feature can be computed and plotted. The feature importance graphs for each dataset are plotted in Figure 11. From Figure 11, we can clearly see that the Z-axis vibration is the most important feature contributing to the prediction of the results, followed by the Y-axis vibration and the X-axis vibration. This result is in accordance with Kara and Bayraktar [75], in which experiments showed that the surface roughness increased with increasing vibration of the machining tool (i.e., with increasing cutting speed). For instance, in Kara and Bayraktar [75], the cutting speed accounted for 42.14% of the influence on the surface roughness according to the analysis of variance. However, it is worth noting that the converse is not confirmed (a low vibration value does not imply good surface quality) [23]. As also shown in Figure 11, the feed rate is a crucial variable that strongly affects the surface roughness of the material. Kara and Bayraktar [75] and Krolczyk et al. [10] found similar results in their works: a low feed rate yields a small surface roughness and vice versa. It should also be noted that the feed rate influences both the surface roughness and the energy consumption; this consideration has not been addressed in the present study. Moreover, another limitation of the present study is the lack of confirmation tests; such procedures should be carried out in order to assess the performance of the prediction model. Finally, analysis of variance should be performed in further studies to reveal how the input variables influence each other and the surface roughness. Such information is crucial for controlling the turning process.

Conclusions
In this work, six machine learning models, namely, ANN, SVR, CAT, DTR, GBT, and XGB, have been applied to predict surface roughness in the SPDT process. We have compared the performances of the six models using experimental data. Three quality assessment metrics (RMSE, MAE, and R²) have been used to evaluate the performance of each model. Subsequently, hyperparameter optimization using BO has been applied. The results are summarized as follows: (1) The comparison of results has shown that there is little difference in ranking when using the testing RMSE. For example, the values of RMSE for XGB, CAT, GBT, MLP-NN, SVR, and DTR are 0.4033, 0.4704, 0.4812, 0.4829, 0.5032, and 0.5712, respectively. Overall, however, it has been concluded that XGB and CAT perform better, followed by GBR, SVR, and DTR, respectively. (2) The results of hyperparameter tuning with BO show some inconsistency across the evaluation metrics. XGB_opt performs best in terms of RMSE, MAE, and R² on both the training and testing datasets. It can be seen that using BO has significantly improved the performance of the two considered models, especially CAT. Indeed, the value of R² for CAT has increased from 0.9565 to 0.9893 on the training dataset and from 0.6229 to 0.7355 on the testing dataset, and the value of MAE for CAT has decreased from 0.1224 to 0.0579 on the training dataset and from 0.3503 to 0.2870 on the testing dataset.
The value of RMSE for CAT has decreased from 0.1594 to 0.0792 on the training dataset and from 0.4704 to 0.3940 on the testing dataset. (3) Comprehensively considering the prediction accuracy of the six models, XGB appeared to be the best model for predicting surface roughness. Indeed, this model has shown strong performance in terms of the different quality estimators. For example, XGB exhibited the highest values of R² for both the training and testing datasets. In terms of the other estimators, RMSE and MAE, XGB also produced among the best values. In addition, XGB performs best when combined with the Bayesian optimization algorithm. (4) We can clearly see that machine vibration is the most important feature contributing to the prediction of the results, especially the Z-axis vibration, followed by the Y-axis and X-axis vibrations; the least important feature is the depth of cut. (5) Overall, out of the six initially proposed models, we have succeeded in filtering out the two best models, based on different quality estimators (RMSE, MAE, and R²), for predicting the surface roughness in turning polycarbonate for optical applications. In addition, we have improved the performance of these two best models using the Bayesian optimization algorithm. Certainly, further studies should be conducted to extend the results of the present paper. From an experimental point of view, confirmation tests should be carried out in order to test the performance of the proposed method in real conditions. Besides, working parameters such as the feed rate, depth of cut, spindle speed, and cutting speed should be related to the surface roughness by an explicit equation, providing easy and direct application by engineers and researchers in practice.
This means that the proposed machine learning model should be employed to derive such an equation in further research. From a computational point of view, uncertainty quantification should also be performed in order to propagate the variability of the experimental database. Besides, the number of data points for training and testing the models was not large, which may cause some unexpected effects in the results. For this reason, more data points should be collected in further studies, or additional experimental tests should be conducted by the researchers' group.

Data Availability
The Excel data used to support the findings of this study are available from the corresponding author upon request.