Forecasting the Friction Coefficient of Rubbing Zirconia Ceramics by Titanium Alloy

The thermal issues generated from friction are the key obstacle in the high-performance machining of titanium alloys. The friction between the workpiece being cut and the cutting tool is the dominant parameter that affects the heat generation during the machining processes, i


Introduction
One of the main reasons for energy waste in milling machines is friction [1]. Friction is the key reason for the heat generated during the metal cutting process [2,3]. However, friction is inherent to any rubbing surface. It is usually uncontrolled and undesirable because of the collateral damage it causes, such as tool deterioration, thermal damage to surfaces, and unwelcome energy consumption. It has been reported that about 20% of the energy lost in the world is consumed through friction [4]. Thus, building a predictive model to forecast the behavior of friction experiments is crucial for saving the time and cost of such experiments (e.g., those run on a universal mechanical tester (UMT) pin-on-disc universal tribometer). The UMT pin-on-disc universal tribometer is one of a wide range of machines used to perform frictional tests.
Nonetheless, friction as a phenomenon is greatly affected by many parameters, such as the applied load, sliding velocity, rubbing materials, temperature, and humidity [5,6]. Therefore, predicting the friction coefficient is a complicated task. An efficient model for predicting friction coefficients based on machine learning or deep learning can thus be key to estimating the operating life of the cutting tool, estimating the critical contact temperature (generated by friction) to avoid thermal damage, and sustaining energy efficiency by keeping the friction coefficient within specific ranges.
Efforts to predict the parameters related to friction experiments have been made in a few studies [7][8][9]. All of these efforts rely on a numerical method to simulate friction experiments. To the best of the authors' knowledge, no research has predicted the friction coefficient and energy consumption during machining operations, and none has utilized machine learning techniques for friction coefficient prediction. An efficient friction coefficient prediction model can reduce machining costs by decreasing the number of products that are defective due to thermal damage and can reduce tooling costs by extending the tool's operating life. Furthermore, energy consumption is closely tied to the friction coefficient. Thus, maintaining the friction coefficient within specific limits can decrease energy consumption significantly, which benefits the economy, the environment, and energy savings. These points motivated the current work to predict the friction coefficient.
There are several applications for forecasting the friction coefficient in a cutting machine [6,10]. For instance, a lubrication system integrated into the machining cycle could be implemented, in which a closed-loop control system keeps the friction coefficient within the safe limit. The corresponding forecasting model is responsible for evaluating the friction coefficient and forecasting when it will exceed the safe limit. The control system can then readjust the lubrication parameters (lubricant pressure, flow rate, etc.) to lessen the friction before the tool and the workpiece fail.
In another example, forecasting the friction coefficient can help reduce the power consumption of the experiment. For instance, if the cutting machine works for 20 minutes, then the force sensor must also work for 20 minutes to monitor the friction coefficient. With a forecast model, the force sensor and the forecast model can work alternately (e.g., the force sensor works for only ten minutes); while the forecast model is working, the force sensor can be turned off. In addition, turning off the force sensor for a while and then reusing it improves the sensor's accuracy.
In this context, we proposed framing the task of forecasting the friction coefficient in UMT pin-on-disc universal tribometer experiments as a time series forecasting task. The friction coefficient changes over time in frictional experiments. Thus, we proposed designing models that capture how the friction coefficient values change over time and then forecast future values. In the first step, we generated a dataset of friction coefficient values. These values were obtained by attaching force sensors to the UMT pin-on-disc tribometer and conducting an experiment to collect the friction coefficient values at different time points (e.g., every millisecond).
In this vein, we proposed designing a predictive model of the friction coefficient of milling machines. We started by recording the friction coefficient between the titanium alloy (i.e., Ti-6Al-4V) and zirconia ceramics (i.e., ZrO2). Then, we proposed utilizing the autoregressive integrated moving average (ARIMA) statistical model for forecasting purposes. The main drawback of statistical models is that they offer no mechanism for updating a fitted model. Therefore, if new patterns appear in the data, the statistical model must be rebuilt. On the other hand, deep learning (DL)-based forecasting models support updating the model's weights as the data pattern changes. Thus, we proposed harnessing DL methods for their ability to be updated and to identify nonlinear and complex patterns in different domains (e.g., weather forecasting [11], energy markets [12], and e-commerce products [13,14]), where historical observations of a variable are analyzed to build a model that describes the underlying relationship. The proposed DL-based model is built using the gated recurrent unit deep neural network (GRU-DNN) architecture, a variation of the well-known recurrent neural network (RNN) architecture, which has shown impressive performance in the time series forecasting field [15,16].
The contributions of this work can be summarized as follows: (1) To our knowledge, we proposed the first DL-based model (i.e., the GRU-DNN model) and the first statistical model (i.e., the ARIMA model) for forecasting the friction coefficient in cutting machines. The proposed GRU-DNN architecture and the ARIMA model's parameters were designed to fit friction coefficient data patterns. The GRU-DNN model is intended for scenarios where the forecast model needs to be updated. (2) This work provides a publicly available friction coefficient dataset that can be used to improve the task of predicting the friction coefficient. (3) The proposed forecasting models are evaluated thoroughly using four different scenarios and four different evaluation metrics. The obtained results demonstrate the efficiency of the proposed solution in forecasting accurate friction coefficient values.
The rest of the paper is organized as follows. Section 2 discusses the existing machine learning-based methods used for predictive tasks related to metal cutting tools. Section 3 presents the background. Section 4 describes the proposed methodology. The evaluation of the proposed prediction models is presented in Section 5. A discussion is given in Section 6. Finally, the paper is concluded in Section 7.

Related Work
This section discusses the research efforts undertaken to explore phenomena associated with cutting tools. The use of machine learning methods in these efforts varies, with some studies using such algorithms and others not [17][18][19]. These phenomena contribute substantially to various difficulties, including reduced dimensional accuracy of the cut surface, tool breakage, and machine downtime. Song et al. [17] proposed a predictive model for estimating cutting forces in carbon fiber reinforced polymer (CFRP) materials based on nonlinear regression analysis. Saha et al. [18] presented an energy-based model for the prediction of cutting forces in machine operations. The authors aimed to examine the factors that contribute to the onset of adhesion in the context of progressive tool wear. Geng et al. [20] introduced an enhanced predictive model for estimating the torque and thrust force for GJV450 under conditions of elevated temperature, large deformation, and significant strain rate. The proposed model relies on a calculus approach to depict the variation of partial working angles during tool machining.
Recently, machine learning algorithms have been widely recognized for their capability to address nonlinear and complex interactions through training on data. Several studies have used machine learning algorithms in tribological research to analyze and assess various phenomena, such as cutting force or cutting tool lifetime, in order to improve machine efficiency [21]. The thermal effects on machine operations have also been predicted using machine learning models [22,23].
Saravanan et al. [22] presented a nonmechanical engineering model that included several learning approaches, namely, logistic regression (LR), k-nearest neighbors (KNN), and random forest (RF). The primary objective was to forecast the thermal efficiency of a c-shaped finned solar air heater (SAH). Meanwhile, Zhang et al. [8] proposed an analytical model to investigate the rake face temperature distribution of coated cutting tools in the machining of H13 hardened steel. Furthermore, Singh et al. [23] examined the ability of machine learning to forecast the maximum temperature in an elastohydrodynamic lubricated (EHL) line contact. Operating the EHL system under excessive loads or rates can elevate temperatures, increasing the likelihood of unexpected system failure [24]. Integrating a neural network model with a machine learning model resulted in a high level of accuracy, approximately 0.998. This integration also enhances the model's capability to capture the nuances of the EHL system.
The authors in [25] used supervised machine learning regression-based techniques to predict the ultimate strength friction (USF) of stir welded magnesium joints. The XGBoost algorithm exhibited superior accuracy, with a coefficient of determination of 0.816, and outperformed other machine learning models such as decision tree (DT), RF, and AdaBoost. However, additional investigation is required to fully understand the influence of other output parameters on stir welded joints, not only the USF output parameter. Furthermore, the authors of [26] utilized various machine learning techniques to make predictions regarding the collective characteristics, including tool failure diagnostics, and real-time control of friction stir welding (FSW).
The authors in [27] conducted a series of experiments on an annealed Ti-6Al-4V alloy to assess the efficacy of their proposed machine learning approach for predicting cutting force. They used a support vector machine (SVM) classifier with a polynomial kernel to determine the correlation between the properties of the cutting force signal and tool wear. The classifier's accuracy and F1-score were 91.43% and 86.94%, respectively, as reported in the study.
Diaz-Rozo et al. [28] devised a diagnostic tool to evaluate the performance of machine spindles using three clustering algorithms. Their study focused on analyzing machine spindle behavior patterns with clustering algorithms in an unsupervised manner. By examining the collected instance data, significant information about the spindle can be extracted, and the spindle data are partitioned into distinct groups based on their inherent characteristics. Krishnakumar et al. [29] proposed a tool monitoring system that uses a classification model to categorize and monitor tool conditions in a high-speed precision milling center. Statistical features such as count, amplitude, and mean are extracted from an acoustic emission sensor signal. The dominant features with maximum entropy are then selected for classification using DT and SVM models.
Lawrence et al. [30] examined the impact of tool vibration and cutting parameters on a buffered impact damper (BID)-assisted boring process using artificial neural networks (ANNs). The ANN model demonstrated a high level of accuracy in predicting various aspects of cutting performance, including surface roughness, tool vibration, and cutting force. The experimental results indicated significant reductions in cutting force, surface finish, and tool vibration of 85%, 95%, and 93%, respectively.
The prediction of cutting tool lifetime and torque values relies on the use of torque data as a dependable indicator [31]. Oberlé et al. [31] developed a regression model to forecast tool wear using torque data recorded from the machining center together with direct tool wear measurements. The regression model, based on the random forest (RF) technique, achieved an R² score of 74%. In [32], tool wear throughout the machining process was predicted using a convolutional neural network (CNN) approach.

Gap Analysis.
The literature discussed above outlines current efforts to apply different machine learning approaches to several related problems, such as predicting tool wear and tool lifetime. Despite these efforts, we did not find any evidence in the literature of work that predicts the friction coefficient during metal cutting procedures using a learning-based model. Thus, deep learning and statistical models need to be applied to this task and their accuracy levels investigated.

Parameters Affecting the Friction.
In the following text, the parameters that noticeably affect the friction are discussed. Since the nanofluid is delivered to the contact area in the form of droplets, the contact angle and the surface tension of the droplets control the efficiency of the lubrication. For instance, at a GNPs content of 0.03 wt.%, the contact angle jumps to 34.475° [1]. Higher GNPs contents of 0.1, 0.2, 0.3, and 0.4 wt.%, which recorded 36.89°, 40.73°, 43.47°, and 48.51°, respectively, displayed the same pattern. This trend is explained by the propensity of nanoparticles to congregate in nanofluid droplets at larger concentrations. As a consequence, the surface free energy increases [33]. Additionally, as concentrations rise, the surface-to-volume ratio rises as well. As a result, the electrostatic repulsive force is outweighed by the attractive van der Waals force, which raises the surface energy and contact angle [34,35]. Consequently, with larger GNPs concentrations, the nanofluids' wettability region shrinks. The GNPs reduce the impact of the higher contact angle and maintain a strong antifriction layer between the workpiece and the grinding wheel. The normal and tangential forces are significantly decreased as a result of the antifriction layer that the PG0.03 droplets create. The same general behavior is observed for all nanofluids derived from palm oil with GNP levels ranging from 0.03 to 0.4 wt.%. However, the increased contact angles and relatively higher viscosity of the palm oil-based GNPs nanofluids prevent them from spreading easily across the surface of the workpiece. Because of this, there is direct contact between the grinding wheel and the workpiece in some areas, and the surface roughness of the machined surface rises as a result.
The applied load plays an unspecified role in the friction. Therefore, the applied load is usually selected based on the application and frictional test standards [36]. Interestingly, the lubrication medium is a dominant parameter in the friction operation, since the medium used here is palm oil, a member of the class of oils composed of triglycerides and fatty acids. The strong polarity of palm oil (−COOH in fatty acids and −COOR in triglycerides) makes it a great lubricant [37].
In this section, we briefly review the two forecasting models used in the proposed forecasting problem. Two classes of forecasting methods, namely, a statistical approach (i.e., the ARIMA model) and a DL-based approach (i.e., the GRU-DNN model), are proposed for forecasting the friction coefficient of milling machines.

ARIMA.
The ARIMA model is a popular and widely used linear model in time series forecasting [38], thanks to its statistical properties and the Box-Jenkins methodology [39] used in its building process. The ARIMA model is composed of three different types of time series, namely, pure autoregressive (AR), pure moving average (MA), and the combination of AR and MA (ARMA) series. A pth-order autoregressive model, AR(p), is given by equation (1). Similarly, a qth-order moving average model, MA(q), is given by equation (2).
In the ARIMA model, the future value of a variable is considered to be a linear function of several past observations and random errors, where $c_0$ is a coefficient to be estimated. The random errors ($\epsilon_i$) are assumed to be independent and identically distributed.
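For reference, the textbook forms of the three series named above can be sketched as follows. This is standard Box-Jenkins notation rather than a reproduction of the paper's own equations: the symbols $\phi_i$, $\theta_j$, and $\sigma^2$ and the zero-mean, constant-variance assumption on the errors are assumptions introduced here; only $c_0$ and $\epsilon$ follow the text.

```latex
\begin{align}
\text{AR}(p):\quad y_t &= c_0 + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \epsilon_t \\
\text{MA}(q):\quad y_t &= c_0 + \epsilon_t + \sum_{j=1}^{q} \theta_j\, \epsilon_{t-j} \\
\text{ARMA}(p,q):\quad y_t &= c_0 + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j\, \epsilon_{t-j} \\
\epsilon_t &\overset{\text{i.i.d.}}{\sim} \left(0, \sigma^2\right)
\end{align}
```

Under this notation, an ARIMA(p, d, q) model applies the ARMA(p, q) form to the series obtained after differencing the data d times.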

Gated Recurrent Unit Neural Networks.
The recurrent neural network (RNN) is a deep learning method that has been used successfully in a wide range of applications [40][41][42]. Specifically, it has been applied to time series forecasting. It is a robust model that can learn a wide range of complex associations from vast amounts of data. However, the depth of the RNN results in two well-known problems: exploding and vanishing gradients. Therefore, two variations of the recurrent model, GRU [43] and LSTM [44], were introduced to address these problems with the vanilla RNN. The LSTM and GRU architectures are similar in design, both including a gating mechanism that regulates the information flow through the unit. Nevertheless, due to the complex structure of the LSTM, its training and convergence times are long. GRU-DNN has a simpler architecture than LSTM; thus, a GRU-DNN model is faster to train than an LSTM model [45].
The GRU model was introduced to allow recurrent units to capture patterns and dependencies at different time scales. Compared to the LSTM cell, GRU has no separate memory gate, which makes it more efficient and faster to train. Figure 1 depicts a standard cell architecture for a GRU model. A typical GRU is composed of a group of cells in which each cell includes two gates (i.e., update $z^{(t)}$ and reset $r^{(t)}$) and a hidden state vector denoted as $h^{(t)}$ at the current time point t. A one-layer neural network is used to express each gate. The GRU cell architecture is described by equations (6)-(9). The cell is fed two values, namely, the preceding hidden state and the current input sequence vector, denoted as $h^{(t-1)}$ and $x^{(t)}$, respectively. The cell output $h^{(t)}$ is the hidden state.
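The standard GRU formulation, written with the symbols used in this section, can be sketched as follows. It is a reconstruction from the common formulation rather than the paper's own typesetting; in particular, applying the reset gate to $h^{(t-1)}$ before multiplying by $U_h$ is an assumption.

```latex
\begin{align}
z^{(t)} &= \sigma\!\left(W_z x^{(t)} + U_z h^{(t-1)} + b_z\right) \\
r^{(t)} &= \sigma\!\left(W_r x^{(t)} + U_r h^{(t-1)} + b_r\right) \\
\tilde{h}^{(t)} &= \tanh\!\left(W_h x^{(t)} + U_h \left(r^{(t)} \odot h^{(t-1)}\right) + b_h\right) \\
h^{(t)} &= z^{(t)} \odot h^{(t-1)} + \left(1 - z^{(t)}\right) \odot \tilde{h}^{(t)}
\end{align}
```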
where $\odot$ represents element-wise multiplication; $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid and hyperbolic tangent activation functions of the NN, respectively; $\tilde{h}^{(t)}$ represents the candidate hidden state; $W_z$, $W_r$, and $W_h$ are the cell model's weight matrices for the feed-forward neural networks; $U_z$, $U_r$, and $U_h$ are the cell model's weight matrices for the recurrent neural network; and $b_z$, $b_r$, and $b_h$ are the model biases.
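To make the role of the update gate concrete, the following is a minimal scalar sketch of one GRU step using only the Python standard library. The weight values, the dictionary keys, and the helper names `sigmoid` and `gru_step` are hypothetical and chosen only for illustration; a real GRU layer operates on vectors and learned weight matrices.

```python
import math


def sigmoid(v):
    """Logistic sigmoid for a scalar input."""
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One scalar GRU step; p maps (hypothetical) weight names to floats."""
    z = sigmoid(p["Wz"] * x + p["Uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] * x + p["Ur"] * h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate scales how much of h_prev is used.
    h_cand = math.tanh(p["Wh"] * x + p["Uh"] * (r * h_prev) + p["bh"])
    # Interpolate between the previous state and the candidate state.
    return z * h_prev + (1.0 - z) * h_cand


# Illustrative weights (assumptions) that isolate the effect of the update gate.
base = {"Wz": 0.0, "Uz": 0.0, "Wr": 0.0, "Ur": 0.0, "br": 0.0,
        "Wh": 1.0, "Uh": 1.0, "bh": 0.0}
keep = dict(base, bz=20.0)       # z close to 1: previous state is kept
overwrite = dict(base, bz=-20.0)  # z close to 0: candidate state is used

h_keep = gru_step(0.5, 0.3, keep)
h_overwrite = gru_step(0.5, 0.3, overwrite)
print(h_keep, h_overwrite)
```

With a large positive update-gate bias the output stays at the previous hidden state, and with a large negative bias it moves to the candidate state, which is the interpolation behavior of the output equation.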
A GRU cell's output $h^{(t)}$, given by equation (9), forms a linear interpolation between the current candidate state $\tilde{h}^{(t)}$ and the previous hidden state $h^{(t-1)}$. This linear interpolation mechanism is mainly used for learning long-term dependencies. More precisely, as $z^{(t)}$ approaches 1, the preceding hidden state is kept unchanged for a few successive time steps. Conversely, as $z^{(t)}$ approaches 0, the cell output equals the candidate state $\tilde{h}^{(t)}$. In this case, the value of the candidate state depends highly on the current input and the previous hidden state. The candidate state also depends on the reset gate $r^{(t)}$, which forces the cell to exclude or keep the last hidden state.
and q, where p denotes the order of the autoregressive part, d is the number of differencing operations needed to convert the data series into a stationary series, and q is the moving average order. Additionally, if the data series has a seasonal trend, then seasonal parameters should also be considered by the model. Therefore, to obtain the ARIMA parameters (the nonseasonal parameters p, d, and q, and the seasonal parameters P, D, Q, and S), we perform a grid search (using the auto_arima method (https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.auto_arima.html)) over a search space, as presented in Table 1. In other words, several model parameter combinations are evaluated, and the combination that achieves the best fitting score is reported.

The Proposed Stacked GRU-DNN Model.
The proposed stacked GRU-DNN is depicted in Figure 2. The model architecture consists of an input layer, a GRU layer, a fully connected (FC) layer, and an output layer. The input layer accepts the model input, whereas the output layer has one neuron that produces the predicted value. The primary reason for using this structure is to employ a recurrent layer that can learn and model time series patterns in the dataset. In addition, the intermediate fully connected layers are beneficial for recombining the representation extracted by the preceding layers and for gaining supplementary representations at higher levels of abstraction.
Neural network models are prone to overfitting or underfitting, caused by training the model for too many or too few epochs [46]. One way to address these problems in a DL model is to apply an early stopping strategy. Under this strategy, training is halted when generalization performance keeps degrading over a number of successive epochs. To track the generalization performance, the training data are split into training and validation groups.
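The early stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `early_stopping_epoch`, the `patience` value, and the validation losses are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would be halted.

    val_losses: validation loss recorded after each epoch.
    patience: tolerated number of successive epochs without improvement
              (a hypothetical setting, not taken from the paper).
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # generalization improved: reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1  # generalization did not improve
            if bad_epochs >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the last epoch


# Made-up validation curve: improves for three epochs, then degrades.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.66, 0.70]
stop_at = early_stopping_epoch(losses, patience=3)
print(stop_at)
```

In practice the weights from the best validation epoch are usually restored after stopping; that step is omitted here for brevity.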
Another method for tackling the overfitting problem is dropout [47]. Dropout is a regularization method that approximates training neural networks with different architectures in parallel by randomly ignoring, or dropping out, a certain ratio of a layer's neurons. Dropout is represented in the fully connected layers by the black neurons shown in Figure 2.
The Adam optimizer [48], an adaptive optimization algorithm, is used with its default learning rate and decay rate settings. The Adam optimizer has demonstrated its efficiency in solving practical DL problems, and its results outperform those of other stochastic optimization methods. The proposed DL model uses the mean square error (MSE) loss function, given by equation (10); that is, given training data of N observations, the proposed GRU-DNN model is trained with the objective of minimizing the following loss function:
$$\mathcal{L}(w) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - F(x_i; w)\right)^2 \quad (10)$$
where w is the network coefficient, $x_i$ and $y_i$ are the i-th input vector and its observed value, $F: \mathbb{R}^k \to \mathbb{R}$ denotes the neural network flow, and k is the input vector size (i.e., the number of lag features).
an asynchronous distributed hyperparameter optimization method [49]. Moreover, for parameter searching and optimization, we utilized the Tree-structured Parzen Estimator (TPE) methodology [50] from the Hyperopt package (https://hyperopt.github.io/hyperopt/). Table 2 shows the GRU-DNN model hyperparameters and the search spaces applied to obtain the optimal hyperparameter values of the model. Figure 3 depicts the steps required to build the proposed GRU and ARIMA models, from collecting the data and training the proposed models to the evaluation phase. It is worth pointing out that the validation data set is not used for building the ARIMA model; therefore, the validation data are appended to the training set.
The source code of the proposed work is freely accessible online in the author's GitHub repository (https://github.com/Ahmed-Fathalla/Friction_coef-forecasting) in order to guarantee the reproducibility of the experimental models, parameter configurations, and reported results.

Dataset.
The sliding tests were performed following the universal pin-on-disc tribometer standards (https://www.astm.org/Standards/G133.htm). The tests were run between the Ti alloy and a zirconia ball as a counter ball. All sliding tests were carried out at a sliding speed of 0.1 m/s (reciprocating speed) and an applied load of 10 N. The tests were performed under the following lubricating conditions: dry, LB2000, water GNPs, PG-0.0, PG-0.03, PG-0.1, PG-0.2, PG-0.3, and PG-0.4. All tests were run for 12,000 cycles/ 1,200 cycles per second. Furthermore, all tests were performed at room temperature (25 °C) and a humidity of 72%. The friction coefficients were obtained automatically from the machine by dividing the tangential force by the normal force (applied force). The fluids were delivered to the cutting zone using MQL. The chemical and physical properties of the materials used are shown in Tables 3 and 4, respectively.
The main target of the study is to apply the proposed technique in the field of cutting processes, especially grinding processes. The contact between the machined surface and the cutting edge of each particle (the cutting tool is the abrasive edges) is considered a point contact. Thus, the applied load of 10 N was chosen to simulate the same Hertzian stresses as those generated by the abrasive edges [57]. Likewise, the sliding speed of 0.1 m/s was chosen to simulate the linear feed rate during the grinding operation. Furthermore, the whole friction experiment and the applied parameters were chosen according to the standards [36].
The rubbing tests were performed on a Ti-6Al-4V sample with dimensions of 25 × 12 mm against a zirconia counter ball 9.5 mm in diameter, using the UMT pin-on-disc universal tribometer in accordance with the ASTM standards [36], as shown in Figure 4.
Figure 1: GRU cell architecture.

International Journal of Intelligent Systems
The Ti-6Al-4V sample was kept fixed while the counter ball was reciprocated at a sliding speed of 0.1 m/s under an applied load of 10 N. The tribopairs were dried with hot air and cleaned with acetone to remove any existing impurities. Besides, the fluids were delivered to the contact zone as aerosols using the minimum quantity lubrication technique. For the nanofluid base, we utilized palm oil and distilled water with the chemical compositions shown in Table 3.
The experiments were performed on a commercial grade 5 Ti-6Al-4V alloy (as the workpiece) supplied by Dongguan Luyuan Metal Material Co., Ltd., while a ZrO2 ball served as the cutting tool. The chemical compositions of both the workpiece and the counter ball are shown in Table 4, while the technical details of the GNPs are shown in Table 5. The GNPs were combined with palm oil and distilled water at various graphene levels, as listed in Table 6, to prepare the nanofluids.
The mixtures were mechanically mixed for 1 hour so that the nanoadditives were homogeneously dispersed. The mixtures were then sonicated for four hours at a frequency of 40 kHz and a temperature of 35 °C to prevent agglomeration and sedimentation of the nanoadditives and to preserve a stable GNP suspension in the palm oil. However, the preparation route differs for the distilled water-based nanofluids because of the absence of the polar heads, which affects the stability of the nanoadditive dispersion in distilled water. Therefore, sodium deoxycholate (SDOC) at a concentration of 0.46 mg/ml was added as a surfactant and mechanically mixed with the distilled water for 10 min. Next, ethanol at 10 wt.% was added to the solution and mechanically stirred for 20 min to enhance the dispersion of the nanoadditives in the distilled water and decrease the possibility of sedimentation. Finally, the GNPs additives were added to the solution and sonicated for 24 hours at a constant ambient temperature (25 °C) and a frequency of 40 kHz.

Data Preparation.
The dataset consists of observations gathered over 14.76 minutes, with one observation collected every 0.01 seconds. Thus, the dataset has 14.7645 × 60 × 100 = 88,587 observations. To evaluate the proposed models, the dataset is divided into a training set and a test set, and the test set results are reported. Furthermore, for the deep learning models, the data series are standardized before training, resulting in data with zero mean and unit variance (equation (11)), which leads to better training and faster convergence of neural networks [58]; the predicted values are then transformed back to the original scale.
$$z = \frac{x - \mu}{\sigma} \quad (11)$$
where x is an observation of the data series, μ is the data series mean, and σ is its standard deviation.
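The standardization and back-transformation steps can be sketched as follows, using only the Python standard library. The helper names and the sample values are hypothetical, and the use of the population standard deviation, estimated from the training portion only, is an assumption made here.

```python
import statistics


def fit_scaler(series):
    """Estimate the mean and (population) standard deviation of a series."""
    return statistics.fmean(series), statistics.pstdev(series)


def standardize(series, mu, sigma):
    """Map each value x to (x - mu) / sigma."""
    return [(x - mu) / sigma for x in series]


def inverse_standardize(series, mu, sigma):
    """Map standardized values back to the original scale."""
    return [z * sigma + mu for z in series]


# Made-up friction coefficient readings, for illustration only.
train = [0.10, 0.12, 0.14, 0.16, 0.18]
mu, sigma = fit_scaler(train)
scaled = standardize(train, mu, sigma)
restored = inverse_standardize(scaled, mu, sigma)
print(scaled)
```

Fitting `mu` and `sigma` on the training data and reusing them for the test data and for the back-transformation of predictions avoids leaking test-set statistics into training.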

Experiments.
The proposed models are evaluated through four classes of experiments with different objectives and configurations. All reported results are the average of running the proposed forecast models on the ten obtained files, as explained at the end of Section. We utilized the confidence interval at the 95% level. First, experiment-I performs single-step-ahead forecasting. In this experiment, the proposed models read a set of n successive measured friction coefficients and then predict a single friction coefficient. The value of n is tunable and can be adjusted to set the power saving. For instance, if n = 4, then four coefficients are read from the sensor and one coefficient is predicted in each cycle of five time points. This gives a power saving of 20% for the friction coefficient sensor.
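The single-step setup of experiment-I can be sketched as follows. The helpers `make_windows` and `sensor_saving` and the sample readings are hypothetical; the sketch only shows how n past readings are paired with the next value and how the 20% figure follows from n = 4.

```python
def make_windows(series, n):
    """Pair each run of n successive readings with the value that follows it."""
    return [(series[i:i + n], series[i + n]) for i in range(len(series) - n)]


def sensor_saving(n):
    """Fraction of time points that are predicted rather than measured,
    assuming n measured readings are followed by one predicted reading."""
    return 1 / (n + 1)


# Made-up friction coefficient readings, for illustration only.
readings = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15]
pairs = make_windows(readings, n=4)
saving = sensor_saving(4)
print(pairs, saving)
```

Larger n gives the model more measured context per prediction but lowers the saving, since the sensor is idle for only one out of every n + 1 time points.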
In experiment-II, the predictive models are trained on the measured friction coefficients of the first m minutes of the overall running time of the UMT. Then, each forecasting model forecasts the friction coefficients for the remaining time of the experiment. Thus, if the overall running time of the UMT is T, then the saving in the power consumption of the friction coefficient sensor is (T − m) × p, where p represents the power consumption of the sensor per minute. Of note, the predictive model is trained on the real friction coefficients of the first m minutes, but the input window of the predictive model consists of the last w forecasted values; that is, no actual readings are used in the forecasting process for this experiment. Specifically, for experiment-II, we performed three experiments with three training time intervals, m ∈ {7, 10, 13}.
In experiment-III, the forecasting models are trained on the friction coefficients of the first $m_1$ minutes of data. Then, the predictive models forecast the friction coefficients of the next $m_2$ minutes while the friction coefficient sensor is not working. Then, the friction coefficient sensor is started again. The goal of this experiment is to reduce the power consumption of the sensor and to improve its accuracy for the upcoming interval, as the sensor is turned off while the proposed model forecasts the readings.
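The recursive scheme of experiment-II, in which the input window is filled with the model's own forecasts, can be sketched as follows. The function names are hypothetical, and `mean_model` is a stand-in predictor used only so the sketch runs; it is not the GRU-DNN or ARIMA model.

```python
def recursive_forecast(model, history, w, steps):
    """Forecast `steps` values, feeding each prediction back into the window."""
    window = list(history[-w:])  # start from the last w measured values
    forecasts = []
    for _ in range(steps):
        y_hat = model(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the forecast
    return forecasts


def mean_model(window):
    """Hypothetical stand-in predictor: the mean of the window."""
    return sum(window) / len(window)


def sensor_energy_saved(total_minutes, trained_minutes, power_per_minute):
    """Sensor energy saved, (T - m) * p, when the sensor runs only for m minutes."""
    return (total_minutes - trained_minutes) * power_per_minute


history = [0.10, 0.20, 0.30]  # made-up measured values
forecast = recursive_forecast(mean_model, history, w=2, steps=3)
saved = sensor_energy_saved(14.76, 7, 1.0)  # p = 1.0 is an arbitrary unit
print(forecast, saved)
```

Because every forecast becomes an input to later forecasts, errors can accumulate over the horizon, which is why this setting is harder than single-step forecasting on measured data.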
In addition, we suggested an updating mechanism for the weights of the GRU model. Finally, in experiment-IV, as depicted in Figure 5, the proposed predictive models forecast the friction coefficient values s seconds in advance (i.e., the brown intervals in Figure 5). The forecasted values are obtained from the model trained on the real friction coefficient values (i.e., the green intervals in Figure 5). The model utilized in this experiment can be used to raise an alert when the predicted friction coefficient approaches a critical threshold that affects the product being cut. Then, the lubrication parameters (i.e., lubricant pressure and flow rate) can be readjusted to lessen the friction before the tool and the workpiece fail. To obtain the most accurate results, the force sensors attached to the UMT pin-on-disc tribometer were not turned off in this experiment, unlike in the three previous experiments, because the main goal here is to avoid workpiece deterioration rather than to reduce power consumption. Thus, the model is trained on real data instead of forecasted data.
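The alerting use case of experiment-IV can be sketched as a threshold check on the forecast horizon. The function name, the threshold of 0.30, and the forecast values are hypothetical; the 0.01 s sampling period is the one stated for the dataset.

```python
def first_alert(forecast, threshold):
    """Return the 1-based step at which the forecast first reaches the
    threshold, or None if it stays below it over the whole horizon."""
    for step, value in enumerate(forecast, start=1):
        if value >= threshold:
            return step
    return None


SAMPLING_PERIOD_S = 0.01  # one observation every 0.01 seconds

# Made-up forecast of the friction coefficient over the next five time points.
forecast = [0.21, 0.24, 0.29, 0.33, 0.31]
step = first_alert(forecast, threshold=0.30)
lead_time_s = None if step is None else step * SAMPLING_PERIOD_S
print(step, lead_time_s)
```

The lead time is the window available to a controller for readjusting the lubrication parameters before the threshold is predicted to be reached.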
To study the efect of various training set sizes on models' performance, we proposed four training and test set pairs of the original dataset.Te four datasets are varied by changing the number of minutes used to collect the training data, as listed in Table 7.

Accuracy Metrics.
To better assess the forecasting performance of the proposed models, we use several widely used time series forecasting evaluation metrics [59][60][61]. Four metrics are utilized, namely, mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and prediction of change in direction (POCID).
MAE measures the mean of the absolute differences between the actual values and the forecasted values, which is given by equation (12). MSE is the mean of the squared errors between the actual values and the predicted values, which is given by equation (13). RMSE is the square root of the average squared deviation of the predicted values from the actual values, which is given by equation (14). The lower the values of MAE, MSE, and RMSE, the better the forecasting model.
Finally, POCID denotes the percentage of time points at which the model correctly forecasts the trend (i.e., up/down) relative to the actual data trend. The POCID metric is given by equation (15). A higher POCID value indicates better trend forecasting by the model. In equations (12)-(15), N denotes the number of test set data records, Y_i represents the actual observation values, and Ŷ_i represents the model output values.
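As an illustration of the four metrics, the following Python sketch computes them from their standard textbook definitions. It is not the authors' code. In particular, POCID is assumed here to count the steps at which the actual and forecasted series move in the same direction and to normalize by the N − 1 consecutive pairs; other sources normalize by N.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Return MAE, MSE, RMSE and POCID (in percent) for two equal-length series."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # A "hit" is a step where both series change in the same direction.
    hits = sum(
        1
        for i in range(1, n)
        if (actual[i] - actual[i - 1]) * (predicted[i] - predicted[i - 1]) > 0
    )
    pocid = 100.0 * hits / (n - 1)  # assumption: normalize by N - 1 pairs
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "POCID": pocid}


# Toy series, not experimental data.
scores = forecast_metrics([1.0, 2.0, 3.0, 2.0], [1.0, 2.5, 2.0, 1.0])
print(scores)
```

Lower MAE, MSE, and RMSE and a higher POCID indicate a better forecast, in line with the definitions above.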

Results.
To mitigate the stochastic nature of the neural network, which stems from the random initial weight and bias values, we ran each experiment five times and reported the averaged outcomes. Moreover, we reported the mean and standard deviation of the five different runs, as shown in Figure 6. In contrast, the ARIMA model produces the same result for different runs. As a result, its error bar is a single point; thus, we omitted the standard deviation of the ARIMA models.

Then, the ARIMA model should be trained again using the same previous data points plus a new actual observation to forecast a new time point ahead, and so forth. This is because the ARIMA model is trained from scratch at each time point, without any experience from the previously trained models. Additionally, fitting the ARIMA model on 2 minutes of observations, the smallest portion of the dataset used as training data in experiment-I, takes around 52 seconds. Therefore, due to the large size of the test set, it is not feasible to train 100 ARIMA models per second to forecast the next 100 time points. Consequently, the ARIMA model is not applicable to the experiment-I configuration.
In Figure 7, one can notice that as the k value increases, the MAE value decreases until it reaches a plateau at k = 9, while k = 10 results in the lowest MAE value. Therefore, we set k = 10 for the GRU-DNN model in experiment-I and the remaining experiments. Table 8 reports the GRU-DNN model performance for experiment-I in terms of the accuracy metrics. Of note, the training and validation losses in Figure 8 are much higher than the test loss reported in Table 8, as they are measured during training, while the data are normalized to a wider scale than the original data scale (equation (11)).
Figure 9 presents the first 100 actual and predicted values produced by a GRU-DNN model trained on dataset III of the PG-0.1 sample. The other samples of Table 6 show the same behavior as Figure 9. Furthermore, Figure 8 presents the loss values of the same model, i.e., the training and validation loss (MAE), while other models have similar behavior. The training and validation losses confirm the model's ability to generalize to unseen data; that is, the model does not suffer from an overfitting problem.

Table 9 presents the ARIMA results for experiment-II. In this experiment, we performed a grid search to find the best ARIMA model parameter values before fitting the model. The hyperparameter grid search shows that the data series has no seasonality; therefore, P = 0, D = 0, Q = 0, and S = 0.
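The infeasibility argument for ARIMA in experiment-I reduces to simple arithmetic, sketched below in Python. The two figures come from the text (about 52 seconds per fit and 100 new time points per second); sequential refitting, one fit per time point, is an assumption, and the function name is illustrative.

```python
def refit_seconds_per_data_second(points_per_second: int, fit_seconds: float) -> float:
    """Compute time needed to refit once per time point for 1 s of measurements."""
    return points_per_second * fit_seconds


cost = refit_seconds_per_data_second(points_per_second=100, fit_seconds=52.0)
print(f"{cost:.0f} s of fitting per 1 s of measurements ({cost / 60:.1f} min)")
```

Under these assumptions, one-step-ahead ARIMA forecasting would fall behind the incoming measurements by a factor of several thousand, which is why only the GRU-DNN model is evaluated in this configuration.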

Experiment-III.
The ARIMA results for experiment-III are listed in Table 12. One of the main drawbacks of the ARIMA model is that a newly fitted model gains no useful experience from previous fits.

Table 13 lists the values of the four metrics for predicting 60 seconds (i.e., one minute) in advance. The ARIMA model shows better performance on all the evaluation metrics except POCID. This shows that for the scenario of predicting n seconds in advance, the ARIMA model should be preferred over the GRU model. The time required to prepare a new model for forecasting the next minute is 30 and 20 seconds for the ARIMA and GRU models, respectively.

Discussion
The proposed GRU model is considered more suitable than the ARIMA model for the scenario of forecasting a few future friction coefficients, i.e., experiment-I, due to its time   The memory sizes, training times, and forecast times listed in Tables 11 and 14 show that the proposed GRU-DNN model requires fewer resources relative to the ARIMA model. For instance, comparing the memory of the ARIMA and GRU models for experiment-II, the ARIMA model's size is four orders of magnitude larger, while the forecasting time of the ARIMA model is smaller than that of the GRU-DNN model by an order of magnitude.
For training and forecasting times, the proposed ARIMA model outperformed the GRU-DNN model, as listed in Table 9. Based on this discussion, in experiment-II, the ARIMA and proposed GRU-DNN models achieved very close results, with the ARIMA model performing slightly better. Thus, the ARIMA model is preferred over the proposed GRU-DNN model for scenarios similar to experiment-II.
In experiment-III and experiment-IV, the performance of the ARIMA model is clearly better, as the error metric values are lower for the ARIMA model. The results of Table 13 are for forecasting one minute in advance. Thus, comparing Tables 12 and 13 for one-minute forecasting only shows a better performance for experiment-IV. This can be linked to the fact that the predictive models used in experiment-IV are trained on real sensor readings, as the force sensors are not turned off during experiment-IV. For experiment-III, the predictive models are trained on mixed data of real sensor readings and forecasted readings, because the force sensors were turned off during the forecasting period to reduce the power consumption. As the forecasted readings include some errors, these errors affect the predictive model performance.

Conclusion
The task of monitoring the friction coefficient of the rubbing surfaces is vital in any metal cutting technique, but it is time- and power-consuming as well. Besides, the ability to predict future friction coefficients from historical data is vital to reduce or avoid thermal damage. In this work, we proposed the first predictive models to capture the patterns of the friction coefficient during a metal cutting process. We generated a real friction coefficient dataset using the UMT. Then, we proposed ARIMA and GRU-DNN models to perform the forecasting task on the real dataset. The ARIMA and GRU-DNN model parameters are tuned to get the best performance. Finally, the proposed models are tested on four different power consumption reduction scenarios, and the fourth scenario demonstrates the ability of the proposed model to avoid thermal damage. The proposed models show a significant performance in terms of prediction accuracy. Future directions include building a hybrid of the ARIMA and GRU-DNN models, using ensemble learning to combine the results of the two models. Besides, enlarging the dataset by collecting friction coefficient data from different milling machines might increase the prediction accuracy.

Figure 3: The block diagram of the proposed method.

5.5.1. Experiment-I.
Running the ARIMA model for experiment-I requires training the model on the training set (e.g., observations for 2 minutes, dataset I) and forecasting one time point ahead.

Figure 7: The MAE values of the proposed GRU-DNN model for different values of k.

Figure 8: The proposed GRU-DNN model's training and validation loss for experiment-I using dataset III.

Table 1: The hyperparameter search space for the proposed ARIMA model.

Table 2: The hyperparameter search space of the proposed GRU-DNN model.

Table 3: Palm oil fatty acids contents and physical properties.

Table 4: Ti alloy and zirconia ball chemical and physical properties.

friction coefficients of the next m_3 minutes. Meanwhile, the predictive models are updated with the values obtained during the m_3 minutes, producing no predicted values. This scenario is repeated until the end of the experiment. The observed real readings of the m_3 minutes enable the proposed model to capture the new patterns.
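The alternating scheme of experiment-III (train on the first m_1 minutes, forecast for m_2 minutes with the sensor off, then measure and update for m_3 minutes, and repeat) can be sketched as a schedule generator in Python. This is an illustrative assumption about the timing logic only; the phase names, function names, and example durations are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, float, float]  # (phase, start minute, end minute)


def sensor_schedule(T: float, m1: float, m2: float, m3: float) -> List[Interval]:
    """Split a run of T minutes into training, forecast, and update intervals.

    'train'    : sensor on, first m1 minutes
    'forecast' : sensor off, model forecasts for m2 minutes
    'update'   : sensor on, model is updated with m3 minutes of real readings
    """
    if min(m1, m2, m3) <= 0:
        raise ValueError("m1, m2 and m3 must be positive")
    t = min(m1, T)
    schedule: List[Interval] = [("train", 0.0, t)]
    phase, length = "forecast", m2
    while t < T:
        end = min(t + length, T)
        schedule.append((phase, t, end))
        t = end
        phase, length = ("update", m3) if phase == "forecast" else ("forecast", m2)
    return schedule


def sensor_off_minutes(schedule: List[Interval]) -> float:
    """Total time the sensor is switched off (forecast phases only)."""
    return sum(end - start for phase, start, end in schedule if phase == "forecast")


plan = sensor_schedule(T=10, m1=2, m2=3, m3=1)  # hypothetical durations
for phase, start, end in plan:
    print(f"{phase:8s} {start:4.1f} -> {end:4.1f} min")
print("sensor off for", sensor_off_minutes(plan), "minutes")
```

Multiplying the sensor-off time by the per-minute sensor power p gives the saving for this scenario, by analogy with the (T − m) × p expression used for experiment-II.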

Table 6: The composition of the nanofluids.

Table 10 reports the ARIMA hyperparameter values, training time, and model sizes for experiment-II. Notably, as the training data increases, the training time and model size increase as well, as listed in Table 11.

Table 7: Different dataset train-test split lengths.

For the task of forecasting an extended number of friction coefficients, i.e., measurements for tens of seconds, the ARIMA model outperforms the GRU model in all evaluation metrics except POCID. Thus, the ARIMA model can predict the friction coefficient for a prolonged period of time in advance better than the proposed GRU model, while the GRU model better predicts the changes in the friction coefficient over a short period of time, e.g., experiment-I. Regarding the predictive model requirements, the memory and time requirements of the GRU-DNN model for experiments I and II are listed in Tables 14 and 11, respectively.

Table 8: Evaluation metric scores of the proposed GRU-DNN model for experiment-I.

Table 9: Evaluation metric scores of the proposed models for experiment-II.

Table 11: The proposed GRU-DNN model performance for experiment-II.

Table 12: Evaluation metric scores of the proposed models for experiment-III. Bold values indicate the best results achieved for each experiment.

Table 13: Evaluation metric scores of the proposed models for experiment-IV. Bold values indicate the best results achieved for each experiment.

Table 14: The proposed GRU-DNN model performance for experiment-I.

Applications of the Proposed Predictive Model.
There are two possible applications of the proposed work. The main target of the study is to predict the friction based on real-time readings of the friction coefficients and to control the lubrication parameters, such as the fluid mist pressure, to keep the friction coefficients at minimum levels. Achieving this goal leads to a significant reduction in energy consumption. The proposed technique can be applied directly to the field of manufacturing, where cutting processes are considered a heavy energy-consuming sector and friction is the main suspect. The second possible application is to utilize the proposed model in reducing or avoiding surface thermal damage. By predicting high friction coefficients in advance, it is possible to readjust the lubrication parameters (fluid pressure, fluid flow rate, etc.) to maintain lower ranges of friction coefficients, as high friction coefficients may deteriorate the workpiece.