Inverse Method of Centrifugal Pump Blade Based on Gaussian Process Regression

The inverse problem is always one of the important issues in the field of fluid machinery because of the complex relationship among the blade shape, the hydraulic performance, and the inner flow structure. Based on the Bayesian theory of posterior probability obtained from known prior probability, inverse methods for the centrifugal pump blade based on the single-output Gaussian process regression (SOGPR) and the multioutput Gaussian process regression (MOGPR) were proposed, respectively. The training sample set consists of the blade shape parameters and the distribution of flow parameters. The hyperparameters in the inverse problem models were trained by using the maximum likelihood estimation and the gradient descent algorithm. The blade shape corresponding to the objective blade load can be obtained by the trained inverse problem models. The MH48-12.5 low specific speed centrifugal pump was selected to verify the proposed inverse methods. The reliability and accuracy of both inverse problem models were confirmed and compared by implementing leave-one-out (LOO) cross-validation and extrapolation characteristic analysis. The results show that the blade shapes within the sample space can be reconstructed exactly by both models. The root mean square errors of the MOGPR inverse problem model for the pump blade are generally lower than those of the SOGPR inverse problem model in the LOO cross-validation. The extrapolation characteristic of the MOGPR inverse problem model is better than that of the SOGPR inverse problem model because the correlation between the blade shape parameters can be fully considered by the correlation matrix of the MOGPR model. The proposed inverse methods can efficiently solve the inverse problem of the centrifugal pump blade with sufficient accuracy.


Introduction
Due to the complex internal flow constraints of fluid machinery, there is a complex implicit relationship among its geometric parameters, internal flow, and hydraulic performance. The problems related to the internal flow of fluid machinery can be summarized as the direct problem and the inverse problem. The direct problem focuses on the flow structure by experimental and numerical approaches [1][2][3], and the inverse problem mainly studies how to acquire the blade geometry according to the objective flow field distribution [4][5][6][7]. Usually, the inverse problem of fluid machinery can be considered a design issue. In recent decades, with the rapid development of computational fluid dynamics (CFD) and modern flow testing techniques, research on the direct problem of the centrifugal pump has made much progress. Compared with the direct problem of flow field analysis for pumps, the inverse problem is much more difficult. Hawthorne et al. [8] and Tan et al. [9] first proposed the inverse design method of fluid machinery. Borges [4] extended the theory of the inverse method to three dimensions under incompressible conditions, and then Zangeneh and Goto [5] extended the three-dimensional inverse method to compressible conditions. Furthermore, Zangeneh et al. [6,7] used the three-dimensional inverse method to suppress secondary flow in the pump impeller. Bonaiuti et al. [10,11] combined the inverse design method with optimization techniques to realize the optimization design of the pump blade. At present, the inverse methods for the centrifugal pump impeller can in general be classified into two categories: one is the inverse method based on the general theory of relative stream surfaces proposed by Wu [12], and the other is the iterative method based on the iteration between flow simulation and modification design of the impeller geometry [13][14][15].
In the first method, the blade shape is reconstructed with the assumption that the flow must be aligned to the blade surfaces [5-7, 16, 17]. The CFD model is greatly simplified, so the flow simulation lacks sufficient accuracy and the accuracy of this inverse method is hard to ensure. In the second method, a more accurate turbulence model is employed to simulate the flow field, and the accuracy of the numerical simulation is improved. However, it is hard to get a feasible blade geometry which agrees well with the specified flow field distribution, because the modification of the blade geometry is highly dependent on the experience of designers. Recently, the proper orthogonal decomposition (POD) reduced-order model was proposed for the inverse method of the centrifugal pump blade. Zhang et al. [18] proposed the inverse method of the centrifugal pump blade based on the POD model and then performed the inverse design of the two-dimensional and three-dimensional centrifugal pump blades by the Gappy POD model. The inverse method based on the POD model converges quickly, but its accuracy needs to be further improved.
Gaussian process regression (GPR) is a machine learning approach based on the Bayesian theorem, which provides a flexible framework for probabilistic regression and adapts well to high-dimensional and small-sample problems [19]. Gaussian process regression is mainly divided into the SOGPR model and the MOGPR model. The SOGPR model has fewer unknown parameters and can be easily explained, and it has been widely used in dimensionality reduction [20], time-series analysis [21,22], nonlinear regression [19], etc. Based on the SOGPR model, the MOGPR model is improved to learn the correlation information between the outputs of models, which can provide more accurate predictions in comparison with modeling outputs individually by the SOGPR model [23][24][25][26]. Liu et al. [27] constructed a multiresponse surface model for airfoil design based on the MOGPR model. Chai et al. [28] used the multitask Gaussian process to compute the inverse dynamics problem for a robotic manipulator. Wu et al. [29] combined the MOGPR model with the optimization algorithm to optimize supercritical airfoils.
The GPR method has high prediction accuracy in the case of a small number of samples. To improve the accuracy of the centrifugal pump blade inverse problem, the GPR was introduced to the inverse method of the centrifugal pump blade.
The blade load distributions of the pump were considered as the model input, the blade shape parameters were taken as the output, and then the inverse problem calculation of blade shapes was implemented by both of the models. The reliability of both models was verified, respectively, and the accuracy was compared and analyzed.

Single-Output Gaussian Process Regression Model.
A brief introduction to the SOGPR model is provided here, and a more detailed description can be found in [19]. A training sample set is defined as $D = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$, where $x_i$ is the d-dimensional input and $y_i$ is the one-dimensional output. The input $x_i$ corresponds to the random variable $f(x_i)$, and the collection of $f(x_i)$ follows a joint Gaussian distribution, which can be written as

$$f(x) \sim \mathcal{GP}\left(m(x), k(x, x')\right),$$

where $m(x)$ is the mean function, which is commonly assumed to be zero, and $k(x, x')$ is the covariance function. $f(x)$ is completely specified by the mean function and the covariance function. In many realistic scenarios, the values of outputs are replaced by observations, which can be expressed as

$$y = f(x) + \varepsilon, \quad \varepsilon \sim N(0, \sigma_n^2),$$

where $\varepsilon$ is independent and identically distributed; it accounts for the measurement errors and is known as noise in the Gaussian process models. Therefore, the joint Gaussian distribution of the vector $y$ is expressed as

$$y \sim N\left(0, K(X, X) + \sigma_n^2 I_n\right), \quad \operatorname{cov}(y_i, y_j) = k(x_i, x_j) + \sigma_n^2 \delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta function. The joint prior distribution of the observed values $y$ and the function value $f_*$ at a test input $x_*$ is

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim N\left(0, \begin{bmatrix} K(X, X) + \sigma_n^2 I_n & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix}\right),$$

where $K(X, X) \in \mathbb{R}^{n \times n}$ is the covariance matrix of inputs with the element $K_{ij} = k(x_i, x_j)$, which describes the information between the inputs in the training sample set. According to the inference of the conditional distribution based on the multivariate Gaussian distribution, the posterior distribution of $f_*$ is analytically derived as

$$f_* \mid X, y, x_* \sim N\left(\bar{f}_*, \operatorname{cov}(f_*)\right),$$

where the prediction mean $\bar{f}_*$ approximates the value of the unknown output of the test sample and the prediction variance $\operatorname{cov}(f_*)$ quantifies the uncertainty in the solution of the unknown output; they are, respectively, given as

$$\bar{f}_* = K(x_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} y, \tag{5}$$

$$\operatorname{cov}(f_*) = k(x_*, x_*) - K(x_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} K(X, x_*). \tag{6}$$

According to the Mercer theorem, the well-known squared exponential (SE) kernel function was used to calculate the covariance matrix $K$ in the SOGPR model.
The SE kernel function has good performance within the kernel machine field, and it is expressed as

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2 l^2}\right),$$

where the signal variance $\sigma_f^2$ controls the output scale of the kernel function and the characteristic length scale $l$ determines how strongly the output is affected by the different dimensions of the input $x$.
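The prediction step can be illustrated with a short NumPy sketch. It is not the authors' MATLAB implementation: it assumes an isotropic SE kernel with a single length scale, solves the linear systems by Cholesky factorization, and uses toy data; the function names `se_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np


def se_kernel(A, B, length_scale, signal_var):
    """Squared exponential kernel matrix between rows of A (n, d) and B (m, d)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(X, y, X_star, length_scale, signal_var, noise_var):
    """Posterior mean and variance of f* at each row of X_star."""
    K = se_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    K_s = se_kernel(X_star, X, length_scale, signal_var)  # K(x*, X)
    K_ss = se_kernel(X_star, X_star, length_scale, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # [K + sigma_n^2 I]^-1 y
    mean = K_s @ alpha
    V = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - np.sum(V**2, axis=0)
    return mean, var


# Toy data (hypothetical, not from the paper): noise-free samples of sin(x).
X_train = np.linspace(0.0, 5.0, 8)[:, None]
y_train = np.sin(X_train[:, 0])
X_test = np.array([[2.5], [10.0]])  # one point inside the data, one far outside
mean, var = gp_predict(X_train, y_train, X_test, 1.0, 1.0, 1e-6)
```

In this toy case the variance at the point far from the training data returns to roughly the signal variance, which is the same behavior that makes extrapolated predictions more uncertain than interpolated ones.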
The vector of hyperparameters $\theta$ contains the characteristic length scale $l$, the signal variance $\sigma_f^2$, and the noise variance $\sigma_n^2$. The values of the hyperparameters were optimized by the maximum likelihood estimation and the gradient descent algorithm.
It is known that the vector of the outputs $y$ obeys the multivariate Gaussian distribution, the negative log marginal likelihood (NLML) of which is defined as

$$\mathrm{NLML} = -\log p(y \mid X, \theta) = \frac{1}{2} y^T \left[K + \sigma_n^2 I_n\right]^{-1} y + \frac{1}{2} \log\left|K + \sigma_n^2 I_n\right| + \frac{n}{2} \log 2\pi.$$

The optimal values of $\theta$ can be obtained by using the gradient descent algorithm to minimize the NLML. To keep the optimization from falling into local minima of the NLML, the hyperparameters usually need to be initialized randomly multiple times, and the hyperparameters with the lowest NLML are selected [30].
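A minimal sketch of this training procedure is given below. Several choices are assumptions made for illustration and are not taken from the paper: the hyperparameters are optimized in log space, the gradient is approximated by central finite differences instead of analytic derivatives, the step is clipped and halved whenever the NLML would increase, and the log-hyperparameters are bounded to keep the Cholesky factorization stable. All function names and the toy data are hypothetical.

```python
import numpy as np


def se_kernel(A, B, length_scale, signal_var):
    """Squared exponential kernel matrix between the rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)


def nlml(log_theta, X, y):
    """NLML for log_theta = log([l, sigma_f^2, sigma_n^2])."""
    length_scale, signal_var, noise_var = np.exp(log_theta)
    n = len(X)
    K = se_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * log|K| equals the sum of the logs of the Cholesky diagonal.
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)


def numerical_grad(f, p, eps=1e-5):
    """Central finite-difference gradient (stand-in for analytic gradients)."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (f(p + step) - f(p - step)) / (2.0 * eps)
    return grad


def gradient_descent(f, p0, n_steps=200, lr=0.1, bounds=(-4.0, 4.0)):
    """Gradient descent with a clipped step; the step is halved if f would rise."""
    p, val = p0.copy(), f(p0)
    for _ in range(n_steps):
        grad = numerical_grad(f, p)
        grad = grad / max(1.0, np.linalg.norm(grad))
        trial = np.clip(p - lr * grad, *bounds)
        trial_val = f(trial)
        if trial_val < val:
            p, val = trial, trial_val
        else:
            lr *= 0.5
    return p, val


def fit_multistart(X, y, n_restarts=5, seed=0):
    """Run gradient descent from random starts and keep the lowest NLML."""
    rng = np.random.default_rng(seed)
    objective = lambda p: nlml(p, X, y)
    starts = rng.uniform(-1.0, 1.0, size=(n_restarts, 3))
    start_vals = [objective(p0) for p0 in starts]
    results = [gradient_descent(objective, p0) for p0 in starts]
    best_p, best_val = min(results, key=lambda r: r[1])
    return best_p, best_val, start_vals


# Toy data (hypothetical): 10 samples with 2 input dimensions.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(10, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
best_log_theta, best_nlml, start_nlmls = fit_multistart(X, y)
length_scale, signal_var, noise_var = np.exp(best_log_theta)
```

Because a step is only accepted when it lowers the NLML, each restart ends no higher than it started, and the multistart loop simply keeps the best of these local results.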

Multioutput Gaussian Process Regression Model.
Based on the SOGPR model, the MOGPR is improved to model $t$ outputs simultaneously to learn their correlation, which can outperform individual modeling. The flow charts of the SOGPR model and the MOGPR model are shown in Figure 1. The main difference between MOGPR and SOGPR is the construction of the kernel function matrix $K$. A correlation matrix based on the Kronecker product is employed to describe the correlation between the $t$ outputs.
Consider a set of training samples $D = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$, where $x_i$ is the d-dimensional input and $y_i$ contains $t$ one-dimensional outputs. For a model with $n$ samples, each sample has $t$ outputs, and the kernel function matrix can be expressed as

$$K = K_c \otimes K_x,$$

where $\otimes$ represents the Kronecker product and $K_x$ is the same as the kernel function matrix in the SOGPR model and has a size of $n \times n$. It describes the relationship between the inputs of all samples. $K_c$ expresses the correlation between outputs, which is referred to as the correlation matrix and has a size of $t \times t$. The diagonal elements of $K_c$ represent the correlation of each output with itself, and the off-diagonal elements describe the correlation between different outputs. If $K_c$ is the identity matrix, all outputs are considered to be independent and uncorrelated. Generally, $K_c$ is initialized as the identity matrix. The Kronecker product of $K_c$ and $K_x$ gives a matrix $K$ of size $nt \times nt$. $\theta_c$ and $\theta_x$ are the vectors of hyperparameters for $K_c$ and $K_x$, respectively. The number of elements in $\theta_c$ increases rapidly with the number of outputs. $\theta_c$ and $\theta_x$ can be learned together by optimizing the NLML.
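The Kronecker construction can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the outputs are stacked output by output so that the ordering matches $K_c \otimes K_x$, the same noise variance is used for every output, and $K_c$ is passed in directly instead of being learned. The function names, the toy data, and the values in `K_c_correlated` are hypothetical.

```python
import numpy as np


def se_kernel(A, B, length_scale, signal_var):
    """Squared exponential kernel matrix between the rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)


def mogpr_predict(X, Y, X_star, K_c, length_scale, signal_var, noise_var):
    """Posterior mean of all t outputs at the m test inputs, shape (m, t)."""
    n, t = Y.shape
    m = len(X_star)
    K_x = se_kernel(X, X, length_scale, signal_var)            # n x n
    K = np.kron(K_c, K_x) + noise_var * np.eye(n * t)          # nt x nt
    K_star = np.kron(K_c, se_kernel(X_star, X, length_scale, signal_var))
    y_stack = Y.T.reshape(-1)      # [output 1 of all samples, output 2, ...]
    mean_stack = K_star @ np.linalg.solve(K, y_stack)
    return mean_stack.reshape(t, m).T


# Toy data (hypothetical): n = 6 samples, d = 4 inputs, t = 3 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y = rng.normal(size=(6, 3))
X_star = rng.normal(size=(2, 4))

K_full = np.kron(np.eye(3), se_kernel(X, X, 2.0, 1.0))

# With K_c = I the outputs decouple into independent single-output models.
mean_identity = mogpr_predict(X, Y, X_star, np.eye(3), 2.0, 1.0, 0.05)
K_noisy = se_kernel(X, X, 2.0, 1.0) + 0.05 * np.eye(6)
independent = se_kernel(X_star, X, 2.0, 1.0) @ np.linalg.solve(K_noisy, Y)

# A correlation matrix whose entries decay away from the diagonal.
K_c_correlated = np.array([[1.0, 0.8, 0.5], [0.8, 1.0, 0.8], [0.5, 0.8, 1.0]])
mean_correlated = mogpr_predict(X, Y, X_star, K_c_correlated, 2.0, 1.0, 0.05)
```

The identity case reproduces the independent predictions exactly, which is consistent with the statement that an identity $K_c$ treats all outputs as uncorrelated.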

Inverse Method of Pump Blade Based on GPR
The inverse problem of the centrifugal pump blade also belongs to the pump design issues, the target of which is to get the desired blade shape that produces the prescribed flow field distribution. In this research, according to the models mentioned above, the blade load distributions $x_i$ and the blade shape parameters $y_i$ were defined as the input and output of both models, respectively. The objective blade shapes can be obtained by both of the trained inverse problem models when the objective blade load distributions are given. The algorithm of the proposed centrifugal pump blade inverse methods can be summarized as follows:
Step 1. Parameterize the prototype blade shape and obtain the initial samples of the blade shape by experimental design.
Step 2. Simulate the inner flow by the CFD method and calculate the blade load distribution of all samples.
Step 3. Combine the blade load distribution data and blade shape parameters of all initial samples into a training sample set.
Step 4. Give the initial values of the hyperparameters and get the optimal hyperparameters $\theta = [l, \sigma_f, \sigma_n]$ of SOGPR or $\theta = [l, \sigma_f, \sigma_n, K_c]$ of MOGPR by using the gradient descent algorithm to minimize the NLML.
Step 5. Get the mean and variance of the blade shape parameters corresponding to the given objective blade load by equations (5) and (6).
Step 6. Plot the blade shape and the corresponding 95% confidence interval according to the mean and the variance of the blade shape parameters, respectively.
In step 1, as shown in Figure 2, the prototype blade shape was parameterized by the cubic Bezier curve. The blade inlet and outlet angles were, respectively, controlled by the slopes of edges AB and CD. The blade wrap angle was controlled by the movement of point D in the circumferential direction of the outlet diameter of the impeller.
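The role of the control points in step 1 can be illustrated with a generic cubic Bezier evaluation. The control point coordinates below are hypothetical placeholders, the curve is drawn in an arbitrary two-dimensional plane, and the 20 points are spaced uniformly in the curve parameter, which is a simplification of the radius-based spacing used for the blade. The sketch only shows why the slopes of edges AB and CD set the end tangents.

```python
import numpy as np


def cubic_bezier(P, t):
    """Evaluate a cubic Bezier curve with control points P (4, 2) at parameters t."""
    t = np.asarray(t, dtype=float)[:, None]
    return (
        (1 - t) ** 3 * P[0]
        + 3 * (1 - t) ** 2 * t * P[1]
        + 3 * (1 - t) * t**2 * P[2]
        + t**3 * P[3]
    )


def cubic_bezier_tangent(P, t):
    """First derivative of the cubic Bezier curve with respect to t."""
    t = np.asarray(t, dtype=float)[:, None]
    return (
        3 * (1 - t) ** 2 * (P[1] - P[0])
        + 6 * (1 - t) * t * (P[2] - P[1])
        + 3 * t**2 * (P[3] - P[2])
    )


# Hypothetical control points A, B, C, D (not the MH48-12.5 blade geometry).
control = np.array([[0.0, 0.0], [1.0, 0.6], [2.5, 1.2], [4.0, 1.4]])
t = np.linspace(0.0, 1.0, 20)
curve = cubic_bezier(control, t)
end_tangents = cubic_bezier_tangent(control, [0.0, 1.0])
```

At the two ends the tangent is parallel to B - A and D - C, so changing the slope of AB or CD changes the inlet or outlet direction of the curve, while moving D shifts the end point.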

Sample Set and Training Data Generation.
The MH48-12.5 low specific speed centrifugal pump ($Q = 12.5$ m³/h, $H = 30.7$ m, $n = 2900$ r/min, $\eta = 53\%$) was used to verify the inverse method of the centrifugal impeller based on Gaussian process regression. The meridional plane and end view of the prototype centrifugal impeller are shown in Figure 3. The wrap angle $\Phi$ of the impeller is 143°, the inlet blade angle $\beta_1$ is 30°, and the outlet blade angle $\beta_2$ is 17°. The prototype blade shape was parameterized by the cubic Bezier curve. The uniform design of experiments was employed to generate the sample set. With the meridional plane of the impeller fixed, eight initial blade samples were generated by perturbing $\beta_1$, $\beta_2$, and $\Phi$ by 5°, 5°, and 10°, respectively, on the basis of the prototype blade shape. The wrap angle, inlet blade angle, and outlet blade angle of all blade samples are listed in Table 1, and all blade shapes are shown in Figure 4. The angular coordinates of 20 points uniformly distributed along the radius of the impeller from the blade inlet to the outlet, as shown in Figure 2, were defined to control the blade shape. Hence, the blade shape can be expressed as $y_i = [\theta_1, \theta_2, \ldots, \theta_{20}]^T$.
In the field of hydraulic machinery, the change of the energy gradient of the fluid from the impeller inlet to the outlet determines the internal flow characteristics and hydraulic performance of pump impellers. Therefore, the gradient of the flow head $\partial H_i / \partial m$ was directly defined as the blade load, which can be obtained by CFD simulation [31,32]. To obtain the blade load data conveniently, hexahedral structured grids were adopted to discretize the computational domains of the impeller, volute, inlet pipe, and outlet pipe. The details of the grid independence test at the design flow rate are shown in Figure 5, and the final grid number for the computation was determined as 1.1 million. The commercial code ANSYS FLUENT 16.0 with the RNG $k$-$\varepsilon$ turbulence model and the SIMPLEC algorithm was employed for the CFD calculation.
A uniform velocity was set as the inlet boundary condition, and the free outflow was set as the outlet boundary condition. The near-wall flow was treated by the standard wall function. The convergence criterion of all residuals was set as $10^{-5}$. The calculated blade load distributions of all samples are shown in Figure 6. The blade load distributions vary with the blade shapes: the blade load distributions on the front section of the blades are obviously affected by the inlet blade angles, and the blade load distributions on the trailing section of the blades are significantly influenced by the outlet blade angles. The blade load values of 63 grid nodes from the inlet to the outlet of the pump blade were considered as the model input, and they were expressed as $x_i = [\partial H_1 / \partial m, \partial H_2 / \partial m, \ldots, \partial H_{63} / \partial m]^T$. The number of rows of $X$ and $Y$ equals the number of samples. The numbers of columns of $X$ and $Y$ equal the dimensions of the model input and output, respectively.

Results of Inverse Methods.
The SOGPR model and the MOGPR model were, respectively, used to construct the inverse methods for the centrifugal pump blade. Both of the inverse problem models were programmed in MATLAB. In the SOGPR model, $X$ was used as the model input, and each column of $Y$ was taken as the one-dimensional output of the model, so that 20 SOGPR models were trained independently. The angular coordinates of 20 points on the blade shape can be obtained by the 20 trained models according to the given blade load distribution. The impeller blade geometry can be reconstructed from the prescribed meridional plane and these 20 points on the blade shape. In the MOGPR model, however, the entire $Y$ was taken as the output, which has 20 dimensions. During the training of this model, all dimensions of the output were trained simultaneously to consider their correlation, so that only one model needs to be trained.
Firstly, the prototype blade was selected as the objective blade to achieve the inverse problem calculation for the centrifugal pump blade. The hyperparameters were initialized according to the training samples [33]. The characteristic length scale $l$, the signal variance $\sigma_f^2$, and the noise variance $\sigma_n^2$ were finally initialized to 2.6, 0.9, and 0.05 for both models, respectively. The correlation matrix of the MOGPR model was initialized as an identity matrix. Regarding hyperparameter learning, the gradient descent algorithm was employed to minimize the NLML, and the maximum number of function evaluations was set as 800 for both models. The blade shapes, shown in Figures 7 and 8, were, respectively, calculated by the trained MOGPR and SOGPR models according to the objective blade load distribution. It can be seen that the blade shapes generated by the SOGPR model and the MOGPR model are both almost coincident with the objective blade shape, and the variances of the blade shape parameters for each model are low enough, as illustrated by the 95% confidence interval. The average variances of the 20 blade shape parameters calculated by the SOGPR and MOGPR models are 1.20 and 0.94, respectively. It can be concluded that the blade shapes obtained by both inverse problem models have sufficient accuracy and low uncertainty.
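The 95% confidence interval plotted around each blade shape follows from the predicted mean and variance. A minimal sketch, assuming a Gaussian posterior so that the interval is the mean plus or minus 1.96 standard deviations: the mean values are hypothetical placeholders, while 1.20 and 0.94 are the average variances reported above for the SOGPR and MOGPR models.

```python
import numpy as np


def confidence_interval_95(mean, var):
    """Lower and upper bounds of the 95% interval of a Gaussian prediction."""
    half_width = 1.96 * np.sqrt(var)
    return mean - half_width, mean + half_width


mean = np.array([50.0, 50.0])   # hypothetical predicted angular coordinates
var = np.array([1.20, 0.94])    # average variances: SOGPR, MOGPR
lower, upper = confidence_interval_95(mean, var)
half_width = (upper - lower) / 2.0
```

The lower average variance of the MOGPR model translates directly into a narrower band around the predicted blade shape.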
Secondly, the LOO cross-validation was employed to fully confirm the reliability of the Gaussian process regression models in the inverse problem calculation for the centrifugal pump blade. In the LOO cross-validation, the sample set was reconstructed to include the eight training samples and the prototype sample. Each blade shape was predicted by both of the constructed inverse problem models, which were trained by the other eight samples, and the prediction errors were analyzed. These nine samples were cross-validated in turn. The blade shape parameters were calculated by both models according to the blade load distribution of the test sample. Table 2 shows the values of the root mean square error (RMSE) between the inverse design blade shapes calculated by both inverse problem models and their objective blade shapes during the LOO cross-validation. The RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{d=1}^{N} \left(\frac{\theta_d - \hat{\theta}_d}{\Phi}\right)^2},$$

where $\theta_d$ and $\hat{\theta}_d$ denote the angular coordinates of the objective blade shape and the inverse design blade shape, respectively; $\Phi$ is the wrap angle corresponding to the blade shape; and $N$ stands for the total number of angular coordinates. We can see in Table 2 that the RMSE values of the MOGPR inverse problem model are generally lower than those of the SOGPR inverse problem model. In some cases, however, we can see that the accuracy of the SOGPR model is higher than that of the MOGPR model, which is not as expected. This phenomenon may be caused by the following reason. As mentioned above, the number of hyperparameters in the MOGPR model is much larger than that in the SOGPR model. During the optimization of the hyperparameters in the MOGPR model, the optimization may become trapped in a local minimum of the NLML instead of reaching the global minimum. As a consequence, the MOGPR model is insufficiently trained, and its accuracy is lower than that of the SOGPR model. Therefore, to ensure the accuracy, efficiency, and reliability of the inverse problem calculation for pump blade design, a reasonable selection of the initial values of the hyperparameters during model training is necessary.
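The LOO procedure can be sketched as a loop that holds out one sample at a time. The sketch rests on several assumptions: the RMSE is taken as the root mean square of the angular differences divided by the wrap angle, the hyperparameters are held fixed instead of being re-optimized for every fold, all outputs share one kernel (which corresponds to independent single-output predictions), and the data are synthetic stand-ins for the nine blade samples. The function names are hypothetical.

```python
import numpy as np


def se_kernel(A, B, length_scale, signal_var):
    """Squared exponential kernel matrix between the rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)


def gp_mean(X, Y, X_star, length_scale, signal_var, noise_var):
    """Posterior mean for every output column of Y at the rows of X_star."""
    K = se_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    return se_kernel(X_star, X, length_scale, signal_var) @ np.linalg.solve(K, Y)


def normalised_rmse(theta_true, theta_pred, wrap_angle):
    """RMSE of the angular coordinates, normalised by the wrap angle."""
    return np.sqrt(np.mean(((theta_true - theta_pred) / wrap_angle) ** 2))


def loo_rmse(X, Y, wrap_angles, length_scale, signal_var, noise_var):
    """Leave-one-out errors: each sample is predicted from all the others."""
    n = len(X)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred = gp_mean(X[keep], Y[keep], X[i:i + 1],
                       length_scale, signal_var, noise_var)[0]
        errors[i] = normalised_rmse(Y[i], pred, wrap_angles[i])
    return errors


# Synthetic stand-in data: 9 samples, 63 blade load values, 20 shape parameters.
rng = np.random.default_rng(0)
latent = rng.uniform(-1.0, 1.0, size=(9, 3))
X = latent @ rng.normal(size=(3, 63))
Y = latent @ rng.normal(size=(3, 20))
wrap_angles = 143.0 + 10.0 * latent[:, 2]
errors = loo_rmse(X, Y, wrap_angles, length_scale=5.0, signal_var=1.0, noise_var=0.05)
```

Each entry of `errors` plays the role of one row of an error table: the error obtained when that sample is withheld from training and then reconstructed from its blade load.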
Finally, an objective blade shape outside the sample space was generated to analyze the extrapolation characteristics of both inverse problem models. The blade shape outside the sample space is shown in Figure 4, and we refer to this blade as the extrapolation blade. The sample set reconstructed in the LOO cross-validation was used to train both models. Regarding the parameter settings of both models, the characteristic length scale $l$ was initialized to 2.68, the signal variance $\sigma_f^2$ was initialized to 1.08, and the noise variance $\sigma_n^2$ was initialized to 0.57. The correlation matrix of the MOGPR model was also initialized as an identity matrix. The maximum number of function evaluations was set as 800 for both models. In the gradient descent algorithm of the MOGPR model, the NLML value tends to stabilize as the number of iterations increases. As shown in Figure 9, the NLML value remains almost constant after 800 iteration steps, so the optimized hyperparameters at 800 steps were used to calculate the inverse problem of the extrapolation blade. The blade shapes corresponding to the objective extrapolation blade load were obtained by the trained SOGPR and MOGPR models, respectively. We can see in Figure 10 that the extrapolation blade shape calculated by the MOGPR model closely approaches its objective blade shape, and the blade shape is smoother and continuous because the correlation between the blade shape parameters is taken into account by the correlation matrix. Some elements of the learned correlation matrix $K_c$ are shown in equation (11). It can be seen that the correlation matrix is symmetric, and the farther the elements are from the principal diagonal, the smaller their values are, which reveals that the correlation between any two output parameters of the blade shape decreases gradually as their distance increases.
As shown in Figure 11, however, the extrapolation blade shape acquired by the SOGPR model is messy in the middle section of the blade because the 20 parameters for the blade shape were calculated by the 20 SOGPR models individually and the correlation information between the blade shape parameters is ignored. Consequently, the MOGPR inverse problem model has a better extrapolation characteristic than the SOGPR inverse problem model. The correlation matrix can restrict the relationship between the blade shape parameters so that the blade shape characteristic is formulated more exactly. In addition, comparing Figures 7 and 10, we can find that the inverse problem of the extrapolation blade has higher uncertainty than that of the interpolation blade.

Conclusions
(1) The Gaussian process regression method was introduced to the inverse problem of the centrifugal pump blade. The complicated inverse problem was converted into the problem of getting the posterior distribution from the known prior distribution based on the Bayesian theorem in the background of machine learning. Both of the inverse problem models for pump blade design have good interpolation characteristics.
(2) The LOO cross-validation was carried out, respectively, on both models, and the results were compared and analyzed. The blade shapes within the sample space can be achieved exactly and efficiently by both the SOGPR and MOGPR inverse problem models according to the given objective blade load distributions. Both inverse problem models are robust in calculating the inverse problem of the pump blade. The RMSE values of the MOGPR inverse problem model are generally lower than those of the SOGPR inverse problem model. The research shows that the accuracy of the MOGPR inverse problem model in calculating the inverse problem of the pump blade is better than that of the SOGPR inverse problem model.
(3) The extrapolation characteristics of both models were tested and compared. The extrapolation blade obtained by the MOGPR inverse problem model closely approaches its objective blade shape, and the blade shape is continuous and smoother. However, the extrapolation blade shape acquired by the SOGPR inverse problem model is messy and cannot achieve the inverse design. Since the outputs are considered dependent on each other in the MOGPR inverse problem model, the correlation between the outputs is adequately learned by the kernel function matrix. The extrapolation characteristic of the MOGPR inverse problem model is much better than that of the SOGPR inverse problem model. In addition, the accuracy of the proposed inverse methods for the interpolation blade is higher than that for the extrapolation blade.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Mathematical Problems in Engineering