Investigation on the Optimal Design and Flow Mechanism of High Pressure Ratio Impeller with Machine Learning Method

The optimization of a high-pressure-ratio impeller with splitter blades is difficult because of the large number of design parameters, the high time cost, and the complex flow field, so few related studies have been published. In this paper, an engineering-applied centrifugal impeller with an ultrahigh pressure ratio of 9 was selected as the datum geometry. An advanced optimization strategy was set up by combining commercial software with in-house Python code. It comprises a 41-parameter impeller parameterization, high-quality CFD simulation, machine learning models based on SVR (support vector regression) and random forest, and a multipoint genetic algorithm (MPGA). The optimization objective is to maximize the peak efficiency subject to constraints on the pressure ratio at the near-stall point and on the choked mass flow. Results show that the peak efficiency increases by 1.24% and that the overall performance improves simultaneously. A detailed comparison of the flow fields shows that the main reasons for the performance improvement are a weaker shock wave, a lower tip leakage flow rate near the leading edge, a smaller separation region near the root of the leading edge, and a more homogeneous outlet flow distribution. The study verifies the reliability of the SVR-MPGA model for multiparameter optimization of highly loaded impellers and reveals a probable pattern of performance improvement.


Introduction
The centrifugal compressor [1] offers a high single-stage pressure ratio, compact size, and a wide stable operating range. It is therefore commonly used in low-power gas turbines and turboshaft engines. In the past, most research focused on centrifugal impellers with pressure ratios below 6, such as the Eckardt impeller [2] and the Krain impeller [3]. Nowadays, several single-stage centrifugal compressors with total pressure ratios above 6 have been designed and installed in engines to increase specific power and decrease specific fuel consumption [4,5]. However, few published studies describe the flow details and loss mechanisms needed to guide very highly loaded designs, which involve more complex shock structures, corner separation, and recirculation [4]. As a result, test results often fail to meet the design targets [5]. Furthermore, the complexity of the geometry and of the flow field increases the design difficulty. More advanced optimization methods are therefore needed to handle such problems, which involve large numbers of variables and high time consumption.
Optimization design methods can be divided into local gradient-based algorithms and advanced global optimization methods, such as evolutionary algorithms, simulated annealing, and surrogate models. According to how the gradients are calculated, gradient-based methods include the finite difference method, the linearized method, and the adjoint method [6]. The main disadvantage of both the finite difference and linearized methods is that the cost of evaluating the objective function derivatives usually grows in proportion to the number of design variables. Although the adjoint method is more efficient in large design spaces, it can still become trapped in a local optimum.
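To make the cost argument concrete, the sketch below estimates a gradient by forward differences and counts the objective calls. It is a minimal illustration, assuming a generic Python callable; `toy_objective` is a hypothetical cheap stand-in for one CFD run. With the 41 design variables used later in this paper, every gradient evaluated this way would cost 42 objective evaluations, that is, 42 CFD runs.

```python
from typing import Callable, List, Tuple


def forward_difference_gradient(
    f: Callable[[List[float]], float], x: List[float], h: float = 1e-6
) -> Tuple[List[float], int]:
    """Approximate grad f(x) with forward differences.

    Returns the gradient estimate and the number of objective calls, which is
    len(x) + 1: one baseline call plus one perturbed call per design variable.
    """
    f0 = f(x)
    evaluations = 1
    grad = []
    for i in range(len(x)):
        x_pert = list(x)  # copy so that only variable i is perturbed
        x_pert[i] += h
        grad.append((f(x_pert) - f0) / h)
        evaluations += 1
    return grad, evaluations


def toy_objective(x: List[float]) -> float:
    # Hypothetical cheap stand-in for one CFD evaluation.
    return sum((xi - 1.0) ** 2 for xi in x)


grad, n_evals = forward_difference_gradient(toy_objective, [0.0] * 41)
print(n_evals)  # 42 objective evaluations for 41 design variables
```

The evaluation count, not the arithmetic, is the point: it scales linearly with the number of variables, which is what makes such methods expensive when each call is a full CFD solution.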
Surrogate models largely overcome these disadvantages. A surrogate model is constructed by fitting a database of samples and then replaces large numbers of CFD evaluations, which reduces computational cost and improves optimization efficiency. Nowadays, frameworks that combine a surrogate model with an evolutionary algorithm have become mainstream, because they offer global search capability and accuracy at a low time cost. Pierret and Van den Braembussche [7] constructed a response surface model and employed simulated annealing (SA) for optimization. Nishida et al. [8] combined a kriging model with a multipoint genetic algorithm (MPGA) to optimize a multistage centrifugal compressor. Liu et al. [9] constructed a surrogate model of impeller performance as a function of the blade geometry parameters using Gaussian radial basis functions. Lian et al. [10] proposed a reliability-based design optimization method using a multipoint genetic algorithm and applied it to Rotor 67. John et al. [11] compared several previous optimization results for Rotor 37, including adjoint optimization and response surfaces combined with different evolutionary algorithms. The number of parameters varied between studies, with a maximum of about 30. Although the maximum efficiency was improved by 1.7%–1.9%, some optimized designs failed a mechanical stress constraint and others reduced the stall pressure ratio of the blade. This suggests that traditional surrogate models still have shortcomings in multiparameter, multipoint optimization.
In the past 10 years, machine learning has attracted considerable attention because of extraordinary results in fields such as image recognition and artificial intelligence. Like a surrogate model, machine learning mines the relationship between the objective function and the independent variables from data. Since Rumelhart et al. [12] proposed the error back-propagation algorithm for neural networks in 1986, neural network technology has developed further and has promoted progress in aerodynamic optimization. Miao et al. [13] combined a neural network with a genetic algorithm to optimize turbine blades and increased the efficiency by 3%. Guo et al. [14] used a convolutional neural network (CNN) to model and analyze a three-dimensional flow field, greatly reducing the time cost with almost no loss of accuracy. Based on deep neural networks, Templeton et al. [15] used a large amount of high-precision data to construct a turbulent Reynolds stress model that can simultaneously predict anisotropy and the stress tensor. Beyond neural networks, which have received widespread attention and application, other learning methods have also been used. Ladický et al. [16] used regression forests (RF) to analyze unsteady flow fields and improve computational efficiency. Milani et al. [17] used random forests to analyze the sensitivity of each parameter in a turbulence model and to determine the optimal diffusion coefficient. Kaya [18] used an SVR model to establish a functional relationship between the spanwise twist distribution and the generated torque. Zhang et al. [19] proposed a machine learning-based reliability-based design optimization (RBDO) approach, called the fuzzy multi-SVR learning method, to optimize turbine blades; the results showed that the stress and deformation of the blades were reduced and that their comprehensive reliability was improved.
Choi and Park [20] chose the aspect ratio, taper ratio, and back-swept angle as optimization variables and constructed an SVR model to optimize a wing.
Recently, Joly et al. [21] presented a machine learning framework to speed up the design optimization of a highly loaded transonic compressor rotor. The optimized rotor geometry features precompression that relocates and attenuates the shock without a stability penalty. In summary, SVR has shown a good ability to solve various optimization problems [22,23].
In the present paper, focusing on a centrifugal impeller with a pressure ratio of 9, the meridional curves and the main and splitter blades are parameterized with a total of 41 parameters to generate the three-dimensional geometry. Then, SVR [24] is used to train a model on a database of results for different impeller geometries. A multipoint genetic algorithm is applied to carry out the global optimization automatically. The optimal geometry of this very highly loaded impeller and the associated flow mechanism are presented in detail.

Numerical Method
NUMECA FINE/Turbo, a high-quality commercial solver, is used for all numerical calculations. The three-dimensional steady compressible Reynolds-averaged Navier-Stokes (RANS) equations are solved. They are discretized in space with a cell-centered finite-volume technique and in time with an explicit four-stage Runge-Kutta scheme. The Spalart-Allmaras (SA) turbulence model was designed for fully turbulent flows with low freestream turbulence and usually offers a good compromise between accuracy and simplicity, so it is used for all cases.
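The explicit four-stage Runge-Kutta time stepping mentioned above can be illustrated with a minimal pseudo-time-marching sketch. The stage coefficients (1/4, 1/3, 1/2, 1) are the classical Jameson-type values and are an assumption here, not necessarily the ones used inside FINE/Turbo; `residual` is a hypothetical toy residual standing in for the discretized RANS operator.

```python
from typing import Callable, List

# Stage coefficients of a Jameson-type four-stage scheme (assumed values;
# the coefficients actually used by the solver may differ).
ALPHAS = (1.0 / 4.0, 1.0 / 3.0, 1.0 / 2.0, 1.0)


def four_stage_rk_step(
    u: List[float], residual: Callable[[List[float]], List[float]], dt: float
) -> List[float]:
    """Advance du/dt = -R(u) by one explicit four-stage step.

    Every stage restarts from u^n and uses the residual of the previous stage,
    so only the initial and current solutions need to be stored.
    """
    u0 = list(u)
    uk = list(u)
    for alpha in ALPHAS:
        r = residual(uk)
        uk = [a - alpha * dt * ri for a, ri in zip(u0, r)]
    return uk


def residual(u: List[float]) -> List[float]:
    # Hypothetical toy residual whose steady state is u = 1 in every cell.
    return [ui - 1.0 for ui in u]


u = [0.0, 2.0, 5.0]
for _ in range(200):  # march in pseudo-time until the residual vanishes
    u = four_stage_rk_step(u, residual, dt=0.5)
print(max(abs(ri) for ri in residual(u)))
```

For a linear residual these coefficients reproduce the Taylor expansion of exp(-λΔt) up to fourth order, so each step damps the error by about the exact factor; for a steady-state solver only the converged solution matters, not the pseudo-time history.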
A centrifugal compressor with a pressure ratio of 5.7, designed by Krain at DLR [3], is chosen to validate the numerical solver. This stage consists of an unshrouded centrifugal impeller with thirteen main blades and thirteen splitter blades, all backswept, with a tip clearance varying from 0.5 mm at the blade inlet to 0.3 mm at the exit, followed by a vaneless diffuser. Further details of the geometry and the tests can be found in [3].
The results at a radius ratio of 1.9 (vaneless diffuser outlet to impeller tip) are extracted for comparison with the test data. As shown in Figure 1, on the design speed line the computed choked mass flow is slightly higher than measured, a trend shared by nearly all simulations reported in the literature, and the computed peak efficiency is about 1.5% lower than measured. On the 44000 rpm line, the simulation and the experiment agree closely. Figures 1(b) and 1(c) show the relative Mach number contours at the impeller outlet section for the experiment and the simulation, respectively; the results agree well both qualitatively and quantitatively. These comparisons show that the numerical method is reliable.

Validation
The research object of this paper is an engineering-applied centrifugal impeller with a pressure ratio as high as 9. Because of their complexity, few published studies address impellers with such a high pressure ratio. For designers, it is very useful to know how to keep losses low during the design procedure.
The three-dimensional geometry and the meridional curves are shown in Figure 2. The impeller consists of 11 main and splitter blades with a high back-swept angle.
Four O4H-type grids with sizes of 1.03, 1.52 (shown in Figure 3), 1.98, and 2.44 million are generated with AutoGrid5 for the grid dependency study. The first-cell y+ is below 10, which meets the requirements of the SA turbulence model. As shown in Figure 4, the performance curves of the four grids nearly coincide, so the 1.52-million grid is selected for the subsequent computations.

Optimization Method Based on Machine Learning

Optimization Process.
The basic flowchart of the optimization process is shown in Figure 5. The process is as follows: (1) obtain the optimal geometry of the impeller using the genetic algorithm at multiple operating points.
Based on the SVR model, this paper uses the geometry database generated by Design3D with 41 parameters to construct machine learning models of the objective function and the constraints. These models then replace the CFD evaluations: their predictions are used in place of CFD results during the optimization.
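The surrogate-based loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: `expensive_cfd` is a hypothetical analytic stand-in for a CFD run, an inverse-distance-weighted interpolant replaces the SVR model, and random sampling of the surrogate replaces the genetic algorithm so that the example stays short and dependency-free.

```python
import random
from typing import List, Tuple

Point = List[float]


def expensive_cfd(x: Point) -> float:
    # Hypothetical stand-in for one CFD run that returns peak efficiency.
    return 0.85 - sum((xi - 0.3) ** 2 for xi in x)


def idw_surrogate(db: List[Tuple[Point, float]], x: Point, p: float = 2.0) -> float:
    # Inverse-distance-weighted interpolation of the database (stand-in for SVR).
    num = den = 0.0
    for xi, yi in db:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        if d2 == 0.0:
            return yi  # exact database hit
        w = 1.0 / d2 ** (p / 2.0)
        num += w * yi
        den += w
    return num / den


def surrogate_optimize(n_dim=3, n_init=20, n_iter=10, n_candidates=500, seed=0):
    rng = random.Random(seed)

    def sample() -> Point:
        return [rng.uniform(0.0, 1.0) for _ in range(n_dim)]

    # 1) Build the initial database with the expensive model.
    db = [(x, expensive_cfd(x)) for x in (sample() for _ in range(n_init))]
    for _ in range(n_iter):
        # 2) Search the cheap surrogate instead of calling the CFD solver.
        best = max((sample() for _ in range(n_candidates)),
                   key=lambda x: idw_surrogate(db, x))
        # 3) Verify the predicted optimum with one CFD run and enrich the database.
        db.append((best, expensive_cfd(best)))
    return max(db, key=lambda item: item[1])


x_best, eta_best = surrogate_optimize()
print(len(x_best), round(eta_best, 4))
```

Only `n_init + n_iter` expensive calls are made, while thousands of candidate designs are screened on the surrogate; that trade is the whole rationale for the approach.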

Parameterization.
Figure 6 shows the parameterization of the impeller in the modeling software AutoBlade. The meridional channel is defined by Bézier curves with six control points on the shroud and six on the hub. Three spanwise sections, at the hub, at 50% span, and at the shroud, are each constructed from a camber line and a thickness distribution. The camber line is a Bézier curve with seven control points, five of which are variables. Similarly, the thickness distribution is a Bézier curve with five control points, four of which are variables. In total, 41 parameters control the shape of the impeller, as listed in Table 1.
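A Bézier curve of this kind can be evaluated with de Casteljau's algorithm, as sketched below. The control-point coordinates are hypothetical illustration values, not the datum impeller; in an optimization, some of these coordinates would be the design variables perturbed by the optimizer, while AutoBlade's own internal implementation is not assumed here.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]


def de_casteljau(control: List[Point2D], t: float) -> Point2D:
    """Evaluate a Bezier curve at t in [0, 1] by repeated linear interpolation."""
    pts = list(control)
    while len(pts) > 1:
        pts = [((1.0 - t) * x0 + t * x1, (1.0 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:])]
    return pts[0]


def sample_curve(control: List[Point2D], n: int = 11) -> List[Point2D]:
    # n equally spaced parameter values from t = 0 (first point) to t = 1 (last point).
    return [de_casteljau(control, i / (n - 1)) for i in range(n)]


# Hypothetical six-point shroud curve in (z, r), running from the axial inlet to the radial exit.
shroud = [(0.00, 0.10), (0.03, 0.10), (0.06, 0.11), (0.08, 0.14), (0.09, 0.18), (0.09, 0.22)]
for z, r in sample_curve(shroud, 5):
    print(f"z = {z:.4f}  r = {r:.4f}")
```

The curve passes through the first and last control points and is only attracted by the interior ones, which is why a few control-point coordinates are enough to reshape the whole meridional contour smoothly.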

SVR.
The support vector machine (SVM) is a common machine learning algorithm with a good ability to solve classification and regression problems; its regression form is support vector regression (SVR). This paper introduces SVR into the optimization of the high-pressure-ratio impeller.
For a multi-input single-output (MISO) nonlinear modeling problem, a training sample set $\{(x_i, y_i)\}_{i=1}^{n}$ is assumed to be fitted by a hyperplane (decision function) in a feature space. The decision function is

$$f(x) = w^{T}\phi(x) + b, \tag{1}$$

where w, b, and T denote the weighting vector, bias, and transpose, respectively, and ϕ(x) maps the input x into the feature space.
Then, according to Vapnik's statistical learning theory, finding a regression function with precision ε can be transformed into the following optimization problem:

$$\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2} \quad \text{s.t.} \quad \left| y_i - w^{T}\phi(x_i) - b \right| \le \varepsilon, \quad i = 1, \dots, n.$$

Figure 7: Feature importance.

International Journal of Aerospace Engineering
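In practice, these constraints may be too strict to be satisfied by every sample. Slack variables ξ_i, ξ_i* and a penalty factor C are therefore introduced to handle otherwise infeasible constraints, and the optimization problem becomes

$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i \\ w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0 \end{cases}$$

Applying the Lagrange function method with dual variables to this problem leads to a quadratic programming (QP) problem:

$$\max_{\alpha,\,\alpha^{*}}\ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)\phi(x_i)^{T}\phi(x_j) - \varepsilon\sum_{i=1}^{n}\left(\alpha_i + \alpha_i^{*}\right) + \sum_{i=1}^{n} y_i\left(\alpha_i - \alpha_i^{*}\right)$$

$$\text{s.t.}\quad \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C,$$

where α_i and α_i* are the Lagrange multipliers. The solution involves ϕ(x_i)^T ϕ(x_j), the inner product of two samples mapped into the feature space. Because the feature space has a very large number of dimensions, computing this product explicitly is time-consuming, so a kernel function is introduced:

$$K(x_i, x_j) = \phi(x_i)^{T}\phi(x_j).$$

In this study, the radial basis function (RBF), $K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j\rVert^{2}\right)$ with width parameter γ > 0, is used as the kernel of the SVR model because of its strong nonlinear mapping ability.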
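The ingredients of the soft-margin formulation can be checked numerically. The sketch below uses hypothetical helper names and assumes γ = 0.5. It builds the RBF Gram matrix that replaces ϕ(x_i)^T ϕ(x_j) in the dual problem, and it computes the smallest slack values ξ_i, ξ_i* that satisfy the ε-tube constraints for a given set of predictions, together with the resulting penalty C Σ(ξ_i + ξ_i*).

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel(xi: Sequence[float], xj: Sequence[float], gamma: float = 0.5) -> float:
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2), an inner product in an implicit feature space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))


def kernel_matrix(X: List[Sequence[float]], gamma: float = 0.5) -> List[List[float]]:
    # Gram matrix K_ij = K(x_i, x_j), used in the dual QP in place of phi(x_i)^T phi(x_j).
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]


def epsilon_slacks(y_true, y_pred, eps) -> Tuple[List[float], List[float]]:
    """Smallest slacks satisfying the soft epsilon-tube constraints:
    xi_i for samples above the tube, xi_i* for samples below it."""
    xi = [max(0.0, y - f - eps) for y, f in zip(y_true, y_pred)]
    xi_star = [max(0.0, f - y - eps) for y, f in zip(y_true, y_pred)]
    return xi, xi_star


def slack_penalty(y_true, y_pred, eps, C) -> float:
    # The C * sum(xi_i + xi_i*) term of the soft-margin objective.
    xi, xi_star = epsilon_slacks(y_true, y_pred, eps)
    return C * sum(a + b for a, b in zip(xi, xi_star))


# Hypothetical targets and predictions: only samples outside the tube are penalized.
print(epsilon_slacks([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1))
print(slack_penalty([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1, C=10.0))
```

The first sample lies inside the tube and costs nothing, which is the ε-insensitivity that keeps SVR robust to small noise in the CFD database.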
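Then, w and b are obtained from the Karush-Kuhn-Tucker (KKT) conditions:

$$w = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\phi(x_i),$$

$$b = \begin{cases} y_i - w^{T}\phi(x_i) - \varepsilon, & 0 < \alpha_i < C \\ y_i - w^{T}\phi(x_i) + \varepsilon, & 0 < \alpha_i^{*} < C \end{cases}$$

Substituting the optimal w and b into Equation (1) gives the regression function

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b.$$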
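Once the dual problem is solved, prediction only needs the kernel, the multipliers, and b. The sketch below evaluates the regression function and recovers b from one free support vector via the KKT conditions. The training points and multipliers are hypothetical illustration values chosen to satisfy Σ(α_i − α_i*) = 0 and 0 ≤ α_i, α_i* ≤ C; they are not the solution of an actual QP, which in practice would come from a QP solver or a library such as scikit-learn's `SVR`.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 0.5) -> float:
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def svr_predict(x, X, alpha, alpha_star, b, gamma=0.5) -> float:
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum((a - a_s) * rbf_kernel(xi, x, gamma)
               for xi, a, a_s in zip(X, alpha, alpha_star)) + b


def bias_from_kkt(i, X, y, alpha, alpha_star, C, eps, gamma=0.5) -> float:
    """Recover b from a free support vector i using the KKT conditions."""
    w_phi = svr_predict(X[i], X, alpha, alpha_star, 0.0, gamma)  # w^T phi(x_i)
    if 0.0 < alpha[i] < C:
        return y[i] - w_phi - eps  # sample on the upper edge of the epsilon-tube
    if 0.0 < alpha_star[i] < C:
        return y[i] - w_phi + eps  # sample on the lower edge of the epsilon-tube
    raise ValueError("sample i is not a free support vector")


# Hypothetical one-dimensional data and multipliers (not an actual QP solution).
X = [[0.0], [1.0]]
y = [1.0, 0.0]
alpha, alpha_star = [0.5, 0.0], [0.0, 0.5]  # sum(alpha - alpha*) = 0, all within [0, C]
C, eps = 1.0, 0.1

b = bias_from_kkt(0, X, y, alpha, alpha_star, C, eps)
print(svr_predict(X[0], X, alpha, alpha_star, b))  # approximately y[0] - eps = 0.9
```

At a free support vector with 0 < α_i < C, the prediction lies exactly on the upper edge of the ε-tube, f(x_i) = y_i − ε, which is the relation the usage line reproduces.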
In this paper, the coefficient of determination (R²) is used to evaluate the accuracy of the SVR model. It is defined as

$$R^{2} = 1 - \frac{\sum_{i}\left(y_i - f_i\right)^{2}}{\sum_{i}\left(y_i - \bar{y}\right)^{2}},$$

where y_i is the calculated value, f_i is the predicted value, and ȳ is the average of the calculated values.

Random Forest.
Random forest [25] is a bagging ensemble of decision trees that, compared with a single decision tree, additionally introduces random feature selection during training. For each feature, the out-of-bag (OOB) error, error1, is first obtained by testing the model on the OOB samples. Then, with the other features kept unchanged, this feature is randomly changed within a certain range, and the OOB error is recomputed to obtain error2. A large |error1 − error2| means that changes in this feature strongly influence the result; a small value means that the feature is unimportant. The purpose is loosely similar to that of principal component analysis (PCA), in that it identifies the inputs that dominate the response. As illustrated in Figure 7, Shroud_R5 (the R coordinate of the fifth control point on the shroud) has the greatest influence on the aerodynamic performance. This result can guide the choice of the number of optimization parameters: the hope is that characterizing the aerodynamic performance with the most relevant parameters, selected in this way, improves the reliability of the model. Figure 8 shows that the model accuracy fluctuates as the least relevant parameters are progressively removed. Therefore, the model that retains all the parameters best reflects the relationship between the parameters and the aerodynamic performance.
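Both the accuracy metric and the feature-importance idea can be sketched in a few lines. The example below is an illustration under several assumptions: it uses a generic trained model (here a hypothetical linear function `linear_model`), evaluates on the training rows instead of the random forest's OOB samples, and shuffles each feature column (Breiman-style permutation) rather than perturbing it within a range as described above. The mean squared error plays the role of error1 and error2.

```python
import random
from typing import Callable, List, Sequence


def r2_score(y: Sequence[float], f: Sequence[float]) -> float:
    """R^2 = 1 - sum (y_i - f_i)^2 / sum (y_i - y_mean)^2."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def permutation_importance(model: Callable[[List[float]], float], X: List[List[float]],
                           y: List[float], n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Importance of feature j = mean |error1 - error2| over n_repeats shuffles of column j."""
    rng = random.Random(seed)

    def mse(rows: List[List[float]]) -> float:
        return sum((yi - model(r)) ** 2 for r, yi in zip(rows, y)) / len(y)

    error1 = mse(X)  # error with the original data
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            total += abs(error1 - mse(X_perm))  # error2 after perturbing feature j
        scores.append(total / n_repeats)
    return scores


def linear_model(row: List[float]) -> float:
    # Hypothetical trained surrogate: strong in feature 0, weak in feature 1, ignores feature 2.
    return 3.0 * row[0] + 0.1 * row[1]


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [linear_model(row) for row in X]
print(r2_score(y, [linear_model(row) for row in X]))  # a perfect fit gives 1.0
print(permutation_importance(linear_model, X, y))
```

A feature the model never uses scores exactly zero, while strongly used features show large error increases; ranking parameters this way is what a plot like Figure 7 summarizes.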

Genetic Algorithm.
The genetic algorithm (GA) is an adaptive optimization algorithm based on the principle of "survival of the fittest." It was proposed by Holland [26] in the 1970s and improved by Goldberg [27]. The method evolves a population through natural selection, crossover, and mutation. Starting from an arbitrary initial population, it uses a fitness function and a search strategy to search for the global optimum in the search space. In this study, the GA is used to search for the optimal impeller geometry.
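A minimal real-coded genetic algorithm illustrating selection, crossover, and mutation is sketched below. Tournament selection, arithmetic crossover, Gaussian mutation, and elitism are common choices assumed here; the paper does not specify its operators, and `sphere` is a hypothetical test fitness rather than an impeller model.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_algorithm(
    fitness: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 40,
    generations: int = 100,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.1,
    seed: int = 1,
) -> Tuple[List[float], float]:
    """Maximize `fitness` over a box with a simple real-coded GA."""
    rng = random.Random(seed)

    def tournament(pop, scores, k=3):
        # Natural selection: the fittest of k randomly drawn individuals becomes a parent.
        picks = rng.sample(range(len(pop)), k)
        return pop[max(picks, key=lambda i: scores[i])]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [list(elite)]  # elitism: the best design always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover: convex blend of the parents
                child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            else:
                child = list(p1)
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # Gaussian mutation, clipped to the bounds
                    child[d] = min(max(child[d] + rng.gauss(0.0, 0.1 * (hi - lo)), lo), hi)
            children.append(child)
        pop = children
    scores = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


def sphere(x: List[float]) -> float:
    # Hypothetical test fitness with its maximum (0) at x = 0.3 in every dimension.
    return -sum((xi - 0.3) ** 2 for xi in x)


best_x, best_score = genetic_algorithm(sphere, [(0.0, 1.0)] * 5)
print([round(v, 3) for v in best_x], best_score)
```

In the surrogate framework, `fitness` would be a call to the trained model rather than to CFD, so the thousands of evaluations a GA needs become affordable.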

Comparison of the Total Performance.
In the multipoint optimization, the objective is to maximize the efficiency at the design point (OP1) while constraining the pressure ratio at the near-stall point (OP2) and the choked mass flow (OP3). The accuracies of the three models, measured by the coefficient of determination (R²), are 0.98, 0.96, and 0.95, respectively. The optimization results on the 100% speed line are shown in Table 2 and Figure 9.
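One common way to turn such a multipoint problem into a single fitness value for the genetic algorithm is a penalty function. The sketch below assumes that form, together with hypothetical names, thresholds, and a penalty weight; the paper does not state how its constraints are enforced.

```python
from dataclasses import dataclass


@dataclass
class OperatingPointPrediction:
    """Hypothetical container for the three surrogate predictions of one design."""
    eta_op1: float      # isentropic efficiency at the design point OP1
    pr_op2: float       # total pressure ratio at the near-stall point OP2
    m_choke_op3: float  # choked mass flow at OP3


def multipoint_fitness(p: OperatingPointPrediction, pr_min: float, m_choke_ref: float,
                       m_tol: float = 0.01, penalty: float = 10.0) -> float:
    """Efficiency at OP1 minus a penalty for violating the OP2/OP3 constraints.

    Assumed constraint forms: pr_op2 >= pr_min, and the choked mass flow stays
    within a relative tolerance m_tol of the reference value.
    """
    violation = max(0.0, pr_min - p.pr_op2) / pr_min
    violation += max(0.0, abs(p.m_choke_op3 - m_choke_ref) / m_choke_ref - m_tol)
    return p.eta_op1 - penalty * violation


feasible = OperatingPointPrediction(eta_op1=0.80, pr_op2=9.3, m_choke_op3=3.0)
infeasible = OperatingPointPrediction(eta_op1=0.82, pr_op2=8.8, m_choke_op3=3.0)
print(multipoint_fitness(feasible, pr_min=9.0, m_choke_ref=3.0))
print(multipoint_fitness(infeasible, pr_min=9.0, m_choke_ref=3.0))
```

The infeasible design has the higher raw efficiency but loses after the penalty, so the GA is steered away from gains bought by sacrificing the near-stall pressure ratio.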
The choked mass flow varies by only 2%, while the total pressure ratio near the stall point and the peak isentropic efficiency increase by 0.32 and 1.24%, respectively. Such an efficiency gain is remarkable for an impeller with such a high pressure ratio.
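For reference, the isentropic efficiency reported above can be computed from the total pressure and total temperature ratios. The sketch assumes the standard total-to-total definition for a calorically perfect gas with γ = 1.4; the operating point in the usage line is hypothetical.

```python
def isentropic_efficiency(pressure_ratio: float, temperature_ratio: float,
                          gamma: float = 1.4) -> float:
    """Total-to-total isentropic efficiency of a compressor (calorically perfect gas):

    eta = (PR**((gamma - 1) / gamma) - 1) / (T02/T01 - 1)
    """
    ideal_rise = pressure_ratio ** ((gamma - 1.0) / gamma) - 1.0  # isentropic temperature rise / T01
    actual_rise = temperature_ratio - 1.0                         # actual temperature rise / T01
    return ideal_rise / actual_rise


# Hypothetical operating point: PR = 9 and a total temperature ratio of 2.09.
eta = isentropic_efficiency(9.0, 2.09)
print(round(eta, 4))
```

At a fixed pressure ratio, any reduction of the actual temperature rise (that is, of the losses) raises the efficiency directly, which is why weakening the shock and the tip leakage pays off.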

Comparison of the Impeller Geometries.
Figure 10 compares the meridional hub and shroud curves of the optimized and datum impellers. For the optimized impeller, the curvature changes more smoothly through the axial-to-radial transition. In Figure 11, the main-blade profiles of the datum and optimized impellers are compared at three spanwise heights: 0%, 50%, and 100%. The largest differences occur near the hub. At the hub, the inlet camber angle increases by about 3°, and the outlet back-swept angle decreases more markedly. This raises the aerodynamic loading near the outlet of the impeller blade, as shown in Figure 12(a). Figure 12(c) shows that the tip section of the datum impeller has a pronounced shock wave near the leading edge and a loading distribution that is not very smooth. The optimized impeller weakens this shock, so the loss is lower.

Flow Analysis at Peak Efficiency Points.
Based on the above analysis, the change in the shock wave structure is analyzed in detail. Figure 13 compares the relative Mach number contours near the suction surface of the datum and optimized impellers at the peak efficiency point. The datum impeller clearly has a large supersonic region near the leading edge in the tip zone. The strong shock wave induces strong tip leakage flow (Figure 14(a)) and high losses (Figure 15(a)). In the optimized impeller, the supersonic region shrinks noticeably and the shock wave is weaker, as shown in Figures 13(b) and 15(b). The tip leakage flow rate near the leading edge (red lines in Figure 14) is also reduced. Together, these changes improve the aerodynamic performance and increase the efficiency.
In addition, Figure 13(a) shows a reverse-flow separation region near the blade root close to the leading edge. The changes to the hub and midspan sections improve the flow in this region, as shown in Figure 13(b). Figure 14 also reveals the complex secondary flow structure of the high-pressure-ratio impeller, which requires careful attention. Because of the high pressure ratio and the turning of the flow from the axial to the radial direction, the flow near the hub end wall migrates toward the shroud and then crosses the tip gap. This flow then mixes with the leakage flow generated near the leading edge and is transported to the outlet.
At 50% span (Figures 15(c) and 15(d)), the low-momentum flow region at every passage outlet is reduced. Figures … 15(h) compare the passage cross-sections. In these sections, a low-speed region near the tip and nonuniform flow across the whole section are clearly visible; the nonuniformity persists to the outlet, where it forms a jet/wake structure. The optimized impeller partly alleviates this nonuniformity and improves the jet/wake structure, which increases the performance.

Conclusion
This paper describes an advanced optimization method based on machine learning and successfully applies it to a centrifugal impeller with a pressure ratio of 9 using 41 design parameters. Bézier curves are used to parameterize the hub and shroud meridional curves and the camber lines and thickness distributions at three spanwise sections. The SVR algorithm is used to build an approximate function between the geometric parameters and the aerodynamic performance, and a genetic algorithm is then applied iteratively on this model. The peak efficiency increases by 1.24%, the maximum pressure ratio increases by 0.32, and the stall margin remains essentially unchanged. Comparing the datum and optimized aerodynamic shapes shows that the main changes are in the meridional curves and in the camber angle distributions at the hub and shroud sections. A detailed comparison of the flow fields shows that the main reasons for the performance improvement are a weaker shock wave, a lower tip leakage flow rate near the leading edge, a smaller separation region near the root of the leading edge, and a more homogeneous flow distribution near the outlet.

Data Availability
The data used to support the findings of this study have not been made available because the impeller is an industrial product.

Conflicts of Interest
The authors declare that they have no conflicts of interest.