Most existing studies of commercial bank performance evaluation examine the relationship between a single characteristic and performance and lack a comprehensive analysis of characteristics. They also focus mainly on causal inference and offer few systematic quantitative conclusions from the perspective of prediction. This paper is the first to comprehensively investigate how well multidimensional features predict commercial bank performance using boosting regression trees. Financial data are high-dimensional: alongside observable price data and financial fundamentals, there are many unobservable, undisclosed data and events, and many sources of income cannot be explained by existing models. To suit the characteristics of commercial bank data, this paper proposes a gradient boosting regression tree algorithm with an adaptively reduced step size for bank performance evaluation. In this method, a random subsample is drawn before each regression tree is trained, and an adaptive reduction step size replaces the fixed reduction step size of the original algorithm, which addresses the low accuracy and poor generalization ability of existing regression decision tree models. Compared with the BIRCH algorithm applied to the same data, the proposed algorithm obtains better classification results. Empirically, this paper uses data from rural banks in 30 provinces of China to classify rural banks by their performance characteristics in order to better evaluate their performance.
The traditional Malmquist index [
Machine learning technology has certain applications in performance evaluation in the financial field. Taking the fund performance analysis and evaluation model as an example [
Most of the existing bank performance evaluation models are based on the Malmquist index method, but the information dimensionality in the financial-related fields is relatively high [
In summary, the contributions and innovations of this paper are as follows. First, this paper proposes, for the first time, the use of predictive models to evaluate the performance of commercial banks; in our opinion, predictive models can uncover more complex patterns in the datasets than explanatory models. Second, to suit the characteristics of commercial bank data, this paper proposes a gradient boosting regression tree algorithm with an adaptively reduced step size for bank performance evaluation. Third, this study conducts experiments on real commercial bank data from 30 provinces, and the experiments show that the proposed algorithm reveals the performance of commercial banks more objectively. This research not only uses predictive models to study bank performance evaluation from a more comprehensive perspective but also provides useful insights for commercial bank operations and management.
The rest of this paper is organized as follows: Section
Ravi et al. present a soft computing based bank performance prediction system [
A series of modeling techniques were employed to predict bank insolvencies on a sample of US-based financial institutions. The empirical results indicate that the method of random forests (RF) has a superior out-of-sample and out-of-time predictive performance, with neural networks also performing almost equally well as RF in out-of-time samples [
Several machine learning algorithms have been used on a real bank credit dataset for comparative analysis and to choose which algorithms are the best fit for learning bank credit data. These algorithms gave over 80% accuracy in prediction [
A large number of empirical studies analyse and evaluate machine learning techniques in bank risk management [
We can conclude that machine learning algorithms have been widely used in various areas of banking, including performance assessment, credit evaluation, risk management, customer retention, and fraud detection. However, a careful review of the above work shows that the machine learning algorithms used are mostly explanatory models, which verify the causal relationships between observable variables posited by theory. Unlike this previous work, our work is based on predictive analysis, which has appeared less frequently in empirical studies of finance and banking. The method proposed in this paper does not assume a causal relationship between variables, and most of the models that fit well do not assume a specific functional form between variables (e.g., linear, U-shaped, or exponential relationships); predictive models are therefore able to uncover more complex patterns in the datasets.
This paper studies the performance of China’s provincial rural banks, that is, provincial rural banks represent the regional heterogeneity of rural banks. Fukuyama and Weber [
In order to evaluate the performance of rural banks in different provinces, this paper selects 30 provincial rural banks across the country except Tibet as the research object and uses 4 years of data to evaluate the productivity growth and decomposition efficiency indicators of provincial rural banks in China. According to the concept in the literature [
Data description.
Index | Capital stock | Staff | Deposit | Profit | NPLR |
---|---|---|---|---|---|
Mean | 71.25 | 21976 | 1210.84 | 1449.96 | 12.43 |
SD | 61.71 | 15232 | 1163.65 | 1361.5 | 18.3 |
Min | 0.58 | 2087 | 28.26 | 33.01 | −29 |
Max | 234.15 | 60896 | 5940.71 | 6957.89 | 104.29 |
Banks use capital and human resources to make profits. Bank deposits are treated as a special resource: banks strive to attract deposits, which count as a positive indicator in performance evaluation, and at the same time they use these deposits to earn future profits. The role of deposits is a controversial topic in the DEA banking literature. Compared with other input-output variables, deposits behave as a dynamic variable, so rural bank deposits are defined here as a carry-over variable. From a more comprehensive perspective, nonperforming loans (NPLs) represent bad debt risk, and there is an inevitable symbiotic relationship between bad debt risk and profits. Therefore, the nonperforming loans of rural banks are defined as the undesirable output of rural banks.
The provincial rural bank is defined as the decision-making unit of the performance evaluation of the rural bank, and it is the research object of the performance evaluation of the rural bank. In period
In the traditional dynamic DEA model, (
Solving the above program for each DMU, we can obtain
Finally, using the above formula, we can decompose the sources of catching-up effect as
According to the above, we can decompose the sources of frontier-shift effect as
In conclusion, we decompose the dynamic Malmquist model as
In the clustering part, we use hierarchical clustering, the gradient boosting regression tree algorithm, and other related algorithms to further cluster the above index results. The hierarchical clustering uses the BIRCH algorithm, which is mainly used when the amount of data is large and the data type is numerical. We then apply the adaptively reduced step size gradient boosting regression tree algorithm proposed in this paper to improve the clustering results.
The gradient boosting regression tree algorithm is widely used in clustering research in the financial field. The existing gradient boosting regression tree method has certain shortcomings. Firstly, the existing methods rely too much on data quality, which makes us often unable to achieve the desired prediction accuracy in actual modeling. Secondly, the existing methods require careful adjustment of parameters, and the training time may be relatively long. Finally, the improvement effect of existing methods is relatively limited.
Next, we will introduce the adaptively reduced step size gradient boosting regression tree algorithm. In the gradient boosting regression tree algorithm, the reduction step size is fixed, and it is determined as a parameter when starting to train the model. We now analyze the loss function of the model. Let
Given
Then, we have
Therefore, the reduction step size can be automatically updated with the current learning result to adapt to the minimization of the function.
Then, we can write the improved gradient boosting regression tree Algorithm
Input: FOR Update reduction step END FOR Output: improved gradient boosting regression tree model
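The training loop described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure: it assumes a squared-error loss, depth-one regression trees (stumps), and an exact line search that recomputes the step size in every iteration. The step-size update rule derived in the paper is not reproduced in this text, so the line search stands in for it. The names `fit_stump`, `fit_adaptive_gbrt`, `subsample`, and `shrinkage` are hypothetical.

```python
import random


def fit_stump(X, r):
    """Fit a depth-1 regression tree to targets r by least squares (illustrative helper)."""
    n, d = len(X), len(X[0])
    total = sum(r)
    # (score, feature, threshold, left value, right value); fallback predicts the mean
    best = (-1.0, None, None, total / n, total / n)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # maximizing this score is equivalent to minimizing the squared error
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if score > best[0]:
                best = (score, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    _, j, thr, left, right = best
    if j is None:
        return lambda x: left
    return lambda x: left if x[j] <= thr else right


def fit_adaptive_gbrt(X, y, n_trees=50, subsample=0.8, shrinkage=1.0, seed=0):
    """Gradient boosting with a per-iteration step size chosen by line search (assumption)."""
    rng = random.Random(seed)
    n = len(X)
    f0 = sum(y) / n
    pred = [f0] * n
    trees, steps = [], []
    for _ in range(n_trees):
        # negative gradient of the squared-error loss is the residual
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        # random subsample drawn before training each regression tree
        size = min(n, max(2, int(subsample * n)))
        idx = rng.sample(range(n), size)
        tree = fit_stump([X[i] for i in idx], [residuals[i] for i in idx])
        h = [tree(x) for x in X]
        denom = sum(v * v for v in h)
        if denom == 0.0:
            break  # the tree predicts zero everywhere; nothing left to fit
        # adaptive step: rho minimizes sum((r_i - rho * h_i)^2) over all training points
        rho = shrinkage * sum(ri * hi for ri, hi in zip(residuals, h)) / denom
        pred = [p + rho * hi for p, hi in zip(pred, h)]
        trees.append(tree)
        steps.append(rho)

    def predict(x):
        return f0 + sum(rho * t(x) for rho, t in zip(steps, trees))

    return predict, steps


# Toy usage on synthetic data (not bank data)
X = [[i / 10] for i in range(40)]
y = [0.1 * x[0] + (1.0 if x[0] > 2 else 0.0) for x in X]
predict, steps = fit_adaptive_gbrt(X, y)
```

With `shrinkage=1.0` the line search guarantees that the training loss never increases from one iteration to the next; a value below 1 trades that speed for extra regularization, in the spirit of the fixed reduction step size of the original algorithm.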
The experimental data in this paper are the four-year data of 30 provincial rural banks except Tibet, including deposits, capital stock, employees, profits, and nonperforming loan rates. The five efficiency indexes decomposed by the Malmquist index method are SuEC, PuTC, SEC, DPC, and TPC. Taking Yunnan Province as an example, these five indicators are shown in Table
Five indicators of Yunnan province.
Inland | DSTFP | SuEC | PuTC | SEC | DPC | TPC |
---|---|---|---|---|---|---|
Yunnan | 0.99 | 1.00 | 1 | 1 | 0.89 | 1.12 |
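As a quick sanity check on the Yunnan row, the five components can be multiplied and compared with the reported DSTFP. This assumes the dynamic index is the product of its five components, as is usual for Malmquist-type decompositions; the decomposition equation itself is not reproduced in this text, so the product form is an assumption, and the variable names are illustrative.

```python
from math import prod

# Values copied from the Yunnan row of the table (rounded to two decimals)
yunnan = {"SuEC": 1.00, "PuTC": 1.00, "SEC": 1.00, "DPC": 0.89, "TPC": 1.12}
reported_dstfp = 0.99

# Assumption: DSTFP is the product of the five decomposed indexes
implied_dstfp = prod(yunnan.values())
gap = abs(implied_dstfp - reported_dstfp)
```

The implied value is about 0.997, within 0.01 of the reported 0.99; a gap of this size is what one would expect when the components themselves have been rounded to two decimals.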
The experimental process of this paper is shown in Figure
The experimental process outline.
The classification step divides the rural banks of the 30 provinces into several groups, which are then assigned to performance categories based on the efficiency characteristics of the rural banks.
In clustering, we use the BIRCH algorithm and the algorithm proposed in this paper, respectively. We use the original classification results of the 30 provinces as a reference to check the clustering accuracy of the two algorithms.
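One simple way to score a clustering against reference labels is cluster purity: each cluster is mapped to its most common reference label, and the share of correctly mapped items is reported. The paper does not state which accuracy measure it uses, so purity is an assumption here, and `clustering_accuracy` is a hypothetical helper.

```python
from collections import Counter


def clustering_accuracy(reference, clusters):
    """Purity: fraction of items that carry their cluster's majority reference label."""
    by_cluster = {}
    for ref, c in zip(reference, clusters):
        by_cluster.setdefault(c, []).append(ref)
    matched = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return matched / len(reference)


# Toy example with made-up labels (not the provincial data)
reference = ["A", "A", "B", "B", "C"]
clusters = [0, 0, 0, 1, 1]
score = clustering_accuracy(reference, clusters)  # 3 of 5 items match, i.e. 0.6
```

Purity rewards over-segmentation (one cluster per item scores 1.0), so it is only meaningful when the two algorithms being compared produce a similar number of clusters.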
As shown in Figure
The clustering result of the BIRCH algorithm.
As shown in Figure
Gradient boosting regression tree clustering.
According to the cluster analysis results and the characteristics of the decomposed efficiency of Chinese rural banks, we merge certain groups for analysis: Group 4 and Group 6 form TPEI (traditional pure economic improved type), and Group 2, Group 5, and Group 7 form SuECI (sustainable efficiency change improved type). This grouping is more realistic and better suited to empirical analysis, so we distinguish four performance types of Chinese rural banks, as shown in Table
Four types of performance of Chinese rural banks.
No. | Abbreviation | Norm |
---|---|---|
(I) | DPCL | Dynamic progress limited type |
(II) | SuTECL | Sustainable technical efficiency changes limited type |
(III) | TPEI | Traditional pure economic improved type |
(IV) | SuECI | Sustainable efficiency changes improved type |
Rural banks of types (I) and (II) perform worse than those of type (III). From the viewpoint of sustainable development, however, types (I) and (II) are banks with potential, whereas type (III) carries an implicit crisis. We refer to type (III) as the cash cows of the Boston matrix. Rural banks of type (IV) are diverse, but their shared advantage in sustainable efficiency makes them stand out as part of a sustainable development strategy. We analyze the four types below.
Most DPCL banks are located in inland areas of China. The performance of these rural banks lags seriously behind that of the other three types. Their main characteristic is that DPC is the only bottleneck constraining performance. From a purely profit-oriented viewpoint, PuTC is on the efficient frontier and TPC improves productivity growth, which suggests that the allocation of inputs and desirable outputs is effective and that their quality growth is positive. However, the undesirable outputs and links are ineffective. That is to say, these banks pursue short-term profit and ignore long-term sustainable profit.
As shown in Figure
Decomposed efficiency indexes of DPCL.
The SuTECL banks are located in the coastal panhandle of the eastern area, which includes seven provinces; Shanxi rural bank also belongs to the SuTECL type. The performance of these rural banks is in the bottom half among Chinese rural banks. Their main characteristics are that TPC is the only source of gain, while low PuTC and below-medium SuEC limit their productivity growth. This suggests that the developed local economy drives the improvement of performance, whereas the allocation of inputs and desirable outputs has lost its customary advantage in the eastern area. This is a big challenge, because these banks neglect the basic control of factor efficiency.
As shown in Figure
Decomposed efficiency indexes of SuTECL.
The banks of TPEI are located in the panhandle of northern and central regions of China as
Decomposed efficiency indexes of TPEI.
As shown in Figure
The SuECI banks have the advantage of being new; this type includes Henan bank and the three municipality banks of Chongqing, Beijing, and Shanghai. The performance of these rural banks differs significantly. Their main characteristic is that both SuEC and TPC are higher, which means that SuECI banks have benefited from local economic advantages and a sustainable development strategy. This sustainable advantage is an opportunity for substantial performance improvement.
As shown in Figure
Decomposed efficiency indexes of SuECI.
After analysing the four types of rural banks in China, we compare the results of the model before and after using machine learning technology, which shows our contribution to the empirical analysis more clearly.
Based on the above model, we compared the total factor productivity of China's rural banking industry. On the whole, the use of machine learning technology has a clear positive effect on bank performance evaluation, especially for high-efficiency banks. High-efficiency banks here are those in provinces that are efficient in a purely economic sense but ignore sustainable development and emphasize short-term development. Among these banks, the rolling effect of efficiency and loan interest rates restricts the sustainable development of rural banks. For banks that are inefficient in a purely economic sense, sustainable dynamic efficiency has a positive impact on performance. For example, Xinjiang Rural Bank performs well, and sustainable dynamic efficiency has played a positive role. The development model of the region is in good condition and deserves attention.
In summary, incorporating sustainable development strategies into the operation and management of rural banks in China is still an ongoing process. The above model shows that the productivity growth of rural banks is affected both by catching up with the efficient frontier and by shifts of the efficient frontier. We have compared their performance from the perspectives of pure economy and sustainable development.
Most existing studies of commercial bank performance evaluation examine the relationship between a single characteristic and performance and lack a comprehensive analysis of characteristics. They also focus mainly on causal inference and offer few systematic quantitative conclusions from the perspective of prediction. This paper is the first to comprehensively investigate how well multidimensional features predict commercial bank performance using boosting regression trees. To suit the characteristics of commercial bank data, this paper proposes a gradient boosting regression tree algorithm with an adaptively reduced step size for bank performance evaluation. Compared with the BIRCH algorithm applied to the same data, the proposed algorithm obtains better classification results. Empirically, this paper uses data from rural banks in 30 provinces of China to classify rural banks by their performance characteristics in order to better evaluate their performance.
Based on the hierarchical cluster analysis, the rural banks in China are divided into four groups: DPCL, SuTECL, TPEI, and SuECI. This paper also summarizes some interesting findings about the productivity growth of the various types of rural banks in China; for example, SuECI is worthy of attention, while TPEI is potentially dangerous because, although this type of bank has good profit performance, it performs poorly in the evaluation of NPLR.
The follow-up research includes four aspects. First, we will apply external weights to all inputs, links, and outputs [
All data used in this study can be made available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.