A Random Forest Method for Identifying the Effectiveness of Innovation Factor Allocation

This paper makes a new attempt to identify the effectiveness of innovation factor allocation with a random forest method. The method avoids the bias in relative-effectiveness evaluation caused by an ineffective choice of production frontier in the nonparametric DEA method: it does not refer to other optimal subjects but shifts the focus to judging each subject's own effectiveness. It also removes the constraints on the model and variables in the parametric SFA method and ensures the reliability of the measurement results by resampling thousands of times. The data are collected from 30 provinces in China from 2009 to 2018. The findings show that innovation factor allocation is not fully effective in more than half of the provinces, which indicates that China is still exploring how to use innovation factor inputs to achieve actual innovation output above each province's own optimal level. To further improve innovation factor allocation efficiency, the paper analyzes the impacts of innovation factor inputs and identifies the important ones. Furthermore, it presents the nonlinear characteristics and optimal combination of the important innovation factor inputs. On this basis, it offers detailed suggestions on how each province should adjust its current important innovation factor inputs to enhance the effectiveness of innovation factor allocation in the future.


Introduction
After the outbreak of the COVID-19 epidemic, global supply chains were forced to halt, international and domestic demand was blocked, barriers to the flow of factors were strengthened, and factor misallocation was aggravated. Meanwhile, the five major technologies of big data, cloud computing, the Internet of Things, blockchain, and artificial intelligence have emerged, contributing to the vigorous development of the real economy and bringing out the great value of the digital economy. As a basic resource in the digital economy, data factors promote the transformation and upgrading of other production factors. The optimization of the spatial layout of regional innovation factors is inseparable from the support and guidance of government policies.
This not only reflects that new innovation factors such as data and institutions have significantly affected the transformation of productivity and production relations, but also broadens the connotation of the traditional innovation factors. In addition, the distribution of innovation factor inputs is uneven and innovation capability is generally low among provinces in China, while the calculated results of innovation efficiency are controversial. It is therefore of great practical significance to identify the effectiveness of innovation factor allocation in each province under the new connotation of innovation factors. To this end, clarifying the multidimensional characteristics of innovation factors, selecting reliable measurement methods, and analyzing the current situation of innovation factor allocation in each province play a powerful role in stimulating the potential of innovation factors and driving economic growth.
The existing literature measures the allocation efficiency of innovation factors roughly from the following three perspectives. First, it uses different innovation inputs and innovation outputs. Duan et al. [1] construct a comprehensive index of innovation development from the three dimensions of innovation inputs, innovation organization, and innovation outputs. Dai et al. [2] use financial resources as scientific and technological financial inputs, and scientific and technological achievements as scientific and technological financial outputs. Tao and Peng [3] regard internal R&D expenditures and the full-time equivalent of R&D personnel as technological innovation inputs, and the number of patent applications received, revenue from new product sales, and revenue from main operations as technological innovation outputs. Second, it considers different evaluation objects and various measurement approaches for innovation efficiency. Li et al. [4] decompose the innovation process into an innovation development process and an innovation transformation process and then calculate the innovation efficiency of 37 subsectors in China from 2004 to 2011. Wang and Lan [5] use the SFA model to estimate the innovation efficiency of China's A-share listed state-owned enterprises. Chen and Li [6] adopt a two-stage network DEA method to obtain the green innovation efficiency, green technology research and development efficiency, and green technology achievement conversion efficiency in China from 2003 to 2017. Third, it measures allocation efficiency through factor allocation distortion. Dollar and Wei [7] find that the systematic distortion of capital allocation in China leads to uneven marginal returns across companies, regions, and departments. They point out that if capital were allocated more effectively, capital intensity could be lowered by 5% without sacrificing economic growth. Hsieh and Klenow [8] find that if capital and labor were redistributed so that the marginal output of China and India equaled that of the United States, the total factor productivity of the manufacturing industries of the two countries would increase by 30%-50% and 40%-60%, respectively. Chen and Hu [9] show that resource misallocation leads to a gap of about 15% between actual output and optimal output in various subsectors of China's manufacturing industry.
Existing research on the allocation modes of innovation factors can mainly be divided into internal, external, and open innovation. Internal innovation means that the innovation subject relies on its own innovation factors to achieve innovative production. External innovation means that the innovation subject completes innovation activities with the help of external innovation factors. Open innovation means that the innovation subject breaks through its own innovation boundaries and coordinates internal and external innovation factors in the innovation process [10]. However, different ways of allocating innovation factors have different impacts on innovation performance. Hong and Shi [11] argue that independent R&D is significantly positively correlated with enterprise innovation performance. Chen and Hou [12] find that the influence of independent innovation on scientific and technological performance manifests a nonlinear threshold effect. Kafouros et al. [13] conclude that the impact of academic cooperation on innovation performance in Chinese emerging market companies is significantly positive. Asimakopoulos et al. [14] point out that there is an inverted U-shaped relationship between external knowledge and enterprise innovation efficiency. Berchicci [10] argues that the inverted U-shaped relationship between external R&D activities and enterprise innovation performance is a function of the substitution effect of the enterprise's own R&D capabilities. Nieves and Petra [15] find that internal and external knowledge are substitutes within a certain threshold and have a negative impact on enterprise innovation performance; once the threshold is exceeded, the relationship turns complementary and has a positive impact on enterprise innovation performance.
In summary, the existing literature mostly defines the connotation of innovation factors in terms of traditional innovation factors such as labor, capital, and technology, and lacks new innovation factors such as data and institutions. It ignores the theoretical basis and operation process behind innovation factor allocation. It mostly measures the efficiency of innovation factor allocation by means of the parametric SFA and the nonparametric DEA. In addition, there are few studies on the evaluation of innovation factor allocation effectiveness. Adopting a more reliable method to identify the effectiveness of innovation factor allocation helps to evaluate the situation of innovation factor allocation more accurately. In view of this, the main contributions of this article are as follows. On the one hand, by introducing the data and institution innovation factors, we build a multidimensional innovation factor index, which not only weighs the internal and external coordination of traditional innovation factors based on the cost theory, but also considers the moderating effect of new innovation factors based on the coupling theory. On this basis, a four-dimensional integrated indicator is formed, comprising internal innovation factors, external innovation factors, institution innovation factors, and data innovation factors, and the allocation of innovation factors is evaluated on the input-output process of innovation factors. On the other hand, we make a new attempt to identify the effectiveness of innovation factor allocation with a random forest method, which differs from the methods previously used. This method not only removes constraints including the model setting form, the number of variables, and the correlation between variables in the parametric SFA method, but also avoids the bias in relative-effectiveness evaluation caused by an ineffective choice of production frontier in the nonparametric DEA method. It shifts the focus to each subject's own efficiency and ensures the reliability of the measurement by resampling thousands of times. The above aspects are the contributions at the theoretical level. At the practical level, we draw several conclusions that differ from existing studies: the innovation factor allocation in more than half of the provinces is not fully effective; data innovation factor inputs, especially data integration and data application, contribute significantly to innovation output; the marginal impact characteristics of the important innovation factor inputs are all nonlinear; and the gap between the current situation and the optimal combination of important innovation factor inputs is obvious in each province in China.
The remaining structure of this paper is as follows. In Section 2, we present the definition and measurement method of innovation factor allocation. In Section 3, we identify the effectiveness of innovation factor allocation in each province in China and discuss the importance of innovation factor inputs. In Section 4, we determine the important innovation factor inputs, describe their nonlinear characteristics and optimal combination, and then point out the future adjustment direction of innovation factor inputs in each province in China. In Section 5, we offer the main conclusions and future research.
[16].
The narrow definition of production factors covers the factors that take part in the production process, while the broad definition should also include factors that reflect the output benefits of the production process. Schumpeter [17] defines innovation as introducing into the production system a new combination of production factors and production conditions that has never existed before. Combining modern production conditions with organizational models, innovation can be understood not only as the realization of new products (services) or procedures and the improvement of original products (services) or procedures, but also as the process by which new products (services) or procedures are about to be or have been commercialized [18][19][20].

Innovative Factors
To sum up, this article defines innovation factors as production factors that participate in the innovation process, influence innovation performance, and reflect innovation achievements.
They have the characteristic of increasing marginal returns and include not only traditional innovation factors such as labor, capital, and technology, but also new innovation factors such as data and institutions.

The Definition of the Multidimensional Innovation Factor Allocation.
The cost theory holds that the criterion for organizing economic activities is the minimization of internal production costs and external transaction costs. Drawing on Nieves and Petra [15], we divide innovation factors into internal innovation factors, which originate from the innovation subject itself, and external innovation factors, which originate outside the innovation subject; both are mainly based on traditional innovation factors. On the one hand, the innovation subject relies on internal human, material, and financial resources for innovation, regards innovation as an internal control process, completes all innovation links, and keeps the innovation output within the subject. On the other hand, the innovation subject builds on its existing innovation foundations and regards innovation as an R&D outsourcing process: by contacting external innovation partners, it achieves technology cooperation, absorption, and acquisition of external innovation and enlarges its innovation outputs. If the innovation subject overconcentrates on internal innovation, problems such as excessive risk and knowledge spillover arise, while overreliance on externally sourced innovation brings problems such as rising negotiation costs and loss of initiative. Therefore, the innovation subject needs to coordinate internal and external innovation factors effectively to minimize the total cost of innovation activities.
The coupling theory holds that integrating the coordination channels of internal and external traditional innovation factors, enhancing their coupling viscosity, and reducing their misallocation require both the participation of the innovation subjects themselves and innovation assistance from forces other than the subjects. We refer to the analysis of Berchicci [10]. On the one hand, the government-led institution innovation factors regulate the depth of coordination, integration, and fit between internal and external innovation factors through intellectual property protection, financial education support, free trade in goods, market transaction quotas, financial development scale, and industrial pollution control. On the other hand, data innovation factors led by intermediaries regulate the breadth of synergy, integration, and adaptation between internal and external innovation factors through data production level, data transmission speed, data user groups, data application scope, data sharing degree, and data integration capabilities. Institution and data innovation factors thus exert the regulatory role of institutions and information (the intrinsic nature of data) on internal and external innovation factors, thereby forming a multidimensional integrated indicator of internal innovation factors, external innovation factors, institution innovation factors, and data innovation factors.
Based on the above analysis, this article defines the multidimensional innovation factor allocation as follows. The cost theory governs the synergy between internal and external innovation factors. Internal innovation factors can be measured from three aspects: internal innovation human resources, material resources, and financial resources. External innovation factors can be measured from three aspects: external innovation technology cooperation, absorption, and acquisition. The regulatory role of institution and data innovation factors rests on the coupling theory. Institution innovation factors can be measured from six aspects: property rights protection, education support, trade freedom, trading markets, financial development, and pollution control. Data innovation factors can be measured from six aspects: data generation, transmission, use, application, sharing, and integration.
Computational Intelligence and Neuroscience 3

Indicator Composition.
Based on the definition of innovation factor allocation, the operability of indicators, and the availability of data, and combining the research of Tao and Xu [21], we select the specific quantitative indicators. The internal innovation human resources, financial resources, and material resources in the internal innovation factors are measured, respectively, by the full-time equivalent of R&D personnel, internal R&D expenditures, and the number of R&D organizations. The external innovation technology cooperation, absorption, and acquisition in the external innovation factors are measured, respectively, by external R&D expenditures, foreign direct investment, and high-tech product imports. Property rights protection, education support, trade freedom, trading market, financial development, and pollution control in the institution innovation factors are measured, respectively, by the total number of patent enforcement cases, financial education subsidies, total regional goods exports, technology market turnover, deposits and loans of financial institutions, and industrial pollution control investment. Data generation, transmission, use, application, sharing, and integration in the data innovation factors are measured, respectively, by the number of Internet pages, Internet broadband access ports, Internet users, total software services, total post and telecommunications services, and the number of robots owned. Taking the above four dimensions of internal, external, institution, and data innovation factors as innovation inputs, and taking per capita sales income of new products as innovation output, we evaluate China's innovation factor allocation efficiency.

Variables and Data.
This paper is based on panel data of 30 provinces in China from 2009 to 2018 (Tibet is excluded). The data are extracted from the China Statistical Yearbook on Science and Technology, the China Statistical Yearbook, the China Internet Development Report, the China Information Yearbook, and the International Federation of Robotics. The variables are acquired and the data are processed as follows. Regarding the matching and filling of the data, first, the number of robots owned in China is matched to each province based on the industrial output value of the main applications of industrial robots in that province. Second, missing provincial data are filled according to the difference between the national total and the total of the provinces with existing data. Regarding the processing of the data, firstly, the exchange rate adjustment converts values by the ratio of RMB to USD. Secondly, the price adjustment follows the price index formula of Zhu and Xu [22]: the fixed asset investment index and the consumer price index are assigned weights of 0.45 and 0.55, respectively, and, with 2009 as the price base period, nominal values are converted into real values.
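The price adjustment and the filling rule can be illustrated with a minimal Python sketch. It assumes that the price index is a simple weighted sum of the two indices with 2009 = 100, and that only one province is missing for the variable being filled; the function names and the numbers are hypothetical and are not taken from the paper's data.

```python
# Assumption: the composite price index is a weighted sum of the fixed asset
# investment index (0.45) and the consumer price index (0.55), with 2009 = 100.
FAI_WEIGHT = 0.45
CPI_WEIGHT = 0.55


def composite_index(fai_index, cpi_index):
    """Weighted price index for one year."""
    return FAI_WEIGHT * fai_index + CPI_WEIGHT * cpi_index


def to_real(nominal, fai_index, cpi_index):
    """Deflate a nominal value to base-period (2009) prices."""
    return nominal / (composite_index(fai_index, cpi_index) / 100.0)


def fill_missing(national_total, known_province_values):
    """Value of a single missing province: national total minus the known sum."""
    return national_total - sum(known_province_values)


# Hypothetical numbers, for illustration only.
real_value = to_real(nominal=120.0, fai_index=110.0, cpi_index=105.0)
missing_value = fill_missing(100.0, [30.0, 50.0])
```

If several provinces were missing for the same variable and year, the difference would still have to be apportioned among them, which this sketch does not attempt.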
Thirdly, the stock adjustment uses the BEA method: based on the annual investment flow after price adjustment $E_t$, we calculate the average annual growth rate $g_k$. Then, considering that intangible assets are amortized over at least 10 years, and referring to Wang and Gao [23], we set the residual value rate $d_T$ to 10% and obtain the depreciation rate. The above contents are detailed in Table 1.
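As a rough illustration of these two quantities, the sketch below assumes geometric depreciation, so that the residual value rate after $T$ years satisfies $d_T = (1 - \delta)^T$, and a compound average for the growth rate. Both formulas are assumptions made here for illustration; the text does not state the exact expressions used, and the flow values are hypothetical.

```python
def depreciation_rate(residual_rate, years):
    """Solve residual_rate = (1 - delta) ** years for delta
    (assumes geometric depreciation)."""
    return 1.0 - residual_rate ** (1.0 / years)


def average_growth_rate(flows):
    """Compound average annual growth rate of the investment flows E_t
    (assumes the first and last flows are positive)."""
    periods = len(flows) - 1
    return (flows[-1] / flows[0]) ** (1.0 / periods) - 1.0


# Residual value rate of 10% and a 10-year horizon, as stated in the text.
delta = depreciation_rate(residual_rate=0.10, years=10)

# Hypothetical price-adjusted investment flows, for illustration only.
growth = average_growth_rate([100.0, 110.0, 121.0])
```

Under these assumptions the depreciation rate comes out at roughly 20.6% per year.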

A Random Forest Method.
Existing research on efficiency measurement mainly uses the nonparametric DEA method and the parametric SFA method. The DEA method evaluates relative efficiency by constructing the optimal production frontier. On the one hand, it refers to other effective subjects instead of a subject's own optimum, which makes it difficult to offer a targeted adjustment direction based on the subject's own current inputs. On the other hand, efficiency measurement bias may occur if the optimal production frontier is selected mistakenly. The SFA method, in turn, measures efficiency after determining the form of the model and the number of variables in advance, so differences in model form and in the number of variables lead to different measurement results. Meanwhile, collinearity between variables in the model also biases the efficiency measurement. Compared with DEA and SFA, the random forest method not only removes constraints including the form of the model, the number of variables, and the correlation between variables, but also ensures the reliability of measurement by resampling thousands of times. With reference to Ouyang and Chen [24], the output predicted by the random forest method can be regarded as the maximum output achieved under full utilization of the inputs. Accordingly, we can calculate a subject's own optimal output given its current inputs and then identify effectiveness by the ratio of actual output to optimal output. The method does not refer to other optimal subjects but shifts the focus to the judgment of a subject's own effectiveness. Moreover, it ranks the contributions of the various inputs to output from largest to smallest based on the principle of the minimum sum of squared residuals. We further obtain the marginal impact characteristics of the inputs, find the optimal combination of inputs, and thus propose an adjustment direction based on each subject's own current inputs. The random forest method is introduced in detail below.
A random forest method is an ensemble learning method. It uses resampling to randomly select sample data and sample features to build multiple regression trees, and the mean of the outputs of these regression trees is used as the final prediction. Based on the data of each input, this paper predicts each output with the random forest method. The detailed steps are as follows.
A regression tree is constructed with $m$ features selected from the sample, and features are split according to the principle of the minimum sum of squared residuals. Specifically, for the input indicator $c_k$, let $d$ be the critical point given by the optimal threshold (the determination of $d$ also follows the principle of the minimum sum of squared residuals). The critical point divides the sample into the left and right subsamples $U1$ and $U2$, with sample sizes $T_{U1}$ and $T_{U2}$, and the average output of each subsample can then be obtained. The initial split node $c_k$ of the regression tree is selected according to the minimum sum of squared residuals over the two subsamples. The remaining nodes of the regression tree are selected by repeating this splitting step. The prepruning rule requires each node to contain at least 5 sample points; once this limit is reached, the regression tree immediately stops splitting.
In addition to regression analysis, the random forest method can also evaluate the importance of each input and its marginal influence on the output. The nodes of the regression tree are arranged from top to bottom according to the contribution of each input to the reduction of the residual sum of squares of the output: the input with the largest contribution is at the top, and the input with the smallest contribution is at the bottom. The importance of each input follows from this arrangement, and sorting the inputs by importance gives their importance ranking.
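The splitting rule can be sketched for a single input as follows. This is a minimal illustration of choosing the threshold $d$ that minimises the summed residual sum of squares of the two subsamples while keeping at least 5 sample points in each child node. The function names are hypothetical, and the sketch covers one split on one input, not a full forest.

```python
def rss(values):
    """Residual sum of squares of `values` around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=5):
    """Return (d, total_rss) for the threshold d on one input that minimises
    the summed RSS of the two subsamples, or None if no split is allowed."""
    pairs = sorted(zip(x, y))
    best = None
    # Every candidate split keeps at least `min_leaf` points on each side.
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical input values cannot be separated
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            d = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (d, total)
    return best


# Hypothetical data: the output jumps between x = 6 and x = 7.
split = best_split(list(range(1, 13)), [1] * 6 + [5] * 6)
```

In a full tree the same search would be repeated over the $m$ selected inputs at every node, and the forest prediction would average the leaf means of all trees.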
The marginal effect mainly measures the impact of a change in a single input on the output, as defined in equation (4).
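One common way to trace the effect of a single input on the prediction of a fitted forest is a partial dependence curve: the input of interest is fixed at each value of a grid while the other inputs keep their observed values, and the predictions are averaged. The sketch below illustrates that general idea only; it is not confirmed to match the paper's own definition, and `toy_predict` is a hypothetical stand-in for a fitted model.

```python
def partial_dependence(predict, rows, k, grid):
    """Average prediction when input k is fixed at each grid value
    while the other inputs keep their observed values."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[k] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Hypothetical stand-in for a fitted forest: the effect of input 0 saturates at 3.
def toy_predict(row):
    return min(row[0], 3.0) + 0.5 * row[1]


rows = [[1.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
curve = partial_dependence(toy_predict, rows, k=0, grid=[0.0, 1.0, 3.0, 5.0])
```

The flat tail of the curve in this toy case shows how a saturating input appears: raising the input beyond a certain level no longer changes the average prediction.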

Effectiveness Identification.
We run the random forest algorithm 1000 times and take the average of the 1000 runs as the final prediction of innovation output. Based on the above analysis, this prediction is also the optimal innovation output achievable with the innovation factor inputs. We then compute the ratio of actual output to optimal output, which is the allocation efficiency of innovation factors for the 30 provinces in China from 2009 to 2018. On this basis, we identify the effectiveness of innovation factor allocation for each province in China; the corresponding schematic graph is shown in Figure 1.
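The averaging and the ratio can be written compactly. The following sketch assumes that the predictions of the repeated runs for one province-year are already available as a list; the function names and the numbers are hypothetical.

```python
from statistics import mean


def optimal_output(run_predictions):
    """Average the predictions of the repeated runs for one province-year."""
    return mean(run_predictions)


def allocation_efficiency(actual_output, run_predictions):
    """Ratio of actual output to the averaged (optimal) prediction."""
    return actual_output / optimal_output(run_predictions)


# Hypothetical values: three runs stand in for the 1000 runs in the text.
efficiency = allocation_efficiency(1.2, [1.0, 1.1, 0.9])
```

A ratio above 1 means that actual output exceeds the province's own predicted optimum for that year.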
In Figure 1, the black solid dots indicate that a province's actual innovation output exceeds its own optimal output, so that an effective allocation of innovation factors is realized, while the gray solid dots indicate that the province's actual innovation output is lower than its own optimal output, so that the effective allocation of innovation factors has not been achieved. According to the allocation of innovation factors in each province from 2009 to 2018, the provinces are divided into three types. The first is fully effective: the province achieves an effective allocation of innovation factors during the entire sample period. The second is not fully effective: the province achieves an effective allocation of innovation factors in only part of the sample period. The third is totally ineffective: the province does not meet the requirements of an effective allocation of innovation factors at any time in the sample period.
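The three types can be expressed as a small classification rule over a province's yearly efficiency ratios. In this sketch a year counts as effective only when the ratio strictly exceeds 1; the treatment of a ratio exactly equal to 1 is an assumption, since the text does not discuss that case, and the function name is hypothetical.

```python
def classify_province(efficiencies):
    """Classify one province from its yearly efficiency ratios.
    Assumption: a year is effective only when the ratio exceeds 1."""
    effective_years = sum(1 for e in efficiencies if e > 1.0)
    if effective_years == len(efficiencies):
        return "fully effective"
    if effective_years == 0:
        return "totally ineffective"
    return "not fully effective"


# Hypothetical ratios for three provinces over three years.
labels = [
    classify_province([1.2, 1.1, 1.3]),
    classify_province([0.9, 1.1, 0.8]),
    classify_province([0.7, 0.9, 0.8]),
]
```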
As can be seen from Figure 1, the provinces where innovation factor allocation is fully effective are Tianjin, Shanxi, Shanghai, Zhejiang, Hubei, Hunan, Chongqing, and Ningxia. These provinces not only coordinate internal and external innovation factors effectively during the entire sample period, but also give full play to the coupling role of institution and data innovation factors, organically integrate the four-dimensional innovation factors into one, and obtain innovation output that exceeds their own best state. Chongqing and Ningxia are located in the relatively backward western regions, where innovation factor inputs lag behind those of the developed regions and the optimal output achievable with the current inputs is relatively low, so it is also relatively easy for the actual output of these two provinces to reach the optimal level. The provinces where the allocation of innovation factors is not fully effective are Beijing, Inner Mongolia, Liaoning, Jilin, Jiangsu, Anhui, Fujian, Jiangxi, Shandong, Guangdong, Guangxi, Gansu, Qinghai, and Xinjiang. In these provinces, the innovation output obtained from the innovation factor inputs is higher than its own optimal level in only some years. Beijing, Jiangsu, Fujian, Shandong, and Guangdong are located in eastern regions with superior innovation conditions and abundant innovation resources, but their innovation factor inputs are relatively large and the optimal output achievable with their existing inputs is correspondingly high, so it is relatively difficult for actual output to exceed their own optimal level. The provinces where the allocation of innovation factors is totally ineffective are Hebei, Heilongjiang, Henan, Hainan, Sichuan, Guizhou, Yunnan, and Shaanxi. The actual output of these eight provinces was lower than their optimal level during the entire sample period.
Although Hebei is located in the economically developed eastern region, the relatively high innovation factor inputs have caused structural imbalance, and it is difficult to fully release the efficacy of innovation factor inputs.

Importance Judgment.
From the foregoing, the random forest method obtains the importance of the innovation factor inputs from their contribution to the reduction of the residual sum of squares of the innovation output. It puts the most important innovation factor input at the top of the tree and the least important at the bottom. Therefore, the higher an innovation factor input sits in the tree, the greater its contribution to the innovation output. According to the importance of the innovation factor inputs from 2009 to 2018 (see Table 2), this paper draws a grid graph and a boxplot (see Figures 2 and 3). The grid graph reflects the degree of contribution, the changing trend, and the abnormal time points: the larger the grid area, the greater the importance. The boxplot presents the average contribution and the degree of dispersion. This analysis provides a clear direction for improving the effectiveness of innovation factor allocation.
In Table 2 and Figure 2, it can be seen that the contribution of data application is the most prominent in 2009 ... technology acquisition, education support, trading market, and data application. Third, the number in 2009 and 2016 is three: property rights protection, data transmission, and data use in 2009, and internal innovation financial resources, internal innovation material resources, and data integration in 2016. Fourth, the number in 2013 is two, namely trade freedom and data sharing; the only input in 2015 is data generation; and there is none in 2011, 2012, 2014, and 2017. This shows that the input structure of innovation factors in 2018 is more reasonable than in other years: it gives full play to the advantages of innovation factor inputs, and it has the largest number of innovation factor inputs reaching their greatest impact.
In Table 2 and Figure 3, the average contributions of the innovation factor inputs over the entire sample period are ranked from largest to smallest as follows: data integration, data application, external innovation technology absorption, internal innovation material resources, external innovation technology acquisition, financial development, internal innovation human resources, trade freedom, internal innovation financial resources, data generation, external innovation technology cooperation, trading market, data transmission, data sharing, data use, property rights protection, pollution control, and education support. The top two both belong to the data innovation factors. This confirms the importance not only of data innovation factor inputs in general, but also of data integration, measured by the number of robots owned, and data application, measured by the total amount of software business, which implies that artificial intelligence and advanced manufacturing play an important role in reforming the synergy of internal and external innovation factors and accelerating innovation output. Data innovation factors are characterized by strong integration. They can overcome the limitations of time and space, which is conducive to the optimization and upgrading of internal and external innovation factors and to their complementary advantages, and thus greatly promotes the improvement of innovation output. Internal innovation human resources, internal innovation financial resources, external innovation technology absorption, education support, pollution control, and data transmission all have outliers. Five have large outliers, all of which appear at the time point of their own largest contribution, and three of these appear in 2018.
It not only illustrates the relatively reasonable input structure of innovation factors in 2018, but also implies that the overall input structure of innovation factors in China is undergoing continuous adjustment and improvement.

Marginal Effect Trend.
The above discusses the importance of each innovation factor input to innovation output under the existing combination of innovation factor inputs, which is a static analysis of the impact on innovation output. If an innovation factor input has a significant impact on innovation output, will continuing to increase that input keep raising output, or will it be subject to threshold constraints? Answering this question requires the dynamic changes in the marginal effect trend of innovation factor inputs.
Based on the importance ranking of innovation factor inputs, the top eight inputs, whose cumulative impact exceeds 60% of the total impact, are regarded as the important innovation factor inputs. In order, they are data integration, data application, external innovation technology absorption, internal innovation material resources, external innovation technology acquisition, financial development, internal innovation human resources, and trade freedom. These inputs are evenly distributed across the internal, external, institution, and data innovation factors. To obtain the time trend of the marginal effect of each important innovation factor, the marginal effect graph of each important innovation factor input is given in the order of the above four-dimensional innovation factors (see Figure 4). We select 2009 (dotted line), 2012 (lower solid line), 2015 (dashed line), and 2018 (higher solid line) for detailed description.
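The cumulative-share rule can be sketched as follows: rank the inputs by importance and keep the top-ranked ones until their cumulative share of total importance first exceeds the threshold. The function name is hypothetical, and the scores below are made-up placeholders rather than the values in Table 2.

```python
def important_inputs(importance, threshold=0.60):
    """Return the top-ranked inputs whose cumulative share of total
    importance first exceeds `threshold`."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative > threshold:
            break
    return selected


# Hypothetical importance scores, not the values in Table 2.
scores = {"a": 40.0, "b": 25.0, "c": 20.0, "d": 15.0}
selected = important_inputs(scores)
```

With these placeholder scores the first two inputs already account for 65% of the total, so the selection stops there.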
There are three key boundary points in each marginal effect trend graph in Figure 4: two turning points (indicated by ×) and the maximum slope point (indicated by *). Before the first turning point lies the initial stage of the innovation factor input's impact, in which the marginal impact on innovation output is relatively flat. Between the first turning point and the maximum slope point, the marginal impact on innovation output increases until it reaches its maximum. Between the maximum slope point and the second turning point, the marginal impact of the innovation factor input declines until it stalls. The second turning point marks the innovation factor input required for maximum innovation output; beyond it, the marginal impact of further increasing the innovation factor input on innovation output is almost zero. It can be concluded that the optimal innovation factor input is the amount of input required when innovation output reaches its maximum. If all the important innovation factor inputs in a province have exceeded the second turning point, then the province's combination of important innovation factor inputs is optimal. Based on this, we give the key boundary points of the marginal effect of important innovation factor inputs in 2009, 2012, 2015, and 2018 (see Table 3).
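One way to locate the three key boundary points on a sampled marginal effect curve is sketched below. The text does not state how the points were computed, so the rule here is an assumption: a segment counts as flat when its finite-difference slope falls below a fraction `flat_ratio` of the maximum slope, the turning points are the ends of the non-flat region, and the maximum slope point is the midpoint of the steepest segment. The function name, the parameter, and the synthetic S-shaped curve are hypothetical.

```python
import math


def key_boundary_points(x, y, flat_ratio=0.1):
    """Locate the three key boundary points on a sampled marginal effect curve.

    Assumption (not from the paper): a segment is treated as flat when its
    slope is below flat_ratio times the maximum slope.
    Returns (first_turning_point, max_slope_point, second_turning_point).
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must have the same length of at least 3")
    # Finite-difference slope of each segment between neighbouring grid points.
    slopes = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]
    steepest = max(range(len(slopes)), key=lambda i: slopes[i])
    if slopes[steepest] <= 0:
        raise ValueError("curve has no increasing segment")
    cutoff = flat_ratio * slopes[steepest]
    rising = [i for i, s in enumerate(slopes) if s >= cutoff]
    first_turning = x[rising[0]]        # left end of the first non-flat segment
    second_turning = x[rising[-1] + 1]  # right end of the last non-flat segment
    max_slope_point = 0.5 * (x[steepest] + x[steepest + 1])  # segment midpoint
    return first_turning, max_slope_point, second_turning


# Synthetic S-shaped curve used purely for illustration (not the paper's data).
grid = [0.5 * i for i in range(21)]  # input levels from 0 to 10
curve = [1.0 / (1.0 + math.exp(-(v - 5.0))) for v in grid]
first, steepest_x, second = key_boundary_points(grid, curve)
print(first, steepest_x, second)
```

The located points depend on the grid resolution and on `flat_ratio`, so they should be read as approximate positions rather than exact thresholds.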
Combining the time trends of the marginal effects of important innovation factors, we find that their impact on innovation output is nonlinear over the whole process. In addition, comparing 2018 with the other years, the optimal innovation factor inputs required to achieve maximum innovation output call for relatively less internal innovation human resources and more of the other important innovation factors.

Inputs Selection Analysis.
The top eight innovation factor inputs by importance in 2012 and 2018 are consistent with the average contribution ranking in Table 2. Therefore, the important innovation factor inputs of each province in 2012 and 2018 are grouped accordingly. This makes it possible not only to analyze the changes in important innovation factor inputs over time, but also to propose future adjustment directions based on each province's current innovation factor inputs.
According to the grouping of important innovation factor inputs in each province in 2012 and 2018 (see Table 4) and the summary of their grouping characteristics (see Table 5), the following changes in the grouping characteristics of each province are obtained. Firstly, the important innovation factor inputs in some provinces, such as Qinghai and Ningxia, maintained characteristic (a) in both 2012 and 2018. Although Ningxia's innovation factor allocation is effective in 2012 and 2018, the optimal innovation output it achieves is low because of its existing innovation factor inputs. This implies that these provinces need to significantly increase their important innovation factor inputs, which can not only increase the actual innovation output, but also raise the optimal output threshold. Secondly, the important innovation factor inputs in some provinces, such as Tianjin, Hebei, Liaoning, Jilin, Anhui, Henan, Hunan, and Chongqing, remained at characteristic (b) in both 2012 and 2018. These provinces may refer to the contribution ranking of important innovation factor inputs to determine the appropriate direction in which to increase them.
There are also some provinces where the important innovation factor inputs dropped from characteristic (b) in 2012 to characteristic (a) in 2018, such as Shanxi, Inner Mongolia, Heilongjiang, Jiangxi, Guangxi, Hainan, Guizhou, Yunnan, Gansu, and Xinjiang. These provinces invest less in data application, whose contribution is among the largest. Increasing data application inputs can not only move these provinces out of characteristic (a), but also improve innovation output. In other provinces, such as Fujian, Hubei, Sichuan, and Shaanxi, the important innovation factor inputs rose from characteristic (b) in 2012 to characteristic (c) in 2018. Their grouping characteristics have been upgraded, and these provinces need to maintain this situation, or they can increase investment in innovation factors with a large gap, such as trade freedom. Thirdly, the important innovation factor inputs in some provinces, such as Beijing, Shanghai, Zhejiang, and Shandong, remained at characteristic (c) in both 2012 and 2018. These provinces may consider the gap between the current and optimal inputs of each important innovation factor together with its contribution, giving priority to the important innovation factors with small gaps and large contributions. Fourthly, the important innovation factor inputs in Guangdong met the requirements of characteristic (d) in both 2012 and 2018. This means that Guangdong has realized the effective allocation of innovation factors in 2018. If it wants to raise its optimal innovation output standard, starting from the input side of innovation factors is more difficult; this may instead be achieved by changing the existing innovation development pattern.
The important innovation factor inputs in Jiangsu Province declined from characteristic (d) in 2012 to characteristic (c) in 2018. Jiangsu Province may continue to increase its inputs in financial development and data application in the future.
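The adjustment logic discussed above (compare each province's current inputs with the optimal inputs, then favor factors with small gaps and large contributions) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the input names, and all numbers are hypothetical, the optimal level of each input is taken to be its second turning point, and the priority score (contribution divided by relative gap) is an assumed scoring rule, not one given in the text.

```python
def input_gaps(current, optimal):
    """Relative shortfall of each input against its optimal level
    (0.0 when the current input already reaches or exceeds it).
    Optimal levels are assumed to be positive."""
    return {k: max(0.0, (optimal[k] - current[k]) / optimal[k]) for k in optimal}


def is_optimal_combination(current, optimal):
    """True when every important input is at or beyond its optimal level."""
    return all(current[k] >= optimal[k] for k in optimal)


def adjustment_priority(current, optimal, contribution):
    """Rank the lacking inputs so that small gaps and large contributions
    come first (assumed score: contribution / relative gap)."""
    gaps = input_gaps(current, optimal)
    lacking = [k for k, g in gaps.items() if g > 0]
    return sorted(lacking, key=lambda k: contribution[k] / gaps[k], reverse=True)


# Hypothetical province data for illustration only (not from Tables 3 to 5).
optimal = {"x1": 100.0, "x2": 50.0, "x3": 10.0}
current = {"x1": 90.0, "x2": 20.0, "x3": 12.0}
contribution = {"x1": 0.2, "x2": 0.3, "x3": 0.1}

print(is_optimal_combination(current, optimal))
print(adjustment_priority(current, optimal, contribution))
```

Other scoring rules, such as sorting by gap first and contribution second, would serve the same purpose; the choice affects the ranking only when gaps and contributions point in different directions.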

Conclusions and Future Research
Innovation is the driving force of economic development, and effectively measuring the allocation of innovation factors helps innovation play its role better. We use the random forest method to identify the effectiveness of innovation factor allocation in 30 provinces in China from 2009 to 2018 and explore the importance of each innovation factor input's contribution over the entire sample period and the timing of its maximum contribution. Then we examine the marginal effect characteristics of important innovation factor inputs, track the changes in their key boundary points over time, analyze the actual inputs of each province against the optimal combination of important innovation factor inputs, and finally propose future adjustment directions. The specific conclusions are as follows.
First, innovation factor allocation is not fully effective in more than half of the provinces. How to rely on internal and external innovation factors and make use of institution and data innovation factors to achieve an actual innovation output higher than their own optimal levels is still in a period of exploration and adjustment. Second, in terms of the importance of innovation factor inputs, data innovation factors, especially data integration and data application, contribute significantly, which reflects the important impact of data innovation factors on innovation output. In terms of the timing of maximum contribution, most innovation factor inputs reach their maximum contribution in 2018, suggesting that the structure of China's innovation factor inputs shows an overall upward trend.
Third, the marginal effects of important innovation factor inputs are nonlinear, and the positions of the first turning point, the maximum slope point, and the second turning point differ across periods, indicating that the marginal effect of important innovation factor inputs on innovation output is not constant. Based on this, we find the optimal combination of important innovation factor inputs. Fourth, Guangdong Province has achieved the optimal combination of important innovation factor inputs and may seek further improvement by changing its existing innovation development pattern, while the remaining provinces have not yet reached the optimal factor input requirements and need to adjust their inputs to improve their current innovation factor input selection.
Based on the above conclusions, the following policy implications can be drawn. First, the effectiveness of innovation factor allocation should be further improved. We need not only to pay attention to the coordination of internal and external innovation factors, but also to focus on the coupling of institution and data innovation factors. Second, the potential of data innovation factors, especially data application and data integration, should be brought into full play: developed provinces may be able to develop further, while less developed provinces may be able to leapfrog ahead.
Third, the marginal effects of important innovation factor inputs should be understood, and good use should be made of their impact trends. Fourth, each province should formulate appropriate innovation development policies based on local conditions and rationally adjust its current selection of innovation factor inputs according to the optimal combination.
This paper has some limitations, which also form the basis for future research. Firstly, when evaluating the effectiveness of innovation factor allocation, this paper takes internal innovation factors, external innovation factors, institutional innovation factors, and data innovation factors as innovation factor inputs, and per capita new product sales revenue as innovation output. In addition to per capita new product sales revenue, innovation output can also be measured by indicators such as per capita gross national product. Therefore, further research should add other variables that measure innovation output when identifying the effectiveness of innovation factor allocation, so as to show more comprehensively the allocation of innovation factors across China's provinces.
Secondly, this paper takes China's provinces as the research object and evaluates innovation factor allocation at the macrolevel. Evaluating innovation factor allocation at the meso- and microlevels also has practical significance. Therefore, it is necessary to analyze innovation factor allocation at the industry and enterprise levels and to further explore its future adjustment direction at the meso- and microlevels.
Finally, when analyzing the impact of innovation factor inputs, this paper focuses on the marginal effect trends and key boundary points of the important innovation factor inputs but does not show the impact of the remaining innovation factor inputs on innovation output. Although the impact of these remaining inputs is not significant, they still make marginal contributions to innovation output. In the future, we may further explore the marginal effect characteristics and key boundary points of these inputs, which may yield other important conclusions.

Data Availability
This paper is based on panel data for 30 provinces in China (excluding Tibet) from 2009 to 2018. The data sources and processing methods are shown in Table 1.

Conflicts of Interest
The authors declare no conflicts of interest.