R&D investment is an important way to improve scientific and technological innovation capability. In an increasingly competitive market, rapid changes in science and technology have brought new opportunities for enterprise development. However, if production factors are not allocated rationally, allocation efficiency is likely to be low, and an excessive overflow of production factors makes the input mix unreasonable and lowers the economic output of enterprises. This article therefore uses data mining to analyze the feasibility and timeliness of R&D investment from the perspectives of production factors and enterprise output performance. The aim is to optimize the future allocation of production factors and to help combine existing and new factors of production. We selected companies listed on the Growth Enterprise Market (GEM), with a survey period from 2018 to 2020. The data are taken from the Guotaian database, supplemented by figures collected manually from company annual reports. A multiple regression model is established and tested to obtain the relationship between R&D investment, production factors, and corporate performance. The group regression method is used to test whether production factors have a moderating effect on the relationship between R&D investment and corporate performance, and the specific moderating and lagging effects of production factors are investigated. The experiments show that the nonstandardized coefficient of R&D investment intensity on operating gross profit margin is 0.714, and the

In today’s era of global economic integration, technological innovation has repeatedly sounded the clarion call of a new industrial revolution, and the scientific and technological capability of enterprises has become the touchstone of whether they can survive. In recent years, Chinese companies have grown rapidly as a whole, but that growth is still concentrated in total economic volume, and capacity and quality have not developed in a well-coordinated way. Common problems such as low technical content and low market value of finished products have become bottlenecks that limit the development and growth of many companies and are difficult to break through.

With downward pressure on the world economy relatively high, companies that want to create economic benefits in a downturn may turn to research and development activities as another option. Many scholars at home and abroad have studied the economic consequences of R&D investment for enterprises, but their conclusions are inconsistent and differ widely. As the main body of scientific and technological innovation, enterprises must proceed from their own circumstances and seek a resource allocation that suits their own conditions; since the ultimate goal of the enterprise is to make profits, it should maximize the effectiveness of its production factors in order to improve performance.

Kim and Yang proposed that realizing the transition from traditional agriculture to modern market agriculture as soon as possible is a top priority in the construction of a new socialist countryside. Market agriculture can improve China’s core agricultural competitiveness and increase farmers’ income. They also discussed the current situation and influencing factors of the flow of production factors in the process of implementing market-oriented agriculture in China, as well as means of promoting the circulation of agricultural production factors. However, their research did not clearly propose how to realize the transition from traditional agriculture to modern market agriculture, and the overall research lacks data support [

This article applies data mining classification algorithms to the practical problem of evaluating the performance of enterprises under the market-oriented allocation of production factors. The data come from the Guotaian database and from the 2018-2020 annual reports of GEM companies, and the algorithms are applied to these practical problems. In this article, we add Tobin’s

Data mining method

Data mining is different from ordinary information retrieval. Ordinary information retrieval obtains the required content directly through query commands, while data mining extracts useful information from the data through association rules and machine learning algorithms [

Data mining framework diagram.

The prediction task predicts the value of a target attribute from existing attributes and mainly comprises regression and classification. The target variable of a regression task is continuous, and that of a classification task is discrete. Both train a prediction model on a training set whose target variable is known, that is, on labeled data, so the resulting model is a mapping from existing attributes to the target attribute learned under supervision; this is supervised learning. Descriptive tasks summarize potential association patterns in the data and mainly include cluster analysis, association analysis, and anomaly detection [

Data preprocessing

Reasons for data preprocessing

The purpose of data preprocessing is to obtain reliable data for data mining tasks. There are many reasons for the low quality of data, such as equipment failure during the data collection stage, human error during data input, technical errors during data transmission, and the user’s cover-up of information [

Data preprocessing method

Data cleaning removes “dirty data” from the dataset. It includes smoothing noise: data generally contain random noise, and smoothed data are closer to the true values. It also includes detecting and deleting outliers: outliers generally carry a lot of noise and can be treated as noisy data that harm model training, so the usual approach is to delete them directly. Because data are generally collected in different, mutually independent systems, the data from each system are also isolated from one another [

For the performance evaluation of the algorithm, in addition to the test set, the performance evaluation index of the algorithm is also required. For different tasks, there are different algorithm performance evaluation indicators to compare the effects of different algorithms or the same algorithm with different parameters [

Accuracy (the proportion of correctly classified samples among all samples)

Precision (the proportion of samples predicted to be positive that are actually positive)

Recall (the proportion of actual positive samples that are correctly predicted to be positive)

Confusion matrix of classification results.

Actual class / predicted class | Positive | Negative |
---|---|---|
Positive | TP (true positive) | FN (false negative) |
Negative | FP (false positive) | TN (true negative) |
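The three indicators defined above follow directly from the four cells of the confusion matrix. A minimal sketch, in which the function name and the counts are hypothetical and chosen only for illustration:

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fn + fp + tn
    # Guard against empty denominators so the function never divides by zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, not taken from the study's data.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```

With these counts, accuracy is 85/100, precision is 40/45, and recall is 40/50.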

For a classification problem, precision and recall usually constrain each other, and the

ROC and AUC

The larger the predicted value, the greater the probability that the sample is positive; likewise, the smaller the value, the greater the probability that the sample is negative. In practical applications, if precision matters more, the threshold can be raised; if recall matters more, the threshold can be lowered [

AUC is the area included under ROC. When AUC is greater than 0.5, the classification algorithm is effective, and the larger the AUC, the stronger the generalization ability of the classification algorithm. When the AUC is less than or equal to 0.5, the classification algorithm is invalid [
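To make the threshold sweep and the area computation concrete, the following is a minimal sketch that builds ROC points from predicted scores and integrates them with the trapezoidal rule. The function names, labels, and scores are hypothetical, and the sketch assumes at least one positive and one negative sample.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    pos = sum(labels)
    neg = len(labels) - pos  # assumes pos > 0 and neg > 0
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples sharing one score cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = positive) and predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
```

Here eight of the nine positive-negative pairs are ranked correctly, so the area is 8/9, well above the 0.5 of a random classifier.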

Naive Bayes

Bayes’ theorem is defined as follows:
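In generic notation (the event names $A$ and $B$ are assumed here for illustration), the theorem relates a conditional probability to its reverse:

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0.
$$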

Assume that there are

For the unknown sample

So, the expression of the Bayesian classification algorithm can be written as
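In one common formulation (notation assumed here), the classifier chooses the class $c$ that maximizes $P(c)\prod_i P(x_i \mid c)$, where $x_i$ are the feature values of the sample. A minimal Python sketch of that rule follows; it assumes categorical features and add-one (Laplace) smoothing, neither of which is specified in this article, and all names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and per-class feature values from categorical data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, row):
    """Return the class with the largest log prior plus log likelihoods."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption) avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (R&D intensity level, firm size) -> performance.
rows = [("high", "large"), ("high", "small"), ("low", "large"),
        ("low", "small"), ("high", "large"), ("low", "small")]
labels = ["good", "good", "poor", "poor", "good", "poor"]
model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("high", "small"))
```

Working in log space keeps the product of many small probabilities from underflowing.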

Logistic regression

The sigmoid function resembles a step function. Near the jump point it can be regarded as jumping approximately from 0 to 1, which meets the requirements of classification. At the same time, the function is differentiable, which makes the solution more convenient [

Among them,

Ultrasound images have high requirements for edge detail and are nonstationary signals that cannot be met by traditional Fourier transform-based signal denoising methods. Ultrasonic speckle suppression and denoising methods can be broadly divided into spatial area local statistical filtering, anisotropic diffusion filtering, and wavelet transform-based filtering.

If

Obviously,

To estimate

The loss function

This formula cannot be solved analytically, and the gradient descent algorithm can be used to approximate the solution to obtain the best regression parameter
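A minimal sketch of fitting logistic regression by batch gradient descent on the average log-loss is given below. The learning rate, the number of iterations, the function names, and the toy data are all assumptions made for illustration, not settings from this study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Estimate weights and intercept by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical one-feature data: small values are class 0, large values class 1.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

After fitting, `predict_proba(w, b, [1.0])` falls below 0.5 and `predict_proba(w, b, [3.5])` above it, so a 0.5 threshold separates the two classes in this toy data.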

Dependent variable: corporate performance

There are various indicators that reflect the performance of enterprise R&D input and output, both financial and nonfinancial. Nonfinancial indicators include upgrades of production processes, the number of patent applications, and updates of knowledge and skills. Because such data are difficult to measure and obtain, and in view of the hypotheses and sample of this study, financial indicators are chosen to measure R&D input and output. Tobin’s

Independent variable: R&D investment

R&D expenditure is divided into two parts: the expensed part and the capitalized part. It includes the enterprise’s actual R&D investment in the year, covering personnel and equipment as well as information and creative inputs. The current R&D investment amount is obtained on a quantifiable basis. To avoid large gaps in R&D investment values across the sample caused by differences in company size, the R&D investment ratio is selected as the independent variable.

Categorical variable: intensity of production factors

The factors of production are divided into three dimensions: labor, capital, and technology. Each dimension is measured by its own standard, and firms are classified by intensity according to the corresponding indicators and methods. The capital-labor ratio and the fixed asset ratio are selected to measure capital versus labor intensity. The capital-labor ratio reflects the relative proportions of the two most basic factors of production; the fixed asset ratio indicates whether the company has idle fixed assets and how important fixed assets are. The two indicators can be measured directly and complement each other, and the larger the ratios, the more important capital is. The ratio of R&D expenses to product costs reflects how much R&D expense is embodied in each unit of cost, and the ratio of R&D personnel to total employees reflects the personnel the company commits to R&D activities. These two indicators are combined to distinguish technology-intensive enterprises from non-technology-intensive ones.

Control variables

Enterprise scale

Because the scale of a listed company affects its output, and large companies often have accumulated advantages, company scale needs to be included as a control variable. R&D investment affects not only the value of fixed assets but also, and more importantly, the value of intangible assets. In addition, operating income already serves as the denominator of the R&D investment intensity index, so enterprise scale is measured by the natural logarithm of total assets.

Asset-liability ratio

R&D is a corporate activity with high capital investment, and the sample consists of companies listed on the Growth Enterprise Market, which are characterized by short business cycles, fast turnover, and strong growth momentum. How a company manages its debt will eventually be reflected in its business performance. Therefore, from the perspective of the relationship between corporate capital stock and technological innovation capability, the asset-liability ratio is used as a control variable to indicate how abundant corporate funds are. The relationships between the variables and their meanings are shown in Table

List of variable relations and meanings.

Variable type | Variable meaning | Variable value and method description |
---|---|---|
Dependent variable | Enterprise market performance | Market value/total assets at the end of the year |
Dependent variable | Corporate financial performance | (Main business income - main business cost)/main business income |
Independent variable | R&D investment intensity | R&D investment/operating income |
Categorical variable | Capital-labor ratio | Fixed assets/labor force |
Categorical variable | Proportion of fixed assets | Net fixed assets/total assets |
Categorical variable | Proportion of R&D expenses | R&D expenses/product production costs |
Categorical variable | Proportion of R&D personnel | Number of R&D personnel/total number of employees |
Control variable | Enterprise size | Natural logarithm of the company’s total assets |
Control variable | Asset-liability ratio | Total liabilities/total assets |
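The variable definitions in the table translate into simple ratios. The sketch below computes the dependent, independent, and control variables for one firm-year record; the dictionary keys and the figures are hypothetical and are not fields of the Guotaian database.

```python
import math


def build_variables(firm):
    """Compute the regression variables for one firm-year record."""
    return {
        # Enterprise market performance: market value / year-end total assets
        "tobins_q": firm["market_value"] / firm["total_assets"],
        # Corporate financial performance: operating gross profit margin
        "gross_margin": (firm["main_income"] - firm["main_cost"]) / firm["main_income"],
        # R&D investment intensity: R&D investment / operating income
        "rd_intensity": firm["rd_investment"] / firm["operating_income"],
        # Enterprise size: natural logarithm of total assets
        "size": math.log(firm["total_assets"]),
        # Asset-liability ratio: total liabilities / total assets
        "leverage": firm["total_liabilities"] / firm["total_assets"],
    }


# Hypothetical figures in a common currency unit, for illustration only.
example_firm = {
    "market_value": 500.0, "total_assets": 200.0, "total_liabilities": 60.0,
    "main_income": 100.0, "main_cost": 65.0,
    "operating_income": 100.0, "rd_investment": 6.0,
}
variables = build_variables(example_firm)
```

For this record the market performance measure is 2.5, the gross profit margin 0.35, the R&D investment intensity 0.06, and the asset-liability ratio 0.3.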

This study selects companies listed on the Growth Enterprise Market, and the research period is from 2018 to 2020. The data come mainly from the Guotaian database. Because the China Securities Regulatory Commission did not include R&D investment in the scope of mandatory disclosure by listed companies, R&D investment data are not available in databases such as Guotaian and were collected manually from company annual reports. After strict screening against the screening criteria, a total of 2942 sample observations were obtained.

Screening criteria

Financial data for the period 2018-2020 must be complete; a company with any missing data is excluded. Among companies with complete data, the sample should also be representative: an industry represented by five or fewer samples is deleted. Because of the particular nature of the financial industry, listed companies in that industry are also excluded.
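The screening rules can be sketched as a small filter. This is only an illustration under stated assumptions: each record is a firm-year dictionary with hypothetical keys, a missing value is represented by `None`, the financial industry is labeled `"Finance"`, and the five-sample rule is interpreted as counting firms per industry.

```python
from collections import Counter


def screen_sample(records, years=(2018, 2019, 2020), min_count=5):
    """Apply the three screening rules and return the retained firm-year records."""
    # Rule 3: drop listed companies in the financial industry.
    non_financial = [r for r in records if r["industry"] != "Finance"]

    by_firm = {}
    for r in non_financial:
        by_firm.setdefault(r["firm"], []).append(r)

    # Rule 1: keep only firms with complete data in every year of the period.
    complete = {}
    for firm, rows in by_firm.items():
        has_all_years = {r["year"] for r in rows} >= set(years)
        no_missing = all(v is not None for r in rows for v in r.values())
        if has_all_years and no_missing:
            complete[firm] = rows

    # Rule 2: drop industries with min_count or fewer firms (an assumed reading).
    firms_per_industry = Counter(rows[0]["industry"] for rows in complete.values())
    return [r for rows in complete.values() for r in rows
            if firms_per_industry[r["industry"]] > min_count]


def make_firm(name, industry, rd=0.05):
    """Build three hypothetical firm-year records for one company."""
    return [{"firm": name, "year": year, "industry": industry, "rd_intensity": rd}
            for year in (2018, 2019, 2020)]


records = []
for i in range(6):
    records += make_firm(f"S{i}", "Software")
records += make_firm("T0", "Textiles") + make_firm("T1", "Textiles")
records += make_firm("B0", "Finance")
records += make_firm("S_missing", "Software", rd=None)
kept = screen_sample(records)
```

In this toy data only the six complete software firms survive, giving 18 firm-year records.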

For companies, the relationship between R&D activities and corporate performance is also affected by the company’s own “personality.” The same R&D management model is effective for some companies but not for others. The “personality” of an enterprise can be understood as its “foundation,” that is, its factors of production: specifically the size of its labor force, the richness of its assets, or its level of technology. When the independent variable is continuous and the moderating variable is categorical, the moderating effect can be tested by the group regression method; that is, after the moderating variable

We establish a multiple regression model to test the relationship between R&D investment, production factors, and corporate performance. The steps are as follows: first, put the independent variable and control variables into the model and run the regression to explore the relationship between R&D investment intensity and corporate performance; then introduce the moderating variables, divide the sample into three groups along the three dimensions of production factors, and build the model as follows:

Among them,

Statistical analysis was carried out with SPSS 13.0 statistical software. The significance test of the difference was performed by one-way analysis of variance, the difference between the two groups was tested by LSD-t, and the statistics of intelligent data mining analysis results of the market-oriented circulation of production factors were performed by the group
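The group regression approach described above, in which the same regression is run separately within each production-factor group and the coefficients are compared, can be sketched as follows. This is an illustrative pure-Python ordinary least squares fit on synthetic data, not the SPSS procedure used in this study; the variable names, coefficients, and records are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Build the augmented system [X'X | X'y].
    A = []
    for i in range(k):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        A.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]  # [intercept, b1, b2, ...]


def grouped_regression(records, group_key, x_keys, y_key):
    """Fit one regression per group and return the coefficients by group."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec)
    return {g: ols([[rec[key] for key in x_keys] for rec in recs],
                   [rec[y_key] for rec in recs])
            for g, recs in groups.items()}


# Synthetic data: the R&D coefficient is 2.0 in one group and 5.0 in the other.
X = [(0.02, 20.0, 0.30), (0.05, 21.0, 0.25), (0.08, 20.5, 0.40),
     (0.03, 22.0, 0.35), (0.10, 21.5, 0.20), (0.06, 22.5, 0.45)]
true_coefs = {"labor": (1.0, 2.0, 0.1, -0.5), "technology": (0.5, 5.0, 0.05, -1.0)}
records = []
for group, (b0, b1, b2, b3) in true_coefs.items():
    for rd, size, lev in X:
        records.append({"group": group, "rd_intensity": rd, "size": size,
                        "leverage": lev,
                        "performance": b0 + b1 * rd + b2 * size + b3 * lev})
fits = grouped_regression(records, "group",
                          ["rd_intensity", "size", "leverage"], "performance")
```

A clear difference between the R&D coefficients of the groups, as in this synthetic example, is what the group regression method reads as a moderating effect; a real analysis would also test whether that difference is statistically significant.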

Overall situation analysis

Here, we first analyze the overall situation of GEM listed companies from 2018 to 2020 according to the intensity of R&D investment. The results are shown in Figures

Chart of R&D investment intensity over time.

Tobin’s

Graph of operating gross profit margin over time.

It can be seen from Figure

It can be seen from Figures

As can be seen from Figures

Analysis of each grouping situation

It can be seen from Table

Descriptive statistics grouped by the intensity of production factors from 2018 to 2020.

Intensity of production factors | Variable | N | Minimum | Maximum | Mean | Standard deviation |
---|---|---|---|---|---|---|
Labor intensity | R&D investment intensity | 64 | 0.003 | 0.450 | 0.045 | 0.061 |
Labor intensity | Tobin’s Q | 64 | 0.976 | 9.736 | 3.613 | 1.838 |
Labor intensity | Operating gross profit margin | 64 | 0.056 | 0.663 | 0.304 | 0.141 |
Capital intensity | R&D investment intensity | 462 | 0.003 | 0.166 | 0.051 | 0.028 |
Capital intensity | Tobin’s Q | 462 | 0.619 | 19.544 | 3.623 | 2.514 |
Capital intensity | Operating gross profit margin | 462 | 0.030 | 0.959 | 0.347 | 0.169 |
Technology intensity | R&D investment intensity | 597 | 0.009 | 0.728 | 0.091 | 0.084 |
Technology intensity | Tobin’s Q | 597 | 0.844 | 18.105 | 4.346 | 2.298 |
Technology intensity | Operating gross profit margin | 597 | -0.060 | 0.980 | 0.358 | 0.182 |

This paper uses SPSS 13.0 to analyze the data and conducts correlation analysis before regression analysis. Correlation analysis describes the degree of interdependence between variables and can detect whether the explanatory variables in the model are strongly correlated with one another (multicollinearity). The results of the correlation analysis are shown in Table

Correlation analysis data sheet.

Variable | R&D investment intensity | Asset-liability ratio | Enterprise size | Tobin’s Q | Operating gross profit margin |
---|---|---|---|---|---|
R&D investment intensity | 1 | - | - | - | - |
Asset-liability ratio | -0.293 | 1 | - | - | - |
Enterprise size | -0.132 | 0.461 | 1 | - | - |
Tobin’s Q | 0.165 | -0.216 | -0.273 | 1 | - |
Operating gross profit margin | 0.378 | -0.413 | -0.119 | 0.419 | 1 |
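A correlation table of this kind can be reproduced from raw columns with the Pearson coefficient. The sketch below is a minimal pure-Python version; the column names and values are hypothetical and deliberately simple, so they do not match the coefficients reported in the table.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """Return a nested dict of pairwise Pearson correlations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns constructed to be perfectly (anti)correlated.
columns = {
    "rd_intensity": [0.02, 0.05, 0.08, 0.11],
    "leverage": [0.5, 0.4, 0.3, 0.2],
    "gross_margin": [0.20, 0.26, 0.32, 0.38],
}
corr = correlation_matrix(columns)
```

The matrix is symmetric with ones on the diagonal, which is why the published table shows only the lower triangle.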

It can be seen from Table

The influence of R&D investment intensity on Tobin’s Q

The F-statistic of this model is significant, reaching 19.118, indicating that the regression model of R&D investment intensity and Tobin’s

The impact of R&D investment intensity on operating gross profit margin

The regression result of R&D investment on Tobin’s Q.

The F-statistic of this model is significant, reaching 117.457, indicating that the regression model of R&D investment intensity and operating gross profit margin is significant overall. The adjusted R² is 22.5%, indicating that the regression equation explains 22.5% of the variation in operating gross profit margin. The collinearity diagnostics also show no collinearity problem among the explanatory variables in the equation. The nonstandardized coefficient of R&D investment intensity on operating gross profit margin is 0.714, and the

The regression result of R&D investment on operating gross profit margin.

This article takes GEM listed companies from 2018 to 2020 as a sample and studies the impact of R&D investment on corporate performance from the perspective of production factors. It divides the industries on the Growth Enterprise Market into labor-, capital-, and technology-intensive groups by calculating the corresponding factor intensity indicators. Using SPSS 13.0, descriptive statistics, correlation analysis, and regression analysis were performed on the full sample and the subsamples. The group regression method was used to test whether production factors moderate the relationship between R&D investment and corporate performance, and the specific moderating and lagging effects of production factors were examined. The factors of production are divided into labor, capital, and technology factors, and they are found to have a significant impact on the relationship between R&D investment and corporate performance. This conclusion offers practical guidance for China’s GEM companies with different factor intensities in carrying out R&D activities to improve corporate performance.

This article is not supported by data.

The author declares that he/she has no conflicts of interest.