Intelligent Data Mining Based on Market Circulation of Production Factors

R&D investment is an important way to improve scientific and technological innovation capability. In an increasingly competitive market, rapid changes in science and technology bring new opportunities for enterprise development. However, if production factors are not allocated rationally, allocation efficiency is likely to be low, and an excessive overflow of production factors makes the input mix unreasonable and lowers the economic output of enterprises. This article therefore uses data mining to analyze the feasibility and timeliness of R&D investment from the perspectives of production factors and enterprise output performance. The aim is to optimize the future allocation of production factors and to help enterprises combine existing and new factors of production. The sample consists of companies listed on the Growth Enterprise Market (GEM), and the survey period is 2018 to 2020. The data are taken from the Guotaian database, supplemented by data obtained by manually reading company annual reports. A multiple regression analysis model is established and tested to obtain the relationship between R&D investment, production factors, and corporate performance. The group regression method is used to test whether production factors have a moderating effect on the relationship between R&D investment and corporate performance, and the specific moderating and lagging effects of production factors are investigated. The experiments show that the nonstandardized coefficient of R&D investment intensity on operating gross profit margin is 0.714, with a T value of 9.296, which is positive and significant: each unit increase in enterprise R&D investment intensity raises operating gross profit margin by 0.714. The coefficient for operating gross profit margin is much smaller than the coefficient for Tobin's Q value. This shows that production factors have a great influence on the relationship between R&D investment and corporate performance. The results offer specific practical guidance for GEM companies in China with different factor intensities in carrying out R&D activities and improving corporate performance.


Introduction
In today's globally integrated economy, technological innovation has repeatedly sounded the clarion call of a new industrial revolution, and the scientific and technological capability of an enterprise has become the touchstone of whether it can survive. In recent years, Chinese companies have grown rapidly as a whole, but their growth remains concentrated in total economic volume, and the development of capacity and quality is not well coordinated. A series of common problems, such as low technical content and low market value of finished products, have become bottlenecks that limit the development and growth of many companies and that are difficult to break through.
With downward pressure on the world economy relatively high, companies want to create economic benefits in a downturn, and research and development activities may be one option. Many scholars at home and abroad have conducted fruitful research on the economic consequences of R&D investment for enterprises, but their conclusions are inconsistent and differ widely. As the main body of scientific and technological innovation, an enterprise must proceed from its own perspective and seek a resource allocation that suits its own conditions. Its ultimate goal is to make a profit, so it must maximize the effectiveness of its production factors in order to improve performance.
Kim and Yang argued that, in the construction of a new socialist countryside, realizing the transition from traditional agriculture to modern market agriculture as soon as possible is a top priority. Market agriculture can improve the core competitiveness of China's agriculture and increase farmers' income. They also discussed the current state of, and the factors influencing, the flow of production factors during the implementation of market-oriented agriculture in China, as well as the means of promoting the circulation of agricultural production factors. However, their research did not clearly propose how to realize the transition from traditional agriculture to modern market agriculture, and the overall research lacks data support [1]. Data mining is playing an increasingly important role in politics, the economy, transportation, and daily life, and there are many cases at home and abroad of applying data mining to solve practical problems. Wang et al. noted that thresholding the ensemble coherence is a common method for identifying radar scatterers that are less affected by decorrelation noise. However, thresholding the coherence may lose information in areas that undergo more complex deformation. If differences in moderately coherent regions behave similarly, their spatial correlation must be considered for correct inference. Information from low-coherence areas may then be used in a similar way, while the coherence itself is used for thematic mapping applications such as change detection. They proposed a method based on data mining and statistical procedures that reduces the influence of outliers in the final results while preserving the spatial and statistical correlation between observations. However, the experimental results lack sufficient data support, so whether data mining can alleviate the impact of outliers remains doubtful [2].
Tao et al. collected data from IoT devices to achieve better decision-making and higher automation, efficiency, productivity, accuracy, and wealth creation. Data mining and other artificial intelligence methods will play a key role in creating a smarter IoT, despite the many challenges. They tested the applicability of eight well-known data mining algorithms to IoT data, including a deep learning artificial neural network, which constructs a feedforward multilayer network for modeling high-level data abstractions. However, their research data are insufficient, which makes the data mining results inaccurate [3]. This article applies data mining classification algorithms to the practical problem of evaluating the performance of enterprises under the market-oriented circulation of production factors. The data come from the Guotaian database and from GEM annual reports for 2018-2020. Tobin's Q value is added as an indicator of the market performance of a company's R&D investment and is used to reflect the impact of that investment on the company's future. At the same time, this article introduces production-factor variables, investigates the relationships among them in detail, and expands the research space and research ideas for corporate performance in future scientific research activities.

Intelligent Data Mining and the Market Circulation of Production Factors

Data Mining
(1) Data mining method
Data mining differs from ordinary information retrieval. Ordinary information retrieval obtains the required content directly through query commands, whereas data mining extracts useful information from data through association rules and machine learning algorithms [4,5]. The information obtained is indirect and abstract, and data mining discovers the hidden patterns that are then used for evaluation. The general framework of data mining is shown in Figure 1.
A prediction task predicts the value of a target attribute from the existing attributes; the main prediction tasks are regression and classification. The target variable of a regression task is continuous, and the target variable of a classification task is discrete. In both cases the prediction model is trained on a training set whose target variable is known, that is, on labeled data. The resulting model is a mapping from the existing attributes to the target attribute learned under supervision, so both are supervised learning methods. Descriptive tasks summarize the potential association patterns in the data and mainly include cluster analysis, association analysis, and anomaly detection [6,7].
(2) Data preprocessing
(a) Reasons for data preprocessing
The purpose of data preprocessing is to obtain reliable data for data mining tasks. Data quality can be low for many reasons, such as equipment failure during data collection, human error during data input, technical errors during data transmission, and users deliberately concealing information [8,9]. For technical or confidentiality reasons, data may be incomplete, which shows up as missing attribute values, sometimes for important attributes. Data may also be imbalanced, that is, unevenly distributed, and imbalanced data make the trained model inaccurate [10].

(b) Data preprocessing method
Data cleaning removes the "dirty data" from a data set. It includes smoothing noise, since data generally contain random noise and smoothed data are closer to reality, and detecting and deleting outliers. Outliers generally carry a great deal of noise and can be regarded as noisy data that are unfavorable for model training, so the usual treatment is to delete them directly. Because data are generally collected in different systems that are independent of one another, the data are also isolated from each other and need to be integrated [11,12]. Data reduction reduces the scale of the data, for example by deleting irrelevant attributes, replacing words with numbers in the bag-of-words model, and discretizing continuous values. These steps greatly reduce the scale of the data and save computation space and time.
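The cleaning and reduction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field name `rd_intensity`, the z-score rule with a cutoff of 2.5, and the cut points 0.02 and 0.05 are all hypothetical choices, not the procedure used in this study.

```python
from statistics import mean, stdev

def drop_incomplete(records, required):
    """Data cleaning: discard records in which a required attribute is missing."""
    return [r for r in records if all(r.get(k) is not None for k in required)]

def remove_outliers(records, key, z_max=2.5):
    """Delete records whose value lies more than z_max sample standard
    deviations from the mean (an assumed, simple outlier rule)."""
    values = [r[key] for r in records]
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(records)
    return [r for r in records if abs(r[key] - mu) / sigma <= z_max]

def discretize(value, cut_points):
    """Data reduction: map a continuous value to the index of its bin."""
    return sum(value >= c for c in cut_points)

# Hypothetical firm records: (name, R&D intensity); None marks a missing value.
raw = [{"firm": name, "rd_intensity": v} for name, v in [
    ("A", 0.05), ("B", 0.07), ("C", None), ("D", 0.06), ("E", 0.08),
    ("F", 0.06), ("G", 0.04), ("H", 0.05), ("I", 0.07), ("J", 0.06),
    ("K", 0.95),  # implausibly large value, treated as noise
]]

complete = drop_incomplete(raw, ["rd_intensity"])
cleaned = remove_outliers(complete, "rd_intensity")
bins = [discretize(r["rd_intensity"], [0.02, 0.05]) for r in cleaned]
```

Deleting outliers by z-score is the simplest option; with very small samples a single extreme value inflates the standard deviation, so a median-based rule may be preferable in practice.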

Classification Algorithm Performance Evaluation.
For the performance evaluation of an algorithm, a performance evaluation index is required in addition to the test set. Different tasks use different indices to compare the effects of different algorithms, or of the same algorithm with different parameters [13,14]. For a two-class classification problem, the category predicted by the classification algorithm and the true category of each sample can be combined to obtain the confusion matrix of the classification result, as shown in Table 1. Below, $TP$, $FP$, $FN$, and $TN$ denote the numbers of true positives, false positives, false negatives, and true negatives in that matrix.
The F1 value is built on the precision rate $P = TP/(TP+FP)$ and the recall rate $R = TP/(TP+FN)$ and is defined as
$$F1 = \frac{2 \times P \times R}{P + R}.$$
In a classification problem, precision and recall usually constrain each other, and the F1 value balances the influence of these two indicators.
(5) ROC and AUC
For the predicted value, the larger the value, the greater the probability that the sample is positive; likewise, the smaller the value, the greater the probability that the sample is negative. In practical applications, if more attention is paid to precision, the decision threshold can be increased [17,18]. The ordinate of the ROC curve is the true positive rate (TPR), the proportion of real positive cases predicted to be positive, which is consistent with the recall rate. The abscissa is the false positive rate (FPR), the proportion of real negative cases predicted to be positive. The definitions are as follows:
$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}.$$
AUC is the area under the ROC curve. When AUC is greater than 0.5, the classification algorithm is effective, and the larger the AUC, the stronger the generalization ability of the classification algorithm. When AUC is less than or equal to 0.5, the classification algorithm is invalid [19].
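The indices described above can be computed directly from labels and scores. The sketch below is illustrative only: the labels, scores, and the 0.5 threshold are made-up values, and AUC is computed through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half), which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean F1 (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: true labels and predicted probabilities of the positive class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold 0.5 is an assumption

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
area = auc(y_true, scores)
```

Raising the threshold above 0.5 in this example would trade recall for precision, which is the trade-off the F1 value is meant to balance.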

Data Mining Classification Algorithm
(1) Naive Bayes
Bayes' theorem is defined as follows:
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)},$$
where $P(B \mid A)$ is the probability of event $B$ occurring given that event $A$ has occurred, $P(A \mid B)$ is the probability of $A$ given $B$, and $P(B)$ and $P(A)$ are the probabilities of events $B$ and $A$, respectively [20]. Assume that the training set $T$ contains $t$ labeled samples and that a sample with attributes $X = \{x_1, x_2, \cdots, x_k\}$ belongs to class $c$, where $x_i$ is the $i$-th attribute of the sample. According to Bayes' theorem,
$$P(c \mid X) = \frac{P(X \mid c)\,P(c)}{P(X)},$$
where $P(c \mid X)$ is the conditional probability that a sample with attributes $X$ has class label $c$; $P(X \mid c)$ is the conditional probability of observing attributes $X$ within class $c$; $P(c)$ is the proportion of samples in class $c$, obtained by counting the frequency of each class; and $P(X)$ is a normalizing evidence factor that does not depend on the class [21,22]. For an unknown sample $X$, this conditional probability is calculated for each class separately. Calculating $P(X \mid c)$ directly is difficult because the attribute combination of the unknown sample may not appear in the training set. Under the assumption that the attributes are independent of each other within a class,
$$P(X \mid c) = \prod_{i=1}^{k} P(x_i \mid c).$$
So, the Bayesian classification algorithm can be written as
$$h(X) = \arg\max_{c} \; P(c) \prod_{i=1}^{k} P(x_i \mid c).$$
(2) Logistic regression
The Sigmoid function is similar to a step function. Near the jump point, it can be regarded approximately as jumping from 0 to 1, which meets the requirements of classification. At the same time, the differentiability of the function makes the solution more convenient [23,24]. The Sigmoid function is
$$y = \frac{1}{1 + e^{-z}},$$
where $z$ is a regression function. With regression coefficient $\omega$ and input $X$, $z = \omega^{T} X$; substituting this into the formula above gives
$$y = \frac{1}{1 + e^{-\omega^{T} X}}.$$
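Returning to the naive Bayes classifier of item (1), the decision rule can be sketched for categorical attributes as follows. The attribute values, class labels, and function names are hypothetical. Because an attribute value of an unknown sample may not appear in the training set, this sketch applies Laplace (add-one) smoothing, which is an assumption added here and is not stated in the text.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    domains = defaultdict(set)           # attribute index -> values seen in training
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(c, i)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def predict_nb(model, x):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(x):
            # Laplace smoothing (assumption): unseen values get a small nonzero probability.
            numerator = value_counts[(c, i)][v] + 1
            denominator = n_c + len(domains[i] | {v})
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical training data: (factor intensity, firm size) -> performance class.
samples = [("tech", "large"), ("tech", "small"), ("tech", "large"),
           ("labor", "small"), ("labor", "large"), ("capital", "small")]
labels = ["high", "high", "high", "low", "low", "low"]

model = train_nb(samples, labels)
pred_tech = predict_nb(model, ("tech", "small"))
pred_labor = predict_nb(model, ("labor", "small"))
pred_unseen = predict_nb(model, ("service", "large"))  # "service" never seen in training
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.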
If $y$ is the probability that sample $X$ is a positive example, then $1 - y$ is the probability that it is a negative example, and $\ln(y/(1 - y))$ is the logarithm of the relative probability (the log odds) that sample $X$ is positive:
$$\ln \frac{y}{1 - y} = \omega^{T} X. \tag{13}$$
Considering $y$ as the posterior probability estimate $P(y = 1 \mid X)$, Equation (13) can be rewritten as
$$\ln \frac{P(y = 1 \mid X)}{P(y = 0 \mid X)} = \omega^{T} X.$$
Obviously,
$$P(y = 1 \mid X) = \frac{e^{\omega^{T} X}}{1 + e^{\omega^{T} X}}, \qquad P(y = 0 \mid X) = \frac{1}{1 + e^{\omega^{T} X}}.$$
To estimate $\omega$ by the maximum likelihood method, for $m$ training samples $(X_i, y_i)$ the log-likelihood is
$$\ell(\omega) = \sum_{i=1}^{m} \ln P(y_i \mid X_i; \omega).$$

Wireless Communications and Mobile Computing
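The loss function $J(\omega)$ is the negative log-likelihood,
$$J(\omega) = -\sum_{i=1}^{m} \left[ y_i\, \omega^{T} X_i - \ln\left(1 + e^{\omega^{T} X_i}\right) \right],$$
where $(X_i, y_i)$ is the $i$-th of $m$ training samples and $y_i \in \{0, 1\}$. Its minimizer cannot be obtained analytically, so the gradient descent algorithm is used to approximate the solution and obtain the best regression parameter $\omega$ [25].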
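The gradient descent solution can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the learning rate, number of epochs, explicit intercept term, and the one-dimensional toy data are all assumptions. The gradient is averaged over the samples, which only rescales the step size relative to the summed loss.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the negative log-likelihood.
    Returns weights w, where w[0] is the intercept."""
    m = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi  # derivative of the per-sample loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / m for wj, g in zip(w, grad)]
    return w

def predict_proba(w, xi):
    """Estimated probability that sample xi is positive."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

# Toy data (hypothetical): one feature, negatives at low values, positives at high values.
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w = fit_logistic(X, y)
p_low = predict_proba(w, [0.5])
p_high = predict_proba(w, [4.5])
```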

Experimental Design of Market-Oriented Circulation of Production Factors

Variable Selection
(1) Dependent variable: corporate performance
Various indicators, both financial and nonfinancial, reflect the performance of enterprise R&D input and output. Nonfinancial indicators include upgrades of the production process, the number of patent applications, and updates of knowledge and skills. Because such data are difficult to measure and obtain, financial indicators are chosen, in line with the assumptions and the sample, to measure R&D input and output. Tobin's Q value is also added as an indicator of the market performance of enterprise R&D investment. R&D investment is a dynamic, long-term process, so the development prospects of the company are given a high weight, and Tobin's Q value is used to reflect the impact of R&D investment on the company's future.
(2) Independent variable: R&D investment
R&D expenditure is divided into two parts: expensed and capitalized. It includes the enterprise's actual R&D investment during the year, covering personnel and equipment as well as information and creative input. The current R&D investment amount is obtained on a quantifiable basis. To avoid large gaps in R&D investment across the sample caused by differences in company size, the R&D investment ratio is selected as the independent variable.
(3) Categorical variable: intensity of production factors
The factors of production are divided into three dimensions: labor, capital, and technology. Different measurement standards are used for each dimension, and the intensity classification is completed according to the corresponding indicators and methods. The capital-labor ratio and the fixed asset ratio are selected to measure capital-labor intensity. The capital-labor ratio reflects the relative proportions of the two most basic factors of production; the fixed asset ratio indicates whether the company has idle fixed assets and how important fixed assets are. The two indicators can be measured directly and complement each other, and the larger the ratios, the more important capital is. The ratio of R&D expenses to product costs reflects how much R&D expense is embodied in each unit of cost, and the ratio of R&D personnel to the number of employees reflects the personnel input of the company in conducting R&D activities. These two indicators are combined to distinguish technology-type from nontechnology-type enterprises.

(4) Control variables
(a) Enterprise scale
The scale of a listed company affects its output, and large companies often have accumulated advantages, so company scale is a necessary control variable. R&D investment affects not only the value of fixed assets but, more importantly, the value of intangible assets. In addition, operating income is already used as the denominator of the R&D investment intensity index. Enterprise scale is therefore measured as the natural logarithm of total assets.

(b) Asset-liability ratio
R&D is a corporate activity with high capital investment, and the sample consists of companies listed on the Growth Enterprise Market, which are characterized by short business cycles, fast replacement, and large growth inertia. How a company manages its debt will eventually be reflected in its business performance. Therefore, from the perspective of the relationship between corporate capital stock and technological innovation capability, the asset-liability ratio is used as a control variable to indicate the abundance of corporate funds. The variables and their meanings are shown in Table 2.
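As a small illustration of how the variables described above could be constructed for one firm-year, consider the sketch below. The function and argument names are hypothetical, the input figures are invented, and the asset-liability ratio is taken in its usual sense of total liabilities divided by total assets, which the text does not spell out.

```python
import math

def build_variables(rd_expense, operating_income, total_assets, total_liabilities):
    """Return the regression variables for one firm-year.
    All monetary inputs must be in the same currency unit."""
    return {
        # R&D investment intensity: R&D spending scaled by operating income.
        "rd_intensity": rd_expense / operating_income,
        # Enterprise scale: natural logarithm of total assets.
        "size": math.log(total_assets),
        # Asset-liability ratio (assumed definition: liabilities / assets).
        "leverage": total_liabilities / total_assets,
    }

# Invented figures for illustration only.
row = build_variables(rd_expense=6.9, operating_income=100.0,
                      total_assets=2000.0, total_liabilities=500.0)
```

Scaling R&D spending by operating income and taking the logarithm of total assets both reduce the influence of sheer company size on the regression.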

Test Subject.
The test sample consists of companies listed on the Growth Enterprise Market, and the research period is 2018 to 2020. Most of the data come from the Guotaian database. Because the China Securities Regulatory Commission did not include R&D investment data in the scope of mandatory disclosure by listed companies, R&D investment data are not available in databases such as Guotaian, so these data were collected manually by reading company annual reports. After strict screening against the criteria below, a total of 2942 sample observations were obtained.

(i) Screening criteria
Financial data for the period 2018-2020 must be complete; a company with any missing data is excluded from the sample. Among companies with complete data, the sample must also be representative: if an industry contains five or fewer companies, those companies are deleted. Because of the peculiarities of the financial industry, listed companies in the financial industry are also deleted.
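The screening criteria can be expressed as a short filter. This is a sketch under stated assumptions: the record layout, the field names, and the industry label `"finance"` are hypothetical, and industry sizes are counted after the completeness check, an ordering the text does not specify.

```python
from collections import Counter

YEARS = (2018, 2019, 2020)
FIELDS = ("rd_intensity", "tobin_q", "gross_margin")  # hypothetical field names

def is_complete(firm):
    """True if every field is present for every year of the research period."""
    return all(
        firm["data"].get(year, {}).get(field) is not None
        for year in YEARS
        for field in FIELDS
    )

def screen(firms, max_dropped_industry_size=5):
    """Drop financial firms, firms with missing data, and small industries."""
    kept = [f for f in firms if f["industry"] != "finance" and is_complete(f)]
    sizes = Counter(f["industry"] for f in kept)
    return [f for f in kept if sizes[f["industry"]] > max_dropped_industry_size]

def make_firm(name, industry, missing_year=None):
    """Build a dummy firm record, optionally with one year of data missing."""
    data = {y: {field: 1.0 for field in FIELDS} for y in YEARS if y != missing_year}
    return {"name": name, "industry": industry, "data": data}

firms = (
    [make_firm(f"S{i}", "software") for i in range(6)]
    + [make_firm("S6", "software", missing_year=2019)]   # incomplete data
    + [make_firm(f"T{i}", "textiles") for i in range(3)]  # industry too small
    + [make_firm("B0", "finance")]                        # financial industry
)
sample = screen(firms)
```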

Moderating Effect Test Method.
For companies, the relationship between R&D activities and corporate performance is also affected by the company's own "personality." Exactly the same R&D management model is effective for some companies but ineffective for others. The "personality" of an enterprise can be expressed as its "foundation," that is, its factors of production: the number of laborers, the richness of assets, or the level of technology. When the independent variable is continuous and the moderating variable is categorical, the moderating effect can be tested by the group regression method. That is, the sample is grouped by the moderating variable M, and Y is regressed linearly on X within each group; if the regression coefficients differ between groups, a moderating effect exists.

Model Building.
We establish a multiple regression analysis model to test the relationship between R&D investment, production factors, and corporate performance. The steps are as follows. First, the independent variable and the control variables are put into the model, and regression analysis is performed to explore the relationship between R&D investment intensity and corporate performance. Then the moderating variable is introduced: the sample is divided into groups along each of the three dimensions of production factors, and the model is built as follows:
$$Y_i = a + b X_{1i} + c X_{2} + \varepsilon,$$
where $Y_i$ represents the performance of the enterprises in group $i$, $X_{1i}$ represents the R&D investment of group $i$, $X_2$ represents the control variables, and $\varepsilon$ is a random term that represents other influencing factors not covered in this study. If the group regressions differ significantly under a given dimension of production factors, it can be concluded that this dimension has a moderating effect on the relationship between R&D investment and enterprise performance. On this basis, the subsamples classified according to the dimensions of production factors are used for regression comparison. Whether the significance of $b$ differs across groups further shows how production factors shape the relationship between R&D investment and enterprise performance.
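The group regression idea can be sketched as follows: fit the same linear model separately within each group and compare the coefficient on R&D investment. Everything in this example is illustrative. The data are synthetic and noiseless, the group names and coefficients are invented, and the comparison of slopes is informal; a formal test of the difference between groups is not shown.

```python
def ols(rows):
    """Least-squares fit of y = a + b*x1 + c*x2.
    rows are (y, x1, x2) tuples; returns [a, b, c]."""
    k = 3
    # Augmented normal equations [X'X | X'y].
    aug = [[0.0] * (k + 1) for _ in range(k)]
    for y, x1, x2 in rows:
        x = (1.0, x1, x2)
        for i in range(k):
            for j in range(k):
                aug[i][j] += x[i] * x[j]
            aug[i][k] += x[i] * y
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def make_group(a, b, c):
    """Synthetic, noiseless observations generated from y = a + b*x1 + c*x2."""
    x1s = [1, 2, 3, 4, 5, 6]   # stand-in for R&D investment intensity
    x2s = [3, 1, 4, 1, 5, 2]   # stand-in for a control variable
    return [(a + b * x1 + c * x2, x1, x2) for x1, x2 in zip(x1s, x2s)]

# Two hypothetical groups that differ only in the slope b on R&D investment.
groups = {
    "labor": make_group(1.0, 2.0, 0.5),
    "technology": make_group(1.0, 5.0, 0.5),
}
slopes = {name: ols(rows)[1] for name, rows in groups.items()}
```

A clearly larger slope in one group than in another is the pattern that the group regression method interprets as a moderating effect of the grouping variable.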

Statistical Processing.
Statistical analysis was carried out with SPSS 13.0 statistical software. The significance of differences was tested by one-way analysis of variance, differences between two groups were tested by LSD-t, and the intelligent data mining results for the market-oriented circulation of production factors were compared by the group t-test. P < 0.05 was considered statistically significant.
It can be seen from Figure 2 that during the three years from 2018 to 2020, the average R&D investment intensity of companies listed on the ChiNext is 6.9%, and the standard deviation is only 0.0675, indicating that this value represents the overall level well. An R&D intensity of 2% is only the level at which a company can barely survive, whereas an R&D intensity of 5% is the level at which a company has a competitive advantage. This shows that the Growth Enterprise Market as a whole has innovative vitality. The main reason is that most companies listed on the ChiNext are high-tech or entrepreneurial companies, which regard R&D activities as a necessary condition for maintaining innovation and competitiveness.

Descriptive Statistical Analysis
It can be seen from Figures 3 and 4 that the overall Tobin's Q value of the sample varies between 0.6149 and 13.267, with an average of 3.5792 and a standard deviation of 2.67, indicating that there is a large gap in the market value of listed companies on the GEM. The operating gross profit margin values are relatively concentrated. The standard deviation is 0.167, the minimum is -0.03, the maximum is as high as 0.92, and the overall average is 0.366, maintaining a gross profit margin of about 38%. This shows that the GEM listed companies have continuous competitive advantages.
As can be seen from Figures 2-4, over the time span the average R&D investment intensity changed from 7.3% in 2018 to 6.73% in 2019 and 6.67% in 2020. Although there is a slight downward trend, it is still well above the 3.5% R&D investment intensity of main board listed companies. Tobin's Q value improved significantly over the three years. As it grew, the gap in market value gradually widened, and the standard deviation increased by 1.26 compared with the previous year. In comparison, the operating gross profit margin remained basically stable at 0.34 during 2018-2020, indicating that the short-term profit level of the companies changed little.
(2) Analysis of each grouping situation
It can be seen from Table 3 that the R&D investment intensities of labor-, capital-, and technology-intensive industries are 4.32%, 4.49%, and 8.96%, respectively; Tobin's Q values are 3.613, 3.623, and 4.346, respectively; and operating gross profit margins are 0.304, 0.347, and 0.358, respectively. This indicates that enterprise R&D investment and enterprise performance differ under different production factor intensities, and the difference is stepped. Technology-intensive industries far surpass the other groups in R&D investment and corporate performance, while capital-intensive industries are slightly higher than labor-intensive industries. Technology-intensive samples have the largest R&D intensity, Tobin's Q value, and operating gross profit margin. Capital-intensive industries have the smallest standard deviation of R&D intensity, indicating that the R&D intensity of enterprises in this type of industry remains at a relatively stable level. Labor-intensive industries have the lowest standard deviation of operating gross profit margin, indicating that the gross profit margin of this type of industry tends to be stable, but the overall profitability is weak.

Correlation.
This paper uses SPSS 13.0 software to analyze the data and conducts correlation analysis before regression analysis. Correlation analysis describes the degree of interdependence between variables. It can also reveal whether the explanatory variables in the model are strongly correlated with one another. The results of the correlation analysis are shown in Table 4, which lists the correlation coefficients between R&D investment intensity, corporate performance, and the control variables. Based on these results, the relationship between the explanatory variables and the explained variables in the model is described below.
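For readers who want to reproduce this kind of correlation analysis outside SPSS, a correlation coefficient can be computed as sketched below. The sketch assumes the Pearson coefficient, which the text does not name explicitly, and the two short series are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented series standing in for R&D intensity and a performance measure.
rd = [1, 2, 3, 4, 5]
performance = [1, 3, 2, 5, 4]

r = pearson(rd, performance)
r_perfect = pearson(rd, [2, 4, 6, 8, 10])
```

A coefficient near 1 or -1 between two explanatory variables would be a warning sign for the regression that follows, whereas moderate values between an explanatory variable and the explained variable simply indicate association.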

It can be seen from Table 4 that the correlation coefficients between the independent variable, R&D investment intensity, and the dependent variables, Tobin's Q value and operating gross profit margin, are 0.165 and 0.378, respectively, and the associated probability P value is less than the significance level of 0.01, so there is a significant positive correlation between them. Moreover, the operating gross profit margin coefficient is greater than the Tobin's Q coefficient, indicating that the correlation between R&D investment and operating gross profit margin is stronger than its correlation with Tobin's Q value. This indirectly suggests that differences exist across groups and variables and that further analysis is appropriate. The two control variables, asset-liability ratio and enterprise scale, are also analyzed here. The correlation coefficients between the asset-liability ratio and Tobin's Q value and operating gross profit margin are -0.216 and -0.413, respectively, with a significance level of 0.001. The correlation coefficients between enterprise size and [...]
[...] For each unit increase in the R&D investment intensity of enterprises, Tobin's Q value increases by 4.233, which shows that R&D investment is positively correlated with the company's market performance. The results are shown in Figure 5.
(2) The impact of R&D investment intensity on operating gross profit margin
The F statistic of this model is significant, reaching 117.457, indicating that the regression model of R&D investment intensity and operating gross profit margin is significant overall. The adjusted R² of the model is 22.5%, indicating that the regression equation explains 22.5% of the variation in operating gross profit margin. At the same time, the collinearity diagnosis shows that there is no collinearity problem among the explanatory variables in the equation. The nonstandardized coefficient of R&D investment intensity on operating gross profit margin is 0.714, and the T value of 9.296 is positive and significant, indicating that the R&D investment of an enterprise directly promotes the growth of operating gross profit margin and thus improves business performance. Each unit increase in R&D investment intensity raises the operating gross profit margin by 0.714. The coefficient for operating gross profit margin is much smaller than the coefficient for Tobin's Q value, indicating that R&D investment has a greater effect on the market value of the company. This shows that R&D investment and corporate financial performance are positively correlated. The results are shown in Figure 6.
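The reported coefficient can be read with a short worked calculation. This is a sketch under assumptions: the symbols $\hat{b}$, $\mathrm{SE}$, and $\Delta$ are notation introduced here, the T value is taken to be the usual ratio of the coefficient to its standard error, and R&D investment intensity is assumed to be expressed as a fraction, so that a one-percentage-point change equals 0.01.

```latex
\[
t = \frac{\hat{b}}{\mathrm{SE}(\hat{b})}
\quad\Longrightarrow\quad
\mathrm{SE}(\hat{b}) = \frac{0.714}{9.296} \approx 0.0768,
\]
\[
\Delta(\text{gross margin}) = \hat{b}\,\Delta(\text{R\&D intensity})
= 0.714 \times 0.01 = 0.00714 .
\]
```

Under these assumptions, a one-percentage-point rise in R&D investment intensity corresponds to a rise of roughly 0.7 percentage points in operating gross profit margin.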

Conclusions
This article uses GEM listed companies from 2018 to 2020 as a sample and, from the perspective of production factors, studies the impact of R&D investment on corporate performance. The article divides the industries on the Growth Enterprise Market into labor-, capital-, and technology-intensive industries by calculating the corresponding production factor intensity indicators. Using the statistical software SPSS 13.0, descriptive statistical analysis, correlation analysis, and regression analysis were performed on the full sample and the subsamples. At the same time, the group regression method was used to test whether production factors have a moderating effect on the relationship between R&D investment and corporate performance, and the specific moderating and lagging effects of production factors were examined. In this article, the factors of production are divided into three parts: labor factors, capital factors, and technology factors. It is found that the factors of production have a significant impact on the relationship between R&D investment and corporate performance. This conclusion offers practical guidance for GEM companies in China with different factor intensities in carrying out R&D activities to improve corporate performance.

Data Availability
This article is not supported by data.

Conflicts of Interest
The author declares that he/she has no conflicts of interest.