A Big Data-Driven Approach to Analyze the Influencing Factors of Enterprise's Technological Innovation

This paper proposes a data-driven intelligent analysis method to explore and identify the factors that influence an enterprise's technological innovation. Traditional evaluation methods usually select technological innovation indicators through questionnaire surveys or expert interviews. However, these approaches inevitably involve human factors and experts' subjective judgments, which may affect the result of the enterprise evaluation. The research presents an improved text clustering method based on a semantic concept model to explore and analyze the key influencing factors of enterprise's technological innovation. The study collects textual data from 400 enterprises in Beijing and uses the proposed method to analyze the critical influencing factors of their technological innovation. The influencing factors can be divided into seven categories. In addition, the proposed method outperforms the traditional K-means clustering method on the sum of squared errors (SSE) and the Silhouette Coefficient. The proposed methodology enables a data-driven intelligent analysis of enterprise's technological innovation and can provide more objective auxiliary suggestions for its evaluation.


Introduction
Technological innovation is the foundation of the survival and development of enterprises and the driving force for the country's economic and social development. It is essential for an enterprise to gain a competitive advantage by correctly analyzing and evaluating its technological innovation capability. Since the middle of the 18th century, the world has experienced three industrial revolutions. Social development is entering the era of data revolution with the development and application of new-generation information technologies such as cloud computing, big data, mobile Internet, artificial intelligence, and the Internet of Things. The production and circulation of massive data gave birth to "big data" and set off the fourth industrial revolution. The digital economy era has given birth to new production factors represented by big data. Data-driven continuous growth and innovative development are the main lines of enterprise digital transformation. Compared with the process-driven approach of the past, a data-driven approach allows companies to use massive and multidimensional data to establish a more comprehensive evaluation system, create direct business innovation growth, and continuously improve operational efficiency. It is an essential means to maintain sustainable development in market competition. With the continuous development of enterprises, technical documents and text information in various forms continue to proliferate. According to statistics, 80% of enterprise data exist in unstructured form, such as Web pages, technical papers, and e-mails. Especially in enterprise's technological innovation, related technological innovation activity reports, meeting minutes, annual corporate reports, patent technology files, project application reports, and other textual information are increasing with each passing day. In addition to structured data, enterprises often need to deal with disordered unstructured textual data.
Ignoring the text information generated by corporate technological innovation activities will inevitably affect the result of technical innovation management. Enterprises need to deal with these collected files and explore the value and knowledge behind these massive amounts of data. Therefore, the rise of big data and the development of intelligent text analysis technology have made it possible to use much of the unstructured and fragmented textual data that enterprises initially neglected. However, mining knowledge from massive unstructured data and providing auxiliary decision-making for enterprise technological innovation remain a significant challenge in the era of big data.
Text mining generally refers to the process of extracting valuable, nontrivial patterns or knowledge from a large amount of unstructured or semi-structured text files.
Therefore, text mining provides an effective measure for textual data collation, analysis, and mining. It meets the massive demand for processing and analyzing an enormous amount of unstructured and semi-structured textual data. To a certain extent, it solves the labour and material cost problems of manual text processing. Text mining, as a representative intelligent measurement method, has been widely applied in various fields. Examples include the evaluation of business intelligence and enterprise technology analysis [1][2][3], enterprise technology opportunity identification [4,5], correlation analysis of enterprise technology cooperation behavior [6][7][8], the analysis of enterprise technology maturity [9][10][11], and the prediction of enterprise technology development trends [12][13][14].
The intelligent text mining algorithm can quickly organize large amounts of information into a few meaningful, high-quality clusters and uncover hidden knowledge or patterns. The rapidly growing textual data, especially in enterprise's technological innovation, have become diverse, high-dimensional, and complex, and are loaded with semantic information. Therefore, it is feasible to explore the essential influencing factors of enterprise's technological innovation based on intelligent analysis algorithms and realize the semantic organization for technological innovation evaluation. This paper collects technical data from 400 enterprises in Beijing. It applies the proposed intelligent text clustering algorithm to realize knowledge mining and acquire knowledge of enterprise's technological innovation at the semantic level. The remainder of this paper is organized as follows. The second section presents the related works. The third section provides the implementation process of the proposed methodology and validates its performance against the traditional K-means clustering method. The fourth section presents the results, and the fifth section discusses this research. In the final section, we conclude this paper's work.

Related Works
Previous studies of enterprise's technological innovation usually focus on theoretical research and evaluation methods. In the early stage, the evaluation system of enterprise's technological innovation was mostly constructed by questionnaire or expert interview. The influencing factors of an enterprise's technological innovation capability were obtained from the questionnaire design or expert experience. However, these methods are often affected by the limitation of the sample size and the subjective factors of expert opinions, so it is difficult for them to objectively and comprehensively reflect the status of enterprise's technological innovation capabilities. Many scholars have made comprehensive evaluations of enterprise technological innovation. They usually use various evaluation methods to express the overall characteristics of enterprise technological innovation. These evaluation methods are mainly divided into the following aspects.
Some scholars adopted the fuzzy evaluation method to evaluate the enterprise technological innovation ability. The main feature of this method is to first design a set of evaluation index systems, determine the weight of each index, establish a fuzzy comment set, and then use the fuzzy evaluation method to judge the innovation ability of the enterprise. For example, Du et al. [15] established a risk evaluation model for technological innovation based on fuzzy evaluation. Suder and Kahraman [16] proposed a Fuzzy TOPSIS method to evaluate technological innovation investments using eight different criteria. Feng and Ma [17] identified the influencing factors of service innovation in manufacturing enterprises by using the fuzzy DEMATEL method. The shortcomings of this type of method are that human factors play a prominent role and that the data collection and processing are human-oriented, which lacks objectivity and requires a lot of labour.
Some scholars used the Analytic Hierarchy Process (AHP) and its variants to evaluate the enterprise technological innovation capability. Mu et al. [18] established an index system for the technological innovation capabilities of small and medium-sized enterprises through the AHP method. Pan et al. [19] combined the AHP and osculating value process (OVP) to evaluate the green innovation ability of manufacturing enterprises. In addition, some scholars have used the improved AHP and fuzzy evaluation method to establish a model for assessing the technological innovation capabilities of enterprises [20]. The weight of each indicator in the AHP method depends on the subjective judgment of experts, so a certain degree of subjectivity is inevitable.
Some researchers evaluated the enterprise technological innovation capability with the Data Envelopment Analysis (DEA) method. Wang et al. [21] constructed a high-tech industrial evaluation framework of technical innovation efficiency based on two-stage network DEA. Ma et al. [22] used the DEA method to evaluate and analyze the innovation capability of 233 listed companies in 5 major industrial sectors defined by the China Securities Regulatory Commission (CSRC). Li et al. [23] measured the technical efficiency, scale efficiency, and pure technical efficiency of innovation in China's semiconductor industry using a three-stage DEA model. Although the DEA method can handle multiple inputs and outputs, it is only an efficiency evaluation method and cannot indicate the actual technical level of the research object.
Some scholars proposed to use intelligent decision-making methods to evaluate the technological innovation capabilities of enterprises. The intelligent decision-making method applies artificial intelligence-related theoretical methods and fuses them with traditional decision-making mathematical models for intelligent reasoning and solving, such as the genetic algorithm, ant colony algorithm, rough set, and so on. At present, there is little literature on the application of intelligent decision-making methods to evaluate enterprise technological innovation capabilities. Shang [24] proposed the evaluation model of strategic management capability based on the Back-Propagation (BP) neural network algorithm. Zhen and Yao [25] analyzed the lean production and technological innovation in the manufacturing industry based on Support Vector Machine (SVM) algorithms and data mining technology.
Computational Intelligence and Neuroscience
Most of the related works on evaluation methods for enterprise's technological innovation rely on the experts' subjective experience. Hence, the evaluation results vary with the different experts' opinions. Less-comprehensive impact indicators selected by experts may not reflect the actual status of the enterprise's technological innovation capability. The lack of objectivity in the evaluation method is not conducive to identifying and cultivating enterprise's technological innovation capabilities. Applying big data-driven intelligent analysis methods can objectively reflect the actual level of enterprise's technological innovation capabilities. In addition, such methods can use massive and multidimensional data to establish a more comprehensive evaluation system.

Methodology
This paper proposes an improved semantic clustering algorithm that combines the concepts of semantic similarity and relatedness based on a domain ontology. After text preprocessing, a corpus containing the keywords is built, and the keywords are mapped to the concepts in the domain ontology of the enterprise's technological innovation. The calculation of semantic similarity and relatedness between concepts is one of the critical steps: the semantic similarity and relatedness method proposed in this paper is used to establish the compound similarity matrix. Finally, the keyword set in the corpus is clustered according to the improved semantic text clustering algorithm proposed in this paper. Figure 1 shows the framework of semantic-based text clustering in the field of enterprise technology innovation. According to the framework, the improved methodology consists of three main steps, detailed as follows. The main parameters used in the following equations are shown in Table 1.

Text Concept Mapping.
After the data on the enterprise's technological innovation are collected, they are preprocessed, including Chinese word segmentation, custom dictionary application, part-of-speech (POS) selection, and stop word removal, to obtain the text keyword set. The keyword set is mapped to the ontology of the enterprise technology innovation domain to obtain the concept set. Two situations need to be considered.
(1) When the dataset's keywords can directly match the concepts in the domain ontology, the keywords $T = \{t_1, t_2, \ldots, t_n\}$ are directly mapped to the concepts $C = \{c_1, c_2, \ldots, c_m\}$.
(2) When the keywords in the dataset cannot directly match the concepts in the domain ontology but appear frequently, the keywords should be reserved as unregistered words. Calculate the occurrence frequency TF of the keyword. If TF > μ, keep the keyword in the unregistered word set $W = \{w_1, w_2, \ldots, w_l\}$; otherwise, delete the keyword.
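The two mapping cases above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `map_keywords`, the representation of the ontology as a plain set of concept labels, and the use of exact string matching for a "direct match" are all assumptions made here.

```python
from collections import Counter


def map_keywords(keywords, ontology_concepts, mu):
    """Split keywords into matched concepts and frequent unregistered words.

    keywords: list of keyword occurrences (with repeats) from the corpus.
    ontology_concepts: set of concept labels (assumed exact-match lookup).
    mu: frequency threshold for keeping an unmatched keyword.
    """
    tf = Counter(keywords)  # occurrence frequency TF of each keyword
    concepts, unregistered = [], []
    for word, freq in tf.items():
        if word in ontology_concepts:
            concepts.append(word)        # case (1): direct match
        elif freq > mu:
            unregistered.append(word)    # case (2): frequent, kept as unregistered
        # otherwise the keyword is deleted
    return sorted(concepts), sorted(unregistered)
```

In practice the ontology lookup would likely be richer than a set membership test (for example, synonym handling), which this sketch deliberately leaves out.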

Semantic Similarity and Relatedness Calculation.
Before text clustering, the semantic similarity and relatedness calculation method needs to be used to construct the semantic matrix. We propose a new semantic measurement method that combines the concepts of semantic similarity and relatedness. Firstly, calculate the semantic similarity of concepts based on the semantic distance in the established domain ontology of enterprise's technological innovation, as shown in Figure 2. Secondly, assign weights to the paths connecting two concepts in the domain ontology. The semantic distance between two concepts is obtained by summing the weights along the connecting path instead of counting the number of edges connecting the two concepts. The specific calculation is shown in equation (1). Then, calculate the semantic relatedness of concepts through their co-occurrence in the text. Finally, combine the semantic similarity and the relatedness to establish the compound similarity matrix M.
In equation (1), α denotes the adjusting parameter, λ denotes the factor of influence degree of semantic distance on semantic similarity, $dist(c_i, c_j)$ denotes the semantic distance between $c_i$ and $c_j$ in the domain ontology, and $f_k$ denotes the co-occurrence frequency of concept $c_i$ and concept $c_j$ in the k-word window of the entire corpus. $f_c(c_i)$ and $f_c(c_j)$ represent the frequency of concept $c_i$ and concept $c_j$ in the whole corpus.

The term set extracted from the document set can be expressed as $T = \{t_1, t_2, t_3, \ldots, t_n\}$, and $w_{ij}$ denotes the weight of the term $t_j$ in the document $d_i$. Traditional clustering algorithms usually assign weights by the number of occurrences of the term $t_j$ in the document $d_i$, that is, the term frequency, or by the term frequency-inverse document frequency (TF-IDF). The traditional K-means algorithm selects K points randomly from the sample as the initial cluster centre candidates. Then, it calculates the distance from the sample points to the centres, usually with the Euclidean distance formula (shown in equation (2)), and assigns each point to the nearest centre. Finally, it iteratively recalculates the centre of each cluster until the centres no longer change.
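The two raw quantities used in this section, the weighted-path semantic distance $dist(c_i, c_j)$ and the windowed co-occurrence counts $f_k$ and $f_c$, can be sketched as follows. The sketch assumes that the ontology is available as a weighted adjacency dictionary and that the distance is taken along the minimum-weight path (computed here with Dijkstra's algorithm); both are assumptions, as are the function names. How these quantities are combined with α and λ follows equation (1) and is not reproduced here.

```python
import heapq
from collections import Counter


def semantic_distance(graph, source, target):
    """Sum of edge weights along the lightest path (Dijkstra).

    graph: {concept: {neighbour: weight}}, a hypothetical ontology encoding.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")  # concepts are not connected


def cooccurrence_counts(docs, k):
    """Concept frequencies f_c and pair co-occurrences f_k in a k-word window."""
    freq, co = Counter(), Counter()
    for tokens in docs:
        freq.update(tokens)
        for i, a in enumerate(tokens):
            for b in tokens[i + 1 : i + k]:  # the k-1 tokens following position i
                if a != b:
                    co[frozenset((a, b))] += 1
    return freq, co
```

Summing edge weights rather than counting edges lets closely related parent-child links contribute less distance than loosely related ones, which is the motivation stated in the text.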

Improved K-Means
This paper proposes an improved K-means algorithm, which improves the original algorithm mainly in the selection of initial cluster centres and in semantic-based clustering. The n × n semantic similarity and relatedness matrix M obtained in Step 3 is positive semi-definite, so $M = W^T W$ can be obtained by orthogonalization of the positive semi-definite matrix, where the columns of W represent the document vectors. According to the semantic similarity and relatedness matrix M, the Euclidean distance formula can be improved as shown in equation (3). The modified distance measurement considers the semantics between words based on the semantic similarity and relatedness matrix. Therefore, the distance between the document and the cluster can be measured as defined in equation (4), where $t_j$ represents the feature vector of the document and $z_l$ the cluster centre; a new document feature vector and a new cluster centre are derived from $t_j$ and $z_l$, respectively.

Firstly, randomly select a data point $z_1$ as the initial cluster centre from the input dataset. Secondly, for each point $c_i$ in the dataset, the distance from $c_i$ to $z_1$ is calculated by the equation $D(c_i) = \arg\min \|c_i - z_j\|_2^2$, $j = 1, 2, \ldots, k$. The point $c_i$ with the maximum distance $D(c_i)$ from $z_1$ is selected as the new cluster centre $z_2$. Then, repeat the above steps until K initial cluster centres are found. Finally, the remaining data points in the sample set are allocated to the nearest or most similar clusters according to the principle of maximum similarity. The cluster centres of the K clusters are recalculated, and the procedure iterates until the termination condition is met. The steps of the improved K-means algorithm are described in Algorithm 1.
The improvements to the traditional K-means algorithm proposed in this paper, based on semantic similarity and relatedness, mainly include the following. (1) The initial cluster centres are determined by the improved method based on the maximum similarity, which reduces the randomness of the cluster centre positions. (2) The improved Euclidean distance with semantic similarity and relatedness is used to measure the similarity between cluster centres and samples, whereas the traditional K-means algorithm ignores the semantic relationship between terms. (3) Convergence conditions are added to address the unstable clustering results of the original K-means algorithm.
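The three improvements can be illustrated with a compact, dependency-free sketch. It is not the authors' code. The squared distance $(x - y)^T M (x - y)$ is one common way to build a semantics-aware distance from a positive semi-definite matrix $M = W^T W$ and is assumed here to stand in for equations (3) and (4); the objective E is simplified to the plain sum of these distances; and all function and parameter names (`sem_dist2`, `init_centres`, `improved_kmeans`, `seed`) are hypothetical.

```python
import random


def sem_dist2(x, y, M):
    """Squared semantic distance (x - y)^T M (x - y); M is assumed PSD."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def init_centres(points, k, M, seed=0):
    """Farthest-point initialisation: one random centre, then max-min distance."""
    rng = random.Random(seed)
    centres = [points[rng.randrange(len(points))]]
    while len(centres) < k:
        # D(c_i) is the distance from c_i to its nearest chosen centre;
        # the point with the largest D(c_i) becomes the next centre.
        far = max(points, key=lambda p: min(sem_dist2(p, z, M) for z in centres))
        centres.append(far)
    return centres


def improved_kmeans(points, k, M, eps=1e-9, max_step=100, seed=0):
    """Cluster points with the semantic distance; stop on |E1 - E2| < eps."""
    centres = init_centres(points, k, M, seed)
    prev_e = None
    for _ in range(max_step):
        labels = [min(range(k), key=lambda l: sem_dist2(p, centres[l], M))
                  for p in points]
        for l in range(k):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:  # keep the old centre if a cluster becomes empty
                centres[l] = [sum(col) / len(members) for col in zip(*members)]
        e = sum(sem_dist2(p, centres[lab], M) for p, lab in zip(points, labels))
        if prev_e is not None and abs(prev_e - e) < eps:
            break
        prev_e = e
    return labels, centres
```

With M set to the identity matrix the routine reduces to ordinary K-means with farthest-point seeding; off-diagonal entries of M shrink the distance between documents that use different but semantically related terms.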

Validation Methods.
The paper mainly uses the sum of squared errors (SSE) and the Silhouette Coefficient (SC) to compare the clustering results of the traditional K-means method and the improved K-means method.
The SSE method calculates the total error between every data point and its cluster centroid, and the calculation method is shown in equation (5), where dis represents the distance function, p is any data point in the cluster $c_i$, and $m_i$ is the centroid of that cluster. A lower SSE value indicates better clustering performance, and a higher value indicates a worse clustering effect.
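As a concrete reading of this definition, the sketch below computes SSE in its standard form, $SSE = \sum_{i=1}^{K} \sum_{p \in c_i} dis(p, m_i)^2$. The assumption is that equation (5) follows this standard form; the function names and the Euclidean helper are illustrative only.

```python
def euclidean(p, q):
    """Plain Euclidean distance, used here as an example of dis."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def sse(points, labels, centroids, dis=euclidean):
    """Sum of squared distances from each point to its own cluster centroid."""
    return sum(dis(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
```

Any distance function with the same two-argument signature can be passed as `dis`, including a semantics-aware one.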
The Silhouette Coefficient method combines clustering cohesion and separation to evaluate the effect of clustering, and its value lies in [−1, 1]. A higher Silhouette Coefficient indicates better clustering performance. The calculation method of the Silhouette Coefficient is shown in equation (6), where a(i) represents the average distance from the data point i to all other points in the cluster to which it belongs, and b(i) represents the minimum, over all other clusters, of the average distance from the data point i to all points of that cluster.
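The definitions of a(i) and b(i) translate directly into code. The sketch below assumes the standard per-point form $s(i) = (b(i) - a(i)) / \max(a(i), b(i))$ and reports the mean over all points; the score of 0 for singleton clusters is a common convention adopted here as an assumption, and the function name is hypothetical.

```python
def silhouette(points, labels, dis):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a(i): mean distance to the other members (dis(p, p) contributes 0)
        a = sum(dis(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(sum(dis(p, q) for q in other) / len(other)
                for l2, other in clusters.items() if l2 != lab)
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.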

Validity Comparison of the Proposed Method.
To compare the clustering performance of the traditional K-means algorithm based on Bag-of-Words and the improved K-means algorithm based on semantic similarity and relatedness, the number of clusters K is varied from 3 to 10. The experimental results of the SSE and Silhouette Coefficient are shown as follows. Table 2 shows that as the number of clusters K increases, the SSE value of the improved K-means algorithm is significantly smaller than that of the traditional K-means algorithm. It shows that the clustering performance of the improved K-means algorithm is better than that of the traditional K-means algorithm. As shown in Figure 3, when the number of clusters (K) equals 8, both the improved K-means algorithm and the traditional K-means algorithm show an elbow (inflexion point) in the SSE curve. This suggests that clustering performance might be best when the number of clusters is 8, which provides a reference for the value of K in K-means clustering.
As shown in Table 2, with the increasing number of clusters K, the Silhouette Coefficient value of the improved K-means algorithm is significantly higher than that of the traditional K-means algorithm. A higher Silhouette Coefficient indicates better clustering performance. Hence, the performance of the improved K-means algorithm is better than that of the traditional K-means algorithm. As shown in Figure 4, when the number of clusters (K) equals 8, both the improved K-means algorithm and the traditional K-means algorithm have relatively higher values. Hence, comprehensive analysis shows that the optimum value of K is 8. The red dotted lines in Figure 5 represent the Silhouette Coefficient of the traditional K-means algorithm and the improved semantic K-means algorithm. The bars represent the clusters. When most of the samples in a group have a higher Silhouette Coefficient value and are distributed near the red dotted line, the clustering effect is better. On the contrary, if the sample points have a lower Silhouette Coefficient value and the distribution is scattered, the clustering effect is worse. Figure 5 shows that the Silhouette Coefficient value of the improved K-means algorithm is higher, and its samples are distributed near the red dotted line. Hence, the result indicates that the performance of the improved K-means algorithm based on semantic similarity and relatedness is better than that of the traditional K-means algorithm based on the Bag-of-Words model.
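The elbow reading of an SSE curve can also be made mechanical. The sketch below picks the K at which the curve bends most sharply, using the largest second difference of SSE over consecutive K values. This heuristic and the function name `elbow_k` are assumptions introduced for illustration; the paper reads the elbow from Figure 3, and the numbers in the usage example further down are made up, not taken from Table 2.

```python
def elbow_k(sse_by_k):
    """Return the K where the SSE curve bends most (largest second difference).

    sse_by_k: dict mapping consecutive K values to their SSE.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, cur, nxt in zip(ks, ks[1:], ks[2:]):
        # drop in SSE before K minus drop in SSE after K
        bend = (sse_by_k[prev] - sse_by_k[cur]) - (sse_by_k[cur] - sse_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = cur, bend
    return best_k
```

For a toy curve such as `{3: 100, 4: 80, 5: 60, 6: 30, 7: 28, 8: 27}`, the sharpest bend is at K = 6. In practice the elbow is usually cross-checked against the Silhouette Coefficient, as done in this section.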

Data Collection.
The experimental data in this paper consist mainly of the technological innovation information of 400 enterprises in Beijing, and the document information is used as the text collection. The collected textual data mainly cover the enterprise's primary status, the development of enterprise technological innovation activities, enterprise innovation projects, enterprise organizational structure, enterprise main products and services, enterprise profitability, etc. Table 3 briefly shows the details of the data collection result.
After data cleaning and selection, there are 867 valid texts, and the overall data size is about 20 MB. The experimental operating environment is the Windows 10 system, a 2.70 GHz core processor, 8.0 GB memory, and Python 3.6.2. After the preprocessing, including the custom dictionary, part-of-speech filtering, and stop word removal, the keyword vocabulary was obtained, as shown in Figure 6. Then, the keywords are mapped to the concepts in the domain ontology, and the semantic similarity matrix P and relatedness matrix Q are obtained by calculating the semantic similarity and relatedness. Figure 7 shows the result obtained by text clustering based on semantic similarity and relatedness. The 15 most important feature words in each cluster are selected to represent the topic based on the feature weight, as shown in Table 4. According to the topic reflected by each group, there are eight types of main factors affecting enterprise technological innovation.

ALGORITHM 1: Improved semantic similarity and relatedness-based K-means clustering algorithm.
Input: preprocessed dataset $D = \{d_j \mid j = 1, 2, \ldots, m\}$; the dataset contains N terms $C = \{c_i \mid i = 1, 2, \ldots, n\}$; semantic similarity and relatedness matrix M; the number of clusters K; iteration termination condition ε; the maximum number of iterations MaxStep
Output: K cluster result
BEGIN
(1) start = 0, k = 0; //initialization: load dataset D and select an initial cluster centre $z_1$ randomly from D, saving it to the initial cluster centre set $Z_j = \{z_j, j = 1, 2, \ldots, k\}$;
(2) calculate the distance between each sample and the initial point $z_1$, find the point $c_i$ with the largest distance from $z_1$ according to equation (4), take the sample point $c_i$ as the second initial cluster centre $z_2$, and save it to the initial point set $Z_j = \{z_j, j = 1, 2, \ldots, k\}$;
(3) repeat step 2 until the kth initial cluster centre is found;
(4) according to $D(c_i) = \arg\min \|c_i - z_j\|_2^2$, assign each sample $c_i$ to the class of the nearest of the k initial cluster centres;
(5) update the centre of each cluster through the mean value $Z_i(O) = \sum_{i=1}^{n_i} c_i / n_i$, where $n_i$ represents the number of sample points in the group;
(6) compute the measure function $E = \sum_{l=1}^{k} \sum_{j=1}^{m} \sum_{i=1}^{n} w_{lj}\, \mathrm{dis}_{EU\_improved}(z_{li}, c_{ji})$, where $z_{li}$ represents the cluster centre, $\mathrm{dis}_{EU\_improved}(z_{li}, c_{ji})$ represents the distance between the jth data point and the lth cluster centre, and $w_{lj}$ represents the semantic matrix;
(7) if the number of iterations reaches MaxStep or $|E_1 - E_2| < ε$ is satisfied, the iteration is terminated; otherwise, O = O + 1, return to step 5 and step 6;
(8) end

Results Analysis.
We combined the eighth cluster and the third cluster because they reflect the same theme. The analysis of the seven influencing factors of the technological innovation of enterprises is as follows.
Cluster 1: manufacturing capability. The main feature words of this cluster focus on new products, new processes, new materials, process technology, equipment level, etc. The content of these feature words is related to the manufacturing capabilities of products and processes. The influence of manufacturing capability on the technological innovation of enterprises is mainly reflected in the capability to transform research and development results into manufacturing. The word "equipment level" reflects advanced manufacturing equipment, the phrases "Construction technique, Process technology, Technical process, High-tech" reflect the topic of process design capability, and the term "quality control" reflects the content of product quality management. "Internet application, Information technology, Industrialization" reflect the theme of product innovation activities. Therefore, the cluster's words with high feature weights reflect that manufacturing capability is essential for an enterprise's technological innovation.

Cluster 2: innovation resources. The words "engineer," "senior engineer," and "R&D expenditure" in this cluster have a high proportion of feature weight. The words "Engineer, Senior engineer, Senior expert, Employees number, Bachelor degree or above" reflect the quantity and quality of the R&D staff. The words "R&D expenditure, R&D, Total assets, Expenditure on science and technology activities, Main business proportion" reflect the financial investment in R&D. The word "Equipment original value" reflects the equipment investment in R&D. The phrases "Enterprise scale, Asset-liability ratio, Ownership structure, Technology introduction" represent the enterprise's non-R&D investment, including the enterprise's own capability. The investment of innovation resources mainly refers to the quantity and quality of an enterprise's investment in technological innovation resources. It is reflected in the investment of staff, funds, and equipment in R&D. The investment of innovation resources is one of the influencing factors of enterprise technological innovation.

Cluster 3: mechanism innovation. The top feature words of this cluster directly reflect that the topic is mainly about mechanism innovation, which is an essential factor that affects enterprises' technological innovation. The feature words such as "Rewards system, Performance review, Post-doctor, Excellent talents, Incentive mechanism" indicate that enterprises attach importance to the incentive mechanism of personnel. The words "Organization and implementation, Operating mechanism, Organizational construction, Management system, Organizational structure" reflect the influence of the organizational management mechanism on the technological innovation of enterprises. An effective innovation mechanism can stimulate talents and promote effective cooperation. The eighth group's theme is the same as that of cluster 3, and its sample proportion is only 4.3%. Hence, we merge the contents of cluster 3 and cluster 8.

Cluster 4: innovation output. The feature words in this cluster, such as "Patent, Industry standard, Gross profit on sale, Main business product sales revenue," have higher feature weight. It indicates that this cluster mainly reflects the topic of innovation output. The words "Patent, Industry standard, Science and technology progress award, Method number, Number of patent applications, Number of technology development projects, Number of new product development projects, Software copyright, Utility model, Design patent" reflect the technological output of enterprise's innovation. The words "Gross profit on sale, Main business product sales revenue, Industrial output, Industrial added value" reflect the innovation benefits. Therefore, from the distribution of feature words, the innovation output is also an important factor that affects the technological innovation of an enterprise.

Cluster 5: market innovation. The words "Competitive advantage, Social benefit, Market competitiveness" have high feature value in this cluster. The keywords tend to reflect market innovation, which means that product sales and promotion innovations can meet market demands. With the continuous expansion of market scale and the increase of market demand, the market-oriented product sales innovation model has become one of the important factors affecting the development of enterprise technological innovation.

Cluster 6: protection measures. "Intellectual property, Intellectual property protection, Intellectual property management, Independent intellectual property" are high feature value words in this cluster, reflecting the content of intellectual property protection measures. Technical knowledge protection can promote technology diffusion and help attract foreign capital and technology introduction. Therefore, the protection measures for intellectual property and technology are conducive to promoting the technological innovation of enterprises and are an important influencing factor.

Cluster 7: innovation strategy. This cluster contains the largest proportion of samples. The keywords "Industry-University-Research Cooperation, Industry-University-Research, Internal and external resources, Integration, Resource Integration" reflect the topic of innovation strategy. It refers to the integration and arrangement of internal and external innovation resources and technologies based on the enterprise's overall strategy and operation. The phrases "Industry-University-Research Cooperation, Industry-University-Research, Research institutes, Colleges and universities, R&D team" represent the content of the joint innovation strategy of enterprises with industry, universities, and research institutes. The phrases "Internal and external resources, Resource Integration" reflect the integration and allocation of internal and external resources. The terms "Technical cooperation, Technology Exchange, Technology fusion" reflect the strategic plan of enterprises for technology integration and innovation. The phrases "Core competence, Strategic planning, Overall planning, Leader" represent the innovation capability and strategy of the enterprise leader or decision-maker. Therefore, the topic of this cluster is innovation strategy, and the correctness of the innovation strategy also has an important impact on enterprise technological innovation.

Figure 8 shows the framework of the influencing factors for enterprises' technological innovation. There are 7 types of influencing factors: manufacturing capability, innovation resources, mechanism innovation, innovation output, market innovation, protection measures, and innovation strategy. Through analysis of the linking feature words of each cluster, the meso-level concept can be concluded, as shown in Figure 8.

Policy Suggestions.
It can be seen from the framework of influencing factors of enterprise's technological innovation that four clusters are particularly prominent: "Protection measures," "Innovation strategy," "Market innovation," and "Mechanism innovation." The remaining three, "Manufacturing capability," "Innovation resources," and "Innovation output," are already included in most existing research. The four prominent aspects are relatively new in the field of enterprise's technological innovation. Thus, the paper provides policy suggestions from these four aspects for the enterprise's technological innovation. The innovation capability of intellectual property is an important aspect in measuring the innovation output of enterprises. In the current fiercely competitive environment, enterprises are rushing to develop scientific and technological achievements through various channels for survival and development, and they seek legal protection by applying for patents. However, it is not uncommon for enterprises to suffer heavy losses because their competitors registered the patents and trademarks in advance. Therefore, enterprises should strengthen their awareness of intellectual property protection, set up dedicated intellectual property departments, and pay attention to cultivating professional intellectual property talent. The innovation strategy is a plan and methodology for enterprises to develop new products and services in the future. It aligns the development of innovations with future corporate goals and must be formulated based on the external environment and internal conditions. Thus, the innovation strategy directly reflects leadership decision-making. A weak sense of innovation or a lack of innovative decision-making power among senior decision-makers will bring risks to the enterprise. Enterprises should plan and lay out innovative strategic cooperation in advance. 
In addition, strengthening the collaboration between schools and industry and establishing strategic partnerships is helpful for the enterprise's technological innovation.
Market innovation is mainly reflected in marketing and management capabilities. From the formation of initial products to their market introduction, the whole process is inseparable from professional marketers and marketing strategies. Therefore, to strengthen market development, enterprises need to expand sales channels, explore new sales models, and improve their sales platforms. Moreover, enterprises should build their own sales teams, introduce professional marketing personnel, and strengthen the training and management of marketing staff.
Enterprises should introduce innovative management talent and improve the enterprise innovation mechanism. They need to construct a suitable organizational framework and formulate an innovation management system, including a talent introduction mechanism, talent training mechanism, innovation incentive system, reasonable innovation evaluation system, innovation achievements protection mechanism, etc. Improving the incentive mechanism, including interest incentives, competence incentives, power incentives, and responsibility incentives, can mobilize the enthusiasm of corporate managers and employees. Interest incentives include salary, welfare, bonus, etc. Competence incentives include training, competitive employment, etc. Power incentives mainly refer to promotion, and responsibility incentives mainly refer to a reasonable assessment system. Mobilizing the enthusiasm and creativity of employees can effectively enhance the enterprise's technological innovation capabilities.

Implication for Theory and Practice.
From a theoretical perspective, this paper enriches and deepens the research on evaluating enterprise's technological innovation capability. The paper first extracts prominent factors that affect enterprise technological innovation based on the collected textual data. It provides a reference for constructing an evaluation index system for the enterprise's technological innovation. Moreover, the process of enterprise's technological innovation is dynamic and continuous. Traditional assessment methods for enterprise technology innovation lack automatic processing ability and cannot meet the needs of large-scale, high-quality, and in-depth knowledge acquisition. As a high-tech knowledge processing technology, the text mining method has significant advantages in the intelligent analysis of enterprise technological innovation capabilities. It can reflect the objective level of enterprise's technological innovation in a data-driven manner.
This paper proposes a method for discovering enterprise technological innovation knowledge based on semantic mining technology. By revealing the inherent complex associations in an enterprise's textual data and extracting valuable patterns and knowledge, it helps to explore the potential factors influencing enterprise technological innovation. To our knowledge, this research is the first attempt to apply the semantic text clustering algorithm to enterprise's technological innovation. In addition, this paper analyzed the development status and trends of the enterprise's technological innovation capabilities based on the textual data collected from 400 enterprises in Beijing. Such data-driven intelligent analysis helps to promote the technological innovation of enterprises.
Thus, the proposed model can potentially help drive and facilitate the enterprise's technological innovation capability.

Conclusions
This paper utilizes a data-driven intelligent mining method based on a semantic conceptual model to analyze the influencing factors of enterprise technology innovation. Some scholars hold that evaluation systems for technological innovation should be constructed through questionnaire design and survey analysis. However, this approach inevitably involves human factors and the subjective judgments of experts. This study proposed a systematic process for evaluating the enterprise's technological innovation based on large-scale textual data. Computer-based information processing is used to analyze, from objective data, the evaluation factors and indicators that affect enterprise technology innovation. Furthermore, the traditional text clustering algorithm based on the Bag-of-words model ignores the semantic relationship between concepts, resulting in an unsatisfactory clustering effect. Therefore, this paper proposed an improved semantic concept-based clustering algorithm to analyze the textual data collected from enterprises. The performance of the improved K-means clustering method based on semantic similarity and relatedness is superior to that of the traditional K-means clustering method. The proposed method clustered the critical factors in the field of enterprise technological innovation, and the analysis of the experimental results yielded the seven key factors that affect the technological innovation of enterprises.
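
The limitation of the Bag-of-words model noted above can be illustrated with a minimal Python sketch. It is not the algorithm used in this paper; the concept lexicon `CONCEPTS`, the helper names, and the two toy documents are hypothetical and are not taken from the study's data or its semantic concept model. The sketch only shows how mapping terms to shared concepts can raise the similarity between documents that use different but related words.

```python
import math
from collections import Counter

# Hypothetical concept lexicon (an assumption for illustration only):
# maps a surface term to a shared concept label.
CONCEPTS = {
    "patent": "intellectual_property",
    "trademark": "intellectual_property",
    "copyright": "intellectual_property",
}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bag_of_words(tokens):
    """Plain term counts: related terms stay separate dimensions."""
    return Counter(tokens)


def concept_vector(tokens):
    """Counts after replacing each term by its concept, if one is known."""
    return Counter(CONCEPTS.get(t, t) for t in tokens)


# Two toy documents that share only one literal term.
doc_a = ["patent", "protection"]
doc_b = ["trademark", "protection"]

bow_sim = cosine(bag_of_words(doc_a), bag_of_words(doc_b))
concept_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))

print(f"bag-of-words similarity: {bow_sim:.2f}")
print(f"concept-based similarity: {concept_sim:.2f}")
```

In this toy case, the Bag-of-words vectors share only the term "protection", whereas the concept-mapped vectors coincide, so the concept-based similarity is higher. A similarity of this kind could, in principle, replace the plain term-based distance inside a K-means procedure.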

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this paper.