Research on Enterprise Competitive Information Management Based on Co-Occurrence Analysis

The shrinking supply of link data has seriously hampered co-link-based research on enterprise competitive information in recent years, so exploring the application of a new method, URL co-occurrence analysis, is important for making up for this inadequacy. It is also theoretically meaningful to examine whether the collected data need Jaccard processing and how data processed in different ways perform under different analysis methods. Using representative Chinese banks as examples, this thesis compares non-Jaccard data and Jaccard data under URL co-occurrence analysis, conducts competitive information research with the multidimensional scaling method and the social network analysis method, and explores the results from multiple perspectives, including gradient grading analysis, identification of main competitors, and fine-grained competitive landscape analysis. The research effectively makes up for the inadequacy caused by the lack of link data sources and opens new space for competitive information research on enterprises. It also finds that, in multidimensional scaling analysis, results based on non-Jaccard data are far better than results based on Jaccard data, whereas in social network analysis, results based on non-Jaccard data and results based on Jaccard data each uncover competitive information about the related enterprises. These conclusions are all verified and analyzed with example data.


Introduction
Web link analysis is one of the most widely applied and deeply researched methods in webometrics; it was initially used to construct the web impact factor [1,2], and many related concepts and techniques were developed later. The main purpose in every case is to uncover information of different levels and connotations from web link data, and in-link analysis and co-link analysis are two practical web link analysis methods. The co-link analysis method was first proposed by Larson [3], and it has been applied to uncover relations among organizations of different types, such as academic relations [4], political relations [5], and commercial relations [6]. Vaughan [7][8][9] was the representative figure in applying co-link analysis to competitive information research on enterprises and completed a large body of work from theory to demonstration; domestically, Zhou and Zhou [10] and Jin [11] selected enterprises of different types for study, and all of them verified the important role of co-link analysis in competitive information research on enterprises.
However, in recent years, usable link data sources have gradually dwindled: search engines used previously, such as AltaVista, AlltheWeb, and Yahoo, have either disappeared or dropped support for link search, and Google returns only sample data for link searches, so it cannot be used for link data collection. This greatly hampers further development of co-link-based competitive information research on enterprises [1,12].
Scholars have actively explored ways to solve the shortage of data sources for link analysis. The main line of research is to seek a replacement data object for link data, and there are mainly two schemes. The first scheme uses network keywords in place of network links. Vaughan and You proposed the concept of network co-word analysis [13] and collected data on telecom enterprises for verification; they found that the network co-word method could also be used to draw business competition maps. Domestically, Zhang and Yu [14] used multiple data sources to compare co-link analysis and network co-word analysis and found that network co-word analysis can also uncover important enterprise competitive information. The second scheme uses URL reference data in place of network links. A URL reference is structurally similar to a hyperlink: a URL reference from website A to website B means that web pages in website A contain the URL of website B, without necessarily linking to website B [1,12]. Yue and Fang [15] compared network co-word analysis and URL reference and pointed out that network co-word analysis may suffer from "multiple words for a single meaning" and "multiple meanings for a single word," whereas URL reference elements have clear directivity and uniqueness; thus, enterprise competitive information research based on URL co-occurrence analysis may have wider application prospects and more room for innovation.
At present, there are a few results [1,12,16,17] on enterprise competitive information research based on URL co-occurrence analysis. However, studies that take the representative banks of China as an example, compare non-Jaccard and Jaccard data, and investigate enterprise competitive information with a variety of methods and from multiple perspectives are rare [18]. Whether the collected data need Jaccard processing, and how data processed in different ways perform under different analysis methods, are questions of great significance to the theoretical improvement of enterprise competitive information research.

Selection of the Research Object and Data Collection.
According to the relevant information in several authoritative "bank rankings" [19][20][21][22], and taking into account the representativeness of the categories, we selected 30 banks from the top Chinese banks in these ranking lists as the research sample. Their number, name, and URL are shown in Table 1. URL co-occurrence frequency refers to how often the URLs of website A and website B co-occur in web pages of other websites. Taking Industrial and Commercial Bank of China and China Construction Bank as an example, the search strategy for the URL co-occurrence of two websites is "http://www.icbc.com.cn http://www.ccb.com -site:http://www.icbc.com.cn -site: http://www.ccb.com". Bing has been the only international search engine to provide an automatic offline data source for webometrics since May 2011, and Professor Thelwall developed Webometric Analyst based on the Bing API, which supported data collection in this thesis [1,23].
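The search strategy above can be generated for every pair of banks in the sample. The following is a minimal sketch, not the collection tool used in the study: the function names and the short bank keys are hypothetical, and only the query pattern is taken from the text.

```python
from itertools import combinations


def cooccurrence_query(url_a: str, url_b: str) -> str:
    """Build a query for pages that mention both URLs but belong to neither site."""
    return f"{url_a} {url_b} -site:{url_a} -site:{url_b}"


def all_pair_queries(urls: dict) -> dict:
    """Return one query per unordered pair of bank keys (hypothetical helper)."""
    return {
        (a, b): cooccurrence_query(urls[a], urls[b])
        for a, b in combinations(sorted(urls), 2)
    }


# Illustrative keys; the numbering used in Table 1 is not reproduced here.
banks = {
    "ICBC": "http://www.icbc.com.cn",
    "CCB": "http://www.ccb.com",
}
queries = all_pair_queries(banks)
print(queries[("CCB", "ICBC")])
```

For a sample of 30 banks this yields 30 × 29 / 2 = 435 pairwise queries, one per cell above the diagonal of the co-occurrence matrix.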

Data Processing.
The Jaccard index method, also known as the Jaccard similarity coefficient, was proposed by the Swiss botanist Paul Jaccard in 1901 to compare the similarity of sample sets. It is defined as the size of the intersection of two sample sets divided by the size of their union. The purpose of processing the original hit frequency data with the Jaccard method is to eliminate the influence of the size of each object so that the analysis results reflect only the similarity between objects. Jaccard data are the original data after standardization by the Jaccard method, which removes the influence of scale to a certain extent. Non-Jaccard data are the original data, not standardized by the Jaccard method. However, recent studies [1,12] show that object size also carries important information about enterprise competition and is an important indicator of competitive position, and that it is feasible and effective to use URL co-occurrence frequency data directly as the link weight between nodes in competitive information research. Therefore, this paper studies both Jaccard data and non-Jaccard data.
In order to show the analysis results more comprehensively, this paper intends to use the original hit frequency data and Jaccard data to analyze the competitive spectrum and explore the differences and connections between them.
The Jaccard method is defined as the quotient of the intersection and the union of two sample sets, and the formula is as follows:

$$J_{ij} = \frac{C_{ij}}{C_{ii} + C_{jj} - C_{ij}}$$

where $C_{ii}$ and $C_{jj}$ are, respectively, the hit counts for searching objects i and j and $C_{ij}$ is the simultaneous hit count for searching objects i and j [14,24,25]. Then, the collected original data can be arranged into a URL co-occurrence original data matrix and a co-occurrence data matrix after Jaccard processing, and related tools and methods are used to conduct multidimensional scaling analysis and social network analysis on the two kinds of matrix. The matrix formed from the URL co-occurrence original data is shown in Table 2.
For the analysis results of the competition map, this paper intends to verify the correctness of the competition pattern analysis in two ways. The first is to use publicly available reports and data and, building on previous research that used a blog search engine to obtain the key event path [14], to further use social information search engines such as WeChat and microblog to obtain key information. The second is to invite experts from banks to evaluate and judge the analysis results [12].
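The Jaccard processing of the raw co-occurrence matrix can be sketched as follows. The sketch assumes that the union of the two hit sets is counted as C_ii + C_jj − C_ij, with the single-object hit counts stored on the diagonal of the raw matrix; the function name and the toy numbers are illustrative and are not data from the study.

```python
def jaccard_matrix(raw):
    """Convert raw hit counts into Jaccard coefficients.

    raw[i][i] holds C_ii (hits for object i alone);
    raw[i][j] with i != j holds C_ij (simultaneous hits for i and j).
    """
    n = len(raw)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                out[i][j] = 1.0  # an object is identical to itself
                continue
            union = raw[i][i] + raw[j][j] - raw[i][j]
            out[i][j] = raw[i][j] / union if union > 0 else 0.0
    return out


# Toy counts for three objects (made up for illustration).
raw = [
    [100, 20, 5],
    [20, 50, 10],
    [5, 10, 40],
]
jac = jaccard_matrix(raw)
print(round(jac[0][1], 4), round(jac[1][2], 4))
```

In the toy matrix, the first object has the largest raw co-occurrence with the second (20 against 10 for the second and third), but after normalization the two pairs are much closer because the first object's own hit count is large. This is the size effect that the Jaccard method removes.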

Multidimensional Scaling Analysis.
The multidimensional scaling method and the social network analysis method are commonly used on enterprise competitive information data. Multidimensional scaling (abbreviated as MDS) is a statistical analysis method that uses the similarity data between objects to reveal the spatial relationship between them. This method has a strong visualization function and is the best choice for obtaining the overall competitive map when the sample size is small, so it is favored by many researchers. Multidimensional scaling, also known as "similarity structure analysis," is one of the methods of multivariate analysis. It is a common method of statistical empirical analysis in sociology, quantitative psychology, marketing, and so on. The multidimensional scaling method is a data analysis method that reduces the research objects (samples or variables) from a multidimensional space to a low-dimensional space for positioning, analysis, and classification, while preserving the original relationships between objects [14,26].
This paper also uses the MDS method to analyze the data, and the analysis tool is SPSS 16.0. For convenience of description, each bank name is represented by V + number, where the number is the bank's number in Table 1. Figure 1 is the MDS figure obtained in this research based on the Bing URL co-occurrence original data. The stress value is 0.07327, which is less than 0.1 and close to 0.05, and the dispersion accounted for (D.A.F.) value is 0.92673, which is close to 1; this indicates that the MDS figure fits the actual data well.
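The two fit values reported above sum to 1 (0.07327 + 0.92673), which is consistent with the dispersion accounted for being one minus a normalized stress. The sketch below illustrates that relationship under stated assumptions: it computes a normalized raw stress directly from a dissimilarity matrix and a set of 2D coordinates, and it does not reproduce the optimal transformation of proximities that a statistics package may apply before computing stress. The function names and the toy configuration are hypothetical.

```python
import math


def normalized_raw_stress(dissim, coords):
    """Sum of squared (dissimilarity - distance) over pairs, divided by the
    sum of squared dissimilarities. Assumes dissimilarities are already on
    the same scale as the map distances."""
    n = len(coords)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            num += (dissim[i][j] - d) ** 2
            den += dissim[i][j] ** 2
    return num / den if den > 0 else 0.0


def dispersion_accounted_for(dissim, coords):
    """D.A.F. taken as one minus the normalized raw stress (assumption)."""
    return 1.0 - normalized_raw_stress(dissim, coords)


# Toy configuration: a 3-4-5 right triangle, with one dissimilarity off by 1.
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
dissim = [
    [0.0, 4.0, 4.0],
    [4.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
stress = normalized_raw_stress(dissim, coords)
daf = dispersion_accounted_for(dissim, coords)
print(round(stress, 5), round(daf, 5))
```

A configuration whose distances match the dissimilarities exactly gives a stress of 0 and a D.A.F. of 1, so values near those limits indicate a good fit.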

Analysis Based on Non-Jaccard Data.
It can be found from Figure 1

Analysis Based on Jaccard Data.
First, Jaccard processing is applied to the Bing URL co-occurrence original data, and Figure 2 is the MDS figure obtained from the Jaccard data. The stress value is 0.11374, which is close to 0.1 and less than 0.2, and the dispersion accounted for (D.A.F.) value is 0.88626, indicating that the fit is within an acceptable range.
It can be found that the effect of Figure 2(a) is worse than that of Figure 1 Bank (V19) among joint-stock commercial banks are scattered at the edge of the map; thus, the overall clustering effect is not ideal. With respect to whether banks are listed, Figure 2(b) is similar to Figure 1(b), and the listed banks are mostly in relatively central positions of the map; although Figure 2(b) also displays a cluster of listed banks and a cluster of unlisted banks, the effect is clearly worse than in Figure 1(b).

Social Network Analysis
Analysis Based on Non-Jaccard Data.
In Figure 3, the thickness of a line indicates the strength of the association between bank nodes. When the co-occurrence strength is greater than 30699, the URL co-occurrence network is shown in Figure 3. On the one hand, it shows that the competitive position of these banks is relatively high and that their competitive positions are relatively close, which is consistent with the total assets, net profit, and total profit of listed banks in the 2014 semiannual report. The reported capital adequacy ratio data [27] are also relatively consistent. When the co-occurrence strength is greater than 30199, the URL co-occurrence network is as shown in Figure 3(b). At this point, apart from the Postal Savings Bank, the banks in the network are exactly the listed banks among the 30 banks, which indicates that these listed banks have a higher competitive position in the market. These banks are mainly state-owned commercial banks, joint-stock commercial banks, and two city commercial banks. Figure 3(c) shows the full URL co-occurrence network. Shengjing Bank, Hengfeng Bank, Chengdu Bank, and the three rural commercial banks are relatively weak and sit at the edge of the network.
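The thresholded networks described above can be derived from a co-occurrence matrix by keeping only the pairs whose strength exceeds a cutoff. The following is a minimal sketch with hypothetical function names and made-up toy weights; it is not the tool used to draw the figures.

```python
def edges_above(matrix, labels, threshold):
    """Return (label_i, label_j, weight) for each pair whose weight exceeds
    the threshold, strongest first."""
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                edges.append((labels[i], labels[j], matrix[i][j]))
    return sorted(edges, key=lambda e: -e[2])


def nodes_in(edges):
    """Return the sorted labels of all nodes that still have an edge."""
    return sorted({name for a, b, _ in edges for name in (a, b)})


# Toy symmetric matrix for four nodes (illustrative values only).
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 50, 30, 5],
    [50, 0, 20, 2],
    [30, 20, 0, 1],
    [5, 2, 1, 0],
]
strong = edges_above(matrix, labels, 25)
print(strong)
print(nodes_in(strong))
```

Lowering the threshold step by step adds weaker edges and brings more peripheral nodes into the network, which is how the successive panels of such a figure differ from one another.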

Analysis Based on Jaccard Data.
Line weight also indicates the degree of association among bank nodes in Figure 4. Figure 4(a) shows that Huaxia Bank and Bank of Beijing have the strongest competitive association and are the most similar, and many figures in the 2014 semiannual report [27] of listed banks support this conclusion. For example, the total assets of Huaxia Bank were 1.78856 trillion yuan and the total assets of Bank of Beijing were 1.475606 trillion yuan; the total asset scale of these two banks was the closest among listed banks, and Huaxia Bank ranked just ahead of Bank of Beijing. The net profit of Huaxia Bank was 8.67 billion yuan, and the net profit of Bank of Beijing was 8.856 billion yuan; the net profit of the two banks was the closest among listed banks, and Bank of Beijing ranked just ahead of Huaxia Bank. However, the year-on-year growth rate of net profit of Huaxia Bank ranked at the top among all listed banks. The capital adequacy ratio of Huaxia Bank was 9.95%, and the capital adequacy ratio of Bank of Beijing was 10.38%; the capital adequacy ratio of the two banks was the closest among listed banks, and Bank of Beijing ranked just ahead of Huaxia Bank. The core capital adequacy ratio of Huaxia Bank was 8.20%, and the core capital adequacy ratio of Bank of Beijing was 8.48%; the core capital adequacy ratio of the two banks was the closest among listed banks, and Bank of Beijing ranked just ahead of Huaxia Bank. The earnings per share of Huaxia Bank were 0.97 yuan, and the earnings per share of Bank of Beijing were 0.84 yuan; the earnings per share of the two banks were relatively close among listed banks, and Huaxia Bank was two places ahead of Bank of Beijing. The year-on-year growth rate of earnings per share of Huaxia Bank ranked first among all listed joint-stock commercial banks. The nonperforming loan ratio of Huaxia Bank was 0.93%, and the nonperforming loan ratio of Bank of Beijing was 0.68%.
Among the listed banks, the nonperforming loan ratios of the two banks are close and are at the best level of all the listed banks. As the correlation strength threshold is lowered from greater than 395 to greater than 272, the added nodes are all banks that have a competitive correlation with Huaxia Bank and Bank of Beijing, mainly joint-stock commercial banks and city commercial banks. As the threshold is lowered from greater than 272 to greater than 210, the added competitive correlations mainly exist between the banks that were already correlated with Huaxia Bank and Bank of Beijing in the previous stage. The full URL co-occurrence network after Jaccard processing is shown in Figure 4(d). Ping An Bank; they have many edges, their link weight sum is relatively large, and they are closely related to the state-owned commercial banks area; thus, they occupy important positions in the competitive landscape. The urban commercial banks area is composed of nine bank nodes such as Bank of Beijing, Bank of Ningbo, Shengjing Bank, Huishang Bank, Bank of Jiangsu, Bank of Chengdu, and Bank of Shanghai; they have few edges, their link weight sum is relatively small, and they have some relation to the state-owned commercial banks; thus, they are in a secondary position in the competitive landscape. The rural commercial banks area is composed of three banks, Chengdu Rural Commercial Bank, Chongqing Rural Commercial Bank, and Guangzhou Rural Commercial Bank, and they are at the edge of the competitive landscape.
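The two node-level quantities used in this description, the number of edges of a node and the sum of its link weights, can be computed directly from the weight matrix. The sketch below is illustrative only: the function name, the optional threshold parameter, and the toy weights are assumptions and do not come from the study.

```python
def degree_and_strength(matrix, labels, threshold=0.0):
    """For each node, count edges with weight above the threshold (degree)
    and sum the weights of those edges (link weight sum)."""
    result = {}
    for i, name in enumerate(labels):
        weights = [w for j, w in enumerate(matrix[i]) if j != i and w > threshold]
        result[name] = (len(weights), sum(weights))
    return result


# Toy symmetric weight matrix for four nodes (illustrative values only).
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 50, 30, 5],
    [50, 0, 20, 2],
    [30, 20, 0, 1],
    [5, 2, 1, 0],
]
stats = degree_and_strength(matrix, labels, threshold=10)
for name, (degree, strength) in stats.items():
    print(name, degree, strength)
```

Nodes with many edges and a large weight sum correspond to the central positions described above, while a node such as D in the toy data, which loses all its edges at the threshold, corresponds to an edge position.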

Comprehensive Exploration and Analysis
It can be seen that the competition pattern of China's representative banks has roughly formed a multilevel gradient structure. At present, the first gradient consists mainly of state-owned commercial banks with strong capital and strength, and there is fierce competition among them. Among them, ICBC and CCB are the core, and they are also each other's main competitors. The five state-owned commercial banks are at the forefront of all listed banks in terms of total assets, net profit, and capital adequacy ratio, and their rankings are close to each other. In terms of total assets and net profit, ICBC ranked first and CCB ranked second; in terms of capital adequacy ratio, CCB ranked first and ICBC ranked second. The second gradient consists mainly of joint-stock commercial banks, which are not as strong as the first-gradient state-owned commercial banks, but their products and services are still highly competitive, and they have the potential to grow into the first gradient. While the joint-stock commercial banks compete with each other, they also pose a certain threat to the first-gradient state-owned commercial banks. Among the joint-stock commercial banks, China Merchants Bank, Industrial Bank, China CITIC Bank, and Ping An Bank are the most prominent. The third gradient consists mainly of urban commercial banks, which closely follow the second gradient. They compete mainly with joint-stock commercial banks. Their scale is relatively small, and some of the stronger ones are growing faster.

Identification of Major Competitors.
The co-occurrence values of different banks in the URL co-occurrence matrix and the weights of the connecting edges in the social network competition map, that is, the thickness of the edges, reflect the degree of competition to a certain extent. Therefore, the main competitors of a specific object can be identified more clearly from these data and maps. For example, the main competitor of CCB is ICBC; the main competitors of the Postal Savings Bank are CCB, Bank of Communications, ICBC, and China Merchants Bank; and the main competitor of Huaxia Bank is Bank of Beijing. The 2014 semiannual report [27] of listed banks shows that ICBC ranks ahead of CCB in terms of total assets, net profit, core capital adequacy ratio, and so on; ICBC ranks in the top two in terms of the year-on-year growth rate of total assets and in the bottom two in terms of nonperforming loan ratio. In terms of capital adequacy ratio and net profit growth rate, CCB is slightly better than ICBC, with the two ranking first and second, respectively. Although the total assets and net profit of China Construction Bank, Bank of Communications, Industrial and Commercial Bank of China, China Merchants Bank, and other banks are slightly higher than or close to those of the Postal Savings Bank, the Postal Savings Bank network covers China's rural areas, with deposits of nearly 800 billion US dollars. The Postal Savings Bank has nearly 40,000 branches in China, and its network size exceeds that of CCB, Bank of Communications, ICBC, China Merchants Bank, and other banks [28]. In terms of net profit, capital adequacy ratio, and core capital adequacy ratio, Bank of Beijing ranks one place ahead of Huaxia Bank. In terms of total assets and earnings per share, Huaxia Bank ranks one place and two places ahead of Bank of Beijing, respectively.
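Reading main competitors off the co-occurrence matrix amounts to ranking, for each bank, the other banks by co-occurrence value. The following is a minimal sketch under that assumption; the function name, the tie-breaking rule, and the toy values are hypothetical and are not the study's data.

```python
def main_competitors(matrix, labels, top_k=1):
    """For each node, return the top_k other nodes with the highest
    co-occurrence value (ties broken alphabetically by label)."""
    ranking = {}
    for i, name in enumerate(labels):
        others = [(matrix[i][j], labels[j]) for j in range(len(labels)) if j != i]
        others.sort(key=lambda t: (-t[0], t[1]))
        ranking[name] = [label for _, label in others[:top_k]]
    return ranking


# Toy symmetric co-occurrence matrix (illustrative values only).
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 50, 30, 5],
    [50, 0, 20, 2],
    [30, 20, 0, 1],
    [5, 2, 1, 0],
]
top1 = main_competitors(matrix, labels)
top2 = main_competitors(matrix, labels, top_k=2)
print(top1)
print(top2["A"])
```

Setting top_k above 1 corresponds to cases such as the Postal Savings Bank above, where several banks are named as main competitors rather than one.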

Analysis of Fine-Grained Competition Pattern.
Because there are many joint-stock commercial banks, the joint-stock commercial bank area can be analyzed further to clarify its internal pattern. MDS analysis and social network analysis based on the original data are taken as examples, as shown in Figures 5 and 6. Figure 5 is the MDS diagram of the joint-stock commercial banks obtained from the Bing URL original co-occurrence data. The stress value is 0.03752, which is smaller than 0.05 and close to 0.025, while the D.A.F. value is 0.96248, which approaches 1. This indicates that the fit of the MDS diagram approaches the optimum.
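Restricting the analysis to one group of banks means extracting the rows and columns of that group from the full co-occurrence matrix before rerunning MDS or network analysis. A minimal sketch follows; the function name and the toy labels and values are hypothetical.

```python
def submatrix(matrix, labels, keep):
    """Return the rows and columns of matrix for the labels listed in keep,
    in the order given by keep."""
    idx = [labels.index(name) for name in keep]
    return [[matrix[i][j] for j in idx] for i in idx]


# Toy symmetric co-occurrence matrix (illustrative values only).
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 50, 30, 5],
    [50, 0, 20, 2],
    [30, 20, 0, 1],
    [5, 2, 1, 0],
]
group = ["B", "C", "D"]
sub = submatrix(matrix, labels, group)
print(sub)
```

The extracted matrix keeps the original pairwise values, so any difference in the resulting map comes only from dropping the banks outside the group.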
As can be seen from Figures 5 and 6, there are also strong differences within the joint-stock commercial bank cluster. Ping An Bank, Shanghai Pudong Development Bank, Everbright Bank, Minsheng Bank, Huaxia Bank, China CITIC Bank, China Merchants Bank, and Industrial Bank are in the core position and are relatively concentrated, indicating that these banks are the stronger ones among the joint-stock commercial banks. The differences between them are not particularly large, and there is more competition among them. Zheshang Bank, Hengfeng Bank, Bohai Bank, and other banks are in a relatively marginal position. The graph can also clearly show the main competitors and

Conclusion
Based on the new method of URL co-occurrence analysis, this paper takes the representative banks of China as the research object and uses a variety of methods to carry out innovative research on enterprise competitive information from multiple perspectives, which effectively makes up for the deficiency caused by the lack of link data sources and opens up new space for research on enterprise competitive information. It is also found that the analysis results based on non-Jaccard data are much better than those based on Jaccard data in multidimensional scaling analysis; in social network analysis, the results based on non-Jaccard data and those based on Jaccard data reveal the competitive information of related enterprises from different aspects. The above conclusions are verified and analyzed with the case data.

Data Availability
The data used to support the findings of this study are included within the article.