Forecasting the Development of Self-Driving Technology in China by Multidimensional Information



Introduction
The road is the artery of the economy, along which people, goods, and materials are transported in a timely manner. Efficient road traffic is the foundation of social functioning [1,2]. However, road traffic problems are worsening and have begun to harm the sustainable development of the economy and society. Traffic jams and traffic collisions are increasingly frequent, resulting in large numbers of casualties and property losses [3]. Another issue is air pollution from automobile exhaust, which contributes to greenhouse gas emissions and environmental pollution [4]. Self-driving technology has been shown to be an alternative that may alleviate these problems [5,6]. As a result, several countries, including China, have made strategic plans for self-driving technology. The "Made in China 2025" plan was put forward by the Chinese government on May 19th, 2015. In this plan, energy-efficient and new energy vehicles are among the ten key industries targeted for development [7]. The document clearly states that by 2025, China will master the complete technology of autonomous driving and the corresponding key technologies, and that an independent Research and Development (R&D) system will be established. In summary, self-driving technology will be one of China's core fields in the next few years. Unlike traditional technologies, self-driving technology is usually considered a disruptive technology [8]. Disruptive technologies are characterized by high R&D costs, high economic benefits, and rapid technology updates [9]. High cost means low fault tolerance, and rapid updates mean that a country or a company can easily fall behind. Therefore, identifying the future trends of self-driving technology accurately and early is crucial for the R&D strategic planning of governments and vehicle enterprises seeking a first-mover advantage in global competition.
Hence, forecasting the development trends of self-driving technology has attracted increasing attention in recent years. The feasibility of technology forecasting for self-driving technology should be discussed first, because certain conditions must be fulfilled before technology forecasting can be applied to a given technology [10,11]. In this paper, two assumptions are made about self-driving technology: (a) past data contain all the information necessary for the future, and (b) self-driving technology is a continuously growing technology area whose trend follows a certain pattern, and this pattern has persisted for a certain period of time. Under these assumptions, future technological changes can be identified from the historical data of the technology using appropriate technology forecasting methods.
Numerous methods have been proposed to reveal the information hidden in technology data [12][13][14][15][16][17][18][19]. Generally, technology forecasting methods can be divided into two categories according to their degree of dependence on data: qualitative methods and quantitative methods [12]. Qualitative methods are usually based on the experience and knowledge of experts, which makes them time-consuming and subjective [13]; furthermore, they are better suited to the early stage of a technology. With the advancement of computer technology and related numerical tools, increasing attention has been paid to quantitative methods [14][15][16][17][18][19]. Statistical tools [14][15][16] and text mining tools [17][18][19], which can handle massive raw data, are widely used in technology forecasting. Because text mining can find implicit, previously unknown, and potentially useful patterns in large collections of text documents [20], it has become the mainstream method of technology forecasting. However, text mining alone has difficulty eliminating irrelevant and meaningless information, a problem that expert knowledge can address effectively. To combine the advantages of both approaches, this paper develops a model that integrates text mining and expert knowledge to forecast technology trends efficiently while improving the reliability of the results.
Text documents are necessary for forecasting technology trends with both text mining and expert knowledge, and the quality of the forecasting results depends strongly on the type and quality of these documents. Patents have long been recognized as a crucial data source for the study of technology innovation [15] and technology forecasting [21,22]. However, patents alone cannot give a complete picture of technology development. Scientific papers are also achievements of basic research and the seeds of technology innovation [23], so a combination of scientific papers and patents reflects both the research level and the acceptance of a technology. In addition, industry information can reveal the market acceptance and potential of the technology, which also affect the trend and direction of its development. Therefore, this paper uses multidimensional information combining patents, papers, and industry data, which reveals the development status of the technology more comprehensively.
In this paper, a technology forecasting model is developed that integrates topic-based text mining and expert judgment approaches to forecast self-driving technology trends from scientific papers, patents, and industry data. The paper is structured as follows. The specific procedure of technology forecasting is provided in Section 2. In Section 3, self-driving technology in China from 2002 to 2019 is analyzed in detail, and the future trends of self-driving technology are predicted. The conclusions are presented in Section 4.

Methodology
The model is an extension of the framework of Li [24], in which topic-based text mining and expert judgment approaches were integrated to forecast technology trends from scientific papers and patents. In the present model, industry data are considered in addition to scientific papers and patents. Furthermore, an extra preconditioning stage (stage 1 in Figure 1) is added to make a preliminary assessment of the problem, which avoids unnecessary work. The framework of the model is illustrated in Figure 1, and its four stages are described as follows.

Stage 1: Determining the Feasibility and Necessity of Technology Forecasting.
In this paper, the Web of Science (WOS) and Derwent Innovations Index (DII) databases are adopted as the data sources for scientific papers and patents, and the industry data are obtained from iiMedia (an authoritative economic data agency in China) [25]. Different search queries related to the target technology (self-driving) are constructed, and their results are compared to determine the final search queries. The period from 2000 to 2019 is covered with a step of one year. To determine the feasibility and necessity of technology forecasting, growth curves of the patent, paper, and industry data are constructed and compared with the S-curve of the technology life cycle to identify the life cycle stage of the technology.

Stage 2: Clustering Topics.
The text content of patents and papers from DII and WOS is saved as files by year. To facilitate the analysis, the files are converted into XML format with CiteSpace [26]. Then, the Lingo algorithm in Carrot2 [27] is used to generate technology topics for the papers and patents.
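The per-year preparation of input files can be sketched in Python as below. This is not CiteSpace's converter and not the authors' pipeline; it is a minimal sketch that assumes each bibliographic record is a dictionary with hypothetical `year`, `title`, and `abstract` keys, and it writes each year's records in the `<searchresult>`/`<document>` XML layout accepted by Carrot2 3.x as document input. The element names should be checked against the Carrot2 version actually in use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_year(records):
    """Group bibliographic records by their (hypothetical) 'year' field."""
    by_year = defaultdict(list)
    for record in records:
        by_year[record["year"]].append(record)
    return dict(sorted(by_year.items()))


def to_carrot2_xml(records, query):
    """Serialize one year's records as a Carrot2 3.x style <searchresult> document."""
    root = ET.Element("searchresult")
    ET.SubElement(root, "query").text = query
    for i, record in enumerate(records):
        doc = ET.SubElement(root, "document", id=str(i))
        ET.SubElement(doc, "title").text = record["title"]
        # Title plus abstract (stored as <snippet>) is the text that gets clustered.
        ET.SubElement(doc, "snippet").text = record.get("abstract", "")
    return ET.tostring(root, encoding="unicode")


records = [  # toy records, not data from the study
    {"year": 2018, "title": "Lane keeping for driverless cars", "abstract": "..."},
    {"year": 2019, "title": "Decision making for autonomous vehicles", "abstract": "..."},
    {"year": 2019, "title": "Lidar-based perception", "abstract": "..."},
]
xml_by_year = {year: to_carrot2_xml(recs, "self-driving")
               for year, recs in group_by_year(records).items()}
```

Writing one file per year keeps each clustering run limited to a single year, which matches the year-by-year topic analysis described in Stage 3.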

Stage 3: Constructing the Hierarchical Structure and the Evolution Maps of the Technology.
Using the clustering results as objective evidence for decision-making, the topics are analyzed with the help of expert knowledge. Finally, the hierarchical structure of self-driving technology is constructed.

Stage 4: Forecasting the Development Trend of the Technology.
Using the technical topics extracted from papers and patents, a differential analysis of the high-growth technical topics is carried out between scientific papers and patents. The technology evolution paths derived from papers and patents are then compared to forecast the future development trend of the technology.

Data Collection and Preprocessing.
In this paper, the paper, patent, and industry data are obtained from Web of Science (WOS), Derwent Innovations Index (DII), and iiMedia, respectively. Different search queries related to self-driving are constructed with reference to Wikipedia and expert advice.
The results are compared in order to determine the final search queries. The query "TS = (self-driving* OR autonomous car* OR driverless car* OR robot car* OR autonomous vehicle* OR driverless vehicle* OR robot vehicle) AND CU = (China)" is used to search for scientific papers in WOS, and 2874 papers published from 2000 to 2019 were retrieved. The query "TS = (self-driving* OR autonomous car* OR driverless car* OR robot car* OR autonomous vehicle* OR driverless vehicle* OR robot vehicle) AND PN = (CN*)" was used to search for patent data in DII, and 30870 issued patents from 2000 to 2019 were retrieved. The search was conducted on August 19th, 2020.
Data cleaning is necessary for the retrieved papers and patents, so unrelated papers and patents are removed manually. The cleaned papers and patents are then converted into XML format with CiteSpace so that they are suitable for text mining with Carrot2.

Analysis from a Holistic Perspective.
The annual numbers of scientific papers and patents are shown on the left side of Figure 2. Generally, the numbers of scientific papers and patents grew rapidly over time. Exponential growth has been observed since 2012, showing that the research and development of self-driving technology have been very active in recent years. To quantify this growth, the corresponding growth rates of papers and patents are shown on the right side of Figure 2. In the early years, the growth rates oscillate because of the small numbers of papers and patents. The growth rates become stable from 2012 onward. The mean growth rates for papers and patents from 2012 to 2019 are 46.9% and 39.0%, respectively, indicating that the related fields are very active. The annual industry investment is presented in Figure 3. A significant increase in industry investment started in 2016, which can be attributed to national policy.
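The text does not state whether the mean growth rates of 46.9% and 39.0% are arithmetic means of the annual rates or compound rates, so the Python sketch below computes both. It assumes the year-over-year definition g_t = (n_t - n_{t-1}) / n_{t-1}; the function names and the yearly counts are hypothetical and are not the paper's data.

```python
def annual_growth_rates(counts):
    """Year-over-year growth rate g_t = (n_t - n_{t-1}) / n_{t-1}.

    `counts` maps year -> number of documents; years whose previous
    count is zero are skipped because the rate is undefined there.
    """
    years = sorted(counts)
    rates = {}
    for prev, cur in zip(years, years[1:]):
        if counts[prev] > 0:
            rates[cur] = (counts[cur] - counts[prev]) / counts[prev]
    return rates


def mean_growth_rate(rates, start, end):
    """Arithmetic mean of the annual rates whose year lies in [start, end]."""
    window = [r for year, r in rates.items() if start <= year <= end]
    return sum(window) / len(window)


def compound_growth_rate(counts, start, end):
    """Compound annual growth rate from `start` to `end` (an alternative summary)."""
    return (counts[end] / counts[start]) ** (1 / (end - start)) - 1


# Hypothetical yearly counts for illustration only (not the paper's data).
papers = {2016: 100, 2017: 150, 2018: 225, 2019: 270}
rates = annual_growth_rates(papers)  # 2017: 0.5, 2018: 0.5, 2019: about 0.2
print(round(mean_growth_rate(rates, 2017, 2019), 3))  # 0.4
print(round(compound_growth_rate(papers, 2016, 2019), 3))
```

The two summaries diverge when annual rates fluctuate, as they do for the early years in Figure 2, which is one reason to state which definition is used.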
The National Intelligent and Connected Vehicle (ICV) Testing Demonstration Base was approved by the Ministry of Industry and Information Technology of China in June 2015 [28], and the ICV base officially opened in June 2016. Therefore, a large amount of investment entered the market in 2016. In 2017, investment dropped because the field had entered a period of steady development, and investment then increased steadily in 2018 and 2019.
Journal of Advanced Transportation
The growth curve method fits the growth curve of a variable to a life cycle curve so that future performance can be estimated by extrapolation [29]. As shown in Figure 4, the development of a technology usually follows a slow-fast-slow pattern. The corresponding life cycle curve is S-shaped and can be divided into four stages: emerging, growth, maturity, and recession.
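The paper does not specify which S-curve or fitting procedure underlies Figure 4, so the following is an assumption: a logistic curve y(t) = K / (1 + exp(-r (t - t0))) fitted to cumulative counts. With the saturation level K fixed in advance by the analyst, the transform ln(K / y - 1) = -r t + r t0 is linear in t, and ordinary least squares gives the growth parameter r and the inflection year t0. All names here are hypothetical.

```python
import math


def logistic(t, K, r, t0):
    """Logistic S-curve y(t) = K / (1 + exp(-r (t - t0))) with ceiling K and inflection year t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def fit_logistic_fixed_ceiling(years, cumulative, K):
    """Estimate r and t0 for a logistic curve with a given ceiling K.

    Uses the linearization ln(K / y - 1) = -r * t + r * t0 and ordinary
    least squares; points with y <= 0 or y >= K are skipped. K is an
    analyst-supplied assumption, not estimated here.
    """
    ts, zs = [], []
    for t, y in zip(years, cumulative):
        if 0 < y < K:
            ts.append(t)
            zs.append(math.log(K / y - 1.0))
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two points strictly between 0 and K")
    t_mean, z_mean = sum(ts) / n, sum(zs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    slope = sxz / sxx
    intercept = z_mean - slope * t_mean
    r = -slope
    return r, intercept / r


# Synthetic check: data generated from a known curve are recovered by the fit.
years = list(range(2005, 2020))
cumulative = [logistic(t, 10000, 0.4, 2022) for t in years]
r, t0 = fit_logistic_fixed_ceiling(years, cumulative, K=10000)
print(round(r, 3), round(t0, 1))  # 0.4 2022.0
```

On a logistic curve, the inflection year t0 separates accelerating growth from decelerating growth, so a latest observed year well before t0 indicates early growth, and a year near t0 indicates the middle of the growth stage. The result is sensitive to the assumed K, so several ceilings would normally be tried.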
Comparing the development tendencies of papers and patents (Figure 2) and of investment (Figure 3) with the life cycle curve, their positions on the curve can be readily determined. Papers are in the middle to late growth stage, and patents are in the middle of the growth stage. Investment in the self-driving industry is just starting to grow, which means that the self-driving industry is moving from the emerging stage to the growth stage. Therefore, research on self-driving technology is in an explosive growth stage, while the commercialization of self-driving technology is still at an early stage. Because the rapid development of technology research will strongly influence industry development, it is necessary to forecast the technology trend.

Topic Clustering.
The annual data of scientific papers and patents were analyzed with the Carrot2 software. The Lingo algorithm in Carrot2 was adopted, and its control parameters were set to minimum cluster size = 2, cluster merging threshold = 0.7, and size-score sorting ratio = 0. The clustering results for papers and patents are shown in Figures 5 and 6. Because the numbers of documents in the early years did not meet the minimum clustering requirements, papers were clustered from 2004 to 2019 and patents from 2002 to 2019. The subject terms obtained by clustering were imported into word frequency visualization software to display the clustering results more clearly. As shown in Figures 5 and 6, the hot topics in papers and patents are clear and distinct by year. However, there are also irrelevant and meaningless topics that need to be eliminated, such as "Model," "Current," and "Arm 2 Arm." As mentioned before, unrelated patents and papers are removed manually.
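The step between clustering and visualization, discarding meaningless labels and counting the rest, can be sketched in Python as below. It assumes the Carrot2 cluster labels have already been exported as plain strings per year; the stoplist is hypothetical (only its three entries come from the text), and a real list would be curated with experts, as the paper does manually.

```python
from collections import Counter

# Hypothetical stoplist of labels judged irrelevant or meaningless.
# The three entries are the examples named in the text.
STOP_LABELS = {"model", "current", "arm 2 arm"}


def clean_labels(labels, stop=STOP_LABELS):
    """Drop cluster labels whose normalized (trimmed, lower-case) form is on the stoplist."""
    return [label for label in labels if label.strip().lower() not in stop]


def label_frequencies(labels_by_year, stop=STOP_LABELS):
    """Count how often each cleaned label occurs across all years (input for a word cloud)."""
    freq = Counter()
    for labels in labels_by_year.values():
        freq.update(clean_labels(labels, stop))
    return freq


labels_by_year = {  # toy cluster labels, not the study's results
    2018: ["Path Planning", "Model", "Lidar"],
    2019: ["Path Planning", "Arm 2 Arm", "Decision Making"],
}
freq = label_frequencies(labels_by_year)
print(freq.most_common(1))  # [('Path Planning', 2)]
```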

Construction of the Hierarchical Structure of Self-Driving Technology.
Based on the clustering results in Section 3.3.1, the hierarchical structure of the technology is constructed as a technology tree to clarify the logic among the topics of self-driving technology. A technology tree is a branching diagram that represents relationships among technologies [30]. It provides an overview of the technology [31] by representing the relationships among product components, technologies, or functions of technology in a specific technology area [29]. The technology hierarchy can be used to select a technology area of interest for in-depth analysis [32]. Therefore, it is important to construct the hierarchical structure of the technology.
Before constructing the technology tree, the subject headings are classified and summarized with the help of experts in self-driving technology, and irrelevant and meaningless topics are removed. The resulting structure is shown in Figure 7. As shown in the figure, the hierarchical structure of self-driving technology consists of two parts, "Hardware" and "Software." The "Hardware" category contains the tangible entities that support self-driving technology and consists of "Sensors," "Vehicle Controller," and "Electricity"; the "Software" category contains the intangible algorithms and consists of "Navigation," "Positioning," "Perceive," "Decision," and "Control." This information makes it possible to assign each topic clustering result to a category and to define the structure of the technology evolution map. Then, on the basis of the two categories, the experts merge interrelated topics and subdivide them into the corresponding categories.
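The two-level tree of Figure 7 can be represented as a plain mapping, and the expert step of assigning cluster labels to subtopics as a lookup table. The sketch below is illustrative only: the tree itself comes from the text, but the label-to-subtopic entries are hypothetical stand-ins for the experts' judgments, and the function names are not from the paper.

```python
# Two-level technology tree of Figure 7: category -> subtopics.
TECH_TREE = {
    "Hardware": ["Sensors", "Vehicle Controller", "Electricity"],
    "Software": ["Navigation", "Positioning", "Perceive", "Decision", "Control"],
}

# Hypothetical expert mapping from raw cluster labels to subtopics (illustrative only).
LABEL_TO_SUBTOPIC = {
    "lidar": "Sensors",
    "battery management": "Electricity",
    "gnss": "Positioning",
    "lane keeping": "Control",
    "behavior decision": "Decision",
}


def parent_of(subtopic, tree=TECH_TREE):
    """Return the top-level category that contains `subtopic`."""
    for category, subtopics in tree.items():
        if subtopic in subtopics:
            return category
    raise KeyError(subtopic)


def classify(label, mapping=LABEL_TO_SUBTOPIC):
    """Map a cluster label to (category, subtopic), or None if experts have not mapped it."""
    subtopic = mapping.get(label.strip().lower())
    if subtopic is None:
        return None
    return parent_of(subtopic), subtopic


def count_by_category(labels):
    """Count mapped labels per top-level category for one year; unmapped labels are ignored."""
    counts = {category: 0 for category in TECH_TREE}
    for label in labels:
        hit = classify(label)
        if hit is not None:
            counts[hit[0]] += 1
    return counts


print(classify("LiDAR"))  # ('Hardware', 'Sensors')
```

Keeping the tree and the label mapping separate means experts can revise label assignments without changing the category structure used in the evolution maps.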

Comparison of the Technology's Evolution Based on Papers and Patents.
Based on the hierarchical structure of self-driving technology, we now analyze the evolution trends of the technical topics. Statistical tests [33][34][35] are considered before the analysis. Topics are examined from different perspectives as follows.
We first consider the "Hardware" and "Software" topics. The numbers of these topics appearing in papers and patents are shown in Figure 8(a). The numbers of both topics in papers and patents are relatively small before 2012. After 2012, both "Hardware" and "Software" topics grow rapidly in patents and are of the same order of magnitude. Significant growth is also observed for "Software" in papers since 2012. However, the number of "Hardware" topics in papers remains small throughout. Although "Hardware" topics in papers appear to grow from 2017, their number is still much smaller than that of "Software" topics in papers. We speculate that this is because papers on hardware are harder to publish. The relative proportions of "Hardware" and "Software" in papers and patents are presented in Figure 8(b); the results are normalized so that the proportions sum to 1. For the papers, the relative proportion is unstable before 2012 because of the small numbers. After 2012, "Software" clearly has a larger proportion. For the patents, the numbers of "Hardware" and "Software" topics are always similar. Similar results are obtained for the growth rate (Figure 8(c)): for the papers, the growth rate of "Software" has been larger than that of "Hardware" since 2012, whereas for the patents the growth rates of "Hardware" and "Software" are similar. We may conclude that applied research on self-driving technology, as reflected in patents, shows a more balanced development of software and hardware.
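The normalization behind Figure 8(b), in which each year's category counts are divided by their total so that the proportions sum to 1, is a one-line computation. The Python sketch below applies it year by year; the counts are hypothetical, and the choice to map an all-zero year to zeros (rather than raising an error) is an assumption for years with no clustered topics.

```python
def proportions(counts):
    """Normalize counts so that they sum to 1; an all-zero year maps to zeros."""
    total = sum(counts.values())
    return {key: (value / total if total else 0.0) for key, value in counts.items()}


def shares_by_year(counts_by_year):
    """Apply `proportions` to each year, e.g. for a stacked chart like Figure 8(b)."""
    return {year: proportions(counts) for year, counts in sorted(counts_by_year.items())}


# Hypothetical category counts for papers (illustration only).
paper_counts = {
    2011: {"Hardware": 0, "Software": 0},
    2018: {"Hardware": 5, "Software": 45},
    2019: {"Hardware": 8, "Software": 72},
}
shares = shares_by_year(paper_counts)
print(shares[2018])  # {'Hardware': 0.1, 'Software': 0.9}
```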

Based on the above analysis, it is useful to analyze the subdivided topics of "Hardware" and "Software" in papers and patents. Because the number of "Hardware" topics in papers is quite small, they are excluded from the relevant analysis. The topics "Navigation," "Positioning," "Perceive," "Decision," and "Control" within the "Software" topics of papers are shown in Figure 9, together with their relative proportions to show the variation more clearly. Since 2012, when the number of papers became reasonably large, the topic "Decision" has gradually become dominant among the "Software" topics. The topic "Control" has a relatively stable proportion, and the proportions of the other topics are low. We conclude that decision algorithms are a hot topic in related research. The topics "Sensor," "Electricity," and "Vehicle Controller" within the "Hardware" topics of patents are shown in Figure 10. In the early years (2006 to 2012), "Sensor" and "Electricity" dominate the "Hardware" topics of patents. Then, "Vehicle Controller" began to increase very quickly, and from 2016 onward its proportion exceeds 80 percent. "Sensor" still has a small proportion, and patents on "Electricity" are hard to find. We conclude that the related technologies for "Sensor" and "Electricity" are mature.
The "Perceive" topic still retains a share, whereas patents on the other topics are hard to find.
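Statements such as "from 2016 onward, 'Vehicle Controller' exceeds 80 percent" can be checked mechanically with a threshold test on each year's subtopic shares. The sketch below is one way to do this, assuming yearly subtopic counts are available; the function names, the threshold, and the counts are hypothetical, and ties between subtopics resolve to the first one listed.

```python
def dominant_subtopic(counts, threshold=0.5):
    """Return (subtopic, share) for the largest subtopic if its share exceeds threshold, else None."""
    total = sum(counts.values())
    if total == 0:
        return None
    name, top = max(counts.items(), key=lambda item: item[1])
    share = top / total
    return (name, share) if share > threshold else None


def dominance_timeline(counts_by_year, threshold=0.5):
    """Apply `dominant_subtopic` to every year, in chronological order."""
    return {year: dominant_subtopic(counts, threshold)
            for year, counts in sorted(counts_by_year.items())}


hardware_patents = {  # hypothetical counts, not the paper's data
    2012: {"Sensor": 6, "Electricity": 5, "Vehicle Controller": 1},
    2016: {"Sensor": 10, "Electricity": 1, "Vehicle Controller": 89},
}
timeline = dominance_timeline(hardware_patents, threshold=0.8)
print(timeline[2012])  # None
```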

Forecasting of Technological Hotspots.
Shibata et al. [36] extracted the commercialization gap between papers and patents and proposed that, in an active technical research field, topics that exist in papers but not in patents can be considered technological opportunities. To some extent, technological opportunities can determine future technological developments [37]. Therefore, it is critical to identify technological opportunities in order to forecast technology development trends.
Similar to the method proposed in [36], we make a comparative analysis of the "Software" topics in papers and patents. The "Hardware" topics are not analyzed because of the small number of papers. The comparison between papers and patents is presented in Figure 12. As discussed before, for the papers, the subtopic "Decision" has gradually dominated since 2012, "Control" has a relatively stable proportion, and the other topics have very small proportions. However, for the patents, "Control" has dominated since 2015. The proportions of "Perceive" and "Positioning" are small, and the other topics, including "Decision," are not observed.
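The paper-versus-patent gap idea attributed to Shibata et al. [36] can be expressed as a filter over topic shares: keep topics that are prominent in papers but absent (or nearly absent) in patents. The Python sketch below is a simplified illustration, not the procedure of [36]; the cutoffs `min_paper_share` and `max_patent_share` are hypothetical parameters, and the shares are invented values that only mirror the qualitative pattern described above, not measurements from Figure 12.

```python
def technological_opportunities(paper_shares, patent_shares,
                                min_paper_share=0.1, max_patent_share=0.0):
    """Topics with a share of at least `min_paper_share` in papers whose share in
    patents is at most `max_patent_share` (missing topics count as 0)."""
    return sorted(
        topic for topic, share in paper_shares.items()
        if share >= min_paper_share and patent_shares.get(topic, 0.0) <= max_patent_share
    )


# Illustrative shares only (not the paper's measured values).
papers_2019 = {"Decision": 0.55, "Control": 0.25, "Perceive": 0.08,
               "Positioning": 0.07, "Navigation": 0.05}
patents_2019 = {"Control": 0.85, "Perceive": 0.10, "Positioning": 0.05}
print(technological_opportunities(papers_2019, patents_2019))  # ['Decision']
```

The paper-share cutoff matters: with it set to zero, every topic missing from patents (here also "Navigation") would be flagged, so some minimum level of research activity is needed before a gap is read as an opportunity.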
With the above findings, the following conclusions are drawn. (1) Technology on "Control" is basically mature, and research on this topic mainly concerns production realization. (2) The proportions of "Perceive" and "Positioning" will remain small in the future. (3) Scientific research on "Decision" is still ongoing, and "Decision" will become a hotspot in patents once the relevant algorithms mature.

Conclusions
In this paper, a technology forecasting model is developed and used to forecast the development trends of self-driving technology in China. To improve efficiency and accuracy, topic-based text mining and expert judgment approaches are combined. Multidimensional information including scientific papers, patents, and industry data from 2002 to 2019 is considered to improve the reliability of the results. The findings are as follows: (1) Research on self-driving technology, as reflected in both papers and patents, is in an explosive growth stage, while the commercialization of self-driving technology is still at an early stage. The amount of investment is significantly influenced by government policy. (2) The hierarchical structure of self-driving technology shows that "Software" topics dominate in papers, whereas patents show a more balanced development of software and hardware. (3) The topic "Decision" dominates the "Software" topics of papers, whereas for patents, subtopics related to "Control" dominate both the "Software" and "Hardware" topics. A time lag is observed between papers and patents. (4) We speculate that technology on "Decision" will be the next hotspot in patents.

Data Availability
The data used to support this study are included within the article and are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.