Classification and Evolution Analysis of Key Transportation Technologies Based on Bibliometrics

To study the classification and evolution of key technologies in the transportation field, data from 36 authoritative SCI journals in the transportation field were collected from the Web of Science Core Collection database for 2001 to 2020. Based on the bibliometric method, this study used Python to process and visualize the data, with the bibliometric software VOSviewer assisting the visualization. First, a data preprocessing algorithm was designed to deduplicate the collected data, merge synonyms, and extract key technologies. The paper records that contained terms from the key technology lexicon were then selected. Next, the annual number of publications and the distribution of key technologies over time were counted. The least squares method was used to fit the annual proportion of publications for each technology, and the slope $k_1$ of the fitted linear regression equation was used to determine the research interest trend of each key technology. According to this trend, the key technologies were divided into “hot technology,” “cold technology,” and “other technologies.” To further explore the research hotspots, the least squares method was also used to fit the citations of all technologies to obtain the slope $k_2$. The Gaussian mixture model (GMM) algorithm was then used to cluster $k_1$ and $k_2$ of each technology. As a result, the 144 technologies were divided into 13 super-key technologies, 60 key technologies, 59 relative key technologies, and 12 lower-key technologies. The evolution of key technologies was then analyzed from the two perspectives of weighted evolution and cumulative evolution, and the technology evolution trend in the transportation field over the past 20 years was explored. Finally, the cooccurrence clustering method was adopted to divide key transportation technologies into five categories: vehicle technology and control, optimization algorithms and simulation techniques, artificial intelligence and big data, Internet of Things and computing, and communication technology. The research results can provide references for different people in the transportation field, including but not limited to researchers, journal editors, and funding agencies.


Introduction
The economy and society are in an unprecedented stage of rapid development, and disruptive technologies are emerging. Burgelman et al. define technology as "theoretical and practical knowledge, skills, and artifacts that can be used to develop products and services and their production and delivery systems" [1]. Technology is a primary productive force that permeates all aspects of social production and is a decisive factor in the transformation of various industries. Technologies have changed the way people travel, from walking and horse-drawn carriages to the first steam car, planes, high-speed trains, subways, shared travel, and driverless vehicles. In addition, technologies will have a profound impact on the structure and operation of the various modes of transport and on the rules of transport. Although it is difficult to judge future transportation development trends accurately, the evolution and maturation of technologies will significantly affect the transportation discipline and industry. In this paper, we refer to the technology applied in the transportation field as transportation technology.
In terms of the application of technology, recurrent neural network technology is applied to GPS trajectory mining to reconstruct a complete public transportation network [2], promoting the development of public transportation. Multiple-input multiple-output (MIMO) communication technology is used to address communication capacity problems in rail transit [3]. Cloud computing combined with fog computing is used to solve fog computing performance degradation during rush hours [4]. Augmented reality (AR) is used to simulate the on-screen communication of self-driving vehicles when they meet hazards on the road and to convey intended behavior, in order to study the trust, usability, and experience of self-driving vehicle users [5]. The Internet of Things can be applied to random early detection for vehicle dynamics; at signalized intersections, it can proactively detect incipient congestion and set the best cycle and phases of traffic lights [6]. Deep learning is used for crowdedness prediction [7], short-term prediction of traffic flow [8], pedestrian behavior recognition [9,10], driving policy for autonomous road vehicles [11], and license plate segmentation and recognition [12][13][14]. In addition, mobile-edge computing [15], railway traffic conflict control [16], and federated learning [17] have also been applied to transportation.
Bibliometrics is an interdisciplinary subject that combines information science, philosophy, and statistics to study a journal or a specific field [18]. With the help of bibliometric indexes and tools, the characteristics, keywords, and hot topics of journals can be explored [19]. There have been some attempts at topic extraction and research trend analysis using bibliometrics in transportation. Based on the literature published in IEEE Transactions on Intelligent Transportation Systems from 2000 to 2009, Cobo et al. [20] used coword analysis to detect, visualize, and evaluate ITS concepts and ITS subject areas. Wang et al. [21], based on articles in IEEE Transactions on Intelligent Transportation Systems (2010-2013), studied the productivity and collaboration models of Intelligent Transportation Systems. Tang et al. [22] used coword analysis to classify the subject categories of different research fields of core articles in IEEE Transactions on Intelligent Transportation Systems (2010-2013), including vehicle control technology, modeling, simulation, and image processing. Moral-Muñoz et al. [23] used highly cited literature to study scientific participants who have made significant contributions to the development of Intelligent Transportation Systems. Davarzani et al. [24] used bibliometrics and network analysis tools to identify key researchers, cooperative models, research clusters, and relationships in green ports and maritime logistics. Arunachalam et al. [25] conducted a literature review of big data technology capabilities in the supply chain and established a technology capability maturity model. Tian et al. [26] used network analysis and cluster analysis to identify the trends and characteristics of transportation carbon emissions. Based on selected books or book chapters and 162 studies published in 48 academic journals between 1979 and 2018, Alexandridis et al. [27] surveyed all published research in the area of shipping finance and investment. Zhou et al. [28] analyzed 704 papers published in Transport from January 2007 to June 2019 and investigated the development of the journal in terms of the current situation and emerging trends. Through bibliometric analysis, Li et al. [29] made a statistical analysis of all journal publications, influential papers, main contributors, and main subjects of bottleneck model research in the past half-century. Meyer [30] systematically and quantitatively reviewed the literature on the decarbonization of road freight transportation with bibliographic coupling and cocitation analysis. Abduljabbar et al. [31] analyzed 328 journal papers from the Scopus database from 2000 to 2020 to explore the changing trends of micromobility research.
At the same time, bibliometric software such as VOSviewer and CiteSpace also helps to trace research progress and to identify the characteristics of papers. For example, the papers of TR-Part B from 1979 to 2019 were analyzed with VOSviewer [32]. Liu et al. [33] used CiteSpace and VOSviewer to identify the research progress and trends in traffic prediction by bibliometrics. Based on the data of 1045 documents from Computer-Aided Civil and Infrastructure Engineering from 2000 to 2019, Wang et al. [34] analyzed the characteristics of these documents with VOSviewer and CiteSpace.
Most of the research has focused on specific areas, such as ITS, big data analytics, micromobility, and traffic forecasting, instead of conducting a systematic review of all technologies that affect transportation development. As many technologies are being applied to transportation, research on the classification and evolution of transportation technology is needed. However, few studies analyze which technologies are becoming more or less popular, how many categories they can be grouped into, and how they have evolved over time. Therefore, this paper collected data from 36 authoritative SCI journals in transportation over the last 20 years for bibliometric analysis. The distribution of key transportation technologies over these 20 years, the research hotspots over time, the changing trend of research hotspots, and the evolution and classification of key transportation technologies were then explored. This paper reveals the classification and evolution trend of the key technologies of transportation and identifies the research hotspot technologies in transportation. It is hoped that this study helps researchers of key transportation technologies better understand how the field has developed.

Dataset.
The data used in this study come from the Web of Science database, which is the largest, most interdisciplinary, authoritative, and influential comprehensive academic information resource. It contains more than 8,700 academic journals worldwide, covering natural science, social science, biomedicine, engineering technology, arts and humanities, and other fields. The 36 SCI journals listed under Transportation Science and Technology in the 2019 JCR report, published by Clarivate Analytics on June 29, 2020, were searched for the years 2001 to 2020. Generally speaking, technologies applied in the transportation field will be published in these 36 authoritative SCI transportation journals. A total of 56,451 records were retrieved on April 24, 2021, and each record was exported as a full record. The fields are shown in Table 1, and the retrieved periodicals and their abbreviations are shown in Table 2.

Data Preprocessing.
Because this paper focuses on key technologies, the technical keywords were filtered from the record keywords. The data preprocessing algorithm is shown in Figure 1. First, the author keywords of each record were checked; if they were empty, they were replaced with the additional keywords, which also represent the article well. If both the author keywords and the additional keywords were empty, the record was deleted, leaving 51,457 records after this step. For the retained records, the high-frequency keywords of each year were counted (the top 100 in this article), and nontechnical keywords such as "environment," "resource," and "vehicle" and other generic nouns were manually deleted, which gave the initial key technology word library. Then, synonyms such as singular and plural forms and abbreviations were merged to obtain 144 key technology words, which formed the final key technology word library. Papers containing keywords in the library were then screened out, the nontechnical keywords were deleted from the screened records, and 15,449 records were finally obtained.
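The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names `author_keywords` and `additional_keywords`, the `SYNONYMS` map, and all function names are hypothetical, and the manual removal of nontechnical keywords is represented only by the hand-curated `lexicon` argument.

```python
from collections import Counter

# Hypothetical synonym map (variant -> canonical term); the real library is hand-built.
SYNONYMS = {
    "electric vehicle": "ev",
    "electric vehicles": "ev",
    "genetic algorithm": "ga",
    "genetic algorithms": "ga",
}

def pick_keywords(record):
    """Use the author keywords, fall back to the additional keywords, else None."""
    for field in ("author_keywords", "additional_keywords"):
        cleaned = [k.strip().lower() for k in (record.get(field) or []) if k.strip()]
        if cleaned:
            return cleaned
    return None

def normalize(keywords):
    """Merge synonyms and drop duplicates while preserving order."""
    seen, result = set(), []
    for keyword in keywords:
        keyword = SYNONYMS.get(keyword, keyword)
        if keyword not in seen:
            seen.add(keyword)
            result.append(keyword)
    return result

def frequent_keywords(records, n=100):
    """Per year, the n most frequent keywords, as candidates for manual review."""
    by_year = {}
    for record in records:
        keywords = pick_keywords(record)
        if keywords:
            by_year.setdefault(record["year"], Counter()).update(normalize(keywords))
    return {year: [k for k, _ in c.most_common(n)] for year, c in by_year.items()}

def preprocess(records, lexicon):
    """Keep only records that mention a term from the key technology lexicon."""
    kept = []
    for record in records:
        keywords = pick_keywords(record)
        if keywords is None:
            continue  # both keyword fields are empty: delete the record
        technical = [k for k in normalize(keywords) if k in lexicon]
        if technical:
            kept.append({"year": record["year"], "keywords": technical})
    return kept
```

In this sketch, `frequent_keywords` produces the yearly candidate list that a person would prune by hand, and `preprocess` applies the resulting lexicon to the records.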

Methodology
Based on bibliometrics, this paper used Python to process the data. VOSviewer and Python were used for data visualization.

Constructing Cooccurrence Matrix.
It is generally believed that the keywords given in the same paper are correlated and that the frequency of cooccurrence expresses this correlation. The more often a pair of words appears in the same paper, the closer the relationship between the two words. The cooccurrence matrix was constructed according to the frequency with which words appear in the same paper. As shown in Figure 2, one record represents a document. Figure 2 (left) represents the ...

Cooccurrence Clustering.
Cooccurrence clustering was used to cluster the key technologies. It is based on the distance relation between the items in the statistical database, and its main idea is to minimize the sum of the weighted Euclidean distances between all individuals in each category in the matrix formed by the database. In this study, the constructed cooccurrence matrix was transformed into the .net format that VOSviewer can recognize.
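As an illustration of how a cooccurrence matrix can be built and exported, the sketch below counts keyword pairs per record and writes them in the Pajek-style `.net` layout (a vertex list followed by weighted edges). The function names are hypothetical, and the assumption that the `.net` file mentioned above follows the Pajek layout is ours; the paper does not show its export code.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count how often each unordered keyword pair appears in the same record."""
    pairs = Counter()
    for keywords in records:
        # sorted(set(...)) gives each pair one canonical orientation per record
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

def to_pajek(pairs):
    """Serialize pair counts as Pajek-style text: vertices, then weighted edges."""
    labels = sorted({term for pair in pairs for term in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in sorted(pairs.items())]
    return "\n".join(lines) + "\n"
```

Storing only the upper triangle as a `Counter` keeps the structure sparse, which suits a lexicon of 144 terms where most pairs never cooccur.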

Degree Centrality.
The index used to measure the position of a node in a network is called degree centrality. If a node is associated with many others, it is probably at the center of the network. In bibliometrics, the number of nodes directly connected to a node is usually used to measure its degree centrality. For example, the more other keywords a keyword appears together with, the more central that keyword is.
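Under the definition above, degree centrality can be read directly off the cooccurrence pairs. The following is a minimal sketch; the input format (a mapping keyed by keyword pairs) and the function name are assumptions made for illustration.

```python
def degree_centrality(pairs):
    """Number of distinct keywords that each keyword cooccurs with."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {term: len(adjacent) for term, adjacent in neighbours.items()}
```

Note that this counts distinct neighbours and ignores how often a pair cooccurs; a weighted variant would sum the pair counts instead.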

Annual Proportion.
Annual proportion is the frequency of occurrence of a keyword divided by the total frequency of occurrence of all keywords in the current year. The larger the annual proportion, the more popular research on the keyword was in that year, and vice versa.
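The definition translates into a short computation. In this sketch, each record is assumed to be a `(year, keywords)` pair; that layout and the function name are illustrative choices, not taken from the paper.

```python
from collections import Counter, defaultdict

def annual_proportion(records):
    """Map year -> {keyword: share of that year's keyword occurrences}."""
    counts = defaultdict(Counter)
    for year, keywords in records:
        counts[year].update(keywords)
    return {
        year: {k: c / sum(counter.values()) for k, c in counter.items()}
        for year, counter in counts.items()
    }
```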

Least Squares Fitting.
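To get each technology's research heat trend over time, the least squares method was used to fit each technology's annual proportion over the 20 years. The linear regression equation was solved as follows:

$$y = ax + b, \quad a = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \quad b = \bar{y} - a\bar{x}, \tag{1}$$

where $x_i$ is the year, $y_i$ is the annual proportion of the technology in that year, $a$ is the slope, $b$ is the intercept, and $\bar{x}$ and $\bar{y}$ are the means of $x_i$ and $y_i$.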
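A least squares line fit of this kind can be written with the standard closed-form expressions for the slope and intercept. The sketch below is a generic implementation, not the authors' code; the function name is hypothetical.

```python
def least_squares(xs, ys):
    """Return slope a and intercept b of the least squares line y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx  # requires at least two distinct x values
    b = y_mean - a * x_mean
    return a, b
```

Centering the years on their mean before multiplying avoids the loss of precision that sums of squared four-digit years can cause.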

Weighted and Cumulative Evolutionary Path.
A weighted and cumulative evolutionary path method was used to analyze the evolution trend of key technologies from different angles. Formula (2) was used to calculate the weighted average occurrence time of each key technology along the time axis of the study:

$$\overline{\mathrm{year}} = \frac{\sum_{i} \mathrm{year}_i \cdot \mathrm{counts}_i}{\sum_{i} \mathrm{counts}_i}, \tag{2}$$

where $\overline{\mathrm{year}}$ is the weighted average occurrence time, $\mathrm{year}_i$ is the year the key technology emerged, and $\mathrm{counts}_i$ is the frequency with which the key technology occurred in that year. For the cumulative evolutionary path, the statistical approach was as follows:

$$\mathrm{year}_{yi} = \mathrm{year}_{i}^{\mathrm{fir}}, \tag{3}$$

where $\mathrm{year}_{yi}$ is the time at which the keyword appears in the cumulative evolution diagram and $\mathrm{year}_{i}^{\mathrm{fir}}$ is the time the keyword first appeared in the study. For each of the two methods, the frequency of occurrence of each keyword needed to be counted. The statistical formula was as follows:

$$P_c = \sum_{i} \mathrm{counts}_i, \tag{4}$$

where $P_c$ is the total frequency of occurrence of the keyword in all years of the study.

Figure 1: Data preprocessing algorithm.
Scientific Programming
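The weighted average occurrence time, the first appearance year, and the total frequency can each be computed from a per-year count table for one keyword. This is a minimal sketch that assumes the counts are held in a `{year: count}` dictionary; the function names are hypothetical.

```python
def weighted_year(counts_by_year):
    """Average year of occurrence, weighted by the yearly counts."""
    total = sum(counts_by_year.values())
    return sum(year * count for year, count in counts_by_year.items()) / total

def first_year(counts_by_year):
    """Earliest year in which the keyword occurs at least once."""
    return min(year for year, count in counts_by_year.items() if count > 0)

def total_frequency(counts_by_year):
    """Total number of occurrences of the keyword over all years."""
    return sum(counts_by_year.values())
```

A keyword that appears once in 2018, once in 2019, and twice in 2020 is placed at 2019.25 on the weighted axis but at 2018 on the cumulative axis, which is why the two diagrams can position the same technology differently.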

Gaussian Mixture Model (GMM) Algorithm.
The GMM is a linear combination of multiple Gaussian distribution functions. Theoretically, a GMM can fit any type of distribution. It is usually used when the data in the same set come from several different distributions. Given the random variable $X$, the GMM can be expressed as follows:

$$p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k), \tag{5}$$

where $K$ is the number of components, $N(x \mid \mu_k, \Sigma_k)$ is the $k$-th component of the Gaussian mixture model, $\mu_k$ and $\Sigma_k$ are the mean and variance of the $k$-th Gaussian model, respectively, and $\pi_k$ is the mixing coefficient, namely, the weight. It needs to meet the following conditions:

$$\sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \pi_k \le 1. \tag{6}$$

Data Visualization

Paper Volume Analysis.
A bibliometric analysis of 15,549 pretreated papers was conducted in this study. To some extent, the change in paper volume represents the research status of a field. Figure 3 shows the number of papers published in all journals every year. Figure 3(a) illustrates the annual volume, and Figure 3(b) represents the cumulative volume of publications. As shown in Figure 3(a), the overall volume of publications increased slowly from 2001 to 2014. From 2014 onwards, it increased relatively rapidly, and after 2017 it increased almost exponentially. Figure 3(b) shows an exponential rise in the cumulative curve of publications. Since 2014, research on applying key transportation technologies to solve traffic problems has developed rapidly, and more and more scholars pay attention to it. In addition, the annual volume for each journal and the volume for each country were also calculated, as shown in Figure 4. Figure 4(b) displays only the top 30 countries. As can be seen, the journal IEEE T Veh Technol has the largest volume (4,951), more than three times that of IEEE T Intell Transp, indicating that it pays more attention to the study of key transportation technologies. In terms of country, the US is the largest contributor (3,477), followed by China (2,903), which has nearly three times as many papers as the third largest, Canada. This shows that the US and China have made significant contributions to this area.

Distribution of Key Technology Hotspots over Time.
To explore the research focus of key technologies in each year, this paper counted the occurrences of each technology by year; Figure 5 shows the top 10 technologies in each year. Optimization, vehicle technology, and transportation technology were popular throughout the past 20 years. Optimization and vehicle technology stayed in the top 10 over the whole period and were the main technologies applied in transportation. Figure 6 gives a more direct view of the percentage of each technology studied each year. Figure 6(a) shows the percentage of each technology over time: the area represents the percentage, so a larger area means more papers and more popular research. Figure 6(a) shows only the top 10 technologies, and Figure 6(b) shows the annual percentage change. As shown in Figure 6, some technologies, like optimization and vehicle technology, changed little in popularity and stayed near their peak. Some others, such as CDMA (code division multiple access), decreased year by year. In contrast, some technologies, like EV (electric vehicle), increased year by year. Some technologies, such as OFDM (orthogonal frequency division multiplexing), first increased and then decreased, with significant fluctuation. In addition, some technologies, such as GA (genetic algorithm), fluctuated, but their overall proportion was relatively stable.

Classification of Key Technology Change Trends.
To distinguish trends in technology, this paper defined the technology categories as "hot technology," "cold technology," and "other technologies." A hot technology is one that is becoming more popular, a cold technology is one that is becoming less popular, and all remaining technologies are other technologies. This paper used the least squares method to fit the percentage change of each technology over the 20 years and assigned the category according to the slope of the fitted line. The fitting result is shown in Figure 7, where the curve represents the original data and the straight line represents the fitted linear regression equation. According to the experiment, a hot technology has a slope $k_1$ greater than 0.00005, a cold technology has a slope less than −0.00005, and any technology in between belongs to the other technologies.

To a certain extent, the trend of changes in the number of publications can reflect the research enthusiasm for a technology. However, in the research field, the citation rate of papers is usually used as an important reference index for research hotspots. To further explore the research hotspots, the least squares method was also used to fit the citations of all technologies to obtain the slope $k_2$. The GMM algorithm was used to cluster the publication volume slope $k_1$ and the citation slope $k_2$ of each technology. The clustering results are shown in Figure 9(a). Because the publication volume and citation rate of some technologies, such as 5G, deep learning, and EV, are far larger than those of the rest, the clustering effect is not very good. Therefore, we define the technologies with slope $k_1$ greater than 3 and citation slope $k_2$ greater than 21 as super-key technologies. The remaining technologies are grouped into 3 categories, which can be seen in Figure 9(b). In Figure 9(b), the red category is defined as key technology, the green category as relative key technology, and the purple category as lower-key technology.
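The two threshold rules quoted above are simple enough to state as code. The sketch below takes the thresholds exactly as given in the text (±0.00005 for the trend category, 3 and 21 for the super-key rule); the function names are hypothetical, and the slopes passed to each rule are assumed to be on the scale that the corresponding threshold was defined for.

```python
def trend_category(k1, threshold=0.00005):
    """Classify a technology by the slope of its annual proportion."""
    if k1 > threshold:
        return "hot technology"
    if k1 < -threshold:
        return "cold technology"
    return "other technologies"

def is_super_key(k1, k2, k1_min=3, k2_min=21):
    """Super-key rule: both the publication and citation slopes are large."""
    return k1 > k1_min and k2 > k2_min
```

Technologies that pass `is_super_key` would be set aside first, so that their extreme slopes do not dominate the clustering of the remaining technologies.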
By further cluster analysis using the GMM algorithm, the 144 technologies were divided into 13 super-key technologies, 60 key technologies, 59 relative key technologies, and 12 lower-key technologies. All the technologies and their classification are shown in Table 3. Among them, $k_1$ and $k_2$ of the super-key technologies are much larger than the others. For the key technologies, $k_1$ is greater than 0.3 and most $k_2$ values are greater than 0. Both $k_1$ and $k_2$ of the relative key technologies are near 0. It is worth mentioning that $k_2$ of all lower-key technologies is less than 0, even though $k_1$ of some lower-key technologies is larger, which shows the importance of citation rate in research hotspots.

(Figure placeholder: ranked lists of the top 10 key technologies; recurring entries include optimization, vehicle technology, simulation, transportation technology, and CDMA.)
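To make the GMM clustering step concrete, the sketch below fits a small mixture to two-dimensional points such as $(k_1, k_2)$ pairs with the expectation-maximization (EM) algorithm. It is an illustration under simplifying assumptions, not the authors' implementation: it uses only the standard library, treats the dimensions as independent (diagonal covariance), starts from unit variances, and expects the caller to supply the initial means. All names are hypothetical.

```python
import math

def density(x, mean, var):
    """Density at x of a Gaussian with independent dimensions (diagonal covariance)."""
    value = 1.0
    for xi, mi, vi in zip(x, mean, var):
        value *= math.exp(-((xi - mi) ** 2) / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return value

def fit_gmm(points, initial_means, iterations=50, min_var=1e-6):
    """Fit mixture weights, means, and variances with the EM algorithm."""
    n, dim, k = len(points), len(points[0]), len(initial_means)
    means = [list(m) for m in initial_means]
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: share of each point attributed to each component.
        resp = []
        for x in points:
            joint = [weights[j] * density(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([p / total for p in joint])
        # M-step: re-estimate the parameters from the weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            for d in range(dim):
                mean = sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                spread = sum(r[j] * (x[d] - mean) ** 2 for r, x in zip(resp, points)) / nj
                means[j][d] = mean
                variances[j][d] = max(spread, min_var)  # floor keeps the density finite
    return weights, means, variances

def assign(points, weights, means, variances):
    """Label each point with its most probable component."""
    labels = []
    for x in points:
        scores = [w * density(x, m, v) for w, m, v in zip(weights, means, variances)]
        labels.append(scores.index(max(scores)))
    return labels
```

In practice a library implementation with full covariance matrices and multiple random restarts would be more robust; the point here is only to show how the mixing weights, means, and variances are alternately updated and then used to label each technology.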

Analysis of Key Technology Evolution.
To further explore the evolution trend of key transportation technologies, this paper analyzed the weighted evolution and the cumulative evolution trends, as shown in Figure 10. Figure 10(a) represents the weighted evolutionary trend, calculated by formulae (2) and (4). Figure 10(b) illustrates the cumulative evolutionary trend, calculated by formulae (3) and (4). As shown in Figure 10, only a few key technologies appear before 2012, which seems to contradict the fact that many technologies had already been applied to transportation 20 years ago. The reason is that few articles were published before 2012, so the overall research time zone is shifted toward later years when weighted. Figure 10(a) shows that the weighted time zones for key technologies were mainly concentrated in 2014-2018, and the technologies with high volume were mostly "vehicle technology," "optimization," "EV," and "simulation." The later the weighted time zone, the later the technology became popular, so technologies with late time zones represent the current research hot topics; for example, "digital twin," "blockchain," and "federated learning" appeared close to 2020, indicating that these technologies were the latest in transportation research. The abscissa in Figure 10 ...

Cooccurrence Clustering Analysis of Key Technology.
Based on the above research, the key technologies of transportation were divided into five categories using VOSviewer. As shown in Figure 11, the size of each circle represents degree centrality. A solid line represents the cooccurrence of two technologies, and a dotted line indicates that they belong to the same category. The five categories are as follows:
(1) Vehicle technology and control. It contains vehicle technology, electric vehicle technology, automated driving technology, and vehicle control technology, such as optimal control and adaptive control technology.
(2) Optimization algorithms and simulation techniques. It mainly includes optimization, genetic algorithm, particle swarm optimization, integer programming, and simulation technology.
(3) Artificial intelligence and big data. It mainly includes artificial intelligence, deep learning, neural network, big data, data mining, and data analysis.
(4) Internet of Things and computing.
(5) Communication technology.

Conclusion
This paper made a bibliometric analysis of the papers published by 36 authoritative SCI journals in the field of transportation in the last 20 years. The volume of papers published in these journals increased slowly from 2001 to 2014, grew rapidly after 2014, and increased almost exponentially after 2017. As for national publication volume, the US was the largest (3,477), followed by China (2,903), which had nearly three times as many papers as Canada, the third largest. In the time distribution of research hotspots, some technologies such as optimization and vehicle technology had been popular for nearly 20 years and were the leading technologies applied in transportation. Some technologies, such as CDMA, were getting less and less popular. Some were just the opposite, such as EV. The research heat of some technologies, such as OFDM, first increased and then decreased, with significant fluctuation. There were also technologies, such as GA, that fluctuated while their overall ratio remained relatively stable. To distinguish trends in technology, 144 technologies were classified according to the slope range of the least squares linear regression equation into 60 hot technologies, 46 cold technologies, and 38 other technologies. To further explore the research hotspots, the least squares method was also used to fit the citations to obtain the slope $k_2$. The GMM algorithm was used to cluster $k_1$ and $k_2$ of each technology. As a result, the 144 technologies were divided into 13 super-key technologies, 60 key technologies, 59 relative key technologies, and 12 lower-key technologies. This paper analyzed the evolution of technology from the perspectives of weighted evolution and cumulative evolution. The latest popular technologies in transportation were "digital twin," "blockchain," "federated learning," and so on, and "deep learning," "5G," "NOMA," and "edge computing" were hot technologies with significant publication volume in recent years.
The key technologies of transportation are divided into five categories by cooccurrence cluster analysis: (1) vehicle technology and control; (2) optimization algorithms and simulation; (3) artificial intelligence and big data; (4) Internet of Things and computing; and (5) communication technology.
This paper classified and analyzed the key technologies in transportation, which is essential for application research. It should be noted that this study did not reveal the evolutionary nature of key transportation technologies, which is beyond the scope of this paper and could be further explored in future research.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.