A New Method for Identifying Key and Common Themes Based on Text Mining: An Example in the Field of Urban Expansion

Urban land use is a core area of multidisciplinary research that involves geography, land science, and urban planning. With the rapid progress of global urbanization, urban expansion has become a research focus in recent years. Therefore, how to scientifically and accurately identify key and common themes in the urban expansion literature has become crucial for scientific research institutions in various countries. This paper proposes a new framework for identifying such themes based on an analysis of scientific literature and by using text mining and thematic evolutionary analysis. First, the latent Dirichlet allocation algorithm is used to capture the thematic clustering of scientific literature. Second, the key degree of the thematic node in the thematic evolution transfer network is used to represent the key feature of a theme, and the PageRank algorithm is employed to measure the critical score of this theme. When recognizing common themes, the common features of various themes are digitized and mapped to a specially selected quadratic function to measure the degree of commonness. Finally, the hidden Markov model is used to build a thematic prediction model. This method can efficiently identify key and common themes from the literature and provide theoretical and technical support for future research in related fields.


Introduction
The increasingly drastic land-use changes during the process of urbanization are important factors that affect the global social economy and ecological stability [1,2]. For half a century, many areas in the world have undergone rapid urbanization, thereby resulting in the continuous emergence of large cities and megacities [3,4]. Accordingly, urban expansion has become a research hotspot in the fields of geography, economics, ecology, environmental science, and sociology [5]. Urban expansion refers to an increase in the total area of urban land and the outward development of land use under the influence of economic development, population growth, urban planning, and urbanization [6]. However, due to the lack of reasonable planning and guidance for urbanization, the process of urban expansion is often associated with an excessive demand for urban land and the disorderly development of urban fringe areas [7], both of which have a negative impact on regional economic growth, the layout of production and living spaces, residential types, urban morphology, and ecological environment [8-10].
Therefore, how to guarantee a continuous supply of urban land for social-economic development in the new era and how to clarify the formation mechanism of urban growth in order to reasonably control the urban scale, delimit the urban growth boundary, and optimize the spatial pattern of urban land have become the working emphases of current urban management [11,12]. Scholars have carried out extensive research to clarify those factors that drive urban expansion [13,14], strengthen the development and application of dynamic monitoring technologies for urban expansion [15,16], and accelerate sustainable urban spatial planning [17]. The scientific literature is an important and authoritative knowledge carrier. Using bibliometrics and text mining methods to study the thematic evolution and thematic prediction of the massive urban expansion literature can help trace the development trajectory and grasp the flow of knowledge in the field of urban expansion. In recent years, some scholars have qualitatively reviewed the findings of urban expansion research from multiple levels, dimensions, and perspectives [18,19]. Some scholars have also used knowledge maps and bibliometric methods to quantify and visualize the results and topics in urban expansion research [20,21]. However, as the number of documents has increased exponentially, the types of these documents have also become increasingly diverse. Accordingly, thematic identification is increasingly being applied to large-scale scientific literature. When automatic thematic identification is faced with high data dimensions and complex data types, traditional thematic identification methods may become ineffective.
This article divides themes into key and common themes. Key themes play relatively important roles in the urban expansion field. These themes have attracted significant attention, are mature, and have development potential in each time window. The evolutionary process of key themes plays an important role in describing the future development of a theme [22]. In addition, in the globalization context, cities have increasingly become the power centers of global social-economic development. Whether in developed or developing countries, the status and role of cities in national development have become increasingly important, and cities have begun to represent countries in global competition. Therefore, urban development and expansion play key roles in the competition among countries in the context of globalization. This article then defines common themes as "themes that have received equal amounts of attention from scholars in developed and developing countries." How to accurately identify themes has always been a challenge in the field of bibliometrics. Statistical methods based on word frequency and co-occurrence frequency are widely used in thematic identification. However, these methods only model the literature as a bag of words and do not fully consider the relationship among themes, so revealing the rich thematic information contained in the literature is not an easy task [23]. The topic model represented by latent Dirichlet allocation (LDA) uses the Dirichlet distribution to describe the literature generation process and obtains vocabulary clusters by maximizing the co-occurrence probability of keywords.
This model can avoid parameter explosion and overfitting problems and can effectively extract hidden themes from the literature [24]. However, this method requires predefined empirical values and can only reveal the potential semantic relationships among themes. To solve these problems, scholars have recently examined thematic identification by constructing networks and comprehensively evaluating certain indicators, such as network centrality. For example, by constructing a citation network, Shibata et al. [25] demonstrated the novelty of themes from the time and function dimensions and detected the emerging themes in regenerative medicine. Small [26] and Lee and Choe [27] not only considered the novelty of themes in their identification method but also employed network time series analysis and structural hole theory to measure the characteristics of thematic growth and influence. However, these studies conduct static analyses based on historical literature data and are therefore unable to reflect the dynamic development characteristics of a network in real time. Therefore, how to construct a dynamic network of themes in the field of urban expansion and how to accurately identify key and common themes that can help scholars achieve breakthroughs in this field are crucial.
Identifying themes is not the end goal. By expanding research on thematic evolution, this article aims to predict the future development of these themes. Researchers have divided thematic prediction methods into qualitative and quantitative analyses based on different theoretical foundations. While qualitative methods are often limited by subjective judgment, quantitative methods are less dependent on it. The most commonly used forecasting methods include the gray forecasting method [28], the life cycle method [29], time series analysis [30], and the neural network forecasting method [31]. However, uncertainty, ambiguity, and randomness are essential phenomena in scientific research, and the above models generally ignore the randomness in the development of technological innovation. The hidden Markov model (HMM) with a double stochastic process can describe the Markov stochastic process of mutual transfer among various themes, reveal the potential evolution path, and provide a basis for predicting future thematic development [32].
This article first applies the LDA model for topic modelling based on the title and abstract of urban expansion literature to obtain a detailed thematic classification. Second, in identifying key themes, the key degree of the thematic node in the thematic evolutionary transfer network is regarded as the key feature of themes. Third, the PageRank algorithm is employed to measure the criticality score of each thematic node in the thematic relationship network. When identifying the common theme, the common features of various themes are digitized and mapped to a specially selected quadratic function to measure the degree of commonality. Finally, by using the HMM, the future development trend of each theme is predicted from the microcosmic angle of the thematic evolution, and a visual display is given.

Data Sources.
In order to ensure the quality and completeness of the sample, this paper selects the Web of Science core collection database for retrieval. Web of Science includes articles, reviews, editorials, letters, and other document types. Because research articles present original work with more complete results, only documents of the type "article" are retrieved. The search formula is TI = "Urban expansion" or "Urban extension" or "Urban Growth Boundaries" or "Urban land growth" or "Urban land expansion" or "Urban sprawl" [21,33], and the time span is "1985-2020" (November 4, 2020). A total of 1,933 papers are retrieved, and the bibliographic information is downloaded and summarized in the form of full records (including references). The bibliographic items used in this article include article titles, abstracts, publication years, and reprint addresses, which provide the name, organization, and country information of the corresponding author. Given that international collaborative papers contain information from multiple countries, the corresponding author is used to determine the country of each paper (when the corresponding author is not specified, the first author is used). Each article is attributed to only one country to prevent international co-authored papers from affecting the accuracy of national distinction. After the data deduplication, cleaning, and sorting, a total of 969 documents from developed countries and 1,045 documents from developing countries are obtained.
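The country attribution rule described above can be illustrated with a minimal Python sketch. The record layout and the field names `corresponding_country` and `author_countries` are hypothetical; they do not correspond to actual Web of Science export fields.

```python
def attribute_country(record):
    """Attribute a paper to exactly one country.

    Uses the corresponding author's country when present and falls back
    to the first author's country otherwise. The field names are
    hypothetical and chosen only for illustration.
    """
    corresponding = record.get("corresponding_country")
    if corresponding:
        return corresponding
    authors = record.get("author_countries") or []
    return authors[0] if authors else None


# An internationally co-authored paper is still counted once.
paper = {"corresponding_country": "China", "author_countries": ["USA", "China"]}
no_corresponding = {"author_countries": ["USA", "China"]}
```

Returning a single value per record is what keeps a co-authored paper from being counted in both the developed and the developing group.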

Thematic Extraction Module.
Scholars have investigated thematic identification by using topic models. The current mainstream model adopted in thematic identification is the LDA model proposed by Blei et al. [34]. As a text mining method based on unsupervised machine learning, LDA can uncover latent themes from documents while overcoming the shortcomings of traditional methods in calculating text similarity. In addition, the LDA model can express scientific literature in the form of thematic probability vectors, thereby greatly reducing the dimensionality of the literature data and improving the accuracy of text classification and thematic identification. LDA and its improved models have been widely used in text analysis. The output of these models is usually obtained based on the distribution of words under each theme in order to extract high-frequency keywords to describe the themes and achieve excellent thematic classification results [35]. The hidden themes in the urban expansion literature are assumed to follow a distribution in which $\theta_{dk}$ represents the proportion of theme $k$ in scientific document $d$. The theme-term distribution $\phi_k \sim \mathrm{Dir}(\beta)$ is generated for each theme $k$, and the document-theme distribution $\theta_d \sim \mathrm{Dir}(\alpha)$ is generated for each document $d$. For the $n$-th term in each document, a theme assignment $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$ is drawn, and the term itself is then drawn from $\mathrm{Multinomial}(\phi_{z_{dn}})$. The LDA likelihood model follows from these generative assumptions. This paper uses Heinrich's parameter estimation method, where $\alpha = 50/k$ and $\beta = 0.1$, and Gibbs sampling to obtain the theme set $K = \{k_1, \ldots, k_h\}$ and the theme attribution set $D_k = \{j_1, \ldots, j_n\}$ of each paper.
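For reference, the joint likelihood implied by the Dirichlet and multinomial assumptions of LDA can be written in its standard textbook form. This is a reconstruction in the notation of this section, not necessarily the exact expression used by the authors; the symbols $w_{dn}$ (the $n$-th observed term of document $d$), $N_d$ (the length of document $d$), and $D$ (the number of documents) are assumptions introduced here.

```latex
p(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{h} p(\phi_k \mid \beta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid \phi_{z_{dn}})
```

Gibbs sampling draws the assignments $z_{dn}$ from this model with $\theta$ and $\phi$ integrated out, and the theme of each paper is then read off from the sampled assignments.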

Key Theme Identification Model.
Based on the connotation of key themes, during the model construction, the thematic evolution is regarded as a hidden Markov process to obtain the thematic transfer network.
There are two dynamics for thematic evolution in the field of urban expansion research. The first is the inspiration drawn from historical research results and the emergence of new ideas in the process of thematic evolution; due to the lack of a record carrier, this process is an unobservable hidden sequence. The second is that, under the impetus of the first driving force, scholars constantly adjust their research thinking and change their research direction as the research environment changes and unexpected research results appear. The professional literature effectively records these research results as an observable sequence. The latter constitutes the microfoundation of the former, and the former is the macroscopic manifestation of the latter. Therefore, the thematic evolution in the field of urban expansion can be seen as the superposition of these two processes. This research uses the HMM to describe the evolution process of urban expansion themes. By inferring the state transition matrix and the probability distribution of the initial state in the HMM, the confusion and transition matrices between the themes in the evolution of themes are determined, and then the evolution history and future evolution trends of the themes are determined. Afterward, the criticality of these themes is measured based on their network relationship. The PageRank algorithm is then used to calculate the scores of network nodes in the thematic transfer network, which serve as the foundation of key theme identification. This process is specifically described as follows:
(1) Set the hidden state set of the HMM to $S = \{s_1, \ldots, s_h\}$, where $h$ is the number of themes generated in the LDA model. Suppose that the hidden state sequence generated by the random process is $Q = \{q_1, \ldots, q_t\}$, where $q_t \in S$.
(2) The probability distribution of the transition state is $A = \{\alpha_{ij}\}$, where $\alpha_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$, and satisfies $\alpha_{ij} \ge 0$ and $\sum_{j=1}^{N} \alpha_{ij} = 1$, which suggests that during the development of the urban expansion field, the themes will shift from state $S_i$ to $S_j$.
(3) The observable sequence is $O = \{O_1, \ldots, O_t\}$, or the proportion of each theme over the years.
(4) The probability distribution of the initial state of the system is $\pi = \{\pi_i\}$, $1 \le i \le N$, where $\pi_i$ is the occurrence probability of state $S_i$. Given that a higher frequency of theme co-occurrence will facilitate the shift and evolution among themes, this paper uses the thematic co-occurrence matrix as the initial iteration value of the state transition matrix $\pi_i$.
(5) Set the initial value of model training to $O = Q = \pi$.
Discrete Dynamics in Nature and Society

This paper uses the Baum-Welch algorithm [36] to obtain a single optimal state transition matrix $\pi' = [\pi'_{nm}]$, where $\pi'_{nm}$ is the probability of transition from theme $n$ to theme $m$. By extracting all $\pi'_{nm}$ that exceed a certain threshold, a directed graph of the topological relationship between themes in the transfer network can be established. Key theme identification has always been an important research problem in thematic network analysis. PageRank ranks search results on the basis of web page link analysis. As the most famous web page ranking algorithm, the PageRank algorithm has been widely used to identify key nodes in various directed, undirected, weighted, or unweighted networks [37,38]. Applying this algorithm to compute the centrality of thematic network nodes is therefore a meaningful research problem.
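The thresholding step that turns a fitted transition matrix into a directed transfer network can be sketched as follows. This is only an illustration: the matrix values and the threshold are made up, and skipping self-transitions is an assumption that the text does not state.

```python
def transition_edges(matrix, threshold):
    """Return directed edges (source, target, probability).

    An edge n -> m is kept when the transition probability from theme n
    to theme m exceeds the threshold. Self-transitions are skipped,
    which is an assumption made for this sketch.
    """
    edges = []
    for n, row in enumerate(matrix):
        for m, prob in enumerate(row):
            if n != m and prob > threshold:
                edges.append((n, m, prob))
    return edges


# Illustrative row-stochastic matrix for three themes (not from the paper).
trans = [
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.02, 0.08, 0.90],
]
edges = transition_edges(trans, threshold=0.1)
```

A higher threshold gives a sparser graph that keeps only the strongest transfer relationships.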
When calculating the PageRank value of theme $i$ at each moment in a dynamic thematic network, the network topology structure of the current snapshot and the influence of the previous centrality on the existing network should both be considered. One effective method for achieving this goal is network reconstruction, where the previous network topology relationships are weighted into the current network to construct a new network. To describe the dynamic network $G$, this dynamic network needs to be sampled at different times. The sampling results are then arranged in a time sequence to obtain the time sequence network $G = \langle G_1, G_2, \ldots, G_{t-1}, G_t, \ldots, G_n \rangle$, where $G_t$ represents the sampling result at time $t$ or the snapshot at time $t$. The analysis of the dynamic network is transformed into the analysis of the sequential network [39].
The parameter $\alpha \in [0, 1]$ is used to balance the contribution of the current and previous topologies to the centrality of the network node. The PageRank centrality of node $i$ at time $t$ is treated as the PageRank value of node $i$ in the reconstructed network $G'_t$. The key theme in this article refers to any mature theme that has development potential in scientific research. Therefore, key themes can easily attract transfers from other themes in the process of thematic network transfer. Given that the G score can measure the importance of nodes in the process of directed network migration, this paper takes the standardized score $G'$ as the critical score for each theme.
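The combination of network reconstruction and PageRank can be sketched in plain Python as follows. Several parts are assumptions rather than the authors' definitions: the blend is taken as `alpha * current + (1 - alpha) * previous`, the damping factor is the conventional 0.85, dangling nodes spread their score uniformly, and the standardization is min-max scaling.

```python
def blend_snapshots(current, previous, alpha):
    """Weigh the previous adjacency matrix into the current one.

    Assumed form: alpha * current + (1 - alpha) * previous.
    """
    n = len(current)
    return [
        [alpha * current[i][j] + (1 - alpha) * previous[i][j] for j in range(n)]
        for i in range(n)
    ]


def pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a weighted directed adjacency matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Nodes without outgoing edges spread their score uniformly.
        dangling = sum(rank[i] for i in range(n) if out_sum[i] == 0)
        new_rank = []
        for j in range(n):
            inflow = sum(
                rank[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (inflow + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank


def standardize(scores):
    """Min-max scale scores to [0, 1] (an assumed standardization)."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


# Two illustrative snapshots of a three-theme transfer network.
previous = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
current = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
blended = blend_snapshots(current, previous, alpha=0.6)
scores = pagerank(blended)
critical = standardize(scores)
```

In this toy network, theme 2 receives transfers from both other themes and passes none on, so it obtains the highest score, which matches the intuition that key themes have high inflow and low outflow.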

Common Theme Identification Model.
To identify the degree of commonness of themes, the selected model should be able to measure the common skewness of different themes in the field of urban expansion. Skewness refers to the numerical characteristics of the degree of asymmetry in a statistical data distribution [40-42]. Common skewness in this article refers to the measurement of the direction and degree of skewness of each thematic distribution. Let $k \in K = \{k_1, \ldots, k_m\}$, and let $p_k$ be the number of documents that belong to theme $k$. Of these documents, $a_k$ are type A documents (developed countries) and $b_k$ are type B documents (developing countries), so that $p_k = a_k + b_k$. The common skewness values of type A and type B documents in theme $k$, $U_{ka}$ and $U_{kb}$, are defined by formulas (4) and (5) and satisfy $U_{ka} + U_{kb} = 1$. Formulas (4) and (5) eliminate the influence of the total numbers of A and B documents on common skewness to prevent the difference in the number of documents from affecting the calculation of the co-occurrence degree of themes. When the common skewness is $U_{ka} = U_{kb} = 1/2$, that is, when theme $k$ is equally represented in type A and type B documents, the degree of commonness is highest. By mapping $U_{ka}$ and $U_{kb}$ to the $[0, 1]$ range of an inverted quadratic function, the common function can be monotonized. Without loss of generality, a specific quadratic function of this form is selected, which yields $C_{ka}$ and $C_{kb}$. From the symmetry of the quadratic function, $C_{ka} = C_{kb}$, so that $C_k = C_{ka} = C_{kb}$ can be obtained. The $C_k$ function is then used to measure the degree of commonness. The highest and lowest degrees of commonness are obtained when $C_k = 1$ and $C_k = 0$, respectively. The logic structure is shown in Figure 1.
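One way to realize the commonness measure is sketched below in Python. The exact formulas are not reproduced in this section, so both functions are assumptions that merely satisfy the stated properties: the skewness values normalize each group's count by that group's total number of documents and sum to 1, and the quadratic $C = 4U(1 - U)$ is the inverted parabola on $[0, 1]$ that equals 1 at $U = 1/2$ and 0 at $U = 0$ and $U = 1$.

```python
def common_skewness(a_k, b_k, total_a, total_b):
    """Return (U_ka, U_kb) for one theme.

    a_k and b_k are the numbers of type A and type B documents in the
    theme; total_a and total_b are the corpus sizes of the two groups.
    Dividing by the totals removes the effect of unequal corpus sizes.
    This normalization is an assumed form, not the paper's formula.
    """
    share_a = a_k / total_a
    share_b = b_k / total_b
    total = share_a + share_b
    if total == 0:
        raise ValueError("theme has no documents in either group")
    return share_a / total, share_b / total


def commonness(u):
    """Inverted quadratic on [0, 1]: 1 at u = 0.5, 0 at u = 0 or u = 1."""
    return 4.0 * u * (1.0 - u)


# A theme with the same relative share in both groups is fully common.
u_a, u_b = common_skewness(30, 60, 300, 600)
c_k = commonness(u_a)

# A theme that appears in only one group has no commonness.
v_a, v_b = common_skewness(10, 0, 300, 600)
```

Because the parabola is symmetric about 1/2 and the two skewness values sum to 1, `commonness(u_a)` and `commonness(u_b)` always agree, which is why a single value per theme suffices.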

Data Preprocessing and Thematic Extraction from the Urban Expansion Literature.
The preprocessing work in this article mainly involves word segmentation, removal of stop words, root restoration, and marked information removal. In view of the language characteristics of English articles, the words in a text can be directly divided by spaces and punctuation. Stop word removal discards those words that do not provide useful information for the text analysis, such as auxiliary words, pronouns, conjunctions, and adverbs. According to the characteristics of the collected urban expansion literature, this article expands the stop word list to include some additional words (e.g., "data," "study," and "use") that appear frequently in the field yet have no effect on the experimental results. Root restoration restores words to their corresponding roots. After such processing, the number of feature items in the sample set can be greatly reduced, and the efficiency of thematic extraction can be improved. Themes are abstract concepts, and the number of themes in the corpus can be quantified by dividing them into different granularities. The number of themes in the LDA model should be specified in advance. A larger corpus corresponds to a greater number of themes, and this number changes dynamically across different time windows.
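The first three preprocessing steps can be sketched with the Python standard library. The stop word list is a tiny illustrative subset, and `crude_stem` is a deliberately naive suffix stripper that stands in for whatever root restoration tool was actually used, which the text does not name. Marked information removal is not covered.

```python
import re

# Small illustrative stop word list, including the domain-specific
# additions mentioned in the text ("data", "study", "use").
STOP_WORDS = {
    "the", "of", "and", "in", "to", "a", "is", "on", "for", "this",
    "data", "study", "use",
}

SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix if at least three letters remain.

    A naive stand-in for root restoration, not a real stemmer.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, and reduce the remaining words to roots."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        root = crude_stem(token)
        # Check again after stemming so that "uses" -> "use" is dropped.
        if root not in STOP_WORDS:
            kept.append(root)
    return kept


tokens = preprocess("This study uses data on expanding cities.")
```

The roots produced this way need not be dictionary words (here "cities" becomes "citi"); what matters for thematic extraction is that inflected forms of the same word collapse to one feature.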
This article uses perplexity to determine the optimal number of themes [43]. Perplexity gradually decreases along with an increasing number of themes. A lower perplexity corresponds to a stronger generalization ability and better performance of the model.
By calculating the perplexity for each candidate number of themes, the optimal number of themes in the LDA model employed in this work is determined to be 29. Experts in the field of urban expansion have read the sample of thematic classification literature and observed a relatively high accuracy (with a classification error rate of less than 3%). The boundaries between the themes are clear, and the division effect is ideal. For ease of reference, these themes are named based on keywords (Table 1).
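The selection of the number of themes by perplexity can be sketched as follows. The perplexity definition is the standard one, the exponential of the negative average per-token log-likelihood. Picking the candidate with the lowest value is an assumed selection rule, and the numbers in the example are purely illustrative and are not results from this study.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum_d log p(w_d) / sum_d N_d) over the documents."""
    total_tokens = sum(doc_lengths)
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def pick_num_themes(perplexity_by_k):
    """Return the candidate number of themes with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# One document of four tokens, each with model probability 0.25.
example = perplexity([4 * math.log(0.25)], [4])

# Hypothetical perplexities for three candidate theme counts.
best_k = pick_num_themes({5: 120.0, 10: 95.0, 15: 101.0})
```

A perplexity of 4 in the first example means that the model is as uncertain as a uniform choice among four terms, which gives an intuitive reading of the measure.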

Identification of Key Themes in the Urban Expansion Literature.
The confusion matrix in the HMM indicates the possibility of transforming a hidden state into an observable state. This probability in turn can measure the threshold barriers for transitions among the 29 themes in urban expansion research and can characterize the direction and extent of thematic evolution. The dark (light) squares in the confusion matrix heat map represent those themes that are easy (difficult) to transfer in the innovation process (Figure 2). Most themes in the field of urban expansion show limited movement, and the thematic evolution is relatively stable. Varying degrees of transfer possibilities are also observed among different themes. To highlight the transfer relationship among these themes, this paper draws a confusion relationship network diagram (Figure 3) where the direction of arrows indicates the direction of thematic transfer. Figure 3 shows that certain themes, including themes 26 (temperature), 15 (urban agglomeration), and 11 (economic development), have a high proportion of transfer inflow and a small proportion of transfer outflow. These themes are identified as core themes in the field of urban expansion. Therefore, transfer inflow and outflow are important manifestations of the criticality of a theme. To measure such criticality, this paper uses the PageRank link analysis algorithm, which obtains the critical evaluation of each node based on PageRank scores. A higher score corresponds to a higher criticality of a theme. The results are shown in Table 2. Table 2 shows the key themes in the field of urban expansion, including themes 26 (temperature), 15 (urban agglomeration), 11 (economic development), 13 (housing development policy), 17 (surface change), and 9 (population density), of which temperature is the most critical. One obvious feature of urban expansion is the continuous increase in the area and density of various buildings in urban construction, which leads to the transformation of many natural surfaces into impervious surfaces.
The changes in the type and spatial structure of land cover affect the storage and transmission of surface temperature, thereby generating urban heat island effects [44]. Using remote sensing technology in analyzing surface thermal infrared information makes the result of urban spatial temperature distribution more accurate than the traditional calculations based on surface meteorological data. Therefore, such information provides a reliable basis for quantitatively studying the spatial distribution of urban thermal environments [45]. Studies on the relationship between urban expansion and surface radiant temperature based on remote sensing technology are of great significance for improving urban thermal environments.
Scholars have also investigated those factors that drive urban expansion and found that economic development (theme 11), population density (theme 9), and housing development (theme 13) have important effects on urban expansion [46,47]. Urban expansion and economic development conform to the Kuznets curve. During the initial stage of urbanization, economic development requires the development of a large amount of construction land and infrastructure land, thereby resulting in the outward expansion of cities. However, with the adjustment of the industrial structure, the improvement of infrastructure, and the increasing intensiveness of land use, the rate of urban expansion will decline [48]. Moreover, urban land is the main place that supports human life, work, and study. An increase in the urban population will inevitably increase the pressures on housing, transportation, and public facilities. Therefore, the demand of the urban population for space will generate momentum for urban expansion. For example, by studying the law of urban expansion and population growth in the metropolitan regions of the USA, Marshall [49] found that the average land area needed to support a new urban resident is twice as large as the per capita land area of the existing city. Moreover, due to the agglomeration economy of sharing, matching, and learning in urban areas, enterprises and laborers are constantly attracted to these areas. However, the urban space is limited, and the constant gathering of the labor force has increased both housing prices and living costs. People are also forced to settle further away from the city center and pay high commuting costs. When the costs of living and commuting are high enough, these laborers will move elsewhere due to the low net utility of living in urban areas. In this case, the government invests in the conversion of land into urban transportation infrastructure [50].
By substituting commuting and housing costs [51], the negative impact of rising housing costs is weakened, thereby facilitating a continued urban expansion.

Identification of Common Themes in the Urban Expansion Literature.
Based on the abovementioned thematic distribution, the proportion of each theme in the documents of developed and developing countries after unitization is calculated, and the degree of commonness of these themes is measured using formulas (4) and (5). The above results are then used to plot the degree of commonness of each theme in a graph as shown in Figure 4. The red and blue bars indicate the proportion of relevant documents in developing and developed countries after unitization, respectively, whereas the line indicates the degree of commonness of themes. The common themes in the field of urban expansion include themes 16 (green space), 26 (temperature), 4 (urban planning management), 2 (spatial pattern), 18 (coastal urban), and 5 (scenario prediction). With the transformational improvement of research data and technical research methods over the last few years, the available methods for urban expansion research have further expanded to scenario prediction [52], 3S spatial analysis [53], spatial econometrics [54], cellular automata [55], and multiagent simulation [56].
Using these methods to explore the spatial-temporal pattern distribution of urban expansion and effectively describe, simulate, analyze, and predict the process of urban evolution can provide decision-making support for urban planning and management. In addition, urban expansion research in developed and developing countries has mainly focused on coastal urban areas [57,58] because, compared with other cities, coastal cities have unique geographical locations and resource advantages, and their urban expansion is highly susceptible to economic development, land-use policies, and regional development policies. Some significant differences in future land-use change are also observed under different development strategies. To expand living and development spaces, coastal areas are reclaiming land from the sea to address the increasingly serious problem of scarcity of land resources [59]. Reclaiming land from the sea is a large-scale human process that greatly disturbs the geographic processes of coastal zones. On the one hand, such land reclamation can increase food supply, attract more investments, and provide a new development space for urban areas. On the other hand, this reclamation can also reduce the service functions of marine ecosystems, destroy the ecological security of bay landscapes, result in marine sedimentation, degrade the quality of marine environments and habitats, and reduce coastal biodiversity [60,61]. Therefore, how to protect coastal zones during their development has become a research hotspot. By combining the aforementioned key and commonness indices, a key and commonness bubble chart for themes in the urban expansion field can be drawn (see Figure 5).
This bubble chart is divided into the following quadrants based on the mean values of key and commonness: high degrees of key and commonness (first quadrant), high degree of key and low degree of commonness (second quadrant), low degrees of key and commonness (third quadrant), and low degree of key and high degree of commonness (fourth quadrant). The first quadrant has eight themes, namely, themes 26 (temperature), 15 (urban agglomeration), 11 (economic development), 17 (surface change), 4 (urban planning management), 7 (urban sprawl), 27 (urban carbon), and 19 (transportation emission), of which themes 7 (urban sprawl) and 4 (urban planning management) have more documents than the median. In other words, these themes have received much attention in urban expansion research and are considered key research directions in this field. The connotations of urban sprawl include the following: (1) urban sprawl is a unique way of urban growth that usually occurs when the land development rate exceeds the population growth rate; (2) urban sprawl is characterized by low density, fragmentation, unsustainability, single-form development, excessive reliance on motor vehicles, and massive consumption of agricultural and ecological lands [62]; and (3) urban sprawl has a series of negative effects on traffic flow, plant and animal habitat, the ornamental nature of natural landscapes, and water circulation mechanisms [63,64]. An in-depth study of urban sprawl has resulted in the formulation of three main theories in the field of urban expansion, namely, compact city theory [65], smart growth theory [66], and new urbanism theory [67]. Urban sprawl control methods can also be divided into two categories.
The first category includes the urban planning methods that are implemented by the government and have attributes of administrative orders, such as urban growth boundary, zoning, planned unit development, transfer of development rights, traditional neighborhood development, and transit-oriented development [12,68,69]. These measures are based on the best spatial structure and scale of urban areas and directly affect the development decisions of landowners and developers. The second category includes guided regulation measures that are based on market orientation, including land development, fuel, property, and split-rate taxes. These measures do not compulsorily regulate the behavioral choices of people and have indirect control over the urban sprawl. In curbing urban sprawl, the pure market mechanism has a very limited influence on the development of compact cities. Therefore, the government needs not only to formulate various urban sprawl control measures but also to ensure that the relevant policies match the legal and political environment while restraining rapid urban expansion [70,71].
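The quadrant assignment behind the key and commonness bubble chart described earlier in this section can be sketched as follows. The theme labels and scores are made up for illustration, and treating a score equal to the mean as "high" is an assumption.

```python
def quadrant(key_score, common_score, key_mean, common_mean):
    """Map a theme to quadrant 1-4 using the mean values as cut-offs.

    1: high key, high commonness    2: high key, low commonness
    3: low key, low commonness      4: low key, high commonness
    Scores equal to the mean count as high (an assumption).
    """
    high_key = key_score >= key_mean
    high_common = common_score >= common_mean
    if high_key and high_common:
        return 1
    if high_key:
        return 2
    if high_common:
        return 4
    return 3


def assign_quadrants(themes):
    """themes maps a theme label to a (key score, commonness) pair."""
    key_mean = sum(k for k, _ in themes.values()) / len(themes)
    common_mean = sum(c for _, c in themes.values()) / len(themes)
    return {
        name: quadrant(k, c, key_mean, common_mean)
        for name, (k, c) in themes.items()
    }


# Hypothetical scores, not values from the paper.
themes = {"A": (0.9, 0.9), "B": (0.8, 0.1), "C": (0.1, 0.2), "D": (0.2, 0.8)}
quadrants = assign_quadrants(themes)
```

Using the means as cut-offs makes the quadrants relative to the corpus at hand, so the same theme could change quadrant if the set of themes changes.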

Forecast of Thematic Evolution in the Field of Urban Expansion.
This paper uses 2020 as the forecast base period and imports the confusion matrix parameters into the HMM module to obtain the hidden Markov forecast results for the evolution of themes in the urban expansion literature from 2020 to 2025 (Figure 6). The proportion of landscape patterns in the prediction results increases rapidly from 3.09% to 5.14%. The natural landscape is an important environmental resource in the urban ecosystem that has significant ecological and social functions. Meanwhile, rapid urban expansion is a process in which man-made landscapes gradually erode, occupy, and transform natural landscapes, including forest land, cultivated land, lakes, and grassland, under the influence of human disturbance. Therefore, rapid urban expansion not only reduces the natural landscape area but also results in the fragmentation of natural landscape patterns. A landscape tends to change from a single, homogeneous, and continuous whole into a complex, heterogeneous, and discontinuous patch mosaic [72,73]. The fragmentation of the urban landscape not only reduces the quality of the living environment of residents but also seriously endangers the urban ecosystem and urban sustainable development. Therefore, quantitatively identifying the urban landscape based on a remote sensing index (e.g., vegetation, impervious, and water indices) and exploring the responses of natural landscapes to urban expansion have become important ways of understanding the ecological effects of urban landscape evolution [74,75] and provide valuable references for regional urban planning and ecological construction.
Agricultural land change remains the main direction in urban expansion research. The cultivated land occupied by urban expansion faces not only a decreasing quantity but also changes in its quality. The areas that surround cities have favorable topography, water conservancy, and transportation conditions. Urban expansion therefore often encroaches on high-quality cultivated land [76] and affects cultivated land-use intensity in two ways. On the one hand, urban expansion easily results in the scarcity of cultivated land resources. The intensity of cultivated land use increases along with the continuous growth of population and demand for food. In addition, the rapid intensification of agricultural production increases the application of chemical fertilizers and pesticides per unit area of cultivated land, thereby bringing agricultural nonpoint source pollution and ecological damage to the environment [77]. On the other hand, the increasingly open labor market promotes the transfer of agricultural labor, which reduces agricultural labor input or leads to the abandonment of cultivated land. After the abandonment of cultivated land, the natural succession of farmland ecosystems destroys species habitats and degrades traditional agricultural landscapes with a high conservation value [78]. In addition, some species that live in the farmland system, especially birds and arthropods, will begin to disappear. The natural succession after abandonment also homogenizes the vegetation on abandoned land, thereby increasing the risk of fire and reducing biodiversity by promoting the growth of pyrophytes [79]. Therefore, studying the evolution of the spatial-temporal pattern of cultivated land occupied by urban expansion can provide technical support and a decision-making basis for handling the relationship between urban expansion and cultivated land protection and for scientifically coordinating urban development. Examining such evolution also has important practical significance in realizing sustainable land use.

Conclusion
Our study combines the LDA topic model with HMM to develop a new method for identifying key and common themes from the urban expansion literature. This method overcomes the subjectivity of traditional methods. By applying text mining to a large number of studies in the field of urban expansion, an accurate thematic classification can be achieved, and the identified themes meet the empirical expectations. This study provides theoretical and operational support for identifying key and common themes in the field of bibliometrics.
To study the development trends in the field of urban expansion, this paper divides the scientific literature into 29 themes. By considering both the critical score and degree of commonness, a total of eight important themes for developed and developing countries are identified, of which six themes (i.e., temperature, urban agglomeration, economic development, surface change, urban carbon, and transportation emission) have fewer documents than the median number. Future works should focus on these themes in light of the practical problems being faced in the urban expansion field. The key and common theme identification methods proposed in this paper have good clustering effects, clear thematic boundaries, and accurate recognition results, all of which demonstrate their effectiveness and practicability. Future research may consider increasing the scope of the literature collection and including multisource heterogeneous documents to achieve a more comprehensive identification of key and common themes. However, this article also has shortcomings. The data come only from the core database of Web of Science, so the comprehensiveness of the data cannot be guaranteed. This may affect the accuracy of the analysis results. Therefore, in future research, various databases should be combined to broaden the data sources in order to more accurately identify the key and common themes in the field of urban expansion.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this study.

Authors' Contributions
Yanwei Zhang and Xinhai Lu contributed equally to this work and share the first authorship.