The Temporal Spatial Dynamic of Land Policy in China: Evidence from Policy Analysis Based on Machine Learning

Extracting useful information from a large number of policy texts is a challenging and insufficiently discussed topic. Using a large sample of policy texts and a machine learning method, this study addresses this research gap by systematically analyzing the temporal evolution and spatial differentiation of China’s land policy from 1998 to 2018. A framework comprising six major themes of land policy, namely, “land development, land acquisition and demolition, cultivated land protection, land planning, land consolidation and utilization, and land confirmation and transfer,” is first established according to the theoretical and institutional background of land management. Based on this framework, the Latent Dirichlet Allocation analysis of more than 20,000 policy documents at different levels of government shows that (1) temporally, the priority of land policy evolves with the spirit of the central documents and the macropolitical and economic conditions and (2) spatially, there are significant differences in land policies among provinces. Overall, the analysis of land policy documents shows the tradeoff between cultivated land protection and land development, as well as the emphasis on other topics, with land policy priorities changing across periods and regions.


Introduction
Over the past 40 years of reform and opening up, China has witnessed rapid economic growth and urbanization. According to the National Bureau of Statistics of China, the urbanization rate increased from 17.92% in 1978 to 60.60% in 2019. China's unique land system has played a crucial role in its industrialization and urbanization. For example, George and Samuel pointed out that the liberalization of land rights initiated industrialization from 1981 to 1994, and land-based development has promoted urbanization [1]. Wang and Lu argued that the "land expropriation-land transaction" model determined by the dual urban-rural land structure has been an important impetus for China's rapid urbanization [2].
China's land policy has undergone continuous evolution since the Third Plenary Session of the 11th Central Committee of the Communist Party of China (CPC), which initiated the household responsibility system (HRS). For example, the Constitution of 1982 stipulated that "no organization or individual may occupy, sell, lease, or illegally transfer land in any forms." However, the first public auction of land in Shenzhen in 1987 prompted a constitutional amendment in 1988 to allow the transfer of land use rights. In contrast, land development in the 1990s led to a sharp decline in arable land, and in 1998, the Land Administration Law was amended to emphasize the protection of arable land. A recent case is the latest Land Administration Law (2020), which emphasizes the reform of the land expropriation system and the protection of farmers' rights and interests. Those cases show that land policies have changed over time. However, what is the evolution trend of land policy in China? What is the driving force for policy evolution? What is the future of China's land policy?
Existing studies have made some attempts at these questions, mainly through the analysis of land policies. For example, Wang et al. used 192 central-level policy texts on idle land management from 1992 to 2015 to conduct an econometric analysis and explored the policy tools used by the central government in managing idle land issues [3]. Lv et al. used 59 relevant policies as samples to analyze the evolution process of collective construction land transfer policies [4]. Those studies help to understand the evolution logic of China's land policy, yet they also have limitations. One of the major limitations is that those studies mainly concentrate on a limited number of policy texts of the central government or of local governments in certain regions, which limits the applicability of the results and may lead to selection bias. Therefore, a larger number of policy texts with wider coverage is needed for the analysis of land policies.
Based on this requirement, we collected all land-related policy documents issued by all levels of government in China from 1998 to 2018 (N = 22,659) and analyzed them with the Latent Dirichlet Allocation (LDA) model. With this policy analysis, this study explored the evolution process of China's land policy over the past two decades, including the changes in policy priorities and the spatial differentiation of land policies in China.
The remainder of this study is organized as follows. Section 2 introduces the institutional background and lays the foundation for the topic classification in the following sections. Section 3 introduces the data and study methods used in this study. Section 4 presents the machine learning results based on the theoretical and institutional background. Section 5 further analyzes the characteristics of the temporal and spatial evolution of China's land policy. Section 6 presents the conclusion and discussion. The abbreviations in this study are all listed in Table 1.

Theoretical and Institutional Background
2.1. Literature Review. Studies on government policies are important for their theoretical significance as well as their practical meaning [5,6]. For example, based on studies on environmental protection policies, Bao et al. find that the prices of battery electric vehicles are more dependent on government subsidy than those of fuel vehicles [7]. It has also been shown that government subsidies and carbon emission reduction have an effective influence on the development of electric vehicles [8][9][10]. However, policies of government subsidies can also have a negative effect, for instance, on the supply chain system's stability [11]. Furthermore, policies are found to have a significant impact on firms' production behavior [12,13], economic returns [14], and even the fluctuation of international markets [15]. These empirical results all suggest the importance of extracting information from policy texts and evaluating the effects of policy.
Therefore, it is essential to study land policy texts to understand the evolution of China's land policy and land system. However, due to the multilevel administrative system in China, the number of policy documents issued by the multilevel governments is very large, making it impossible to conduct textual analysis manually. Thus, previous studies on the analysis of policy texts have mainly focused on some important policy documents. For example, Wang et al. used 192 central-level policy texts to explore the policy tools for managing idle land issues [3]. Lv et al. used 59 relevant policies to analyze the evolution process of collective construction land transfer policies [4]. The manual screening of a limited number of policy documents may lead to the problem of "seeing trees but not the forest" or falling into the trap of selection bias. Moreover, previous studies mainly focus on the policy documents issued by the central government, with little notice of the spatial differences in policy priorities in different regions. Therefore, it is necessary to include a larger number of policy texts with wider coverage, which calls for more advanced text analysis methods.
The core challenge of text analysis is how to extract the required information accurately and how to classify and analyze it efficiently. Machine learning is widely used for information extraction, classification, and analysis. For example, a modified region growing algorithm and an adaptive genetic fuzzy classifier are used for noise removal, segmentation, feature extraction, and recognition, with the aim of extracting and recognizing sign gesture language to facilitate gesture-based communication [16,17]. To enhance communication efficiency, a federated communication framework named TKAGFL has been proposed to deal with the problems of update strategy and data heterogeneity, which is expected to benefit the application of federated learning in industry [18]. Drawing on these studies, we use machine learning to analyze land policies and to summarize the evolution logic of the land system in China.

Institutional Background of Land System in China.
Under the dual land ownership system, the land system in China is an important part of linking urban and rural development and a necessary basis for coordinating urbanization, industrialization, and agricultural stability [19,20]. Over the past 20 years, land use and development, especially the large-scale transformation from agricultural land to nonagricultural land, have provided the necessary conditions for rapid economic and social development [21]. At the same time, conflicts, such as the decline in arable land, mismatches in land supply structure, and land requisition conflicts, have prompted a series of changes in the land system.
The year 1998 was a milestone of land system reform in China. The second revision of the Land Administration Law basically laid the foundation for the current land system in China. However, land urbanization has also led to a contradiction between "preserving rice bowls" and "promoting development." Due to food security concerns, the Land Administration Law in 1998 established, for the first time in the form of legislation, the protection of arable land as a basic national policy. The first policy statement released by the central government in 2004 clearly requires governments at all levels to implement the strictest protection of arable land. According to the 2007 Government Work Report, 1.8 billion mu of arable land is a red line in China that must not be crossed. The protection of arable land has always been the top priority of land policy [24].
Land-driven industrialization and urbanization have brought about rapid economic development. However, problems such as overexploitation, inefficient land use, and irrational structures have emerged in this process. In the past decade, the supply of state-owned construction land has increased rapidly, reaching a peak of 730,000 hm² in 2013. Such an immense amount of land supply is accompanied by the inefficient use of land. The total amount of construction land approved by the central and provincial governments from 2012 to 2016 was 1.97 million hm², and of this total, the amount of land on which construction had not been started or completed on schedule accounted for nearly one third (the data source is the "China Land and Resources Statistical Yearbook" and "China Land and Resources Bulletin" over the years). Such a background has prompted the government to strengthen regulations related to land planning, consolidation, and utilization. The central government has issued a series of land policies to regulate land development and agricultural land conversion. The year 2014 was a turning point for urbanization in China over the past 20 years. The National New Urbanization Planning for 2015-2020 clearly points to the need to shift from land urbanization to population urbanization. Correspondingly, promoting the coordinated development of urban and rural areas has become a key topic of the land system. Rural development relies on land development, and the confirmation and transfer of land rights are inevitable requirements of rural land development [25][26][27]. Accordingly, the first policy statement released by the central government in 2013 proposed to complete the confirmation, registration, and certification of rural land contractual management rights within five years, and that of 2014 allowed rural collective construction land to be sold, leased, and pooled as shares, with equal access to the market and with the same rights and prices as state-owned land.
At the same time, the rural homestead system was reformed to promote the mortgage, guarantee, and transfer of housing property rights. How to stimulate the vitality of rural land through the reform of property rights has become an important issue in land management in China.
In summary, over the last two decades, urbanization in China has undergone a transformation from land urbanization to population urbanization, and the land system has also been transformed from urban development to urban-rural coordinated development. In this process, the land policy has been constantly adjusted with a focus on economic and social development and the contradictions that emerged during this period. Based on a simple review of the institutional background, this study identified six core topics of China's land policy in the past 20 years, i.e., (1) land development, (2) land expropriation and demolition, (3) farmland protection, (4) land planning, (5) land consolidation and utilization, and (6) the confirmation and circulation of rural land rights.

Machine Learning for Text Documents' Classification.
Along with the transfer of text information from paper media to Internet media, the cost of text data collection and transmission has been greatly reduced, which provides application scenarios for Natural Language Processing (NLP). The core challenge of text analysis is how to extract the required information from the text accurately and efficiently. To achieve this, the computer is required to analyze and process language like a human being. One of the most important parts is to classify the text. Generally speaking, machine learning algorithms can be divided into three categories: supervised, semisupervised, and unsupervised methods. In this section, we briefly introduce some widely used algorithms, including Naïve Bayesian algorithms, support vector machines (SVMs), K-nearest neighbor (KNN), and neural networks.

Naïve Bayesian Algorithms.
The Naïve Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem with strong independence assumptions. Based on the independence assumptions, the order of features is irrelevant; therefore, the existence of one feature has no influence on other features in the classification tasks [28]. Despite these oversimplified assumptions, Naïve Bayes classifiers have been shown to work unexpectedly well in many complex real-world classification applications [29,30].
The expression of the Naïve Bayesian algorithm is given in equation group (1). Only a small amount of training data is needed to estimate the parameters for classification, which is an obvious advantage of the Naïve Bayes classifier. In addition, the Naïve Bayes classifier has been shown to perform well on numeric and textual data, and its computation is easier and more efficient than that of other algorithms. However, the defects are also obvious. With real-world data, the conditional independence assumption will be violated. It also works poorly when features are highly correlated, and the frequency of a word is neglected; therefore, its applicability is seriously limited.
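As an illustration of the independence assumption, the following is a minimal sketch of a multinomial Naïve Bayes text classifier, which scores a class c by log P(c) plus the sum of log P(w_i | c) over the words of a document. It is not the expression used in this study; the add-one smoothing, the function names, and the toy documents are all assumptions introduced for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(w | c) with add-alpha smoothing (an assumption)."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like

def predict(doc, log_prior, log_like):
    """Return the class maximizing log P(c) + sum_i log P(w_i | c).

    Words never seen in training are skipped.
    """
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy, made-up training data (already segmented into words).
train_docs = [
    ["arable", "land", "protection", "reclamation"],
    ["farmland", "protection", "conversion", "approval"],
    ["construction", "land", "auction", "development"],
    ["development", "zone", "construction", "transfer"],
]
train_labels = ["protection", "protection", "development", "development"]

log_prior, log_like = train_naive_bayes(train_docs, train_labels)
predicted = predict(["arable", "protection"], log_prior, log_like)
```

Because each word contributes an independent term to the score, the order of the words in the document does not affect the prediction.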

Support Vector Machines (SVMs)
Support vector machines (SVMs) are a discriminative classification method based on the structural risk minimization principle [31]. The core of this principle is to find a hypothesis that guarantees the lowest true error, which makes SVMs more accurate. SVMs are used to find the linear separating hyperplane that maximizes the margin between two datasets, i.e., the optimal separating hyperplane (OSH). The key lies in the calculation of the margin, which is based on the construction of two parallel hyperplanes, as shown in equations (2) and (3). The margin lies on each side of the separating hyperplane, which is "pushed up against" the two datasets. The generalization error of SVMs decreases as the margin increases. To increase the margin, the hyperplane is required to have a larger distance to the neighboring data points of both classes.
The margin is maximized, and the corresponding Lagrangian is formed by introducing Lagrange multipliers α and β. SVMs are prominent for their classification effectiveness [32,33], which makes them very suitable for theoretical understanding and analysis [34]. They also perform well on documents with a high-dimensional input space, since most of the irrelevant features in the documents are weeded out. However, the training and categorizing algorithms of SVMs are more complex than those of other methods. Besides, more time and higher memory consumption are required in the training and classifying stages. Furthermore, as the similarity is typically calculated for each individual category, confusion may arise when documents are assigned to several categories in the classification.
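For orientation, a common textbook form of the margin and the Lagrangian is sketched below. The equations of the original text could not be recovered, so this is not necessarily the authors' formulation: the soft-margin variant, the slack variables ξ_i, and the penalty constant C are assumptions, chosen because two sets of multipliers (α and β) arise naturally in that setting.

```latex
\begin{align*}
% Two parallel supporting hyperplanes and the margin between them
&\mathbf{w}\cdot\mathbf{x} + b = +1, \qquad
 \mathbf{w}\cdot\mathbf{x} + b = -1, \qquad
 \text{margin} = \frac{2}{\lVert \mathbf{w} \rVert} \\
% Maximizing the margin is equivalent to minimizing the norm of w
&\min_{\mathbf{w},\, b,\, \xi}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C\sum_{i}\xi_i
 \quad \text{s.t.} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0 \\
% Lagrangian with multipliers alpha_i >= 0 and beta_i >= 0
&L(\mathbf{w}, b, \xi, \alpha, \beta) = \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C\sum_{i}\xi_i
 - \sum_{i}\alpha_i\left[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i\right]
 - \sum_{i}\beta_i\xi_i
\end{align*}
```

Here the distance between the two hyperplanes is 2/‖w‖, so a smaller ‖w‖ corresponds to a larger margin.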

Neural Networks.
Artificial neural networks are composed of a large number of elements called artificial neurons. Compared with the elements of traditional architectures, artificial neurons have an input fan-in that is larger by orders of magnitude [35,36]. Besides, they are more sensitive to stored items and more suitable for distortion-tolerant storage and, therefore, can store a greater number of items represented by high-dimensional vectors. Artificial neural networks interlink those neurons into groups with a mathematical model of information processing, as shown in equation (4). As a result, artificial neural networks have some obvious advantages. The main advantage is that they perform well in complex domains, on documents with high-dimensional features, and also on noisy, contradictory, discrete, and continuous data. Besides, a parallel computing architecture is employed to provide linear speedup in the matching process of computational elements. In this process, the input value of each element can be compared with the values of stored cases. However, the drawbacks are also obvious. Though testing is very fast, training is slow. For users, learned results are more difficult to comprehend than learned rules. Also, empirical risk minimization (ERM) enables artificial neural networks to minimize training error, yet it may result in overfitting.
For pattern $p$, the output from neuron $j$ is denoted $O_{pj}$ (equation (4)).
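The output of a single artificial neuron can be sketched as follows. Since the original equation could not be recovered, the logistic (sigmoid) activation and the bias term are assumptions, and the function name and the numeric values are hypothetical.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Output of one neuron for one input pattern.

    The net input is the weighted sum of the inputs plus a bias, and the
    output is the logistic function of the net input (an assumed activation).
    """
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# Made-up pattern: the weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 = 0.0.
o_zero = neuron_output([1.0, 2.0], [0.5, -0.25])
# Made-up pattern with a positive net input of 0.5 + 0.5 + 1.0 = 2.0.
o_pos = neuron_output([1.0, 2.0], [0.5, 0.25], bias=1.0)
```

A net input of zero gives an output of 0.5, and larger net inputs push the output toward 1.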

Data Source and Brief Description.
The data in this study were collected from the "Faxin" online platform (https://www.faxin.cn/, where a more detailed introduction can be found). The "Faxin" online platform was established in 2012 and is maintained by the Supreme People's Court. After several years of development, it has become an advanced digital network platform that deeply integrates legal knowledge services and big data services for cases in China, including more than 1.4 million laws, regulations, and policy documents.
Using the search engine provided by "Faxin," all land-related regulations issued by the central and local governments (hereafter referred to as policy documents) from 1998 to 2018 in China were collected, for a total of 22,659 documents. Among the 22,659 policy documents, 1,262 were issued by the central government, and the remaining 21,397 were issued by local governments. Figure 1 shows the distribution of land-related policy documents in the past 20 years. The overall number of land-related policy documents grew significantly, indicating that governments have paid increasing attention to land management issues.

Study Methods.
As mentioned above, the main task of this study is to classify more than 22,000 land policy documents in China according to their topics. Basically, text classification is a process of clustering, which groups a collection of objects into subsets or clusters that share similar topics [37]. In the last decades, the field of machine learning has produced plenty of algorithms for text mining and text clustering [38]. For example, Avalos reviewed a series of text mining methods, such as real-time text mining based on a gravitational search algorithm and a clustering approach that combines a gravitational search algorithm with k-harmonic means [39]. Recently, Zablith and Osman proposed a novel predictive analytics framework for unstructured text classification and analysis [40]. In existing studies, these methods have been widely applied in engineering management [41] and in bibliometrics [42]. Considering both the robustness and the accuracy of text mining [43,44], this study used the Latent Dirichlet Allocation (LDA) model to analyze the 22,659 policy texts. In the field of machine learning, the LDA model occupies a very important position among topic models and is often used for text classification. In essence, the LDA model is a Bayesian probabilistic model that contains a three-layer structure of corpus, topic, and word. In this model, each word in a corpus is considered to select a certain topic with a certain probability, and a certain word is then selected from this topic with a certain probability. The topic distribution of each document and the word distribution of each topic are given Dirichlet priors. In the LDA algorithm, a document in the corpus is represented as a probability distribution over topics, and a topic is a probability distribution over many words. The text clustering results generated by the LDA model reveal the keywords and specific probabilities of each topic, and researchers can interpret the meaning of the corpus accordingly [45].
Therefore, compared to manually reading and understanding a policy text, the LDA model can efficiently and accurately help researchers identify the topics of the corpus.
In brief, the LDA model assumes a hierarchical structure among words, topics, documents, and corpus. The documents and the words can be observed, but a latent structure of topics, topic distributions per document, and word distributions per topic exists [46]. Therefore, LDA can be viewed as an approach in which multiple words are estimated to be associated with a few latent topics. In more formal terms, LDA is a model based on observed variables (words) and hidden variables (topics) that define a joint probability distribution. The joint probability distribution is then used to calculate, according to Bayes' rule, a conditional or posterior distribution of the hidden variables given the observed variables [47].
The more formal mathematical notation can be presented as follows. A word is denoted as $w$, and a document is a collection of $N$ words, $\mathbf{w} = (w_1, w_2, \ldots, w_N)$, where $w_n$ is the $n$th word in the document. It is worth noting that the ordering of words is not important since the LDA model adopts the "bag of words" approach, in which the co-occurrence instead of the ordering of words is used to identify the underlying topics. Finally, a corpus is a collection of $M$ documents denoted as $D = (\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M)$. Following this notation, the generative process, that is, the assumed process that generated the documents, topics, and words, can be described as follows [39]:
(iii) For each of the $N$ words $w_n$,
    (i) choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$
    (ii) choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$
$\theta$ measures the topic proportions, that is, the probabilities of each topic in the $d$th document, which sum to one. In the theoretical issue definition model, $\theta$ was the summation of the issue dimensions weighted by salience, and that concept is operationalized as the topic proportions estimated by LDA [46]. These proportions are drawn from a Dirichlet prior, $\theta \sim \mathrm{Dir}(\alpha)$, where $\alpha$ denotes the shape parameters of the Dirichlet distribution. The number of topics $K$, and by extension the dimensionality of $\mathrm{Dir}(\alpha)$ and of the topic variable $z$, is assumed a priori and is also assumed to be fixed. Note that a proportion is estimated for each of the $K$ topics within each document. The standard setting of the Dirichlet prior $\alpha$ is $50/K$. The probability density of $\theta$ is
$$p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}$$
The distribution of words is parameterized by $\beta$.
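The generative process can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: the document length is passed in as a fixed number rather than drawn from a distribution, and the two topics, the vocabulary, and the values of α and β are made up for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, beta, vocab, rng):
    """Generate one document under the LDA generative process.

    n_words: document length (fixed here, which is a simplification)
    alpha:   Dirichlet parameters, one per topic
    beta:    beta[k][v] is the probability of word v under topic k
    """
    theta = sample_dirichlet(alpha, rng)      # topic proportions of the document
    topic_ids = list(range(len(alpha)))
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]   # z_n ~ Multinomial(theta)
        w = rng.choices(vocab, weights=beta[z])[0]     # w_n ~ p(w | z_n, beta)
        assignments.append(z)
        words.append(w)
    return theta, assignments, words

# Made-up example: topic 0 is about protection, topic 1 about development.
vocab = ["arable", "protection", "construction", "auction"]
beta = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
rng = random.Random(0)
theta, assignments, words = generate_document(20, [0.5, 0.5], beta, vocab, rng)
```

Each generated word depends only on its own topic assignment, which is why the order of the words carries no information in this model.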
As noted, the observed and latent variables form a joint distribution that, given the parameters $\alpha$ and $\beta$, the topic mixture $\theta$, a set of $N$ topics $\mathbf{z}$, and a set of $N$ words $\mathbf{w}$, can be expressed as follows:
$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)$$
This joint distribution is used to calculate a posterior distribution of topic probabilities for each document, as expressed in the following:
$$p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}$$
The numerator is the joint distribution of all random variables, and the denominator is the probability of obtaining the observed corpus under any topic model. Computing the denominator requires summing over all possible topic structures, which is intractable. Therefore, the posterior needs to be approximated using either sampling-based or variational approximations. In this study, Gibbs sampling is used to estimate the posterior distribution. By doing this, LDA can generate clusters of words. The whole process of the LDA model is summarized in Figure 2. In the field of policy text analysis, researchers can infer the topics of each policy according to the co-occurrence of words, following the logic presented in Figure 3 [46].
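To illustrate how sampling sidesteps the intractable denominator, the following is a minimal sketch of a collapsed Gibbs sampler for LDA. It is not the implementation used in this study: the symmetric hyperparameters `alpha` and `beta`, the number of iterations, and the toy corpus are assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on pre-segmented documents.

    Returns per-document topic proportions and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                  # n(d, k)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                                # n(k)
    z = []
    # Random initial topic assignment for every word token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of this token from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word

# Made-up toy corpus (already segmented into words).
toy_docs = [
    ["arable", "protection", "arable", "reclamation"],
    ["construction", "auction", "development", "construction"],
    ["arable", "protection", "reclamation", "protection"],
    ["development", "auction", "construction", "development"],
]
theta, topic_word = lda_gibbs(toy_docs, n_topics=2)
```

Each update only needs ratios of counts, so the normalizing constant of the posterior never has to be computed. The resulting topic labels are arbitrary and still have to be interpreted by the researcher.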
Before using the LDA model, the more than 20,000 policy documents need to be segmented. This study uses the Jieba library in Python to divide sentences into individual words (the Jieba library integrates two word segmentation methods based on rules and statistics, which effectively improves the accuracy of word segmentation; a more detailed introduction of the Jieba library can be found at https://pypi.python.org/pypi/jieba/). Based on the Chinese stop word list from the Harbin Institute of Technology, the relevant words in this study were updated to remove the stop words with no actual connotation (the stop word list of the Harbin Institute of Technology contains 1893 words with no actual connotation but a very high frequency, such as "oh" and "is"; including these words in the word frequency statistics would obviously affect the accuracy of the results, so they need to be eliminated). Next, the term frequency-inverse document frequency (TF-IDF) algorithm was used to optimize the word segmentation results, and the words that frequently appeared but had little impact on the topics of the corpus were excluded. After these steps, the contents of the 22,659 policy texts were transformed into 189,582 important keywords, which laid the foundation for the construction of the LDA model. For the number of topics, this study first sets the number of topics to 15 (setting the number of topics to 15 is a result of the strategy of "more is better than less"; that is, too many topics can be combined through the researcher's understanding, while too few topics may miss some important topics. In fact, as shown in Table 2, our preset number of topics is indeed too large: out of 15 topics, there are 2 topics whose meaning cannot be determined, and the content of 4 topics is repeated).
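The stop word removal and TF-IDF filtering steps can be sketched as follows. This is an illustrative sketch, not the pipeline of this study: the input is assumed to be already segmented (in practice, a segmenter such as `jieba.lcut` would produce the word lists), the rule of keeping the `top_k` highest-scoring words per document is an assumption because the exact filtering rule is not stated, and the documents and stop words are made up.

```python
import math
from collections import Counter

def tfidf_keywords(docs, stop_words, top_k=3):
    """Drop stop words, then keep the top_k words per document by TF-IDF."""
    cleaned = [[w for w in doc if w not in stop_words] for doc in docs]
    n_docs = len(cleaned)
    # Document frequency: number of documents in which each word appears.
    doc_freq = Counter(w for doc in cleaned for w in set(doc))
    keywords = []
    for doc in cleaned:
        term_freq = Counter(doc)
        scores = {
            w: (count / len(doc)) * math.log(n_docs / doc_freq[w])
            for w, count in term_freq.items()
        }
        # Sort by descending score; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords.append(ranked[:top_k])
    return keywords

# Made-up, pre-segmented documents and a made-up stop word list.
segmented_docs = [
    ["the", "land", "arable", "protection", "land"],
    ["the", "land", "construction", "auction"],
    ["the", "land", "planning", "is", "approved"],
]
stop_words = {"the", "is"}
keywords = tfidf_keywords(segmented_docs, stop_words, top_k=2)
```

In this toy corpus, the word "land" appears in every document, so its inverse document frequency is zero and it is never selected as a keyword, which mirrors the idea of excluding frequent words that say little about the topic.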

Results of the Topic Model-Based on the Theoretical and Institutional Background
In this section, the results of the topic model are presented based on the theoretical and institutional background of land policy in China. With those results, the keywords in the policy documents can be fitted into several topics; then, the proportion of each topic in all documents can be calculated based on the frequency of keywords. Those are the foundations for the analysis of the key points and the temporal and spatial evolution of China's land policy in the next section. First, we present the keywords with a word cloud, which is obtained with word segmentation and word frequency statistics. As shown in Figure 4, the larger the font size of the word in the word cloud, the higher the frequency of the word in the policy texts. In addition to the word "land," the two most frequently used words are "construction" and "cultivated land," indicating that a large portion of the land-related policy documents deals with the contradictions between "promoting development" and "preserving rice bowls." Other high-frequency words include "levy," "right to use," and "planning," indicating that issues such as land expropriation, land ownership, and land planning also frequently appear in policy documents. Those keywords and their frequency can roughly reflect the focus of land policies.
Second, the fitting results of feature words and topics by the LDA model are shown in Table 2. Among the 15 preset topics, 13 have clearly determined feature words corresponding to the topic content. Taking topic 1 as an example, according to words such as "agricultural land," "arable land," "conversion," "approval," and "reclamation," it is easy to understand that this topic discusses a series of policies for the protection of arable land in the process of agricultural land conversion, which belongs to the topic of cultivated land protection. The fitting results of the two topics (No. 14 and 15) are unclear. Therefore, they are excluded from the subsequent analysis. With these fitting results, the key points of policy documents can be easily analyzed by the classification of topics.

Data source: compiled by the authors based on the results of the LDA model of the 1998-2018 land policy texts.
However, the contents of some of the 13 topics overlap (e.g., topics 6 and 10). Such a classification may lead to confusion in the subsequent analysis. Therefore, based on the analysis of the theoretical and institutional background in Section 2, the 13 topics were reclassified, and the results are shown in the last column of Table 3. After reclassification, the 22,659 policy documents can be summarized into six major topics that match the theoretical analysis. In the following sections, the six topics are used to analyze the evolution and spatial differentiation of land policies. This reclassification reflects an understanding of the background of the land system in China. Lastly, the number and proportion of each topic are shown in Table 3, which visually displays the key points of land policy documents. Clearly, the policy documents related to the topics of arable land protection and land development together account for more than 50%. The proportions of land planning, land consolidation and utilization, and land expropriation and demolition are each approximately 10%.

The Temporal Evolution and Change Logic of China's Land Policy.
To explore the characteristics of the temporal evolution of China's land policy, the number of policy documents related to each topic in each year from 1998 to 2018 and the proportion of each topic were calculated, and the results are shown in Figure 5. Figure 5 shows that, in the past two decades, arable land protection and land development have received the most attention in land policy documents, and there is a tradeoff between them. Prior to 2016, the frequency of topics on arable land protection was significantly higher than that on land development. In particular, in 1998 and 1999, when the Land Administration Law was revised and implemented, and in the three years after the Third Plenary Session of the 16th CPC Central Committee in 2003, more than 50% of policy documents focused on arable land protection. In 2017 and 2018, land development surpassed arable land protection for two consecutive years and became the main focus of land policy. Besides, the two topics of land consolidation and utilization and land expropriation and demolition are also worth noting. These two topics gained much attention in 2013; the attention declined slightly in subsequent years, yet it remained higher than in the preceding decade. That is because the report of the 18th CPC National Congress emphasized land use efficiency and land expropriation and demolition, calling for "reforming the land expropriation system and increasing the proportion of farmers in the distribution of land value-added income" and for efforts to "substantially reduce land consumption intensity and improve utilization efficiency and benefits." The emphasis of the CPC Central Committee on land use efficiency and land expropriation and demolition has encouraged governments at all levels to pay more attention to these issues, thus promoting the introduction of more relevant policy documents.
In contrast, the proportion of policy documents on the topic of land planning has remained at a relatively stable level in the past decade, approximately 10%, while that on the topic of land rights confirmation and circulation has declined since 2015.
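The yearly topic proportions discussed above amount to a grouped share calculation, which can be sketched as follows. The records are made up for illustration and do not reproduce the data of this study; the assumption is that each policy document has already been assigned one dominant topic.

```python
from collections import Counter, defaultdict

def topic_shares_by_group(records):
    """Share of each topic within each group (for example, a year or a region).

    records: iterable of (group, topic) pairs, one pair per policy document.
    """
    counts = defaultdict(Counter)
    for group, topic in records:
        counts[group][topic] += 1
    return {
        group: {topic: n / sum(c.values()) for topic, n in c.items()}
        for group, c in counts.items()
    }

# Made-up records: (year, dominant topic of one document).
records = [
    (1998, "arable land protection"),
    (1998, "arable land protection"),
    (1998, "arable land protection"),
    (1998, "land development"),
    (2018, "arable land protection"),
    (2018, "land development"),
    (2018, "land development"),
    (2018, "land planning"),
]
shares = topic_shares_by_group(records)
```

The same function can be applied to (region, topic) pairs to obtain regional proportions of the kind compared across the four major regions.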

Spatial Differences in China's Land Policy and Causes.
China has a vast territory, and socioeconomic, cultural, and historical backgrounds vary widely among regions. Local governance is adapted to local conditions, which is reflected in the tremendous differences in policy implementation and regulations [48]. For example, government subsidies influence the supply chain and the market to different degrees, depending on the level of subsidies in each region [49]. Land policy priorities also differ spatially. According to the classification of the National Bureau of Statistics, the 31 provinces were divided into four major regions, i.e., the eastern, central, western, and northeastern regions, and the proportions of land policy documents on the various topics in the four major regions over the past 20 years are shown in Figure 6. First, for the topics of arable land protection and land development, the degree of attention to arable land protection in the northeastern and central regions (0.45 and 0.44, respectively) is significantly higher than that in the eastern and western regions (0.35 and 0.32, respectively), while the land policy of the eastern and western regions focuses more on land development. We believe that the tradeoff between "promoting development" and "preserving rice bowls" is mainly determined by two factors. The first factor is the regional economic structure. If a region is more dependent on agriculture, more attention is paid to the protection of arable land. Among the four major regions, the proportion of agriculture in total GDP in the northeastern and western regions is approximately 11%, followed by the central region at 8%, while in the eastern region it is only 4%. Therefore, policy documents related to arable land protection account for more than 40% in the northeastern and central regions. However, for the western region, the regional economic structure cannot provide a full explanation. 
Although agricultural output accounts for up to 11% of the GDP in this region, its land policy documents pay the least attention to the protection of arable land. Therefore, the second explanatory factor is the pressure on arable land protection. As pointed out by Liang et al., the western region obtained a large construction land quota after 2003 [50]. With a larger construction land quota, the pressure on arable land protection in the western region has been clearly smaller than that in other regions, so it is not surprising that the land policy of this region emphasizes the "use of land for development." In terms of other topics, the eastern region, which has gradually moved toward intensive development, has placed more emphasis on land planning than the other three regions, whereas the western region, with its loose land controls, has paid less attention to land consolidation and utilization. Agriculture plays an important role in the northeastern region; therefore, the topic of rural land rights confirmation and circulation has received significantly more attention there. As land expropriation and demolition are inevitable in the process of land development, the topic of land expropriation and demolition appears more frequently in land policy

Conclusion and Discussion
The vast number of policy documents issued by multilevel governments makes it impossible to analyze them and extract useful information from them manually. Existing studies on policy analysis mainly focus on a few important policy documents, which makes a detailed analysis of the temporal change and spatial differences in policies difficult. This study uses machine learning to analyze thoroughly the evolutionary logic and spatial differentiation of China's land policy over the past two decades. Combining policy analysis with machine learning is an innovation that may be applied to more wide-ranging text analysis.
The results show that land development and arable land protection have been the main focus of land policy over the years, and that the tradeoff between them depends largely on the political and economic background. The focus on other topics is also determined by political and economic conditions. For instance, since the 18th CPC National Congress, topics such as land use efficiency and the rights of land-expropriated farmers have gained more attention. From a spatial point of view, the focus of local land policy is largely determined by local socioeconomic characteristics and is affected by the land development mode formed under the central-local relationship.
In the future, with the end of the era of urban sprawl, improving land use efficiency may receive more attention. The contradiction between "promoting development" and "preserving rice bowls" may continue to exist, thereby challenging the land governance ability of governments. Meanwhile, land redevelopment will inevitably affect the interests of some social groups. For example, land expropriation and relocation may affect the land use rights of the original holders. Balancing the interests of all parties and effectively adhering to the "people-oriented" principle in the process of land development may become a new issue in land governance.

Data Availability
The data supporting the findings of this study are available within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest. Mathematical Problems in Engineering 11