A BP Neural Network-Based GIS-Data-Driven Automated Valuation Framework for Benchmark Land Price

The automated valuation of benchmark land prices plays an essential role in regulating land demand in the Chinese real-estate market, particularly as large volumes of data accumulate rapidly. However, the problem is highly challenging because the factors influencing land price are multidimensional, voluminous, and nonlinearly related to price. In this paper, an effective data-driven automated valuation framework is proposed for valuing real estate assets by combining a geographic information system (GIS) with neural network technology. The framework automatically obtains the values of spatial factors affecting land price from the GIS and generates training data for a neural network, which learns the complex relationship between these factors and benchmark land prices. The effectiveness and universality of the framework are verified using benchmark land price data from Wuhan. The framework can be applied to automated benchmark land price valuation in other cities.


Introduction
In China, the benchmark land price is one of the main means by which the government controls land demand at the macrolevel. The benchmark land price essentially reflects the average level of land prices in a certain area of a town, as well as its spatial distribution pattern. It can macroscopically guide land space utilization and investment behaviors and can be used as the basis for bulk land valuation. The benchmark land price is relatively stable over certain periods, generally being in line with the real estate market and urban development requirements [1,2]. However, with the development of the social economy and the progress of urbanization, a gap opens between the benchmark land price and the actual price in the land market. When this gap reaches a certain level, the benchmark land price needs to be reevaluated, which is a basic task in asset valuation. The commonly used benchmark land price valuation approaches nevertheless have defects. Firstly, the treatment of the benchmark land price impact factors is highly subjective. Secondly, when an entire city is evaluated, the large amount of data involved results in low work efficiency. Given the huge land transaction market, traditional benchmark land price valuation approaches have become increasingly unsatisfactory in application. It is therefore necessary to establish a reasonable automated benchmark land price valuation framework.
When constructing an automated benchmark land price valuation framework, we are faced with several challenges: (1) benchmark land prices often take an entire city as the valuation object, involving a large number of land parcels. Therefore, benchmark land price valuation places higher demands on data collection and processing; (2) the factors affecting the benchmark land price span multiple dimensions, and many factors have no quantitative indicators. In traditional valuation, the values of these factors are often determined based on experience, which decreases the objectivity of the valuation process; (3) the mechanisms by which impact factors act on the benchmark land price are complicated and cannot be captured by simple linear relations; (4) for the above three reasons, benchmark land price valuation is often carried out only every few years. Therefore, the guiding role of the benchmark land price in real estate transactions is reduced. The current valuation approaches cannot meet the efficiency and accuracy requirements for assessing a large number of land parcels. Thus, the valuation methods need to improve their applicability and timeliness.
In recent years, the development and application of GIS technology have provided a basis for large-scale geographic data collection which is updated in real time [3]. Xu and Li [4] conducted a GIS-based sustainability analysis of urban residential development which considered the impacts of benchmark land prices. At the same time, artificial intelligence technology has begun to play a role in the field of valuation. Neural networks can map complex nonlinear relationships between inputs and outputs and handle multidimensional complex data well. In addition, the self-training processes of neural networks avoid the need for subjective human interference. Therefore, based on data collected by GIS, this paper attempts to integrate artificial intelligence and batch valuation to design an automated valuation framework for the benchmark land price. The critical innovation of this paper lies in the proposed framework for benchmark price valuation. We use GIS technology to establish a geographic information database for collecting geographic data. To deal with the incommensurability of impact factors, we unify the quantitative criteria of these factors by using GIS-based spatial data quantification and weight setting. Considering the nonlinear interactions and relationships between impact factors and land prices, a pricing model based on BP neural networks is constructed in order to reduce the subjectivity and uncertainty of valuation and improve the effectiveness of valuation results. Finally, taking Wuhan as an example, the effectiveness of our framework is verified and the corresponding limitations are discussed. The rest of this article is organized as follows: Section 2 summarizes the relevant literature from three aspects: benchmark land price valuation, automated valuation technology, and artificial intelligence technology in land price valuation. Section 3 introduces the automated valuation framework; Section 4 uses the residential benchmark land price valuation in Wuhan as a data source to verify the effectiveness of the framework. The paper is concluded in Section 5.

Literature Review
This part summarizes the related research from three aspects: (1) benchmark land price valuation, (2) automated valuation, and (3) the application of artificial intelligence technology in the valuation field.

Benchmark Land Price.
Wang [5] proposed to predict the price curve of commercial and residential land through the urban population size and income level. The results showed that population size and income level had a positive impact on urban benchmark land price. Moreover, the benchmark land price for different land-use types often varies greatly; for example, the benchmark land price of general commercial land is typically higher than that of residential land. De Groot [6] developed a comprehensive valuation framework of ecological services and socioeconomic benefits. By introducing landscape ecology, this valuation framework could accurately reflect the value of the land itself. Davis and Heathcote [7] found a close relationship between the price of residential land in the United States and the current housing price. Demetriou [8], in a case study based in Cyprus, found that location characteristics, legal factors, physical attributes, and economic conditions had more impact on the value of land than other factors. Yang et al. [9] pointed out that demographic and economic factors have an influence on the residential land price in major cities of China and also found that the GDP has a great influence. Burian et al. [10] divided the impact factors of land price into three aspects: social, environmental, and economic. Compared with social and environmental factors, the impact of economic factors on land price was found to be relatively small. Glumac et al. [11] proposed a hedonic urban land price index which could quantify the spatial effect of land and the impact of macropolicies on land transaction volume. Li et al. [12] concluded that the price of urban land is not only influenced by the city itself but also by its interaction with nearby cities. They calculated the interaction intensity between cities based on a gravity model, in order to obtain the urban land price of a certain area. Yuan et al. [13] focused on the difference of land prices caused by land marketization and fiscal decentralization. Nakamura [14] proposed a land pricing model which considers six factors: entrepreneurship, nature conservation, resource recycling, social vitality, financial viability of local governments, and environmental quality. Tan et al. [15] considered that a newly opened subway station was more likely to raise land prices nearby than a central station. In addition, earthquake risk [16] and noise [17] have also been shown to affect the land price under special circumstances.
From the above analysis, it can be seen that the factors affecting land price mainly focus on the natural environment, economic society, population, infrastructure construction, and the correlations among relevant cities.

Automated Valuation.
In the theoretical research of automated valuation, various models have been proposed and their adaptability has been studied. Berry and Bednarz [18] and Pace and Gilley [19] proposed automated valuation for public purposes, such as for the collection of property taxes, where the hedonic price index was considered an appropriate method. IAAO [20] applied the three valuation processes of the automated valuation model (AVM), namely, model specification, model calibration, and model test and quality assurance, to the valuation research of benchmark land price. Aragonés-Beltrán et al. [21] proposed a method based on ANP which they applied to the valuation of industrial land, demonstrating that it could accurately model complex environments. Metzner and Kindt [22] further developed the hedonic price index by extracting, comparing, and integrating parameters from a large number of studies in the literature. Taking Cyprus as a case study, Demetriou [23] developed an AVM using geographic information system (GIS) data, which had a higher valuation efficiency and greatly reduced the time and resources required. The general valuation process has also begun to shift from empirical to systematic and standardized. Renigier-Biłozor [24] proposed an automated valuation model with decision theory and data mining to assess real-estate values. In response to the multidimensional factors that affect land prices, Bencure et al. [25] adopted AHP to integrate multiple factors into one model, in order to achieve the automated valuation of land. It can be seen that the most critical process in automated valuation or batch valuation models is dealing with large-scale parameters, as well as integrating multiple parameters with different characteristics into one valuation model.

Application of Artificial Intelligence Technology in Valuation.
Artificial intelligence technology has been applied in asset valuation areas such as box office revenue [26], stock value [27][28][29], and the auction price of paintings [30]. In real estate valuation, artificial intelligence technology has been widely used, especially in developed countries [31]. The most commonly used models are artificial neural networks (ANNs). ANNs can capture the nonlinear relationships between real-estate characteristics and real-estate prices without the many parameter restrictions imposed by traditional statistical methods [32].
Peterson and Flanagan [33] found that, compared with the linear hedonic price index model, the artificial neural network had evidently lower pricing error and higher pricing accuracy, which can lead to better predictions in a volatile pricing environment. Follow-up studies indicated that ANN and multiple linear regression each have their own advantages in different situations, and that neither is superior in absolute terms. When the homogeneity of the data set is low, a valuation method based on AI is more appropriate [34]. ANN, as a "black box" data-driven method, has the major disadvantage of complexity [35], which limits its use by valuators. Multiple regression analysis is more transparent and explanatory than ANN. Geographically weighted regression can be used to achieve an optimal balance between the accuracy of the valuation and the transparency of the methodology [36].
Furthermore, Bin et al. [37] used the long short-term memory (LSTM) model to improve an automatic valuation model based on Boosting Tree to predict house prices in the next few months or a year. Poursaeed et al. [38] argued that the internal and external appearance affects the value of a house and thus proposed to evaluate the visual characteristics of houses by use of convolutional neural networks (CNNs). In addition to artificial neural networks, other methods have also been used in the field of real estate valuation, such as Random Forest [39,40], Fuzzy Logic System [41,42], and Genetic Algorithms [43]. The related research illustrates that the wide application of artificial intelligence provides an opportunity to conduct automated valuation. The benchmark land price is an important tool for the government to implement macrocontrol on the land market, and its valuation is a task in asset valuation. The introduction of any new method must comply with the criteria for asset valuation. As a basic approach for land valuation, the market approach selects multiple factors that affect land value, determines the weights of these factors, quantifies the differences between the reference objects and the evaluated land with respect to them, and finally determines the land value. In essence, it is a multiobjective decision problem, and an optimization method can be introduced to improve the decision efficiency [44,45]. The calculation process of the BP neural network [46] conforms to the basic idea of the traditional market approach, which considers the effect of comparable impact factors. More importantly, the BP neural network can make up for the shortcoming of the market approach, namely, its simple linear assumption about the influencing mechanism of the impact factors. Moreover, there are many factors affecting benchmark land prices, so the first problem to be solved is the collection and integration of data on multidimensional factors. Therefore, based on the GIS and BP neural network, we propose a framework of automated benchmark land price valuation with the function of spatial data collection and processing.

Automated Valuation Framework
In this paper, we propose an automated benchmark land price valuation framework based on the BP neural network. In this framework, the land use must be determined first, as plots with different uses yield different benchmark land price types. Then, the impact factors of land price are selected, according to the land-use type, to form a value impact system. The sample data to train the neural network are collected with the assistance of the GIS. Finally, after training and testing, the neural network-based valuation model can execute the automated valuation of benchmark land prices. The basic procedure of the framework is shown in Figure 1.

Impact Factors Selection.
According to different specific uses, different functional attribute values of the same plot yield different benchmark prices. For example, the value of commercial land is mainly related to its geographical location, local economic development, lifestyles of residents, and surrounding business services; the land area, mode of land leasing, location of the district, GDP, paid-in foreign investment, unemployment, tenure of district mayor, gross industrial production, and total industrial asset affect the industrial land price [47]; meanwhile, residential land value is closely related to the surrounding cultural environment, traffic, and public facilities. After determining the specific use of the land, it is necessary to choose the corresponding impact factors. As the most common transaction object in the land trading market, the valuation of residential land has attracted the most attention [48].
For residential land benchmark pricing, impact factors exist in many aspects. Dale and McLaughlin [49] proposed that land price depends on various factors, such as economy, society, environment, and law. Using the Delphi method, local governments in China found that the impact factors for the land price were business services, public facilities, population density, transportation facilities, and so on [1]. Residential land price is also influenced by the environment and ecology [50]. Xu and Coors [51] took socioeconomic level, regional pressure of environmental constraints, population pressure, and transportation pressure as the most important impact factors affecting urban residential land price. Yang et al. [9] used geographically weighted regression (GWR) to analyze the relationships between residential land prices and housing prices. They screened out three main impact factors (immigrant population, gross domestic product (GDP), and residential investment) and found that the impact of GDP on the residential land price was more significant than other factors in the prior three years. It has been observed that location, business services, and education resources have a great influence on the spatial layout of residential land prices. The benchmark land price is the basic reference for residential land prices, and the government considers the same factors when pricing. Following this basic industry consensus, and to meet the requirements of the evaluation criteria, in this paper we chose business services, environment, traffic, public facilities, and GDP per capita as the five dimensions for impact factors of residential land price. According to the different connotations of these five dimensions, the specific impact factors of residential land price were summarized into the valuation system shown in Table 1.

Data Collection and Preprocessing.
The premise of automated valuation is the availability of a large number of standardized samples and data. As benchmark land price valuation is carried out for an entire administrative region, we imported the corresponding plot information into GIS, in order to establish a geographic information database for collecting and processing data in real time.
It should be noted that some impact factors can be quantified directly by GIS, such as the distance to the nearest business service center and the driving time to the nearest train station. The accessibility of the road network can be quantified by the number of arterial roads around the plot. Indices which are difficult to quantify, such as the intensity of nearby scenic locations and the intensity of medical support, can be processed by an index scoring method. The standards for index scoring are shown in Table 2, which was provided by our project collaborator.
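The two kinds of quantification described above can be illustrated with a minimal sketch. It assumes plot and facility locations are available as latitude/longitude pairs and uses the great-circle (haversine) distance as a stand-in for a GIS distance query; driving time would require a road-network analysis and is not covered here. The function names and the scoring thresholds are hypothetical, since the actual scoring standards are those of Table 2.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_distance_km(plot, facilities):
    """Distance from a plot (lat, lon) to the closest facility in a list of (lat, lon)."""
    return min(haversine_km(plot[0], plot[1], f[0], f[1]) for f in facilities)

def index_score(count, thresholds):
    """Ordinal score: the number of thresholds that the raw count reaches.

    The thresholds are hypothetical placeholders, not the values of Table 2.
    """
    return sum(1 for t in thresholds if count >= t)
```

For instance, `index_score(3, [1, 3, 5])` assigns a score of 2 to a plot with three nearby scenic locations under these illustrative thresholds.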

Backpropagation Neural Network Training.
In our neural network, the input layer receives the matrix of preprocessed impact factor data. The output layer yields the predicted value of the residential benchmark land price before denormalization.
The quantified values of 13 benchmark land price impact factors were selected as the input, while the output is the residential benchmark land price. The proposed BP neural network, therefore, reflects the mapping relationship of 13 independent variables to 1 dependent variable.
Based on the above analysis, the specific operation process was as follows:
Step 1: establish the value impact system of residential benchmark land price valuation.
Step 2: use GIS and the index scoring method to collect impact factor data.
Step 3: process the collected parameters and data with the mapminmax function, a min-max normalization in which X is the value after processing, x is the value of the impact factor, x_min is the minimum value of this kind of impact factor, and x_max is the maximum value of this kind of impact factor.
Step 4: establish a BP neural network model and determine the number of network layers, the number of input layer nodes, the number of hidden layer nodes, the number of output layer nodes, training functions, performance functions, and so on.
Step 5: train the model to process residential benchmark land price samples.
Step 6: test the BP neural network model. When it successfully passes the test, the model can then be used to evaluate and update the residential benchmark land price in real time.
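The normalization in Step 3 can be sketched as follows. This is an illustration under assumptions rather than the authors' exact setup: it rescales one impact factor linearly from [x_min, x_max] to a target range [lo, hi], which for the range [0, 1] reduces to X = (x - x_min) / (x_max - x_min). The target range is not stated in the text, and MATLAB's mapminmax maps to [-1, 1] by default, so both ranges are supported. The function names are hypothetical.

```python
def fit_minmax(column):
    """Return (x_min, x_max) for one impact factor across all samples."""
    return min(column), max(column)

def minmax_scale(x, x_min, x_max, lo=0.0, hi=1.0):
    """Linearly rescale x from [x_min, x_max] to [lo, hi]."""
    if x_max == x_min:
        # A constant factor carries no information; this sketch maps it to lo.
        return lo
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

def minmax_invert(X, x_min, x_max, lo=0.0, hi=1.0):
    """Undo minmax_scale, e.g. to denormalize a predicted land price."""
    return x_min + (X - lo) * (x_max - x_min) / (hi - lo)
```

The inverse function matters as much as the forward one: the network predicts a normalized price, so the same x_min and x_max fitted on the training prices must be reused to convert predictions back to price units.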

Data Collection.
We selected the most representative data samples from the 10 central districts of Wuhan, based on the latest residential benchmark land price data for 2018. The 110 groups of residential benchmark land price valuation project information were used to train and test the model. The related data sources are shown in Table 3.
All the collected information was imported to the GIS database to draw an electronic vector map of Wuhan, as shown in Figure 2. This map can be used to obtain the geographic characteristics of the impact factors, which were used to analyze the correlations between the target plot benchmark price and each impact factor.
Combining the data collection and preprocessing steps mentioned in Section 3.2, the values of the impact factors for each sample point were finally obtained. Table 4 shows 10 groups of sample data, and the descriptive statistics of the whole data set are provided in Table 5.
After a series of training simulations and tests, the optimal BP structure was determined to be 13-10-1. The other model parameters and equations are shown in Table 6.
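To make the 13-10-1 structure concrete, the following is a minimal sketch of a network with 13 inputs, 10 hidden nodes, and 1 output, trained by plain stochastic gradient descent with backpropagation. The tanh hidden activation, the linear output, the weight initialization, and the learning rate are all assumptions made for illustration; the actual training and performance functions are those listed in Table 6 and may differ. The class name `BPNetwork` is hypothetical.

```python
import math
import random

class BPNetwork:
    """One-hidden-layer network trained by backpropagation (illustrative only)."""

    def __init__(self, n_in=13, n_hidden=10, seed=0):
        rng = random.Random(seed)
        # w1[j][i]: weight from input i to hidden node j; w2[j]: hidden j to output.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, scalar output) for one input vector."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2  # linear output
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One gradient-descent update on the loss 0.5 * (y - target)^2."""
        h, y = self.forward(x)
        err = y - target  # derivative of the loss with respect to y
        for j, hj in enumerate(h):
            # Propagate the error back through the output weight and tanh,
            # using the output weight from before this step's update.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err
        return 0.5 * err * err
```

Calling `train_step` repeatedly over the normalized samples plays the role of Step 5, and `forward` on held-out samples plays the role of Step 6. A production setup would more likely rely on an established numerical library than on hand-written loops.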

Training of BP Neural Network Model.
Considering the limited sample size, we used 10-fold cross-validation to train the BP network, obtaining the following 10 sets of test residential benchmark land price results (Figure 3). The test error results of the ten groups are shown in Table 7.
These results show that the MSE varied somewhat across the 10 groups of experiments, but the overall error was within 2%, which is acceptable in practice. We randomly selected one group of data to evaluate the experimental results; the results are shown in Figure 4.
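The 10-fold split can be sketched as below. With 110 samples, each fold holds out 11 samples for testing and trains on the remaining 99. The shuffling seed and the function name are assumptions for illustration; how the samples were actually assigned to folds is not described in the text.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each sample appears in exactly one test fold, so the ten test errors together cover the whole data set once.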
According to the above results, it can be seen that the model reached its best validation performance at the 1st epoch, where the mean square error was at its lowest value. Moreover, the regression analysis shows that the training data set, testing data set, and overall data set all exhibited a linear relationship between the predicted and target values, with a correlation coefficient close to 1, as shown in Figure 5.
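The two quantities used in this evaluation, the mean square error and the correlation coefficient between target and predicted values, can be computed as in the following sketch. It assumes the correlation coefficient is the Pearson coefficient, which is the usual quantity reported in such regression plots; the function names are hypothetical.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

A coefficient close to 1 only indicates a strong linear relationship; the MSE is still needed to judge how far the predictions lie from the targets in absolute terms.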

Test of BP Neural Network Model.
In order to make the test results more intuitive and easier to understand, the output obtained by the model was denormalized to the predicted value of the residential benchmark land price, as shown in Table 8. We compared the actual value with the predicted value and also give the absolute error and the relative error. The relative error ε is the difference between the actual benchmark land price y and the forecast benchmark land price ŷ, expressed as a proportion of y. The result of the error analysis is shown in Table 9.
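The error measures can be illustrated with a short sketch. It assumes the relative error is taken as an unsigned quantity, that is, the absolute difference divided by the actual price; the source does not make clear whether a signed form was used. The function names and the numbers in the example are hypothetical and are not taken from Tables 8 or 9.

```python
def absolute_error(actual, predicted):
    """Unsigned difference between actual and predicted price."""
    return abs(actual - predicted)

def relative_error(actual, predicted):
    """Absolute error as a fraction of the actual price (assumed unsigned)."""
    return abs(actual - predicted) / abs(actual)
```

For example, an actual price of 100 and a prediction of 98 (arbitrary units) give an absolute error of 2 and a relative error of 0.02, that is, 2%.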
According to the obtained results, it can be seen that the error of the model itself was relatively small, further reflecting the applicability of the framework. This indicates that, in addition to real estate [24], automated valuation can also be used in land valuation [8,22,23,53]. Moreover, the results show that the five dimensions of impact factors on residential land price are valid, which is consistent with previous practice in the valuation industry.

Results Analysis.
Using GIS, the error comparison data were rendered as a heat map, as shown in Figure 6. Figure 6(a) illustrates that Wuhan's benchmark land prices decreased gradually from the central city to its surroundings, consistent with the distribution of land prices in most Chinese cities. Among the surrounding urban areas, it is worth noting that the prices in the Huangpi and Xinzhou districts (in the northeast) were higher than those of the Caidian and Hannan districts (in the southwest). This is mainly because there are more natural scenic spots in the Huangpi district than in the southwest urban districts, and because the Yangtze River New City, an important part of the nationally planned Wuhan Yangtze River Economic Belt, is located at the junction of the Huangpi and Xinzhou districts. These factors have promoted an increase in land prices. In Figure 6(b), it can be seen that, although the predicted value fits the actual benchmark land price well, there were still deviations in some regions. The deviations indicate that the price in the northeast was overestimated and the price in the southwest was underestimated. The sensitivity of the model to environmental and economic factors caused the price of the urban area in the northeast to be overestimated. The urban area in the southwest is a national automobile industry base, and the related employed population has a high demand for housing. The impact mechanism of the industrial base on local land prices was not fully reflected in the model. However, these factors have strong localization features. Under the premise of ensuring the universality of the framework, these factors can be added separately when conducting the valuation for specific cities.
Overall, the main reasons for the deviations between the predicted value and the actual value of the benchmark land price were as follows: (1) When quantifying the impact factors of residential benchmark land price, some factors used an index scoring method, which affected the accuracy of the valuation. (2) The heterogeneity of certain factors was ignored. For example, when considering the impact of the school district, only the distance to the nearest primary and middle schools was selected as the main consideration. However, school rankings have a more obvious impact on land prices, which was ignored in the model. (3) The analysis of macrofactors was not very comprehensive; for example, the impact of some government regional support policies was not considered in depth.

Conclusion
As one of the main means of the Chinese government's macrocontrol of land demand, the benchmark land price needs to be adjusted in a timely manner as the economy develops. The traditional benchmark land price valuation process has the drawbacks of being subjective and unable to process large amounts of data. In recent years, the development and application of GIS technology have provided a basis for the collection and real-time update of large-scale geographic data. Therefore, in this paper, we propose an automated valuation framework based on GIS data and neural networks. This framework automatically extracts spatial data which affect land prices through the GIS database, thus reducing data distortion. Using an impact factor valuation system to avoid the incommensurability of the data, we finally designed an automated valuation model based on the BP neural network, effectively reducing the subjectivity and uncertainty of the valuation.
In the case study, in which we took Wuhan as an example, we found that although there were still some errors in the model training and testing results, they generally met the accuracy requirements. It can be seen that the BP neural network performed well in automated valuation. The model can also be applied to the prediction of residential benchmark land prices outside Wuhan. If a few local factors affect the benchmark land price significantly, the model can be further adjusted to include them. As data continue to accumulate, the fit of the BP neural network can be further improved. This is of great practical significance for regulating the order of the Chinese land market, formulating relevant land management laws, and allowing the public to receive land transaction information more conveniently. Due to the project schedule requirements, the number of sample data provided by the collaborator is not large, and the effectiveness of the proposed framework needs to be further verified with a larger data set in future work. In addition, apart from the spatial data obtained from the GIS, the remaining data still require manual collection. Introducing data mining and web crawling technology to improve the automation of the proposed framework is also a focus of future work.
Data Availability
The data are available on request to the corresponding author after the paper is published.

Conflicts of Interest
The authors declare that they have no conflicts of interest.