Housing Value Forecasting Based on Machine Learning Methods

2014 Jingyi Mu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In the era of big data, many urgent problems in all walks of life can be addressed with big data techniques. Compared with the Internet, economy, industry, and aerospace fields, big data has seen relatively few applications in architecture. In this paper, the values of Boston suburb houses are forecast from actual data by several machine learning methods. Based on the predictions, the government and developers can decide whether or not to develop real estate in the corresponding regions. Support vector machine (SVM), least squares support vector machine (LSSVM), and partial least squares (PLS) methods are used to forecast the home values, and the algorithms are compared according to their predicted results. Experiments show that the data set exhibits serious nonlinearity and that SVM and LSSVM are superior to PLS in dealing with it. Because SVM solves a quadratic programming problem, it can find the global optimal solution and achieves the best forecasting performance. The computational efficiency of the algorithms is also compared according to their computing times.


Introduction
With the era of big data approaching, more and more people are engaging in data analysis and mining. Machine learning [1], as a common means of data analysis, has received growing attention. People in different industries use machine learning algorithms to solve problems based on their own industry data [2,3]. Experts in industry have used machine learning for pattern recognition [4] and fault diagnosis [5,6]. Economists have begun to use machine learning algorithms in economic modeling [7,8]. Aerospace specialists have exploited these algorithms for classification and prediction [9]. Researchers in construction have combined machine learning methods with the domain knowledge of the construction industry. Many intelligent systems are used in the construction industry, and many of them have achieved good economic and social benefits. In general, the budget of a construction project and its benefit analysis are obtained from the experience of professionals and from traditional models. If housing values can be accurately predicted, the government can carry out reasonable urban planning. Malpezzi used a historical housing price index in 1999 to predict price changes in 133 U.S. cities [10]. He argued that house prices do not change randomly but follow certain rules, so prices can be partly predicted. Anglin predicted real estate prices in Toronto by establishing a VAR model [11]. The results showed that the growth of real estate prices is associated with unemployment and consumer prices. Some experts have predicted property values with neural networks.
An artificial neural network (ANN), which is constructed from a large number of neuron nodes and corresponding weights, is an artificial system that simulates biological neural networks found in nature. Because of its good nonlinear characteristics, an ANN can approximate nonlinear functions. However, its accuracy is low and its convergence speed is not ideal [12]. Since an ANN is composed of a large number of individual neurons, the system has strong extensibility and evolutionary ability. Due to the existence of multiple equilibrium positions, an ANN may become trapped in a local minimum during optimization, and it is difficult to obtain the global optimal value [13].
In order to find the global optimal solution, the SVM method, which is founded on statistical learning theory, was put forward in the 1990s [14,15]. By solving a quadratic programming problem, SVM can find the global optimal solution, but when there are many samples the computational complexity becomes high. SVM has great advantages for problems with few samples, nonlinear data, and high-dimensional samples. It has not only a strong learning ability but also a strong generalization ability. SVM mainly solves classification and regression problems. For sample data that are not linearly separable, SVM maps the samples to a high-dimensional space to make them linearly separable [16]. LSSVM is an improved version of SVM [17]. By changing the inequality constraints in SVM into equality constraints, the original quadratic programming problem becomes the problem of solving a system of linear equations. Compared with SVM, LSSVM reduces the computational complexity and improves computational efficiency. In addition, LSSVM has fewer parameters that need to be adjusted. However, LSSVM loses the sparseness of SVM [18]. PLS was put forward by Wold in the late 1960s to solve chemical sample analysis problems. It has many advantages in problems with multiple variables, especially when the sample variables exhibit serious internal correlations. The PLS algorithm is simple and has strong explanatory power, so it has gradually been applied to areas other than chemistry [19].
In this paper, the background and current state of the application of machine learning methods are introduced first, and the development of their application to construction and real estate value is described. Then the machine learning algorithms used in this paper are introduced and their mathematical derivations are described in detail. The models constructed by SVM, LSSVM, and PLS are validated with real data on housing values in the Boston suburbs. Finally, the prediction results of these methods are discussed.

Algorithm
The Regression Algorithm of SVM.
SVM was proposed in 1995 based on statistical learning theory [20]. Compared with traditional machine learning methods, current machine learning algorithms are more logically rigorous and generalize better.
SVM is built on VC dimension theory and the structural risk minimization principle, pursuing the best balance between learning ability and model complexity. SVM has significant advantages in problems with small samples, nonlinearity, and high dimension. "Small sample" refers not to the absolute number of samples but to the fact that the number of samples SVM requires is small relative to the complexity of the problem. Solving nonlinear problems is the core of the SVM method. By introducing a kernel function and slack variables, the SVM algorithm handles linearly inseparable data. A kernel function is a function that satisfies the Mercer condition, and it effectively reduces the complexity of the calculation. SVM has great advantages in high-dimensional problems because it does not need to use all of the samples: only the support vectors are needed, which avoids difficulties in sample storage and computation [21].
SVM is mainly used for classifying samples of different categories and for regression. The classification problem mainly refers to seeking a hyperplane in a higher-dimensional space that separates the samples of different categories. Multiclass classification can be handled with SVM by combining binary (two-class) classifiers. SVM regression predicts a concrete value on the basis of the sample characteristics [22,23]. The training data set is defined as $(x_i, y_i)$, $i = 1, \ldots, n$, where $x_i$ is the input data and $y_i$ is the corresponding output data.
The regression function can be expressed as follows:

$$f(x) = w^{*} \cdot x + b, \tag{1}$$

where $w^{*}$ represents the weight vector and $b$ is a constant. Both parameters are obtained by solving the following problem:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_i - w \cdot x_i - b \le \varepsilon + \xi_i, \\ w \cdot x_i + b - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \ldots, n, \end{cases} \tag{2}$$

where $\varepsilon$ is the parameter of the insensitive loss function, $C$ represents the penalty coefficient, and $\xi_i$, $\xi_i^{*}$ are the slack variables. Introducing a kernel function $K$ solves the problem that the nonlinear mapping function cannot be expressed explicitly: the kernel function gives the inner product in the higher-dimensional feature space directly. The nonlinear regression function can be expressed as

$$f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b, \tag{3}$$

where $\alpha^{(*)} = (\alpha_1, \alpha_1^{*}, \ldots, \alpha_n, \alpha_n^{*})^{T}$ is the solution of the dual problem. As with other multivariate statistical models, the choice of model settings, such as the type of kernel function and the corresponding kernel parameters, strongly influences performance. In the SVM regression model, the regularization parameter $C$ and the $\varepsilon$-insensitive loss function play a crucial role. The regularization parameter determines the balance between maximizing the margin and minimizing the training error. If it is chosen inappropriately, the model may overfit or underfit. $\varepsilon$ is also an important parameter, whose appropriate value depends on the type of noise.
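The kernel form of the regression function and the insensitive loss can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel, which the text does not prescribe, and it uses hypothetical dual coefficients chosen by hand rather than obtained from the quadratic program; the names `rbf_kernel`, `eps_insensitive_loss`, and `svr_predict` are illustrative only.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel type and sigma are assumptions."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, b, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i*.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical values, chosen only to illustrate the formula.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
bias = 2.0
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias)
```

Only samples with a nonzero coefficient difference contribute to the sum, which is the sparseness property attributed to SVM in the text.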

The Regression Algorithm of LSSVM.
LSSVM was put forward by Suykens et al. [24]. Its purpose is to address some problems of the SVM algorithm, such as the selection of hyperplane parameters and the fact that the size of the matrix in the quadratic programming problem grows with the number of training samples. Suykens starts from the loss function, using the two-norm in the objective function and replacing the inequality constraints of SVM with equality constraints. The solution of the resulting set of equations is obtained from the Kuhn-Tucker conditions. When the data size becomes large, the quadratic programming problem becomes very large, and some traditional methods are difficult to apply. By solving linear equations instead, LSSVM both reduces the difficulty of the solution and speeds it up, so it is better suited to large-scale data [25].
Although LSSVM may not necessarily obtain the global optimal solution, it can still achieve a high recognition rate. The squared error is used in the objective function of the LSSVM optimization problem, so the Lagrange multipliers are proportional to the error terms. The direct consequence is that the final decision function depends on all of the samples, which means that LSSVM loses the sparseness of the SVM solution [26].
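The training samples are described as $(x_i, y_i)$, $i = 1, \ldots, n$. $x_i$ is the input data, in which each column represents a feature, and $y_i$ is the output data. For these training data, the regression model can be described as follows:

$$y(x) = w^{T} \varphi(x) + b, \tag{4}$$

where $\varphi(x)$ represents the nonlinear mapping function, $w$ represents the weight vector, and $b$ is a constant.
The cost function can be described as follows:

$$\min_{w,\,b,\,e} \; J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{n} e_i^{2} \quad \text{s.t.} \quad y_i = w^{T} \varphi(x_i) + b + e_i, \quad i = 1, \ldots, n, \tag{5}$$

where $e_i$ represents the random error and $\gamma$ represents the regularization parameter. A Lagrange function is introduced to solve this problem:

$$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{n} \alpha_i \left[ w^{T} \varphi(x_i) + b + e_i - y_i \right], \tag{6}$$

where $\alpha_i$ represents the Lagrange multiplier. The solution of (6) can be obtained by setting the partial derivatives to zero:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n} \alpha_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} \alpha_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow w^{T} \varphi(x_i) + b + e_i - y_i = 0. \tag{7}$$

Eliminating $w$ and $e$, Equation (7) can be described as

$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \qquad \Omega_{ij} = \varphi(x_i)^{T} \varphi(x_j) = K(x_i, x_j). \tag{8}$$

The solution of $b$ and $\alpha$ can be obtained from (8). After introducing the kernel function, the regression function can be rewritten as

$$y(x) = \sum_{i=1}^{n} \alpha_i K(x, x_i) + b. \tag{9}$$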
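The reduction of LSSVM training to a linear system can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' program: it assumes the standard LSSVM system with an RBF kernel, hand-picked values for the regularization parameter `gamma` and kernel width `sigma`, and a small dense Gaussian elimination in place of a numerical library. All function names and the toy data are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel type and sigma are assumptions."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (M[r][n] - tail) / M[r][r]
    return sol


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the squared-error cost
        A.append(row)
    sol = solve_linear_system(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(x, X, alpha, b, sigma=1.0):
    """Evaluate y(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * rbf_kernel(xi, x, sigma) for a, xi in zip(alpha, X)) + b


# Toy one-dimensional data, for illustration only.
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [0.0, 0.8, 0.9, 0.1, -0.8]
gamma = 10.0
b, alpha = lssvm_fit(X_train, y_train, gamma=gamma)
```

In this sketch every training sample receives a multiplier equal to `gamma` times its residual, so in general none of them is zero, which illustrates the loss of sparseness described above.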

Partial Least Squares Regression Algorithm.
PLS regression was first put forward in 1983. As a method of multivariate statistical data analysis, it has many advantages that traditional multivariate methods lack [27]. PLS finds the function that best matches the original data by minimizing the sum of squared errors. PLS can still build a model when the independent variables are strongly correlated with each other [28]. All of the independent variables are contained in the final PLS regression model, and as much information as possible is extracted from the original data, which supports the accuracy of the model. A PLS model also makes it easier to distinguish noise from normal information. When the number of variables is greater than the number of sample points, PLS is often chosen to construct the model. PLS can construct a regression model, simplify the data structure, and analyze correlations at the same time [29].
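The mathematical derivation of PLS regression is as follows.
The input matrix is normalized first. Normalization makes subsequent data processing more convenient and prevents singular samples from degrading the prediction accuracy. Consider $X = [x_1, x_2, \ldots, x_p]_{n \times p}$ and $Y = [y]_{n \times 1}$. The standardized matrices are

$$E_0 = \left[ \frac{x_{ij} - \bar{x}_j}{s_j} \right]_{n \times p}, \qquad F_0 = \left[ \frac{y_i - \bar{y}}{s_y} \right]_{n \times 1},$$

where $\bar{x}_j$ represents the mean value of $x_j$, $s_j$ represents the standard deviation of $x_j$, $\bar{y}$ represents the mean value of $y$, and $s_y$ represents the standard deviation of $y$. $h_1$ is the first principal component extracted from $E_0$: $h_1 = E_0 w_1$, where $w_1$ is a unit-length weight vector. The more information about $E_0$ that $h_1$ contains, the better $h_1$ represents $E_0$. Once $h_1$ carries the major information of $X$, $E_0$ and $F_0$ are regressed on it:

$$E_0 = h_1 p_1^{T} + E_1, \qquad F_0 = r_1 h_1 + F_1,$$

where $p_1$ and $r_1$ are two parameters,

$$p_1 = \frac{E_0^{T} h_1}{\|h_1\|^{2}}, \qquad r_1 = \frac{F_0^{T} h_1}{\|h_1\|^{2}}.$$

The residual matrices can be expressed as follows:

$$E_1 = E_0 - h_1 p_1^{T}, \qquad F_1 = F_0 - r_1 h_1.$$

If the regression equation has achieved sufficient accuracy after this step, the algorithm proceeds to the final steps below. If the precision does not meet the requirement, more components are extracted from the residual matrices in the same way.
Suppose sufficient components have been extracted after $m$ rounds, that is, the equation reaches the required precision with components $h_k$, $k = 2, \ldots, m$, where $m$ represents the number of principal components. The regression equation of $F_0$ is described as follows:

$$F_0 = r_1 h_1 + r_2 h_2 + \cdots + r_m h_m + F_m.$$

Since each $h_k$ is a linear combination of the columns of $E_0$, namely $h_k = E_0 w_k^{*}$, the fitted value $\hat{F}_0$ is expressed as follows:

$$\hat{F}_0 = r_1 E_0 w_1^{*} + r_2 E_0 w_2^{*} + \cdots + r_m E_0 w_m^{*},$$

where $w_k^{*} = \prod_{j=1}^{k-1} \left( I - w_j p_j^{T} \right) w_k$ and $I$ is the unit matrix. The regression equation of $Y$ on $X$ can be obtained by inverting the standardization.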
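The extraction of components and the back-transformation to the original variables can be sketched for a single response as follows. This is a minimal sketch, not the authors' code: it assumes the common PLS1 choice of weight vector, the normalized product of the current input residual (transposed) and output residual, which the text does not state, and it uses sample standard deviations for standardization. The names `pls1_fit` and `pls1_predict` and the toy data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def column_stats(col):
    """Mean and sample standard deviation of one variable."""
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
    return mean, std


def pls1_fit(X, y, n_components):
    n, p = len(X), len(X[0])
    x_stats = [column_stats([row[j] for row in X]) for j in range(p)]
    y_mean, y_std = column_stats(y)
    # Standardized input matrix E and output vector F.
    E = [[(row[j] - x_stats[j][0]) / x_stats[j][1] for j in range(p)] for row in X]
    F = [(v - y_mean) / y_std for v in y]
    weights, loadings, coefs = [], [], []
    for _ in range(n_components):
        # Assumed weight choice: w = E^T F, scaled to unit length.
        w = [dot([E[i][j] for i in range(n)], F) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        h = [dot(row, w) for row in E]  # component h = E w
        hh = dot(h, h)
        p_vec = [dot([E[i][j] for i in range(n)], h) / hh for j in range(p)]
        r = dot(F, h) / hh
        # Deflate: subtract the part explained by this component.
        E = [[E[i][j] - h[i] * p_vec[j] for j in range(p)] for i in range(n)]
        F = [F[i] - r * h[i] for i in range(n)]
        weights.append(w)
        loadings.append(p_vec)
        coefs.append(r)
    # Coefficients on standardized inputs: sum_k r_k w_k*, where
    # w_k* = (I - w_1 p_1^T) ... (I - w_{k-1} p_{k-1}^T) w_k.
    beta = [0.0] * p
    for k in range(n_components):
        w_star = weights[k][:]
        for j in range(k - 1, -1, -1):
            scale = dot(loadings[j], w_star)
            w_star = [ws - weights[j][i] * scale for i, ws in enumerate(w_star)]
        beta = [bj + coefs[k] * ws for bj, ws in zip(beta, w_star)]
    return beta, x_stats, (y_mean, y_std)


def pls1_predict(x, beta, x_stats, y_stats):
    """Standardize x, apply beta, then undo the output standardization."""
    z = [(v - m) / s for v, (m, s) in zip(x, x_stats)]
    return y_stats[0] + y_stats[1] * dot(beta, z)


# Toy data with an exactly linear relation, for illustration only.
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y_toy = [2.0 * a - b + 3.0 for a, b in X_toy]
beta, x_stats, y_stats = pls1_fit(X_toy, y_toy, n_components=2)
prediction = pls1_predict([7.0, 2.0], beta, x_stats, y_stats)
```

With as many components as input variables and exactly linear toy data, the sketch reproduces the underlying linear relation; with fewer components it gives a lower-dimensional approximation.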

Experimental
In this paper, the data are taken from the UCI data sets, which can be downloaded from the Internet. The housing value of the Boston suburbs can be estimated from 13 features. These features include the per capita crime rate by town, the proportion of nonretail business acres per town, and the index of accessibility to radial highways.
The housing value of the Boston suburbs is analyzed and forecast from these features by the SVM, LSSVM, and PLS methods. After samples with missing values are removed from the original data set, 400 samples are used as training data and 52 samples as test data. The housing values of the training data are shown in Figure 2, and all features of the training data are shown in Figure 1. By comparing the real and predicted values, the performance of the different methods can be compared.
After the SVM model is constructed from the training samples, the housing value is forecast for the test samples. The comparison between the real data and the forecast data in Figure 3 shows that although the forecasts deviate somewhat from the real data, they still reflect the trend across samples. The mean square error of the SVM estimates is 10.7373. The running time of the SVM algorithm is 0.4610 s.
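The two quantities reported for each method, mean square error on the test samples and running time, can be computed as in the sketch below. It is only an illustration: the text does not say which steps the reported running times cover, so timing the prediction step alone is an assumption, and `toy_predict` with its data is a hypothetical stand-in for a trained model.

```python
import time


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between real and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def timed_evaluation(predict, X_test, y_test):
    """Return (mean square error, elapsed seconds) for a prediction function."""
    start = time.perf_counter()
    y_pred = [predict(x) for x in X_test]
    elapsed = time.perf_counter() - start
    return mean_squared_error(y_test, y_pred), elapsed


def toy_predict(x):
    """Hypothetical stand-in for a trained regression model."""
    return 2.0 * x[0]


# Toy test set, for illustration only.
X_test = [[1.0], [2.0], [3.0]]
y_test = [2.0, 5.0, 6.0]
mse, seconds = timed_evaluation(toy_predict, X_test, y_test)
```

Applying the same function to each trained model on the same test samples keeps the comparison between methods consistent.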
The LSSVM algorithm is then used to predict the housing value of the Boston suburbs. As shown in Figure 4, several forecast values deviate considerably from the real data. The mean square error of the forecast data is 15.1310, which shows that the accuracy of SVM is higher than that of LSSVM. The running time of the LSSVM algorithm is 20.3730 s. Since the LSSVM program includes a parameter optimization step, its overall calculation time is longer than that of the SVM method. After the parameter optimization step is removed, the running time is 0.3460 s, which suggests that LSSVM has a higher computational efficiency, but the corresponding prediction performance is worse.
As shown in Figure 5, PLS is used to predict the home values of the Boston suburbs. The prediction is not ideal, and there is a large deviation from the real values. The mean square error of the predicted values is 25.0540. The running time of the algorithm is 0.7460 s. For this nonlinear data set, the forecasting ability of PLS is clearly worse than that of SVM and LSSVM.
According to the prediction results for home values in the Boston suburbs, SVM has a higher prediction accuracy than LSSVM and PLS. Because LSSVM simplifies the mathematical mechanism of SVM, its computational efficiency is the highest. Because the home values in the data set are strongly nonlinear, the forecast of the PLS algorithm is not ideal, and its running time (0.7460 s) is longer than that of SVM (0.4610 s) and of LSSVM without parameter optimization (0.3460 s).

Conclusion
In this paper, the SVM, LSSVM, and PLS algorithms are applied in the field of construction to predict housing value. The housing value of the Boston suburbs is forecast from multiple features. The models of the machine learning methods are first constructed and analyzed and are then combined with the features of the test data to predict the housing value. The prediction results of the various machine learning approaches differ. For nonlinear data, SVM and LSSVM have better prediction performance, learning ability, and generalization ability than PLS, and the prediction performance of SVM is superior to that of LSSVM. LSSVM is a modification of the mathematical formulation of SVM. The quadratic programming problem in SVM is transformed into a system of equations in LSSVM, which reduces the computational complexity and makes LSSVM more efficient than SVM. Compared with PLS, SVM and LSSVM are more suitable for nonlinear problems. Owing to its simplicity, the PLS algorithm is more suitable for linear systems. At present, PLS is widely used in industrial and other fields.

Figure 1: Features of the training data.

Figure 2: Housing value of the training data. Axis label: value of owner-occupied homes in $1000's.