In the era of big data, many urgent problems in all walks of life can be addressed with big data techniques. Compared with the Internet, economy, industry, and aerospace fields, applications of big data in architecture are relatively rare. In this paper, based on real data, the values of houses in the Boston suburbs are forecast using several machine learning methods. Based on the predictions, the government and developers can decide whether or not to develop real estate in the corresponding regions. Support vector machine (SVM), least squares support vector machine (LSSVM), and partial least squares (PLS) methods are used to forecast the home values, and the algorithms are compared according to their predicted results. Experiments show that, although the data set exhibits serious nonlinearity, SVM and LSSVM are superior to PLS in dealing with this nonlinearity. Because SVM solves a quadratic programming problem, it can find the global optimal solution and achieves the best forecasting performance. The computational efficiency of the algorithms is also compared according to their computing times.

With the era of big data approaching, more and more people are engaging in data analysis and mining. Machine learning [

An artificial neural network (ANN), which is constructed from a large number of neuron nodes and corresponding weights, is an artificial system that simulates the neural networks found in nature. Because of its good nonlinear characteristics, an ANN can approximate nonlinear functions. However, its accuracy is low and its convergence speed is also not ideal [

In order to find the global optimal solution, the SVM method, which is founded on statistical learning theory, was put forward in the 1990s [

In this paper, the background and current state of the application of machine learning methods are first introduced, and the development of their application to construction and real estate valuation is reviewed. Then, the machine learning algorithms used in this paper are introduced and their mathematical derivations are described in detail. The models constructed by SVM, LSSVM, and PLS are validated with real data on the housing value of the Boston suburbs. Finally, the prediction results of these methods are discussed.

SVM was proposed in 1995 based on statistical learning theory [

SVM is built on VC dimension theory and the structural risk minimization principle, seeking the best balance between learning ability and model complexity. SVM has significant advantages in problems involving small samples, nonlinearity, and high dimensionality. Small sample refers not to the absolute number of samples but to the fact that the number of samples SVM requires is small relative to the complexity of the problem. Solving nonlinear problems is the core of the SVM method. By introducing a kernel function and slack variables, SVM handles linearly inseparable problems. A kernel function is a function that satisfies the Mercer condition, and it effectively reduces the computational complexity. SVM has a large advantage in high-dimensional problems because it does not need all of the samples; only the support vectors are needed. Difficulties in sample storage and computation are thereby avoided [
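
As a small illustration of how a kernel function reduces the computation, the sketch below checks that a degree-2 polynomial kernel equals the inner product of an explicit feature map, so the higher-dimensional vectors never have to be formed. The polynomial kernel, the names `poly2_kernel` and `poly2_features`, and the sample vectors are assumptions chosen for illustration; they are not taken from the experiments in this paper.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, computed in input space."""
    dot = sum(a * b for a, b in zip(x, z))
    return dot ** 2


def poly2_features(x):
    """Explicit feature map for 2-D inputs whose inner product equals poly2_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


# Illustrative 2-D vectors (not data from the paper).
x = [1.0, 2.0]
z = [3.0, -1.0]

# Kernel evaluated directly on the 2-D inputs.
k_implicit = poly2_kernel(x, z)

# Same quantity via the explicit 3-D feature space.
k_explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))

print(k_implicit, k_explicit)
```

The two values agree up to floating-point rounding, which is the sense in which the kernel supplies the feature-space inner product without evaluating the mapping itself.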

SVM is mainly used for two kinds of problems: classifying samples of different categories and regression. The classification problem mainly refers to seeking a hyperplane in a higher-dimensional space that separates the samples of different categories. With SVM, multiclass classification can be handled by combining several two-class classifiers. SVM regression predicts a concrete value on the basis of the sample features [

The regression function can be expressed as follows:

Introducing a kernel function solves the problem that the nonlinear mapping function cannot be expressed explicitly. Through the kernel function, the inner product in the higher-dimensional feature space can be obtained. The nonlinear regression function can be expressed as

As with other multivariate statistical models, the choice of each parameter in the model, such as the type of kernel function and the corresponding kernel parameters, strongly influences the model's performance. In the SVM regression model, the regularization parameter and

LSSVM was put forward by Suykens et al. [

Although LSSVM is not necessarily able to obtain the global optimal solution, it can still achieve a high recognition rate. The squared error is used in the objective function of the LSSVM optimization problem; that is, the Lagrange multipliers are proportional to the error terms. The direct consequence is that the final decision function depends on all of the samples, which means that LSSVM loses the sparseness of the SVM solution [

The training samples are described as

Equation (

The solution of

In 1983, PLS regression was put forward for the first time. As a new method of multivariate statistical data analysis, it has many advantages that traditional multivariate statistical analysis methods do not have [

PLS finds the function that best matches the original data by minimizing the sum of squared errors. Even when the independent variables are multicollinear, PLS is still able to build the model [

The mathematical derivation of PLS regression is described as follows.

The input matrix should be normalized first. Normalization makes subsequent data processing more convenient and prevents the prediction accuracy from being influenced by singular samples. Consider

After the above steps, if the regression equation has achieved sufficient accuracy, the algorithm proceeds to the following steps. If the precision does not meet the requirement, more components should be extracted from the residual matrix.

Sufficient components can be extracted after

Since

The regression equation of

In this paper, the data are taken from the UCI data sets, which can be downloaded from the Internet. The housing value of the Boston suburbs is characterized by 13 features. These features include per capita crime rate by town, proportion of nonretail business acres per town, and index of accessibility to radial highways.

The housing value of the Boston suburbs is analyzed and forecast from the corresponding features by the SVM, LSSVM, and PLS methods. After samples with missing values are removed from the original data set, 400 samples are used as training data and 52 samples are used as test data. The housing value of the training data can be seen in Figure

Features of the training data.

Housing value of the training data.
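
The data preparation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say how the 400 training and 52 test samples were chosen, so the first 400 rows are taken as training data here; the feature matrix is synthetic random data with the same shape (452 samples, 13 features), and z-score normalization with training-set statistics is assumed. The function name `split_and_normalize` is hypothetical.

```python
import numpy as np


def split_and_normalize(X, y, n_train):
    """Split rows into train/test and z-score the features with training statistics."""
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features

    # The test set is scaled with the training mean and std, not its own.
    return (X_train - mean) / std, (X_test - mean) / std, y_train, y_test


# Synthetic stand-in for the 452 x 13 feature matrix (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(452, 13))
y = rng.normal(loc=22.0, scale=9.0, size=452)

X_train, X_test, y_train, y_test = split_and_normalize(X, y, n_train=400)
print(X_train.shape, X_test.shape)
```

Scaling the test features with training statistics keeps information from the test samples out of the fitted model, which is one common convention rather than a step confirmed by the text.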

After the SVM model is constructed from the training samples, the housing value can be forecast for the testing samples. From the contrast between the real data and the forecast data in Figure

Forecasting results by SVM.
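
A regression model of this kind can be sketched with scikit-learn's `SVR` class. The library choice is an assumption, since the text does not name the software used; the data are a synthetic one-dimensional nonlinear signal rather than the housing data, and the values of `C`, `epsilon`, and `gamma` are arbitrary illustrative settings, not tuned parameters from the paper.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic nonlinear data with the same 400/52 split sizes (not the real data).
rng = np.random.default_rng(0)
X_all = rng.uniform(-3.0, 3.0, size=(452, 1))
y_all = np.sin(X_all[:, 0]) + 0.1 * rng.normal(size=452)
X_train, y_train = X_all[:400], y_all[:400]
X_test, y_test = X_all[400:], y_all[400:]

# RBF-kernel support vector regression; C is the regularization parameter,
# epsilon the width of the insensitive tube, gamma the kernel parameter.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))

# Only the support vectors enter the fitted regression function.
n_support = len(model.support_)
print(f"test RMSE: {rmse:.3f}, support vectors: {n_support} of {len(y_train)}")
```

The count of support vectors illustrates the sparseness of the SVM solution: training samples that lie inside the epsilon tube do not contribute to the prediction.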

The LSSVM algorithm is also used to predict the housing value of the Boston suburbs. As shown in Figure

Forecasting results by LSSVM.
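
Because LSSVM replaces the quadratic program with a set of linear equations, training reduces to one linear solve. The sketch below assumes the commonly used LSSVM regression formulation with an RBF kernel, in which the bias `b` and the multipliers `alpha` solve a bordered system built from the kernel matrix; the notation may differ from the equations in this paper. The data are synthetic, and the values of `gamma` (regularization) and `sigma` (kernel width) are arbitrary.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM linear system for the bias b and multipliers alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Prediction uses every training sample, so the solution is not sparse."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


# Synthetic nonlinear data (not the real data).
rng = np.random.default_rng(0)
X_all = rng.uniform(-3.0, 3.0, size=(452, 1))
y_all = np.sin(X_all[:, 0]) + 0.1 * rng.normal(size=452)
X_train, y_train = X_all[:400], y_all[:400]
X_test, y_test = X_all[400:], y_all[400:]

gamma, sigma = 10.0, 1.0
b, alpha = lssvm_fit(X_train, y_train, gamma, sigma)
y_pred = lssvm_predict(X_train, b, alpha, X_test, sigma)
print(float(np.sqrt(np.mean((y_pred - y_test) ** 2))))
```

In this formulation each multiplier equals `gamma` times the training residual of its sample, which is why the multipliers are proportional to the error terms and why, in general, every training sample contributes to the decision function.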

As shown in Figure

Forecasting results by PLS.
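
The component extraction used by PLS can be sketched for a single output variable as follows. This sketch assumes the common NIPALS-style PLS1 update, in which each component is extracted from the current residual matrix and the residuals are then deflated; it may differ in detail from the derivation in this paper. The function name `pls1_fit`, the synthetic data, and the choice of two components are illustrative assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS by extracting components from the residuals."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    x_std[x_std == 0] = 1.0
    y_mean = y.mean()
    E = (X - x_mean) / x_std          # normalized input residual matrix
    f = y - y_mean                    # centered output residual

    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                   # direction of maximal covariance with f
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left to explain
            break
        w /= norm
        t = E @ w                     # extracted component (score vector)
        tt = t @ t
        p = E.T @ t / tt              # input loading
        c = f @ t / tt                # output loading
        E = E - np.outer(t, p)        # deflate the residual matrix
        f = f - c * t
        W.append(w)
        P.append(p)
        q.append(c)

    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef_scaled = W @ np.linalg.solve(P.T @ W, q)
    coef = coef_scaled / x_std        # coefficients for the original variables
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Synthetic linear data with small noise (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(452, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + 3.0 + 0.1 * rng.normal(size=452)

coef, intercept = pls1_fit(X[:400], y[:400], n_components=2)
y_pred = X[400:] @ coef + intercept
print(float(np.sqrt(np.mean((y_pred - y[400:]) ** 2))))
```

Since the fitted model is a linear regression equation in the original variables, it cannot follow strongly nonlinear structure, which is consistent with the weaker PLS result reported for this data set.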

According to the prediction results for home values in the Boston suburbs, SVM has higher prediction accuracy than LSSVM and PLS. Because LSSVM simplifies the mathematical mechanism of SVM, its computational efficiency is the highest. Owing to the strong nonlinearity of the home values in the data set, the forecast result of the PLS algorithm is not ideal, and its computational efficiency is very low.
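
A comparison of this kind needs an error measure and a timing measure for each method. The sketch below shows one way to collect both. The text does not state which accuracy metric was used, so root mean squared error is an assumption; `mean_predictor` is a hypothetical stand-in for a real model, and the numbers are toy values, not results from the paper.

```python
import math
import time


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def timed(fn, *args, repeats=5):
    """Run fn several times; return its result and the best wall-clock time."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


def mean_predictor(y_train, n_test):
    """Hypothetical stand-in model: predicts the training mean for every sample."""
    m = sum(y_train) / len(y_train)
    return [m] * n_test


# Toy values for illustration only.
y_train = [20.0, 22.0, 24.0]
y_test = [21.0, 23.0]

pred, seconds = timed(mean_predictor, y_train, len(y_test))
err = rmse(y_test, pred)
print(f"RMSE = {err:.3f}, best time = {seconds:.6f} s")
```

Taking the best of several runs reduces the influence of background load on the measured time; each real model would be passed to `timed` in place of the stand-in.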

In this paper, SVM, LSSVM, and PLS algorithms are applied in the field of construction to predict housing value. The housing value of the Boston suburbs is forecast from multiple features. The models of the machine learning methods are first constructed and analyzed and then applied to the features of the testing data to predict the housing value. The prediction results of the different approaches are not the same. For nonlinear data, SVM and LSSVM have better prediction performance, learning ability, and generalization ability than PLS, and the prediction performance of SVM is superior to that of LSSVM. LSSVM is a reformulation of the SVM mathematical process: the quadratic programming problem in SVM is transformed into solving a system of equations, which reduces the computational complexity and makes LSSVM more efficient than SVM. Compared with PLS, SVM and LSSVM are more suitable for nonlinear problems. Owing to its simplicity, the PLS algorithm is more suitable for linear systems. At this stage, PLS is widely used in industrial and other fields.

The authors declare that there is no conflict of interests regarding the publication of this paper.

The present work was supported in part by the Open Foundation of State Key Laboratory of Robotics and Systems (HIT). The authors greatly appreciate this financial support.