Establishment of Architectural Heritage Evaluation Indicator System Based on Cluster Analysis in the Era of Big Data

In the era of big data, data are collected and applied in every aspect of life. Establishing a reasonable architectural heritage evaluation indicator system is the key to architectural heritage evaluation. Based on the connotation of architectural heritage, the criterion of eliminating duplicate information, and the criterion of the maximum weighted R cluster grade, this paper constructs an architectural heritage evaluation indicator system through the quantitative approaches of R clustering and rough set analysis. The contribution lies in the following. First, it uses the sum of squared deviations method to cluster the evaluation indicators under the same criterion, which ensures that different indicators reflect different information. Second, it uses rough set analysis to compute the weighted rough set grade of each indicator within a class and then screens the indicator with the maximum weighted rough set grade, which ensures that the selected indicator has the most significant influence on the evaluation result. Third, through R clustering and rough set analysis, it establishes an architectural heritage evaluation indicator system composed of 18 indicators under three criteria: heritage historical value, heritage artistic value, and heritage scientific value.


Introduction
Historical architectural heritage is a precious material cultural heritage left by humankind, with important aesthetic, historical, and sociological significance [1,2]. In past urban reconstruction and construction, historical buildings were destroyed to varying degrees. In recent years, with the rise of the historical preservation movement, historical architectural heritage has attracted more and more attention because of its scarcity and nonrenewability. Selecting appropriate indicators to evaluate the comprehensive value of historical buildings, so as to provide a more comprehensive and objective basis for their protection and to coordinate and balance multiple values, is of great practical significance [3,4]. Existing evaluation index systems come from two sources. The first is the evaluation index system of authoritative institutions, including [1]; the evaluation index of architectural heritage in the Venice Charter adopted in 1964; and the evaluation index of architectural heritage in the Nara Document on Authenticity (1994) [5][6][7].
The second is the evaluation index system of the academic literature, which contains many evaluation systems of architectural heritage value [8,9]. In recent years, with the continuous development of science and technology and the growing intersection of disciplines, theories and methods from different disciplines have been combined, and many kinds of evaluation systems of architectural heritage value have emerged. These mainly include fuzzy mathematics evaluation systems, multivariate statistical index evaluation systems, comprehensive evaluation index systems based on the complexity and diversity of evaluation objects, and index evaluation systems based on information theory [10]. These index systems have some problems; for example, the information reflected by different indicators is duplicated and the systems are too complicated.

Research Status of Evaluation Index Selection Methods for Architectural Heritage Value. The first is the subjective selection method based on expert experience. Mingyu established an evaluation index system of architectural heritage by subjectively determining and weighting the evaluation indexes [11]. The problem with subjective screening is that it is highly arbitrary. The second is the objective screening method based on quantitative analysis. Qiqian constructed an evaluation index system of architectural heritage using principal component factor analysis [12]. Zhongchao and Yinhua used information entropy to evaluate the value of heritage corridors [13]. The problem with objective screening is that it relies too heavily on the index data and ignores the actual meaning of the indexes.
In view of these problems, this paper establishes an evaluation index system of architectural heritage based on the value connotation of architectural heritage, combining a preliminary selection of indicators with a quantitative selection based on clustering and rough set decision analysis. This combination overcomes the defects of using a subjective method or an objective method alone.

Construction Principle of Evaluation Index System of Architectural Heritage
2.1. Foundation of Index System. The index system of architectural heritage evaluation is established on the basis of the evaluation index parameters commonly used by internationally renowned architectural heritage evaluation institutions [14][15][16], combined with indicators from the academic literature on architectural heritage assessment and the principle of data availability.

Criteria for Selecting Indicators
(1) Criterion for removing indicators with duplicated information: the sum of squared deviations method is used to cluster the evaluation indicators, which ensures that the information reflected by different indicators is not repeated after screening.
(2) Criterion for screening by rough set analysis: rough set analysis is used to calculate the approximate classification quality coefficient of the indexes in the same R clustering class; the indexes with larger coefficients are removed, and the index with the minimum approximate classification quality coefficient is selected. This ensures that the selected indicators have the greatest impact on the architectural heritage evaluation.

Clustering of Index Data and Rough Set Analysis and Screening Principle. First, the evaluation indexes within each criterion are clustered by the sum of squared deviations, which ensures that the information reflected by the selected indexes is not repeated. Second, the Kruskal-Wallis (K-W) test is used to check whether the number of classes is reasonable, which avoids determining the number of classes subjectively and arbitrarily. Third, rough set analysis is used to compute the approximate classification quality coefficients of the indexes in the same R clustering class, and the index with the smallest coefficient is selected, which ensures that the selected indicators have the greatest impact on the architectural heritage evaluation. Combining cluster analysis with rough set theory yields a rough set model based on cluster analysis that deletes redundant data and retains necessary data, serves as data preprocessing, and makes the attributes, attribute values, and reduced rules more concise.
The index screening principle based on clustering rough set analysis is shown in Figure 1.

Construction Method of Evaluation Index System of Architectural Heritage

The Establishment of Index System for the Audition.
Focusing on the connotation of architectural heritage and the high-frequency indexes [17] in the classical views of authoritative institutions at home and abroad, 44 indicators were selected by combing the literature [18,19], as shown in the second and fifth columns of Table 1. According to the observability principle, indexes whose data cannot be obtained are deleted to ensure that the preliminarily screened indexes can be quantified; the deleted indexes are marked in Table 1.
(2) Standardization of Negative Indicators. A negative index is an index for which a smaller value is better. Let $P_{ij}$ be the standardized value of index $i$ in year $j$, $V_{ij}$ be the original value of index $i$ in year $j$, and $N$ be the number of years. Negative indicators are standardized according to Formula (1). The meaning of Formula (1) is as follows: the closer the index value is to the minimum, the greater the value after standardization.
(3) Standardization of Positive Indicators. A positive index is an index for which a larger value is better. Positive indicators are standardized according to Formula (2), in which the letters have the same meaning as in Formula (1). The meaning of Formula (2) is as follows: the closer the index value is to the maximum, the greater the value after standardization.
(4) Standardization of Moderate Indicators. A moderate index is an index that is better the closer it is to a specified ideal value. Moderate indicators are standardized according to Formula (3), where $V_{i0}$ is the ideal value of index $i$ and the other symbols have the same meaning as in Formulas (1) and (2). The meaning of Formula (3) is as follows: the closer the index value is to the ideal value, the greater the value after standardization.
The purpose of data standardization is to eliminate scale differences between features, so that different features have the same scale and the same influence on the parameters.
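As a minimal sketch of the three standardizations, the Python below uses the common min-max forms, which match the stated behavior of each formula (a value closer to the minimum, the maximum, or the ideal value gives a larger standardized value). The exact functional forms are an assumption rather than the paper's own formulas, and the function names are hypothetical.

```python
def standardize_negative(values):
    """Smaller raw value -> larger standardized value (assumed min-max form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max standardized")
    return [(hi - v) / (hi - lo) for v in values]


def standardize_positive(values):
    """Larger raw value -> larger standardized value (assumed min-max form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max standardized")
    return [(v - lo) / (hi - lo) for v in values]


def standardize_moderate(values, ideal):
    """Closer to the ideal value -> larger standardized value (assumed form)."""
    max_dev = max(abs(v - ideal) for v in values)
    if max_dev == 0:
        # Every observation already equals the ideal value.
        return [1.0 for _ in values]
    return [1.0 - abs(v - ideal) / max_dev for v in values]


# Toy yearly observations of one index (illustrative values only).
raw = [2.0, 4.0, 6.0]
print(standardize_negative(raw))
print(standardize_positive(raw))
print(standardize_moderate(raw, ideal=4.0))
```

All three functions map a series onto the interval [0, 1], so indexes measured in different units become comparable before clustering.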

Clustering of Similar Indexes
(1) Purpose of Clustering. Cluster analysis classifies individuals (samples) or objects (variables) according to similarity (distance), so that elements in the same class are more similar to each other than to elements in other classes. The indexes in each criterion layer are classified by R clustering, so that different classes represent different aspects of the criterion layer. This guarantees that the information reflected by different screened indexes is not duplicated and that the screened index system covers all aspects of the criterion layer.
(2) The Basic Model of R Clustering. R clustering is used to cluster the evaluation indexes. Suppose the $n$ evaluation indexes are divided into $l$ classes. Let $S_i$ be the sum of squared deviations of class $i$ ($i = 1, 2, \cdots, l$), $n_i$ be the number of indexes in class $i$, $X_i^{(j)}$ be the standardized sample value vector of the $j$th evaluation index in class $i$ ($j = 1, 2, \cdots, n_i$), and $\bar{X}_i$ be the sample mean vector of class $i$. Then the sum of squared deviations of class $i$ is

$$S_i = \sum_{j=1}^{n_i} \left(X_i^{(j)} - \bar{X}_i\right)^{\mathrm{T}} \left(X_i^{(j)} - \bar{X}_i\right), \tag{4}$$

which is the intraclass variance. The total sum of squared deviations $S$ of the $l$ classes is

$$S = \sum_{i=1}^{l} S_i = \sum_{i=1}^{l} \sum_{j=1}^{n_i} \left(X_i^{(j)} - \bar{X}_i\right)^{\mathrm{T}} \left(X_i^{(j)} - \bar{X}_i\right), \tag{5}$$

which is the sample variance of all categories. The specific steps of the sum of squared deviations clustering method are as follows: (1) Regard each of the $n$ evaluation indicators as its own class, giving $n$ classes. (2) Merge any two of the $n$ evaluation indicators into one class and leave the others unchanged, which gives $n(n-1)/2$ merging schemes. According to Equation (5), calculate the total sum of squared deviations of each merging scheme, and adopt the scheme with the smallest total as the new classification. (3) Repeat step (2) until the number of classes is $l$.
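The merging steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the SPSS procedure used in the paper: the function names `class_ss` and `r_cluster` and the toy indicator series are hypothetical.

```python
from itertools import combinations


def class_ss(vectors):
    """Sum of squared deviations of the vectors in one class from the class mean."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in vectors for d in range(dim))


def r_cluster(indicators, n_classes):
    """Greedy sum-of-squared-deviations clustering of indicators.

    indicators: dict mapping an indicator name to its standardized series.
    Returns a list of classes, each a list of indicator names.
    """
    classes = [[name] for name in indicators]  # every indicator starts alone
    while len(classes) > n_classes:
        best = None
        for a, b in combinations(range(len(classes)), 2):
            rest = [c for k, c in enumerate(classes) if k not in (a, b)]
            trial = rest + [classes[a] + classes[b]]
            # Total sum of squared deviations S for this merging scheme.
            total = sum(class_ss([indicators[m] for m in c]) for c in trial)
            if best is None or total < best[0]:
                best = (total, trial)
        classes = best[1]  # keep the scheme with the smallest total
    return classes


# Toy standardized series over three years (illustrative values only).
toy = {
    "x1": [0.0, 0.5, 1.0],
    "x2": [0.1, 0.5, 0.9],
    "x3": [1.0, 0.5, 0.0],
    "x4": [0.9, 0.6, 0.1],
}
print(r_cluster(toy, n_classes=2))
```

On the toy data the two rising series end up in one class and the two falling series in the other, which is the behavior the method relies on: indexes carrying similar information share a class.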
(3) Determination of Clustering Number. The number of clusters $l$ is generally given in advance. The Kruskal-Wallis (K-W) test is used to test whether the numerical characteristics of the indexes in the same class differ significantly after clustering, and thus to determine whether the number of clusters $l$ is reasonable.

Figure 1 (flowchart; box labels): canonical view of high-frequency indicators; literature review and investigation; selection of evaluation index of architectural heritage; the index in each criterion layer is clustered by using the sum of squares of deviation; the attribute value of each index is obtained by rough set decision, and the index with the minimum approximate classification quality coefficient is screened.

Wireless Communications and Mobile Computing

In the K-W test, if the significance level of each class is greater than 0.05 (that is, there is no significant difference among the indicators in the same class), the number of clusters is reasonable; otherwise, the number of clusters is unreasonable and the indexes need to be reclustered.
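One way to run this check is sketched below in plain Python. It assumes that, for each class, the standardized series of the indexes in that class are the groups being compared, and it uses the usual chi-square approximation for the p-value. The paper obtained its values from SPSS, so the function names here are hypothetical and small-sample results may differ from exact tables.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def kruskal_wallis(groups):
    """Return (H, p) for the Kruskal-Wallis test with tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Two indexes assigned to the same class (illustrative values only).
h, p = kruskal_wallis([[0.1, 0.5, 0.9], [0.2, 0.4, 1.0]])
print(h, p, "class accepted" if p > 0.05 else "recluster")
```

A p-value above 0.05 for every class is read here as support for the chosen number of clusters, following the rule stated above.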

Rough Set Analysis for Index Selection.
Put simply, rough set attribute reduction deletes irrelevant or unimportant knowledge while keeping the classification ability of the knowledge base unchanged, thus simplifying the original system.
Rough set analysis uses the approximate classification quality coefficient to evaluate how deleting an index affects the evaluation results. It can effectively analyze and handle imprecise, inconsistent, and incomplete information, and its main research topics are attribute reduction and rule extraction. If the approximate classification quality coefficient after deleting an index is 1, the index can be deleted. This is the principle by which rough set analysis deletes the indexes that have little influence on the evaluation.
The approximate classification quality coefficient obtained by rough set analysis reflects the importance of an index in architectural heritage evaluation. The smaller the coefficient, the more important the index and the greater its impact on the evaluation results; conversely, an index with a larger coefficient should be eliminated.
(1) Purpose of Screening the Index with the Minimum Approximate Classification Quality Coefficient. Rough set analysis is used to select the index with the smallest approximate classification quality coefficient, which ensures that the selected index has the most significant impact on architectural heritage evaluation.
(2) The Basic Steps of Rough Set Analysis. The basic steps are as follows. First, prepare the raw dimensionless multidimensional dataset. Second, calculate the rough set approximation. Let $X$ be the classification result of all evaluation objects under all evaluation indexes, and let $R(V_i)$ be the new classification result of the evaluation objects after index $V_i$ is excluded. $X$ and $R(V_i)$ are obtained by the same clustering as before. Let $|\cdot|$ denote the number of elements in a set, $\beta$ be the critical value of the error, and $R_\beta(X)$ be the set of objects whose classification is unchanged between the two classifications at a level of at least $\beta$:

$$R_\beta(X) = \bigcup \left\{ R(V_i) : \frac{|X \cap R(V_i)|}{|R(V_i)|} \ge \beta \right\}. \tag{6}$$

The practical implication of Equation (6) is that $R_\beta(X)$ is the union of the classes of the new classification whose agreement with the original classification is at or above the critical value $\beta$. The larger the ratio $|X \cap R(V_i)| / |R(V_i)|$, the smaller the impact of index $V_i$ on the evaluation results.
For the setting of $\beta$: when the two classification results are exactly the same, $|X \cap R(V_i)| / |R(V_i)| = |R(V_i)| / |R(V_i)| = 1$; when they are inconsistent, the ratio is less than 1. According to experience, the threshold $\beta$ is set to 0.9; that is, at least 90% of the original classification results must be reflected by the new classification results, so that the new classification still reflects the information of the original classification well.
(3) Calculation of the Approximate Classification Quality Coefficient. Let $|R_\beta(X)|$ be the number of objects in the set of Equation (6) and $|U|$ be the total number of objects. The approximate classification quality coefficient is

$$\gamma_R = \frac{|R_\beta(X)|}{|U|}. \tag{7}$$

The implication of Equation (7) is that $\gamma_R$ is the ratio of the number of objects whose classification is unchanged at level $\beta$ in the new classification to the number of all objects evaluated in the original classification. If, after index $V_i$ is deleted, the number of objects whose classification is unchanged equals the number of original evaluation objects, that is, $\gamma_R = 1$, then index $V_i$ has no effect on the classification and can be deleted.
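The coefficient can be computed with a short routine such as the one below. It is a sketch under an assumed reading of the notation, namely the standard variable-precision one: a class of the new classification counts as unchanged when at least a share β of its objects falls inside a single class of the original classification. The function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def blocks(labels):
    """Group object ids by class label; labels maps object id -> class label."""
    grouped = defaultdict(set)
    for obj, label in labels.items():
        grouped[label].add(obj)
    return list(grouped.values())


def quality_coefficient(original, reduced, beta=0.9):
    """Approximate classification quality coefficient after deleting one index.

    original: classification of the objects using all indexes.
    reduced:  classification of the same objects with one index removed.
    """
    original_blocks = blocks(original)
    unchanged = set()
    for new_block in blocks(reduced):
        # Keep the new class if it agrees with some original class at level beta.
        if any(len(x & new_block) / len(new_block) >= beta for x in original_blocks):
            unchanged |= new_block
    return len(unchanged) / len(original)


# Ten evaluation objects in two original classes (illustrative labels only).
original = {i: ("A" if i < 5 else "B") for i in range(10)}
reduced = {0: "p", 1: "p", 2: "p", 3: "p", 4: "q", 5: "q",
           6: "r", 7: "r", 8: "r", 9: "r"}
print(quality_coefficient(original, reduced))
```

In the toy case the mixed class `q` falls below β = 0.9, so only eight of the ten objects count as unchanged. A coefficient of 1 would mean the deleted index changed nothing, which is the deletion condition described above.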
Rough set theory is used to delete redundant indexes within the same criterion layer, which ensures that the indexes retained in the system have a significant impact on the evaluation results.

Determination of Rationality of Index System
(1) Criteria for Determining Rationality of Index System. If the final indicators number fewer than 30% of the audition (preliminarily selected) indicators while retaining more than 95% of the original information [10], the index system is considered reasonable.
(2) Calculation of Information Content in Index System. Because the variance of the index data reflects the information content of the index system, the information content is defined as the sum of the variances of the indexes in the system. Let $S$ be the covariance matrix of the index data, $\mathrm{tr}\,S$ be the trace of the covariance matrix, $s$ be the number of indicators retained after screening, and $h$ be the number of audition indicators. The information contribution rate $In$ of the screened indexes to the audition indexes is

$$In = \frac{\mathrm{tr}\,S_s}{\mathrm{tr}\,S_h}. \tag{8}$$

The meaning of Equation (8) is as follows: the ratio $\mathrm{tr}\,S_s / \mathrm{tr}\,S_h$ is the share of the information in the $h$ audition indexes that is represented by the $s$ screened indexes, where $\mathrm{tr}\,S_s$ is the sum of the variances of the $s$ screened indexes and $\mathrm{tr}\,S_h$ is the sum of the variances of the $h$ audition indexes.
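Because the trace of a covariance matrix is the sum of the variances on its diagonal, the contribution rate can be computed from per-index variances alone. The sketch below illustrates this together with the rationality criterion; the function names and the toy data are hypothetical, and the thresholds are the 30% and 95% values quoted above.

```python
from statistics import variance


def information_rate(data, kept):
    """Share of total variance carried by the retained indexes.

    data: dict mapping every audition index name to its observed series.
    kept: names of the indexes retained after screening.
    """
    total = sum(variance(series) for series in data.values())
    retained = sum(variance(data[name]) for name in kept)
    return retained / total


def is_reasonable(data, kept, max_share=0.30, min_rate=0.95):
    """Apply the stated criterion: few indexes retained, most information kept."""
    share = len(kept) / len(data)
    return share < max_share and information_rate(data, kept) > min_rate


# Three audition indexes observed over three years (illustrative values only).
toy_data = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],
    "c": [1.0, 1.0, 2.0],
}
print(information_rate(toy_data, ["a", "b"]))
print(is_reasonable(toy_data, ["a", "b"]))
```

The ratio does not depend on whether sample or population variances are used, as long as every index has the same number of observations.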

Application of Architectural Heritage Evaluation Index Screening Model
4.1. Sample Selection and Data Sources. In this paper, Shenyang is selected as the empirical sample for building an evaluation index system of architectural heritage. The index data are derived from the Shenyang historical and cultural city application resource, as shown in columns 5-7 of Table 2.

Standardization of Index Data for Audition
(1) Standardization of negative indicators: the data of the negative indicators in columns 5-7 of Table 2 are substituted into Formula (1), and the standardized values of each negative indicator are listed in the corresponding rows of columns 8-10 of Table 2.
(2) Standardization of positive indicators: the data of the positive indicators in Table 2 are substituted into Formula (2), and the standardized values of each positive indicator are listed in the corresponding rows of columns 8-10 of Table 2.
(3) Standardization of moderate indicators: the data of the moderate indicators in Table 2 are substituted into Formula (3), and the standardized values of each moderate indicator are listed in the corresponding rows of columns 8-10 of Table 2.

Hierarchical Clustering of Similar Indexes
(1) R clustering: the data in columns 8-10 of Table 2 for the three criterion layers are substituted into Equations (4) and (5), respectively. Using SPSS software, the three criterion layers are clustered into 8, 6, and 8 classes. The results are listed in column 3 of Table 3, and the corresponding clustering classes are listed in column 4 of Table 3.
(2) K-W test: the data in columns 8-10 of Table 2 are input into SPSS software, and the values of the nonparametric K-W test are listed in column 5 of Table 3. Because the K-W test values in column 5 of Table 3 are all greater than the critical value 0.05, the number of classes of each criterion layer is reasonable.

Approximate Classification Quality Coefficient for Index.
The evaluation indexes of architectural heritage play a guiding role in the development of architectural heritage. Following the principle of "thick the present, thin the past", that is, attaching more importance to recent data, rough set analysis is used to select the indicators in each criterion layer. As mentioned earlier, the precision $\beta$ is 0.9. Take the historical value criterion layer X1 as an example: after R clustering, its observable indexes are divided into 8 classes, and rough set analysis is used to reduce the indexes of each class. The first class contains 4 indexes. According to Equation (7), the approximate classification quality coefficient $\gamma_{11} = 0.368$ is obtained, which shows that deleting $X_{1,1}$ would have a significant impact on the evaluation objects, so $X_{1,1}$ should be retained. By contrast, $\gamma_{13} = 0.931$ indicates that deleting $X_{1,3}$ would not have a significant impact on the evaluation objects, so $X_{1,3}$ should be deleted. By the same method, $X_{1,9}$ and $X_{1,12}$ can be deleted. Each class of each criterion layer is analyzed in the same way, and the indexes deleted by rough set analysis are marked "Rough Set Analysis Deleting" in column 7 of Table 1.
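The selection rule within one class amounts to keeping the index with the smallest coefficient and deleting the others. A minimal sketch follows, using only the two coefficients reported above for the first class of X1; the coefficients of X1,9 and X1,12 are not reported and are therefore left out, and the function name `screen_class` is hypothetical.

```python
def screen_class(gammas):
    """Keep the index with the smallest coefficient; list the rest for deletion.

    gammas: dict mapping an index name to its approximate classification
    quality coefficient within one R clustering class.
    """
    keep = min(gammas, key=gammas.get)
    drop = sorted(name for name in gammas if name != keep)
    return keep, drop


# Coefficients reported in the text for two indexes of the first class of X1.
first_class = {"X1,1": 0.368, "X1,3": 0.931}
keep, drop = screen_class(first_class)
print("retain:", keep, "delete:", drop)
```

Applying the same function to every class of every criterion layer yields one retained index per class.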

Determination of Rationality of Architectural Heritage Evaluation Index System. The variance of each index is calculated from the original data in columns 5-7 of Table 2. The sum of the variances of the screened indexes, $\mathrm{tr}\,S_s$, and the sum of the variances of the audition indexes, $\mathrm{tr}\,S_h$, are substituted into Formula (8). The information contribution rate of the screened indexes to the audition indexes is $In = \mathrm{tr}\,S_s / \mathrm{tr}\,S_h = 3.18 \times 10^8 / 3.41 \times 10^8 = 93.17\%$. That is to say, 40.9% of the indexes (18/44 = 40.9%) are selected, and they reflect 93.17% of the original information of the index system.

Index Screening Results.
Through R cluster-rough set analysis, an architectural heritage evaluation index system including 18 indicators of historical value, artistic value, and scientific value is constructed, as shown in Table 4.

Main Conclusions.
Through R clustering and rough set analysis, the heritage evaluation index system is established. Compared with [1,2], better evaluation results are obtained in this paper with lower-dimensional data. This is sufficient to illustrate the validity of the proposed method.

Main Innovations and Characteristics.
First, the evaluation indexes are clustered by the sum of squared deviations method, which ensures that the information reflected by different indexes is not repeated after screening. Second, rough set analysis is used to compute the approximate classification quality coefficient $\gamma_{ij}$ of each index in the R clustering classes and to select the index with the smallest coefficient. Conventional methods cannot guarantee that samples in the same group are similar to each other while samples in different groups are sufficiently dissimilar. Compared with other index selection methods, rough set analysis does not need prior information, so the selected indexes are not affected by prior information and have the greatest impact on architectural heritage evaluation. Third, the final index system reflects 93.17% of the original information with 40.9% of the indicators.
When dimension-reduction features are extracted from big data with the method in this paper, the inverse of a high-dimensional matrix must be computed. Once the matrix dimension reaches a certain order, the computation becomes harder and more time-consuming, which affects the timeliness of the solution. A future research direction is to solve high-dimensional data matrices by iterative methods to reduce this difficulty.

Discussion
Big data dimension reduction is widely applied in various fields today, especially in big data information extraction and blockchain. Because a blockchain needs to carry all the information generated before replication, the amount of information in each block is larger than that in the previous block, so the information written to blocks grows without bound, which raises information storage, verification, and capacity problems. Transferring blockchain information with less data will be a future challenge of blockchain big data processing, and big data dimension reduction will be one of the effective means of meeting it. The future development direction of big data dimension reduction is to quantize and encrypt data and to ensure transmission security while effectively compressing the dimension of the original data.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.