Study on Semantic Contrast Evaluation Based on Vector and Raster Data Patch Generalization



Introduction
With the development of computer technology and the continued deepening of GIS applications, traditional paper maps are gradually being replaced by digital maps [1,2], and cartographic generalization, a frontier and much-studied issue in cartography, faces unprecedented changes [3]. Cartographic generalization is the process of extracting and processing the map data of a cartographic region through selection, simplification, and other operations, according to the scale and purpose of the map and the geographical characteristics of the region, while maintaining the structures and characteristics of the spatial entities. Its purpose is to convey as much of the important spatial information as possible on a limited representation medium [4][5][6][7].
Because geographical space itself changes and generalization results are inherently uncertain, cartographic generalization has long been a difficult international problem in geographical science. Since M. Eckert first put forward the term "cartographic generalization" in the early 1920s, many cartographers at home and abroad have engaged in theoretical and practical research on cartographic generalization and have achieved fruitful results [4,8].
The structure of vector data is intuitive and simple: each target element is directly endowed with spatial position and attribute information, and vector data have natural advantages in calculating quantitative and qualitative indicators of elements such as distance, area, and topological relations. Most scholars have therefore focused their research on vector data. The "decomposition type" combination method based on Delaunay triangulation and skeleton line structure was proposed for maintaining the area balance of patches; the automatic generalization method based on a genetic algorithm was proposed for the overall optimal configuration of point feature annotation; and the Douglas-Peucker algorithm, declination algorithm, and rounding algorithm were proposed for linear features [7,9].
Raster data divide space into regular grids, in which the cell value carries the attribute information and the row and column numbers carry the position information. From the perspective of data structure, raster data make simple, fast neighborhood analysis and the construction of mathematical models easier. In the early 1980s, Monmonier applied mathematical morphology to the cartographic generalization of planar elements and proposed that the raster data structure was more suitable for research on the cartographic generalization of land use [10]; subsequently, Su et al. discussed raster processing methods such as feature simplification, integration, and displacement using mathematical morphology [11,12]; based on mixed raster-vector data, Huilian et al. proposed the GABP neural network model and achieved the simplification of buildings [13]; at the beginning of this century, Li et al. of Kingston University, UK, first applied cellular automata to cartographic generalization, bringing progress and a breakthrough to raster-based cartographic generalization [14].
On the basis of the research results of the abovementioned scholars, and according to the different characteristics of vector and raster data, we applied the corresponding generalization methods to the land use patches of the same cartographic region and carried out a semantic contrast evaluation of the generalization results. For the vector data, we took the land type into account alongside the spatial topological relations in order to amalgamate adjacent patches, and we established a polygon from the nodes in the buffer intersection in order to aggregate adjacent patches. For the raster data, we used the closing operation of mathematical morphology to aggregate the patches and added semantics through a cellular automaton whose transformation rule is mode filtering to achieve the generalization.

Vector Data Patch Generalization Algorithm Process
2.1. Definition of Topological Relation. Since the spatial distribution of land use data is characterized by full coverage, no overlap, and no gaps, the generalization of land use data cannot dispense with the aggregation and amalgamation of patch polygons. The multilevel semantic features make decisions about the generalization rules more complex. In this research, the generalization of topologically adjacent patches takes the patch category into account as auxiliary information.
In the land use data, we selected the set LandSet{$L_i$} of polygons whose patch area Area($L_i$) is less than the smallest epigraph area $S_{\mathrm{Area}}$, and let $B_i$ denote the boundary of patch $L_i$. $D_{\mathrm{distance}}$ is the minimum distance between elements on the map, $d(L_i, L_j)$ is the distance between patch $L_i$ and patch $L_j$, and Area($L_i$) is the area of element $L_i$. The topological relation of $L_i$ and $L_j$ within the set LandSet{$L_i$} is defined by two conditions, A and B. In the generalization process, merging is conducted directly only for similar patches that satisfy condition A; for the other conditions, further category analysis should be conducted before generalization can continue [15].

Aggregation Processing.
Patches that are separated in space need to be merged, so that similar small patches in close proximity to each other are not deleted during generalization, which would cause too large an area change in the generalization result. This operation is known as aggregation. The specific process is shown in Figure 1, where $L_1$ and $L_2$ are patches of the same land type. First, draw the buffers of $L_1$ and $L_2$ with the minimum distance between elements ($D_{\mathrm{distance}}$) as the radius, giving Buffer($L_1$) and Buffer($L_2$), respectively; obtain the intersection of the two buffers, Buffer($L_1$) ∩ Buffer($L_2$) (the grid section in Figure 1(b)); then extract the nodes of the polygons $L_1$ and $L_2$ that lie in the buffer intersection and establish the polygon $L_{\mathrm{new}}$ from these nodes (the red area in Figure 1(c)); finally, merge $L_1$, $L_2$, and $L_{\mathrm{new}}$ and delete the overlap area generated in the merging result [16]. The aggregation result is shown in Figure 1(d).
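The node-extraction step of the aggregation can be illustrated with a minimal sketch. It rests on several assumptions that are not in the original text: the two patches are disjoint simple polygons given as vertex lists; a node lies in Buffer(a) ∩ Buffer(b) exactly when it is within the buffer radius of the other patch; and the bridging polygon is built as the convex hull of the extracted nodes, which is only one possible way to "establish the polygon according to the nodes". All function names are hypothetical, and the final merge and overlap removal are not shown.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_ring(p, ring):
    """Distance from p to the boundary of a polygon given as a vertex list."""
    n = len(ring)
    return min(point_segment_distance(p, ring[i], ring[(i + 1) % n])
               for i in range(n))

def nodes_in_buffer_overlap(poly_a, poly_b, radius):
    """Nodes of both polygons that fall inside Buffer(a) ∩ Buffer(b).

    A node of poly_a always lies in its own buffer, so it is in the overlap
    exactly when it is within `radius` of poly_b, and vice versa.
    Assumption: the polygons are disjoint.
    """
    near_a = [p for p in poly_a if distance_to_ring(p, poly_b) <= radius]
    near_b = [p for p in poly_b if distance_to_ring(p, poly_a) <= radius]
    return near_a + near_b

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def shoelace_area(ring):
    """Unsigned polygon area by the shoelace formula."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0

def bridging_polygon(poly_a, poly_b, radius):
    """Assumed construction of the new polygon: hull of the overlap nodes."""
    return convex_hull(nodes_in_buffer_overlap(poly_a, poly_b, radius))
```

For two unit squares separated by a gap of 0.5 and a buffer radius of 1, the sketch returns the rectangle that fills the gap between the facing edges; with a radius smaller than the gap, no nodes qualify and no bridging polygon is produced.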

Amalgamation Processing.
For a nonsimilar, topologically adjacent patch ($L_0$) whose area is less than the smallest epigraph area, the skeleton lines of its Delaunay triangulation are extracted and used to subdivide $L_0$. The secondary patch $L_0$ is decomposed along the skeleton lines; after decomposition, the small pieces are merged into the adjacent main patches, and their original land use type names are changed accordingly [17][18][19]. The specific process is shown in Figure 2. $L_1$, $L_2$, $L_3$, and $L_4$ are patches of different land types, and the area of $L_1$ is less than the smallest epigraph area. A Delaunay triangulation is constructed for $L_1$, $L_1$ is subdivided along the skeleton lines (as shown in Figure 2(b)), and the generalization result is shown in Figure 2(c).

Theoretical Basis and Algorithm Process of Raster Data Patch Generalization

Mathematical Morphology.
Mathematical morphology is a discipline with complete mathematical foundations, established on the basis of set theory; its main idea is to use a structural element of a certain size to probe the geometrical shapes in an image. The most basic operators of mathematical morphology are erosion, dilation, the opening operation, and the closing operation. In cartographic generalization, if there are tiny connections between two patches and the structural element is large enough, the erosion operation can be used to separate them; if there are small gaps between two patches and the spacing is less than the structural element, the dilation operation can be used to connect them. In their standard set-theoretic form, the two operators are
$$A \,\Theta\, B = \{x \mid (B)_x \subseteq A\}, \qquad A \oplus B = \{x \mid (\hat{B})_x \cap A \neq \varnothing\}, \tag{1}$$
where $A$ is the set of pixel points to be processed, $B$ is the structural element used for probing the geometrical shapes of the image, $(B)_x$ is $B$ translated by $x$, $\hat{B}$ is the reflection of $B$ about its origin, $A \,\Theta\, B$ denotes that $A$ is eroded by the structural element $B$, and $A \oplus B$ denotes that $A$ is dilated by the structural element $B$. The detailed process is shown in Figure 3.
Figure 3(a) shows the binary image to be processed, whose elements take only the values 0 and 1, and Figure 3(b) shows the structural element, which consists of the focus pixel and its four neighborhood pixels. The structural element $B$ is translated successively along the rows and columns of the binary image $A$. When the center pixel of $B$ overlaps a pixel of $A$ whose value is 0, the four neighbors of that pixel are eroded and their value becomes 0; the eroded pixels are marked "−" (as shown in Figure 3(c)). Conversely, when the center pixel of $B$ overlaps a pixel of $A$ whose value is 1, the four neighbors of that pixel are dilated and their value becomes 1; the dilated pixels are marked "+" (as shown in Figure 3(d)). In cartographic generalization, a single dilation or erosion operation cannot aggregate patches that are close to each other but separated after rasterization. Generally, several dilation operations are conducted first, followed by the same number of erosion operations; this process is known as the closing operation in mathematical morphology, while the opening operation applies the two steps in the opposite order. Their formulas are as follows:
$$\text{Closing operation: } A \cdot B = (A \oplus B) \,\Theta\, B, \qquad \text{Opening operation: } A \circ B = (A \,\Theta\, B) \oplus B. \tag{2}$$
In practical application, the number of erosion and dilation operations needs to be determined according to the distance between the patches of the land type to be aggregated. For example, if the distance between two patches is $d$, the dilation operation is first conducted $d/(2 \times \text{cellsize})$ times, because each dilation grows both patches toward each other by one cell, and the erosion operation is then conducted the same number of times to complete the aggregation.
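The closing operation described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the program used in the study: the image is a list of lists of 0 and 1, the structural element is the focus pixel plus its four neighbors, pixels outside the image are treated as background during erosion, and rounding the iteration count up with `math.ceil` is an assumption. The function names are hypothetical.

```python
import math

# Cross-shaped structural element: the focus pixel plus its four neighbours.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def dilate(image, element=CROSS):
    """Set every pixel covered by the element when centred on a 1-pixel."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1:
                for dr, dc in element:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out

def erode(image, element=CROSS):
    """Keep a pixel only if the element centred on it fits in the foreground.

    Assumption: pixels outside the image count as background, so patches
    touching the image border can be trimmed.
    """
    rows, cols = len(image), len(image[0])

    def is_foreground(rr, cc):
        return 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] == 1

    return [[1 if all(is_foreground(r + dr, c + dc) for dr, dc in element) else 0
             for c in range(cols)]
            for r in range(rows)]

def dilation_count(distance, cellsize):
    """Number of dilations needed to bridge a gap: d / (2 * cellsize), rounded up."""
    return math.ceil(distance / (2 * cellsize))

def closing(image, times):
    """Dilate `times` times, then erode the same number of times."""
    for _ in range(times):
        image = dilate(image)
    for _ in range(times):
        image = erode(image)
    return image
```

With hypothetical values of a 30 m gap and a 5 m cell, `dilation_count(30, 5)` gives three dilations followed by three erosions. On a small image with two three-row patches separated by a one-pixel gap, one closing pass bridges the gap only in the middle row, because the cross-shaped element does not fill the corners of the gap.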

Cellular Automata.
Although mathematical morphology algorithms have the advantage of a naturally parallel structure, they can only generalize the features of a single attribute. It is therefore difficult to handle the generalization of multisemantic land use data with mathematical morphology alone, and a cellular automaton with "mode filtering" as its transformation rule is introduced to achieve patch amalgamation. A cellular automaton is a grid dynamics model in which cells with discrete, finite states interact in local space and evolve in discrete time [20]. It includes four basic elements, namely, cell, state, neighborhood, and transformation rule. In this research, the raster pixel is the cell, the attribute category of the pixel is the cell state, all the cells whose distance to a given cell is within the determined radius $r$ form the neighborhood of that cell, and the function that controls the change of the cell state (mode filtering) is the transformation rule. In the generalization process, following the mode filtering algorithm, a 5 × 5 neighborhood is used as the template to traverse the whole image: all the raster values within the neighborhood of the center cell are extracted, and the raster value that appears most often in the neighborhood is used as the value of the center cell at the next moment. Land use data generalization based on cellular automata is a gradual process, so many iterations are needed to reach the ideal state of generalization.
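A minimal sketch of one mode filtering pass, and of iterating it until the image stops changing, is given below. It uses plain Python lists to stay self-contained. The handling of border cells (the window is clipped to the image) and the tie-breaking rule (keep the current state if it is among the most frequent values, otherwise take the smallest label) are assumptions that the text does not specify, and the function names are hypothetical.

```python
from collections import Counter

def mode_filter_step(grid, radius=2):
    """One synchronous update: each cell takes the most frequent value
    in its (2 * radius + 1) x (2 * radius + 1) neighbourhood (5 x 5 by default)."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Assumption: the window is clipped at the image border.
            counts = Counter(
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            best = max(counts.values())
            modes = [value for value, n in counts.items() if n == best]
            # Assumption: on a tie, keep the current state if it is a mode.
            out[r][c] = grid[r][c] if grid[r][c] in modes else min(modes)
    return out

def run_automaton(grid, max_iterations=80, radius=2):
    """Iterate until the grid no longer changes or the iteration cap is hit.

    Returns the final grid and the number of updates that changed it.
    """
    for i in range(max_iterations):
        nxt = mode_filter_step(grid, radius)
        if nxt == grid:
            return grid, i
        grid = nxt
    return grid, max_iterations
```

In a 5 × 5 grid of land type 1 with a single cell of land type 2 in the center, one update absorbs the isolated cell into the surrounding type, and the second pass detects that the grid is stable.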

Semantic Evaluation Theory
In the map generalization process, operations such as merging, exaggeration, and deletion change not only the geometrical and topological relations between elements but also, fundamentally, their semantic relations. In this research, we did not consider the complex semantic relations within the land use data in depth. To contrast the generalization results of the vector and raster data, we selected two evaluation elements from the semantic perspective, namely, semantic consistency and semantic completeness, and compared and evaluated the two generalization results at the land type level and the map level.

Semantic Consistency.
$S_1$ and $S_2$ represent the total area of land type $i$ before and after generalization, respectively, and the semantic consistency $C_i$ of land type $i$ is calculated from them by Formula (3). Formula (3) is then extended to the map level to give the semantic consistency of the whole map, Formula (4), in which $\Delta S$ is the sum of the absolute values of the area changes of each land type before and after generalization and $S_0$ is the total area of all patches before generalization. At both the land type level and the map level, the value of the semantic consistency lies in [0, 1]. The higher the value, the better the semantic consistency and the better the generalization effect.
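The sketch below shows one way the two consistency measures could be computed from the quantities defined above. It assumes that consistency is one minus the relative area change, that is, $1 - |S_2 - S_1| / S_1$ at the land type level and $1 - \Delta S / S_0$ at the map level, clipped at 0 so that the result stays in [0, 1]. This form matches the stated definitions and value range but is an assumption, as are the function names and the example areas.

```python
def land_type_consistency(area_before, area_after):
    """Assumed form: 1 - |S2 - S1| / S1, clipped to stay within [0, 1]."""
    if area_before <= 0:
        raise ValueError("area_before must be positive")
    return max(0.0, 1.0 - abs(area_after - area_before) / area_before)

def map_consistency(areas_before, areas_after):
    """Assumed form: 1 - (sum of absolute per-type area changes) / (total area before).

    Both arguments map a land type name to its total area; a land type that
    disappears after generalization counts as an area of 0.
    """
    total_before = sum(areas_before.values())
    if total_before <= 0:
        raise ValueError("total area before generalization must be positive")
    total_change = sum(abs(areas_after.get(name, 0.0) - s1)
                       for name, s1 in areas_before.items())
    return max(0.0, 1.0 - total_change / total_before)

# Hypothetical areas (hm^2), for illustration only.
before = {"arable": 100.0, "water": 10.0}
after = {"arable": 95.0, "water": 15.0}
per_type = {name: land_type_consistency(before[name], after[name]) for name in before}
whole_map = map_consistency(before, after)
```

In this hypothetical example, the small water patch gains half its area again and so has a much lower consistency than the arable land, although the absolute area change is the same for both types.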

Semantic Completeness.
Semantic completeness describes the degree of semantic redundancy and omission. Taking land type $i$ as an example, according to the deletion of targets before and after generalization, the semantic completeness at the land type level is defined as follows:
$$P_i = 1 - \frac{\Delta N_i}{N_i}, \tag{5}$$
where $\Delta N_i$ is the number of targets of land type $i$ deleted by the generalization and $N_i$ is the number of targets belonging to land type $i$ before the generalization. Extending Formula (5), the semantic completeness at the map level can be expressed as follows:
$$P = 1 - \frac{\Delta N}{N_0}, \tag{6}$$
where $\Delta N$ is the total number of targets of all land types deleted by the generalization and $N_0$ is the total number of targets of all land types before generalization. The value range of the semantic completeness is [0, 1]. The closer the value is to 1, the higher the semantic completeness; conversely, a low value indicates serious category deletion.
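As a quick numerical check, the sketch below assumes that semantic completeness is one minus the share of deleted targets. Under that assumption, the raster patch counts reported in the case study (1,805 patches before generalization and 256 after) give the reported map-level value of 0.142. The function name is hypothetical.

```python
def completeness(count_before, count_after):
    """Assumed form: 1 - (deleted targets) / (targets before generalization)."""
    if count_before <= 0:
        raise ValueError("count_before must be positive")
    # If generalization produced more targets than it started with,
    # treat the number of deleted targets as zero.
    deleted = max(0, count_before - count_after)
    return 1.0 - deleted / count_before

# Patch counts reported for the raster generalization in the case study.
raster_completeness = round(completeness(1805, 256), 3)
```

A land type with a single patch before and after generalization, such as the water patch in the study area, has a completeness of exactly 1 under the same assumption.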

Case Study and Analysis of Results
In this research, we selected land use patch data at a scale of 1 : 10000. According to the planning requirements of land use, the data scale is generalized from 1 : 10000 to 1 : 50000. Before the generalization, the smallest patch area on the map of the original vector data is 400 m². According to the cartographic generalization requirements for 1 : 50000 land use data in the overall land use plan for the period from 2006 to 2020, we selected 10,000 m² as the smallest epigraph area and 30 m as the maximum distance for patch aggregation. The raster data, with a resolution of 5 × 5, were processed iteratively, and the images tended to stabilize after eighty iterations.
The patch skeleton lines in the vector data were generated through secondary development with ArcGIS Engine components in the VS 2010 programming environment, and the patch generalization of the vector data was completed in combination with ArcGIS. The patch generalization of the raster data was completed with a Python script that uses the scientific computing package numpy to implement an integrated program combining mathematical morphology with the mode filtering of cellular automata. The generalization results are shown in Figure 5.
The statistics of patch number and area before and after generalization are shown in Table 1. In terms of the total patch number, the raster data generalization performed better: the patch number was reduced from the original 1,805 to 256. In terms of the change in total patch area, the vector data generalization performed better: the total areas before and after generalization were completely consistent, mainly because the land use types are considered during the amalgamation and aggregation of the vector patches, which keeps the area of each land type as balanced as possible before and after generalization.
In terms of the area change of each land use type, in the vector result the type with the largest area reduction is traffic and water conservancy land, whose area is reduced by 42.35 hm² relative to the original data. This is mainly because such land contains long, narrow regions, such as highways and railways, whose width is less than the epigraph distance; these regions are subdivided and merged into other features. In the raster result, the natural reserve shows the largest area reduction, up to 30.39 hm², mainly because the patches of this land type have relatively complex shapes, while the mode filtering algorithm only smooths boundaries; the projecting parts of the patch shapes are smoothed away, which reduces the area.
As can be seen from Table 1, at the level of the whole map, the generalization results of the vector and raster data have similar semantic consistency relative to the original data; because the vector data generalization better preserves the area balance, its semantic consistency is higher, reaching a value of 1. The semantic completeness of both generalization results is not high. In particular, the semantic completeness of the raster data is only 0.142, mainly because the combination of the closing operation of mathematical morphology and the mode filtering algorithm changes the pixel values to achieve patch aggregation, which significantly reduces the patch number and causes serious deletion of categories.
Combining Table 1 and Figure 6, it can be seen that, in the generalization results of both the vector and raster data, the semantic consistency of every land type except water is above 0.6. This is mainly because the research area contains only one water patch, whose area is small but meets the epigraph requirements; there are many secondary features around it, and many small patches are absorbed into it during generalization, so the area change rate of the water patch is large and its semantic consistency is reduced. As can be seen from Figure 6(b), because there is only one water patch before and after generalization, the semantic completeness of water reaches the maximum value of 1 in both generalization results. Comparing the two generalization results, for the land types other than water, the semantic completeness of the vector data is higher than that of the raster data. This explains why, at the scale of the whole map, land type deletion is more serious in the raster generalization result than in the vector result, that is, why its semantic completeness is lower.

Conclusions and Discussions
In this research, we used buffer overlap, Delaunay triangle subdivision, mathematical morphology, cellular automata, and other methods to generalize the patches of vector and raster data, and we compared the advantages and disadvantages of the two kinds of spatial data in the patch generalization process, so as to provide a basis for selecting the data type for land use patch generalization.
From the perspective of data structure, object-oriented vector data are more suitable for quantitative and qualitative research, while highly structured raster data have advantages in aspects such as rapid modeling and neighborhood extraction. The research shows that, after the generalization of the raster data, the degree of reduction for the total number of the patches is greater, and the

Figure 4: Location of study area.

Figure 5: Contrast between result of generalization and original data of land use.
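Conditions of the topological relation between patches $L_i$ and $L_j$ (with boundaries $B_i$ and $B_j$): A: ... $L_i$ and $L_j$ are topologically adjacent; B: if $B_i \cap B_j = \varnothing$ and $d(L_i, L_j) \le D_{\mathrm{distance}}$, $L_i$ and $L_j$ are topologically adjacent.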

Table 1: Patch number, area change, and semantic computation before and after generalization.