A Suite of Tools for Assessing Thematic Map Accuracy

1 Centro de Investigaciones en Geografía Ambiental, Universidad Nacional Autónoma de México, Antigua Carretera a Pátzcuaro 8701, Colonia Ex-Hacienda de San José de La Huerta, 58190 Morelia, MICH, Mexico
2 Dirección de la División de Ingenierías, Universidad de Guanajuato, Avenida Juárez 77, Zona Centro, 36000 Guanajuato, GTO, Mexico
3 Centro de Investigaciones en Ecosistemas, Universidad Nacional Autónoma de México, Antigua Carretera a Pátzcuaro 8701, Colonia Ex-Hacienda de San José de La Huerta, 58190 Morelia, MICH, Mexico


Introduction
Thematic maps such as land use/cover maps are widely used to support management and environmental policies and should therefore be supported by a statistically rigorous, credible accuracy assessment [1,2]. Thematic accuracy is a measure of correctness that can be defined as the degree to which the attributes of a map agree with "truth" reference datasets. Accuracy assessment is typically based on a sample of reference sites at which the "true" land use/cover category is compared with the one in the map.
A variety of sampling designs can be used to select these reference sites (sample units). The objectives, the desirable criteria, and the resources of the assessment have to be taken into account when choosing the sampling design. First of all, the sampling design should be a probability sampling design, which means that sample units are selected randomly and that the inclusion probability of each sample unit is known and greater than zero for all the units in the area under assessment. Probability sampling enables statistical inference, allowing accuracy estimates to be computed along with their confidence intervals. Convenience procedures, such as selecting the training data used during supervised classification or limiting the random sampling of reference sites to accessible sites or to areas covered by available high resolution images, do not fulfill these requirements and cannot be considered probability sampling procedures [3].
The most commonly used probability sampling designs are simple random sampling, systematic sampling, and stratified random sampling. In the simple random sampling design each sampling location is equally likely to be selected; that is, all the locations have the same inclusion probability (equal probability sampling). The advantage of this design is its simplicity: the equations used to calculate the standard errors are less complex than in other designs and the sample size can be augmented or reduced easily. However, this design may not produce sample sizes for rare categories that are large enough to provide estimates with acceptable confidence intervals. Simple systematic sampling is achieved by selecting units using a systematic pattern, such as a grid. Systematic sampling is easy to carry out, gives good spatial coverage, and is generally more precise than simple random sampling for assessing overall accuracy [3]. As in the simple random sampling design, rare categories will be rare in the sample because it is an equal probability sampling design. Simple random and systematic sampling designs do not enable users to focus the sampling on a particular region or category.
When users are interested in obtaining more detailed information on a particular subregion or specific category, a stratified sampling design should be used. In stratified sampling, the area under assessment is divided into various subregions (strata) and each stratum is sampled independently. For example, simple random sampling can be applied in each stratum (stratified random sampling). The categories of the map under assessment are often used to stratify the sampling. In that case, the stratification may be used to guarantee a minimum sample size in each stratum and to obtain more precise estimates for rare categories. This approach may also enable users to adapt the stratum sample size to the precision requirements of each category according to the objectives of the study. Sample size may be augmented for a category of interest that requires a precise accuracy estimate and reduced for less important categories, improving cost-effectiveness. It is worth noting that in these cases the number of sampling units per category is not proportional to the category area (nonequal probability sampling design). The equations needed to calculate the accuracy indices and their confidence intervals in stratified sampling are more complex than those used in simple random or systematic sampling. This is because estimates combining data across strata must account for the unequal inclusion probabilities that result from "forcibly" allocating sampling points in underrepresented areas that would seldom host validation plots under simple random or systematic sampling [4].
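As an illustration of stratified random sampling with map categories as strata, the following Python sketch draws a fixed number of pixels per category from a raster held as a NumPy array. It is not part of the tools presented in this paper; the function name `stratified_random_sample`, the toy map, and the per-stratum sample sizes are assumptions made only for this example.

```python
import numpy as np

def stratified_random_sample(category_map, n_per_stratum, seed=0):
    """Draw a stratified random sample of pixels (hypothetical helper).

    category_map  : 2-D integer array of map categories.
    n_per_stratum : dict {category: requested number of sample units}.
    Returns a dict {category: array of (row, col) pixel indices}.
    """
    rng = np.random.default_rng(seed)
    sample = {}
    for category, n in n_per_stratum.items():
        rows, cols = np.nonzero(category_map == category)
        if rows.size == 0:
            raise ValueError(f"category {category} is absent from the map")
        # Simple random sampling without replacement inside the stratum;
        # a stratum smaller than the request is sampled exhaustively.
        n = min(n, rows.size)
        picked = rng.choice(rows.size, size=n, replace=False)
        sample[category] = np.column_stack((rows[picked], cols[picked]))
    return sample

# Toy map (assumed data): category 1 dominates, category 3 is rare.
toy_map = np.ones((50, 50), dtype=int)
toy_map[:10, :] = 2
toy_map[0, :5] = 3
sites = stratified_random_sample(toy_map, {1: 10, 2: 10, 3: 5})
```

In this toy case the rare category 3 covers 0.2% of the map but receives 20% of the sample units, which is the kind of disproportion that the estimators must later correct.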
Cluster sampling is also a popular design that reduces the cost of collecting data by constraining sample units to fall within a limited number of sites (clusters). However, it introduces a larger spatial correlation in the sample data, reducing the precision of the accuracy estimates [5]. We do not take this sampling design into account in the present study. A detailed review of the basic sampling designs can be found in Stehman [3,6] and Stehman and Czaplewski [2].
Although the methods to carry out accuracy assessment are well established, few studies producing land use/cover or land use/cover change maps present sound and complete accuracy assessments [7], partly because it is not a straightforward procedure and partly because mainstream GIS or satellite image processing software programs only provide incomplete tools to carry out such assessments. For instance, GIS software often has built-in tools to carry out accuracy assessment, but these are limited to equal probability sampling designs, in which the number of sample units by category is proportional to the category area, for example, simple random sampling. Equations used to estimate accuracy indices depend on the sampling design, which in most cases is a stratified sampling [7,8,9]. When these tools are fed with data obtained by nonequal probability sampling, such as stratified random sampling, the accuracy estimates they provide are erroneous. Moreover, these tools usually provide neither information about the certainty of the estimates (confidence intervals) nor estimates of area adjusted to eliminate bias due to map classification errors. This paper presents a set of free tools that are readily available to the public and which enable users to compute accuracy indices, to estimate the corrected area of a given category, and to construct confidence intervals (CIs) for quantifying the uncertainty of the estimates. The aim is to help users beyond the Geographic Information Science community adopt statistically sound accuracy assessment methods as part of routine practice. With land use/cover map applications growing sharply to support environmental management, policy strategies, and even scientific hypotheses, we expect that these free and easy-to-use tools will boost accuracy assessments in both the academic and policy sectors. The tools are implemented in Dinamica EGO, Q-GIS, and R.
In Section 2, we briefly describe the software programs we used and review the method behind the tools, which produce a statistically rigorous accuracy report for any given categorical map, including the estimation of corrected areas along with their CIs. In Section 3, we apply the tools to assess the accuracy of a 2010 land use/cover map for central Mexico.
R (http://www.r-project.org/) is an open source language and environment for data manipulation, statistical analysis, and graphics. A large number of packages (collections of R functions and compiled code) are available for download and installation from the CRAN package repository (http://cran.r-project.org/web/packages/). Q-GIS (http://www.qgis.org/) is a popular cross-platform open source desktop geographic information system (GIS) software program that provides both vector and raster data viewing, editing, and analysis capabilities. Plugins, written in Python, extend its capabilities. Q-GIS also enables users to run R scripts.

Accuracy Estimates.
In order to assess the accuracy of a map with $q$ categories, a sample of reference sites (e.g., pixels) is selected by systematic, simple random, or stratified random sampling (using the map categories as strata). A confusion matrix, also referred to as the error matrix, is constructed from the sample counts. The map categories ($i = 1, 2, 3, \ldots, q$) and reference categories ($j = 1, 2, 3, \ldots, q$) are represented by rows and columns, respectively (Table 1).
For stratified sampling, the number of samples for each map category is not necessarily proportional to the area covered by each category. This lack of proportion should be taken into account when calculating accuracy indices. Following the considerations suggested by Card [8], we adjusted the confusion matrix derived from a stratified sampling by weighting the number of sites by the area of each category on the map.
As an example, consider a confusion matrix with $q$ rows and $q$ columns, such as the matrix in Table 1, in which each element $n_{ij}$ is replaced by $\hat{p}_{ij}$, an unbiased estimator of the proportion of area, computed as

$$\hat{p}_{ij} = W_i \frac{n_{ij}}{n_{i+}}, \qquad (1)$$

where $W_i$ is the proportion of area of category $i$ in the map, $n_{ij}$ is the number of samples mapped as $i$ and belonging to category $j$ in the reference data, and $n_{i+}$ is the number of samples mapped as $i$ in the map.
In this new adjusted matrix, each cell element $\hat{p}_{ij}$ represents the probability that a randomly selected area is classified under category $i$ in the map and under category $j$ in the reference data. As a consequence, the sum of the cells of each row, $\hat{p}_{i+}$, is equal to $W_i$, which is the proportion of category $i$ in the map. Based on this matrix, the overall, user, and producer accuracy indices are computed as described in (2), (3), and (4).
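The area-weighted adjustment of the confusion matrix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the Dinamica EGO, Q-GIS, or R tools: the function name `adjust_confusion_matrix`, the three-category counts, and the map area proportions are all hypothetical.

```python
import numpy as np

def adjust_confusion_matrix(counts, map_area_proportions):
    """Weight each row of a raw confusion matrix by the map area proportion.

    counts               : q x q array of sample counts (rows = map categories,
                           columns = reference categories).
    map_area_proportions : length-q array with the proportion of the map
                           covered by each category (should sum to 1).
    """
    counts = np.asarray(counts, dtype=float)
    w = np.asarray(map_area_proportions, dtype=float)
    row_totals = counts.sum(axis=1)  # samples mapped as each category
    if np.any(row_totals == 0):
        raise ValueError("every map category needs at least one sample unit")
    # Cell (i, j) becomes W_i * n_ij / n_i+
    return w[:, None] * counts / row_totals[:, None]

# Hypothetical example: 50 sample units, 3 categories, unequal sampling intensity.
counts = np.array([[18, 2, 0],
                   [1, 17, 2],
                   [0, 3, 7]])
w = np.array([0.70, 0.25, 0.05])
p_hat = adjust_confusion_matrix(counts, w)
```

Each row of `p_hat` sums to the map area proportion of its category, and the whole matrix sums to 1, so its cells can be read as area proportions rather than sample counts.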
The overall accuracy $\hat{O}$ is the overall proportion of area correctly classified and is calculated by summing the diagonal elements of the adjusted matrix as follows:

$$\hat{O} = \sum_{i=1}^{q} \hat{p}_{ii}, \qquad (2)$$

where $q$ is the number of categories. User accuracy $\hat{U}_i$ and producer accuracy $\hat{P}_j$ are calculated using (3) and (4), respectively. Producer accuracy, which is related to omission errors, is the proportion of the reference sample of a particular category that is correctly classified in the map. User accuracy, which is related to commission errors, is the proportion of samples classified as a particular category in the map which are correctly classified:

$$\hat{U}_i = \frac{\hat{p}_{ii}}{\hat{p}_{i+}}, \qquad (3)$$

$$\hat{P}_j = \frac{\hat{p}_{jj}}{\hat{p}_{+j}}. \qquad (4)$$

For stratified sampling, the half-width CIs of the overall, user, and producer accuracy estimates are calculated following Card [8] using (5), (6), and (7), where $\mathrm{HCI}_{\hat{O}}$ is the half-width CI of the overall accuracy, $z$ corresponds to the percentile of the normal distribution (for 95% confidence, $z = 1.96$), $\mathrm{HCI}_{\hat{U}_i}$ is the half-width CI of the user accuracy for category $i$, and $\mathrm{HCI}_{\hat{P}_j}$ is the half-width CI of the producer accuracy for category $j$.

This method is also applicable to designs which include simple random and systematic sampling. When no stratification is applied in these sampling designs, the stratified estimator is referred to as a "poststratified" estimator to distinguish between using stratification in the sampling design (i.e., stratified sampling) and using stratification in the estimator (i.e., poststratified estimation). This improves the estimates of accuracy indices because simple random and systematic sampling do not guarantee that the proportion of the various categories among samples is exactly the same as the proportion of the map category areas.
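The overall, user, and producer accuracy indices can be read directly off an area-adjusted matrix, as in the following Python sketch. The function name `accuracy_indices` and the matrix `p_hat` (rows are map categories, columns are reference categories, cells are area proportions summing to 1) are assumptions made for this illustration; confidence intervals are deliberately left out.

```python
import numpy as np

def accuracy_indices(p):
    """Overall, user, and producer accuracy from an area-adjusted matrix.

    p : q x q array of estimated area proportions
        (rows = map categories, columns = reference categories).
    A category with an empty row or column would yield NaN for its index.
    """
    p = np.asarray(p, dtype=float)
    diagonal = np.diag(p)
    overall = diagonal.sum()             # sum of the diagonal cells
    user = diagonal / p.sum(axis=1)      # diagonal cell / row total
    producer = diagonal / p.sum(axis=0)  # diagonal cell / column total
    return overall, user, producer

# Hypothetical adjusted matrix for three categories.
p_hat = np.array([[0.6300, 0.0700, 0.000],
                  [0.0125, 0.2125, 0.025],
                  [0.0000, 0.0150, 0.035]])
overall, user, producer = accuracy_indices(p_hat)
```

User accuracy depends only on the row of each category, so it is unaffected by the area weighting, whereas overall and producer accuracy mix cells from several rows and therefore change when the rows are weighted by area.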
CIs can also be estimated by bootstrap stratified resampling [17,18]. In order to carry out bootstrapping, the tool generates replicate sample sets by resampling with replacement from the original sample set. It uses stratification to ensure that each replicate has the same proportion of each category as the original sample. The accuracy indices are computed on each replicated sample. The CIs are then estimated using the bootstrap percentile interval method, which uses the empirical quantiles of the bootstrap replicates.
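A stratified bootstrap with a percentile interval can be sketched as follows for the overall accuracy. This is an illustrative Python sketch and not the implementation used in the tools: the function name `bootstrap_overall_accuracy_ci`, the number of replicates, the sample data, and the map area proportions are assumptions. The overall accuracy of each replicate is computed as the area-weighted sum of the within-stratum proportions of correctly classified sample units.

```python
import numpy as np

def bootstrap_overall_accuracy_ci(mapped, reference, area_props,
                                  n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI of overall accuracy by stratified bootstrap (sketch).

    mapped, reference : 1-D arrays with the map and reference category of
                        each sample unit.
    area_props        : dict {category: proportion of the map area}.
    """
    mapped = np.asarray(mapped)
    reference = np.asarray(reference)
    rng = np.random.default_rng(seed)
    # Indices of the sample units belonging to each stratum (map category).
    strata = {c: np.flatnonzero(mapped == c) for c in area_props}
    if any(idx.size == 0 for idx in strata.values()):
        raise ValueError("every stratum needs at least one sample unit")
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        overall = 0.0
        for c, idx in strata.items():
            # Resample with replacement inside the stratum, keeping its size.
            resampled = rng.choice(idx, size=idx.size, replace=True)
            overall += area_props[c] * np.mean(reference[resampled] == c)
        replicates[b] = overall
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Hypothetical sample: 20, 20, and 10 units in map categories 1, 2, and 3.
mapped = np.repeat([1, 2, 3], [20, 20, 10])
reference = np.concatenate([np.repeat([1, 2], [18, 2]),
                            np.repeat([1, 2, 3], [1, 17, 2]),
                            np.repeat([2, 3], [3, 7])])
area_props = {1: 0.70, 2: 0.25, 3: 0.05}
lower, upper = bootstrap_overall_accuracy_ci(mapped, reference, area_props)
```

Because resampling is done separately within each stratum, every replicate keeps the original number of sample units per map category.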

Area Estimates.
Due to asymmetrical classification errors, the area of a particular category directly obtained from the map (e.g., by pixel counting) is likely to be biased [4]. For example, the area of a category systematically affected by omission errors will be underestimated, whereas the area of a category mainly affected by commission errors will be overestimated in the map. Therefore, areas obtained from the map should be adjusted to eliminate bias due to map classification errors, and these error-adjusted area estimates have to be accompanied by confidence intervals that reflect their uncertainty. This method enables users to estimate error-adjusted areas using information from the accuracy assessment sample to correct the bias in area estimates [7,8,19]. In the error-adjusted matrix, the sum of the elements of column $j$, $\hat{p}_{+j}$, is an unbiased estimator of the proportion of the area of category $j$. Therefore, the area of category $j$, $\hat{A}_j$, is calculated by

$$\hat{A}_j = A_{tot}\, \hat{p}_{+j}, \qquad (8)$$

where $A_{tot}$ is the total area. The estimated half-width confidence interval for the estimated area proportion $\hat{p}_{+j}$ is given by (9).

How Do the Tools Work?
Dinamica EGO models are designed as workflows that execute sequences of geoprocessing operations and are constructed by dragging and connecting data "functors" (data operators) in a model diagram displayed in the graphic interface. Models can be saved as submodels and stored as new functors in the functor library, thus helping users to better organize and share models [11,16]. For this purpose, a new library called "Accuracy Assessment", composed of five submodels, was created. It enables a user to carry out various operations related to accuracy assessments using maps in raster format. These operations include the construction of the confusion matrix, the bias adjustment of this matrix using Card's method [8], and the computation of the accuracy indices and the error-corrected area estimates along with their CIs (Table 2).
The confidence intervals are estimated either by using the equations described in the previous section or by bootstrapping. The library, along with the application data and the submodels that make up the tool, is available for download at http://www.ciga.unam.mx/ciga/images/proyectos/vigentes/modelos/images/AccAssess.zip. A brief user's manual based on this paper is also available. The Q-GIS plugin "AccurAssess" enables the user to carry out the accuracy assessment using vector or raster input maps. The tool computes the bias-adjusted matrix, the estimates of accuracy indices, and the error-corrected area estimates along with their CIs. The R package "MapAccurAssess" has to be fed with the raw matrix and with a two-column text table that gives, for each reference site, the mapped and the true categories. This kind of table is easily obtained through map overlay within a GIS. The package enables users to calculate the bias-adjusted matrix, the estimates of accuracy indices, and the error-corrected area estimates along with their CIs.
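To make the two-column input table concrete, the following Python sketch tallies a raw confusion matrix from a list of (mapped category, reference category) pairs. It does not reproduce the interface of the "MapAccurAssess" package or of the other tools; the function name `confusion_matrix_from_pairs`, the category labels, and the six sample units are hypothetical.

```python
import numpy as np

def confusion_matrix_from_pairs(pairs, categories):
    """Tally (mapped, reference) pairs into a raw confusion matrix.

    pairs      : iterable of (mapped_category, reference_category) tuples,
                 one per reference site.
    categories : ordered list of category labels (defines row/column order).
    Rows are map categories and columns are reference categories.
    """
    index = {c: k for k, c in enumerate(categories)}
    matrix = np.zeros((len(categories), len(categories)), dtype=int)
    for mapped, reference in pairs:
        matrix[index[mapped], index[reference]] += 1
    return matrix

# Hypothetical two-column table with six reference sites.
pairs = [("forest", "forest"), ("forest", "crop"),
         ("crop", "crop"), ("crop", "crop"),
         ("urban", "crop"), ("urban", "urban")]
categories = ["forest", "crop", "urban"]
matrix = confusion_matrix_from_pairs(pairs, categories)
```

Passing the category list explicitly keeps rows and columns in the same order even when a category never appears in the reference column.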

Tool Application for a Case Study Area in Central-West Mexico
We applied the tool to a 2010 land use/cover map for the Ayuquila basin (411,500 ha) in central-west Mexico (Figure 1). The map was produced by visual (monoscopic) interpretation of SPOT5 images displayed on a computer screen at 1 : 40,000. Six SPOT5 scenes with a processing level of 2A were used, acquired on November 11, December 12, and December 17, 2010. All six scenes were mosaicked, and the panchromatic (2.5 m) and color (10 m) bands were then fused for spatial enhancement. A first order polynomial transformation was applied to rectify the images, using nearest neighbor resampling and a threshold value of 2.5 m for residuals (i.e., less than the image resolution). Finally, two band arrangements (1, 2, 3 and 4, 1, 5) were used to "bring forward to the eye" different characteristics of land use/cover classes. The interpreter considered various image features such as color, texture, shade, and tone. Classified polygons (i.e., vectors) were converted into a 100 m resolution raster map. A set of 110 reference plots (one hectare each) was distributed following a stratified design based on land use/cover categories. This was done in order to compensate for poorly represented categories such as bare land or riparian forests. A second, independent interpreter classified all reference plots using the same imagery and approach described above but at a scale of 1 : 5,000. This was possible given the resolution of the SPOT5 imagery used (2.5 m). Finally, all reference plots were labeled according to the land use/cover category covering the plot. In cases where more than one category was present, the category covering the majority of the plot was selected.
The raw confusion matrix (Table 3) presents the number of reference samples. Rows correspond to land use/cover categories in the classified raster map (Figure 1) and columns correspond to the land use/cover categories used for labeling reference plots when classified at 1 : 5,000 (i.e., the "true" category).
As shown in Table 4, the number of samples by category is not proportional to the category's area in the map due to stratification. Categories such as 3, 5, 6, 8, 9, 11, 13, and 17 are better represented in the sample set than in the map. The extreme case is category 17: 4.5% of the reference plots belong to this category, which covers only 0.01% of the map. As a consequence, this bias has to be corrected before computing accuracy indices.
Table 5 presents the bias-adjusted matrix based on (1), obtained using the Dinamica submodel Generate Confusion Matrix, the Q-GIS plugin, or the R package.
Accuracy indices along with their respective half confidence intervals were calculated using the submodel Calculate Accuracy Indices and CI, the Q-GIS plugin, or the R package (Table 6). Overall accuracy was 0.915 ± 0.052, which represents a CI of (0.863-0.967). It is worth noting that if we compute the accuracy indices directly from the raw matrix, the estimates are different. For instance, the value of overall accuracy is 0.89 and the values of producer accuracy are 0.83 and 0.80 for categories 5 and 15, respectively. These values are not valid because they are biased by the sampling design. However, it is worth noting that the adjusted estimates can themselves be overoptimistic: when there are no off-diagonal counts in a given column or row, the category accuracy will be 1 and the half CI will be zero, suggesting that there is no uncertainty about this estimate. In these cases, the half CI has to be considered with caution and may be better reported as not available. Other approaches such as Bayesian analysis could be considered, allowing users to include prior information in the error matrix analysis and improve the precision of accuracy indices [20]. In many cases, due to the low number of sampling units per category, the estimates of accuracy present high uncertainty. For instance, the CI of user accuracy for categories 10 and 13 is (0.17-1.00). In these cases, the sample size should be increased to improve the precision of the estimate.
Table 7 shows the area for each category derived directly from the map (pixel counting) along with the error-adjusted area estimate and its confidence interval. Despite the high accuracy of the map, in some cases the error-adjusted area is rather different from the area directly obtained from the map. For instance, the area of category 2, which presents commission errors, was overestimated by the map, whereas the area of category 12 was underestimated by the map due to omission errors.
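The contrast between mapped and error-adjusted areas can be illustrated with a short Python sketch in which the mapped area of each category is the total area times its row total in the area-adjusted matrix, and the error-adjusted area is the total area times its column total. The function name `error_adjusted_areas`, the three-category matrix, and the total area of 100,000 ha are hypothetical and unrelated to the Ayuquila case study; confidence intervals are not computed here.

```python
import numpy as np

def error_adjusted_areas(p, total_area):
    """Mapped and error-adjusted areas from an area-adjusted matrix (sketch).

    p          : q x q array of estimated area proportions
                 (rows = map categories, columns = reference categories).
    total_area : total area of the map, in any unit (e.g., hectares).
    """
    p = np.asarray(p, dtype=float)
    mapped = total_area * p.sum(axis=1)    # row totals: area as mapped
    adjusted = total_area * p.sum(axis=0)  # column totals: error-adjusted
    return mapped, adjusted

# Hypothetical adjusted matrix and total area.
p_hat = np.array([[0.6300, 0.0700, 0.000],
                  [0.0125, 0.2125, 0.025],
                  [0.0000, 0.0150, 0.035]])
mapped_area, adjusted_area = error_adjusted_areas(p_hat, total_area=100_000.0)
```

In this toy case the first category loses area after adjustment because its commission errors outweigh its omission errors, while the other two categories gain area.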

Discussion
According to Stehman [9], accuracy indices directly interpretable as probabilities of encountering certain types of misclassification errors should be preferred to measures not interpretable as such. Overall accuracy is the probability that a randomly selected location in the map is correctly classified. User accuracy for category $i$ is the conditional probability that an area classified as category $i$ in the map is classified as category $i$ in the reference data. Producer accuracy for category $j$ is the conditional probability that an area classified as category $j$ in the reference data is classified as category $j$ in the map. After applying the bias adjustment proposed by Card [8], our tools provide accuracy indices that possess such a probabilistic interpretation. The tools do not calculate the Kappa index because it does not fulfill this requirement due to the adjustment for hypothetical chance agreement [9]. Moreover, this index has been strongly criticized [21]. However, to avoid biased results it is important to avoid nonprobability sampling through convenience procedures, such as selecting the training data used during supervised classification or limiting the random sampling of reference sites to accessible or homogeneous areas. These procedures lead to nonrepresentative samples and generally to optimistically biased estimates of accuracy. When preparing the sampling design, it is also crucial to clearly identify the sampling unit as well as the evaluation and labeling protocol used to assign a category to the sample unit [2].
Finally, accuracy assessments should be applied to transitions (or change categories) in land use/cover change analyses. In particular, a confidence interval should be provided in order to quantify the uncertainty of the land use/cover change area estimates [7]. This is particularly relevant when reporting critical transitions such as deforestation processes. The set of reference plots can be selected using popular sampling designs such as stratified random, simple random, and systematic sampling.
In studies aimed at estimating the area of land use/cover change, the estimation of accuracy is generally based on stratified random sampling because the categories of interest (the change areas) cover a much smaller area than the areas of permanence. Stratification should be based on transition categories instead of land use/cover categories. The reference plots then have to be labeled accordingly as land use/cover transitions. The same procedure described above is then applied to assess accuracy and improve area estimates. For example, Olofsson et al. [7] assessed the accuracy of a deforestation map and found that the error-adjusted area estimate of deforestation was about two times larger than the mapped area due to a large error of omission of deforested areas. The uncertainty of the change area estimate, expressed through the CI, can be used to assess the uncertainty of estimates that use land change area as input, such as carbon release due to deforestation.

Conclusions
The confusion matrix provides users with information on the magnitude and patterns of the classification errors. However, in order to calculate accuracy indices and their associated uncertainty, the type of sampling design used to select the verification sites should be taken into account. The confusion matrix also enables users to carry out an adjustment of the area estimator and avoid the possible measurement bias associated with the area obtained directly from the map (e.g., pixel counting). Unfortunately, accuracy assessments often fail to compute accuracy indices correctly and to provide the information required to use the data. The tools presented in this paper enable users to carry out accuracy assessments of thematic categorical maps. As shown in the case study, the tools enable users to compute the overall, user, and producer accuracy estimates along with two types of confidence intervals and provide an error-adjusted area estimator. When reporting accuracy, it is recommended to report both user and producer accuracies as well as the full error matrix and sampling design [9]. We believe that these tools will provide a wide range of users worldwide with user-friendly programs to carry out statistically rigorous accuracy assessments and complete reports without requiring ample expertise in GIS and statistics. Given that Dinamica EGO, Q-GIS, and R enable users to build their own tools, it is possible to improve and modify these tools or complement them with new elements. Thus we hope this study will provide users with tools useful to design and implement thematic map accuracy assessment following "good practice" recommendations and that these tools will evolve in order to follow the improvements in technology and the development of new methods in geographical sciences.

Table 7 :
Estimates of category's area (ha) directly derived from the map (by simple pixel count) and by using error adjustment along with their confidence interval.

Figure 1 :
Land use/cover map obtained through the classification of SPOT5 imagery by visual interpretation at 1 : 40,000 (Ayuquila basin, central-west Mexico).

Table 1 :
Confusion matrix expressed in sample counts.

Table 2 :
Main characteristics of the five submodels for the accuracy assessment Dinamica tool.

Table 4 :
Sampling intensity among map categories.

Table 5 :
Adjusted confusion matrix (values were multiplied by a factor of 100 for clarity).

As for Table 5, a half CI of zero has to be considered with caution.