Updating categorical soil maps is necessary to provide current, higher-quality soil data for agricultural and environmental management, but it may not require a costly, thorough field survey because the latest legacy maps may need only limited corrections. This study proposes a Markov chain random field (MCRF) sequential cosimulation (Co-MCSS) method for updating categorical soil maps using limited survey data, provided that qualified legacy maps are available. A case study using synthetic data demonstrates that Co-MCSS can appreciably improve the simulation accuracy of soil types by drawing on both a legacy map and limited sample data. The method has the following characteristics: (1) if a soil type shows no change in an update survey, or it has been reclassified into another type that likewise shows no change, it is simply reproduced in the updated map; (2) if a soil type has changed in some places, it is simulated with uncertainty quantified by occurrence probability maps; (3) if a soil type has not changed in one area but has changed in other, distant areas, it can still be captured in the unchanged area with little uncertainty. We conclude that Co-MCSS may be a practical method for updating categorical soil maps with limited survey data.

Soil is an important natural resource and an essential component of ecosystems. The spatial distribution of different soils constitutes a special kind of natural landscape (called a soilscape). Soils are traditionally classified into a number of types and delineated as categorical maps based on multiple attributes observed at sample profiles, the tacit knowledge of experienced surveyors, remotely sensed landscape features, and a specific classification system. Categorical soil maps are widely used in ecological and agricultural studies and provide crucial information for natural resource and environmental management. Because existing soil maps may be of low quality or too outdated to reflect current soil distributions, map updating is necessary to provide current, more accurate, or more detailed information that meets the requirements of applications. For example, most soil series maps in the United States (e.g., the USDA Soil Survey Geographic Database) were made on the basis of field surveys carried out in the 1950s, and they may not have been effectively updated to reflect recent soil changes. However, a large-scale detailed soil survey is too costly to be carried out frequently for generating new high-quality maps. If an existing soil map is of sufficient quality and appropriately scaled, producing a revised map may not require a new full-coverage soil survey because the soil types at most places in the legacy map may not have changed. Consequently, we may be able to update a legacy soil map with only limited new survey data on soil distribution. When qualified legacy soil maps are available, we may need to address only those areas where the previously determined soil types are likely to have changed for reasons such as internal or environmental changes, incorrect mapping, or taxonomy change; such areas can be identified by careful map examination with ancillary information.
Changes can be identified through a limited soil update survey or simply through map examination by experts. Other reasons for using legacy soil maps and survey data together to create current categorical soil maps include that:

A variety of quantitative modeling methods have been used or developed to predict spatially explicit soil categorical characteristics. These methods may have their own merits in different contexts. One group of methods is soil-landscape models, which use environmental soil-forming factors to predict soil patterns over unvisited areas. These methods include multinomial logistic regressions (MLRS), classification and regression tree analysis, and fuzzy methods; see applications in predictive categorical soil mapping [

Recently, Markov chains were extended into a new spatial statistical approach, that is, the MCRF approach, for simulating categorical spatial variables [

It is easy to understand that legacy soil data, whether they are map data or observed point data, contain valuable information that is relevant to present soil patterns. Legacy soil maps also contain the tacit knowledge of experienced surveyors, who were intensively trained for soil survey but may not be available at the time of soil map updating [

In this study, we assume that the legacy soil maps from the last update or made from last extensive soil surveys need limited corrections related to natural or anthropogenic soil changes or other reasons. Consequently, update is only necessary in altered areas or erroneously mapped locations. As such, we assume that the legacy soil maps are mainly outdated rather than being of low quality, and that update is necessary for a variety of reasons. This is reasonable because

The chief obstacle to extending one-dimensional Markov chains to multidimensional causal random field models such as Markov mesh models [

A MCRF refers to a random field defined by a single spatial Markov chain that moves or jumps in space and decides its state at any uninformed (i.e., unobserved and unvisited in a simulation process) location by interactions with its nearest neighbors in different directions and its last stay (i.e., visited) location [

If we consider (

This sequential Bayesian updating process on nearest neighbors starts from nearest neighbor

Neighborhood structures with six nearest neighbors and the sequential Bayesian updating process in basic Markov chain random fields: (a) assuming

If the spatial Markov chain is stationary and its last visited location is far away from the current uninformed location, the influence of the last visited location may be ignored (i.e., the transition probabilities from the last visited location to the current location decay to corresponding marginal probabilities). Thus, the local conditional probability distribution _{0} being estimated (see Figure

Because (

If the spatial Markov chain is stationary and its last visited location is far from the current location

To incorporate auxiliary variables, we need to expand (

In this study, we consider only one auxiliary variable in the form of a legacy soil map. Hence, (

If an auxiliary variable has no correlation with the primary variable, the cross-field transition probabilities will equal the corresponding class mean proportions of the auxiliary variable, and the corresponding cross-field transition probability terms in (
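The effect of an uncorrelated auxiliary variable can be checked numerically. The sketch below assumes that the local conditional probability is a normalized product of transition probability factors, one per nearest neighbor, optionally multiplied by a cross-field transition probability; this product form, the function name `local_conditional`, and all numbers are illustrative assumptions rather than the formulation used in this study.

```python
import math

def local_conditional(neighbor_factors, cross_field=None):
    """Normalized product of per-neighbor factors (assumed form).

    neighbor_factors: list of dicts, one per nearest neighbor, mapping each
        candidate class to its transition probability factor.
    cross_field: optional dict mapping each candidate class to the
        cross-field transition probability to the colocated auxiliary class.
    """
    classes = list(neighbor_factors[0])
    weights = {}
    for c in classes:
        w = math.prod(f[c] for f in neighbor_factors)
        if cross_field is not None:
            w *= cross_field[c]
        weights[c] = w
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all candidate classes have zero weight")
    return {c: w / total for c, w in weights.items()}

# Made-up factors for three candidate classes and two nearest neighbors.
factors = [
    {"SU2": 0.6, "SU3": 0.3, "SU4": 0.1},
    {"SU2": 0.5, "SU3": 0.2, "SU4": 0.3},
]

base = local_conditional(factors)
# Uncorrelated auxiliary variable: the cross-field term equals the same
# auxiliary class proportion for every candidate class, so it cancels.
uncorrelated = local_conditional(factors, {"SU2": 0.25, "SU3": 0.25, "SU4": 0.25})
# Informative auxiliary variable: the cross-field term differs by class.
informative = local_conditional(factors, {"SU2": 0.9, "SU3": 0.1, "SU4": 0.0})
```

Under this assumed form, `uncorrelated` equals `base` because a constant factor drops out of the normalization, whereas `informative` shifts probability toward the class favored by the auxiliary data.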

Conditional independence was assumed for nearest neighbors in different directions to derive the simplified general solution of MCRFs. Such an assumption is practical and is often used in nonlinear probability models [

In practice, it is both unnecessary and difficult to consider many nearest neighbors in different directions. Nearest neighbors outside the correlation range can be eliminated from consideration. The influence of remotely located data on the current uninformed location is typically screened by closer data within a certain angle. In addition, the conditional independence assumption clearly does not hold for clustered sample data. Therefore, it is appropriate for MCRF-based Markov chain models to consider only the nearest neighbors in several cardinal directions within a search range, both to approximately satisfy the conditional independence assumption and to improve computational efficiency.

The four nearest neighbors in four cardinal directions can be regarded as conditionally independent given the state of the surrounded central location in a sparse data space [

Here, we assume that the last visited location of the spatial Markov chain is always within the four nearest neighbors; if it is not so, we assume that the spatial Markov chain comes through one of them (Figure

Illustration of the Markov chain random field colocated cosimulation model with quadrant search and one auxiliary variable for random-path sequential simulation. Double arrows represent the moving directions of the spatial Markov chain. Dashed arrows represent the interactions of the spatial Markov chain with nearest neighbors and auxiliary data.

A tolerance angle is required because nearest neighbors in a neighborhood may not be located exactly along cardinal directions. To cover the whole space of a search area, sectors can be substituted for cardinal directions, and we can seek one nearest neighbor from each sector to represent the neighborhood (Figure
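The sector-based search described above can be sketched as follows. The sketch assumes four 90-degree sectors centered on the cardinal directions (that is, a tolerance of 45 degrees on each side) and a circular search radius; the function name `quadrant_search` and the sample coordinates are hypothetical.

```python
import math

SECTOR_NAMES = ("E", "N", "W", "S")

def quadrant_search(target, samples, radius):
    """Return the nearest sample in each of four sectors around `target`.

    target: (x, y) of the uninformed location.
    samples: iterable of (x, y, label) informed locations.
    radius: search radius in the same units as the coordinates.
    """
    tx, ty = target
    nearest = {}
    for x, y, label in samples:
        dx, dy = x - tx, y - ty
        dist = math.hypot(dx, dy)
        if dist == 0 or dist > radius:
            continue  # skip the target itself and out-of-range data
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        # Shift by 45 degrees so each sector is centered on a cardinal direction.
        sector = int(((angle + 45.0) % 360.0) // 90.0)
        if sector not in nearest or dist < nearest[sector][0]:
            nearest[sector] = (dist, (x, y, label))
    return {SECTOR_NAMES[s]: item for s, (_, item) in nearest.items()}

samples = [(3, 0, "a"), (1, 0.5, "b"), (0, 2, "c"),
           (-4, 1, "d"), (50, 0, "e"), (0, -1, "f")]
neighbors = quadrant_search((0, 0), samples, radius=30)
```

Here the eastern sector keeps sample `b` because it is closer than `a`, and sample `e` is dropped because it lies beyond the search radius. A sector with no data in range simply has no entry in the result.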

The MCSS algorithm was developed based on the above quadrant search method and was effective in simulating multinomial classes in two horizontal dimensions [

To perform simulations using Co-MCSS, transiogram models are needed to provide transition probability values at any needed lag distances. The transiogram was formally established in recent years to meet the needs of related Markov chain models [

For a colocated cosimulation conditioned on one auxiliary variable, one cross-field transition probability matrix (CTPM) is sufficient. Transition probabilities in a CTPM can be estimated by counting point-to-point frequencies of different class pairs from the sample data of the primary variable to the colocated data of the auxiliary variable using the following equation:
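As an illustrative sketch of the counting estimator described above, the snippet below tallies colocated class pairs and normalizes each row by its total. Row normalization is assumed here, and the function name `estimate_ctpm` and the toy data are hypothetical.

```python
from collections import Counter

def estimate_ctpm(primary, auxiliary, primary_classes, auxiliary_classes):
    """Estimate a cross-field transition probability matrix by counting.

    primary: class of the primary variable at each sample point.
    auxiliary: class of the auxiliary variable at the same points.
    Returns a nested dict: ctpm[primary_class][auxiliary_class].
    """
    pair_counts = Counter(zip(primary, auxiliary))
    ctpm = {}
    for i in primary_classes:
        row_total = sum(pair_counts[(i, k)] for k in auxiliary_classes)
        ctpm[i] = {
            # Rows with no sample data are left as zeros.
            k: (pair_counts[(i, k)] / row_total if row_total else 0.0)
            for k in auxiliary_classes
        }
    return ctpm

# Made-up colocated observations (not the case-study data).
sample_classes = ["SU3", "SU3", "SU3", "SU6", "SU6", "SU7"]
legacy_classes = ["S3", "S3", "S5", "S6", "S7", "S7"]
ctpm = estimate_ctpm(sample_classes, legacy_classes,
                     ["SU3", "SU6", "SU7"], ["S3", "S5", "S6", "S7"])
```

Each row of the result sums to one whenever the corresponding primary class appears in the sample data.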

The major purpose of this case study was to test the method proposed in this paper rather than to present a real application. Because a real field soil survey was unavailable to us, synthetic data extracted from a piece of a real soil series map (9 km^{2} area) [

The area was discretized into a ^{2}. The soil map has seven soil types. Here, the exact soil series names are not our concern. For convenience, we denote them as S1, S2, S3, S4, S5, S6, and S7. This soil series map (Figure

The data for categorical soil map update by Markov chain cosimulation: (a) the legacy soil map; (b) the reference soil map, representing the current distribution of soil series; (c) the sample data set (646 points), including field survey data and pseudosample data directly extracted from the unchanged areas in the legacy soil map. Previous soil series: S1, S2, S3, S4, S5, S6, and S7. Updated soil series: SU2, SU3, SU4, SU6, and SU7. SU2 = S2, SU3 = S3 + S5, SU4 = S4, SU6 = S6 + part of S7, and SU7 = S7 + S1 + part of S6.

Because we assumed that only a few small areas were subject to soil type changes, our limited field survey was also confined to these small areas. Thus, the survey data are insufficient and also biased for estimating the parameters (e.g., transiogram models) used in the cosimulation. Our suggestion is to use pseudosample data, that is, sample data directly extracted from unchanged areas in the legacy soil map. Therefore, we sampled a sparse data set of 646 points (Figure

Experimental transiograms were estimated from the sample data to generate transiogram models for conditional simulations. Two subsets of omnidirectional transiogram models interpolated from the experimental transiograms are provided in Figure
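A minimal sketch of estimating an omnidirectional experimental transiogram from point samples is given below. It assumes a simple pair-counting estimator with fixed-width lag bins, in which the transition probability from class i to class j at a lag is the number of i-to-j pairs in that bin divided by the number of all pairs starting from class i; the function name, binning scheme, and toy data are assumptions, not the procedure used in this study.

```python
import math
from itertools import permutations

def experimental_transiogram(points, classes, lag_width, n_lags):
    """Estimate transition probabilities per lag bin by pair counting.

    points: list of (x, y, label) sample data.
    Returns a list with one dict per lag bin, mapping (i, j) to the
    estimated probability, or None where class i has no pairs in the bin.
    """
    counts = [{(i, j): 0 for i in classes for j in classes}
              for _ in range(n_lags)]
    # Ordered pairs, so each unordered pair contributes in both directions
    # (omnidirectional estimate).
    for (x1, y1, a), (x2, y2, b) in permutations(points, 2):
        lag_bin = int(math.hypot(x2 - x1, y2 - y1) // lag_width)
        if lag_bin < n_lags:
            counts[lag_bin][(a, b)] += 1
    transiogram = []
    for bin_counts in counts:
        probs = {}
        for i in classes:
            total = sum(bin_counts[(i, j)] for j in classes)
            for j in classes:
                probs[(i, j)] = bin_counts[(i, j)] / total if total else None
        transiogram.append(probs)
    return transiogram

# Four made-up points on a line with unit spacing.
points = [(0, 0, "A"), (1, 0, "A"), (2, 0, "B"), (3, 0, "B")]
t = experimental_transiogram(points, ["A", "B"], lag_width=1.0, n_lags=3)
```

Bin k covers lag distances from k times `lag_width` up to, but not including, (k + 1) times `lag_width`. A transiogram model would then be obtained by interpolating or fitting these binned values so that a probability is available at any lag.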

Cross-field transition probability matrix from sample data (5 soil series) to colocated data in the legacy soil map (7 soil series).

Soil series^{†}: sample data (rows), legacy soil map (columns) | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
---|---|---|---|---|---|---|---|
SU2 | .0000 | 1.0000 | .0000 | .0000 | .0000 | .0000 | .0000 |
SU3 | .0000 | .0000 | .9011 | .0000 | .0989 | .0000 | .0000 |
SU4 | .0000 | .0000 | .0000 | 1.0000 | .0000 | .0000 | .0000 |
SU6 | .0000 | .0000 | .0000 | .0000 | .0000 | .8143 | .1857 |
SU7 | .2169 | .0000 | .0000 | .0000 | .0000 | .0271 | .7560 |

Two subsets of transiogram models interpolated from experimental transiograms estimated from the sample data. The numbers in transiogram labels (1 to 5) refer to the five updated soil series (i.e., SU2, SU3, SU4, SU6, and SU7), respectively.

The search radius chosen is 30 pixels (i.e., 600 m). One hundred realizations were generated for the cosimulation conditioned on both the sample data and the legacy soil map using Co-MCSS, and occurrence probability maps were estimated from those realizations. The optimal prediction map was obtained from maximum occurrence probabilities. For the purpose of comparison, the same was done without conditioning on the legacy soil map using MCSS. The PCC (percentage of correctly classified locations) values were estimated for the optimal prediction map and realization maps against the reference soil map (sample data being excluded) to verify the simulation accuracies.
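The post-processing steps described above can be sketched as follows: occurrence probabilities are the per-location class frequencies across realizations, the optimal prediction takes the class with the maximum occurrence probability, and the PCC is computed only at locations without sample data. Maps are flattened to lists of cells for brevity; the function names and the toy realizations are hypothetical.

```python
def occurrence_probabilities(realizations, classes):
    """Per-cell frequency of each class across realizations."""
    n_real = len(realizations)
    n_cells = len(realizations[0])
    return {
        c: [sum(r[cell] == c for r in realizations) / n_real
            for cell in range(n_cells)]
        for c in classes
    }

def optimal_prediction(probabilities):
    """Class with the maximum occurrence probability at each cell.

    Ties are broken in favor of the class listed first.
    """
    classes = list(probabilities)
    n_cells = len(probabilities[classes[0]])
    return [max(classes, key=lambda c: probabilities[c][cell])
            for cell in range(n_cells)]

def pcc(predicted, reference, sampled):
    """Percentage of correctly classified cells, excluding sampled cells."""
    pairs = [(p, r) for p, r, s in zip(predicted, reference, sampled) if not s]
    return 100.0 * sum(p == r for p, r in pairs) / len(pairs)

# Three made-up realizations of a four-cell map.
realizations = [
    ["A", "B", "C", "C"],
    ["A", "B", "B", "C"],
    ["A", "A", "B", "C"],
]
probabilities = occurrence_probabilities(realizations, ["A", "B", "C"])
prediction = optimal_prediction(probabilities)
reference = ["A", "B", "C", "C"]
sampled = [True, False, False, False]  # the first cell holds a sample datum
score = pcc(prediction, reference, sampled)
```

In this toy case the prediction is correct at two of the three unsampled cells, so the PCC is about 66.7%.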

The updated categorical soil maps include the optimal prediction map, a series of simulated realization maps, and occurrence probability maps. The most important of these is the optimal prediction map, which is generated from maximum occurrence probabilities and reflects the best prediction for a chosen method and the available data. The optimal prediction map of the soil series and the corresponding maximum occurrence probability map (Figure

The optimal prediction map (a) and the maximum occurrence probability map (b) of updated soil series conditioned on sample data and the legacy soil map using the Co-MCSS method.

Similar to hand-delineated maps, optimal prediction maps of categorical spatial variables normally also have an omission effect: minor classes are underrepresented because of their lower occurrence probabilities at most unsampled locations and major classes are consequently overrepresented [

Two simulated realization maps of updated soil series conditioned on sample data and the legacy soil map using the Co-MCSS method.

The simulated realization maps (Figure

Occurrence probability maps of updated single soil series conditioned on the sample data and the legacy soil map using the Co-MCSS method. (a) SU2; (b) SU3; (c) SU4; (d) SU6; and (e) SU7.

To verify the improvement and advantages of Co-MCSS over MCSS, which cannot incorporate auxiliary information, we also used the MCSS method to conduct a simulation conditioned on the same sample data. Comparing optimal prediction and maximum occurrence probability maps (Figure

The optimal prediction map (a) and the maximum occurrence probability map (b) of updated soil series conditioned only on the sample data using the MCSS method.

The PCC value represents the accuracy of a classified map compared to reference data. Using the reference map modified from the legacy soil map (Figure

Percentages of correctly classified locations (PCCs) of optimal prediction maps and simulated realizations (averaged from 100 realizations) generated by Co-MCSS and MCSS. PCCs (%) are estimated relative to the reference soil map with sample data being excluded.

Item | PCC (%), optimal prediction map | PCC (%), realization maps |
---|---|---|
MCSS | 82.50 | 79.32 |
Co-MCSS | 98.25 | 97.23 |
Absolute improvement^{†} | 15.75 | 17.91 |
Relative improvement^{‡} | 19.09 | 22.58 |

^{‡}Relative improvement = absolute improvement/PCC of MCSS × 100.
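The improvement figures in the table can be reproduced with a few lines of arithmetic. The sketch assumes that absolute improvement is the PCC of Co-MCSS minus the PCC of MCSS, which is consistent with the tabulated values; the function name is hypothetical.

```python
def improvements(pcc_mcss, pcc_co_mcss):
    """Return (absolute, relative) improvement, rounded to two decimals."""
    absolute = pcc_co_mcss - pcc_mcss          # assumed definition
    relative = absolute / pcc_mcss * 100.0     # as defined in the footnote
    return round(absolute, 2), round(relative, 2)

optimal = improvements(82.50, 98.25)
realizations = improvements(79.32, 97.23)
print(optimal)        # (15.75, 19.09)
print(realizations)   # (17.91, 22.58)
```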

Sample data directly extracted from the unchanged areas of the legacy soil map are not real survey data for map updating. They were used to obtain unbiased estimates of the transiogram models and the cross-field transition probability parameters and also to condition the simulations. This study does not show that conditioning on the extracted pseudosample data for unchanged soil series (including merged unchanged soil series) is necessary in simulations, as these unchanged soil series are simply reproduced from the legacy soil map. However, if a soil type change is confirmed at a place by a survey sample datum, pseudosample data should not be extracted nearby unless they are known to be correct, because pseudosample data assert the unchanged status of soil series at their locations.

Updating categorical soil maps is necessary for many reasons, such as existing maps being outdated or of low quality. We assumed that the most recent legacy soil maps need only limited corrections due to modest natural and anthropogenic soil changes occurring during the intervening time period. As a result, updates to the legacy maps are necessary only in the changed and mistakenly mapped areas. In essence, we assumed that the legacy soil maps were outdated but of good quality. Such a situation may apply to soil map updating in the United States, where quite detailed large-scale categorical soil maps exist for each county in most states.

We introduced the random-path Co-MCSS algorithm, which extends the random-path MCSS algorithm, for revising categorical soil maps and applied it to a synthetic-data case study that involved the revision of a legacy soil series map using limited survey data. Simulation results show that

Finally, other related data, such as land cover/land use and discretized DEM-derived data (e.g., elevation), are often correlated with the spatial distributions of soil series and may also be incorporated as auxiliary information to improve the accuracy of soil mapping, especially when legacy soil maps are of low quality or unavailable and the survey sample data are very sparse. In this study, because we assumed that legacy soil maps were available and of high quality and only limited soil changes occurred, other auxiliary variables were not considered.