The aim of this research was to determine the variables that characterize slate exploitability and to model its spatial distribution. A generalized linear spatial model (GLSM) was fitted in order to explore the relationship between exploitability and the explanatory variables that characterize slate quality. Modelling the influence of these variables and analysing the spatial distribution of the model residuals yielded a GLSM that predicts slate exploitability more effectively than a generalized linear model (GLM), which does not take spatial dependence into account. Studying the residuals and comparing the predictive capacity of the two models led us to conclude that the GLSM is more appropriate when the response variable is spatially dependent.
The exploitability of a slate deposit depends on many quality-determining factors that are spatially correlated. Knowledge and study of these factors are essential for the evaluation of deposits [
Traditionally, the main aim of geostatistical models has been to predict a spatially correlated response variable. Under this approach, estimating the parameters of the geostatistical model is not usually the main interest. However, estimating and inferring parameters enables a more precise identification of the factors influencing the geographical distribution of exploitable slate, thus allowing greater knowledge to be gained regarding the response variable of interest.
In our research, the model-based geostatistics methodology was adapted to the analysis of slate exploitability using a generalized linear spatial model (GLSM). With this type of model, inference can focus on the parameters of the regression function, on the properties of the residuals, or on the distribution of the residuals conditional on the response variable.
A brief description of the statistical models used in this study is given in Section
Generalized linear models (GLMs) were introduced by [
In a GLM, a response variable
An important extension of the GLM is the generalized linear mixed model (GLMM) [
Consider
Let
Conditionally on
When the regression parameters
To estimate the parameters for the GLSM and due to the fact that the stationary Gaussian process
When the marginal distribution (in the GLM) or conditional distribution (in the GLSM) of the response variable
We used the AUC and residual semivariograms to assess the goodness-of-fit of the binary GLSM compared with the binary GLM when working with spatially correlated data.
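As an illustration of the AUC criterion mentioned above, the following minimal sketch computes the area under the ROC curve through its rank interpretation: the proportion of (exploitable, unusable) pairs in which the exploitable observation receives the higher predicted score. The function name and the toy data are assumptions for illustration only and are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    labels: iterable of 0/1 outcomes (1 = exploitable, by assumption).
    scores: predicted probabilities, or any score that increases with
            the predicted chance of a 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs that are ranked correctly;
    # tied scores count as half a correct pair.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the borehole survey).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))  # 0.75
```

An AUC of 0.5 corresponds to a model that ranks observations no better than chance, and an AUC of 1 to a model that separates the two classes perfectly.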
The data used to build the proposed model was collected from borehole samples taken from slate deposits in Baja Cabrera Leonesa (northwest Spain), an area with a long tradition of extracting, processing, and exporting roofing slate.
When surveying a slate deposit, in-depth studies of the rock are performed by taking continuous borehole samples, which enable geologists to study the living rock and analyse the possibility of using it as ornamental slate, see [
The specific borehole logging process was based on manual and visual inspection of the borehole by an expert who, after evaluating the aesthetic and functional defects and properties of the slate, differentiated between seams of commercial and unusable slate. The survey was performed by taking a control sample every 25 centimetres; rock quality designation (RQD), however, was defined by homogeneously fractured sections.
A total of 313 equally spaced in-depth observations were obtained, resulting from prior evaluation of various parameters affecting the ornamental quality of the slate and from direct binary values (0 or 1) assigned by the expert to indicate exploitation potential. The nine variables that affected the results of borehole logging were as follows.
RQD: borehole core samples recovered in pieces greater than 10 cm long as a percentage of the total borehole length. This is an indicator of the degree of rock mass fracturing.
Veins: presence of microfractures filled with quartz that determine the breakage resistance of a commercial slab.
Crenulations: effect of crenulation cleavage on the main schistosity planes. This increases the roughness of the foliation surfaces of the slate and reduces fissility.
Kink bands: presence of microfolding caused by late Variscan deformations.
Sandy laminations: presence of sedimentary sand layers which cut the schistosity planes and have a negative effect on fissility.
Microfractures: presence of barely visible fractures which determine the breakage resistance of slabs measuring 3–5 mm thick.
Pyrite: presence of iron sulphides.
Oxidation: degree of oxidation of iron sulphides in the slate.
Rough cleavage: slate with poor fissility due to textural heterogeneity.
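Of the variables listed above, RQD is the only one defined by an explicit formula. The following minimal sketch computes it for a single core run, following the definition given in the list (pieces longer than 10 cm as a percentage of the total length). The function name, argument names, and piece lengths are hypothetical and chosen only for illustration.

```python
def rqd(piece_lengths_cm, run_length_cm, threshold_cm=10.0):
    """RQD (%) for one core run.

    Sums the lengths of the recovered pieces that are longer than the
    threshold and expresses the total as a percentage of the run length.
    """
    if run_length_cm <= 0:
        raise ValueError("run length must be positive")
    sound = sum(length for length in piece_lengths_cm if length > threshold_cm)
    return 100.0 * sound / run_length_cm


# Hypothetical core run of 100 cm recovered in six pieces.
pieces = [25.0, 8.0, 12.0, 5.0, 30.0, 20.0]
print(rqd(pieces, 100.0))  # 87.0
```

Higher values indicate a less fractured rock mass.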
In-depth knowledge of the variability and distribution of exploitable slate, and of possible correlations between properties, favours the use of a GLSM to spatially model the geographic database.
Table
Correlation matrix of the
                   RQD     Veins   Crenulation  Kink bands  Sandy laminations
RQD                1
Veins              0.1624  1
Crenulation        0.3038  0.1631  1
Kink bands         0.1040  0.0408  0.1962       1
Sandy laminations  0.0177  0.0187  0.0200       0.0736      1
Microfractures     0.6696  0.0035  0.1284       ...         ...
Pyrite             ...     ...     ...          ...         ...
Oxidation          0.2053  0.0837  0.1722       0.0821      ...
Rough cleavage     ...     ...     0.2096       ...         ...

                   Microfractures  Pyrite  Oxidation  Rough cleavage
Microfractures     1
Pyrite             ...             1
Oxidation          0.1162          ...     1
Rough cleavage     ...             0.0065  ...        1
The response variable,
Binomial error distribution was used by Diggle et al. [
We adopted a Bayesian framework for inference and prediction of the parameters, using algorithms based on MCMC.
The parameters of this binomial GLSM are
We initially included all nine variables that characterize slate exploitability, namely, RQD, veins, crenulations, kink bands, sandy laminations, microfractures, pyrite, oxidation, and rough cleavage (poor fissility). Using these data, with slate exploitability as the response variable, we fitted a binary GLM, called GLM1. A ROC curve was estimated for this complete binary model, and the AUC was 0.99.
Next, binary GLMs were fitted to different subsets of the explanatory variables in an attempt to find the smallest set that would still provide a high AUC, that is, one close to 0.99. The model fitted with the RQD, crenulation, kink band, and microfracture variables, called GLM2, obtained an AUC of 0.92. Figure
ROC curves and the corresponding AUC values for the two binary GLMs included in the study.
Table
Estimated coefficients and 2.5%, 25%, 75%, and 97.5% quantiles for the GLM.
Coefficient  2.5% quantile  25% quantile  75% quantile  97.5% quantile
...          ...            ...           ...           ...
1.2826       ...            0.4192        2.0516        3.9633
0.5875       0.3288         0.5089        0.6761        0.8352
0.5668       0.3965         0.5019        0.6395        0.7857
2.5815       0.9221         1.6093        3.9181        6.7967
The study of the residuals of the GLM2 model fitted with four explanatory variables detected a spatial dependence that could be modelled using an exponential theoretical semivariogram with range 0.81 and sill 1.61. Figure
Experimental semivariogram for the GLM2 residuals (points), together with the theoretical model fitted according to an exponential model (continuous line).
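The exponential semivariogram fitted to the GLM2 residuals can be written down explicitly. The following is a minimal sketch, assuming the common parameterization gamma(h) = sill * (1 - exp(-h / range)) with no nugget. The source does not state which parameterization was used (some software reports a practical range instead, in which case the exponent would be -3h / range), and the function and argument names are illustrative.

```python
import math


def exponential_semivariogram(h, sill, range_param, nugget=0.0):
    """Exponential semivariogram model at lag distance h.

    Assumed form: gamma(h) = nugget + sill * (1 - exp(-h / range_param))
    for h > 0, and gamma(0) = 0.
    """
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / range_param))


# Values reported for the GLM2 residuals: range 0.81 and sill 1.61.
for lag in (0.25, 0.81, 2.0, 5.0):
    print(lag, exponential_semivariogram(lag, sill=1.61, range_param=0.81))
```

Under this parameterization the curve rises from zero and approaches the sill asymptotically, reaching about 63% of the sill at a lag equal to the range parameter.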
Given the presence of spatial dependence in the residuals, it then made sense to fit a GLSM, maintaining four explanatory variables and assuming an exponential model for the process
Using this procedure, the parameters were estimated as
Estimated coefficients and 2.5%, 25%, 75%, and 97.5% quantiles for the GLSM.
Coefficient  2.5% quantile  25% quantile  75% quantile  97.5% quantile
...          ...            ...           ...           ...
0.2521       ...            0.1291        0.3726        0.5388
0.7334       0.3762         0.6121        0.8283        1.0691
0.9189       0.5881         0.7802        1.0274        1.2261
1.2911       0.8834         1.0125        1.4480        1.6261
It is important to remember that these parameters should be interpreted conditionally, not marginally, and so should not be compared directly with the parameters estimated for the GLM. Direct comparison of a spatial and a nonspatial GLM could lead to erroneous conclusions, as the estimation methods are fundamentally different. Nonetheless, the conclusions to be drawn from the two tables broadly agree: crenulation, kink band, and microfracture remain significant, whereas RQD was not significant at the 5% level.
Figure
Twodimensional likelihood profile for
A study of the spatial dependence of the GLSM residuals indicated that the spatial component had been correctly modelled on this occasion. Figure
Experimental semivariogram for the GLSM residuals (points), together with the theoretical model fitted according to a nugget effect model (continuous line).
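The experimental semivariograms shown for the GLM2 and GLSM residuals can be obtained with the classical (Matheron) estimator. The following is a minimal sketch for observations located along a single axis such as borehole depth. Grouping pairs by the nearest multiple of a fixed lag step is an assumption that suits equally spaced samples, and the function name and toy residuals are hypothetical. The source does not state which type of residual was used.

```python
from collections import defaultdict


def empirical_semivariogram(positions, residuals, lag_step):
    """Classical (Matheron) semivariogram estimator in one dimension.

    For each lag h (rounded to the nearest multiple of lag_step):
        gamma(h) = sum over pairs of (r_i - r_j)^2 / (2 * N(h)),
    where N(h) is the number of pairs separated by that lag.
    Returns a dict mapping lag distance to the estimated semivariance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(positions[i] - positions[j])
            k = round(h / lag_step)  # index of the lag class
            sums[k] += (residuals[i] - residuals[j]) ** 2
            counts[k] += 1
    return {k * lag_step: sums[k] / (2.0 * counts[k]) for k in sorted(counts)}


# Toy residuals at unit spacing (hypothetical values).
depths = [0, 1, 2, 3]
res = [1, -1, 1, -1]
print(empirical_semivariogram(depths, res, 1.0))
# {1.0: 2.0, 2.0: 0.0, 3.0: 2.0}
```

For residuals without spatial dependence, the estimated values fluctuate around a constant level at all lags, which is the behaviour described by a pure nugget effect model.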
The ROC curve and AUC were also calculated for the GLSM. The AUC of the binary spatial model was 0.99, a substantial improvement over the 0.92 obtained with GLM2. This improvement is reflected in Figure
ROC curves and corresponding AUC values for the binary GLM2 and the GLSM.
The comparison between the two models was completed with a simulation study designed to compare the reliability of the predictions in three scenarios of varying levels of difficulty. In the first scenario, 95% of the 313 initial observations were randomly selected as the training set used to fit the GLM and the GLSM, and the fitted models were validated on the remaining 5%. This procedure was repeated 100 times, and the number of errors in the slate exploitability prediction was recorded for each repetition. In the second scenario, 90% of the observations were randomly selected for the training set and the remaining 10% made up the test set; this simulation was also repeated 100 times. The third scenario followed the same procedure, with 85% of the observations in the training set and 15% in the test set.
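The repeated random-split procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the study: `fit` and `predict` are placeholders for any binary model (the GLM or the GLSM in the study), and the threshold classifier and toy data are hypothetical stand-ins that keep the example self-contained.

```python
import random


def repeated_holdout(data, fit, predict, test_fraction, repetitions=100, seed=0):
    """Misclassification rate on the test set for each random split.

    data: list of (features, label) pairs with labels 0 or 1.
    fit: function taking the training pairs and returning a model.
    predict: function (model, features) returning a predicted label.
    """
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(data)))
    rates = []
    for _ in range(repetitions):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        errors = sum(predict(model, x) != y for x, y in test)
        rates.append(errors / n_test)
    return rates


# Hypothetical stand-in model: a threshold halfway between the class means.
def fit_threshold(train):
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2.0


def predict_threshold(threshold, x):
    return 1 if x >= threshold else 0


# Toy data with two well-separated classes (not the borehole data).
toy = [(x, 0) for x in range(10)] + [(x, 1) for x in range(100, 110)]
for fraction in (0.05, 0.10, 0.15):
    rates = repeated_holdout(toy, fit_threshold, predict_threshold, fraction)
    print(fraction, sum(rates) / len(rates))
```

Fixing the seed makes the splits reproducible, so two models can be compared on the same sequence of training and test sets.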
Table
Error rates for the binary models for three scenarios representing different levels of prediction difficulty.
      5% test  10% test  15% test
GLM   ...      ...       ...
GLSM  ...      ...       ...
In all cases, the GLSM provided a better explanation not only of the effect of the variables determining slate quality but also of the spatial behaviour of exploitable slate, thereby producing lower prediction error rates.
A general interpretation of the GLSM used in our analysis is that the spatial term
In GLMs, the fact that the spatial correlation of the variables is not taken into account can significantly affect the quality of statistical results. Our study highlights the potential risk of using GLMs when the data is spatially structured.
The conclusion reached after comparing ROC curves and their corresponding AUCs is that GLSMs predict slate exploitability better than GLMs. Therefore, it would seem essential to include unexplained spatial variation when modelling spatially correlated variables.
Based on the comparison of the semivariograms of the GLM and GLSM residuals, we would like to draw attention to the presence of spatial dependence in the GLM residuals, in contrast to what occurs when a GLSM is implemented. This indicates that spatial dependence has been captured correctly by the stationary Gaussian process
The simulation study demonstrates that, for varying levels of prediction difficulty, the GLSM had lower error rates than the GLM.
Although the parameters of the GLSM must be interpreted conditionally rather than marginally to
This work was funded partly by the Project INCITE10REM304009PR of the Xunta de Galicia and by the Projects MTM200803129 and MTM201123204 of the Ministry of Science and Innovation.