Multimodel Inference for the Prediction of Disease Symptoms and Yield Loss of Potato in a Two-Year Crop Rotation Experiment

1 Applied Plant Research for Arable Farming, Multifunctional Agriculture and Field Production of Vegetables, Wageningen UR, P.O. Box 430, 8200 AK Lelystad, The Netherlands 2 Centre for Crop Systems Analysis, Wageningen University, P.O. Box 430, 6700 AK Wageningen, The Netherlands 3 Biometris, Department of Mathematical and Statistical Methods, Wageningen University, P.O. Box 100, 6700 AC, Wageningen, The Netherlands


Introduction
To answer research questions in agronomy, statistical hypotheses are formulated and an analysis of variance on experimental results is performed in order to detect the effect of different treatments [1].In data obtained from common types of experiments, effects of treatments are causal because of randomization of the treatment levels over the experimental units.In crop rotation experiments, a wider approach is needed in the sense that not only controlled treatments are considered, but also other predictors such as weather and soil characteristics.Ultimately, such procedures should lead to a statistical model giving the best prediction of crop yield.In crop rotation experiments on soil borne pathogens, different pathogen species are present and various regression models can be formulated about the harm caused by these species in combination with the effect of abiotic predictors for crop growth.The aim of this study is to select from a set of plausible predictors for tuber yield the ones that definitely should be included in a prediction model.We will use the multimodel approach in which a model is constructed as a weighted combination of all models.The advantage of this approach is that one uses information from all models instead of choosing the best model.This can be attractive when a number of models are doing about equally well.To calculate the weights, the Akaike Information Criterion is used.In the generalized linear models used in this study, the multimodel method also leads to improved estimates of parameters based on the weights of the different models.The study used data of an experiment by Scholte [2] on the application of trap crops to reduce the damage from potato cyst nematodes.
Farmers grow potatoes as a component in a crop rotation system.Short intervals between successive potato crops may result in a buildup of harmful soil-borne pathogens, including several fungi and nematode species [3] Tagetes or oats or fallow in summer and autumn Figure 1: The two-year crop rotation scheme, representing the sequence of crops and treatments for one of three blocks.Each block was split into two "sets": the experimental cycles in Set 1 started with the potato cultivar Disco in 1994.The experimental cycles of Set 2 started with the same potato cultivar in 1995, and red clover was grown in 1994.A cycle consisted of a sequence of crops and treatments lasting two years.Year one of a cycle consisted of two parts: spring treatments and treatments applied in summer plus autumn.Spring treatments were either fallow or the cultivation of a trap crop against potato cyst nematode.Summer/autumn treatments were one of three options: cultivation of Tagetes, oats, or fallow.In year 2 of the cycle, potatoes were grown on all plots; treatments consisting of three potato cultivars showed differing degree of resistance to potato cyst nematodes (cvs Seresta, Karida, or Elkana); potato cultivar treatments were split up in two sub treatments consisting of incorporation or removal from the field of potato haulms.See Scholte [2] for more details.
threshold level of infestation, tuber yield will suffer from the presence of potato cyst nematodes [4,5].
If more than one soil-borne pathogen is present, interaction effects are expected resulting in a further increase in crop damage [6].Apart from Scholte [2], there is a wealth of other studies reporting on the relation between inoculum density of interacting diseases and crop response.Evans [7], for instance, showed in a pot experiment with several potato cultivars higher infection rates of the fungus Verticillium dahliae in the presence of cyst nematode species (Globodera rostochiensis or G. pallida).Potato early dying [8] is caused by V. dahliae and V. albo-atrum, but severity aggravates when soil is also infested by Pratylenchus penetrans.
Many results have been published on the harming effects of soil-borne pathogens in combination with environmental factors such as temperature, soil pH, and soil texture.The index of powdery scab (Spongospora subterranea) tended to take lower levels when the pH of the soil increased by adding lime [9].Yield loss of potato by G. pallida is more severe at high than at low pH values of the soil.This is attributed to a lower availability of phosphate at higher pH values [10].On sandy soils, damage from Meloidogyne spp., Pratylenchus spp., and Trichodoridae is greater than on clay soils.Temperature affects growth and development of nematodes as well as the agricultural crops themselves.Moreover, the number of generations of Meloidogyne may be one or two in cool summers, while at higher temperature three generations are possible [11].
In the past, mathematical models have been introduced to quantify the damage from pathogens and their interactions using multiple linear regression [12,13].Wheeler et al. [14] modeled the simultaneous effect of P. penetrans and V. dahliae on tuber yield of potato with nonlinear regression, while Jacobsen et al. [15] modelled tuber yield as a function of the density of both M. hapla (x 1 ) and V. albo-atrum (x 2 ) as predictors on the natural log scale, resulting in an additive model plus an interaction term: where y is the yield of potato tubers and α, β, γ, and δ are regression coefficients.Here and elsewhere "log" means the natural logarithm.
This paper presents a new statistical analysis of a case study dealing with four factorial treatments applied in a crop rotation experiment (1995-1998) with a rotation cycle of two years [2].Results of such an experiment can be used to formulate models for the prediction of pathogen dynamics and yield loss based on measured values of predictor variables.Parameters in expressions of this type are estimated by means of Generalized Linear Models (GLMs) using a logarithmic link function.In this study we took an information-theoretical approach for model selection and multimodel inference [16].First, we select a biologically plausible subset of predictors from the available set including their interactions.The second order Akaike information criterion [17] is used to select the best models.The approach is applied to the prediction of three responses: (a) the incidence of S. subterranea at tuber skin, (b) percentage of haulm infection by V. dahliae, and (c) tuber yield.

Experimental Design.
The data used in this study are from a crop rotation study with cycle of two years (Figure 1) [2].The experiment consisted of two so-called "sets": preparations for Set 1 started in 1994 and for Set 2 in 1995.In both sets the same treatments were applied in the same sequence.The advantage of having two sets is that observations on main potato crops could be done each year.
Set 1 consisted of two two-year cycles of the crop rotation and Set 2 of 1.5 cycles of the crop rotation.The first year of the crop rotation cycle is divided into the spring period and the period of summer taken together with autumn.In the spring period, two trap cropping treatments were applied: (i) cropping with potato cultivar Kartel acting as a trap crop for potato cyst nematode (PCN), it resulted in a decline of the infestation (see the Section 2.2) and (ii) a fallow control.In summer and autumn (after destroying the potato trap crop), three green manure cropping treatments were applied: (i) growing of Tagetes, (ii) growing of oats (Avena sativa), and (iii) fallow control.In year two of the crop rotation cycle, potatoes were grown; the choice of cultivar is also a treatment (Seresta, Karida, and Elkana).In addition to the cultivar factor, another treatment factor was applied, immediately after the harvest potato haulms were either incorporated into the soil or removed from the plot.
The two trap cropping treatments, the three green manure cropping treatments and the two haulm removal treatments, were combined in a factorial design.The 12 treatment combinations of trap crop × green manure crop × haulm removal were randomized over 12 main plots.The potato cultivars split the main plots into three plots meaning that one replicate of one set consisted of 36 plots.Thus, the design consisted of three blocks each being split into the two sets as mentioned before.Set 1 of the experiment was terminated in November 1998 after growing potato in that year.Set 2 was terminated in March 1999 after sampling the plots for determination of soil infestation with pathogens.The experiment was located in Achterberg (51 • 59 N, 5 • 35 E), Netherlands, on a sandy soil.

Justification of Treatments.
The treatments were designed to differentially affect the infestation level of the soilborne pests and diseases.In the first year, potato itself was used as a trap crop against PCN; the other half of the plots remained fallow.Potato can be used as a trap crop for PCN, provided the crop is killed by glyphosate by the end of June, that is, before PCNs complete their life cycle and multiply [2].The green manure Tagetes was used to reduce the density of P. penetrans, oats was expected to enhance antagonists of Rhizoctonia solani [18,19], while fallow was used to reduce the density of all soil-borne pathogens.In the second year, the three potato cultivars were grown: Seresta, Karida, and Elkana.They are, respectively, highly resistant, moderately resistant, and susceptible to G. pallida, and all three are highly resistant to the pathotypes of G. rostochiensis.Potato haulms are the most important source of V. dahliae microsclerotia.Hence, the soil inoculum level of V. dahliae can be influenced by incorporating or removing haulms [20].

Inoculation.
In 1994 and 1995, inoculations were made to uniformly infest the plots with the organisms under study.R. solani, anastomosis group 3 (AG3), and Pratylenchus crenatus were already present in the soil at high densities as a result of previously grown crops.Inoculations were made with G. pallida, M. hapla, M. chitwoodi and P. penetrans, and V. dahlia; details are in [2].

Soil Sampling.
Before planting main crop potatoes, 60 soil cores were taken at random from each plot with a 2.0 cm diameter auger to a depth of 20 cm early in March.Each sample was gently but thoroughly mixed and then separated into four parts: 300 mL soil was used for a biotest for the presence of root-knot nematodes, 2000 mL to analyze the populations of other nematodes, 250 mL to determine soil infestation with V. dahliae, and the remaining soil was used to measure pH and analyze N, P, and K levels.In spring soil pH was measured as well as the soil concentrations of nitrate, ammonium, and water soluble P and K on the plots to be planted with the potato main crop.

Data Acquisition of Nematode Densities.
Cysts of nematodes were extracted from a subsample of moist soil using the method described by Oostenbrink [21].After crushing the cysts, numbers of viable juveniles were counted using a stereo dissecting microscope.In 1999, Set 2 was treated differently: air-dried soil samples were used from which cysts were separated using the method described by Van Bezooijen et al. [22].Populations of Pratylenchus spp.and Meloidogyne spp.were counted after extracting them using the method described by Oostenbrink [21].
2.6.Haulm Infections with V. Dahliae and R. Solani.Each year, in the second week of August, 20 randomly chosen plants of the potato test crop were harvested, five from each of the inner four rows of each plot.A transverse section of 1-2 mm thickness was cut from the middle of each haulm piece, and eight sections per dish were placed on a selective medium in Petri dishes and incubated for two weeks [20].Colonies of V. dahliae were counted using a stereo dissecting microscope.Rhizoctonia haulm canker was recorded at the underground part of each haulm using five classes of disease severity.

Inoculum Density of V. Dahliae in Soil.
Microsclerotia are the surviving structures of V. dahliae in soil quantified by a colony forming unit (cfu).To that end, soil samples were dried and sieved and were dispersed on a selective agar plate.The plates were incubated and after three weeks the number of cfu of V. dahliae was determined using a stereo dissecting microscope; the number of cfu was expressed per g soil, details are in [2].

Specification of Plausible Predictors.
In the case study, a large number of variables were measured for monitoring soil fertility, density of pathogens in soil, and growth of the potato crop.To save time and effort in the selection process and obtain parsimony using multimodel inference, we only retained predictors in the model that are meaningful as possible predictors of tuber yield: some disease symptoms were measured in the field during crop growth or afterwards on harvested tubers.These quantities have no predictive International Journal of Agronomy value and were discarded from the analysis.Potatoes are host to the fungus V. dahliae and to the nematodes G. pallida, Trichodoridae, and Meloidogyne spp.Densities of these pathogens was measured in the soil before planting.Trichodoridae stands for all nematodes belonging to the genera Trichodorus and Paratrichodorus.Also soil fertility measurements were made before planting the crop.Only pH and water soluble P were assumed to be relevant [10] and retained, while total N and water soluble K were discarded from the analysis as these nutrients are too mobile and variable in the soil to have predictive value.
Following the literature [7,11], the interaction between V. dahlia, and the three nematode species were added to the predictor set.We did not include the interaction between pH and nematode density in spring as Mulder [8] did, because there the range of pH values was much larger than in our case study.Three of the nine predictors are interactions.From all 2 9 models, we only considered those in which both predictors were included when their interaction was in the model [23].Therefore, for each of the three responses (i.e., incidence of S. subterranea on harvested tubers, percentage of haulms infected with V. dahliae and dry tuber yield) only 139 predictive models were selected to which multimodel inference was applied.The model with all nine predictors is called the global model.

Regression Models.
A simple nonlinear model for yield loss caused by nematodes of one species is formulated as where μ is the expected tuber yield and x the initial density of the pathogen at the moment potatoes are planted.Parameter μ 0 is the maximum yield when x = 0, and the parameter ρ represents the intact fraction of the roots after attack due to one unit of pathogen [24].(2) has also been used by Mol et al. [25] for the assessment of yield loss by V. dahliae, where x was the initial density of cfu g −1 soil.Here, we assume that parameter ρ is identical for all three cultivars we consider.This means that the relative damage per unit of pathogen is assumed to be the same for all cultivars.However, we use the parameter μ 0c to express the differences in yield of the three cultivars (c = 1, 2, 3).In this way we capture the effect of more than one soil-borne pathogen species in a multiplicative yield loss model with a restricted number of parameters where x 1 , x 2 , . . ., x n are the initial densities of the n soil-borne pathogens [26].It is noted that the parameter ρ varies with the pathogen type.Yield decrease due to two pathogens, for example, V. dahliae, x 1 , and G. pallida, x 2 , is modeled as The parameters of (4) were estimated with a Generalized Linear Model (GLM) with a logarithmic link function and with the response having a normal distribution [27].The product of x 1 and x 2 can be written as x 3 , so that (4) transforms into where β 0c = log(μ 0c ), β i = log(ρ i ), i = 1, 2, and β 3 = log(ρ 12 ).For an analysis based on a reciprocal link function, we refer to [28,29].
The environmental predictors, pH and water soluble P content of the soil, can be added in the same way to (5).We have p predictors for which the regression coefficients are identical for the three cultivars.Furthermore, the intercept is assumed to depend on the choice of the cultivar, so that there are p + 3 regression parameters.Including the error term, the GLM with a logarithmic link function takes the form Besides the tuber yield, we have the two other response variables.Firstly, the percentage of haulms infested with V. dahliae for which a logit link function is used with the infested number as response and the total number of haulms as binomial total.Secondly, the distribution of tubers over the five severity classes of coverage with S. subterranea is analyzed with ordinal regression [27].

Second Order Akaike Information Criterion (AIC c
).In our case study, nine predictors are available for prediction of the tuber yield μ. We assume the tuber yield to be normally distributed, so we have to maximize the normal likelihood, L, as function of the regression parameter vector, β, and the standard deviation, σ, of the error term given the data, y, [23]: where i is the observation index.Replacing σ 2 by its estimate (the residual sum of squares divided by n), we are left with a likelihood function of β only that has to be maximized.For model selection and multimodel inference, we use the second order Akaike Information Criterion (AIC c ): where n is the number of observations and K = p + 4 the number of parameters (p regression coefficients, 3 intercepts for the cultivars and σ 2 for the error term).Burnham and Anderson [16] demonstrated that AIC c should be preferred above the standard Akaike Information Criterion if n/K < 40.In our case study, n/K is indeed below this value for most of the candidate models and so we use (8).In the following, we take for the parameters in the model those values that minimize the AIC c .
2.12.Kullback-Leibler Distance.Let R candidate models be selected in advance.For each model r = 1, . . ., R the AIC c r with the best choice of parameters is calculated.Following Burnham and Anderson [16], the model with the lowest AIC c is denoted by AIC c r * .The Kullback-Leibler distance Δ r between AIC c r and AIC c r * is used to compute an appropriate weight w r for each of the r = 1, . . ., R models: Using w r , we can order the models: r 1 = r * and so on.The summed weight of the first k models w rj (10) denotes the probability that from all R models the best approximation to truth is among these k models.

Multimodel
Inference.Now, based on the weights w r , we compose the (multi)model having estimated regression coefficients w r β r j , j = 1, . . ., p, (11) meaning that in a model without predictor j the coefficient β r j is given the value zero.Furthermore, each predictor can be given a weight where I j (r) = 1 if predictor j is in model r, otherwise I j (r) = 0.The weight v j takes values in the interval [0, 1] and is an indication for the importance of this predictor in the process of constructing the best regression model.Following Lukacs et al. [30], the variance of the regression coefficient j takes the form 2.14.Statistical Package.All statistical analyses are carried out using GenStat [31].

Statistics.
The index (0-100) for S. subterranea varied from 0 to 78.4 with a median equal to 12.7.Percentage of haulms infected with V. dahliae varied from 0 to 86.4 with median value 12.8.Tuber dry yield ranged from 910 to 1729 g m −2 .Densities of V. dahliae, G. pallida, Trichodoridae, and Meloidogyne spp.showed low medians because only on a fraction of the plots there were higher levels of infestation.
Water soluble P varied from 5.5 to 34.0 kg/ha in the tillage layer.A similar range of values was found for these predictors in 1996 and 1997.
In multiple regression analysis, high correlations between two or more predictors, also called multicollinearity [32], may complicate the analysis and the interpretation of the outcomes.In the first place multicollinearity may increase the standard errors of the parameters in a model ("inflation").Secondly, multicollinearity increases model uncertainty by inducing different estimates across different models, which enlarges the second term in (13).Table 1 gives the correlation matrix for the nine predictors (in 1998).As expected a positive correlation between interaction terms and their corresponding two predictors was found.All correlation coefficients were below 0.7, so multicollinearity appeared not to be a problem.
The goodness of fit of the global model, with all candidate predictors present in the model, (6), was quantified by the coefficient of multiple determination R 2 (Table 2).In the analyses over the three years R 2 ranged from 0.38 for S. subterranea in 1997 to 0.66 for the percentage of haulms infected with V. dahliae in 1998.The R 2 for the model with only the cultivar term varied from 0.05 for S. subterranea in 1996 to 0.37 for tuber yield in 1997.The part of R 2 representing the variance explained by the nine predictors ranged from 0.15 (= 0.52−0.37)for dry tuber yield in 1997 to 0.43 (= 0.55−0.12)for the percentage of haulms infected with V. dahliae in 1996.The coefficients of determination in Table 2 are rather low.This may be caused by predictors not taken into consideration and/or measurement errors.Wider ranges in the predictors would have resulted in a wider range of the response, increasing R 2 .The question whether all nine predictors substantially contributed to the size of R 2 was answered in the multimodel approach.

Akaike Information Criterion.
For each year and for each of the three response variables, the 139 candidate models were fitted to the data.The regression coefficients, with corresponding standard errors, were saved for each model and the AIC c was calculated using (8).For each response variable, the models were sorted according to ascending AIC c .

Multimodel Inference.
Multimodel inference results are presented from analyses for the three response variables.The summed weights, ν j , for predictor j, j = 1, . . ., 9, are given by (12) and lie between 0 and 1; Tables 3, 4, and 5 show their values in the analysis of the three responses.A value close to 1 for a predictor indicates that it has a high predictive value.The multimodel inference for each of the response variables showed the following.
(a) For S. subterranean, the summed weights v 1 , v 8 , and v 9 of the predictors V. dahliae, pH, and water soluble P have the value 1 in each of the three years.The severity of S. subterranea on the tubers increased with the density of V. dahliae and with the pH.It only increased with the water soluble P concentration in 1997 (Set 2) and decreased for higher concentration in the other years.The summed weights, v 2 and v 4 for G. pallida and Meloidogyne spp.were 0.99 for both of them in 1998.Also in 1998, the summed weight v 5 and v 6 of the interactions between V. dahliae and G. pallida and between V. dahliae and Trichodoridae were high: they were 0.93 and 1.00, respectively.
(b) As expected the number of cfu of V. dahliae g −1 soil measured in soil in spring was the most important predictor for the percentage of haulm infections.In 1998 V. dahliae was the only predictor with a high summed weight.Water soluble P and pH in 1996, and pH in 1997, also showed high summed weights.
(c) For tuber yield loss, V. dahliae (cfu g −1 soil) in spring was the most important predictor.The summed weight v 1 took the values 1.00, 0.91, and 1.00 in the three consecutive years.The summed weights v 3 and v 9 , for Trichodoridae, and water soluble P were high in 1996; for Trichodoridae and pH, the summed weight v 3 and v 8 were high in 1997.In 1998 the summed weights v 2 and v 5 for G. pallida and for the interaction between V. dahliae and G. pallida were high (0.94).

Discussion and Conclusions
We dealt with the estimation of crop damage from soilborne pathogens in potato using predictors that are at hand at the start of the growing season.Predictors included the initial densities of nematode species for which potato is a host and soil properties, such as pH value.In this way we ended up with only predictors that are biologically relevant and were only weakly correlated (Table 1).This resulted in six candidate predictors and three interactions.Instead of considering all possible combinations of these nine predictors, we discarded the models with interactions for which at least one of the predictors forming the interaction was absent [23].This reduced the number of models from 512 to only 139.For each model r, a weight, w r , is calculated using the Akaike information criterion corrected for small datasets.The weight w r denotes the probability that model r provides the closest approximation to the true model out of all the models considered [33].Averaging of parameter estimates based on multimodelling is carried out with the use of these model weights.This approach utilizes the information from all models and requires less computational effort than, for example, a Bayesian alternative [34].
Differences in the levels of pathogens were created in four orthogonal treatments.For example, removal of potato haulms from the field in autumn on half of the plots considerably reduced the cfu g −1 soil of V. dahliae compared to plots where haulms were incorporated into the soil [2].Density of G. pallida was higher on fields where no trap crop was grown in the second year of each cycle.So the design of the experiment has been successfully given that the experimenter aimed at analyzing separately the effects of several soil-borne pathogens in rotations by varying (via factorial treatments) the soil infestation levels of the target organisms.If continued after 1998 for further years, the experiment would probably have given more insight into possible interactions between the pathogens that were present.The summed weights, as given in Table 3, 4, and 5 for the years 1996, 1997, and 1998, already showed this trend in crop damage from these predictors.In 1998, for the first time the summed weight for the interaction of V. dahliae and G. pallida was high.If the time series covered more years, the modelling results would probably show a rise of the summed weights for Trichodoridae, Meloidogyne spp.and their interaction with V. dahliae in the years after 1998.
By using data from only one experimental field, results are strongly influenced by the local soil characteristics.Differences in the pH values as well as in the levels of infestation with S. subterranea were already present at the start of the experiment.As the incidence of S. subterranea is positively correlated with the value of soil pH, the statistical results should be interpreted with care because other factors in the soil may be correlated to the pH, such as antagonists of S. subterranea.Furthermore, inoculation of P. penetrans was not successful, may be because P. crenatus was already present in the soil.The presence of antagonists of soilborne pathogens varies between locations and has a strong influence on observed treatment effects [35].
Our statistical analysis on factors that affected the response variables may provide a reliable estimation under conditions comparable with those of the experiment and leads to the following conclusions.
(a) For the incidence of S. subterranea, the predictors V. dahliae, pH, and water soluble P had the largest effect on the distribution of tubers over the five severity classes (Table 3).No hierarchy within this set of three predictors could be discerned.
(b) The percentage of haulms infected with V. dahliae is mainly determined by the number of cfu/g soil measured in spring (Table 4).
(c) V. dahliae is the most important predictor of potato yield loss.According to the regression coefficients for V. dahliae (Table 5), yield loss was most severe in 1997 when the measured density of cfu was low [2].
The approach we presented is worth considering in other applications, such as the design of new crop experiments and the planning of potato crops based on expert systems.With multimodelling, predictors are given a weight which can be employed in a farmers' decision support system.Identification of influential predictors plays a role in various other disciplines, such as in conservation biology where the persistence of a species depends on the interaction with other species and on the habitat with its different abiotic components.Since no specific model choice has to be made, multimodelling guarantees that no information gets lost in an early stage of the analysis.
. Above a

Table 1 :
Correlation matrix for the nine predictors in the year 1998.There are 108 pairs in each correlation.Units: V. dahliae in cfu g −1 soil; G. pallida, Trichodoridae, and Meloidogyne spp. in mL −1 and Water soluble P in kg ha −1 .

Table 2 :
The coefficient of multiple determination, R 2 , of the global model for the three responses over the three years.In the global model, the nine predictors are present as well as the cultivar term.Between brackets, R 2 is given for the model with only the cultivar term.