Monitoring Population Phenology of Asian Citrus Psyllid Using Deep Learning

Department of Computer Science, Government College University, Faisalabad, 38000, Pakistan
Department of Entomology, University of Agriculture Faisalabad Sub-Campus Burewala, Vehari 61010, Pakistan
South Eastern Kenya University, Kitui, Kenya
Cardiovascular Engineering Inc, 1 Edgewater Drive, Norwood, MA 02062, USA
Department of Mathematics, Clausthal University of Technology, Erzstraße 1, D-38678 Clausthal-Zellerfeld, Germany


Introduction
Citrus greening disease, also known as Huanglongbing, is a severe disease of citrus plants that causes significant losses to the citrus economy; it is caused by the phloem-dwelling bacterium Candidatus Liberibacter asiaticus. This incurable and economically damaging disease is transmitted by a sucking pest, the Asian citrus psyllid (ACP), which acts as the vector of this phloem-inhabiting bacterium. Effective management of ACP is therefore crucial for preventing the losses caused by the Huanglongbing and ACP complex [1]. Three bacterial species, Candidatus Liberibacter asiaticus, Candidatus Liberibacter americanus, and Candidatus Liberibacter africanus, are associated with the spread of Huanglongbing throughout citrus-growing areas worldwide [2]. Huanglongbing is a vector-borne disease, and its causative agents multiply in and are transmitted by ACP [3].
The psyllid population growth rate is directly associated with the flush phenology (a biotic factor) of host plants, because adult females lay eggs only on young, tender, and succulent leaves; consequently, nymphs are more likely to hatch and develop during seasons of abundant flush growth on citrus plants. The availability of flush growth, together with favorable meteorological conditions, leads to large ACP infestations on citrus plants. Meteorological conditions such as relative humidity, temperature, and rainfall in the study area are important factors influencing the presence of ACP life stages in the field. Under favorable environmental conditions, the phenological characteristics of the citrus host plant strongly influence psyllid biology, survival, and the resulting pest outbreaks [4][5][6].
Entomologists have previously carried out various trials on changes in psyllid populations over time, and such studies are highly useful for future pest prediction and forecasting. Because weather factors significantly affect both insect populations and the presence of natural enemies, these factors can be correlated with the changing populations of the pest and its natural enemies, which better explains the density curves of both the psyllid and its associated natural enemies [7][8][9]. Ecology studies the mutual relationships among the biotic and abiotic components of an ecosystem in order to understand ecological processes and predict future trends. Machine learning (ML) techniques have advantages over typical statistical approaches because they model ecological processes more effectively, supporting better decision-making and informed real-world actions with little or no human involvement. ML techniques not only provide a flexible framework for data-driven tasks but also allow expert knowledge to be integrated into the system [10]. The ability of ML algorithms to model high-dimensional, nonlinear data with complex interactions and missing values, and to identify complicated structures in complex datasets, gives them an advantage over typical statistical approaches in population modeling [11]. Deep learning (DL) techniques are among the most recent advances in ML [12]. DL approaches learn features automatically, and their complex architectures allow them to solve harder problems faster and more accurately, reducing error in regression problems and increasing accuracy in classification problems when large datasets are available [13].
Machine learning techniques have been used in several studies of pest population prediction, such as modeling the population dynamics of the paddy stem borer (Scirpophaga incertulas) [14], the population density of Scirtothrips dorsalis Hood [15], the risk of melon thrips (Thrips palmi) and diamondback moth (Plutella xylostella) [16], the fluctuating population trends of Dendrolimus superans [17], the population phenology of the brown planthopper (Nilaparvata lugens) [18], the occurrence of mosquito populations in relation to socioeconomic factors and landscape variables [19], Prostephanus truncatus infestation and the accompanying damage to stored maize grain in relation to abiotic factors [20], the fluctuating trends of a cotton pest population (Thrips tabaci Lindeman) [21], and the effect of temperature and rainfall, monitored by Watts and Worner [22], on the establishment of the mealybug Planococcus citri and the aphids Myzus persicae, Aphis gossypii, Eriosoma lanigerum, and Brevicoryne brassicae. The random forest regressor (RFR) has been employed in various fields for prediction and classification problems; for example, the authors of [23][24][25] used this ensemble learning approach to predict dengue, the citrus flatid planthopper, and the nymphal stage of the sunn pest, respectively. For early prediction of pest risk, the multiple linear regression (MLR) model has been adopted by numerous researchers; for example, the authors of [26,27] used MLR to model the potential risk of black planthopper and oriental fruit fly (Bactrocera dorsalis) populations, respectively.
Deep neural networks (DNNs) have broad applicability across agricultural domains. Chlingaryan et al. [10] used DNNs for crop yield prediction, the authors of [28,29] deployed DNNs to predict soil moisture content, and Scher [30] used DNNs to predict weather conditions. DNNs have also been used for land cover and crop type classification, image identification, and the classification of plants and weeds [31][32][33][34]. Rammer and Seidl [35] deployed DNN and RFR models to predict future damage from bark beetle outbreaks using historical pest data and concluded that DNNs are highly capable of modeling bark beetle outbreak dynamics and other ecological prediction problems. This review of previous studies reveals a research gap concerning the use of ML and DL models to predict the phenological stages of insect pests. In light of this literature, the present study was conducted to (a) compare different machine and deep learning techniques for predicting the phenological stages of ACP and (b) monitor the cumulative effect of different weather factors and host plant phenology on psyllid phenological stages.
In the present research, we compared different regression-based approaches, i.e., DNN, MLR, and RFR models, for predicting the populations of different ACP-phenological stages, using environmental variables and host plant phenology variables as independent variables. With these regression approaches, we evaluated the combined effect of the independent variables on each of three ACP-phenological stages, i.e., eggs, nymphs, and adults, separately. A description of the data collection is given in [36]. We used datasets spanning 25 months to reduce experimental error and to confirm the psyllid's response to different weather conditions across seasons; for example, if the psyllid population increased in one spring, we checked whether it responded similarly in the following spring. Daily meteorological data for the experimental period (maximum, minimum, and average temperature, rainfall, and relative humidity) were obtained from the meteorological observatory of the Crop Physiology (CP) department in the Faculty of Agriculture of UAF. The effect of meteorological (abiotic) factors was also monitored by calculating the percentage of branches infested with each ACP life stage, i.e., eggs, nymphs, and adults, individually and collectively.
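The two derived quantities mentioned here, weekly averages of daily weather readings (the Results refer to "average weekly relative humidity") and the percentage of infested branches, can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes weeks are consecutive 7-day blocks with any trailing partial week dropped, and the function names and humidity values are hypothetical.

```python
from statistics import mean

def weekly_means(daily_values, days_per_week=7):
    """Average consecutive 7-day blocks of daily readings (assumption: a trailing partial week is dropped)."""
    n_weeks = len(daily_values) // days_per_week
    return [
        mean(daily_values[w * days_per_week:(w + 1) * days_per_week])
        for w in range(n_weeks)
    ]

def percent_branches_infested(infested, inspected):
    """Percentage of inspected branches carrying a given ACP life stage (eggs, nymphs, or adults)."""
    if inspected <= 0:
        raise ValueError("inspected must be positive")
    return 100.0 * infested / inspected

# Hypothetical daily relative humidity (%) for two weeks, for illustration only.
rh_daily = [60, 62, 65, 70, 68, 66, 64, 55, 57, 59, 61, 58, 56, 60]
print(weekly_means(rh_daily))
print(percent_branches_infested(infested=12, inspected=40))
```

The same `weekly_means` helper would apply unchanged to daily temperature and rainfall series.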

Model Development.
In this study, three models, i.e., RFR, DNN, and MLR, were employed to model the population dynamics of ACP. The models were developed in Google Colaboratory, a cloud computing environment well suited to developing Python-based applications.

Random Forest Regressor.
Random forest is an ensemble learning approach proposed in [36] and used for both regression and classification problems [37]. Each random forest is composed of a specified number of decision trees, and each tree is trained on a random sample of the training data drawn by bootstrap aggregating (bagging). A random forest regressor returns the mean of the predictions of all its decision trees. It reduces overfitting by introducing randomness into the selection of variables and data instances. RFRs are efficient to train and test. RFs also offer a built-in mechanism for estimating test error, such as root mean squared error (RMSE) and mean absolute error (MAE), and prediction confidence, since each tree can be evaluated on the out-of-bag observations not drawn into its bootstrap sample [38]. Hyperparameter tuning is an important step in model development. To train the RF, we set n_estimators (the number of decision trees) to 20 and random_state to 42, keeping the other hyperparameters at their default values. We used RMSE to calculate test errors; it is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where y_i and ŷ_i are the actual and predicted values, respectively, and n is the number of observations. RMSE is a common indicator for evaluating the accuracy of forecasting models in regression problems [39].
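The RFR configuration and the RMSE metric can be sketched as below. This assumes scikit-learn: the hyperparameter names n_estimators and random_state match its RandomForestRegressor, but the text does not name the library. The data are synthetic stand-ins for the six predictors, and the 75/25 train/test split is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_i - y_hat_i)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for six predictors (three temperatures, rainfall,
# relative humidity, branches with flush); NOT the study's data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 6))
y = 3 * X[:, 5] + X[:, 4] + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Hyperparameters reported in the text; all others left at library defaults.
model = RandomForestRegressor(n_estimators=20, random_state=42)
model.fit(X_train, y_train)  # each of the 20 trees is fit on a bootstrap sample
test_rmse = rmse(y_test, model.predict(X_test))  # forest output = mean of tree predictions
print(f"Test RMSE: {test_rmse:.4f}")
```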

Deep Neural Network.
Artificial neural networks (ANNs) were developed in the middle of the twentieth century. The term "deep learning" refers to training deeper and larger ANNs, i.e., networks with more layers and more neurons than conventional ANNs [12]. DNNs became practical through recently developed, improved algorithms for optimizing the connection weights [40].
To predict the population phenology of ACP, we developed a DNN comprising an input layer with six input neurons (nodes) and two hidden layers with six and eight neurons, respectively. The activation function and optimizer were ReLU and Adam, respectively. The architecture also has an output layer with a single neuron, used to predict each ACP life stage (eggs, nymphs, and adults) separately. We used dense (fully connected) layers to make the model more stable for prediction (Figure 1).
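The described architecture (6 inputs, dense hidden layers of 6 and 8 ReLU units, one output neuron, Adam optimizer) can be sketched with scikit-learn's MLPRegressor. This is a swapped-in implementation: the text does not say which framework was used, and the input standardization, max_iter, random_state, and synthetic data are assumptions. One such model would be fit per life stage.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 6 inputs and one target (e.g., counts of one ACP stage).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 6))
y = 2 * X[:, 5] + 0.5 * X[:, 0] + rng.normal(0, 0.05, size=300)

# 6 inputs -> dense(6, ReLU) -> dense(8, ReLU) -> dense(1, linear output)
dnn = make_pipeline(
    StandardScaler(),  # assumption: inputs are standardized before training
    MLPRegressor(hidden_layer_sizes=(6, 8), activation="relu", solver="adam",
                 max_iter=2000, random_state=42),
)
dnn.fit(X, y)
mlp = dnn[-1]  # the fitted MLPRegressor step of the pipeline
print([w.shape for w in mlp.coefs_])  # weight matrices between consecutive layers
```

The weight shapes (6, 6), (6, 8), and (8, 1) mirror the layer sizes listed in the text.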

Multiple Linear Regression.
The MLR model is expressed as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where y is the predicted (response) variable, x_1 to x_k are the predictor (controlled) variables, β_0 is the intercept (constant), β_1 to β_k are the regression coefficients of the predictors, and ε is the residual error term, which represents uncertainty in the model [41]. We normalized the dataset before fitting the MLR model to monitor the growth of the ACP population in relation to host plant phenology and different abiotic factors.
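Fitting the model y = β_0 + β_1 x_1 + ... + β_k x_k + ε on normalized inputs can be sketched with ordinary least squares in NumPy. The text does not specify the normalization; min-max scaling is assumed here, and the helper names and synthetic data are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """OLS fit of y = b0 + b1*x1 + ... + bk*xk + e on min-max normalized X (assumed scaling)."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    X_norm = (X - x_min) / (x_max - x_min)  # assumes no constant column
    design = np.column_stack([np.ones(len(X_norm)), X_norm])  # leading 1s give the intercept b0
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta, (x_min, x_max)

def predict_mlr(beta, scaling, X):
    """Apply the stored min-max scaling, then the fitted linear model."""
    x_min, x_max = scaling
    X_norm = (np.asarray(X, dtype=float) - x_min) / (x_max - x_min)
    return beta[0] + X_norm @ beta[1:]

rng = np.random.default_rng(4)
X = rng.uniform(10, 40, size=(50, 3))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.5 * X[:, 2]  # noiseless synthetic target
beta, scaling = fit_mlr(X, y)
residuals = y - predict_mlr(beta, scaling, X)
print("coefficients on normalized scale:", np.round(beta, 3))
print("max |residual|:", np.abs(residuals).max())
```

Normalization changes the coefficients but not the predictions: each β_j on the normalized scale equals the raw-scale coefficient multiplied by the range of x_j, which makes coefficients of differently scaled weather variables easier to compare.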
Feature Importance.
To measure the importance of the input variables for predicting each ACP-phenological stage, we computed feature importances with the RFR model. The feature importance graphs for eggs and nymphs reveal that "branches with flush" is one of the most important variables for ACP egg and nymph growth (Figure 2).
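A feature importance ranking from an RFR can be sketched as follows, assuming scikit-learn, whose feature_importances_ attribute is impurity-based (mean decrease in impurity); the text does not state which importance measure was plotted. The column names and the synthetic target are hypothetical and chosen only to show the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical column names for the six inputs.
feature_names = ["max_temp", "min_temp", "avg_temp", "rainfall",
                 "relative_humidity", "branches_with_flush"]

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(300, 6))
# Synthetic target driven mostly by the last column, for illustration only.
y = 4 * X[:, 5] + 0.3 * X[:, 4] + rng.normal(0, 0.1, size=300)

rf = RandomForestRegressor(n_estimators=20, random_state=42).fit(X, y)
# Impurity-based importances are normalized to sum to 1; sort from most to least important.
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:20s} {score:.3f}")
```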

Effect of Abiotic Factors on Population Fluctuations of ACP.
To study the impact of various abiotic factors on the population phenology of D. citri (ACP) over the 25-month experimental period, individually and cumulatively, for different citrus species, we calculated correlation coefficients using Minitab software (Table 1).
For the ACP-eggs' population, host plant flush growth patterns and average weekly relative humidity had significant positive relationships with egg production and growth (R = 0.44 and 0.247, respectively; p ≤ 0.05). ACP-nymphs' growth was positively correlated with the input variable branches with flush (R = 0.48; Table 1).
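The correlation analysis was done in Minitab; an equivalent Pearson correlation with a two-sided p-value can be sketched in Python using scipy.stats.pearsonr. The paired weekly values below are hypothetical and are not the data behind Table 1.

```python
from scipy.stats import pearsonr

# Hypothetical weekly observations, for illustration only.
branches_with_flush = [5, 12, 20, 18, 9, 3, 15, 22, 7, 11]
eggs_per_branch = [1, 4, 9, 7, 3, 0, 6, 10, 2, 5]

r, p = pearsonr(branches_with_flush, eggs_per_branch)  # Pearson R and two-sided p-value
print(f"R = {r:.3f}, p = {p:.4f}")
print("significant at 0.05" if p <= 0.05 else "not significant at 0.05")
```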

Comparison of Different Regression Approaches to Predict ACP-Eggs' Population.
To predict the ACP-eggs' population, all three models were fitted on the training data. We evaluated these regression-based approaches on eight datasets; Figure 3 compares the actual and predicted values. We ranked the models from best to worst by their performance in predicting the ACP-eggs' population. In most cases, the DNN model gave the lowest error, with an RMSE of 0.63925, computed as the mean of the RMSE values obtained on the eight datasets. The RFR model was the second-best approach, with an RMSE of 0.70375. RFR is an ensemble method that is highly effective at extracting meaningful information from data, as also found in previous studies [18,35]. The MLR model performed worst, with an RMSE of 0.7935.
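The reported figure is the mean of eight per-dataset RMSE values, which is not the same as the RMSE of all predictions pooled together. The sketch below shows both computations on two hypothetical toy datasets (not the study's data); the helper names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of one dataset."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mean_rmse(per_dataset):
    """Mean of per-dataset RMSE values; per_dataset is a list of (y_true, y_pred) pairs."""
    scores = [rmse(t, p) for t, p in per_dataset]
    return sum(scores) / len(scores)

def pooled_rmse(per_dataset):
    """RMSE over all predictions pooled together, for comparison."""
    y_true = [v for t, _ in per_dataset for v in t]
    y_pred = [v for _, p in per_dataset for v in p]
    return rmse(y_true, y_pred)

# Two hypothetical datasets with constant errors of 1 and 3.
toy = [([0.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [3.0, 3.0])]
print(mean_rmse(toy), pooled_rmse(toy))
```

Here the mean of per-dataset RMSEs is (1 + 3) / 2 = 2, whereas pooling gives sqrt(20 / 4) = sqrt(5), about 2.236, so the averaging rule should be stated alongside any reported RMSE.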
For the ACP-nymphs' population, the DNN model again outperformed the competing approaches, with the lowest RMSE (Figure 4). Before each prediction model was trained, hyperparameter tuning was performed to find its best parameters; the models were then retrained with these parameters to minimize the loss function. The RMSE values obtained by the DNN, RFR, and MLR models were 1.1875, 1.38775, and 1.2715, respectively, so for this stage MLR outperformed RFR (Table 2).
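The tune-then-retrain procedure can be sketched with scikit-learn's GridSearchCV, which cross-validates every parameter combination and, with refit=True, retrains the best configuration on all the training data. The text does not report the search space, the number of folds, or the tool used, so the grid, cv=5, and the choice of an RFR as the example model are assumptions; the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(150, 6))
y = 2 * X[:, 5] + rng.normal(0, 0.1, size=150)

# Hypothetical search space; the text does not list the values tried.
param_grid = {"n_estimators": [10, 20, 50], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, so RMSE is negated
    cv=5,
    refit=True,  # retrain on all training data with the best parameters
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # best parameters and their mean CV RMSE
best_model = search.best_estimator_
```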
The ACP adult stage is considered particularly threatening to ornamental and citrus plants, so timely detection and removal of adults from citrus cultivars is of great interest to citrus growers. When predicting the ACP-adults' population from the abiotic variables (Table 1) and flush growth patterns, the DNN model achieved the lowest RMSE, 3.6776 (Figure 5), whereas the RFR and MLR models gave RMSE values of 6.0553 and 8.6883, respectively (Table 2).
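The model rankings for the three life stages follow directly from the RMSE values reported in the text; the short sketch below sorts them from lowest (best) to highest RMSE. Only the numbers come from the paper; the dictionary layout and function name are illustrative.

```python
# RMSE values as reported in the text for each ACP life stage; lower is better.
reported_rmse = {
    "eggs":   {"DNN": 0.63925, "RFR": 0.70375, "MLR": 0.7935},
    "nymphs": {"DNN": 1.1875,  "RFR": 1.38775, "MLR": 1.2715},
    "adults": {"DNN": 3.6776,  "RFR": 6.0553,  "MLR": 8.6883},
}

def rank_models(scores):
    """Return model names ordered from lowest (best) to highest RMSE."""
    return sorted(scores, key=scores.get)

for stage, scores in reported_rmse.items():
    ranking = rank_models(scores)
    best = ranking[0]
    print(f"{stage}: best to worst = {', '.join(ranking)} (best: {best}, RMSE {scores[best]})")
```

DNN ranks first for every stage, while the order of RFR and MLR swaps for nymphs.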

Discussion and Conclusion
Pest population prediction can serve as a tool for area-wide integrated pest management programs because it helps reduce agrochemical applications in fields [27]. Different abiotic factors can be used as independent variables for building pest population prediction models [44]. Along with abiotic factors, some biotic factors, e.g., host plant phenology, can also be used to predict pest population abundance [27,45]. During seasons of abundant flush growth, heavier infestations of ACP eggs and nymphs were observed in citrus orchards, consistent with [1,8,46]. Proper pest management strategies help conserve the psyllid's natural enemies by minimizing pesticide applications, so that these enemies can act effectively as biocontrol agents against ACP. Optimum climatic conditions and host plant phenological characteristics favor ACP survival and outbreaks [4,5,47]. The ACP population decreased significantly with rainfall and relative humidity and increased with temperature. The ACP-adults' population peaked from March to April and from September to October, with the maximum ACP-adults' population observed in the former study period.
DNN is an appropriate choice for modeling ACP population dynamics because it can model complex data [35]. DL, an emerging and powerful development in ML, can become a valuable tool for ecologists because of its quantitative and predictive nature [48][49][50]. Because DL algorithms generalize well, they are competent models for prediction problems, specifically in ecology and generally in all domains of research related to … branches with flush, were used in the present regression-based models. The key findings of this research can be summarized as follows. (1) The DNN model with tuned hyperparameters (input, hidden, and output layers, activation functions, and optimizer) is best suited for predicting the population phenology of ACP. … corresponding ACP-phenological stage was considered the best prediction model.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
There are no conflicts of interest regarding the publication of this paper.