End-to-End Semantic Leaf Segmentation Framework for Plants Disease Classification



Introduction
Plant diseases in crops and fruits have adverse effects on agricultural production. If these diseases are not identified and treated in time, food insecurity can increase. Some crops, such as wheat, rice, and maize, are vital for ensuring the food supply as well as agricultural production. Early warning and forecasting are effective means of prevention in controlling plant diseases, and both play an essential role in adequately managing agricultural production. However, until now, visual observation by producers has been the only approach to plant disease identification in most rural areas, specifically in less developed countries. Continuous monitoring by experts is needed, which might be prohibitively expensive on large farms.
Similarly, to contact experts, farmers may have to travel large distances, which also makes the consultation expensive and time consuming. We argue that this conventional approach is not practical for large farming areas, given the demands placed on crops by the production industry. Therefore, automatic plant disease recognition and classification are still crucial topics in computer vision (CV).
Diseases seriously affect the health of every living organism, including plants and animals. The state-of-the-art (S.O.A) algorithms in the CV and machine learning (ML) domains have enabled us to identify diseases beyond human accuracy. Interestingly, the CV and ML algorithms can be applied to all domains, including plants and humans, with almost no difference in implementation. Modern technology enables human society to produce sufficient food to meet the population's requirements. Nevertheless, numerous factors still affect food safety, including plant diseases, climate change, and the decline of pollinators. Plant diseases are the primary danger to food safety. At the same time, the lack of essential infrastructure makes it hard to identify these diseases quickly in many parts of the world. With the latest developments in CV algorithms, the ML paradigm paves the way for agile, on-the-spot disease diagnosis.
Plant diseases that are not taken seriously have caused a decline in agricultural productivity in several countries worldwide. The disease symptoms have a detrimental effect on crop growth, restraining yields and rendering agricultural goods unfit for consumption. Therefore, their early detection, modeling, and recognition are essential. This article explores the modeling, detection, and recognition of plant diseases that involve appearance-based analysis and can be captured and modeled using ML. Since the leaves of plants lend themselves to expressive appearance-based modeling, our interest is in disease detection in tomato plants using deep learning (DL).
Determining the health quality of a plant is essential. Several models have been created to deter the loss of crops to pests and diseases. Plant disease symptoms are usually noticeable when the leaves change color or shape. Traditionally, the identification of pests and diseases was done with the naked eye and was supported by agronomic organizations. Presently, plant diseases and pests can be detected through machine vision. Plant disease identification using ML is not a new research field; CV experts have reported many good papers worth mentioning [1][2][3][4][5][6]. The extensive penetration of smartphones, high-density cameras, and high-performance processors has made it possible for diseases to be detected by automated image recognition.
This paper proposes an end-to-end (E2E) segmentation model for plant disease identification and classification. The model performs semantic leaf segmentation (SLS) using an optimized CNN. By successfully highlighting the foreground and background regions, the proposed model classifies them into healthy and diseased parts. The proposed model encodes the high-density maps and classifies tomato plant leaves into ten different categories of various diseases. Our model outperforms previous approaches in an evaluation setup on the PlantVillage database. The significant contributions of the proposed work are as follows:
(i) A new CNN-based algorithm for plant disease recognition and classification is presented. ML and DL experts have already proposed numerous methods for plant disease recognition; the novelty of our model is that it provides information about each pixel of a leaf image, indicating whether the pixel belongs to a diseased or a healthy part.
(ii) We contribute a new dataset for tomato leaf disease classification. We collected these images from the Internet and labeled them manually as healthy or diseased. The database will be made available for research purposes after the publication of the proposed work.
(iii) Along with predicting the diseased and healthy parts, the proposed model also provides information regarding how much leaf area is affected by a specific disease. Most of the previous ML-based approaches do not provide this information.
The remainder of the paper is arranged as follows. In Section 2, we discuss previous research work on the topic, covering both conventional ML and DL-based methods. The proposed CNN-based method is discussed in Section 3. Section 4 presents the experimental setup, the obtained results, and their comparison with previous results. Finally, the conclusion is presented in Section 5 with some promising future directions.

Related Work
Plant diseases typically affect the growth of crops in all stages of development and sometimes may lead to the death of the plant. Plant diseases affect food security globally and affect small subsistence farmers who depend on their crops for food and livelihood. Therefore, determining the health quality of a plant is very important. Several models have been created to deter the loss of crops to pests and diseases. Traditionally, the identification of pests and diseases was done with the naked eye and was supported by agronomic organizations. Plant disease identification using ML is a well-researched area. CV experts have reported many good papers on the topic [1][2][3][4][5][6][7][8][9].
In precision agriculture, the segmentation of crops in agricultural images is vital. Various techniques have been deployed for the segmentation process, such as semantic segmentation (SS). SS divides an image into semantically meaningful items and assigns each item to a class. For example, the classes for plants can be leaf, stalk, or flower. Several studies have used different SS techniques to distinguish plants from nonplants. For example, Sodjinou et al. [10] suggest a method based on a combination of SS and K-means for detecting weeds in images. The K-means algorithm is used for grouping items that belong to similar groups. According to the study results, the proposed technique provided a more accurate segmentation of weeds and plants. Miao et al. [11] used several approaches to semantically segment hyperspectral images of sorghum plants, such as manual pixel annotation and classifying each pixel as either nonplant or plant. The scholars further classified the plant pixels as belonging to a panicle, leaf, or stalk of the sorghum plant. They could separate the plant pixels from the background, but could not classify the organ to which a plant pixel belonged. In another study, Li et al. [12] used region-based segmentation to detect crops in images taken in a natural field. They used the method to detect cotton specifically. The model was successful, as it could even detect the boll opening stage of the cotton plant.
While distinguishing a plant from a nonplant can be an easy task using SS, identifying plant diseases from an image is tough since plants are complex environments. Through the developmental stages of crops, their flowers, fruits, and leaves change constantly. During the day, solar radiation affects the spectral response of plants, so their appearance also changes slightly. Additionally, the different shapes, layouts, and colors of plant diseases make them difficult to recognize. Regardless, several successful techniques have improved disease detection in plants, both in controlled environments and in natural conditions.
Chen et al. [13] proposed BLSNet, a segmentation method based on the UNet network, to recognize bacterial leaf streak in rice. Rice bacterial leaf streak (BLS) is a threatening disease usually found in rice leaves that affects the yield and quality of rice. BLSNet used large-scale extraction and an attention mechanism to increase the precision of lesion segmentation.
One technique that has been widely successful in identifying plant diseases is SS through CNNs. The layers of a CNN can be viewed as matched filters that are learned directly from the input data. CNNs produce a hierarchy of visual representations tuned for a specific task. The accuracy of CNNs in object detection, such as detecting plant diseases, and in image classification has grown remarkably over time [14]. The CNN-based classification network is the most commonly used pattern in categorizing plant diseases and pests, owing to the strong feature extraction capability of CNNs. Zabawa et al. [15] used SS with convolutional neural networks to extract phenotypic traits of grapevine berries.
According to Bhatt et al. [16], CNN-based methods have achieved extraordinary results in supervised image segmentation of leaves. Usually, these methods work under fully controlled conditions, whereas deep CNN models are built on various changing parameters. In a field setting, however, the images have different backgrounds, lighting conditions, obstructions, and overlaps, and plants show a considerable amount of variation across their growth stages. The authors propose unsupervised machine learning algorithms to segment the leaf images so that the approach can be applied to various crops and regions. The specific segments are then assessed for their texture, size, and color to measure any change, such as the presence of a pest or disease.
Unsupervised feature learning, with fully convolutional networks (FCN) followed by conditional random fields, makes it possible to segment images into an optimal number of clusters without any prior training. The real-time performance of this technique allows easy deployment on devices such as cameras and mobile phones in the fields. In addition, Shao et al. [17] propose using a localization and DL-based method to recognize dense rice images. The proposed model can be used to determine rice diseases, and the study shows that better results can be obtained compared to conventional ML methods. The SS method based on deep CNNs can also identify crops in complex and natural field environments [18]. According to Martins et al. [19], it can also detect tree canopies in an urban setting. The SS method based on DL demonstrates great precision in remote sensing classification as well, although it requires vast sets of data for supervised learning [20]. The simple notion of DL is to use a neural network for analyzing information and learning image features. In their study on estimating sorghum panicles, Malambo et al. [21] applied an image analysis method based on the SegNet framework. Sorghum panicles are critical phenotypic data in the improvement of sorghum crops. The study results demonstrated that DL combined with SS shows excellent precision with large data. On the other hand, Pena et al. [22] suggest using data fusion to enrich images used in remote sensing.
In very recent works [23][24][25][26], plant disease recognition models have been improved for better results and performance. Manjula et al. [24] used the ResNet-50 architecture, a variant of the ResNet model that has 48 convolution layers. The accuracy of the developed system is around 97-98%. Chen et al. [25] improved a plant disease recognition model based on the original YOLOv5 network, which accurately identified plant diseases under natural conditions. Hassan and Maji [26] proposed a novel deep learning model based on the inception layer and residual connections. They used depthwise separable convolution to reduce the number of parameters, which led the model to achieve higher accuracy.

Dataset Description and Data Annotation.
Typical DL-based methods require sufficient data for the training phase, whereas conventional ML methods can also be trained in limited-data scenarios. Requiring a large amount of data is one of the main drawbacks of DL techniques. In this research work, we used an already available dataset and also collected our own database. We use two kinds of data in our experiments, the details of which are provided next.
PlantVillage database [27]: The PlantVillage database is publicly available for download and research purposes. It is an open-access repository with more than 54K images. PlantVillage is a large collection of leaves of various plants and related material. Most of the data in this database were collected under controlled laboratory conditions, and exposure to real-world scenarios is significantly lower. Therefore, most researchers using only the PlantVillage database obtain nearly perfect results. The database includes images of 14 crops, including grape, corn, tomato, and soybean. We use the subset of images for the tomato plant, which consists of around 16012 leaf images organized into 10 folders: one for healthy leaves and the remaining nine for different kinds of diseases, listed in Tables 1 and 2. The total number of classes is therefore limited to ten. We keep the resolution of each image at 250 × 250 pixels. Some sample images of the database are shown in Figure 1.

Complexity
We use all images of the tomato plant contained in PlantVillage. The number of diseased leaf images per class varies from 373 to 5357, as is clear from Table 1. The total number of healthy tomato images in PlantVillage is 1591. Table 1 also shows that the ten classes in the dataset are not balanced as far as the number of images is concerned: the smallest class has 373 images and the largest has 5357. We use data augmentation methods to balance all classes, including adjusting the contrast, flipping images vertically and horizontally, and changing brightness levels.
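The augmentation operations listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the experiments: images are taken to be grayscale nested lists with values in 0 to 255, and the function names, the brightness offsets (±30), and the contrast factors (0.8, 1.2) are all hypothetical choices.

```python
import random

def flip_horizontal(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in img[::-1]]

def _clip(value):
    """Round to an integer and keep it in the valid 0..255 range."""
    return max(0, min(255, int(round(value))))

def adjust_brightness(img, delta):
    """Add a constant offset to every pixel."""
    return [[_clip(p + delta) for p in row] for row in img]

def adjust_contrast(img, factor):
    """Scale each pixel's deviation from the image mean by `factor`."""
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[_clip(mean + factor * (p - mean)) for p in row] for row in img]

def augment_to_target(images, target, seed=0):
    """Return `images` plus random augmented copies until `target` items exist."""
    rng = random.Random(seed)
    ops = [
        flip_horizontal,
        flip_vertical,
        lambda im: adjust_brightness(im, rng.choice([-30, 30])),  # assumed offsets
        lambda im: adjust_contrast(im, rng.choice([0.8, 1.2])),   # assumed factors
    ]
    out = list(images)
    while len(out) < target:
        out.append(rng.choice(ops)(rng.choice(images)))
    return out
```

Calling `augment_to_target` on a minority class with `target` set to the size of the largest class is one simple way to equalize class counts; the original images are always kept and only the shortfall is filled with augmented copies.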
TomatoDB: Since images in PlantVillage are simple and less challenging, comparatively good results are reported in the literature. To assess the performance of the framework more precisely, we also tested our model on a collection of images we had taken from the Internet. Our own dataset consists of more than 20000 images of tomato plant leaves. During image collection, real-world scenarios and more challenging conditions were considered, and all ten classes were given equal consideration. The TomatoDB database will be made available to the research community after the publication of the proposed research article.
For SLS, correctly labeled leaf data for each pixel are needed. This ground truth data is created through annotation. We annotated these images manually using an interface we developed.
This labeling involves selecting the areas of interest, applying a random sketch, adjusting contrast and brightness, and assigning a label. No automatic tool is used in the labeling. Such manual labeling is prone to errors because it depends heavily on the subjective perception of the person doing the labeling. Hence, chances of error exist when providing an exact label for every pixel.
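To illustrate what a per-pixel label makes possible, the sketch below computes the percentage of leaf area marked as diseased from a label mask. The integer label codes (`BACKGROUND`, `HEALTHY`, `DISEASED`) and the function name are hypothetical; the annotation format actually used is not specified in the text.

```python
# Hypothetical label codes for a per-pixel annotation mask.
BACKGROUND, HEALTHY, DISEASED = 0, 1, 2

def affected_area_percent(mask):
    """Percentage of leaf pixels (healthy + diseased) labeled as diseased."""
    healthy = sum(row.count(HEALTHY) for row in mask)
    diseased = sum(row.count(DISEASED) for row in mask)
    leaf = healthy + diseased
    if leaf == 0:
        return 0.0  # no leaf pixels in the mask
    return 100.0 * diseased / leaf

mask = [
    [0, 1, 1],
    [0, 2, 1],
    [0, 0, 2],
]
print(affected_area_percent(mask))  # 2 diseased of 5 leaf pixels
```

Background pixels are excluded from the denominator so that the figure reflects the leaf itself rather than the image frame.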
Images setup for experiments: The model we present in this paper is applicable to any plant disease with visible symptoms. However, manual labeling will be needed to create the training data for an SS framework. As a test case, we select the tomato plant with ten classes. Since images in the PlantVillage dataset are not exposed to lighting and other variations, we collected some images (20000) from the Internet. Some of the tomato leaf images we collected are shown in Figure 2. We use a combination of PlantVillage images and our own collected dataset. We split the dataset in the ratio of 80 to 20, a commonly used strategy for training and testing DL-based models. Some authors also adopt 5-fold or 10-fold cross validation. We set aside 80% of the data for training and 10% for validation to monitor overfitting. We resize each image in PlantVillage and TomatoDB to 250 × 250 pixels before training and testing.
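A split of this kind might be sketched as below. The text gives both an 80 to 20 ratio and an 80%/10% training/validation figure; the sketch assumes one possible reading, in which the held-out 20% is divided equally into validation and test sets. The function name, the fractions as defaults, and the fixed seed are assumptions for illustration.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle `items` and split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # whatever remains is the test set
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

With imbalanced classes, splitting each class separately (a stratified split) would keep the class proportions the same in all three subsets.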
The following section discusses all the hyperparameters of the deep CNN-based model.

Deep Model Learning.
The performance of visual recognition tasks has improved with the introduction of DL-based methods [28][29][30][31][32][33]. This paper addresses leaf disease recognition and classification using deep CNNs. We utilize the concept of SLS in the proposed research.
Convolution layer: This layer plays a vital role in the feature extraction stage. The convolution layer (CovL) is an essential component of the CNN model. The layer consists of a set of learnable filters, also known as kernels [34]. In the convolution process, a filter of a specific size slides over the image and is convolved with the pixel values of the target image. The dot product between the kernel and the input image pixels is computed at each position, producing a feature map.
ReLU: We use ReLU as the activation function. This function plays a crucial role in converting the input signal at a specific network node into the output signal. The resulting signal takes the form $f(x) = \max(0, x)$.
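The sliding dot product and the ReLU activation described above can be sketched in a few lines of plain Python. This is an illustrative single-channel version without padding; like most DL frameworks it does not flip the kernel (strictly a cross-correlation). The function names and the example kernel are assumptions, not the filters of the proposed network.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` and take the dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    fmap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u][v] * image[i * stride + u][j * stride + v]
            row.append(acc)
        fmap.append(row)
    return fmap

def relu(fmap):
    """Apply f(x) = max(0, x) element-wise to a feature map."""
    return [[max(0.0, x) for x in row] for row in fmap]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]  # hypothetical 2 x 2 filter
print(relu(conv2d_valid(image, kernel)))
```

For an input of height H, a kernel of height k, and stride s, the feature map height is (H - k) // s + 1, which is the formula used for `out_h` and `out_w`.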
Pooling layer: The pooling layer follows the CovL, and the output of the CovL is given to the pooling layer. ML experts use three pooling strategies: random pooling, maximum pooling (MPL), and average pooling. In the proposed work, we adopted MPL. The MPL achieves spatial invariance by reducing the size of the feature map obtained from the CovL [35]. In this strategy, the max operation is applied to the feature map as it passes through the MPL; that is, the maximum value within each pooling window is retained.
Classification: We use the SoftMax classifier for classification. The pooling layers provide a feature vector to the SoftMax in the output layer. In the output layer, SoftMax serves as the activation function for the multiclass classification problem. It takes a vector of k real numbers and normalizes it, converting the input values into a vector of probability values in the range 0 to 1. The SoftMax returns a probability for each class, and the class with the maximum probability is taken as the target class [36].
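As a minimal sketch of these two steps, the code below applies max pooling to a small feature map and turns a vector of class scores into probabilities with SoftMax. The 2 × 2 window, the stride of 2, and the function names are assumptions for illustration; the actual kernel sizes and strides are those summarized in Table 3.

```python
import math

def max_pool2d(fmap, size=2, stride=2):
    """Keep the maximum of each `size` x `size` window of the feature map."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + u][j * stride + v]
                for u in range(size) for v in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def softmax(logits):
    """Normalize k real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    """Index of the class with the highest SoftMax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool2d(fmap))
print(predict_class([0.1, 2.0, -1.0]))
```

Because SoftMax is monotonic, the predicted class is the same as the index of the largest raw score; the probabilities matter when a confidence value is needed or when computing the cross-entropy loss.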
Adam: It is a standard optimizer that computes individual adaptive learning rates for each parameter [37]. This optimizer uses the exponentially decaying average of previous gradients, $n_t$.
The proposed framework is presented in Figure 3. Tables 3 and 4 summarize various parameters of the proposed CNN framework. As an activation function, we use ReLU. For constructing the CNN-based model, we use three kinds of layers: CovL, MPL, and FCL. The details of these layers, with feature map description, kernel size, and stride, are summarized in Table 3. The feature extractor extracts features from the images of the leaves, including healthy and affected leaves. The feature extraction part is described further in Figure 4. Feature variation is handled by stage 1. Certain environmental factors produce scaling variations in images; the receptive fields of this stage overcome these variations. Each field has sixteen filters. The output of stage 1 is passed to the next stage (see Tables 3 and 4 and Figure 4).
Data including both training images and ground truth are given to the framework. The density map is predicted in the density estimation (DE) part, taking supervision from the ground truth data. We combine the segmentation map and the DE map and feed the result to the CovL. A loss based on the dice coefficient is added in the SS part. Additionally, we add a Euclidean distance loss for optimizing the estimated density maps.
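As a rough illustration of the Adam optimizer mentioned at the start of this subsection, the sketch below applies one update to a single scalar parameter. The decay rates `beta1` and `beta2` and the constant `eps` are the commonly used default values and are assumptions here, since the text does not state them; the default `lr` mirrors the base learning rate of 0.0001 reported in the experimental setup.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `t` is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step on f(x) = x^2 at x = 1.0, where the gradient is 2.0.
x, m, v = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1)
print(x)
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate: on the first step the update has magnitude close to `lr` regardless of the gradient scale.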

CNN Optimization.
This subsection describes how the hyperparameters are tuned and how optimization is performed. Overfitting is a severe problem faced by many ML models. We use the methodology suggested and used in [38] to tackle this problem. We use a combination of four different loss functions. We use the Euclidean distance for better optimization. The obtained segmentation density map can be written as shown in (3) and (4). In (3), $p$ denotes the density estimated in the supervision process. Similarly, $\hat{P}_k$ represents the estimated density, and $P_k$ represents the ground truth density value. $M$ represents the number of pixels in the ground truth (GT) density map.
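A Euclidean loss between an estimated and a ground truth density map might be computed as in the sketch below. The exact form and normalization of (3) and (4) are not reproduced here, so the mean of squared differences over the M pixels is an assumption, as is the function name.

```python
def euclidean_density_loss(pred, gt):
    """Mean squared difference between two density maps of equal shape."""
    flat_pred = [p for row in pred for p in row]
    flat_gt = [g for row in gt for g in row]
    if len(flat_pred) != len(flat_gt):
        raise ValueError("density maps must have the same number of pixels")
    m = len(flat_gt)  # M: number of pixels in the ground truth density map
    return sum((p - g) ** 2 for p, g in zip(flat_pred, flat_gt)) / m

pred = [[0.5, 1.0], [0.0, 2.0]]
gt = [[0.0, 1.0], [0.0, 1.0]]
print(euclidean_density_loss(pred, gt))
```

The same function can serve for both the intermediate supervision term and the final density term by passing the corresponding estimated map.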
We also introduce a loss in the SS part of the framework. This loss is based on the dice coefficient, which is two times the area of overlap between the predicted segmentation and the ground truth, divided by the total number of pixels in the ground truth and the predicted segmentation. The dice coefficient ranges between 0 and 1. We use another loss function, the cross-entropy loss, which we represent as (5). In (5), $Q$ represents the total number of samples, and $C$ is the number of classes. Similarly, the ground truth class is denoted by $x_c^b$, whereas the estimated output is represented by $x^b$. The final weighted loss function is

$$\mathrm{W.L} = \mathrm{Loss}_{int} + \mathrm{Loss}_{den} + \lambda \, \mathrm{Loss}_{X\text{-}entropy}.$$
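The loss terms discussed in this subsection can be sketched as follows. The cross-entropy is written in its standard averaged form over Q samples and C classes, which is an assumption about (5); the function names and the default `lam=1.0` are also hypothetical. The text does not state how the dice-based term enters the weighted sum, so it is shown here as a separate function.

```python
import math

def dice_coefficient(pred_mask, gt_mask):
    """2 * overlap / (pixels in prediction + pixels in ground truth)."""
    pred = [p for row in pred_mask for p in row]
    gt = [g for row in gt_mask for g in row]
    overlap = sum(1 for p, g in zip(pred, gt) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in gt if g)
    return 1.0 if total == 0 else 2.0 * overlap / total

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Average cross-entropy over samples; rows are one-hot / probability vectors."""
    q = len(y_true)  # Q: total number of samples
    loss = 0.0
    for truth, prob in zip(y_true, y_prob):
        for t, p in zip(truth, prob):   # loop over the C classes
            loss -= t * math.log(max(p, eps))
    return loss / q

def weighted_loss(loss_int, loss_den, loss_xent, lam=1.0):
    """W.L = Loss_int + Loss_den + lambda * Loss_X-entropy."""
    return loss_int + loss_den + lam * loss_xent

print(dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
print(cross_entropy([[1, 0]], [[0.5, 0.5]]))
```

A dice coefficient of 1 means perfect overlap, so a dice-based loss is typically taken as one minus the coefficient; the small `eps` in the cross-entropy guards against taking the logarithm of zero.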

Experimental Setup.
We perform our experiments on an Intel i7 workstation with an NVIDIA GPU 840 graphics card. We perform all our experimental work with TensorFlow, Keras, and Python. We train for 500 epochs with a batch size of 150. We use a base learning rate of 0.0001 and a dropout rate of 0.4. We use two datasets for experimental work: PlantVillage and our own collected database. The PlantVillage subset consists of more than 16K images, and our own database consists of more than 20K images. We combined both datasets and performed our experiments with a training-to-testing ratio of 80 to 20.

Limitations of the Proposed Work.
The results reported in this paper show that the proposed method achieves good performance; however, our proposed algorithm still has limitations. The research community has concerns about using DL architectures. DL-based methods are complex and require inputs at several stages, and researchers using these techniques rely on a trial-and-error strategy. To summarize, these methods are time consuming and heavily engineered. However, DL methods are currently the dominant choice CV experts have for CV-based tasks. We use the idea of SLS in our proposed work. Ground truth data are needed for the training and testing phases to implement this model, and creating the ground truth data requires manual labeling. Since a single person does all this manual labeling, labeling errors are to be expected. This reliance on manual human labeling is a weakness of our proposed method.

Reported Results and Discussion.
Some conclusions that emerge from the results and experiments are summarized in the following paragraphs.
(i) Plant disease classification and identification using ML is not a new research area for CV and ML experts. The state of the art reports many good papers on this topic. Due to diverse applications in agriculture, researchers have explored this field sufficiently. However, we notice less emphasis on interclass disease identification in particular. Researchers mainly focus on recognizing a single plant disease, whereas our proposed work focuses on tomato plant disease classification with ten classes.
(ii) Initially, we run the whole experimental setup for a maximum of 14 epochs (please see Figure 5). We run this setup to see how the model performance varies on the training and validation data. As is clear from Figure 5, both training and validation accuracy change very quickly up to epoch 6. After epoch 6, the change is very slow in the subsequent epochs. Both training and validation losses are also shown. The loss is high in the initial stages and is gradually reduced as the number of epochs increases. This loss reduction clearly shows that the network is fine-tuned gradually with increasing epochs.
(iii) We address a ten-class disease problem in our work. The names of the classes, along with their abbreviations, are shown in Table 5. We report the results for $P_r$, $R_c$, and $F_m$ for all ten classes. It is clear from Table 5 that near-perfect results are reported for the class BS on all three evaluation measures. Similarly, better results are reported for the classes LB, TMV, SM, and HL. The worst performance is shown for the class EB, with precision 0.93, recall 0.95, and F-measure 0.95, which are still acceptable and good results. Our proposed method semantically segments leaf images into background and foreground. Please see some images in Figure 6, where column 1 shows the original images, column 2 the ground truth, and column 3 the segmentation results. After foreground estimation, disease classification is performed. Moreover, the percentage of leaf area affected by a disease is also estimated.
(iv) $C_{mat}$ is the best choice for multiclass evaluation problems and is commonly used by ML experts. It shows the corresponding percentage of predictions for each true class. The $C_{mat}$ for the reported results on the 10-class problem is shown in Table 6. The results vary from 94% (lowest) to 100% (highest). The lowest results are reported for EB, whereas the highest values are reported for HL. The LM, YLCV, BS, and LB results are comparatively better, with predicted accuracy values of 99%, 98%, 98%, and 97%, respectively.
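The per-class precision, recall, and F-measure discussed above can be derived from a confusion matrix as in the following sketch. It assumes that rows correspond to true classes and columns to predicted classes, and it uses a small two-class matrix of made-up counts purely for illustration; these are not the values of Table 6.

```python
def per_class_metrics(cm):
    """Return (precision, recall, F-measure) for each class of a confusion matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]                                # correctly predicted as class k
        fn = sum(cm[k]) - tp                         # class k predicted as something else
        fp = sum(cm[r][k] for r in range(n)) - tp    # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        metrics.append((precision, recall, f_measure))
    return metrics

# Illustrative counts only (rows: true class, columns: predicted class).
cm = [[8, 2],
      [1, 9]]
for name, (p, r, f) in zip(["A", "B"], per_class_metrics(cm)):
    print(name, round(p, 3), round(r, 3), round(f, 3))
```

Since the F-measure is the harmonic mean of precision and recall, it always lies between the two values, which gives a quick sanity check on reported triples.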

Performance Comparison with Previous Results.
We compared the reported results with S.O.A. in Table 7. It is clear that the reported results are far better than previous results.
The reported results and their comparison with S.O.A. are for the accuracy measure only. As most papers report accuracy, we compared our work on this metric only. We note that some research papers on plant disease classification using handcrafted features report better results than DL-based methods. However, we believe a better understanding of DL methods is still required to address a specific task. For example, the requirement of a large amount of data is a problem DL methods face. Generally, traditional ML methods perform well on data collected in indoor scenes; however, researchers report a significant drop in performance when these methods are tested in real-world scenarios. In contrast, DL architectures extract a higher level of abstraction from the data, with much better results. Thus, the need for feature engineering is minimized to a large extent with DL algorithms.

Summary and Concluding Remarks
Due to diverse applications in the agriculture sector, plant disease identification using DL is an active area of research. Plant disease recognition is more challenging when the method is exposed to real-world data. However, CV researchers have shown tremendous progress in the past 5 to 10 years. Our current research provides a unification and extension of our previous work reported in [7]. Our study is mainly motivated by looking into the human visual cortex to design an E2E trainable neural network architecture. We propose an E2E SS framework for plant disease identification using DL, introducing the idea of SS for plant disease recognition. The proposed model predicts the nature of the disease of the tomato plant and tells how much area of a specific leaf is affected by a certain disease. The model successfully classifies tomato plant leaves into ten distinct classes. We present a novel loss function that improves the performance of the model on a state-of-the-art dataset. We evaluate our model on the standard PlantVillage dataset and observe much better results than previously reported. Along with the PlantVillage database, we also collected a database of more than 20000 images and tested our framework on it. We expect further evaluation from the research community using better optimized DL models for plant disease recognition. In the future, we intend to analyze more tasks to develop robust continual DL models, considering complex combinations of neural networks along with information extraction.

Conflicts of Interest
The authors declare no conflicts of interest.

Table 6 (partial): confusion matrix, in percent.
     EB  LB  TMV  HL  LM  YLCV  BS  SLS  TS  SM
EB   94  1   0    0   2   1     1   1