Greenhouse crop production is growing throughout the world, and early pest detection is of particular importance for productivity and for reducing the use of pesticides. Conventional eye observation methods are inefficient for large crops. Computer vision and recent advances in deep learning can play an important role in increasing reliability and productivity. This paper presents the development and comparison of two different approaches for vision based automated pest detection and identification, using learning strategies. A solution that combines computer vision and machine learning is compared against a deep learning solution. The main focus of our work is on the selection of the best approach based on pest detection and identification accuracy. The inspection is focused on the most harmful pests on greenhouse tomato and pepper crops, the whiteflies Bemisia tabaci and Trialeurodes vaporariorum.
European agriculture is facing numerous challenges such as population growth, climate change, resource shortages, and increased competition. Today's challenge is to produce "more with less." Greenhouse crop production is growing throughout the world, generating 46,377 €/ha across Europe. Greenhouses protect crops from adverse weather conditions, allowing year-round production, and integrated crop management approaches provide better control over pests and diseases. However, the intensification of greenhouse crop production creates favourable conditions for devastating infestations that can cost 25% of the potential income.
A pest in agriculture is defined as a population of animals that feed on crop plant tissue (phytophagous), producing economic damage. Most pests are insects or mites. The development of a pest depends mainly on the local weather, external insect pressure, greenhouse design, and crop management practices. Pests can severely damage crops, causing important losses: economic (loss of productivity, income, and investments), social (depopulation of rural areas), and psychological (commotion and panic). Not only does the presence of pests and diseases represent a risk for the farmer that owns the exploitation, but it also represents a threat for adjacent and sometimes distant holdings. According to the FAO, one of the most important pests in greenhouse or protected cultivation is the whitefly, the greenhouse whitefly (Trialeurodes vaporariorum) being predominant.
Early pest identification is of paramount importance in terms of productivity and reduction of the use of pesticides. Eye observation methods have been used in recent years, but they are not efficient in large crops. The automation of this repetitive inspection task can be performed by a computer vision system in order to increase reliability and productivity. Furthermore, endowing robotic systems with pest detection capabilities will allow developing innovative and efficient solutions for Integrated Pest Management (IPM) in crops, with robots that are able to navigate inside greenhouses while performing early pest detection and control tasks autonomously.
The requirements for computer vision based pest detection vary across species: their location on the plants and their visual features at the different stages of their development cycle. Each pest presents different aspects and visual characteristics during its development from the egg stage to the adult stage. In the case of the whitefly, the pest is normally located on the underside of the leaves. In contrast,
In addition, the integration of an automated pest detection system in a robotic platform implies limits on computation hardware. The inspection solution needs to work in real time in order to modify, if necessary, the autonomous robot's inspection route in the greenhouse. Algorithm response time is critical in order to inspect more plants within the robot's battery life. Pest detection and identification techniques with faster response times will be executed in real time on the robot hardware, while algorithms with higher accuracy but longer response times will be executed on external servers, although their results will not be used in real time.
The work presented in this paper is focused on the most harmful pests on greenhouse tomato and pepper crops, the polyphagous species. The main polyphagous pests are the sweet potato whitefly (Bemisia tabaci) and the greenhouse whitefly (Trialeurodes vaporariorum).
This paper presents the work performed to develop and compare two different approaches for vision based automated pest detection and identification using learning strategies, and to select the best approach based on pest detection and identification accuracy. The first approach uses computer vision techniques for pest detection and machine learning for pest classification, while the second uses deep learning for both pest detection and classification. The comparison of both approaches is performed using a large number of pictures that are generated and labelled using realistic setups.
From a qualitative point of view, both approaches present advantages and disadvantages. Solutions based on deep learning algorithms have been demonstrated to be very effective in image processing, showing high performance and good results in different research and industrial applications. The main disadvantage of the deep learning approach is its "black box" nature, which makes it difficult to understand why a deep learning based algorithm makes a specific prediction. On the other hand, computer vision and machine learning algorithms need less data and less time to train, while the deep learning model creation process also requires more computational power. The objective of this paper is to provide an objective and quantitative evaluation of the performance of both approaches.
The paper starts by describing the current state of the art of automatic pest detection and identification in agriculture using computer vision, machine learning, and deep learning strategies in Section
Automatic pest identification has been an active research topic in recent years. In most cases, computer vision, machine learning, or deep learning technologies are selected and used to detect plant diseases, but a comparison of the different possible techniques within the same work is not usually found; instead, a single approach is normally selected. Many works on automatic pest detection and identification are focused on a specific selected technological approach [
Computer vision and object recognition have made huge advances in recent years. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [
The latest developments in machine learning and deep learning have drastically improved the accuracy of object recognition and detection. On the one hand, machine learning methods have been applied as a solution for disease detection [
Different approaches to disease detection and classification via machine learning in tomato crops have been analysed. First, using RGB images and different machine learning algorithms (SVM, linear kernel, quadratic kernel (QK), radial basis function (RBF), multilayer perceptron (MLP), and polynomial kernel), tomato yellow leaf curl disease (TYLCD) is detected [
Most of the classifiers for disease detection and classification via machine learning were trained with small datasets, focusing on the extraction of image features to classify the leaves. A large, labelled, and verified dataset of images of diseased and healthy plants is necessary in order to develop an accurate image classifier. Until very recently, no dataset with these features was available. To solve this problem, the PlantVillage project has begun collecting and labelling tens of thousands of images of healthy and diseased crop plants. The PlantVillage dataset is used to train deep learning models for the diagnosis of different crop diseases, and it appears in most of the latest research projects related to pest detection and deep learning. It contains 18,160 tomato pictures of bacterial spot, early blight, late blight, leaf mold, septoria leaf spot, spider mites (two-spotted spider mite), target spot, tomato yellow leaf curl virus, and healthy leaves.
Unfortunately, it is not possible to use the PlantVillage dataset in this work because it does not contain images corresponding to the three diseases this work is targeting. To overcome this gap, the project will generate its own dataset of
Regarding deep learning for tomato disease classification and symptom visualization, [
The work described in this paper is focused on the comparison of two pest detection and identification techniques. The selected technique will be implemented on an autonomous pest detection scouting robot. As the selected approach for pest detection and identification is based on learning algorithms, it is necessary to generate and label a dataset of infected leaves with
As explained in Section
The quality of the dataset and of its labelling will impact the accuracy of the generated models. On the one hand, manual pictures are taken in order to obtain pictures with well-defined features inside the cultivation chamber. Figure
Cultivation chamber and the automatic dataset generator system.
Completely enclosed boxes have been used for tomato cultivation in the cultivation chamber. These cultivation chambers make it possible to eliminate both external and internal factors, such as contamination by other pathogens. The chambers have been infected with different selected diseases. Plants cultivated in Mendelu's [
Images are taken using the colour camera AP-3200t-PGE and the monochrome camera DataCam 2016R with a standard display system connected to a PC. The selection of the captured area is performed by the worker based on his knowledge and the instructions of the leader of the experiment. Different types of lenses and lighting systems are used when necessary. The focus and the shutter speed are manual, and the setting values are determined by the operator's experience. In total, 13,047 pictures were taken manually: 6,016 using the monochrome camera DataCam 2016R and 7,031 using the colour camera AP-3200t-PGE.
The following section describes the automatic dataset generator system that is installed in the greenhouse. A huge number of images is needed in order to generate models for pest detection and identification, and the variability between images is crucial when the objective is to create an accurate model. The automatic dataset generator system complements the dataset generated manually by the previous approach. The proposed solution is to take pictures every minute in the greenhouse; as a result, pictures are obtained with distinct angles, directions, illumination, and locations.
The automatic dataset generator is composed of two microcontrollers, two cameras, two tripods, two USB flash drives, two artificial illumination systems, one pan-and-tilt structure, one tilt structure, one power source, one IP65 box, and one portable Wi-Fi 4G router (Figure
Automatic dataset generator system components.
The generated dataset contains pictures of different phases of plant growth, of both healthy and infected leaves, and of distinct phases of the infection. Pictures need to be labelled with their specific disease in order to use this dataset to generate machine learning and deep learning models. Initially, the automatic dataset generator system is placed at a random location in the greenhouse. Then, the system is moved near the place where a disease is located.
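The capture software running on the microcontrollers is not detailed here; purely as an illustration, a minimal once-per-minute capture loop could look like the following Python sketch, where the camera index, the USB mount point, and the use of OpenCV are assumptions rather than the system's actual stack.

```python
import time
from datetime import datetime
from pathlib import Path

import cv2

OUTPUT_DIR = Path("/mnt/usb0/dataset")  # hypothetical mount point of one USB flash drive
CAPTURE_PERIOD_S = 60                   # one picture per minute, as described above


def capture_loop(camera_index: int = 0) -> None:
    """Capture one timestamped picture per minute from one camera."""
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    camera = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = camera.read()
            if ok:
                name = datetime.now().strftime("%Y%m%d_%H%M%S") + ".jpg"
                cv2.imwrite(str(OUTPUT_DIR / name), frame)
            time.sleep(CAPTURE_PERIOD_S)
    finally:
        camera.release()


if __name__ == "__main__":
    capture_loop()
```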
The first phase of the generation of the dataset started the
Different issues identified during the first phase of the dataset generation were solved in the second phase. Pictures with incorrect illumination, incorrect focus, obstacles, or unrepresentative content were removed from the dataset. 100,593 pictures were collected using the first microcontroller and 75,741 pictures using the second one. Many of these pictures were invalid; finally, 18,050 valid pictures were obtained from the first microcontroller and 19,692 from the second.
The quality of the labelling will affect the accuracy of the generated model. There are different open source and commercial solutions for labelling pictures in a faster, semiautomatic way. LabelImg [
Image labelling is manual and time-consuming work. Given the time the experts (Mendel University) would need to label all the pictures, a semiautomatic algorithm was developed to sort them. Pictures were labelled following the order of the generated list, which was established based on image quality, variability, and a random selection. An Image Quality Assessment (IQA) algorithm was developed to obtain an image quality score between 0 and 100. The generated dataset was divided into the pictures of each day, and the variability of the labelling order was obtained by selecting, every week, the pictures with the highest quality scores together with some other pictures chosen at random. Every week the order of the list was updated with new pictures and list changes.
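The IQA algorithm itself is not detailed above; the sketch below is one plausible Python implementation of the sorting idea, using the variance of the Laplacian as a sharpness-based quality proxy. The scoring function, the rescaling constant, and the selection sizes are all assumptions.

```python
import random
from pathlib import Path

import cv2


def quality_score(image_path: Path) -> float:
    """Stand-in for the IQA score (0-100): variance of the Laplacian as a
    sharpness measure, rescaled with an assumed constant."""
    gray = cv2.imread(str(image_path), cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return 0.0
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return min(100.0, sharpness / 10.0)  # hypothetical rescaling to 0-100


def weekly_labelling_order(pictures: list, top_n: int = 200, random_n: int = 50) -> list:
    """Order one week's pictures for labelling: the highest-scoring ones first,
    plus a random sample of the rest to preserve variability."""
    ranked = sorted(pictures, key=quality_score, reverse=True)
    selection = ranked[:top_n]
    rest = ranked[top_n:]
    selection += random.sample(rest, min(random_n, len(rest)))
    return selection
```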
4,331 pictures have been labelled using the image labelling tool (Figure
Labels summary.
| DISEASE | TAGS |
| --- | --- |
| | 25,313 |
| | 9,559 |
| | 13,405 |
| | 6,466 |
| TOTAL | 54,743 |
Image labelling tool.
A well-known computer vision library used in different industrial projects is selected as the computer vision and machine learning software (HALCON) [
Figure
Computer vision and machine learning approach flow.
The pest detection flow using computer vision can be divided into three different steps (Figure
Computer vision for pest detection flow.
Table
Computer vision functions for pest detection.
| IMAGE PREPROCESSING FUNCTIONS | |
| --- | --- |
| Check image quality | Checks the image quality level in order to determine whether the image is processed or a new one is requested. Different functions or filters are applied depending on the image quality level. |
| Emphasize image | Enhances the contrast of the image. |
| Gauss filter | Smooths an image using discrete Gauss functions. |
| Illuminate | Very dark parts of the image are illuminated more strongly, and very light ones are darkened. |
| Image enhancement | Modifies the image to improve its visual appearance. Sharpening and magnifying algorithms accentuate picture features. |
| Image restoration | Removes blur and noise from images. |
| BACKGROUND SUBTRACTION FUNCTIONS | |
| Decompose RGB | Converts a three-channel image into three one-channel images with the same definition domain. |
| RGB to HSV | Transforms an image from the RGB colour space to HSV (Hue, Saturation, Value). HSV is defined in a way that is similar to how humans perceive colours. |
| Reduce image domain | Reduces the definition domain of the given image to the indicated region. It subtracts a region from a specific image. |
| Region segmentation | Segments images into regions of the same intensity. |
| Threshold image | Segments an image using a local threshold. It selects those regions in which the pixels fulfil a threshold condition. |
| Automatic threshold | Segments an image using thresholds determined from its histogram. |
| Edge detection | Detects edges using filters such as Deriche, Lanser, Shen, Canny, and Sobel. |
| FEATURE EXTRACTION FUNCTIONS | |
| Get region features | Gets different features related to colour, texture, and shape. |
| Connected regions | Determines the connected components of the input regions. |
| Select specific shape | Chooses regions according to shape feature values such as area, width, and circularity. |
| Count and crop regions | Counts and crops the possible regions with pests. It generates the input for the machine learning algorithm for pest classification. |
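The pipeline above is implemented with HALCON operators; for illustration, a rough OpenCV analogue of the three steps (preprocessing, background subtraction, and feature extraction) might look as follows, where all colour ranges and size thresholds are assumptions rather than the project's tuned values.

```python
import cv2
import numpy as np


def extract_candidate_regions(bgr_image: np.ndarray) -> list:
    """Sketch of the three-step flow with OpenCV instead of HALCON.
    All threshold values are illustrative assumptions."""
    # 1) Preprocessing: smooth the image to reduce noise.
    smoothed = cv2.GaussianBlur(bgr_image, (5, 5), 0)

    # 2) Background subtraction: segment the leaf in HSV space and
    #    threshold bright spots (possible insects/eggs) inside it.
    hsv = cv2.cvtColor(smoothed, cv2.COLOR_BGR2HSV)
    leaf_mask = cv2.inRange(hsv, np.array([25, 40, 40]), np.array([95, 255, 255]))
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
    _, bright = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    candidates_mask = cv2.bitwise_and(bright, leaf_mask)

    # 3) Feature extraction: keep connected components with a plausible
    #    pest-like area, and crop them as input for the classifier.
    contours, _ = cv2.findContours(candidates_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    crops = []
    for contour in contours:
        area = cv2.contourArea(contour)
        if 20 < area < 2000:  # assumed size range for insects and eggs
            x, y, w, h = cv2.boundingRect(contour)
            crops.append(bgr_image[y:y + h, x:x + w])
    return crops
```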
Two machine learning algorithms are tested in order to select the one with the best accuracy: K-nearest neighbour (KNN) and multilayer perceptron (MLP).
One of the main differences between the machine learning and deep learning approaches, to be considered during system implementation, is that machine learning algorithms require complex feature engineering work, while in deep learning the features that distinguish the different categories are extracted automatically.
Table
Machine learning model features.
| FEATURE | DEFINITION |
| --- | --- |
| Area | Area of the object. |
| Circularity | Shape factor for the circularity of an object. It calculates the similarity of the object with a circle. |
| Compactness | Compactness of the object. |
| Content Length | Total length of the object. |
| Convexity | Shape factor for the convexity of an object. The shape factor is one if the object is convex; if there are holes, the shape factor is smaller than one. |
| Rectangularity | Shape factor for the rectangularity of an object. |
| Elliptic axis | Calculates the main and the secondary radius of the equivalent ellipse. |
| Phi orientation | The orientation of the equivalent ellipse. |
| Anisometry | The relationship between the main and the secondary radius of the equivalent ellipse. |
| Bulkiness | The relationship between the anisometry and the area of the object. |
| Structure factor | The relationship between the anisometry and the bulkiness. |
| Smallest circle | Determines the smallest surrounding circle of an object, i.e., the circle with the smallest area of all circles containing the object. |
| Inner circle | Calculates the largest inner circle of an object. |
| Inner rectangle | Determines the largest rectangle that fits into an object. |
| Roundness | Calculates the distance between the contour and the centre of the area. |
| Sides | The number of polygon sides. |
| Diameter | The maximum distance between two points of the object. |
| Orientation | Determines the orientation of the object. |
| Smallest rectangle | Calculates the rectangle with the smallest area of all rectangles containing the object. |
Figure
Machine learning model creation.
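The classifiers above are trained on HALCON's shape features; a scikit-learn sketch of the same idea is shown below, computing a handful of the features from the table and instantiating the two classifier types. The feature formulas and hyperparameters are illustrative assumptions, not the project's settings.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier


def shape_features(contour: np.ndarray) -> list:
    """A few of the features from the table above (area, circularity,
    compactness, diameter); the remaining ones follow the same pattern."""
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, closed=True)
    circularity = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    compactness = perimeter ** 2 / area if area else 0.0
    _, radius = cv2.minEnclosingCircle(contour)  # smallest surrounding circle
    return [area, circularity, compactness, 2 * radius]


# X: feature vectors of the labelled crops, y: their pest categories.
# The hyperparameters below are placeholders, not the paper's settings.
knn = KNeighborsClassifier(n_neighbors=5)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
# knn.fit(X, y); mlp.fit(X, y)
```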
TensorFlow [
Deep learning based approach flow.
Several data manipulation steps need to be performed. First, images and annotations need to be cleaned up: unusable pictures and annotations are removed. Second, data augmentation techniques are applied to the dataset; minor alterations that generate modified images can improve model accuracy. Finally, the modified annotation files and the real and modified images need to be converted to a specific format.
Data cleaning is the first step of the data manipulation task. It removes all unnecessary pictures and annotations from the dataset, and proper data cleaning work improves the quality of the generated model. It is usually manual work, but some steps can be processed automatically: images without annotations, annotations without images, and duplicate or irrelevant images are removed.
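As a minimal illustration of the automatic part of this step, the following sketch removes images without annotations and annotations without images. It assumes a hypothetical folder layout with JPEG pictures and XML annotation files sharing the same file stem.

```python
from pathlib import Path

IMAGES_DIR = Path("dataset/images")            # hypothetical layout
ANNOTATIONS_DIR = Path("dataset/annotations")  # hypothetical layout


def clean_dataset() -> None:
    """Remove images without annotations and annotations without images,
    the two automatic cleaning steps described above."""
    image_stems = {p.stem for p in IMAGES_DIR.glob("*.jpg")}
    annotation_stems = {p.stem for p in ANNOTATIONS_DIR.glob("*.xml")}

    for image in IMAGES_DIR.glob("*.jpg"):
        if image.stem not in annotation_stems:
            image.unlink()        # image without annotation
    for annotation in ANNOTATIONS_DIR.glob("*.xml"):
        if annotation.stem not in image_stems:
            annotation.unlink()   # annotation without image
```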
Data augmentation is important to be able to detect objects at different scales [
Data augmentation techniques can be divided into two types depending on when they are executed. The first option is to execute the desired transformations beforehand; this is known as offline augmentation. Using offline augmentation, the dataset is enlarged before training the model by a factor equal to the number of transformations, and it is normally used with small datasets. The second option, called online augmentation, performs the transformations at the time of training the model. Some basic augmentation techniques such as crop, rotation, Gaussian noise, scale, and flip are applied.
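As an illustration of online augmentation, the following sketch applies the basic techniques listed above on the fly with TensorFlow's tf.image utilities. A TF2-style API is assumed, and the parameter values and the 300x300 input size are assumptions.

```python
import tensorflow as tf  # TF2-style API assumed


def augment(image: tf.Tensor) -> tf.Tensor:
    """Online augmentation applied per example at training time, covering the
    basic techniques named above; all parameter values are illustrative."""
    image = tf.image.random_flip_left_right(image)                                # flip
    image = tf.image.rot90(image, k=tf.random.uniform([], 0, 4, dtype=tf.int32))  # rotation
    image = tf.image.random_crop(image, size=(280, 280, 3))                       # crop
    image = tf.image.resize(image, (300, 300))                                    # scale back
    noise = tf.random.normal(tf.shape(image), stddev=0.02)                        # Gaussian noise
    return tf.clip_by_value(image + noise, 0.0, 1.0)


# dataset = dataset.map(augment)  # applied at training time; nothing is stored on disk
```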
TensorFlow pipeline configuration can be divided into four different steps (Figure
On the other hand, the Faster RCNN deep learning architecture was developed by Microsoft as an evolution of RCNN. The original RCNN uses Selective Search to extract region proposals, which are sent to a classification network where an SVM classifies each region into one of the categories; Faster RCNN replaces Selective Search with a learned Region Proposal Network, which makes it considerably faster. According to [
Then, the configuration of the trainer step is defined. It determines the elements and parameters that are used to train the model: the model parameter initialisation, the input preprocessing, and the SGD (Stochastic Gradient Descent) parameters are configured in this section. The learning rate and the batch size are the most important configuration values in this step, and setting them properly helps to reduce overfitting. Training an object detector from scratch takes too much time, so the generated pest detection models reuse the weights of another model checkpoint in order to speed up the training process; the path to this object detection checkpoint is also defined in this step.
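For illustration, these trainer settings can be edited programmatically with the Object Detection API's configuration utilities; the values below are placeholders, not the settings used in this work.

```python
from object_detection.utils import config_util  # TensorFlow Object Detection API

# Load, edit, and re-save the training pipeline. The values shown here
# are placeholders, not the settings used in the paper.
configs = config_util.get_configs_from_pipeline_file("pipeline.config")
train_config = configs["train_config"]

train_config.batch_size = 1                 # together with the learning rate, key against overfitting
train_config.fine_tune_checkpoint = "faster_rcnn_inception_v2_coco/model.ckpt"  # reused weights
train_config.num_steps = 200000             # cf. the number of steps in the models table below

pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, "training/")
```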
After that, the train input configuration step is defined. This step defines which dataset the model is trained on. TensorFlow offers a set of detection models pretrained on different datasets such as the COCO dataset, the Kitti dataset, the Open Images dataset, and the AVA v2.1 dataset. It is also necessary to define the path to the label map, which maps each defined category to a unique identifier.
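A label map for the four annotation categories used in the experiments could look like the following; the identifier names are illustrative, not necessarily the ones used in the project.

```
item {
  id: 1
  name: "insect_trialeurodes"
}
item {
  id: 2
  name: "egg_trialeurodes"
}
item {
  id: 3
  name: "insect_bemisia"
}
item {
  id: 4
  name: "egg_bemisia"
}
```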
The final step is focused on configuring the evaluator. It is necessary to define the metrics that will be used for evaluation: the number of batches used for an evaluation cycle, the size of the evaluation dataset, and the metric to run during evaluation are defined.
It is possible to train the model both on a local PC and on the cloud. The training process is faster if the model is trained using a GPU rather than a CPU, and a cloud computer with a faster GPU can be used to decrease the training time. The TensorFlow library and its dependencies must be installed, and a set of pictures with their labels and the object detection pipeline need to be generated in order to train the model. While the model is training, it continuously generates different checkpoints. When the training process is finished, it is necessary to select a checkpoint number in order to export and generate the model; it is common to select the highest checkpoint number because it is the latest one. When a new model is exported, a frozen graph, the model checkpoint, the checkpoint files, the pipeline configuration file, and the exported model are generated.
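Once exported, the frozen graph can be loaded for inference; the sketch below, assuming a TF1-era environment, uses the standard tensor names produced by the Object Detection API exporter, with a placeholder input image.

```python
import numpy as np
import tensorflow as tf  # TF 1.x-era API, matching the frozen graph export format

# Load the exported frozen graph.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("exported_model/frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

# Run detection on one image (here a placeholder batch of zeros).
with tf.Session(graph=graph) as sess:
    image = np.zeros((1, 600, 1024, 3), dtype=np.uint8)
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": image})
    keep = scores[0] >= 0.5  # confidence threshold, as used in the experiments below
    print(boxes[0][keep], classes[0][keep])
```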
The main purpose of this experiment is to analyse the results of pest detection and identification on plants using the combination of computer vision and machine learning versus the deep learning technique. First, the combination of computer vision and machine learning is validated using the k-fold cross validation technique. Second, the deep learning approach is validated using Average Precision (AP), which is based on Intersection over Union (IoU). Finally, the comparison between both techniques is validated using custom metrics.
Pictures of unhealthy plants are available in the generated dataset. As explained in Section
The evaluation of the computer vision and machine learning approach is divided into two different experiments. On the one hand, the computer vision experiment focuses on the extraction of the regions with possible pests. On the other hand, the machine learning experiment focuses on the evaluation and comparison of the results obtained by the different classification models.
First, the computer vision experiment and its results are described. The 4,331 labelled pictures are used to test the computer vision algorithm, and the objective of the test is to measure the number of possible regions with pests that the algorithm extracts. From the image labelling work, we know that 54,743 insects and eggs are annotated in the 4,331 original pictures. The computer vision algorithm extracted 667,299 different pictures with possible insects and eggs, around 12 times the exact number of tags. The machine learning model will classify all the cropped pictures, but it will need more time than expected for this work due to the big number of crops. On average, the computer vision algorithm extracts 154 pictures per image, whereas the generated dataset contains around 12 tags per image. It is possible to reduce this number of pictures, but the objective is to ensure that every possible pest in the image is classified by the machine learning algorithm; if the computer vision algorithm were adjusted to reduce the number of possible pictures with pests, some insects or eggs would not be selected for inspection. As early pest detection is very important in order to reduce the damage of any disease, we prefer to analyse more pictures and try to detect pests at early stages.
Figure
Computer vision pest detection example.
Second, the machine learning experiment and its results are described. k-fold cross validation is used to compare the performance of the different machine learning models on our generated dataset: the performances of the K-nearest neighbour (KNN) and multilayer perceptron (MLP) models are measured using k-fold cross validation.
The following steps are performed to evaluate each machine learning model. The original training dataset is divided into ten different folds or subsets, each containing around 4,331 images. For each fold, that fold is kept as the validation set and the rest of the folds form the training set; the machine learning model is trained using the training set, and the accuracy is calculated using the validation set. The accuracy of the machine learning model is obtained by averaging the accuracies of all the cross validation runs.
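A compact way to reproduce this procedure with scikit-learn is sketched below; the classifier settings and the synthetic stand-in data (19 features, matching the feature table above) are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in data: in the real experiment, X and y are the shape features and
# labels of the cropped training pictures (19 features, as in the table above).
X, y = make_classification(n_samples=1000, n_features=19, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)  # placeholder settings
cv = KFold(n_splits=10, shuffle=True, random_state=0)          # ten folds, as in the experiment
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(scores)         # the ten per-fold accuracies (T0..T9)
print(scores.mean())  # the averaged accuracy (AVR)
```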
The training dataset is divided into ten different folds. Each column represents one validation test as shown in Table
Machine learning accuracy results.
| | T0 | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | AVR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KNN | 66.11% | 66.16% | 66.25% | 66.45% | 66.83% | 65.94% | 66.92% | 67.23% | 66.36% | 66.50% | 66.47% |
| MLP | 80.35% | 81.12% | 80.40% | 80.35% | 82.34% | 80.93% | 81.62% | 81.71% | 81.18% | 81.27% | 81.12% |
MLP is the machine learning pest classification model with the best accuracy. Figure
Machine learning image classification.
4,331 pictures are labelled using the image labelling tool. 90% of the labelled pictures are used for training purposes and 10% for evaluating the model accuracy: the disease detection model is developed with 90% of the pictures, and the remaining 10% is used to obtain the real accuracy of the pest detection and identification approach.
Determining whether a disease exists in the image and locating the insect or egg are the two different tasks to evaluate in the deep learning object detection and identification approach, which combines object classification and localization. For the object classification task, the Average Precision (AP) metric is commonly used to measure the accuracy of a deep learning model. The AP metric is based on the precision and recall metrics [
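For reference, IoU compares a predicted bounding box against a ground truth box; a minimal implementation is the following, where a detection is typically counted as correct when IoU is at least 0.5.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


# Two 10x10 boxes overlapping in a 5x5 corner: IoU = 25 / 175.
assert abs(iou((0, 0, 10, 10), (5, 5, 15, 15)) - 25 / 175) < 1e-9
```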
Table
Tested deep learning models.
| Id | Architecture | Aspect ratio resizer | Fine tune model | Data augmentation | Number of steps |
| --- | --- | --- | --- | --- | --- |
| M1 | Faster RCNN | 600x1,024 | FASTER RCNN RESNET101 COCO | YES | 335,736 |
| M2 | SSD | 300x300 | SSDLITE MOBILENET V2 COCO | YES | 200,000 |
| M3 | Faster RCNN | 600x1,024 | FASTER RCNN INCEPTION V2 COCO | NO | 177,184 |
| M4 | Faster RCNN | 600x1,024 | FASTER RCNN INCEPTION V2 COCO | YES | 200,000 |
| M5 | Faster RCNN | 600x1,024 | FASTER RCNN INCEPTION RESNET V2 ATROUS COCO | YES | 110,295 |
| M6 | SSD | 300x300 | SSD INCEPTION V2 COCO | NO | 200,000 |
Table
Deep learning experiment results.
| Metric | M1 | M2 | M3 | M4 | M5 | M6 |
| --- | --- | --- | --- | --- | --- | --- |
| Precision Egg Trialeurodes | 0.54 | 0.63 | 0.58 | 0.71 | 0.56 | 0.59 |
| Precision Egg Bemisia | 0.16 | 0.13 | 0.15 | 0.12 | 0.16 | 0.13 |
| Precision Insect Trialeurodes | 0.72 | 0.63 | 0.65 | 0.74 | 0.69 | 0.69 |
| Precision Insect Bemisia | 0.34 | 0.35 | 0.30 | 0.27 | 0.32 | 0.33 |
| Recall Insect Bemisia | 0.20 | 0.16 | 0.32 | 0.27 | 0.34 | 0.16 |
Figure
Deep learning for pest detection and identification.
Once the computer vision, machine learning, and deep learning experiments have been presented and measured with their most common metrics, this section describes an experiment to check and compare the accuracy and the speed of the generated pest detection and identification models. The final goal of the work is to build a global positioning system guided autonomous robot for pest control in greenhouses. Since the robot runs on limited batteries, speed is another critical factor: the robot needs to inspect the largest possible number of plants. Pictures will be analysed both in real time and offline (on the server side), but the real-time information will be crucial in making decisions about the robot's routes. The inspection solution needs to know whether a disease is present in an image and, if a disease is detected, which type of disease it is. It is not so important to detect all existing insects and eggs on a tomato leaf as to determine whether a disease is present in a picture.
Considering the final goal of the project, an experiment has been developed in order to determine which technique shows better performance according to the requirements. The criteria for inspection technique selection will be based on accuracy and speed.
For the experiment, the following points were considered. The machine learning KNN model T6, the machine learning MLP model T4, the deep learning model M4 (Faster RCNN), and the deep learning model M6 (SSD) were used. 100 images with possible pests from the generated 4,331-image dataset were used; this set of images had not previously been used to train any model. 50 images with healthy tomato leaves were also used. The algorithms' input is a picture, and the output is whether or not the image contains any disease. If a disease is detected, the algorithm needs to classify it; for this experiment, if there is more than one disease, the disease with the largest number of insects or eggs is taken as the disease of the picture. The selected diseases are
The experiment pipeline is the following. 100 images with possible pests and 50 images without pests were selected randomly. Each image in the set of pictures with pests has an XML file with its insect and/or egg annotations; using this file and a script, every insect and egg was cropped and saved, and the number of insects and eggs per disease was stored in a database. The cropped pictures and the database are considered the ground truth of the experiment. The computer vision pest detection algorithm loads the 150 pictures and generates a set of regions to be inspected by the machine learning algorithm; the number of generated pictures and the processing time were measured. The machine learning KNN model T6 and MLP model T4 classify the generated cropped pictures into the different categories; all this information was stored in a database, and the processing time was measured. The deep learning pest detection models M4 (Faster RCNN) and M6 (SSD) load the 150 pictures; the number of detected pests per category was stored in a database, all the detected insect and egg images were cropped, and the processing time was measured. Finally, the data generated in the database was analysed, and conclusions were extracted.
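The cropping script itself is not listed here; assuming the Pascal VOC XML layout that LabelImg produces, it could be sketched as follows.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

import cv2


def crop_annotations(image_path: Path, xml_path: Path, out_dir: Path) -> dict:
    """Crop every annotated insect/egg from a picture and count the tags per
    category, as done to build the experiment's ground truth. Assumes the
    Pascal VOC XML layout produced by LabelImg."""
    image = cv2.imread(str(image_path))
    counts = {}
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, obj in enumerate(ET.parse(str(xml_path)).getroot().iter("object")):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(box.findtext(tag))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        crop = image[ymin:ymax, xmin:xmax]
        cv2.imwrite(str(out_dir / f"{image_path.stem}_{i}_{label}.jpg"), crop)
        counts[label] = counts.get(label, 0) + 1
    return counts
```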
First, the computer vision experiment was performed. The computer vision test selected 27,976 different regions with possible pests from the 150 images, that is, 186 cropped images on average per original picture to be classified by the machine learning model. The test took 430 seconds to generate all the cropped pictures, 2.86 seconds on average per image.
Second, the machine learning experiment was performed, using the output of the computer vision experiment. Two different machine learning models were tested, each with two different confidence factors used to assign an image to a category: for the machine learning KNN and MLP models, 0.5 and 0.75 confidence thresholds were used. The KNN machine learning model took 324 seconds and the MLP model took 251 seconds to classify all the cropped pictures. On average, per original image, the KNN model took 2.16 seconds and the MLP model 1.67 seconds to classify the corresponding cropped pictures.
Third, the deep learning experiment was performed. Two different deep learning models were tested, both with 0.5 and 0.75 confidence thresholds. The Faster RCNN model took 571 seconds and the SSD model took 431 seconds to detect and classify all the pests. On average, the Faster RCNN took 3.8 seconds and the SSD took 3.54 seconds to analyse each image. It is important to note that using a GPU on the robot to classify the pictures would drastically reduce the inspection time.
All the information obtained for the machine learning and deep learning models was stored in a database in order to compute metrics later: the image name, the method used to extract the disease (ground truth, MLP-0.5, MLP-0.75, KNN-0.5, KNN-0.75, SSD-0.5, SSD-0.75, RCNN-0.5, and RCNN-0.75), the number of insects
Table
Disease detection comparation rate.
| | MLP 0.5 | MLP 0.75 | KNN 0.5 | KNN 0.75 | SSD 0.5 | SSD 0.75 | RCNN 0.5 | RCNN 0.75 | GROUND TRUTH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Unhealthy | 94.32% | 94.32% | 92.05% | 92.05% | 77.27% | 72.73% | 85.23% | 79.55% | 100% (88) |
| Healthy | 33.87% | 35.48% | 35.48% | 35.48% | 61.29% | 66.13% | 54.84% | 85.48% | 100% (62) |
| Success | 64.09% | 64.90% | 63.76% | 63.76% | 69.28% | 69.43% | 70.03% | 82.51% | 100% |
Table
Disease identification rate.
| | MLP 0.5 | MLP 0.75 | KNN 0.5 | KNN 0.75 | SSD 0.5 | SSD 0.75 | RCNN 0.5 | RCNN 0.75 | GROUND TRUTH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 0.00% | 0.00% | 6.90% | 6.90% | 55.17% | 51.72% | 75.86% | 65.52% | 100% (29) |
| Egg success rate | 100.00% | 100.00% | 100.00% | 100.00% | 15.38% | 15.38% | 7.69% | 7.69% | 100% (13) |
| | 0.00% | 0.00% | 0.00% | 0.00% | 52.17% | 45.65% | 41.30% | 41.30% | 100% (46) |
| Healthy success rate | 33.87% | 35.48% | 35.48% | 35.48% | 61.29% | 66.13% | 54.84% | 85.48% | 100% (62) |
Once the computer vision, machine learning, and deep learning experiments have been explained, measured, and compared, different conclusions can be extracted.
Computer vision for pest detection selects too many regions with possible disease in each image; even pictures with few insects or eggs generate a big number of regions to be analysed by the machine learning pest classification model. Classifying such a big number of regions with the machine learning model is very slow; because of that, the combination of computer vision and machine learning cannot inspect the greenhouse's plants in real time. The balance between inspection speed and accuracy is critical for real-time inspection: images are analysed both in real time and later on the server side, but real-time information is needed in order to modify the autonomous robot's inspection route if necessary. Computer vision algorithms are also affected by illumination changes when segmenting plants from the background and selecting regions with possible insects or eggs.
Using the MLP machine learning pest classification model, the project obtains an average accuracy of 81.12% in the machine learning k-fold cross validation test, with a best fold accuracy of 82.34%.
Pictures that are categorized with a low confidence by the machine learning classification algorithm are removed from the selection. The computer vision algorithm generates different regions with possible pests that do not contain any insect or egg; due to the machine learning classification confidence factor, these regions are not categorized as a disease. Confidence factors of 0.5 and 0.75 are used in this first approach.
The deep learning technique is a better solution than the combination of computer vision and machine learning. Unlike the computer vision and machine learning approach, the deep learning technique performs disease detection and classification in one step. It obtains better accuracy, and it can distinguish between
We also realize that not all insects and/or eggs are discovered by the computer vision or the deep learning approach. The final goal of the inspection approach is to determine whether a plant is infected and, when it is, to infer its disease. One tomato plant can be declared not healthy if a single insect or egg is found, so it is not necessary to detect all the insects or eggs to achieve the project requirements, although doing so improves performance. In general, more pictures from the cultivation chamber and the greenhouse need to be labelled in order to generate more robust models.
The disease detection approach using deep learning is based on the most popular image classification models. In image classification tasks, the network receives one input image and generates one class label as output; in disease detection tasks, the network receives one input image and generates various bounding boxes, each with a class label. The disease detection approach using deep learning is composed of two networks: one network generates region proposals using a Region Proposal Network (RPN), and the other detects diseases in the selected regions. The RPN proposes a set of regions of several sizes that apparently contain a disease, and the proposed regions are inspected by a regressor and a classifier in order to find regions with an insect or an egg. The RPN region proposal is faster than the results obtained using computer vision, and the quality of the selected regions is also better than with the machine learning approach. The computer vision and machine learning approach is based on features that are provided manually, while the deep learning approach extracts these features automatically from the training dataset. Because of that, the deep learning pest detection and identification approach has better accuracy, and it also works better in different scenarios and different illumination conditions. A pest detection and identification model using deep learning, trained on a huge dataset of different pictures correctly labelled by experts, will automatically extract many features that are impossible for a human to infer; all the combinations of possible features (colour, shape, contours, size, etc.) are impossible to add to a computer vision and machine learning algorithm.
Once the deep learning approach has been selected as the pest detection and identification technique, different improvements need to be added to the dataset and the model. First, more data augmentation techniques need to be added to the model, such as image resizing, translation, scaling, flipping, rotation, perspective transformations, and lighting changes; the resulting model accuracy changes need to be tested in order to know whether these data augmentation techniques improve the dataset variability. Second, the generation of different synthetic scenes with insects and eggs can also improve the dataset variability. Third, in this first approach, pictures of
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The GreenPatrol project has received funding from the European GNSS Agency under the European Union’s (EU) Horizon 2020 research and innovation programme under grant agreement no. 776324 (project GreenPatrol,