Plant Diseases Recognition Based on Image Processing Technology

A new image recognition system based on multiple linear regression is proposed. In particular, it introduces several innovations in image segmentation and in the recognition system. For image segmentation, an improved histogram segmentation method that calculates the threshold automatically and accurately is proposed. The regional growth method and true color image processing are also combined with this system to improve its accuracy and intelligence. The recognition system is built using multiple linear regression and image feature extraction. Evaluation on different image training libraries shows that the system offers effective image recognition ability, high precision, and reliability.


Introduction
The prevention and control of plant disease have always been widely discussed, because plants are exposed to the outer environment and are highly prone to diseases. Accurate and rapid diagnosis plays an important role in controlling plant disease, since useful protection measures are often implemented only after a correct diagnosis [1].
This system is based on image processing technology and uses MATLAB as the main processing tool. It also draws on digital image processing, mathematical statistics, plant pathology, and other related fields. Compared with traditional image recognition, it offers several innovations in image segmentation and system construction. To strengthen the segmentation of the lesion, users have several interactive options to meet their own needs. Meanwhile, the linear regression model can be applied to various types of plant disease.
The remainder of this paper is organized as follows. In Section 2, traditional threshold segmentation methods are introduced. Section 3 describes the system in detail in four parts: the improved histogram segmentation method, the disease recognition system based on multiple linear regression, the multiselective interactive image segmentation methods, and the overall process of the disease recognition system. Section 4 presents the results and analysis of the recognition system, and Section 5 concludes the paper.

Traditional Threshold Segmentation Methods
Thresholding plays a central role in image processing. The Iterative Method, the Otsu Method, and the 2-Mode Method are the most common threshold segmentation methods. This section introduces these traditional methods.

Iterative Method.
The Iterative Method can calculate the threshold automatically to a certain extent. For the iterative process, the method incorporates prior knowledge of the image and noise statistics. The optimal segmentation threshold is then found by continuously reducing the gray scale mean [2].
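As a rough illustration, an iterative threshold search can be sketched as follows. This is the common mean-of-class-means variant, which may differ in detail from the formulation in [2]; the function name `iterative_threshold` and the stopping tolerance `tol` are assumptions made for this sketch.

```python
def iterative_threshold(pixels, tol=0.5):
    """Iterative threshold selection on a flat list of gray levels.

    Assumed variant: start from the global mean, split the pixels into two
    classes, and move the threshold to the midpoint of the two class means
    until it changes by less than `tol`.
    """
    t = sum(pixels) / len(pixels)  # initial guess: global gray scale mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t  # only one class left, nothing more to refine
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
```

For example, `iterative_threshold([10] * 90 + [200] * 10)` starts at the global mean and settles at the midpoint between the two gray levels.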

Otsu Method.
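Given the split threshold $t$ between foreground and background, the foreground image ratio $w_0$ with average gray scale $u_0$, and the background image ratio $w_1$ with average gray scale $u_1$, the total average gray scale of the image is $u = w_0 u_0 + w_1 u_1$. Traversing $t$ over all gray levels, the value of $t$ that maximizes the between-class variance $g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2$ becomes the best segmentation threshold [3].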
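The Otsu criterion can be sketched in a few lines. The version below is a minimal pure-Python illustration, not the implementation used in this system; the function name `otsu_threshold`, the convention that class 0 holds the gray levels up to and including the threshold, and the equivalent variance form `w0 * w1 * (u0 - u1) ** 2` are choices made for this sketch.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_g = 0, -1.0
    count0 = 0  # number of pixels with gray level <= t (class 0)
    sum0 = 0    # sum of the gray levels in class 0
    for t in range(levels):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, variance undefined
        w0, w1 = count0 / total, count1 / total
        u0, u1 = sum0 / count0, (total_sum - sum0) / count1
        g = w0 * w1 * (u0 - u1) ** 2  # between-class variance
        if g > best_g:
            best_t, best_g = t, g
    return best_t
```

When several thresholds give the same variance, this sketch keeps the lowest one.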

2-Mode Method.
An image is often composed of normal foliage and diseased area, so its gray scale histogram can be regarded as the sum of two normal distribution functions, as shown in Figure 1.
Select the trough, that is, position $T$ in Figure 1, and split the image into two parts. For an input image $f(x, y)$, the result is $g(x, y) = b_0$ if $f(x, y) < T$ and $g(x, y) = b_1$ if $f(x, y) \geq T$, where $T$ is the segmentation threshold and usually $b_0 = 0$ (black) and $b_1 = 1$ (white) [4].
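Once the trough has been chosen, the split itself is a single comparison per pixel. The following sketch assumes the image is a list of rows of gray levels; the function name `binarize` and the choice to assign pixels equal to the threshold to the white class are assumptions.

```python
def binarize(image, threshold, b0=0, b1=1):
    """Map each gray level to b0 (below the threshold) or b1 (at or above it)."""
    return [[b1 if f >= threshold else b0 for f in row] for row in image]
```

For instance, `binarize([[10, 200], [150, 90]], 128)` marks the two bright pixels as white and the two dark ones as black.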

Image Recognition System Based on Multiple Linear Regression
In most practical situations, however, the traditional methods are not the most appropriate choice, and it is unwise to ignore the difference between actual images and hypothesized data. Therefore, improved methods and a new recognition system based on multiple linear regression are developed. This section introduces the characteristics of this system.

Improved Histogram Segmentation Method (Calculate Threshold Automatically).
The traditional 2-Mode Method requires the threshold to be set manually, which places a heavy burden on the user and lowers identification efficiency. The proposed segmentation method determines the threshold automatically, which greatly reduces the user's task burden and optimizes the image segmentation process. The processing details are as follows; the flow diagram of the improved histogram segmentation is shown in Figure 2.
(1) Perform a median filtering operation on the grayscale image in a 5-by-5 neighborhood, and smooth the values with Robust Loess (quadratic fit) using a moving-average span of 17. These preprocessing operations are designed to provide a histogram fit for segmentation.
(2) Limit the selection range (gray levels 100-190 of 255) and the minimum peak interval (10 gray levels), because some interfering fluctuations on the histogram need to be filtered out.
(3) Pick the maximum height (between peak and trough) and locate the corresponding, most suitable trough, which is the adaptive threshold $T$.
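The three steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the system's MATLAB implementation: it starts from an already computed 256-bin gray-level histogram (the median filter is omitted), it replaces Robust Loess with a plain centered moving average, and it measures the "height" of a trough as the drop from the lower of its two neighboring peaks. The names `moving_average` and `adaptive_trough` are hypothetical.

```python
def moving_average(values, span=17):
    """Centered moving average; the window shrinks at both ends."""
    half = span // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def adaptive_trough(hist, lo=100, hi=190, min_gap=10, span=17):
    """Return the gray level of the deepest trough inside [lo, hi], or None."""
    h = moving_average(hist, span)
    # Local maxima of the smoothed histogram.
    peaks = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] >= h[i + 1]]
    # Keep the strongest peaks that are at least min_gap gray levels apart.
    kept = []
    for p in sorted(peaks, key=lambda i: -h[i]):
        if all(abs(p - q) >= min_gap for q in kept):
            kept.append(p)
    kept.sort()
    best, best_depth = None, 0.0
    for left, right in zip(kept, kept[1:]):
        a, b = max(left, lo), min(right, hi)
        if a > b:
            continue  # this pair of peaks has no trough inside the range
        trough = min(range(a, b + 1), key=lambda i: h[i])
        depth = min(h[left], h[right]) - h[trough]
        if depth > best_depth:
            best, best_depth = trough, depth
    return best
```

On a synthetic histogram with two equal triangular modes centered at gray levels 90 and 190, the function returns the valley between them at gray level 140.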
Compared with other threshold segmentation methods, this improved histogram segmentation method is more accurate, as shown in Figure 3.
In Figure 3, the sample plants cover different disease severities: minor, medium, and serious disasters. The figure shows the segmentation results visually and directly.
Extract the green matrix from the RGB color image and calculate the mean of this two-dimensional matrix. MeanS is the mean of the green pixels of the segmented image, and MeanO is the green pixel mean of the original image. The smaller the ratio between the two means, the more normal foliage has been excluded. Table 1 reflects the segmentation results indirectly by comparing the change ratio of green pixel values, with 15 samples for each severity. In the table, the ratio of the improved method is smaller than that of the other two methods.
Figure 3 and Table 1 show that the new histogram segmentation method filters out normal foliage better than the Iterative Method and the Otsu Method, across different severities. Both of those methods retain somewhat more normal foliage after segmentation. This is mainly because the Iterative Method is strongly influenced by the overall grayscale of the image; it is weak at distinguishing subtle patterns and causes division errors. Although the Otsu Method is more stable and practical, it has problems handling the gray statistics, especially when the ratio of the target lesion area to the background area is very small, so it cannot be used in this system either.
In conclusion, the improved histogram segmentation method is more suitable for lesion segmentation in this system, being fast, efficient, and accurate.
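The green-channel comparison can be illustrated with a short sketch. It assumes that images are lists of rows of `(r, g, b)` tuples, that pixels removed by segmentation are set to black, and that the ratio is taken as MeanS divided by MeanO; the function names are hypothetical.

```python
def green_mean(image):
    """Mean of the green channel of an RGB image (rows of (r, g, b) tuples)."""
    greens = [px[1] for row in image for px in row]
    return sum(greens) / len(greens)


def green_ratio(segmented, original):
    """Ratio MeanS / MeanO; smaller values mean more green foliage was removed."""
    return green_mean(segmented) / green_mean(original)
```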

Disease Recognition System Based on Multiple Linear Regression.
In this system, a total of 11 features are extracted from three aspects: color, texture, and shape. For color, three characteristic values (Hue, Saturation, and Value) are extracted to represent the color features of the lesion [6]. For texture, energy and homogeneity are obtained from the gray level co-occurrence matrix, and the statistical matrix is used to collect four further characteristics (smoothness, third-order moment, consistency, and entropy) [7]. For shape, this system uses the degree of rectification and density. All 11 extracted characteristic parameters remain stable under rotation, translation, and scaling of the image. Thus, they are representative and comprehensive.
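The four statistical texture characteristics can be computed from the gray-level histogram of the lesion region. The sketch below uses one common textbook formulation of these descriptors; the exact definitions used in this system are not given here, so the formulas, the function name `texture_stats`, and the reading of "consistency" as histogram uniformity are assumptions.

```python
import math


def texture_stats(pixels, levels=256):
    """Histogram-based texture descriptors for a flat list of gray levels."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    prob = [h / n for h in hist]  # normalized histogram
    mean = sum(i * p for i, p in enumerate(prob))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(prob))
    third = sum((i - mean) ** 3 * p for i, p in enumerate(prob))
    norm_var = var / (levels - 1) ** 2  # variance scaled to [0, 1]
    return {
        "smoothness": 1 - 1 / (1 + norm_var),
        "third_moment": third,
        "consistency": sum(p * p for p in prob),
        "entropy": -sum(p * math.log2(p) for p in prob if p > 0),
    }
```

A perfectly uniform region gives smoothness 0, consistency 1, and entropy 0, while a region split evenly between two gray levels gives consistency 0.5 and entropy 1 bit.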
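The selected characteristics are regarded as 11 independent variables $x_1, x_2, \ldots, x_{11}$, and the dependent variable $y$ is the severity of the plant disease. Therefore, the regression model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{11} x_{11} + \varepsilon$, where $\beta_0, \beta_1, \ldots, \beta_{11}$ are the required coefficients and $\varepsilon$ is the error term [8].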
In this system, plant diseases are divided into four categories: normal situation, minor disasters, medium disasters, and serious disasters. A score from 0 to 100 is used to assess the severity of the disease; the higher the score, the more serious the disease. Scores of 0-25 indicate a normal situation, 25-50 minor disasters, 50-75 medium disasters, and 75-100 serious disasters. Each category has 10 images, so the training library has a total of 40 images.
The 11 selected eigenvalues of the 40 training images are shown in Figure 4; the database has 40 columns, one per image, and 11 rows, one per characteristic parameter. The 11 features are put into the multiple linear regression model, and the coefficients and confidence intervals are calculated using the least squares estimation algorithm.
In Figure 5(a), the 14th point is an anomaly that exceeds the credible range. We therefore remove this point and rerun the linear regression on the remaining 39 normal points to obtain the correct coefficients and confidence intervals, as shown in Table 2.
Finally, after the anomalous residual is eliminated, the resulting multiple regression model is evaluated and analyzed to test the accuracy of the recognition system.
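The mapping from score to category can be written as a small function. The stated ranges share their end points (25, 50, and 75), so the sketch below assumes that a boundary score belongs to the more serious category; the function name `severity_category` and the short labels are hypothetical.

```python
def severity_category(score):
    """Map a severity score in [0, 100] to one of the four categories."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score < 25:
        return "normal"
    if score < 50:
        return "minor"
    if score < 75:
        return "medium"
    return "serious"
```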
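A least squares fit of such a model can be sketched without external libraries by solving the normal equations. This is only an illustration of the estimation step, not the MATLAB routine used in this system, and it does not compute confidence intervals; the function names `fit_linear_regression` and `predict` are hypothetical. Each row of `X` holds the feature values of one image.

```python
def fit_linear_regression(X, y):
    """Least squares coefficients [b0, b1, ..., bk] for y = b0 + sum(bj * xj)."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix of the normal equations (A^T A) b = A^T y.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coeffs, x):
    """Evaluate the fitted model for one feature vector."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))
```

Solving the normal equations directly is easy to follow but numerically less robust than a QR-based solver, so a numerical library would be preferable for real feature data.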

Color Cutting Based on True Color Image Processing.
Color cutting uses a sphere in color space to define a cluster of colors and extracts the following: $s_i = c$ if $\sum_{j=1}^{3} (r_j - a_j)^2 > R_0^2$, and $s_i = r_i$ otherwise ($i = 1, 2, 3$), where $r_j$ and $s_i$ are the color components of the input and output pixels, $R_0$ is the radius of the color clustering sphere, $a_j$ ($j = 1, 2, 3$) is the color component of the center of the sphere, and $c$ represents a fixed color element. This system combines true color image processing with disease identification to help users extract lesions of a particular color. First, the user selects the color of the target lesion by picking pixels from the image; the colors are then cut according to the RGB mean of the 3 × 3 neighborhood of each selected pixel, with an appropriate color clustering sphere radius ($R_0 = 70$) to filter out useless stains [9]. As shown in Figure 6, the user chooses the parts with a certain color. After color cutting, as shown in Figure 7, the target color part is retained, the uncorrelated parts are filtered out, and the lesions are extracted more accurately.
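A minimal sketch of this color cutting step is given below. It assumes images are lists of rows of `(r, g, b)` tuples and that pixels outside the sphere are replaced by black; the function names `neighborhood_mean` and `color_cut` and the `fill` parameter are hypothetical.

```python
def neighborhood_mean(image, y, x):
    """RGB mean of the 3 x 3 neighborhood of (y, x), clipped at the borders."""
    h, w = len(image), len(image[0])
    pts = [image[j][i]
           for j in range(max(0, y - 1), min(h, y + 2))
           for i in range(max(0, x - 1), min(w, x + 2))]
    return tuple(sum(p[c] for p in pts) / len(pts) for c in range(3))


def color_cut(image, center, radius=70, fill=(0, 0, 0)):
    """Keep pixels inside the color sphere around `center`; replace the rest."""
    r2 = radius ** 2
    out = []
    for row in image:
        out.append([
            px if sum((c - a) ** 2 for c, a in zip(px, center)) <= r2 else fill
            for px in row
        ])
    return out
```

A typical use would be `color_cut(image, neighborhood_mean(image, y, x))`, where `(y, x)` is the pixel picked by the user.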

Multipoint Selection Based on Regional Growth Method.
The regional growth method starts from seed pixels, repeatedly merges neighboring pixels that are similar in the spatial domain, and finally grows a region [10].
In this system, the user selects a number of points; the system then uses the regional growth method to split out only the lesion areas that surround these specific points, as shown in Figure 8.
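Multipoint region growing can be sketched as a breadth-first search from each selected point. The similarity rule used here (gray level within `tol` of the seed's gray level, 4-connectivity) is an assumption, as are the function name `region_grow` and the default tolerance.

```python
from collections import deque


def region_grow(gray, seeds, tol=10):
    """Boolean mask of pixels connected to a seed and similar to it in gray level."""
    h, w = len(gray), len(gray[0])
    mask = [[False] * w for _ in range(h)]
    for sy, sx in seeds:
        ref = gray[sy][sx]  # reference gray level of this seed
        mask[sy][sx] = True
        queue = deque([(sy, sx)])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                        and abs(gray[ny][nx] - ref) <= tol):
                    mask[ny][nx] = True
                    queue.append((ny, nx))
    return mask
```

Only regions that contain a selected point end up in the mask, which mirrors the idea of splitting out the lesion areas around the user's points.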

The Overall Process of the Disease Recognition System.
The system is based on image processing technology and uses MATLAB as the main processing tool. It is divided into four stages: image preprocessing, image segmentation, feature extraction, and regression model. The detailed workflow of the system can be summarized as follows:
(1) Select an edge to cut the image and separate the leaf from the complex background.
(2) Filter out image noise by using spatial domain image denoising.
(3) Apply the improved histogram segmentation method to separate the lesion from normal foliage.
(4) Depending on the purpose, the user can choose color cutting based on true color image processing and multipoint selection based on the regional growth method to extract certain diseased parts.
(5) Extract the lesion characteristics from three aspects: color, texture, and shape.
(6) Establish the linear regression model and the disease identification system.
Figure 8: The lesion image (a) before and (b) after the regional growth method.

Results and Analysis
The test system has the following configuration: (1) hardware and operating system: 2.9 GHz Intel Core i7, Intel HD Graphics 630 and Radeon Pro 560, 16 GB memory, macOS version 10.13.3; (2) tool platform: MATLAB R2017b. After the multiple linear regression model is established, the images from the training libraries are placed into the model, and the disease recognition system is constructed using the least squares method. Images from inside and outside the training libraries are then used to test the accuracy of the system.
The results of the discrimination are shown in Table 3. As Table 3 shows, the number of errors gradually increases as the disease situation becomes more complex: when the disease becomes more serious, the characteristic parameters become more complex, which makes the results unstable.
The number of training images is then increased and new accuracy results are collected; the results are shown in Table 4.
The multiple linear regression recognition system is compared with the traditional minimum distance method [11] on the same training data; the results are shown in Table 5. The multiple regression system discriminates the severity of plant diseases better. Meanwhile, as the number of training images increases, the results become more accurate, which demonstrates the accuracy and good potential of this system.
At the same time, because the independent and dependent variables can be chosen freely, the user can also select different parameters to establish new regression models. For example, the variables can be restricted to those relevant to several specific diseases, which makes it more accurate to distinguish these diseases. Besides, the selection of the characteristic parameters depends on the user, so it is easy to highlight different lesion characteristics to meet different requirements. In conclusion, the recognition system in this paper has great applicability and modifiability.
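For reference, a minimum distance classifier of the kind used as the baseline can be sketched as follows. The formulation in [11] may differ; this version assigns a feature vector to the class whose mean training vector is nearest in Euclidean distance, and the function names are hypothetical.

```python
import math


def class_means(samples, labels):
    """Mean feature vector of each class in the training data."""
    groups = {}
    for x, lab in zip(samples, labels):
        groups.setdefault(lab, []).append(x)
    return {lab: [sum(col) / len(xs) for col in zip(*xs)]
            for lab, xs in groups.items()}


def classify_min_distance(x, means):
    """Label of the class mean closest to x in Euclidean distance."""
    return min(means, key=lambda lab: math.dist(x, means[lab]))
```

Unlike the regression model, this baseline returns only a category and no continuous severity score.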

Summary
This paper improves both image segmentation and the disease recognition system. An improved histogram segmentation method is proposed that finds an appropriate threshold automatically rather than manually, which is more scientific, reliable, and efficient. Meanwhile, the linear regression model can be modified easily by changing the independent and dependent variables; it offers accuracy, applicability, and great potential.

Figure 2: The process of the improved histogram segmentation.

Figure 3: Segmentation results of each threshold method.

Figure 4: Selected eigenvalue of the training images.

Figure 5: Coefficients and confidence intervals before and after eliminating residual items.

Figure 6: Target selection interface in color cutting.

Step (1) and Step (2) belong to the preprocessing part, filtering out irrelevant image information. Step (3) and Step (4) belong to the image segmentation part, separating the lesions accurately. Step (5) belongs to the feature extraction part, extracting the characteristic parameters from various aspects of the lesion. Step (6) belongs to the regression model part, establishing the multiple linear regression model to determine the type of disease.

Table 2: Coefficients and confidence intervals.

Table 5: Comparison of accuracy of multiple linear regression and minimum distance method.