
A new image recognition system based on multiple linear regression is proposed. It introduces several innovations in image segmentation and in the recognition system itself. For image segmentation, an improved histogram segmentation method that calculates the threshold automatically and accurately is proposed. The regional growth method and true color image processing are also combined with this system to improve its accuracy and intelligence. The recognition system is built on multiple linear regression and image feature extraction. Evaluation on different image training libraries shows that the system recognizes images effectively, with high precision and reliability.

The prevention and control of plant disease have long been widely discussed because plants are exposed to the outdoor environment and are highly prone to disease. Accurate and rapid diagnosis plays an important role in controlling plant disease, since useful protection measures are often implemented after a correct diagnosis [

This system is based on image processing technology and uses MATLAB as the main processing tool. It also draws on digital image processing, mathematical statistics, plant pathology, and other related fields. Compared with traditional image recognition, it introduces several innovations in image segmentation and system construction. To strengthen the segmentation of the lesion, users are given several interactive options to meet their own needs. Meanwhile, the linear regression model can be applied to various types of plant disease.

The remainder of this paper is organized as follows. In Section

Threshold selection is essential in image processing. The Iterative Method, the Otsu Method, and the 2-Mode Method are the most common threshold segmentation methods. This section introduces these traditional methods.

The Iterative Method can calculate the threshold automatically to a certain extent. Its iterative process incorporates prior knowledge about the image and noise statistics, and the optimal segmentation threshold is found by continuously reducing the gray scale mean [

Given the split threshold of foreground and background

The image is often composed of normal foliage and a diseased area, so its gray scale histogram can be regarded as two normal distribution functions, as shown in Figure

The histogram in 2-Mode Method.

Select the trough, that is,
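The three traditional methods named in this section can be sketched in a few lines. This is a minimal illustration, not the paper's MATLAB implementation: the function names, the stopping tolerance `tol`, the smoothing `window`, and the peak separation `min_gap` are all assumptions introduced here, and the iterative variant shown is the common scheme that averages the two class means until the threshold stabilizes.

```python
import numpy as np

def iterative_threshold(gray, tol=0.5):
    """Average the two class means repeatedly until the threshold stabilizes."""
    t = float(gray.mean())
    while True:
        low, high = gray[gray <= t], gray[gray > t]
        if low.size == 0 or high.size == 0:   # constant image: nothing to split
            return t
        new_t = 0.5 * (float(low.mean()) + float(high.mean()))
        if abs(new_t - t) < tol:              # tol is an assumed stopping rule
            return new_t
        t = new_t

def otsu_threshold(gray):
    """Pick the gray level that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    w0 = np.cumsum(prob)                      # weight of the class at or below each level
    mu = np.cumsum(prob * np.arange(256))     # cumulative first moment
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))

def two_mode_threshold(gray, min_gap=30, window=5):
    """Return the trough between the two main peaks of a smoothed histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    smooth = np.convolve(hist, np.ones(window) / window, mode="same")
    peak1 = int(np.argmax(smooth))
    masked = smooth.copy()
    # Hide the neighborhood of the first peak so the second peak is a distinct mode.
    masked[max(peak1 - min_gap, 0):peak1 + min_gap + 1] = -1.0
    peak2 = int(np.argmax(masked))
    lo, hi = sorted((peak1, peak2))
    return lo + int(np.argmin(smooth[lo:hi + 1]))
```

Each function expects an 8-bit gray scale image as a NumPy array of unsigned integers. The automatic peak search in `two_mode_threshold` is only a convenience for the sketch; in the traditional 2-Mode Method the trough is chosen by the user.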

In most practical situations, however, the traditional methods are not the most appropriate choice, because actual images differ from the hypothesized data. Therefore, improved methods and a new recognition system based on multiple linear regression are developed. This section introduces the characteristics of this system.

The traditional 2-Mode Method requires the threshold to be set manually. This places a heavy burden on the user and makes identification inefficient.

The proposed segmentation method determines the threshold automatically, which greatly reduces the user’s task burden and streamlines the image segmentation process. The processing details are as follows; the flow diagram of the improved histogram segmentation is shown in Figure

The process of the improved histogram segmentation.

Compared with other threshold segmentation methods, this improved histogram segmentation method is more accurate, as shown in Figure

Segmentation results of each threshold method.

In Figure

Table

The ratio of three methods (10^{−3}).

Method | Minor (15) | Medium (15) | Serious (15) | Average |
---|---|---|---|---|
Improved Method | 0.76051 | 1.23364 | 0.8973 | 0.963817 |
Iterative Method | 2.2246 | 2.85 | 2.57 | 2.5482 |
Otsu Method | 3.42 | 3.61 | 3.38 | 3.47 |

From Figure

In conclusion, the improved histogram segmentation method is better suited to lesion segmentation in this system because it is fast, efficient, and accurate.

In this system, a total of 11 features are extracted from three aspects: color, texture, and shape. For color, three characteristic values (Hue, Saturation, and Value) are extracted to represent the color features of the lesion [

The selected characteristics are regarded as 11 independent variables

In this system, plant diseases are divided into four categories: normal situation, minor disasters, medium disasters, and serious disasters. A score from 0 to 100 is used to assess the severity of disease; the higher the score, the more serious the disease. Scores of 0–25 indicate a normal situation, 25–50 minor disasters, 50–75 medium disasters, and 75–100 serious disasters. Each category has 10 images, so the training library has a total of 40 images.
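The score bands can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and because the stated ranges share their end points (25, 50, and 75), the code assumes that a boundary score belongs to the more severe category.

```python
def severity_category(score):
    """Map a 0-100 severity score to one of the four categories in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    # Assumption: a score exactly on a boundary falls into the more severe band.
    if score < 25:
        return "normal situation"
    if score < 50:
        return "minor disaster"
    if score < 75:
        return "medium disaster"
    return "serious disaster"
```

For example, `severity_category(62)` returns `"medium disaster"`.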

The 11 selected eigenvalues of the 40 training images are shown in Figure

Selected eigenvalue of the training images.

The 11 features are put into the multiple linear regression model. The coefficients and confidence intervals are calculated using the least squares estimation algorithm.
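The least squares fit can be illustrated as follows. This is a minimal sketch using NumPy rather than the MATLAB tools of the paper; the function names are hypothetical, the feature matrix is assumed to have one row per training image and one column per feature, and the confidence intervals are left out.

```python
import numpy as np

def fit_linear_regression(features, scores):
    """Least squares estimate of the intercept and one coefficient per feature."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef

def predict_scores(coef, features):
    """Apply a fitted model to new feature rows to obtain severity scores."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef
```

With 40 training images and 11 features, `fit_linear_regression` returns 12 values: the intercept followed by the 11 feature coefficients.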

From Figure

Coefficients and confidence intervals.

Coefficient | Estimate | Confidence interval |
---|---|---|
β_{0} | −816.1596 | [−3122.8 1490.4] |
β_{1} | −323.6432 | [−634.8063 −12.4801] |
β_{2} | −9.7902 | [−126.8978 107.3174] |
β_{3} | −91.8584 | [−399.8541 216.1372] |
β_{4} | 10.4528 | [−3.4562 24.3619] |
β_{5} | −0.0038 | [−0.0151 0.0074] |
β_{6} | −19.4105 | [−664.0947 625.2737] |
β_{7} | 993.5927 | [−580.5174 2567.7] |
β_{8} | −72.9863 | [−428.3056 282.3329] |
β_{9} | 1689.2 | [−118.7379 3497.2] |
β_{10} | 65.8747 | [−909.1611 1040.9] |
β_{11} | 37.0236 | [−219.3019 293.3492] |

Coefficients and confidence intervals before and after eliminating residual items: (a) original; (b) after eliminating the residual term.

Finally, after eliminating the residual terms, the obtained multiple regression model is evaluated and analyzed to test the accuracy of the recognition system.

Color cutting uses a sphere in color space to define a color cluster and extracts the following:

This system combines true color image processing with disease identification to help users extract lesions of a particular color. First, select a pixel in the image that has the color of the target lesion; then stratify the colors according to the RGB mean of the 3 × 3 neighborhood of each selected pixel, and set an appropriate radius for the color clustering sphere (

Target selection interface in color cutting.

Lesion image after color cutting.
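The color cutting step can be sketched as a distance test in RGB space. This is an illustrative reading of the description above, not the paper's code: the function name and arguments are hypothetical, the distance is assumed to be Euclidean, and pixels outside every sphere are simply set to black.

```python
import numpy as np

def color_cut(image, seeds, radius):
    """Keep pixels whose RGB value lies inside a sphere around a seed color."""
    pixels = image.astype(float)
    height, width = pixels.shape[:2]
    mask = np.zeros((height, width), dtype=bool)
    for row, col in seeds:
        # Sphere center: RGB mean of the 3 x 3 neighborhood of the selected pixel.
        window = pixels[max(row - 1, 0):min(row + 2, height),
                        max(col - 1, 0):min(col + 2, width)]
        center = window.reshape(-1, 3).mean(axis=0)
        distance = np.linalg.norm(pixels - center, axis=2)
        mask |= distance <= radius
    cut = np.zeros_like(image)
    cut[mask] = image[mask]          # pixels outside the spheres stay black
    return mask, cut
```

Here `image` is a height × width × 3 RGB array and `seeds` is a list of `(row, col)` positions chosen by the user. A larger `radius` accepts a wider range of colors around each selected pixel.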

The regional growth method starts from seed pixels, continually merges neighboring pixels that are similar to them in the spatial domain, and finally grows a region [

In this system, the user selects a number of points; the system then uses the regional growth method to segment only those lesion areas that surround the selected points, as shown in Figure

The lesion image before and after regional growth method: (a) before regional growth method; (b) after regional growth method.
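A seeded region growing pass of this kind can be sketched as a breadth-first search. The similarity rule is an assumption made for illustration: a pixel joins the region when its gray level is within `tol` of the seed's gray level and it touches the region through a 4-connected neighbor. The function name and parameters are hypothetical.

```python
from collections import deque

import numpy as np

def region_grow(gray, seeds, tol=10):
    """Grow regions from user-selected seed points on a gray scale image."""
    height, width = gray.shape
    grown = np.zeros((height, width), dtype=bool)
    for seed in seeds:                      # each seed is a (row, col) tuple
        if grown[seed]:
            continue                        # already covered by an earlier seed
        seed_value = float(gray[seed])
        grown[seed] = True
        queue = deque([seed])
        while queue:
            row, col = queue.popleft()
            for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                n_row, n_col = row + d_row, col + d_col
                inside = 0 <= n_row < height and 0 <= n_col < width
                if inside and not grown[n_row, n_col] \
                        and abs(float(gray[n_row, n_col]) - seed_value) <= tol:
                    grown[n_row, n_col] = True
                    queue.append((n_row, n_col))
    return grown
```

Only areas connected to a selected point are returned, so a lesion-colored patch elsewhere in the image is left out unless the user also places a seed on it.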

The system is based on image processing technology and uses MATLAB as the main processing tool. It is divided into four stages: image preprocessing, image segmentation, feature extraction, and regression modeling. The detailed workflow of the system can be summarized as follows:

Step

The authentication system has the following:

After the multiple linear regression model is established, the images from the training libraries are placed into it, and the disease recognition system is constructed using the least squares method. Images inside and outside the training libraries are then used to test the accuracy of the system.

The results of the discrimination are shown in Table

The recognition accuracy of images inside (40) and outside (10) the training library.

Image | Correct | Wrong | Accuracy |
---|---|---|---|
Inside | | | |
Normal situation (10) | 9 | 1 | 90% |
Minor disaster (10) | 8 | 2 | 80% |
Medium disaster (10) | 8 | 2 | 80% |
Serious disaster (10) | 7 | 3 | 70% |
Outside | | | |
Random situation (10) | 6 | 4 | 60% |

From Table

The number of training images is then increased and the new accuracy results are collected; the results are shown in Table

The recognition accuracy of images inside (60) and outside (10) the training library.

Image | Correct | Wrong | Accuracy |
---|---|---|---|
Inside | | | |
Normal situation (15) | 14 | 1 | 93.33% |
Minor disaster (15) | 13 | 2 | 86.67% |
Medium disaster (15) | 12 | 3 | 80% |
Serious disaster (15) | 12 | 3 | 80% |
Outside | | | |
Random situation (10) | 9 | 1 | 90% |

Compare the multiple linear regression recognition system with the traditional minimum distance method [

Comparison of accuracy of multiple linear regression and minimum distance method.

Image | Multiple linear regression | Minimum distance method |
---|---|---|
Inside | | |
Normal situation (15) | 93.33% | 60.00% |
Minor disaster (15) | 86.67% | 73.33% |
Medium disaster (15) | 80.00% | 73.33% |
Serious disaster (15) | 80.00% | 80.00% |
Outside | | |
Random situation (10) | 90.00% | 67.66% |
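For context, a minimum distance classifier of the kind used as the baseline can be sketched as follows. The exact variant used in the comparison is not given here, so this sketch assumes the usual form: each class is represented by the mean of its training feature vectors, and a sample is assigned to the class whose mean is nearest in Euclidean distance. The function names are hypothetical.

```python
import numpy as np

def fit_class_means(features, labels):
    """Compute one mean feature vector per class from the training data."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, means

def classify_min_distance(classes, means, sample):
    """Assign the sample to the class whose mean vector is closest."""
    distances = np.linalg.norm(means - sample, axis=1)
    return classes[int(np.argmin(distances))]
```

Unlike the regression model, this baseline returns a category directly rather than a continuous severity score.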

The multiple regression system discriminates the severity of plant diseases better. Moreover, the results become more accurate as the number of training images increases, which demonstrates the accuracy and potential of this system.

At the same time, because the independent and dependent variables can be chosen freely, the user can also select different parameters to establish new regression models. For example, the variables can be made relevant to only a few specific diseases, which makes it more accurate to distinguish these diseases. In addition, the selection of the characteristic parameters is left to the user, so it is easy to highlight different lesion characteristics to meet different requirements. In conclusion, the recognition system in this paper has great applicability and modifiability.

This paper improves the image segmentation and disease recognition system. An improved histogram segmentation method is proposed that finds an appropriate threshold automatically rather than manually, which is more scientific, reliable, and efficient. Meanwhile, the linear regression model can be modified easily by changing the independent and dependent variables, giving it accuracy, applicability, and greater potential.

The authors declare that they have no conflicts of interest.