The introduction of deep transfer learning (DTL) further reduces the requirements of data and expert knowledge in a variety of applications, helping DNN-based models effectively reuse information. However, DTL typically transfers all parameters of the source network, whether or not they are useful to the target task. The redundant trainable parameters restrict DTL on low-computing-power devices and in edge computing, while small efficient networks with fewer parameters have difficulty transferring knowledge due to structural differences in design. To address the challenge of transferring a simplified model from a complex network, this paper proposes an algorithm that realizes sparse DTL, transferring and retaining only the most necessary structure to reduce the parameters of the final model. A sparse transfer hypothesis is introduced, under which a compression strategy is designed to construct deep sparse networks that distill useful information from the auxiliary domain, improving transfer efficiency. The proposed method is evaluated on representative datasets and applied to smart agriculture to train deep identification models that can effectively detect new pests using few data samples.
Although they have many performance advantages, deep neural network- (DNN-) based methods often require expert knowledge to label data samples for generating training datasets. This heavy requirement for labeled data results in significant training costs, which makes such methods expensive to extend. Deep transfer learning (DTL) can reuse well-trained models for the identification task and transfer knowledge learned from laboratory data to help identify in-field data, which alleviates the dependency on labeled datasets and reduces the cost. However, DTL still does not change the fact that a considerable number of parameters must be computed, because it transfers all parameters, and many trained DNNs are overparameterized. For example, ResNet-18 is commonly used as the backbone of DNN-based image recognition, in which up to 11.2 M (million) parameters need to be trained during each epoch [
Agriculture is one of the most important basic industries, practiced across a wide range of the world. Agricultural production faces many risks, of which pest and disease outbreaks are among the greatest economic threats [
For further agricultural extension, generally applicable approaches should be usable on mobile terminals, smartphones, and other small devices in the edge computing area. To this end, models must balance performance with applicability and efficiency, adapting to limited processing power while ensuring detection accuracy. If only the most necessary parts were transferred in DTL, it would be possible to obtain simplified models with lower device requirements for image recognition tasks, which would genuinely reduce the computational cost while retaining the advantage of inheriting knowledge through transfer learning. The newly proposed
As indicated above, in this paper, a sparse deep transfer learning method is proposed and applied to the problem of plant pest and disease identification based on image recognition. Firstly, a hypothesis is proposed that a transferable sparse subnetwork structure can be found, and its portability is verified. Then, the steps of the method are designed and used in DNN-based plant pest and disease identification, seeking and transferring an optimal sparse subnetwork to the target task to explore the application in practical problems. Finally, simulation experiments show that the method can achieve equivalent (or even higher) recognition accuracy with a more simplified network architecture and fewer parameters, while retaining the advantage of utilizing existing knowledge through transfer learning.
Thus, the main contributions can be summarized in two aspects:
(1) To relieve the lack of in-field labeled data and reduce the cost of having professionals collect and label data samples for model training, a DTL-based method is designed, which moderates the dependence on data in a deep learning model for plant pest and disease identification
(2) To cope with the defect that the DTL-based method cannot reduce high computational complexity and high hardware requirements, a sparse transfer strategy is designed, which transfers the pruned network structure to reduce the parameters that need to be trained in the model, simplify the network architecture, and reduce the volume and computing cost of the model, thereby making it possible to run the model on ordinary office computers, smartphones, and edge computing devices for better agricultural extension
The rest of this paper is organized as follows. Section
With the development and popularization of image sensors, the use of convolutional neural networks (CNNs) in DNN-based plant pest and disease identification models is becoming an important trend in agriculture. Pests and diseases can be detected and classified by insect individuals, lesions, and representative characteristic changes, which are usually manifested on the leaves of affected plants [
In the above deep CNN-based models, there exist two main problems:
For the contradiction between labeled data requirements and the lack of in-field data in (1), DTL is introduced. Mohanty et al. [
For (2), the network structure is optimized [
To take advantage of DTL in building resource-efficient CNNs, researchers have made a series of efforts [
To sum up, in the face of these challenges, in this paper, the lottery ticket hypothesis (LTH) is modified for use in DTL to generate transferable sparse structures. It thus transfers only the most necessary knowledge while reducing the volume of the network, realizing sparse deep transfer learning.
The LTH states that for a feedforward DNN, there is an implicit optimal sparse subnetwork structure which is retrainable to achieve the same accuracy as the dense network within the original network's number of training iterations. It can succeed in finding the subnetwork to retain knowledge and ability from large-scale datasets such as ImageNet in visual recognition tasks [
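As a concrete illustration, the train-prune-rewind loop of the original LTH can be sketched in a few lines of Python. This is a minimal sketch with toy weight lists standing in for a real trained network; `magnitude_prune` and `rewind` are hypothetical helper names, not the authors' implementation.

```python
# Minimal sketch of the original LTH loop: train, prune the smallest-magnitude
# weights, rewind the survivors to their initial values, then retrain.

def magnitude_prune(weights, mask, ratio):
    """Zero out the lowest-magnitude fraction `ratio` of still-active weights."""
    active = [(abs(w), i) for i, (w, m) in enumerate(zip(weights, mask)) if m]
    active.sort()                                # smallest magnitudes first
    new_mask = list(mask)
    for _, i in active[:int(len(active) * ratio)]:
        new_mask[i] = 0
    return new_mask

def rewind(init_weights, mask):
    """Late reset: surviving weights return to their initial values."""
    return [w if m else 0.0 for w, m in zip(init_weights, mask)]

# One pruning round on 8 toy weights: 'trained' values decide what is pruned,
# 'init' values are what the surviving ticket is rewound to.
init    = [0.5, -0.2, 0.9, 0.1, -0.7, 0.05, 0.3, -0.4]
trained = [0.6, -0.1, 1.1, 0.02, -0.8, 0.01, 0.4, -0.3]
mask = magnitude_prune(trained, [1] * 8, 0.25)   # prune 25% in this round
ticket = rewind(init, mask)                      # candidate subnetwork to retrain
print(sum(mask), ticket)
```

Iterating this prune-and-rewind round several times is what drives the dense network toward the highly sparse subnetworks discussed below.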
In this section, on the basis of the studies above, a sparse transfer method named WLTs-SDTL is proposed. It transfers only the most important part of the original network, and the LTH is modified to generate sparse subnetworks in DTL. The method is then applied to plant pest and disease identification based on image recognition.
The particular subnetwork is determined when the network is randomly initialized, and seeking it is like finding a “winning lottery ticket” in the original network; we name it a WLT-net. The retrainable WLT-net can be found by unstructured pruning according to the following conditions:
Consider a feedforward network whose loss function is \(\mathcal{L}\).
Then, give the definition of the domain and task in DTL: the source domain is denoted as \(\mathcal{D}_s\) with its learning task \(\mathcal{T}_s\), and the target domain as \(\mathcal{D}_t\) with its task \(\mathcal{T}_t\).
Thus, combined with DTL, the sparse transfer hypothesis for WLT-nets is proposed in Assumption
Sparse transfer hypothesis for WLT-nets.
For the task in the target domain, there exists a sparse subnetwork (WLT-net) of the source network that can be transferred and retrained to achieve comparable accuracy with far fewer parameters.
According to the reverse reasoning of the LT hypothesis, the notation of the original LTH is followed:
(1) The loss function is \(\mathcal{L}\), trained to its minimum validation loss
(2) The number of iterations is \(j\)
(3) The percentage of accuracy rate is \(a\)
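For reference, the original LTH that these definitions follow can be stated compactly (standard LTH notation: \(f(x;\theta_0)\) is the dense network at initialization, \(m\) a binary pruning mask):

```latex
% Original LTH: a dense network f(x;\theta_0) trained for j iterations to
% accuracy a admits a mask m such that the subnetwork f(x; m \odot \theta_0)
% retrains to accuracy a' in j' iterations with far fewer active weights.
\exists\, m \in \{0,1\}^{|\theta_0|} :\quad
a' \ge a, \qquad j' \le j, \qquad \lVert m \rVert_{0} \ll |\theta_0|
```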
Now that
When these conditions are met, a small efficient network can be regarded as the sparse part of a larger dense DNN. In this way, transferable knowledge can be sought from an isomorphic structure in the source network.
The WLT-net in LTH has been proven to retain the original network’s performance with only 5%-10% of parameters left. In particular, when
In this section, the specific methods and steps of WLTs-SDTL are proposed. The process is illustrated in Figure
Process of the WLTs-SDTL method.
Concretely, the steps of implementation are as follows:
It is reasonable to choose a DNN that has been trained and used to solve certain problems in practical applications as the source network; it is also possible to train a new DNN from scratch as the source network.
Based on the proposed
Rank the remaining parameters by their rating score, and set a pruning ratio \(p\): the lowest-scoring \(p\%\) of the parameters are pruned in each round, producing a binary mask.
(1) For parameters whose mask value is 1, the weights are retained and retrained
(2) For parameters whose mask value is 0, the weights are pruned
Different from the original LTH, the variation trend of the weights is considered. The weights of the pruned parameters are frozen at 0 only when they tend toward 0; when the variation trend of a weight during training moves away from 0, it is frozen at its initial value
This way, important skills and information can be inherited from
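The trend-aware freezing rule above can be sketched as follows. This is a minimal illustration, assuming (as one plausible reading of the rule) that weights drifting away from 0 are frozen at their initial values; `freeze_value` and `apply_masks` are hypothetical helper names, not the authors' code.

```python
# Trend-aware freezing: a pruned weight is frozen at 0 only if its magnitude
# shrank during training (it was trending toward 0); otherwise it is frozen
# at its initial value. Surviving weights get the usual late reset.

def freeze_value(w_init, w_final):
    """Choose the frozen value for a pruned weight based on its trend."""
    return 0.0 if abs(w_final) < abs(w_init) else w_init

def apply_masks(init_ws, final_ws, mask):
    out = []
    for w0, w1, m in zip(init_ws, final_ws, mask):
        if m:                       # surviving weight: late reset for retraining
            out.append(w0)
        else:                       # pruned weight: trend-aware freeze
            out.append(freeze_value(w0, w1))
    return out

init_ws  = [0.30, -0.10, 0.05]
final_ws = [0.10, -0.40, 0.05]      # first shrank, second grew, third survives
mask     = [0, 0, 1]
frozen = apply_masks(init_ws, final_ws, mask)
print(frozen)
```

In this toy case, the first pruned weight (shrinking toward 0) is frozen at 0, while the second (growing away from 0) keeps its initial value.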
Compared with the original LTH, in the proposed WLTs-SDTL method, we have made optimizations and improvements in the following three aspects:
(1) The LTH is extended and modified for generating sparse networks in DTL, by regarding the small efficient target network as a sparse subnetwork of the large dense source network
(2) There is a more reasonable standard for evaluating parameters in pruning; in the pruning process of the original LTH, the rating score considers only the final magnitude of the weights
(3) The influence of the trend of weight change during training on the freezing and resetting of parameters is further considered; in the original LTH, after pruning, the parameters are simply reset to their initial values
Furthermore, since the
A reasonable explanation is that these weights contribute less to the network and are not important. However, if it really does not matter, these weights could be set to any value, instead of a particular 0, without affecting the network’s performance. In experiments that freeze these parameters to
In this section, experiments are designed to verify the hypothesis and evaluate the performance of the proposed method in actual solutions. Firstly, the proposed
The sparse transfer hypothesis for WLT-nets is verified on the benchmark datasets, i.e., CIFAR-10 and SmallNORB. Specifically, define a DNN-based task
Identification of plant pests and diseases can be modeled as a multiclassification task based on image recognition. Therefore, two of the classical datasets, CIFAR-10 [
Properties of experimental parameters.
Dataset | CIFAR-10 | SmallNORB |
---|---|---|
Class | 10 | 5 |
Train | 50,000 | 40,000 |
Test | 10,000 | 10,000 |
Image size | \(32\times32\) | \(96\times96\) |
Domain | ||
Parameter settings | SGD [ |
Regarding the network structure, which covers both the original dense DNN and the sparse WLT-net generated from it, ResNet-18 is chosen as the backbone. As the 18-layer version of the classical deep residual network (17 convolutional layers and 1 fully connected layer, 11.2 M parameters to train), it is also used in the original LTH, so the same configuration is set for the experiments. Settings of experimental parameters are shown in Table
Firstly, experiments are designed to validate the performance of WLT-net compared with the original dense network on
The experimental results are shown in Figure
Experimental results: verify the sparse transfer hypothesis for WLT-nets.
Then, on the basis of the above analysis, to validate whether the sparse WLT-net can inherit knowledge from
Experimental results: using WLTs-SDTL on benchmark datasets.
As the experimental data show, when proper pruning is carried out, better performance can be obtained than by training the dense DNN directly. This proves that WLTs-SDTL can transfer the necessary ability from
In summary, experiments on benchmark datasets have verified the feasibility of the proposed
In this section, the proposed WLTs-SDTL is used to train a sparse network from a dense detection model based on image recognition and applied in actual solutions of plant pest and disease identification. Specifically, the common diseases of tomato leaves are identified, inheriting the ability from ImageNet and using open-source lab datasets for
The ResNet-18 network pretrained on ImageNet is used as
Since the samples of different categories in the original dataset are uneven, the crop tomato, which has sufficient samples, is selected; categories with fewer samples and images with poor quality are eliminated. Then, data augmentation methods such as horizontal flipping are used to adjust the sample size of each category to the same number. Finally, a total of 8 pest and disease categories plus 1 healthy-leaf category are defined; meanwhile, the image size is adjusted to
Properties of the PlantVillage dataset.
Category | Sample set | Training set |
---|---|---|
Tomato healthy | 1,592 | 1,500 |
Tomato bacterial spot | 2,127 | 1,500 |
Tomato early blight | 1,000 | 1,500 |
Tomato late blight | 1,910 | 1,500 |
Tomato Septoria leaf spot | 1,771 | 1,500 |
Tomato spider mites | 1,653 | 1,500 |
Tomato mosaic virus | 373 | Unused |
Tomato leaf mold | 952 | 1,500 |
Tomato target spot | 1,404 | 1,500 |
Tomato TYLCV | 5,357 | 1,500 |
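The class-balancing step described above can be illustrated with a minimal sketch. Nested lists stand in for images, and horizontal flipping stands in for the fuller augmentation pipeline; `balance` and `hflip` are hypothetical helpers, not the authors' code.

```python
# Equalize per-class sample counts by padding small classes with flipped
# copies and trimming any excess, as in the table above (each class -> 1,500).
import random

def hflip(img):
    """Horizontally flip an image given as a list of pixel rows."""
    return [row[::-1] for row in img]

def balance(classes, target):
    """Pad each class up to `target` samples with flipped copies; trim excess."""
    balanced = {}
    for name, samples in classes.items():
        out = list(samples)
        while len(out) < target:
            out.append(hflip(random.choice(samples)))
        balanced[name] = out[:target]
    return balanced

toy = {
    "early_blight": [[[1, 2], [3, 4]]] * 3,   # under-represented class
    "healthy":      [[[5, 6], [7, 8]]] * 5,
}
result = balance(toy, 5)
print({name: len(imgs) for name, imgs in result.items()})
```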
Deeper-level pruning is carried out: 15 rounds of iterative pruning are performed to find the optimal WLT-net, with as few as 3.6% of the parameters retained (406,495 compared with 11,173,962 in the original dense net). Other experimental settings are the same as in the previous section.
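As a back-of-the-envelope check (the fixed per-round pruning ratio is an assumption here, since the text does not state it), 15 rounds that each prune about 20% of the surviving weights compound to roughly the 3.6% retention reported:

```python
# Assumed schedule: each round keeps 80% of the surviving weights.
total = 11_173_962              # parameters in the dense ResNet-18
remaining = total
for _ in range(15):
    remaining = int(remaining * 0.80)
print(remaining, remaining / total)
```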
The experimental results are shown in Figure
Experimental results of WLTs-SDTL on the PlantVillage dataset.
The highest accuracy is up to 97.69% in pruning level 5, when 67% of the parameters are removed. By and large, WLTs-SDTL can guarantee the accuracy in plant identification while reducing parameter computation. When pruning properly, the accuracy can be higher than that of the original dense net. If the optimal performance is required, fine-grained pruning can be gradually carried out between levels near the best accuracy, which is between 30% and 50% in this set of experiments. For example, we can set the pruning rate to 10% or less in each round, to find a balance between performance and volume of the model.
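The suggested fine-grained search can be sketched numerically (a hypothetical schedule, assuming 10% of the survivors are pruned per round, starting from 50% of the weights remaining):

```python
# Starting near 50% of weights remaining and pruning only 10% of the
# survivors per round produces several candidate levels in the 30%-50% band.
levels = []
remaining = 0.50                # fraction of parameters still active
while remaining > 0.30:
    levels.append(remaining)
    remaining *= 0.90           # prune 10% of survivors each round
print([f"{x:.1%}" for x in levels])
```

Each of these levels is a candidate model whose accuracy can be checked before committing to a final size.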
When only 3.6% of the parameters are retained, the sparse network is still able to achieve an accuracy of 93.16%, with only 406,495 parameters to be trained. Considering that this accuracy is acceptable in daily identification tasks, it verifies that the proposed WLTs-SDTL can generate a small efficient network suitable for mobile terminals or edge computing devices with low computational power in the practical application of pest and disease identification.
In this section, a small-scale dataset collected in Chongqing, China, is used together with lab data to train an identification model for citrus greening disease (Huanglongbing). Citrus greening (Huanglongbing) is a devastating disease of world citrus production, which seriously restricts the development of the citrus industry [
We have collected 1,266 images from Chongqing, China, as well as from the Internet and monographs, of which 238 samples are Huanglongbing. When photographing samples in the field, there may be more than one leaf in a photo, and more training samples can be obtained through clipping, as illustrated in Figure
The composition of the dataset: (a) samples in PlantVillage; (b) collected samples; (c) process of clipping.
15 rounds of deep-level iterative pruning are performed, while the other experimental settings are the same as in the previous section. Regarding the initial weights used for the late reset, two options are chosen: (1) the weights of ResNet-18 pretrained on ImageNet, which has been used as
After training the original dense network, when performing traditional dense DTL (no parameters pruned), the initial weights (1) achieve 97.14% accuracy while the final weights (2) achieve 98.06%. Then, the proposed WLTs-SDTL is applied with each option; the relationship between accuracy and remaining parameters is shown in Figure
Experimental results of using WLTs-SDTL in identifying Haunglongbing.
The experimental results show that, compared with training directly, DTL can achieve better initial performance in identifying citrus Huanglongbing disease with the help of the collected data. Regarding the proposed sparse DTL, the following observations are obtained:
(1) As the parameters decrease, the overall accuracy declines; however, it remains within an acceptable range and higher than that of the no-transfer dense network, especially when the weights from a similar identification task are used, as in option (2) shown in Section
(2) Although fewer parameters are used, the model can achieve higher accuracy at some pruning levels, as in option (1) shown in Section
(3) Fine-grained pruning in the optimal range can be performed for the best performance; in the experimental results of this paper, this range always lies between 50% and 30% of the parameters remaining. Therefore, we speculate that in the proposed WLTs-SDTL, priority could be given to these pruning levels when model performance is preferred. When the volume of the model needs to be compressed as much as possible for wide deployment, the sparse model can use about 10% of the original parameters (accuracy 96.67% when using 8.59%) while maintaining an acceptable performance close to the original. The limiting small model with only 3.6% of the original parameters is also taken into account; its accuracy reaches 94.01%, still higher than that of the dense net without transfer, making it more suitable for low-computing-power or edge computing devices
To sum up, when the proposed WLTs-SDTL is used in an actual solution for identifying plant diseases, sparsification of the network can be realized through pruning, saving the computational overhead of parameters while maintaining or even improving performance. Thus, the balance between performance and model size can be dynamically adjusted, and the possibility of deployment on low-computational-power equipment is provided.
In this paper, a sparse deep transfer learning method is proposed. The method is aimed at modeling the identification of plant pests and diseases with limited in-field data, and a sparse DTL strategy is designed to transfer only the most important architecture and optimize model size.
Specifically, (1) the sparse transfer hypothesis is verified, which succeeds in modifying the LTH to reduce parameter computation in DTL by generating sparse transferable WLT-nets. (2) The sparse transfer method named WLTs-SDTL is formally proposed, in which a compression strategy is designed to construct a deep sparse network, distill useful information from the auxiliary domain, and improve transfer efficiency. (3) The proposed method is applied to train deep identification models that detect pests and diseases with few data samples. The hypothesis is verified on benchmark datasets; meanwhile, the proposed method is evaluated on representative datasets.
Experimental results show that when the proposed method is used in actual solutions, the sparsification of the network can save the cost of computing parameters while maintaining or sometimes improving performance, thereby dynamically adjusting the balance between the model's accuracy and size and providing the possibility of deployment on low-computational-power devices.
Moreover, the sparse strategy can be extended to identifying new plant pests and diseases with few data, and even widely applied to other image recognition tasks that lack data. In such cases, depending on the specific task, a suitable network architecture should be chosen wisely to balance the accuracy and volume of the model.
In the future, the proposed method will be studied in more domains to overcome the scarcity of data and the redundancy of model parameters, improving the effectiveness of sparse deep transfer learning.
The open-source datasets used in this paper, such as ImageNet, are freely available in various deep learning frameworks such as PyTorch and TensorFlow.
The authors declare that they have no conflict of interest.
This work is supported by the National Natural Science Foundation of China (Nos. 61672123 and 62076047) and the Fundamental Research Funds for the Central Universities (Nos. DUT20LAB136 and DUT20TD107).