Object Detection with the Addition of New Classes Based on the Method of RNOL

Object detection plays an important role in many computer vision applications. Innovative object detection methods based on deep learning, such as Faster R-CNN, YOLO, and SSD, have achieved state-of-the-art results in terms of detection accuracy. There have been few studies to date on object detection with the addition of new classes, however, even though this problem is often encountered in industry. Therefore, this issue has important research significance and practical value. On the premise that the old-class samples are available, a method of reserving nodes in advance in the output layer (RNOL) was established in this study. Experiments show that RNOL can achieve high detection accuracy in both new and old classes over a short training time while outperforming the traditional fine-tuning method.


Introduction
Object detection involves the two distinct tasks of object recognition and localization: it is necessary not only to identify the class of each object in an image but also to locate the object within a rectangular region. In [1], for example, the object is only recognized, not located within a rectangular region. Object detection is a common component of artificial intelligence and information technology systems including robot vision, unmanned aerial vehicle surveillance, autonomous driving, intelligent video surveillance, and medical image analysis.
Many scholars have studied object detection. Most of the traditional methods are based on background subtraction [2][3][4]. Recently, many scholars have developed object detection methods based on deep learning, such as Faster R-CNN [5], YOLO v3 [6], and SSD [7], and achieved state-of-the-art results in regard to detection accuracy. When adding new classes, however, it is very time-consuming to train an object detection model from scratch, even on the premise that the old classes are available. How can the training time be reduced without sacrificing high detection accuracy in both new and old classes?
This problem is often encountered in industry, so it has important research significance and practical value. Fine-tuning [8] is currently the method most commonly used to solve the new-class addition problem. The fine-tuning method reuses the weights of the old model except for the last output layer. Although this method can train the model in a short time, its detection accuracy is relatively low.
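The fine-tuning idea described above can be sketched as follows. This is a minimal numpy illustration, not the paper's actual TensorFlow implementation: the layer names and shapes are made up for the example, and the dictionaries stand in for real network layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, std=0.01):
    # Zero-mean Gaussian initialization, as used for new layers in the paper
    return rng.normal(0.0, std, size=(n_in, n_out))

# Old model: a feature layer plus an output layer for 10 old classes
# (shapes are illustrative only)
old_model = {
    "fc": init_layer(256, 128),
    "out": init_layer(128, 10),
}

def fine_tune_init(old_model, n_total_classes):
    """Reuse all old weights except the output layer, which is
    re-initialized to cover both old and new classes."""
    new_model = {k: v.copy() for k, v in old_model.items() if k != "out"}
    new_model["out"] = init_layer(old_model["out"].shape[0], n_total_classes)
    return new_model

# Adding one new class: the output layer grows from 10 to 11 nodes
# and loses its trained weights, which is why accuracy suffers
new_model = fine_tune_init(old_model, n_total_classes=11)
```

Note that the trained output-layer weights are discarded entirely; this is the weight information that RNOL, described below, is designed to preserve.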
In this study, we developed the reserving nodes in advance in the output layer (RNOL) method to solve the object detection problem when new classes are added, building on Faster R-CNN and the fine-tuning method. We conducted a series of experiments on the PASCAL VOC 2007 dataset to validate the proposed method. The results show that, on the premise that the old classes are available, RNOL can train the model well and quickly when new classes are added. RNOL also demonstrated higher detection accuracy on both new and old classes than fine-tuning, as discussed in detail below.
Related Work
With the development of neural network and support vector machine techniques, object detection methodology has transformed from geometric to statistical. In recent years, advancements in computing and deep learning technology have brought about object detection frameworks based on deep learning such as R-CNN [8], Fast R-CNN [9], Faster R-CNN [5], YOLO [10], YOLO v2 [11], YOLO v3 [6], and SSD [7]. The new-class addition problem has a long history in the machine learning and artificial intelligence field [12][13][14][15]. The problem may be approached when old classes are not available [16,17] or when old classes are available; there has been considerably less research centered on the latter scenario. Rebuffi et al. [18] researched the problem using a small number of old classes. Extant methods are not ideal as far as the detection accuracy of new or old classes, so it is difficult to meet industrial needs at present. In this paper, we discuss only scenarios wherein old classes are available.

Reserving Nodes in Advance in the Output Layer
For object detection problems considering the addition of new classes, the RNOL method works primarily by reserving an appropriate number of extra nodes in the output layer, so that the number of output nodes exceeds the number of old classes. To operate RNOL, we first use Faster R-CNN to build a model, reserve the appropriate number of nodes in the output layer, and train the model on the old classes before saving the model and its weights. Next, when new classes are added, the saved model and weights are loaded and the model is trained on both the new and old classes. Finally, we use the fully trained model to detect the test samples.
A diagram of the RNOL method is shown in Figure 1, where hollow dots in the output layer represent reserved nodes. The number of nodes in the output layer is larger than the number of old classes, as mentioned above, and the number of reserved nodes can be set manually.
This method is effective as long as the number of new classes does not exceed the number of reserved nodes. The person in Figure 1 belongs to an old class and the horse to a new class. The proposed method resolves the problem of the coexistence of new and old classes. Compared to fine-tuning, the advantage of RNOL is that it can exploit more of the old model's weight information, including the weights of the output layer.
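The RNOL procedure above can be sketched in a few lines of numpy. As with the earlier sketch, the layer names and shapes are illustrative assumptions, not the paper's TensorFlow code; the point is that the output layer is sized for old plus reserved classes from the start, so nothing is re-initialized when classes are added.

```python
import numpy as np

rng = np.random.default_rng(0)

N_OLD, N_RESERVED = 10, 10  # 20 output nodes from the start

# RNOL: build the output layer with reserved nodes before training
# on the old classes (shapes are illustrative only)
model = {
    "fc": rng.normal(0.0, 0.01, size=(256, 128)),
    "out": rng.normal(0.0, 0.01, size=(128, N_OLD + N_RESERVED)),
}
class_slots = list(range(N_OLD))  # output nodes currently assigned to classes

def add_classes(class_slots, n_new, n_total_nodes):
    """Assign new classes to reserved output nodes; all existing
    weights, including the output layer, are kept unchanged."""
    if len(class_slots) + n_new > n_total_nodes:
        raise ValueError("not enough reserved nodes for the new classes")
    first_free = len(class_slots)
    return class_slots + list(range(first_free, first_free + n_new))

# Adding one new class: it simply occupies the next reserved node
class_slots = add_classes(class_slots, n_new=1,
                          n_total_nodes=N_OLD + N_RESERVED)
```

Because `model["out"]` already has a column for every reserved node, adding a class changes only the label mapping, not the weights; the second training stage then fits the reserved column while refining the old ones.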

Datasets and Evaluation.
We evaluated our method on the PASCAL VOC 2007 dataset, as mentioned above. VOC 2007 consists of 5K images in the trainval split and 5K images in the test split across 20 object classes. We used the standard mean average precision (mAP) at a 0.5 IoU threshold as the evaluation metric; evaluation for the VOC 2007 experiments was conducted on the test split.
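Under this metric, a detection counts as correct only when its intersection-over-union with a ground-truth box of the same class is at least 0.5. A minimal sketch of the IoU computation, assuming boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Half-overlapping 10x10 boxes: intersection 50, union 150, IoU = 1/3,
# which falls below the 0.5 threshold used for VOC evaluation
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```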

Implementation Details.
We randomly initialized all new layers by drawing weights from a zero-mean Gaussian distribution with a standard deviation of 0.01. We used stochastic gradient descent (SGD) with Nesterov momentum [19] to train the network in all experiments. We set the learning rate to 0.001, decayed it to 0.0001 after 50K iterations, and set the momentum to 0.9. In the second stage of training, i.e., learning the extended network with new classes, we used a learning rate of 0.001 decayed to 0.0001 after 10K iterations. The A(C_A) network was trained for 70K iterations on PASCAL VOC 2007. The B(C_B) network was trained for 20K iterations when only one class was added and 30K iterations when 10 classes were added simultaneously. For Faster R-CNN, we took batches of two images each. All other layers (i.e., the shared convolutional layers) of the A(C_A) network were initialized from a model pretrained for ImageNet classification [20]. We implemented this in TensorFlow [21].
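The optimizer setup above can be sketched in plain numpy. This is not the TensorFlow training code; the toy quadratic objective is invented for illustration, and only the hyperparameters (lr 0.001, decay to 0.0001 after 50K iterations, momentum 0.9) come from the paper.

```python
import numpy as np

def make_lr_schedule(base_lr=0.001, decayed_lr=0.0001, decay_step=50_000):
    # Step schedule matching the first training stage:
    # 0.001, then 0.0001 after 50K iterations
    return lambda it: base_lr if it < decay_step else decayed_lr

def nesterov_sgd_step(w, v, grad_fn, lr, momentum=0.9):
    """One Nesterov-momentum SGD update: evaluate the gradient at the
    look-ahead point w + momentum * v, then update velocity and weights."""
    g = grad_fn(w + momentum * v)
    v = momentum * v - lr * g
    return w + v, v

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is simply w
lr_at = make_lr_schedule()
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for it in range(200):
    w, v = nesterov_sgd_step(w, v, lambda x: x, lr_at(it))
```

In practice one would use the framework's built-in optimizer (e.g., a TensorFlow SGD optimizer with Nesterov momentum enabled) rather than hand-rolling the update.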

Effects of RNOL.
We sought to determine whether reserving nodes in advance in the output layer increases the computing time or affects the object detection accuracy compared to the traditional method.
We took 10 classes in alphabetical order from the VOC 2007 dataset and ran two experiments. In Experiment 1, we reserved 10 nodes in the output layer (i.e., the output layer had 20 neurons). In Experiment 2, no nodes were reserved in the output layer (i.e., the output layer had 10 neurons). The detection accuracy of the two experiments is shown in Table 1, and the training times are shown in Table 2. We observed no significant difference in test results or training time between the two experiments, which suggests that RNOL neither increases the training time nor affects the detection accuracy.

Addition of One Class.

A summary of the results when one class is added is shown in Table 3, and the full results are listed in Table 4.
As shown in Table 3, these results suggest that the RNOL method outperforms both the fine-tuning and TFS methods. When adding one class, the RNOL method yields higher accuracy in a shorter training time than fine-tuning or TFS.
We next compared the RNOL and fine-tuning methods in the new network B(1 − 20) when adding one class. Each was trained for 30K iterations, the weights were saved every 5K iterations, and each saved set of weights was loaded and evaluated on the test set.
The test results are shown in Figure 2: the RNOL method reaches its highest detection accuracy at 20K training iterations and then begins to decline. The fine-tuning method's accuracy increases slowly over the experiment but does not exceed that of the RNOL method.

Addition of Multiple Classes.
In this experiment, we took 10 classes in alphabetical order from the VOC 2007 dataset as C_A and the remaining 10 classes as C_B. We then trained the A(1 − 10) network on C_A and the B(1 − 20) network on the VOC trainval set containing all 20 classes. A summary of the evaluation of these networks on the VOC test set is shown in Table 5, and the full results are listed in Table 6.
As shown in Table 5, the RNOL method achieves higher accuracy in a shorter training time than fine-tuning or TFS. For adding 10 classes, we compared the RNOL and fine-tuning methods in the new network B(1 − 20). Each was trained for 30K iterations, the weights were saved every 5K iterations, and each saved set of weights was loaded and evaluated on the test set. The results are shown in Figure 3. The detection accuracy of RNOL is higher than that of the fine-tuning method when training for 30K iterations.

Conclusion
For object detection considering the addition of new classes when the old classes are available, we improved the Faster R-CNN model in this study by reserving nodes in advance in the output layer. Our experimental results show that RNOL can achieve high detection accuracy in both new and old classes in a short training time. Although the proposed method outperforms the fine-tuning method, its detection accuracy still has room for further improvement. One possible way to achieve this is to increase the number of training iterations, though doing so would increase the training time.

Data Availability
We evaluated our method on the PASCAL VOC 2007 dataset.

Conflicts of Interest
The authors declare that they have no conflicts of interest.