A Vehicle Reidentification Algorithm Based on Double-Channel Symmetrical CNN

Accurately identifying previously seen vehicles in massive monitoring data has become a challenging research topic. The challenge is that the vehicle in the image undergoes large changes in attitude, angle of view, lighting, and other factors, and these complex changes seriously affect vehicle recognition performance. In recent years, the convolutional neural network (CNN) has achieved great success in the field of vehicle reidentification. However, because vehicle reidentification datasets contain few annotated vehicles, existing CNN models are not fully trained, which limits the discriminative ability of the deep learning model. To solve these problems, a double-channel symmetric CNN vehicle recognition algorithm is proposed by improving the network structure. In this method, two samples with complementary characteristics are taken as input at the same time. With limited training samples, the combinations of inputs therefore become more diversified, and the training process of the CNN model becomes richer. Experiments show that the recognition accuracy of the proposed algorithm is better than that of other existing methods, which verifies the effectiveness of the proposed algorithm.


Introduction
In recent years, society has paid more and more attention to public security, and monitoring equipment has become increasingly popular. A large number of surveillance cameras are used in crowded places prone to public security incidents, such as traffic intersections, parks, large shopping malls, stations, and airports. The emergence of surveillance cameras has brought great convenience to case detection by public security organs, such as suspected vehicle chase, cross-scene vehicle search, and abnormal event detection [1,2]. A large number of surveillance cameras form a huge surveillance network. Although monitoring systems have developed rapidly, they have brought great challenges to the management and analysis of monitoring data [3,4]. At present, monitoring systems mostly rely on real-time cameras watched by human operators. The massive monitoring data is a big problem for the personnel in charge of watching the video, for two reasons: (1) the monitoring system generates data in real time, resulting in a huge amount of data; (2) the real-time monitoring data records scenes with random changes, and it is difficult for monitoring staff to stay attentive during long periods of observation. This kind of monitoring mechanism with human participation is therefore no longer suited to the management and analysis of monitoring data. Vehicle reidentification technology overcomes this deficiency.
In recent years, deep learning models represented by the convolutional neural network (CNN) have achieved great success in the field of computer vision. CNNs have also led the research in the field of vehicle reidentification. Compared with traditional hand-designed vehicle reidentification methods, CNN-based methods cope more effectively with the complex changes of vehicles and achieve higher performance. However, vehicle reidentification differs from other computer vision tasks because vehicles are very difficult to annotate, so existing datasets contain few annotated vehicles. With the limited image training sets currently available, the existing single-channel CNN model cannot be trained sufficiently. To make the input more diversified, multiple combinations of images can be used as input to train the CNN more fully. At the same time, the recognition rate improves because the double-channel CNN takes in more features.
This study attempts to design a double-channel symmetrical CNN structure for vehicle reidentification by improving the network structure. In this double-channel structure, two samples are input at a time. Compared with the previous single-channel CNN model, the input combinations of this double-channel CNN model are more diversified, which makes the structure suitable for learning a deep model with stronger discrimination ability.

Related Works
The task of vehicle reidentification [5,6] is to study how to accurately identify a vehicle that has appeared on a particular occasion in massive monitoring data, which consists mainly of image data. The challenge of the task is that the vehicle in the image undergoes large changes in attitude, angle of view, and other complex factors. In addition, different lighting during shooting also changes the appearance of the vehicle greatly. These changes seriously affect the performance of vehicle recognition. At present, research on target reidentification mainly focuses on pedestrian reidentification [7][8][9][10] and is rarely applied to other targets. Since 2015, a small number of scholars have entered the field of vehicle reidentification, but their methods can only be applied to images of the same scale and angle, are weakly robust to environmental changes, or are based on small datasets.
In order to improve re-ID capability, some methods utilize additional attribute information such as the model/type and color to guide vision-based representation learning [11]. For example, [12] introduced a two-branch retrieval pipeline to extract differences between models and instances. Yan et al. [13] studied the multiparticle relationship of vehicles with multilevel attributes. Other works study temporal and spatial associations, which derive additional benefits from the topological information of the cameras [14]. In addition, some methods use a GAN [15] to generate images from the required viewpoint, so as to achieve viewpoint alignment. These works thus address the problem of viewpoint change through viewpoint alignment.
In addition, [16] argued that traditional hand-crafted features complement the deep features learned from the training dataset, so the two kinds of features were combined to achieve an improved representation. Liu et al. [17] used multimodal information, including visual features, license plate, camera position, and other contextual information, in a coarse-to-fine vehicle retrieval framework. In order to augment the training data and achieve robust training, [18] used a generative adversarial network to synthesize vehicle images with different directions and appearance changes. Zhou and Shao [15] learned a visual perception representation for vehicle re-ID through adversarial learning and a visual perception attention model. Zhang et al. [19] proposed an improved triplet loss jointly optimized with an auxiliary classification loss, which acts as a regularizer to represent the in-sample variance.

Single-Channel CNN Structure
The single-channel CNN structure is introduced in this section first; the double-channel CNN structure is then detailed in the next section.
In the training set of vehicle reidentification, the single-channel CNN model based on identification is used for learning, so that the deep learning model obtained after training can distinguish different vehicles. Based on the existing classic CNN models, all the convolutional layers and fully connected layers of the AlexNet [20] and ResNet-50 [21] models are used. The default parameters provided in [20,21] are adopted, and the output of the last fully connected layer is modified to be the total number of different vehicles in the vehicle reidentification training set. The CNN model of the single-channel method is fine-tuned from the model pretrained on the ImageNet dataset [22], which makes the CNN model converge faster. This training strategy is especially effective when the vehicle reidentification training set is not very large, and it achieves the purpose of distinguishing different vehicles. The network training process is described as follows. The vehicle reidentification training set is denoted as $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is a vehicle image and $y_i$ is its identity (ID). The vehicle image is first resized to 256 × 256 pixels and then randomly cropped to a fixed size (227 × 227 pixels for AlexNet and 224 × 224 pixels for ResNet-50). The processed vehicle image is sent to the data layer of the CNN model as the input to the network. The goal of network training is to obtain a deep learning model M, which is equivalent to a mapping $f(x, \theta) \in \mathbb{R}^{C}$, where $\theta$ represents the parameters of each layer in the CNN model. In each minibatch iteration, the parameter $\theta$ is updated using the stochastic gradient descent (SGD) algorithm. In iteration t, the current parameter $\theta_t$ is updated as follows:

$$\theta_{t+1} = \theta_t - \gamma \frac{1}{|D_t|} \sum_{(x_i, y_i) \in D_t} \nabla_{\theta}\, l\big(f(x_i, \theta_t), y_i\big),$$

where $\gamma$ is the learning rate, $D_t$ is a set of minibatch samples taken randomly from D, $\nabla$ is the gradient operation, and $l$ is the loss function, which is the softmax loss function.
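The minibatch SGD update with a softmax loss described above can be illustrated with a minimal sketch. It is not the authors' Caffe implementation: a single linear layer stands in for the CNN mapping f(x, θ), and the toy data, the learning rate, the batch size, and the helper names (`forward`, `sgd_step`, `mean_loss`) are all assumptions made for illustration.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(logits, label):
    """Cross-entropy of the softmax output against an integer identity label."""
    return -math.log(softmax(logits)[label])

def forward(theta, x):
    """Toy stand-in for f(x, theta): one linear layer, theta[c] is the weight row of class c."""
    return [sum(w * v for w, v in zip(row, x)) for row in theta]

def sgd_step(theta, batch, lr):
    """theta <- theta - lr * (mean gradient of the softmax loss over the minibatch)."""
    grad = [[0.0] * len(row) for row in theta]
    for x, y in batch:
        probs = softmax(forward(theta, x))
        for c, p in enumerate(probs):
            # d(loss)/d(logit_c) = p_c - 1[c == y]
            delta = p - (1.0 if c == y else 0.0)
            for j, v in enumerate(x):
                grad[c][j] += delta * v / len(batch)
    return [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(theta, grad)]

def mean_loss(theta, data):
    """Average softmax loss over a dataset of (x, y) pairs."""
    return sum(softmax_loss(forward(theta, x), y) for x, y in data) / len(data)

# Hypothetical toy set D: 3 identities, 4-dimensional inputs (last entry is a bias term).
rng = random.Random(0)
data = [([rng.gauss(float(y == k), 0.1) for k in range(3)] + [1.0], y)
        for y in range(3) for _ in range(20)]

theta = [[0.0] * 4 for _ in range(3)]
loss_before = mean_loss(theta, data)
for t in range(200):
    batch = rng.sample(data, 8)          # D_t: minibatch drawn at random from D
    theta = sgd_step(theta, batch, lr=0.5)
loss_after = mean_loss(theta, data)
```

With all-zero initial parameters the softmax output is uniform, so `loss_before` equals ln 3; the loss then decreases as the iterations proceed, which is the behaviour the training description relies on.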
The softmax loss function acts as a supervisory signal to guide the network training process. As training progresses, the value of the loss function gradually decreases until the network converges.
In the process of vehicle recognition, the deep learning model M obtained from network training is used as the feature extractor. The vehicle images in the probe set and the gallery set are passed through the network, and the response of a middle layer is extracted as the feature: the FC7 layer for AlexNet and the Pool5 layer for ResNet-50. On the basis of these image features, cross-camera retrieval is performed, that is, the distance between the image features of the samples in the probe set and those in the gallery set is calculated. The distances are sorted, and the final vehicle reidentification performance is evaluated against the sorted list.
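The retrieval step can be sketched as follows. The text does not state which distance is used, so the Euclidean distance here is an assumption, as are the function names and the three-dimensional toy features (the real FC7 and Pool5 responses have thousands of dimensions).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def rank_gallery(probe_feat, gallery_feats):
    """Return gallery indices sorted from nearest to farthest from the probe feature."""
    dists = [euclidean(probe_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: dists[i])

# Hypothetical features for one probe image and three gallery images.
gallery = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
probe = [1.0, 0.0, 0.1]
ranking = rank_gallery(probe, gallery)   # nearest gallery image comes first
```

The resulting ranked list for each probe image is what the evaluation metrics are later computed on.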

Proposed Double-Channel Symmetric CNN Structure
The vehicle recognition method based on the proposed double-channel symmetric CNN structure is described in this section.
The overall structure of the model is shown in Figure 1 (taking the AlexNet model as an example). Compared with the existing single-channel CNN model, the proposed double-channel symmetric CNN model inputs two samples at the same time, and the input combinations are more diversified. The corresponding middle layers of the two channels have the same structure and can be considered symmetrical, but they do not share parameters. By connecting the last fully connected layers of the two channels, the layers of the two channels interact with and reinforce each other, so the channels can be considered complementary.
The goal of training the identification model is to learn an optimal mapping for a given training set, so that the predicted identity of each vehicle is as close as possible to its real identity (ID). On the one hand, the richer the samples in the training set, the stronger the generalization ability of the resulting model. On the other hand, the images of a particular vehicle differ markedly in appearance because they are collected across cameras. By combining different images of the same vehicle, the samples can complement each other and narrow these differences in appearance. Therefore, the designed structure is better suited to learning a deep model with stronger discrimination ability, which improves the performance of vehicle reidentification.
In the proposed double-channel symmetric CNN structure, two vehicle images belonging to the same vehicle are input at the same time. These sample pairs are the pairwise combinations, in full permutation form, of all samples corresponding to the same vehicle. The preprocessing of the vehicle images before they are sent to the network data layer is the same as in the single-channel method. The convolutional layers and fully connected layers of the two channels have the same structure and settings, and each CNN model is fine-tuned from the model pretrained on the ImageNet dataset. An example based on the AlexNet model is shown in Figure 1. In each channel, the fully connected layers FC6 and FC7 follow the convolutional layers. The FC7 layers of the two channels are concatenated, and the result is denoted as FC7_concat. Each FC7 layer has $N_1 = 4096$ dimensions, and FC7_concat has $N_2 = 8192$ dimensions. Three fully connected layers (the two FC7 layers and the FC7_concat layer) are each connected to a fully connected layer FC8. The number of outputs $N_3$ of the FC8 layer is the same as the total number of vehicles in the training set. Three softmax loss functions are used as the supervisory signals to guide the network training process, and the sum of the three loss functions is used as the network loss. If the backbone of the proposed double-channel symmetric CNN structure is replaced by the ResNet-50 network, whose last layer is the pooling layer Pool5 rather than the fully connected layer FC7, then Pool5 is used instead of FC7, and the concatenated Pool5 layer is denoted as Pool5_concat, where $N_1 = 2048$ dimensions and $N_2 = 4096$ dimensions. The network training strategy and process of the proposed double-channel symmetric CNN structure are the same as in the single-channel method. In vehicle reidentification, the deep learning model obtained from network training is used as a feature extractor.
It extracts the response of a middle layer (the FC7_concat layer for AlexNet and the Pool5_concat layer for ResNet-50) as the feature representation of the vehicle images in the probe set and the gallery set. On the basis of these image features, cross-camera search is performed: the distances between the image features in the probe set and the gallery set are calculated and sorted. The final vehicle recognition performance is evaluated according to the ranking list.
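The pair construction and the summed three-branch loss can be sketched as below, under stated assumptions. "Full permutation" is interpreted here as all ordered pairs of distinct images of the same vehicle; the toy feature size (2 instead of 4096), the zero-initialized FC8 weights, and all function names are hypothetical, and the channel features are taken as given rather than computed by a CNN.

```python
import itertools
import math

def same_vehicle_pairs(samples):
    """samples: list of (image_name, vehicle_id).
    Returns every ordered pair of distinct images that share a vehicle ID."""
    by_id = {}
    for name, vid in samples:
        by_id.setdefault(vid, []).append(name)
    pairs = []
    for vid, names in by_id.items():
        for a, b in itertools.permutations(names, 2):
            pairs.append((a, b, vid))
    return pairs

def softmax_loss(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def linear(weights, feat):
    """Fully connected layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, feat)) for row in weights]

def double_channel_loss(feat_a, feat_b, w_a, w_b, w_cat, label):
    """Sum of three softmax losses: one per channel feature and one on their concatenation."""
    feat_cat = feat_a + feat_b                  # concatenated feature: N2 = 2 * N1
    return (softmax_loss(linear(w_a, feat_a), label)
            + softmax_loss(linear(w_b, feat_b), label)
            + softmax_loss(linear(w_cat, feat_cat), label))

# Hypothetical training samples: vehicle "A" seen three times, vehicle "B" twice.
samples = [("a1.jpg", "A"), ("a2.jpg", "A"), ("a3.jpg", "A"),
           ("b1.jpg", "B"), ("b2.jpg", "B")]
pairs = same_vehicle_pairs(samples)             # 3*2 + 2*1 ordered pairs

# Toy dimensions: N1 = 2 per channel, N2 = 4 after concatenation, N3 = 3 identities.
n1, n3 = 2, 3
w_a = [[0.0] * n1 for _ in range(n3)]
w_b = [[0.0] * n1 for _ in range(n3)]
w_cat = [[0.0] * (2 * n1) for _ in range(n3)]
total = double_channel_loss([0.5, -0.2], [0.1, 0.7], w_a, w_b, w_cat, label=1)
```

Under this interpretation a vehicle with n images contributes n(n − 1) input pairs rather than n single inputs, which is the source of the more diversified input combinations described above.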

Dataset Construction.
The test vehicle dataset is collected by 4 different intersection monitoring platforms, whose installation locations are shown in Figure 2. Video is taken from one angle for 2 hours at a time, with an angle interval of 30°, so that mp4 videos from a total of 7 angles, from the front to the back of the vehicle, are obtained. Finally, a set T of 20,160 complex-scene multivehicle images is extracted from the videos at intervals of 10 seconds. To avoid the problem, encountered in most datasets, of vehicles having zero positive samples, the monitoring platforms are installed on each exit section of the loop. As shown in Figure 2, each vehicle is captured twice regardless of which of the intersections a to d it passes through (except for vehicles that repeatedly enter the road segment). A total of 45,742 identifiable vehicles with pixels greater than 128 are extracted from T and denoted as D. Of these, 80% are randomly selected to form the training set $D_{train}$ and the remaining 20% form the test set $D_{test}$.
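A random 80/20 split of this kind can be sketched as follows. The text does not say whether the split is made per image or per vehicle identity, nor which random seed is used; the sketch assumes a per-sample split with a fixed seed, and `split_dataset` is a hypothetical helper name.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items with a fixed seed and cut them into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Integer indices stand in for the 45,742 extracted vehicle samples in D.
d_train, d_test = split_dataset(range(45742))
```

With 45,742 samples this yields 36,594 training samples and 9,148 test samples, and every sample falls into exactly one of the two sets.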

Experimental Setup and Evaluation Criteria.
The deep learning framework Caffe [23] is used to implement the proposed method. The hardware configuration used in the experiment is as follows: a GTX 1080 GPU with 8 GB of video memory, 128 GB of memory, and an 8-core Intel Core i7 CPU with a main frequency of 3.60 GHz.
The cumulative matching characteristic (CMC) curve, rank-1 accuracy, and mean average precision (MAP) were selected to evaluate the performance of the vehicle reidentification method. The CMC curve represents the probability that the ground-truth image for a query appears in candidate lists of different lengths. The rank-1 accuracy represents the probability that the ground-truth image appears at the first position of the candidate list. The MAP is the mean, over all query samples, of the area under the precision-recall curve, and it reflects the overall performance of the vehicle reidentification method.
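These three criteria can be computed from ranked gallery lists as in the minimal sketch below. It uses the common non-interpolated definition of average precision, which is an assumption since the text does not give the exact formula; the function names and the two toy queries are hypothetical.

```python
def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each rank where a correct match occurs."""
    hits, precisions = 0, []
    for rank, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def cmc_curve(all_ranked_ids, query_ids, max_rank):
    """curve[k-1] is the fraction of queries whose first correct match is within the top k."""
    counts = [0] * max_rank
    for ranked, qid in zip(all_ranked_ids, query_ids):
        first = next((r for r, gid in enumerate(ranked) if gid == qid), None)
        if first is not None:
            for k in range(first, max_rank):
                counts[k] += 1
    return [c / len(query_ids) for c in counts]

# Two hypothetical queries with the identities of their ranked gallery images.
query_ids = ["a", "b"]
rankings = [["a", "c", "a"],      # first match at rank 1
            ["c", "b", "b"]]      # first match at rank 2

cmc = cmc_curve(rankings, query_ids, max_rank=3)
rank1 = cmc[0]
mean_ap = sum(average_precision(r, q) for r, q in zip(rankings, query_ids)) / len(query_ids)
```

Rank-1 accuracy is simply the first point of the CMC curve, while the MAP also rewards placing the later correct matches high in the list, which is why the two numbers can move differently between methods.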

Experimental Result.
Using the AlexNet and ResNet-50 model frameworks, the experimental results of the single-channel CNN method and the double-channel symmetric CNN method are compared, as shown in Table 1.
The results show that the double-channel symmetric method has a stable improvement over the single-channel method. In the AlexNet model, the accuracy of rank-1 increased by 5.13%, and the accuracy of MAP increased by 4.26%. In the ResNet-50 model, the accuracy of rank-1 was improved by 0.71%, and the accuracy of MAP was improved by 2.44%.
In addition, on the ResNet-50 model, the rank-1 accuracy and MAP of the proposed method are 74.36% and 49.55%, respectively, which shows that the proposed vehicle recognition method reaches a high level of performance. The proposed method is compared with some existing vehicle reidentification methods, including traditional hand-designed methods and deep learning-based methods. The specific comparison results are shown in Table 2. The results show that the proposed method achieves competitive vehicle reidentification performance and is better than some existing vehicle reidentification methods.
In order to further verify the effectiveness of this algorithm, the existing VeRi-776 dataset [28] is used for validation. The VeRi-776 dataset was captured by 20 cameras in an urban area over 24 hours and contains 49,357 images of 776 vehicles. The images are captured in a real-world unconstrained monitoring scenario and tagged with different attributes, such as type, color, and brand. Each vehicle is photographed by 2-18 cameras under different viewpoints, lighting, resolution, and occlusion. In the experiment of this study, the images from 2 cameras were selected as experimental data. The results are shown in Table 3. According to the data, the proposed method achieves better vehicle reidentification performance than the other algorithms.

Conclusion
In order to further improve the performance of vehicle reidentification, this study proposes a vehicle reidentification method based on a double-channel symmetric CNN structure. Using the original training samples, the algorithm inputs two samples with complementary characteristics at the same time. With limited training samples, the combinations of inputs therefore become more diversified, which enriches the training process of the CNN model. As a result, the CNN model can be trained more fully, and a deep learning model with stronger recognition ability can be obtained. The vehicle training image library was extracted from the monitoring video of different intersections, and the proposed algorithm was then compared with other algorithms. Experimental results show that the vehicle recognition accuracy of the proposed algorithm is higher than that of other existing algorithms, which verifies the effectiveness of the proposed method.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.