Recognition of Taxi Violations Based on Semantic Segmentation of PSPNet and Improved YOLOv3



Introduction
In recent years, with rapid economic development and urbanization, the total length of roads and the number of vehicles in cities across China have grown continuously [1]. As a convenient and efficient way to travel, taxis are popular with the general public. However, illegal parking by taxis is also increasing. Moreover, taxis are highly mobile and dispersed, and they serve nonspecific groups of passengers [2][3][4]. Relevant managers therefore urgently need a fast and accurate method of judging taxi violations to facilitate urban traffic management and to support green, efficient intelligent transportation.
Current supervision relies mainly on manual screening: staff randomly check past videos in the monitoring center database to see whether the drivers observed have committed any violations [5,6]. However, because of eye fatigue and lapses in concentration, this method cannot guarantee continuity and reliability, and it consumes a great deal of manpower.
Therefore, the automatic detection and recognition of taxi violations has become both a research focus and a difficult problem.
With the advent of the Internet of Everything era, urban transportation networks are covered with a variety of heterogeneous detection terminals, forming a powerful Intelligent Transport System (ITS). ITS integrates the Internet of Things, big data, cloud computing, and other technologies [7,8], constituting a smart-city traffic monitoring system with situation awareness at the terminals and decision analysis in the cloud. This also provides new ideas for detecting taxi violations: ITS terminal monitors collect road condition images in real time and upload them to the cloud for accurate analysis of taxi violation images [9,10]. Therefore, an efficient image-based method for detecting parking violations is particularly important.
As a fusion of big data and artificial intelligence technologies, deep networks provide a new solution for detecting violations in complex traffic images [11,12]. A deep network model is trained on the image data set collected by ITS, learning through its successive layers to judge violations effectively. However, because of this multilayer learning mechanism, setting the network parameters is cumbersome, and video surveillance involves many elements to be processed, such as vehicle classification, road segment discrimination, license plate recognition, and vehicle speed detection [13,14]. Moreover, current taxi violation detection methods can only analyze simple images; for complex images, it is difficult to extract image features in an orderly and effective way.
In view of the low performance of current vehicle violation detection methods, this study proposes a new taxi violation detection method based on an improved YOLOv3 network and the PSPNet network. The main contributions are as follows. (1) The spatial pyramid pooling used in semantic segmentation is merged into the traditional YOLOv3 network, and a new method for detecting taxi violations based on the improved YOLOv3 network is proposed. It avoids information distortion in the collected images and enables accurate detection of taxis and of taxi license plate occlusion.
(2) A detection method for illegal taxi parking based on the PSPNet semantic segmentation network is proposed. It comprehensively collects global information from road condition images, enabling orderly extraction of the image features of violation behavior and further improving the accuracy of taxi parking violation detection.
The rest of this article is organized as follows. The first section introduces related research on vehicle image detection. The second section presents the taxi recognition method based on the improved YOLOv3 network model. The third section introduces the method for identifying illegal taxi license plate occlusion based on the improved YOLOv3 network model. The fourth section discusses the PSPNet network model and the method for detecting parking violations. The fifth section verifies the proposed taxi violation detection method through simulations on images collected from actual traffic. The sixth section concludes the article.

Related Works
As a main mode of urban travel, taxis are highly mobile. At the same time, because their passengers are random, taxis are more likely to illegally cross solid lines and stop when picking up passengers. In addition, some taxi owners hide their license plates to avoid legal liability [15]. This further increases the risks of urban traffic. Therefore, effective taxi detection and analysis methods are particularly important for ensuring urban traffic safety.
Traditional vehicle detection methods are based on moving-target detection, including ViBe, background difference, interframe difference, and codebook modelling [16]. Among them, the interframe difference method is one of the most commonly used methods for moving-target detection and segmentation: the pixel-wise temporal difference between two or three adjacent frames of the image sequence is computed, and the moving region is extracted by thresholding. Cao et al. [17] used unsupervised static recognition and dynamic tracking for vehicle tracking. However, if the color of the moving target is close to the background color, or the target moves slowly, the frame difference method often produces obvious holes, and the moving target may even be treated as noise, resulting in missed detection [18]. For jittery traffic video, Xu et al. [19] proposed a hybrid of the codebook algorithm and the local binary pattern (LBP). The codebook model represents each background pixel by a codebook built from the color distortion and brightness range of consecutive samples of that pixel.
Then, background subtraction compares each new input pixel value with its corresponding codebook, and, combined with the advantages of LBP, the foreground target pixels are extracted. The disadvantage of this method is that it is difficult to balance noise against foreground holes. Moutakki et al. [20] also used codebook background analysis to locate vehicles in images, but added multiple modules, such as vehicle segmentation, vehicle classification, and vehicle counting.
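The interframe difference idea, and the "hole" problem for slowly moving targets mentioned above, can be illustrated with a minimal sketch on grey-level frames stored as nested lists. The threshold of 25 grey levels is an arbitrary assumption, and real implementations would operate on image arrays rather than Python lists.

```python
from typing import List

Frame = List[List[int]]  # grey levels, indexed [row][col]


def frame_difference(prev: Frame, curr: Frame, thresh: int = 25) -> List[List[int]]:
    """Two-frame difference: 1 where the grey level changed by more than `thresh`."""
    return [
        [1 if abs(c - p) > thresh else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def three_frame_difference(f0: Frame, f1: Frame, f2: Frame, thresh: int = 25) -> List[List[int]]:
    """Three-frame variant: a pixel of f1 is moving only if it differs from both neighbours."""
    d01 = frame_difference(f0, f1, thresh)
    d12 = frame_difference(f1, f2, thresh)
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(d01, d12)]
```

With a uniform object that shifts by one pixel, only its leading and trailing edges change, so the interior is reported as background: this is the hole effect that motivates background models such as the codebook.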
As a product of a new generation of computer technology, deep networks have been applied successfully in many fields because of their powerful image processing capabilities [21,22], and researchers in intelligent transportation are also paying attention to them. Some researchers improve the YOLOv2 network with a residual network and use multiscale information to improve target detection accuracy [23]; based on the ELU activation function, they also design a Kelu activation function to ensure the accuracy of license plate detection. Tang et al. [24] proposed an improved single-shot multiframe detector based on deep learning. An attention mechanism is introduced through a spatial transformation module, so that the network can actively apply spatial transformations to the feature map, and context information is passed to designated layers to achieve accurate illegal parking detection. Abbas [25] constructed a pretrained convolutional neural network with a four-layer architecture and used the Shenxin network model to detect over-limit vehicles, speeding, driving on yellow lines, and other violations. Liu et al. [26] [27], the accuracy of detecting and recognizing illegal taxi behaviors will be improved. The technical differences among some of these studies are summarized in Table 1.
To address the problems above, this study proposes a new method for detecting taxi violations based on the improved YOLOv3 network model and the PSPNet network model. It extracts more of the effective information in actual road condition images to achieve efficient image detection and analysis.

Methodology
The method is inspired by modular programming: the research is divided into several modules, each module is analyzed, reuse between modules is considered, and each module is improved and optimized as appropriate to obtain better overall system performance.
The overall framework of the method is shown in Figure 1. The method consists of three modules: a taxi detection module, a license plate occlusion detection module, and a taxi illegal parking detection module. Because this study focuses on taxis, it is first necessary to detect whether a vehicle is a taxi. In the first module, this article improves the traditional YOLOv3 network structure (Figure 2), enhancing the network's ability to express feature information and enabling efficient taxi detection. The second module builds on the first and uses the improved YOLOv3 model to judge illegal license plate occlusion. Plates may be fully or partially occluded; the process is shown in Figure 3. The third module detects illegal stopping; its framework is shown in Figure 4. Both offline and real-time detection are supported.
The three modules address the following problems: (1) how to accurately detect whether the vehicle in the current frame is a taxi; (2) how to judge illegal license plate occlusion; and (3) how to detect taxi parking violations quickly and accurately.
The novelty of this research is that it detects multiple kinds of illegal taxi behavior, and its modules can be reused in different situations.

Taxi Detection Based on Improved YOLOv3
To meet the higher accuracy requirements of detection in real road scenes, this study improves the traditional YOLOv3 network structure, enhancing the network's ability to express feature information and enabling efficient taxi detection based on the improved YOLOv3. The YOLOv3 network mainly includes 3 branches. The original frame of the video to be analyzed is the input, and the output consists of 3 feature maps with different resolutions.
The highest-level branch processes the original image with multiple convolutional layers; the resulting feature map usually has a resolution of 13 × 13 and 256 channels. The middle branch takes the intermediate convolution result of the high-level branch, upsamples it, and concatenates it with the convolution results of the original image. The resulting feature map has a resolution of 26 × 26 and 256 channels. The bottom branch takes the intermediate result of the middle branch and performs the same operations.
The final feature map usually has a resolution of 52 × 52 and 256 channels. In the YOLOv3 network, residual connections effectively alleviate the vanishing-gradient problem during training. During upsampling, the intermediate convolution result is enlarged by a factor of 2 in each dimension, so that it can be concatenated with the direct convolution result of the original image at the same resolution.
Finally, once the three feature maps with different resolutions are obtained, an anchor-based bounding-box prediction strategy is adopted: a fully convolutional network predicts the locations of targets. In this process, anchors of s scales are set for every possible target position. The outputs are the coordinates (x, y) of the upper-left corner of each rectangular box and its height and width (h, w). In addition, a confidence score is estimated for each candidate box. Since each of the s anchors predicts these four coordinates plus one confidence value, if the search range of the target is an L × L grid, the final output map has a resolution of L × L and 5 × s channels.
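How an L × L output map with 5 × s channels can be read back into candidate boxes is sketched below. It assumes the channels are grouped per anchor as (x, y, w, h, confidence) and that a hypothetical threshold `conf_thresh` filters candidates; the text specifies neither, and the sigmoid and exponential transforms that YOLOv3 applies to raw outputs are omitted.

```python
from typing import List, Tuple

# (row, col, anchor, x, y, w, h, confidence)
Box = Tuple[int, int, int, float, float, float, float, float]


def decode_grid(output: List[List[List[float]]], s: int, conf_thresh: float = 0.5) -> List[Box]:
    """Read an L x L x (5 * s) output map into candidate boxes.

    Assumed (hypothetical) channel layout: anchor k occupies indices
    5*k .. 5*k + 4 as [x, y, w, h, conf].
    """
    boxes: List[Box] = []
    for row, line in enumerate(output):
        for col, cell in enumerate(line):
            if len(cell) != 5 * s:
                raise ValueError("each grid cell must hold 5 * s values")
            for k in range(s):
                x, y, w, h, conf = cell[5 * k: 5 * k + 5]
                if conf >= conf_thresh:  # keep only confident candidates
                    boxes.append((row, col, k, x, y, w, h, conf))
    return boxes
```

For example, with L = 2 and s = 2 each cell holds 10 values, and only anchors whose confidence passes the threshold are returned; a real detector would follow this with non-maximum suppression.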
In the YOLOv3 model, the network weights are generally obtained by training and testing on the ImageNet data set, whose targets differ considerably from the violations to be detected here. Therefore, the YOLOv3 algorithm and its feature maps are improved to enlarge the receptive field and avoid the loss of semantic features as far as possible.
To strengthen the detection of small targets, YOLOv3 draws on the Feature Pyramid Network (FPN): high-level and shallow feature information is fused, and positions and categories are predicted on feature maps at multiple scales [28]. However, the three-scale feature fusion adopted by YOLOv3 still handles the smaller targets in surveillance video poorly. The 13 × 13 feature map suffers severe semantic loss, which easily causes small targets to be missed, and feature resolution directly affects both small-target detection and overall performance. Therefore, on the basis of Darknet-53, the three original feature-map resolutions of 13 × 13, 26 × 26, and 52 × 52 are replaced by two larger resolutions, 26 × 26 and 52 × 52. The network structure is shown in Figure 2. To avoid image distortion caused by operations such as scaling, stretching, and cropping, Spatial Pyramid Pooling (SPP) from semantic segmentation is introduced into the network; the structure of the SPP module is shown in Figure 5. The SPP module converts a feature map of any resolution into a feature vector whose fixed dimension matches the fully connected layer. SPP also avoids repeated extraction of image features, which improves computation speed. Its specific effects and operations are described in the literature.
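The fixed-length property described above can be illustrated with a minimal single-channel sketch in the SPP-net style: the same map is max-pooled over 1 × 1, 2 × 2, and 4 × 4 grids and the results are concatenated. The pyramid levels (1, 2, 4) are an assumption, not taken from the text. Note also that YOLOv3-SPP implementations commonly use a different, resolution-preserving variant (stride-1 max pooling with 5 × 5, 9 × 9, and 13 × 13 kernels).

```python
import math
from typing import List, Sequence


def bin_range(size: int, bins: int, i: int) -> range:
    """Indices covered by bin i when `size` cells are split into `bins` parts (never empty)."""
    return range(math.floor(i * size / bins), math.ceil((i + 1) * size / bins))


def spp_channel(fmap: List[List[float]], levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool one H x W channel over b x b grids for each b and concatenate.

    The output length is sum(b * b for b in levels), independent of H and W.
    """
    h, w = len(fmap), len(fmap[0])
    out: List[float] = []
    for b in levels:
        for i in range(b):
            for j in range(b):
                out.append(max(fmap[r][c] for r in bin_range(h, b, i) for c in bin_range(w, b, j)))
    return out
```

With C channels the pooled vector has 21 × C entries for these levels, whether the input map is 26 × 26 or 52 × 52.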
Note that the 26 × 26 feature map of the original network is used as the first scale, and the output of layer 61 (L61) goes through five convolution operations. First, a 1 × 1 convolution reduces dimensionality to improve computational efficiency; the result is then upsampled and fused with the output of layer 36 (L36). Finally, a 3 × 3 convolution kernel is applied to the fusion result.
This eliminates the aliasing effect of upsampling and yields a new 52 × 52 feature map as the second-scale feature. The improved YOLOv3 network thus combines high resolution with a large receptive field, which effectively improves the reliability of small-target detection.
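The shape bookkeeping of this 26 × 26 to 52 × 52 fusion can be sketched with nested lists indexed [channel][row][col]: 2× nearest-neighbour upsampling (the mode commonly used by YOLOv3's upsample layer) followed by concatenation along the channel axis. The 1 × 1 and 3 × 3 convolutions are omitted, and the tiny sizes in the example only stand in for the real 26 × 26 and 52 × 52 maps.

```python
from typing import List, Tuple

Tensor = List[List[List[float]]]  # [channel][row][col]


def upsample2x_nearest(x: Tensor) -> Tensor:
    """Copy every value into a 2 x 2 block, doubling height and width."""
    out: Tensor = []
    for ch in x:
        rows: List[List[float]] = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]  # duplicate each column
            rows.append(wide)
            rows.append(list(wide))  # duplicate each row
        out.append(rows)
    return out


def concat_channels(a: Tensor, b: Tensor) -> Tensor:
    """Channel-wise concatenation; spatial sizes must already agree."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b


def shape(x: Tensor) -> Tuple[int, int, int]:
    return (len(x), len(x[0]), len(x[0][0]))
```

The upsampled deep map and the shallow map end up at the same resolution, so their channel counts simply add; the following 3 × 3 convolution then smooths the blocky upsampling artefacts.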
Figure 6 shows some samples from the training set. The trained classifier was then tested on 96 hours of video collected from traffic law enforcement cameras. The experimental results are shown in Tables 2 and 3.
Tables 2 and 3 show that the taxi face classifier performs best when the positive samples are normalized to 48 × 36 and the number of iterations is 20, with a precision of 94.8%. The taxi body classifier performs best when the positive samples are normalized to 72 × 4 and the number of iterations is 20, with a precision of 96.8%.

Detection of Illegal Acts Concealed by Taxi License Plates
Once a taxi has been detected, the improved YOLOv3 network model is used to determine whether its license plate is illegally occluded. Analysis of traffic monitoring videos shows that taxis conceal license plates through full or partial occlusion, and a license plate occlusion determination algorithm is designed accordingly; its flow chart is shown in Figure 3. First, the license plate area is located from its positional relationship with the vehicle face area. Second, the blue pixels in the license plate area are extracted and their smallest bounding rectangle is generated. Finally, a judgment rule is set for each occlusion type. The details are shown in Table 4.
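The three judgment rules (proportion of blue pixels, aspect ratio of the blue area, and relative position of the blue centroid) can be sketched on a binary blue-pixel mask of the plate region. All thresholds below are hypothetical placeholders, the 440 × 140 aspect ratio assumes a standard blue Chinese plate, the rules are applied in a fixed order that the text does not specify, and an axis-aligned bounding box stands in for the smallest bounding rectangle.

```python
from typing import List


def judge_occlusion(mask: List[List[int]],
                    min_blue_ratio: float = 0.3,     # hypothetical threshold
                    plate_aspect: float = 440 / 140,  # assumed blue-plate width / height
                    aspect_tol: float = 0.25,         # hypothetical relative tolerance
                    centroid_tol: float = 0.15) -> str:  # hypothetical offset tolerance
    """Classify a plate region from a binary blue-pixel mask (1 = blue pixel)."""
    h, w = len(mask), len(mask[0])
    pts = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    # Rule 1 (total occlusion): too few blue pixels remain in the plate region.
    if len(pts) / (h * w) < min_blue_ratio:
        return "total occlusion"
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    # Rule 2 (single-area partial occlusion): the blue bounding box is deformed.
    box_h = max(rows) - min(rows) + 1
    box_w = max(cols) - min(cols) + 1
    if abs(box_w / box_h - plate_aspect) / plate_aspect > aspect_tol:
        return "single-area partial occlusion"
    # Rule 3 (multi-area partial occlusion): the blue centroid drifts from the centre.
    cy = (sum(rows) / len(pts) + 0.5) / h  # normalised to [0, 1]
    cx = (sum(cols) / len(pts) + 0.5) / w
    if abs(cy - 0.5) > centroid_tol or abs(cx - 0.5) > centroid_tol:
        return "multi-area partial occlusion"
    return "no occlusion"
```

Covering one end of the plate shortens the blue box (rule 2), whereas covering several characters can leave the box intact but pulls the centroid sideways (rule 3).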

Taxi Parking Violation Detection
As a traditional deep network used for semantic segmentation, ResNet can extract information effectively from simple images. In real traffic scenes, however, images contain too many elements, and it is difficult to guarantee accurate semantic recognition with a ResNet model alone. PSPNet integrates a pyramid pooling module with ResNet and aggregates image information from different regions, improving its ability to capture global information in an orderly manner. Therefore, this article uses PSPNet to detect illegal taxi parking. As shown in Figure 4, to extract multiscale information, a feature map of any size is converted into fixed-length features. The model combines pooled features at four different scales in parallel. To extract global features, a 1 × 1 convolution follows the pooling at each scale and reduces the number of channels of that level to 1/4 of the original. Each pooled map is then restored to the size of the input feature map and concatenated with it. Based on target detection and semantic segmentation, this study proposes a sidewalk parking detection method.
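The pyramid pooling step described above can be sketched for a single channel: adaptive average pooling to a b × b grid, then upsampling back to H × W, plus the channel arithmetic of the concatenation. The standard PSPNet uses bins (1, 2, 3, 6) and bilinear upsampling; nearest-neighbour upsampling is used here only to keep the sketch dependency-free, and the 1 × 1 convolution is represented only by its effect on the channel count.

```python
import math
from typing import List

Grid = List[List[float]]


def adaptive_avg_pool(fmap: Grid, b: int) -> Grid:
    """Average-pool an H x W map into a b x b grid with (possibly overlapping) bins."""
    h, w = len(fmap), len(fmap[0])

    def edges(size: int, i: int) -> range:
        return range(math.floor(i * size / b), math.ceil((i + 1) * size / b))

    out: Grid = []
    for i in range(b):
        row = []
        for j in range(b):
            vals = [fmap[r][c] for r in edges(h, i) for c in edges(w, j)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def upsample_nearest(grid: Grid, h: int, w: int) -> Grid:
    """Stretch a pooled grid back to h x w (PSPNet itself uses bilinear interpolation)."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[r * gh // h][c * gw // w] for c in range(w)] for r in range(h)]


def ppm_out_channels(c: int, levels: int = 4) -> int:
    """Input channels plus `levels` branches, each reduced to c // levels by a 1 x 1 conv."""
    return c + levels * (c // levels)
```

For a 2048-channel ResNet feature map, the four reduced branches add 4 × 512 channels, so the concatenated map has 4096 channels, half of them carrying pooled context from regions of different sizes.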

PSPNet Offline Detection.
The same image is sent to two subnetworks, one for target detection and one for semantic segmentation. The target detection network outputs a detection image containing all vehicles in the image, while the semantic segmentation network outputs a segmentation map that divides the urban road scene into sidewalk and road areas. For each detection box, the lower half is first cropped to emphasize the vehicle's position on the ground and is then compared with the segmentation map.
Then, the overlap area A_sidewalk between the lower half of the detection box and the sidewalk, and the overlap area A_road between it and the road, are computed, and the overlap ratio is calculated as

$$P = \frac{A_{\text{sidewalk}}}{A_{\text{road}}}$$

If P > 1, the vehicle is judged to be parked on the sidewalk [25].
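The offline decision rule can be sketched as follows, assuming the segmentation map is a 2-D grid of class labels with hypothetical codes `SIDEWALK = 1` and `ROAD = 2`, and boxes given as (x1, y1, x2, y2) pixel coordinates with exclusive right and bottom edges. The handling of a zero road area is also an assumption.

```python
from typing import List, Tuple

SIDEWALK, ROAD = 1, 2  # hypothetical label codes in the segmentation map


def sidewalk_ratio(seg: List[List[int]], box: Tuple[int, int, int, int]) -> float:
    """P = A_sidewalk / A_road, counted over the lower half of a detection box."""
    x1, y1, x2, y2 = box
    y_mid = (y1 + y2) // 2  # keep only the lower half, where the vehicle touches the ground
    a_sidewalk = a_road = 0
    for y in range(y_mid, y2):
        for x in range(x1, x2):
            label = seg[y][x]
            if label == SIDEWALK:
                a_sidewalk += 1
            elif label == ROAD:
                a_road += 1
    if a_road == 0:  # assumed convention: any sidewalk overlap with no road overlap counts as P > 1
        return float("inf") if a_sidewalk else 0.0
    return a_sidewalk / a_road


def is_sidewalk_parking(seg: List[List[int]], box: Tuple[int, int, int, int]) -> bool:
    return sidewalk_ratio(seg, box) > 1
```

Using only the lower half of the box reduces the influence of the vehicle body, which in the image often overlaps regions it is not actually standing on.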
With the traditional cross-entropy loss, simple pixels and pixels of frequent classes dominate the accumulated classification error, so the network tends toward easy-to-learn samples. As a result, it learns simple, frequent classes better and better but complex, rare classes worse and worse. This creates a vicious circle in training that runs against the purpose of model training. To solve this problem, we weight the imbalanced classes, computing the hierarchical loss as in [12], where S_i is the output of the Softmax function, μ_i is an adjustable parameter, and i is the class index. Changing μ_i controls the proportion of each class's pixel loss in the total error. However, this approach only balances the proportions of the imbalanced classes and still cannot distinguish difficult samples. In the middle and later stages of training, the gradient is still updated toward the classes that are easier to learn, so the network is not encouraged to learn from difficult samples. The larger the Softmax output, the more probable it is that the pixel belongs to that class; the network is then relatively confident about the pixel, that is, the input is a simple sample. Conversely, a small Softmax output means that the network has difficulty determining the class of the input, that is, the input is a difficult sample. Therefore, simple and difficult samples can be distinguished by the Softmax output probability. With reference to the Focal Loss function, we fuse the multilevel losses, take their average, and propose
an inhibitory cross-entropy loss function to address the problem of sample imbalance in autonomous driving. The function is defined as in [10],
where c is an adjustable parameter. When a sample is easy to distinguish, S_i is large and (1 − S_i)^c is small. Their product therefore suppresses the loss of that sample, whose share of the total loss becomes smaller. Relatively speaking, the loss of difficult samples is amplified to some extent, their share of the total loss increases, and the model becomes more inclined to learn from difficult samples.
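A minimal per-pixel sketch of the two losses, assuming the standard forms consistent with the definitions above: for a pixel of true class t, the weighted loss $L_w = -\mu_t \log S_t$ and the inhibitory loss $L_s = -\mu_t (1 - S_t)^{c} \log S_t$, which is a focal-loss-style modulation. These exact forms are an assumption, and the multilevel averaging mentioned above is omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_ce(logits: Sequence[float], target: int, mu: Sequence[float]) -> float:
    """Class-weighted cross entropy for one pixel: -mu_t * log(S_t)."""
    s = softmax(logits)[target]
    return -mu[target] * math.log(s)


def suppressed_ce(logits: Sequence[float], target: int, mu: Sequence[float], c: float = 2.0) -> float:
    """Inhibitory (focal-style) loss: (1 - S_t)^c shrinks the loss of confident, easy pixels."""
    s = softmax(logits)[target]
    return -mu[target] * (1.0 - s) ** c * math.log(s)
```

The suppression factor is exactly (1 − S_t)^c, so an easy pixel with S_t ≈ 0.98 keeps well under 1% of its loss at c = 2, while a hard pixel with S_t ≈ 0.55 keeps about 20%; setting c = 0 recovers the weighted loss.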
Note that the PSPNet semantic segmentation network cannot run in real time. Therefore, the following subsection discusses how to perform real-time sidewalk parking violation detection. The structure of the offline detection network proposed in this study is shown in Figure 7.

Real-Time Detection.
We found that, for the same fixed camera, the semantic segmentation results of different frames are very similar. Therefore, semantic segmentation needs to be performed only once on the road background extracted for each camera.
The resulting segmentation map can then be reused for all subsequent detections. At the same time, real scenes in surveillance videos are often complex and changeable, and background modelling of surveillance scenes is the basis for subsequent processing such as target detection, segmentation, tracking, classification, and behavior understanding. This article first uses the Gaussian mixture background model to extract a clean road background from a video. Semantic segmentation of the background image makes it easier to locate the vehicle and its spatial information. Gaussian mixture background modelling represents the background using statistics of pixel samples. It assumes that the color information of different pixels is uncorrelated, so each pixel is processed independently. For each pixel in the video, the change of its value over the image sequence is regarded as a random process that continuously generates pixel values; that is, a mixture of Gaussian distributions describes the color behavior of each pixel, whether unimodal or multimodal. Therefore, the mixed Gaussian background model can overcome image jitter, noise interference, illumination changes, and the motion of moving targets, and extract a clean background from the video stream.

Table 4: Judgment methods for illegal license plate occlusion.
Illegal occlusion of license plate | Judgment method
Total occlusion | Proportion of blue pixels
Single-area partial occlusion | Aspect ratio of blue area
Multi-area partial occlusion | Relative position of centroid in blue area
In this study, a Gaussian mixture background model is trained on about two minutes of video, and a relatively clean background image is extracted. Semantic segmentation is then performed on both a video screenshot and the background image. In Figure 8, the upper part shows a frame of the video and its semantic segmentation map, and the lower part shows the background extracted by the Gaussian mixture model and its semantic segmentation. The background model filters out all moving vehicles, so the generated image is better suited to semantic segmentation. Preprocessing the image with Gaussian mixture background modelling before semantic segmentation by the convolutional neural network therefore gives a better segmentation result.
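The per-pixel background model can be sketched as a simplified Stauffer–Grimson mixture for one grayscale pixel. K = 3 components, the learning rate, the 2.5σ match test, the initial variance, and the variance floor are illustrative assumptions, and a fixed rate replaces the likelihood-weighted update of the original algorithm; in practice a library implementation such as OpenCV's MOG2 background subtractor would more likely be used.

```python
import math
from typing import List


class PixelGMM:
    """Simplified Stauffer-Grimson style mixture for one grayscale pixel (illustrative values)."""

    def __init__(self, k: int = 3, alpha: float = 0.05, init_var: float = 225.0, min_var: float = 4.0):
        self.k, self.alpha = k, alpha
        self.init_var, self.min_var = init_var, min_var
        self.w: List[float] = []
        self.mu: List[float] = []
        self.var: List[float] = []

    def _rank(self, i: int) -> float:
        # Background-like components have high weight and low spread.
        return self.w[i] / math.sqrt(self.var[i])

    def update(self, x: float) -> None:
        matched = None
        for i in sorted(range(len(self.w)), key=self._rank, reverse=True):
            if abs(x - self.mu[i]) <= 2.5 * math.sqrt(self.var[i]):
                matched = i
                break
        a = self.alpha
        # All weights decay; the matched component gains weight.
        self.w = [(1 - a) * wi + (a if i == matched else 0.0) for i, wi in enumerate(self.w)]
        if matched is not None:
            self.mu[matched] = (1 - a) * self.mu[matched] + a * x
            new_var = (1 - a) * self.var[matched] + a * (x - self.mu[matched]) ** 2
            self.var[matched] = max(self.min_var, new_var)
        elif len(self.w) < self.k:
            self.w.append(a)
            self.mu.append(x)
            self.var.append(self.init_var)
        else:
            # Replace the least background-like component with a wide Gaussian at x.
            j = min(range(self.k), key=self._rank)
            self.w[j], self.mu[j], self.var[j] = a, x, self.init_var
        total = sum(self.w)
        self.w = [wi / total for wi in self.w]

    def background(self) -> float:
        """Background estimate: mean of the highest-ranked component."""
        return self.mu[max(range(len(self.w)), key=self._rank)]
```

A vehicle passing over the pixel for a few frames creates a new, low-weight, wide component, so the background estimate stays at the road's grey level; running one such model per pixel yields the clean background image.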
Figure 9 shows the structure of the real-time detection network based on the Gaussian mixture background model. In the offline part, for a fixed camera, a video clip is first fed into the Gaussian mixture background model to extract a clean background image. Then, the background image is sent to PSPNet for semantic segmentation, producing the segmentation map. In the real-time part, YOLOv3 detects vehicles in the surveillance video in real time, and each detection is compared with the segmentation map to calculate the overlap area between the target and each region. Finally, the sidewalk parking detection result is output.
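The offline/real-time split can be sketched as a small class that segments each camera's background once and reuses the cached map for every frame. The `segment`, `detect`, and `ratio` callables are hypothetical stand-ins for PSPNet, YOLOv3, and the overlap-ratio rule; they are injected so the control flow can be shown without any model code.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]
SegMap = List[List[int]]


class SidewalkParkingMonitor:
    """Segment each camera's background once (offline) and reuse it for every frame (real time)."""

    def __init__(self, segment: Callable[[object], SegMap],
                 detect: Callable[[object], List[Box]],
                 ratio: Callable[[SegMap, Box], float]):
        self.segment, self.detect, self.ratio = segment, detect, ratio
        self.seg_cache: Dict[str, SegMap] = {}

    def prepare(self, camera_id: str, background: object) -> None:
        """Offline part: run the slow segmentation once on the clean background image."""
        self.seg_cache[camera_id] = self.segment(background)

    def process(self, camera_id: str, frame: object) -> List[Box]:
        """Real-time part: only detection runs per frame; returns boxes judged as violations."""
        seg = self.seg_cache[camera_id]
        return [box for box in self.detect(frame) if self.ratio(seg, box) > 1]
```

Because segmentation cost is paid once per camera, the per-frame cost reduces to detection plus a cheap overlap count, which is what makes real-time operation feasible.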

Experiment and Analysis
In this section, to verify the recognition performance of the proposed method, the methods of [23,25,26] are used as baselines, and simulation experiments on taxi violation recognition are carried out under the same scenarios and conditions.
The evaluation metrics are the receiver operating characteristic (ROC) curve and the equal error rate (EER).
The ROC curve plots the true-positive rate P_tp on the vertical axis against the false-positive rate P_fp on the horizontal axis. It directly shows the relationship between the two rates and thus indicates the strengths and weaknesses of a model:

$$P_{fp} = \frac{FP}{FP + TN}, \qquad P_{tp} = \frac{TP}{TP + FN}$$

where TP and FP are the numbers of samples correctly and incorrectly detected as abnormal, and TN and FN are the numbers of samples correctly and incorrectly detected as normal. Frame-level performance is evaluated with the EER, the value at which the false acceptance rate equals the false rejection rate; the lower the EER, the better. Pixel-level performance is evaluated with the detection rate (DR): under the pixel-level criterion, an abnormal frame counts as detected only if at least 40% of the abnormal region is recognized. The higher the DR, the better.
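The definitions above can be turned into a small sketch that sweeps a score threshold to obtain ROC points and reads off an approximate EER at the point where P_fp is closest to the miss rate 1 − P_tp. The scores and labels are illustrative, and taking the nearest ROC point (rather than interpolating between points) is a simplifying assumption.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(P_fp, P_tp) for every distinct threshold; label 1 = abnormal (positive)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # P_fp = FP/(FP+TN), P_tp = TP/(TP+FN)
    return points


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER: ROC point where false acceptance ~= false rejection (1 - P_tp)."""
    fpr, tpr = min(roc_points(scores, labels), key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2
```

The quadratic threshold sweep keeps the correspondence with the formulas obvious; a production version would sort once and accumulate counts.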

Configuration and Parameters.
All taxi violation detection experiments are run in the same environment. The hardware configuration is an Intel Core i7-1165G7 CPU, an NVIDIA GeForce MX450 graphics card, 16 GB of memory, and 12 GB of video memory. The operating system is Linux (Ubuntu 16.04.4), and the development environment is the PyTorch 1.0 deep learning framework.
The experimental data set consists of real images collected by the ITS of a certain city. The samples are randomly divided into a training set (80%) and a test set (20%). In the proposed method, the YOLOv3 network model uses the Darknet-53 backbone. Weights trained on ImageNet serve as the pretraining weights, on which transfer learning is performed with the taxi sample data set, and the network parameters are fine-tuned until the loss function converges.
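The random 80/20 division can be sketched as below. The fixed seed is an assumption added for reproducibility; the text does not say how the randomization was seeded.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_20(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then put 80% into training and 20% into testing."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # local RNG, so global random state is untouched
    cut = int(round(0.8 * len(items)))
    return items[:cut], items[cut:]
```

Using a local `random.Random` instance makes the split repeatable without affecting any other randomness in the training code.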
Because of limited hardware memory, the batch size is set to 20, training runs for a total of 120 epochs, and all training uses the Adam optimizer. The learning rate follows a fixed-step decay schedule.
Training uses freezing: for a certain number of initial epochs, the parameters of the backbone feature extraction network are frozen and only the parameters of the prediction network are updated. After thawing, the parameters of both the backbone feature extraction network and the prediction network are updated.
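The freeze-then-thaw schedule with step decay can be summarised as a per-epoch plan. Only the batch size (20) and the total of 120 epochs come from the text; `freeze_epochs`, `base_lr`, `step`, and `gamma` are hypothetical values chosen for illustration.

```python
from typing import Dict, Union


def training_plan(epoch: int, freeze_epochs: int = 50, base_lr: float = 1e-3,
                  step: int = 10, gamma: float = 0.9) -> Dict[str, Union[int, float, bool]]:
    """Settings for one epoch of freeze-then-thaw training with step learning-rate decay."""
    if not 0 <= epoch < 120:
        raise ValueError("training runs for 120 epochs (0..119)")
    return {
        "batch_size": 20,
        "train_backbone": epoch >= freeze_epochs,  # backbone frozen until thawing
        "train_head": True,  # the prediction network is always updated
        "lr": base_lr * gamma ** (epoch // step),  # multiply by gamma every `step` epochs
    }
```

In PyTorch, the backbone flag would typically map to calling `requires_grad_(False)` on the backbone module during the frozen phase, and the decay to `torch.optim.lr_scheduler.StepLR`.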

Taxi Detection.
This study first compares taxi detection with the methods of previous studies [23,25,26]. The results are shown in Table 5.
As shown in Table 5, the proposed method effectively identifies taxis. Its vehicle face recognition accuracy is 94.8%, which is 2.3% higher than that reported by Abbas [25] and similar to that reported by Liu et al. [26]. For vehicle body recognition, all methods achieve higher accuracy than for the vehicle face.
The vehicle body accuracy of the proposed method is 96.8%, which is 1.9% higher than that reported by Abbas [25] and also similar to that reported by Liu et al. [26]. This also shows that adding the SPP module effectively helps the YOLOv3 network identify taxis accurately.

Taxi License Plate Occlusion Detection.
To verify the effectiveness of the proposed method in detecting illegally obscured taxi license plates, simulation experiments were conducted on videos collected from actual traffic scenes. The experimental data set contains 700 sequences, 200 of which contain illegal license plate occlusion. The license plate violation detection results of the different methods are shown in Table 6.
As shown in Table 6, the method proposed in this study can achieve more accurate license plate occlusion detection, and the accuracy rate of license plate occlusion recognition in actual scenes is 95.1%.Compared with the report of Liu et al. [26], the accuracy is improved by 1.1%, which further proves that the method proposed in this study is superior to the existing vehicle state recognition methods.
We also perform ROC curve analysis on random areas in the license plate violation recognition experiment. The ROC curves of the different methods are shown in Figure 10.
It can be seen from Figure 10 that the method proposed in this study has a significantly higher true-positive rate than the comparison method in the ROC curve graph, and the convergence speed is faster than the comparison method, showing excellent detection performance.At the same time, the results of EER and DR of the vehicle license plate detection experiment are shown in Table 7.
Table 7 shows that the proposed method clearly improves detection performance compared with the other methods. Compared with the detection rate reported by Zhang et al. [23], the DR of the proposed method increases by 3.5%. This shows that the proposed method performs better at detecting license plate occlusion and achieves the best results when detecting smaller targets. Meanwhile, because the methods of earlier studies [25,26] generate some false alarms, their EER values are relatively high. The proposed method uses the improved YOLOv3 network to generate candidate regions, each of which contains the entire object, and the SPP module further improves localization accuracy and removes invalid feature information. Therefore, the proposed method obtains the lowest EER.
As shown in Table 8, the proposed PSPNet-based method achieves an accuracy of more than 96% in detecting taxi parking violations. Compared with the other methods, it maintains accurate violation judgments under actual road conditions. Figure 11 shows the ROC curves of vehicle parking violation detection for the different methods. The curve of Zhang et al. [23] converges faster than that of the proposed method, but the true-positive rate of the proposed method converges closer to 1. This indicates that the proposed method has better overall detection performance.
Table 9 shows the EER and DR results of the vehicle parking violation experiment. The detection rate DR of the proposed PSPNet-based parking violation detection method is 88.2%, which is 17.3% higher than the DR reported by Liu et al. [26]. Its EER is 14.2%, which is about 20.1% lower than that reported by Liu et al. [26]. The reason is that the PSPNet model uses pyramid pooling to fuse global and local features when extracting features from road condition images. To a certain extent, this compensates for the shortcoming of traditional pooling operations, which capture feature information only within a fixed window, and further improves segmentation accuracy.
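The way pyramid pooling fuses global and local context can be sketched in NumPy. This is a minimal illustration, assuming the bin sizes (1, 2, 3, 6) of the original PSPNet design, nearest-neighbour upsampling instead of bilinear interpolation, and random stand-in weights for the 1×1 convolutions; it is not the trained network used in this study, and the function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) map into a (C, bins, bins) grid.

    Cell boundaries use the floor/ceil rule of adaptive pooling, so cells
    may overlap slightly when H or W is not divisible by bins.
    """
    c, h, w = x.shape
    out = np.empty((c, bins, bins), dtype=float)
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)  # floor, ceil
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, bh, bw) map to (C, h, w)."""
    _, bh, bw = x.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return x[:, rows][:, :, cols]


def pyramid_pooling(x, bins=(1, 2, 3, 6), seed=0):
    """Concatenate the input with one pooled-and-upsampled branch per bin size."""
    rng = np.random.default_rng(seed)
    c, h, w = x.shape
    reduced = max(1, c // len(bins))  # each branch keeps C / len(bins) channels
    branches = [x]
    for b in bins:
        pooled = adaptive_avg_pool(x, b)
        # Hypothetical 1x1 convolution: random weights stand in for learned ones.
        proj = rng.standard_normal((reduced, c)) / np.sqrt(c)
        pooled = np.einsum("oc,chw->ohw", proj, pooled)
        branches.append(upsample_nearest(pooled, h, w))
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    features = np.random.default_rng(1).standard_normal((8, 12, 12))
    print(pyramid_pooling(features).shape)  # → (16, 12, 12)
```

The bin-1 branch is a single global average broadcast to every pixel, while the bin-6 branch keeps coarse local layout; concatenating them with the original map is what lets each pixel see both.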

Ablation Study.
To clarify the impact of each network component on taxi illegal parking detection performance, ablation experiments are carried out on the ITS image data set, considering the more complex task of taxi illegal parking detection.
The proposed method includes the following important parts: the feature extraction module, the YOLOv3 module, the SPP module, the PSPNet module, and the classification module. Because the classification module is necessary for the framework of this study, it is retained. The YOLOv3 module is used to detect taxis; because the research object of this study is taxis, this module must also be retained. We then delete the feature extraction module, the SPP module, and the PSPNet module in turn and conduct the ablation experiments. The results are shown in Table 10. When any one of these modules is deleted or replaced, the detection accuracy decreases to some extent compared with the complete framework. In particular, without the feature extraction module, the recognition rate decreases the most, to only 52.7%, because the initial features are too coarse and passing them directly to subsequent processing seriously degrades the results. Therefore, the feature extraction module is necessary. When the PSPNet module is deleted, we use the mixed Gaussian background model to extract the clean road background of a video and then semantically segment the background image. This achieves real-time detection, but the detection accuracy is greatly reduced. Deleting the SPP module reduces taxi detection accuracy and thus the accuracy of illegal parking detection. Therefore, each module plays an irreplaceable role in the final result.
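The background-extraction step used when PSPNet is removed can be illustrated with a per-pixel Gaussian background model. The sketch below is a deliberate simplification, assuming one Gaussian per pixel on grayscale frames instead of the several weighted components a mixed Gaussian model keeps; the class name and parameters (`alpha`, `k`, `init_var`, `min_var`) are hypothetical. A full Gaussian mixture implementation is available in OpenCV as `cv2.createBackgroundSubtractorMOG2`.

```python
import numpy as np


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model (simplified, not a full mixture)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=225.0, min_var=4.0):
        self.mean = first_frame.astype(float)
        self.var = np.full_like(self.mean, init_var)
        self.alpha = alpha      # learning rate of the running update
        self.k = k              # pixels beyond k standard deviations are foreground
        self.min_var = min_var  # variance floor so sensor noise is not flagged

    def apply(self, frame):
        """Classify pixels of a grayscale frame and update the background model."""
        frame = frame.astype(float)
        diff = frame - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        bg = ~foreground
        # Only background pixels update the model, so passing vehicles
        # do not leak into the estimated road background.
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = np.maximum(
            (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2, self.min_var
        )
        return foreground

    def background_image(self):
        """Current estimate of the clean road background as an 8-bit image."""
        return np.clip(self.mean, 0, 255).astype(np.uint8)
```

In use, each video frame is passed to `apply`, and `background_image()` is what would then be handed to the semantic segmentation step.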

Effects of Different Loss Functions and Training Times.
Different loss functions and numbers of training epochs directly affect the whole experiment. Therefore, this section discusses the relationship between the two loss functions and the number of training epochs. For taxi illegal parking detection, two groups of comparative experiments are carried out with PSPNet: one uses the ordinary cross entropy loss function (Equation (2)), and the other uses the inhibitory cross entropy loss function proposed in this study (Equation (3)). The results are shown in Figure 12, where CE Loss denotes the ordinary cross entropy loss, ICE Loss denotes the inhibitory cross entropy loss, and one epoch is one complete pass over the training data. ICE Loss converges earlier than CE Loss: the inhibitory cross entropy loss remains stable after 50 epochs, whereas the ordinary cross entropy loss only stabilizes after more than 80 epochs. This shows that the inhibitory cross entropy loss function accelerates training and ensures rapid convergence of the model. In addition, because each pixel's loss in the inhibitory cross entropy loss is multiplied by a factor less than 1, the inhibitory cross entropy loss is always smaller than the ordinary cross entropy loss.

Figure 10: Frame-level ROC curves of license plate violation detection under different methods (the proposed method and References [23], [25], and [26]).

Figure 11: Frame-level ROC curves of vehicle illegal stop detection under different methods (the proposed method and References [23], [25], and [26]).
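The relationship between the ordinary and inhibitory cross entropy losses in Figure 12 can be illustrated numerically. Equation (3) is not reproduced in this section, so the sketch below does not implement it; it only demonstrates the stated property that multiplying each pixel's cross entropy term by a factor in [0, 1] can never produce a larger loss. As a stand-in it uses a focal-loss-style factor (1 − p)^γ, where p is the predicted probability of the true class; this factor, the parameter `gamma`, and all function names are assumptions, not the authors' formulation.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pixel_cross_entropy(logits, target, eps=1e-12):
    """Per-pixel CE for logits of shape (K, H, W) and integer labels of shape (H, W)."""
    probs = softmax(logits, axis=0)
    h, w = target.shape
    # Pick the probability of the true class at every pixel.
    p_true = probs[target, np.arange(h)[:, None], np.arange(w)[None, :]]
    return -np.log(p_true + eps), p_true


def ce_loss(logits, target):
    """Ordinary cross entropy averaged over all pixels."""
    loss, _ = pixel_cross_entropy(logits, target)
    return loss.mean()


def ice_loss(logits, target, gamma=1.0):
    """Hypothetical 'inhibited' CE: each pixel term scaled by a factor in [0, 1]."""
    loss, p_true = pixel_cross_entropy(logits, target)
    factor = (1.0 - p_true) ** gamma  # stand-in suppression factor, not Equation (3)
    return (factor * loss).mean()
```

Because every factor lies in [0, 1] and every per-pixel CE term is non-negative, `ice_loss` is bounded above by `ce_loss`, which matches the ordering of the two curves described in the text.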

Operation Efficiency Analysis.
To analyze the time efficiency of the proposed method, the training time, testing time, and video frame rate are considered. As can be seen from Table 11, the training time of the proposed method is 5367 ms, ranking second, and fewer training parameters are required. The test time is 1720 ms, ranking third.
The video frame rate is 21 fps. In addition, illegal stop detection can be run in two modes: offline detection and real-time detection. Therefore, compared with other methods, the operational efficiency of the proposed method is acceptable.
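The reported frame rate translates directly into a per-frame time budget: at 21 fps, each frame must be processed within 1000 / 21 ≈ 47.6 ms. A minimal timing harness is sketched below; `process_frame` is a hypothetical callable standing in for the detector, and the injectable `clock` parameter is an assumption added so the measurement can be checked deterministically.

```python
import time


def measure_fps(process_frame, frames, clock=time.perf_counter):
    """Run process_frame on every frame and return frames processed per second."""
    start = clock()
    for frame in frames:
        process_frame(frame)
    elapsed = clock() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def per_frame_budget_ms(fps):
    """Maximum processing time per frame (in ms) that sustains the given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(round(per_frame_budget_ms(21), 1))  # → 47.6
```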

Conclusion and Outlook
Based on the improved YOLOv3 network and the PSPNet network, this study proposed a new method for detecting taxi violations. The proposed method can detect two different taxi violations simultaneously, namely license plate occlusion and illegal parking, and can easily be extended to other types of vehicles. Adding the SPP module to YOLOv3 avoids image distortion to a certain extent, solves the problem of repeatedly extracting vehicle image features, and effectively determines whether the license plate is occluded. Another novelty of this study is the application of the PSPNet network to illegal parking detection. By aggregating image information from different regions, the network better captures global information. To achieve real-time detection, we first use the mixed Gaussian background model to extract the clean road background of a video and then semantically segment the background image, which helps locate the spatial position of the vehicle. Experiments on the data set collected by the ITS show that the proposed method performs well in detecting both license plate occlusion and vehicle parking violations: the recognition accuracy of license plate occlusion is 95.1%, and the detection accuracy of taxi illegal parking is more than 96%. However, in actual applications, natural weather conditions (such as fog and rain) may cause deviations in detection and analysis. Therefore, future work will target more complex real-world scenes to achieve accurate and efficient road vehicle detection and analysis.
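The SPP module added to YOLOv3 can be sketched as follows. This assumes the widely used YOLOv3-SPP layout (parallel stride-1 max pooling with kernel sizes 5, 9, and 13 and "same" padding, concatenated with the input); the exact configuration used in this study is not stated in this section, so the kernel sizes are an assumption, and the function names are hypothetical.

```python
import numpy as np


def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on a (C, H, W) float array."""
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    _, h, w = x.shape
    out = np.full_like(x, -np.inf)
    # Take the maximum over every offset inside the k x k window.
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w])
    return out


def spp_block(x, kernels=(5, 9, 13)):
    """Concatenate the input with max-pooled copies at several window sizes."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x] + [max_pool_same(x, k) for k in kernels], axis=0)


if __name__ == "__main__":
    feat = np.arange(2 * 6 * 6).reshape(2, 6, 6)
    print(spp_block(feat).shape)  # → (8, 6, 6)
```

The spatial size is preserved while the channel count grows by a factor of len(kernels) + 1, so each location carries features pooled over several receptive-field sizes without resizing or warping the input.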
Figure 4 shows the overall structure of the PSPNet network model. The PSPNet model uses a convolutional neural network (CNN) to extract features from the input image and sends the feature map to the pyramid pooling module.

Figure 1: The overall flow chart of the proposed method.

Figure 7: Network structure of illegal parking detection.

7.4. Taxi Parking Violation Detection.
In addition, we carried out taxi parking violation detection on the actual ITS image data set. The parking violation data set contains 850 experimental sequences, about 250 of which contain violation behavior. The results of the illegal stop detection experiment are shown in Table 8.

Figure 12: Relationship between training times and loss function.

Table 1: Technical differences in some relevant literature.

Table 2: Precision rate of taxi face.

Table 3: Precision rate of taxi body.

Table 4: Illegal occlusion mode and judgment mode of license plate.

Table 5: Taxi detection results under different methods.

Table 6: License plate violation detection results under different methods.

Table 7: Comparison of license plate violation detection performance under different methods.

Table 8: Detection results of taxi illegal parking under different methods.

Table 9: Comparison of vehicle stopping detection performance under different methods.

Table 10: The ablation experimental results.

Table 11: Time complexity analysis of different methods.