An End-to-End Deep Learning Approach for Plate Recognition in Intelligent Transportation Systems

Accurate and fast recognition of license plates is one of the most important challenges in the ﬁ eld of license plate recognition systems. Due to the high frame rate of surveillance cameras, old license plate recognition systems cannot be used in real-time applications. On the other hand, the presence of natural and arti ﬁ cial noise and di ﬀ erent light and weather conditions make the detection and recognition process of these systems challenging. In this paper, an end-to-end method for e ﬃ ciently detecting and recognizing plates is presented. In the proposed method, vehicles are ﬁ rst detected using a single-shot detector-(SSD-) based deep learning model in the video frames and the input images. This will increase the speed and accuracy in identifying the location of the plate in the given images. Then, the location of the plate is identi ﬁ ed using the proposed architecture based on convolutional networks. Finally, using a deep convolutional network and long short-term memory (LSTM), the characters related to the plate are recognized. An advantage of our method is that the proposed deep network is trained using di ﬀ erent images with di ﬀ erent qualities that leads to high performance in detecting and recognizing plates. Also, considering that in the proposed method the vehicles are ﬁ rst detected and then the plate is detected in the vehicle image, there is no limit in the number of identi ﬁ ed plates. Moreover, plate detection in the vehicle rectangle, instead of the whole frame, speeds up our method. The proposed method is evaluated using several databases. The ﬁ rst part of the evaluation focuses on robustness and recognition speed. The proposed method has the accuracy of 100% for vehicle detection, 100% for plate detection, and 99.37% for character recognition. In the second part of evaluation, the proposed method is evaluated in terms of overall speed. The experimental results witness that the proposed method is capable of processing 30 frames per second without losing any data and also outperforms several methods proposed in recent years, in terms of time and accuracy.


Introduction
Due to the increasing number of vehicles, manually controlling and monitoring traffic is time-consuming, costly, inaccurate, and sometimes impossible. This makes automatic vehicle plate recognition a recurrent research topic. Since the plate is the unique ID of vehicles, several prominent applications are found for automatic plate recognition, including traffic control, driving offence detection, vehicle speed estimation, self-driving vehicles, and surveillance [1]. To this end, many cameras are installed in cities, roads, highways, borders, parking lots, and protected areas for better and more accurate control of vehicles. These cameras are constantly monitoring the images of passing vehicles. Vehicles and their plates cannot be detected and recognized without processing and analyzing these images.
There is a need for a system based on image processing and machine learning to detect vehicles and extract other information such as plate number [2]. In vehicle plate detection and recognition systems, the quality of the input images has a direct impact on the result accuracy and processing speed. Many parameters are influential in the quality of the captured images including environmental and weather conditions such as light projection angle, light intensity, rainfall, fog, dust, humidity, dark, glare, occluded, rainy, snowy, tilted, or blurred scenarios.
There are other concerns in automatic plate recognition systems, including different plate alignments on vehicles, size, plate aspect ratio, various shapes of plates, various angles of camera placement, too low or too high lighting, presence of several plates in the image, various plate background colors, excessive plate dirtiness, and various arrangements of letters and numbers in plates [3,4]. As shown in Figure 1, a vehicle plate detection and recognition system consists of the three main steps: (i) plate detection, (ii) segmentation of characters in the plate, and (iii) character recognition [5].
The most important and challenging step is plate detection. If plates are not detected properly, the subsequent steps will not work as expected and the final result will be entirely wrong.
There have been many attempts to efficiently detect plates. In order to make this step more efficient, a phase of preprocessing is performed to denoise and improve image quality. The location of the plates is then identified in the input image. According to the previous studies on plate detection, existing approaches fall in five categories: edgebased methods, color-based methods, texture analysis methods, methods based on image global features, and hybrid methods. Different methods of image processing and machine vision have been used in each of these categories [6][7][8][9][10][11]. In the next step, plate segmentation is performed. In this step, the detected plate image is first converted to a binary image. Then, based on methods such as morphology [12], connected component analysis algorithm [13], and histogram-based methods, the parts related to characters are separated. The problem with this step is that most binarization methods have acceptable results only for clean plates. Using these methods on dirty plates leads to huge data loss. More precisely, some character parts of the plate may not be recognized as characters or some noncharacter parts may be recognized as characters. As can be seen in Figure 2, some noncharacter parts are considered as characters.
The next step in plate recognition systems is character recognition. This step is divided into two phases. In the first phase, different features are extracted from the segmented parts of the previous step. Then, the training of machine learning models is done based on the extracted features. In the feature extraction phase, various methods are proposed including active regions [14], HOG [15], horizontal and vertical mapping [16], and multiclass AdaBoost methods [17]. Some algorithms use key points locating methods such as SIFT [18] and SURF [19]. After creating the feature vector, classification is done for each section. In various works, artificial neural networks, support vector machine, Bayesian classifier, K nearest neighbor, etc. have been used for character recognition [20,21]. Extracted features can be numerous and not all of them are useful for classification purposes. In other words, some features are redundant and irrelevant. The existence of large number of features makes the machine learning models prone to overfitting. To address this problem, feature selection and dimension reduction are performed in some works.
Nowadays, due to the massive production of labeled data in different phases and increasing computing power, the use of deep learning methods has received considerable attention. Some works propose end-to-end methods for plate detection and recognition. In these works, only one or two of these steps are performed using deep learning methods due to lack of required data to train the deep learning model.
Gou et al. [22] proposed a method based on characterspecific extremal regions and hybrid discriminative restricted Boltzmann machines (HDRBMs). Vertical edge detection, morphological operators, and different evaluations have been used for top-hat transfer plate detection. Specific character regions are identified as candidate regions for plate characters. Then, a trained model of HDRBM is used for character recognition. Their proposed method is resistant to changes in light intensity and weather conditions.
Wang et al. [23] proposed a multifunctional convolutional neural network (CNN) to detect and recognize Chinese vehicle plates with better accuracy and lower computational cost. In their proposed method, the detection network consists of three layers (P-Net, R-Net, and O-Net) and also a fully connected layer. The output of this part is the input of the detection network. In this part, the features are normalized and then the plate characters are recognized using a CNN. Wen-bin et al. [24] used an improved convolutional recurrent neural network to recognize Chinese plates. In this method, deep CNNs, recurrent neural networks (RNNs), spatial transformer networks, and connectionist temporal classification model are combined. In this method, there is no need for plate segmentation that causes erroneous plate detection.
Li et al. [25] proposed a plate recognition method that is a combination based on deep neural networks. In this method, recurrent neural networks with LSTM were used for plate feature extraction. The extracted sequence features in this part are given to a 37-class CNN for character detection. The advantage of using this method is that there is no need for plate segmentation.
The differences between the character features in terms of width, height, distance, and presence of noise such as heavy shadows, uneven light, different optical geometries, and poor contrast are challenging in plate recognition systems. Bulan et al. [26] used a two-stage classification method plate detection. For this purpose, the plate candidate points are first classified using a weak classifier, and then, the plate main points are detected using a strong classifier based on deep convolutional networks. After that, using the ALexnet architecture, features are extracted from the regions extracted from the previous step, and using support vector machine classification, the character detection is done based on the features obtained from the previous step.
In [27], an end-to-end method is proposed to detect and recognize vehicle plates. In this method, features are first extracted from the input image using CNNs. Plate regions are then identified based on the deep recurrent network.

Wireless Communications and Mobile Computing
After that, the plate characters are recognized using convolutional and recurrent networks. Performing these steps makes the model to perform with acceptable speed. In [28], fully convolutional networks are combined with a broad learning system. A fully convolutional network, which has been designed as a two-stage classification method at the pixel level, is used to detect objects by combining multiscale and hierarchical features. The AdaBoost cascade classifier was used to classify characters, and an extensive learning system was used to recognize characters.
Chen et al. [29] proposed a method based on CNNs for plate detection. In this method, an end-to-end network is proposed that simultaneously detects vehicles and plates. Different convolutional layers have been used in this method. Gao et al. [30] proposed an end-to-end network for plate detection and recognition based on encoder and decoder. The efficiency of traditional plate detection methods is affected by several factors including light intensity, shadow, and complex background. Using deep learning methods, plate recognition algorithms can extract deep features and improve the plate detection and recognition rate [31].
An efficient shared adversarial training network has been proposed for plate detection in [32]. This model can learn environment independent semantic features without perspective from real plates using prior knowledge of standard plates.
Authors in [33][34][35] considered using YOLO3 architecture for plate detection and recognition. In [33], YOLO3 is used for Brazilian plate detection and a three-layer convolutional network is used for character recognition. Authors in [34] used a two-stage convolutional layer in YOLO3 architecture in order to extract temporal features from plates and to reduce false positive detection rate. Authors in [35] used seven convolutional layers for plate detection and character recognition.
In the current paper, due to the massive production of labeled data in different parts, an end-to-end method for plate detection and recognition is proposed. Vehicles are first detected using the SSD-based deep learning model in the video frames and the input images. This increases the speed and accuracy in identifying the location of the plate in the given image. Vehicle detection in the proposed method makes it suitable for smart transportation systems. In this mode, traffic volume measurement can be done based on vehicle detection. Moreover, vehicle movement direction and speed and also stopped vehicles in highways can be detected. In the next step, the location of the plate is identified using the proposed architecture based on convolutional networks. Finally, the characters related to the plate are detected and recognized using the deep convolutional network and LSTM.
An advantage of our approach is that the proposed deep network is trained with different images with different qualities that leads to high performance in detecting and recognizing plates. Apart from that, the vehicles are first detected and then the plates are detected in the vehicle images, instead of the whole image that speeds up our method. Moreover, there is no limit in the number of identified plates.
In real-time systems, analysis is performed on video frames captured by monitoring cameras. These cameras capture between 30 and 90 frames per second. Conventional systems can process 1 to 2 frames per second, while the proposed method is capable of processing 30 frames per second without losing any data. The rest of this paper is organized as follows. Section 2 presents the proposed method and describes its architecture in detail. Section 3 describes the experimental evaluation and compares the proposed method to other methods. Finally, Section 4 concludes the paper.

Proposed Method
Most of plate recognition methods fail to detect all the plates when there are many of them in the frame being processed. This makes these methods inapplicable for real-time situations when there can be many plates in every frame and the response time is limited. To address this issue, an SSD architecture is used in the proposed approach to make the processing speed fast enough for real-time plate recognition. In the proposed method, first vehicles are detected. Then, based on the deep learning architecture, in two phases, plates  are first detected and then the text of the plates are extracted using a deep network with convolutional layers and recurrent layers. The proposed method is an end-to-end method that detects vehicles and plates simultaneously and recognizes the plates. The following sections explain different steps of this method.

Vehicle Detection.
In order to achieve a better detection rate in real-time applications, it is better to start the process with vehicle detection. Conventional systems based on deep learning look for plate in the whole frame. Since plates compose a very small part of a frame, this process makes the existing approach slow. In the proposed method, vehicles are first detected in order to avoid processing of the parts of the frame that are unrelated to vehicles. This step eliminates a great deal of unnecessary processing effort and therefore speeds up the overall process. Vehicle detection is done using a deep network composed of several convolutional layers.
Vehicle detection is done based on the high-level features that are extracted by the designed convolutional network. In fact, the proposed method combines low-level featured in primary convolutional layer and high-level features extracted by convolutional and recurrent network to detect vehicles. Moreover, several layers are used for various feature extractions from input images. Since the kernels in various layers vary in size, features are extracted with different details. This helps to have images with different details and to extract relevant and discriminant features. Figure 3 depicts the proposed architecture for vehicle detection.
In [36], we proposed a practical method that produces data for various phases of Iranian plate recognition system. Using deep learning methods requires sufficient data for training models. Due to the lack of data for Iranian plate recognition, the current paper uses the proposed approach in [36].
In the proposed architecture, a model based on VGG16 is used for vehicle detection, that is followed by convolutional layers. These layers decrease in size as we move forward such that we are able to detect vehicles with different sizes. Each layer extracts a different high-level feature from vehicle for vehicle detection.
In the proposed model, training objective is derived from the multibox objective [37,38] but is extended to handle multiple object categories. Let x p i,j = f0, 1g be an indicator for matching the i-th default box to the j-th ground truth box of category p. In the matching strategy above, we can (1) presents the overall objective loss function is a weighted sum of the localization loss (loc) and the confidence loss, where N is the number of matched default boxes.
If N = 0, the loss is set to 0. The localization loss is a smooth L 1 loss [33] between the predicted box (l) and the ground truth box (g) parameters. The weight term α is set to 1 by cross validation.

Plate Detection.
Most of plate images are noisy due to environmental conditions. Using Gaussian filter is a common way to address this issue. Gaussian filter is used as a preprocessing step in image processing and is a linear filter that smooths and removes noise. The first activity in this step is to improve quality of the vehicle image using Gaussian filter. Then, a deep convolutional network-based method is proposed in order to detect the plate location. Figure 4 shows the proposed architecture for plate detection. The input resolutions in deep network are 320 × 320, 416 × 416, and other multiples of 32. Therefore, the input vehicle image is resized to a 1 : 1 square, in the proposed method. Deep network preserves the aspect ratio of the images, and if the input image is not in a square shape, it automatically adds black bars to conform to a square shape.
In the proposed method, 416 × 416 resolution is used for plate detection. Plate detection works the best using this value, which is obtained from several experiments on the size of Iranian vehicles. Deep architecture of proposed method performs segmentation of size M × M, where M is the size of window that can have different values. Two parameters should be given in prior for plate detection process: N that is the number of bounding boxes and S that is the confidence score for each bounding box and determines whether there is an object inside the box or the probability of its existence. We should calculate evaluation metric such as the Intersection over Union (IoU) to compare predicted bounding boxes and the dimensions and location of ground truth, to measure the S parameter. The IoU is computed as shown in Equation (2)   Wireless Communications and Mobile Computing a similar location and dimension, to a large extent.
For the sake of better performance, other parameters such as confidence threshold and anchors are constant based on the default setting. There are special features that are extracted by the deep architecture of the proposed method. These features include maximum and minimum length and height, maximum and minimum number of pixels, and area. Spall plates with unusual resolution are ignored in the proposed architecture. Classification is done in the late stage of process. Those

Character Recognition. Character detection and recognition is one of the most important steps in plate recognition.
In this paper, a method based on convolutional and recurrent networks is used for plate detection. LSTM architecture is very efficient here due to the fact that characters in Iranian plates follow a specific pattern. This method is fast enough for real-time applications since plate images enter to the network, and in the end, characters are recognized.
Unlike classic methods, this method does not require separate stages of segmentation, feature extraction, and classification. Figure 5 depicts the proposed deep architecture for plate detection that contains convolutional and recurrent layers. The network configuration used in our experiments is summarized in Table 1. The main advantage of the proposed architecture is that it only requires plate images and their label sequences and there is no need for plate segmentation. Moreover, various features must be extracted from plate for classifying characters to 37 possible classes. Convolutional networks help to extract these discriminant features. Features extracted from plates using CNN enter LSTM networks. Using LTSM units in RNN is very useful since characters and digits have a special order in Iranian plates. In this method, features are extracted from plate image based on shape and sequence using convolutional layers. These features are used as the input of recurrent layers in the end part of network. These layers consider all the sequence record. LSTM units are used that contain memory cells and gates instead of RNN units, in order to avoid gradient vanishing. In this stage, characters are classified in 37 classes by the network. These classes include 10 digits (0 to 9) and 24 characters that are shown in Table 2. There is also a special character for disabled drivers. In this kind of plates, an image of a wheelchair replaces a Persian character. Moreover, there are two English characters, S and D, that are used for special purposes. It is worthy to mention that scarcity of plates with special characters prevents training networks with these characters. To address this issue, we used deep networks to produce plates with these special characters. Samples of these plates are shown in Figure 6. This paper also proposes a method based on generative adversarial networks (GAN) to produce plate images with different qualities. Existing data is used to train the network in the proposed method; then, this trained network is used to produce various plate images.

Experimental Results
Since the proposed method is based on deep learning, there is a need for a large volume of data that are labeled in various phases. To this end, data are gathered from many cameras in streets and highways during several days and nights under various illumination and weather conditions. There are over four million frames, and each of which contains several, one, or no vehicle.
Classic methods are used for labeling data in vehicle detection, plate detection, and plate recognition process. The proposed model is trained using these frames that are labeled for vehicle detection. Three million frames are used for plate detection and recognition. Images have various illu-mination, shadow, reflection, weather conditions, and qualities in order to better train the model. It is worthy to mention that all hyperparameters are chosen based on numerous random search tests. First, a set of hyperparameters was chosen and the model was trained using the training data. The set of hyperparameter was then evaluated based on the testing data. This process was repeated, and the hyperparameters with highest accuracy were chosen. All hyperparameters keep their values in different datasets.
In this step, the neural network presented in Section 2.1 is used in order to address the problem of scarcity of plates with special character; some of which are shown in Figure 6. Plates containing these characters are rare and belong to few organizations. Figure 7 shows several plate samples in our database. It should be noted that background color of Iranian plates can be white, red, blue, yellow, or green. The advantages of deep learning-based methods over other methods such as color-based methods are that they are not dependent on color, have lower fault rate, and can be used under different illumination situation such as high, low, and uneven illumination. In the proposed method, vehicles are first detected using deep networks to expedite the plate detection process.
The output of the proposed method in the vehicle detection phase with unfixed shooting angles and different qualities in real scenes is shown in Figure 8. As it can be seen, the proposed method is capable of detecting vehicles under various illumination conditions and from front, back, and different angles. The proposed method is able to detect vehicles with high accuracy owing to extraction of highlevel features for vehicle detection process. The advantages of this method is that there is no limitation in the number Table 1: Network configuration summary: "k", "s", and "p" stand for kernel size, stride, and padding size, respectively. Wireless Communications and Mobile Computing of vehicles and the vehicles can even occlude in the image. The proposed model is trained using images with different qualities, shooting angles, and illumination conditions in order to improve detection rate. Training images also include images taken during day and night and different weather conditions. This phase of the proposed method is    7 Wireless Communications and Mobile Computing based on labeled data, and the detection rate is 100%. This detection rate guarantees that no vehicle is missed by the proposed method.
The next step is plate detection where the detected vehicle image from the previous step is searched instead of the whole image. This considerably speeds up the plate detection process. Training the proposed deep network is done using three million images with labeled plates. Image processing methods are used for labeling images and detecting plates. There are images with different qualities and shooting angles in the training set in order to improve the detection rate in the proposed method. Images with different illumination condition are also present in the training set. In the proposed method, different plate features including maximum and minimum length and width and plate area are extracted based on the training dataset. Figure 9 depicts output of the proposed method for six images.

Wireless Communications and Mobile Computing
As it can be seen, the proposed method is able to detect plates in images with different shooting angles (including front and back), distances, and illumination conditions. Detection rate in this step is 100%. In the next step, the characters of the detected plates should be recognized. To this end, the convolutional and recurrent network mentioned in Section 2.3 needs to be trained using different plates. Then, the proposed model is evaluated based on the test data.
Since the proposed model is trained using different datasets, different features from plates are extracted in various layers. These features include edges, corners, and structures and are extracted based on CNN layers. Moreover, in LSTM layers, sequence features are extracted. This architecture helps to successfully extract discriminant features for character recognition.
The experiments on the test data show that the detection rate of all characters in the proposed method is 99.37%. Moreover, character detection rate is 99.92% that is calculated based on each character and not the whole plate. Choosing a proper optimizer function is a determining factor. The proposed model is tested with several optimizers.
The validation results using different optimizers are shown in Figure 10. As it can be seen, Adma-based optimizer functions outperform other functions and have a recognition rate of more than 99%. Ftrl optimizer does not  properly fit in the proposed method and shows plate recognition rate of 0% and character recognition rate of 37%. Therefore, Adadelta optimizer function that is based on Adam is used in the proposed method. In order to examine the convergence of the proposed method, loss diagram of the proposed method is shown with two optimizer functions of SGD and Adagrad in Figures 11 and 12, respectively. As it can be seen, loss rates gradually converge to zero on the training and testing dataset. It shows that the model parameters are well learned based on the input data and lead to loss rate reduction during repetitive process of learning. In Figures 11 and 12, the horizontal axis presents epoch in the evaluation steps in the training and validation sets, and the vertical axis presents amount of loss in these sets.

Comparison with Other
Methods. Several datasets are used in order to compare the proposed method with other methods in terms of efficiency. Table 3 shows this comparison using several datasets containing images of Iranian vehicles in terms of efficiency in different steps. According to the results, the proposed method performs well for dirty and low-quality images. In this part, used datasets contain images of already cropped vehicles in order to evaluate the proposed method in the plate detection step. Plate detection rate and character recognition rate are 100% in the proposed method. The images in the used dataset are captured by handheld cameras and have good quality. Table 4 shows the result of experiments with four criteria to evaluate the proposed method and compare it with other methods. These criteria are plate characters, number of plates, image size, and processing time. Plate detection and recognition process are considered in this set of experiments. The advantage of the proposed method over other methods, that do not use deep learning, is that it is segmentation-free. This improves recognition rate and also speeds up the plate detection and recognition process.
As it can be seen in Table 4, the proposed method shows a great performance in vehicle and plate detection, and its accuracy is 100%. It should be noted that the overall accuracy (Overall Acc) is the amount of accuracy when plate detection and character recognition are done simultaneously.

Real-Time Evaluation.
This set of experiments focuses on applicability of the proposed method in real-time situations. Here, the input is a video with 30 frames per second. Plate detection is a time-consuming step in plate recognition process and it gets more complicated as the number of plates increases in the input image. In the proposed method, vehicles are first detected to speed up the plate detection process. In this way, exploration is done only in the detected vehicle image instead of the whole image. The character recognition process becomes slower as the number of vehicles in the image increases; therefore, inputs with different number of vehicles are used in order to demonstrate applicability of the proposed method. Table 5 shows average processing times for a 10-time execution of the proposed method in each step for different scenarios. The total processing time increases as the number of vehicles increases. As it can be seen, the proposed method is able to be used in real-time situation even with many vehicles in a single image.

Conclusion
Monitoring systems are necessary in highways and roads, especially due to the increasing number of vehicles and increasing traffic volume. Plates are used to recognize vehicles in videos and images captured by monitoring cameras, since plates are the unique identification of registered vehicles. In this paper, vehicle plate recognition contains three steps: vehicle detection, plate detection, and character recognition. The proposed method in this paper detects incoming vehicles in each frame that are inputs for the next step (plate detection and extraction). CNN is used for vehicles and plate detection, and deep model with CNN and RNN layer is used for character recognition.
The proposed method is evaluated using the frames of an input video. The experimental evaluations show that the proposed method is capable of performing all three steps of vehicle detection, plate detection, and character recognition efficiently with negligible error rates. Moreover, the proposed method is shown to be viable for real-time applications based on its response time. The proposed method in this paper can substantially improve plate recognition accuracy based on the obtained results.

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.