Automatic Traffic State Recognition Based on Video Features Extracted by an Autoencoder

Video surveillance has become an important means of urban traffic monitoring and control. However, because video scenes are complex and diverse, extracting traffic data from raw videos is a difficult task, and the corresponding algorithms have high complexity and computational cost. To reduce the algorithm complexity and subsequent computation cost, this study proposes an autoencoder model that effectively reduces the video dimension by optimizing structural parameters, so that several traffic recognition models can work on dimension-reduced videos. First, an optimal autoencoder model A* with five hidden layers was constructed. Then, it was combined with a linear classifier, support vector machine (SVM), deep neural network (DNN), DNN linear classification method, and the k-means clustering method; thus, five traffic state recognition models were constructed: A*-Linear, A*-SVM, A*-DNN, A*-DNN_Linear, and A*-k-means. Training and test results show that the accuracy rate and recall rate of A*-Linear, A*-SVM, A*-DNN, and A*-DNN_Linear were 94.5%–97.1%, and the F1 score was 94.4%–97.1%; in addition, the accuracy rate, recall rate, and F1 score of A*-k-means were all approximately 95%, which suggests that combining the autoencoder A* with common classification or clustering methods achieves good recognition performance. The five proposed models were also compared with four CNN-based models, namely AlexNet, LeNet, GoogLeNet, and VGG16. The five proposed models achieve F1 scores of 94.4%–97.1%, while the four CNN-based models achieve F1 scores of 16.7%–94%, indicating that the proposed lightweight design outperforms more complex CNN-based models in traffic state recognition.


Introduction
With rapid urbanization and sustained economic growth, traffic congestion has increasingly become a key issue in large and medium-sized cities. To effectively control traffic operations and alleviate congestion, it is very important to recognize the traffic state in a timely and effective manner. Extensive studies have been conducted on traffic state recognition using a variety of methods, such as fuzzy logic [1], the fuzzy C-means clustering algorithm [2,3], machine learning [4,5], and the K-means clustering algorithm [6]. Data used in these studies came mainly from fixed surveillance systems such as cameras and vehicle detectors, as well as from vehicle GPS [7,8]. With the enrichment of urban traffic surveillance video resources and the progress of image processing technology, traffic state recognition based on video images has gained wide attention.
A large number of studies have been conducted on traffic state recognition based on video images. Traditionally, traffic state recognition was achieved by establishing dedicated algorithms to extract traffic image characteristics [9,10]. Because these algorithms had high complexity, they could not cope with the rapid growth of traffic surveillance video. To avoid the extraction, analysis, and modeling of complex image features, deep learning methods have recently been used to achieve rapid and effective image pattern recognition through large-scale image training. The convolutional neural network (CNN) is one of the most common deep learning methods and has achieved great success in image classification and object detection tasks [11,12]. However, to the best of our knowledge, few studies have used the CNN for traffic state recognition based on video images.
Video images often have high dimensionality and contain complex and redundant information. To recognize traffic states in a timely and effective manner, attempts have been made to reduce the dimensionality of video images and then perform image pattern recognition on the dimension-reduced data [13]. Autoencoders, which are nonlinear and unsupervised neural network models that include an input layer, a hidden layer, and an output layer, can effectively reduce the dimensionality of video image data and realize image classification [14].
Traffic data extraction from raw videos is a difficult task because of the complex and diverse video scenes, and the corresponding algorithms have high complexity and computational cost. In this study, to reduce the algorithm complexity and subsequent computation cost, an autoencoder model is proposed that effectively reduces the video dimension by optimizing structural parameters. Then, a traffic state recognition method is established by integrating the autoencoder with common classification and clustering methods. The training and test results show that the proposed method has a lightweight model structure and low computational cost, and that it outperforms CNN models such as AlexNet and GoogLeNet. Therefore, it is suitable for traffic state recognition from videos.

Background
This section explains the relevant background of video-based traffic state recognition and reviews related work, including traditional traffic video processing, convolutional neural networks, and autoencoders.
Research on video-based traffic state recognition has been fruitful. Earlier research mainly achieved traffic state recognition by establishing dedicated algorithms to extract traffic image characteristics. In one approach, discrete cosine transform and motion characteristics were extracted from traffic surveillance videos, and a hidden Markov model with fuzzy C-means was used to learn and recognize the traffic congestion state. A fuzzy logic method for traffic state recognition based on camera images was proposed. A traffic state recognition method using traffic monitoring videos based on the fuzzy C-means clustering algorithm was also proposed. A method for recognizing three traffic states (free traffic flow, steady traffic flow, and forced traffic flow) based on images and video data was presented in [15], and an image-based traffic density estimation method was proposed in which a support vector machine classifier was used to classify traffic states into heavy, medium, and light densities. In general, algorithms of this type have high complexity and poor real-time video processing ability.
The convolutional neural network (CNN) achieves rapid and effective image pattern recognition through large-scale image training, which avoids the extraction, analysis, and modeling of complex image features. It is one of the most common deep learning methods and has achieved great success in image classification and object detection tasks. In 2012, Krizhevsky et al. introduced AlexNet, which ranked first in the ImageNet image classification competition [16]. Since then, the network structure has been improved, and other CNN models, such as the visual geometry group (VGG) network and GoogLeNet, have been proposed. A UAV image vehicle recognition method based on CNN and SVM was proposed in [17]. A satellite image vehicle detection method based on CNN and hard case mining was put forward in [18]. The authors of [19] used the MIT and Caltech vehicle datasets and proposed a vehicle detection method for traffic scenes based on an improved Faster R-CNN. The CNN has gained extensive interest in large-scale image classification, but few studies have used the CNN for traffic state recognition based on video images.
Autoencoders, which are nonlinear and unsupervised neural network models, can effectively reduce the dimensionality of video image data [20,21]. A supervised stacked autoencoder to extract image features was constructed in [22]. In [23], the authors integrated a random forest classifier into the stacked sparse autoencoder for hyperspectral image classification, which produced promising generalization performance, prediction accuracy, and operation speed. A dual adversarial autoencoder for image clustering was proposed in [24], which achieved clustering accuracy comparable to that of CNN models. In [25], an image compression architecture based on energy compression using a convolutional autoencoder was established, which achieved high coding efficiency. In [26], the authors used an autoencoder to project high-dimensional image feature vectors into a low-dimensional latent space, proposed an image selection method based on autoencoder neural networks, and then applied it to semisupervised image classification. An enhanced fast NSGA-II based on a special congestion strategy and an adaptive crossover strategy, namely ASDNSGA-II, was proposed to improve the selection strategy in [27].
The studies above demonstrate that autoencoders can effectively extract image features through dimension reduction and show good image recognition results when combined with classification or clustering methods.

Methodological Framework.
In this study, an autoencoder with multiple hidden layers was used to compress the image features. Based on the compressed low-dimensional data, classification and clustering methods were used for traffic state recognition. The proposed method consisted of four steps (Figure 1), described as follows:
(1) Establishment of image datasets. Traffic images were generated from traffic surveillance videos. The traffic state was determined manually for each frame, and an image dataset with three traffic states (i.e., free traffic flow, steady traffic flow, and congested traffic flow) was established.
(2) Image data preprocessing. The image matrix was transformed into a row vector, and the vector element values were normalized and taken as the input of the autoencoder.
(3) Construction of the autoencoder. The influence of the main structural parameters, such as the dimensionality of the input data, the number of hidden layers, and the dimensionality of the dimension-reduced data, on the performance of the autoencoder was evaluated, and as a result, an autoencoder suitable for traffic image processing was constructed. The autoencoder was trained, tested, and optimized based on feedback.
(4) Traffic state recognition. The support vector machine (SVM), deep neural network (DNN), and k-means clustering method were used to classify the feature data obtained using the autoencoder, thereby achieving traffic state recognition.
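The preprocessing in step (2) can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a list of rows and normalization to [0, 1] by dividing by 255; the text does not state the normalization scheme, and the function name `preprocess` is hypothetical.

```python
def preprocess(image, max_value=255.0):
    """Flatten a 2-D grayscale image (list of rows) into a normalized row vector.

    Assumption: pixel values lie in [0, max_value]; dividing by max_value is
    one common normalization, not necessarily the one used in the study.
    """
    row_vector = [pixel for row in image for pixel in row]
    return [pixel / max_value for pixel in row_vector]


# Toy 2 x 2 image; a real frame would be resized to, e.g., 32 x 32 first.
image = [[0, 128], [255, 64]]
vector = preprocess(image)
```

The resulting vector has l × h elements and can be fed directly to the input layer of the autoencoder.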

Autoencoder Construction.
The autoencoder was composed of an encoder and a decoder. The encoder extracted the features of the original image data, and the decoder restored the feature data. The smaller the difference between the restored data and the original data, the better the autoencoder and the more effective the extracted feature data.
Traffic surveillance video images were taken as the input of the autoencoder. Both the encoder and decoder had $N$ hidden layers, and the network had a symmetrical structure. There was a fully connected weighting structure between each pair of layers, as shown in Figure 2. Taking the $q$th image in an image dataset with a sample size $Q$ as an example, the operation of the autoencoder is as follows:
(1) The $q$th image is converted into a $1 \times \tau_0$ vector
$$X_0^q = \left[x_1^q, \ldots, x_i^q, \ldots, x_{\tau_0}^q\right],$$
where $q \in [1, Q]$, $\tau_0 = l \times h$, $l$ and $h$ are the width and height of the image in pixels, and $i$ is the element index of $X_0^q$.
(2) Encoding: $N$ encoding hidden layers were used to perform a nonlinear transformation of $X_0^q$ to obtain $N$ encoding vectors $X_1^q, \ldots, X_n^q, \ldots, X_N^q$. The calculation equation is as follows:
$$X_n^q = \Gamma\left(w_n^q X_{n-1}^q + z_n^q\right),$$
where $\Gamma$ is the encoding nonlinear activation function, which was the hyperbolic tangent function $\tanh x$ in this study; $w_n^q$ is the encoding weight matrix of the $n$th encoding hidden layer for the $q$th image; $X_N^q$ represents the encoding vector of the $q$th image; $z_n^q$ represents the encoding offset matrix of the $n$th encoding hidden layer for the $q$th image; and $n$ represents the encoding hidden layer number, $n \in [1, N]$. The calculation equation of $\tanh x$ is as follows:
$$\tanh x = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}},$$
where $x$ is equal to $w_n^q X_{n-1}^q + z_n^q$.
(3) Image decoding: $N$ decoding hidden layers were used to perform a nonlinear transformation of $X_N^q$ to obtain decoding vectors with the dimensions of $X_N^q, \ldots, X_\varphi^q, \ldots, X_2^q, X_1^q$, respectively, which were finally restored to the output vector $\hat{X}_0^q$ with a dimension of $1 \times \tau_0$. Each decoding hidden layer applies a transformation of the same form as the encoding layers to the output of the preceding layer, where $\Gamma'$ is the decoding nonlinear transformation function, which was the hyperbolic tangent function $\tanh(\cdot)$ in this study; $w_\varphi'^{\,q}$ is the decoding weight matrix of the $\varphi$th decoding hidden layer for the $q$th image; $z_\varphi'^{\,q}$ is the decoding offset matrix of the $\varphi$th decoding hidden layer for the $q$th image; and $\varphi$ is the decoding hidden layer number, $\varphi \in [1, N]$.
(4) Feedback optimization of encoding-decoding training: with the least mean square (LMS) error as the objective, the autoencoder is trained through continuous backpropagation to minimize the error between the input $X_0^q$ and the restored output $\hat{X}_0^q$, which yields the optimal autoencoder structure and encoding data. The error loss is the mean squared difference between $X_0^q$ and $\hat{X}_0^q$.
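As an illustration of steps (1) to (4), the following minimal sketch runs one forward pass of a symmetric tanh autoencoder and evaluates the mean squared reconstruction error. It is not the authors' implementation: the layer sizes, the random initialization, and all function names are assumptions made for the example, and no training is performed.

```python
import math
import random


def dense_tanh(x, weights, bias):
    """One fully connected layer: tanh(w x + z), computed element by element."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero offsets (an arbitrary illustrative choice)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_autoencoder(sizes, rng):
    """sizes = [tau_0, tau_1, ..., tau_N]; the decoder mirrors the encoder."""
    encoder = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
    mirrored = sizes[::-1]
    decoder = [init_layer(a, b, rng) for a, b in zip(mirrored[:-1], mirrored[1:])]
    return encoder, decoder


def forward(x, layers):
    """Apply each layer in turn and return the final vector."""
    for weights, bias in layers:
        x = dense_tanh(x, weights, bias)
    return x


def mse(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


rng = random.Random(0)
sizes = [16, 8, 4]                      # toy dimensions, N = 2 hidden layers
encoder, decoder = build_autoencoder(sizes, rng)
x0 = [rng.random() for _ in range(sizes[0])]
code = forward(x0, encoder)             # dimension-reduced vector X_N
x0_hat = forward(code, decoder)         # restored vector
loss = mse(x0, x0_hat)
```

In practice the weights and offsets would be updated by backpropagation until the loss stops decreasing; the sketch only shows how the encoding vector and the loss are computed.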

Traffic State Recognition.
In this study, the dimension-reduced data $X_N = \left[X_N^1, \ldots, X_N^q, \ldots, X_N^Q\right]$ obtained using the optimal autoencoder A* were used as the input of the classification and clustering methods for traffic state recognition, including the linear classifier [28], SVM [29], DNN [30], and DNN linear combined classifier [28].
Here, $X_N^q = \left[x_1^q, \ldots, x_i^q, \ldots, x_{\tau_N}^q\right]$ is the encoding vector of the $q$th image, $x_i^q$ is the value of its $i$th element, and $Q$ represents the sample size of the dataset, i.e., the number of images.
(1) The linear classifier, a typical supervised learning model, finds the optimal hyperplane that separates two classes of samples according to the characteristics of the samples. Taking $X_N^q = \left[x_1^q, \ldots, x_{\tau_N}^q\right]$ as an example, let the optimal hyperplane be
$$\kappa \left(X_N^q\right)^{T} + \kappa_0 = 0. \quad (14)$$
The left part of the equation can be defined as a linear discriminant function:
$$g(x^q) = \kappa \left(X_N^q\right)^{T} + \kappa_0, \quad (15)$$
where $\kappa = \left[\kappa_1, \kappa_2, \ldots, \kappa_{\tau_N}\right]$ are the weights and $\kappa_0$ is the offset. The linear classifier $g(x^q)$ divides the feature space into two regions, $g(x^q) > 0$ and $g(x^q) < 0$; in other words, the data points are divided into two categories, $\sigma_1$ and $\sigma_2$. The optimal hyperplane can be obtained by training on the sample data.
(2) As a linear classifier, SVM is widely used in image processing, face detection, speech recognition, and so on. The dimension-reduced data corresponding to each image in $X_N$ were labeled with a category label $\pi \in \{-1, 1\}$, which indicates that the data are to be divided into two categories. Then, a training set $U = \left\{(X_N^1, \pi_1), (X_N^2, \pi_2), \ldots, (X_N^q, \pi_q), \ldots, (X_N^Q, \pi_Q)\right\}$ was established to train the SVM, thereby obtaining the optimal hyperplane to classify the samples.
Here, $\mu \in \mathbb{R}^n$ and $b \in \mathbb{R}$ represent the normal vector and intercept of the hyperplane, respectively. By introducing a slack variable $\alpha$, the SVM can be expressed in matrix form as the optimization problem (18), in which $(X_N)^T$ is the transpose of the matrix $X_N$, $\nu$ is the penalty parameter, and $e$ is the vector in which each element is 1. By solving the optimization problem (18), $\mu$ and $b$ can be obtained.
(3) The DNN mainly includes an input layer, hidden layers, and an output layer. There is a fully connected structure of weights between each pair of layers. This method is widely used in image recognition, natural language processing, speech recognition, and so on. Its structure is shown in Figure 3. The input layer has $\Upsilon_N$ neurons, which receive the element values of the encoding vector $X_N^q$. After processing by $M$ hidden layers (with dimensions $1 \times \Omega_1, 1 \times \Omega_2, \ldots, 1 \times \Omega_M$), the probability of each category $\varsigma$ is obtained. The sample category is then determined using the softmax function.
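Two of the decision rules described above can be sketched compactly: the linear discriminant function $g(x)$ used by the linear classifier, and the softmax function used at the DNN output. The weights, offset, and scores below are made-up numbers for illustration only, and assigning $g(x) = 0$ to $\sigma_2$ is an arbitrary convention of this sketch.

```python
import math


def linear_discriminant(x, kappa, kappa_0):
    """g(x) = kappa . x + kappa_0 for one encoding vector x."""
    return sum(k * v for k, v in zip(kappa, x)) + kappa_0


def classify_binary(x, kappa, kappa_0):
    """Return 'sigma_1' when g(x) > 0 and 'sigma_2' otherwise."""
    return "sigma_1" if linear_discriminant(x, kappa, kappa_0) > 0 else "sigma_2"


def softmax(scores):
    """Convert raw output scores into class probabilities."""
    shift = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


kappa, kappa_0 = [1.0, -1.0], 0.5             # hypothetical trained parameters
label = classify_binary([2.0, 1.0], kappa, kappa_0)   # g = 2 - 1 + 0.5 = 1.5

probs = softmax([2.0, 1.0, 0.1])              # hypothetical scores for 3 states
predicted = max(range(len(probs)), key=probs.__getitem__)
```

With three traffic states, the softmax output has three entries, and the predicted state is the index with the largest probability.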

Dataset.
The original data consisted of traffic surveillance videos of a main road in a Chinese city, with an image resolution of 704 × 576 pixels. The images were extracted and divided into three traffic states: free traffic flow (free), steady traffic flow (steady), and congested traffic flow (congested). Two datasets were constructed: A1, with a sample size of 1500, was used to evaluate the influence of the autoencoder structural parameters, and A2, with a sample size of 6000, was used to optimize the autoencoder and for traffic state recognition. The number of images was the same for each traffic state in both datasets. The datasets are shown in Figure 4.

Model Training.
The training parameters included the basic learning rate, batch training size, and the number of iterations. During model training, the adaptive moment estimation (Adam) iterative optimization algorithm [31] was used to dynamically adjust the learning rate of each parameter using the first-order and second-order moment estimates. The equations are as follows:
$$\chi_t = \eta \chi_{t-1} + (1 - \eta) g_t,$$
$$\vartheta_t = \rho \vartheta_{t-1} + (1 - \rho) g_t^2,$$
$$\hat{\chi}_t = \frac{\chi_t}{1 - \eta^t}, \qquad \hat{\vartheta}_t = \frac{\vartheta_t}{1 - \rho^t},$$
$$\upsilon_t = \upsilon_{t-1} - \beta \frac{\hat{\chi}_t}{\sqrt{\hat{\vartheta}_t} + \varepsilon},$$
where $g_t$ is the gradient at time step $t$; $\chi_t$ is the first-order moment estimate of $g_t$, i.e., the exponential moving average of $g_t$; $\vartheta_t$ is the second-order moment estimate of $g_t$, i.e., the exponential moving average of $g_t^2$; $\eta$ and $\rho$ are the exponential decay rates; $\hat{\chi}_t$ is the bias correction of $\chi_t$; $\hat{\vartheta}_t$ is the bias correction of $\vartheta_t$; $\upsilon_t$ is the parameter vector at time step $t$; the default learning rate is $\beta = 0.001$; and $\varepsilon$ is a small constant that prevents division by zero, $\varepsilon = 10^{-8}$. The Adam algorithm has a low memory requirement during computation. After bias correction, the learning rate at each iteration stays within a bounded range, which keeps the parameters stable. Therefore, the Adam algorithm was suitable for training and optimization on the large datasets and high-dimensional data addressed in this study.
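The Adam update can be written as a short routine. The sketch below is a plain-Python illustration, not the training code of the study; the decay rates 0.9 and 0.999 are the defaults commonly used with Adam and are an assumption here, since the text does not report the values of η and ρ. The toy objective f(x) = x² is likewise only for demonstration.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of scalar parameters.

    m and v hold the first- and second-order moment estimates; t is the
    1-based time step used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g          # moving average of g
        v_i = beta2 * v_i + (1 - beta2) * g * g      # moving average of g^2
        m_hat = m_i / (1 - beta1 ** t)               # bias-corrected moments
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Single step from x = 1.0: the first update has magnitude close to lr.
first, _, _ = adam_step([1.0], [2.0], [0.0], [0.0], t=1, lr=0.01)

# Minimize f(x) = x^2 (gradient 2x) for 2000 steps.
params, m, v = [1.0], [0.0], [0.0]
for t in range(1, 2001):
    grads = [2 * p for p in params]
    params, m, v = adam_step(params, grads, m, v, t, lr=0.01)
```

Because the step is normalized by the square root of the second moment, the size of each update is roughly bounded by the learning rate, which is the stability property referred to above.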

Evaluation of the Autoencoder.
According to the loss distribution curve of the autoencoder, two indexes were used to measure its convergence and evaluate the performance of the autoencoder.
(1) A threshold $\Theta$ was set for the loss. When the training loss of the autoencoder fell below $\Theta$ at iteration step $\theta$, the average loss $L_\Theta$ over steps $\theta$ to $D$ was calculated, as shown in Figure 5:
$$L_\Theta = \frac{1}{D - \theta + 1} \sum_{j=\theta}^{D} L_j,$$
where $D$ is the total number of training iterations, $L_j$ is the loss of the $j$th iteration, $j \in [\theta, D]$, and $\Theta$ is the loss threshold. The smaller the value of $L_\Theta$ is, the smaller the difference between the original input data and the restored data of the autoencoder, and the better the performance of the autoencoder.
(2) The loss distribution curve of the autoencoder was fitted, and the tangent slope at the end of the fitted curve was calculated:
$$k_D = y'(D),$$
where $y$ is the loss fitting curve function and $y'(D)$ is its first-order derivative at the $D$th iteration, i.e., the slope. As shown in Figure 6, when $k_D$ is less than 0, the closer it is to 0, the gentler the loss curve and the better the convergence effect.

Mathematical Problems in Engineering
Under the same parameter settings and training conditions, as $L_\Theta$ decreased, more effective feature data could be extracted and restored by the autoencoder; as $k_D$ became less than 0 and approached 0, the convergence effect of the autoencoder loss was better, which indicates high encoding-decoding stability and reliability.
(3) Structure optimization of the autoencoder. To design an autoencoder for video traffic state recognition, 36 sets of crossover tests were conducted based on the dimension of the input data, the dimension of the dimension-reduced data, and the number of hidden layers, as shown in Table 1. The 36 models were trained using the sample dataset A1 with the same training parameters: the basic learning rate was 0.0001, the batch training size was 100, and the number of iterations was 15,000. The distribution of the loss was analyzed, and the average loss for $\Theta = 0.01$ was calculated, as shown in Figure 7. The average losses of the 3rd, 6th, 9th, 12th, 15th, 18th, 21st, and 24th groups were lower than those of the other groups. In these eight groups, the input data dimension of the first four groups was 32 × 32 and that of the last four groups was 64 × 64, and the number of hidden layers in each test was 10. Therefore, when the input data dimension decreased and the number of hidden layers increased, $L_{0.01}$ decreased. The effect of the dimension of the dimension-reduced data on the loss was not significant, although the model training loss tended to decrease as that dimension increased. The eight groups of models were preliminarily selected as the candidate models.
Then, curve fitting was performed on the loss of the 36 groups of tests. The coefficients of determination $R^2$ of the fitting functions were all between 0.84 and 0.95, and only 13 groups of tests had $R^2$ lower than 0.9. Therefore, the goodness of fit was high on the whole. The $k_D$ of the loss fitting function of each group was calculated, as shown in Figure 8. In general, the tangent slopes of the loss fitting curves were all less than 0 and close to 0, so the curves converged well, and the tangent slopes of the eight candidate models were all small.
Therefore, the autoencoder models of the 3rd, 6th, 9th, 12th, 15th, 18th, 21st, and 24th groups of tests were selected as the candidate models and numbered as model I to model VIII. Their structures are shown in Table 2.
Using the sample dataset A2, the candidate models I-VIII were tested. The basic learning rate was 0.0001, the batch training size was 300, and the number of iterations was 100,000. According to the distribution of the training loss, $\Theta$ was set to 0.005, and $L_{0.005}$ and $k_D$ of each model were calculated. The results are shown in Table 3. Compared with the other models, model III had the smallest $L_{0.005}$ and the smallest tangent slope $k_D$. Therefore, model III was selected as the optimal autoencoder model A*.
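The two convergence indexes used in this section, the average loss after the threshold is reached and the end slope of the fitted loss curve, can be computed as in the following sketch. The loss values and the fitted function y(j) = 1/j are invented for the example; the text does not specify the family of fitting functions, and the slope is estimated here by a backward finite difference rather than an analytic derivative.

```python
def average_loss_after_threshold(losses, threshold):
    """Return (theta, L_theta).

    theta is the first 1-based iteration whose loss is below the threshold;
    L_theta is the mean loss from theta to the last iteration D.
    Returns (None, None) if the threshold is never reached.
    """
    for index, loss in enumerate(losses):
        if loss < threshold:
            tail = losses[index:]
            return index + 1, sum(tail) / len(tail)
    return None, None


def end_slope(fitted, D, h=1e-3):
    """Backward finite-difference estimate of y'(D) for a fitted curve y."""
    return (fitted(D) - fitted(D - h)) / h


losses = [0.5, 0.2, 0.009, 0.008, 0.007]      # hypothetical training losses
theta, l_theta = average_loss_after_threshold(losses, 0.01)

# Hypothetical fitted loss curve y(j) = 1 / j, evaluated at D = len(losses).
k_D = end_slope(lambda j: 1.0 / j, D=len(losses))
```

A candidate model is preferred when `l_theta` is small and `k_D` is negative but close to zero.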

Traffic State Recognition and Evaluation Methods.
We assume that there are $C$ types of samples (i.e., traffic states) in the traffic video image dataset. For an arbitrary image sample, its traffic state can be correctly recognized or misjudged as another traffic state. Therefore, the recognition results for a given traffic state may contain errors and omissions. To effectively evaluate the traffic state recognition model, the accuracy rate and recall rate were used to measure the errors and omissions, respectively. Furthermore, a comprehensive evaluation index F1 was introduced. The larger the value of F1 is, the better the performance of the model. The principle and calculation process of the three indexes are as follows:
(1) The recognition accuracy rate, recall rate, and F1 value of each traffic state in the image datasets were calculated as follows:
$$P_c = \frac{T_c}{S_c}, \qquad R_c = \frac{T_c}{Q_c}, \qquad F_1^c = \frac{2 P_c R_c}{P_c + R_c}.$$
Here, $C$ is the total number of traffic states in the traffic image dataset. In this study, $C = 3$ (i.e., free, steady, and congested traffic flow); $c$ is the category of the traffic state, $c \in [1, C]$; $P_c$ is the accuracy rate (i.e., precision) of the model on the $c$th type of sample in the image dataset; $R_c$ is the recall rate of the model on the $c$th type of sample; $F_1^c$ is the F1 value of the model on the $c$th type of sample; $T_c$ is the number of samples of the $c$th type correctly recognized using the model; $S_c$ is the number of samples assigned to the $c$th type by the model; and $Q_c$ represents the actual number of samples of the $c$th type in the image dataset.
(2) The average accuracy rate, recall rate, and F1 value over the $C$ traffic states were calculated:
$$P = \frac{1}{C} \sum_{c=1}^{C} P_c, \qquad R = \frac{1}{C} \sum_{c=1}^{C} R_c, \qquad F_1 = \frac{1}{C} \sum_{c=1}^{C} F_1^c,$$
where $P$ is the average accuracy rate of the model over the $C$ traffic states in the image dataset, $R$ is the average recall rate, and $F_1$ is the average F1 value.
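The per-state and average indexes can be computed directly from the true and predicted labels, as the following sketch shows. The label lists are invented for illustration, returning 0 when a denominator is zero is a convention chosen for this example, and the average F1 is taken here as the mean of the per-state F1 values.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (P_c, R_c, F1_c)} from true and predicted labels."""
    metrics = {}
    for c in classes:
        T = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        S = sum(1 for p in y_pred if p == c)   # samples assigned to class c
        Q = sum(1 for t in y_true if t == c)   # samples truly in class c
        P = T / S if S else 0.0
        R = T / Q if Q else 0.0
        F1 = 2 * P * R / (P + R) if P + R else 0.0
        metrics[c] = (P, R, F1)
    return metrics


def macro_average(metrics):
    """Average P, R, and F1 over all classes with equal weight."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


classes = ["free", "steady", "congested"]
y_true = ["free", "free", "steady", "steady", "congested", "congested"]
y_pred = ["free", "free", "steady", "congested", "congested", "congested"]

metrics = per_class_metrics(y_true, y_pred, classes)
avg_P, avg_R, avg_F1 = macro_average(metrics)
```

In this toy example one steady image is misjudged as congested, which lowers the recall of the steady state to 0.5 and the accuracy rate of the congested state to 2/3.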

A*-Classifier Recognition Results.
The traffic images were encoded using the autoencoder A*, from which the dimension-reduced data were obtained and taken as input to the SVM, DNN, linear classifier, and DNN Linear classifier for traffic state recognition. In this study, the models that integrated A* with the SVM, DNN, linear classifier, and DNN Linear classifier are referred to as A*-SVM, A*-DNN, A*-Linear, and A*-DNN_Linear, respectively, and their recognition results are shown in Table 4. The P and R values of the four models were all in the range of 94.5%-97.1%, and the average F1 value was 94.4%-97.1%, which indicates that the integration of the A* autoencoder with the classification methods produced excellent traffic state recognition results. Thus, A* can effectively extract traffic image features to achieve traffic state recognition.
To further analyze the recognition results of the models, the accuracy rate, recall rate, and F1 value of each traffic state were calculated, and the results are shown in Figure 9. The accuracy rate, recall rate, and F1 value varied among the four models. The accuracy rates ranged between 89.0% and 100%; only the accuracy rate of A*-SVM for steady traffic flow and that of A*-Linear for congested traffic flow were below 90%. The recall rates were 86.4%-100%; specifically, the recall rate of A*-SVM for congested traffic flow and that of A*-Linear for steady traffic flow were below 90%. The F1 values ranged between 91.1% and 100%, which indicates that the overall performance of the four models was good.
From the perspective of traffic states, the accuracy rate, recall rate, and F1 value for free traffic flow were the best in all of the models, compared with the steady and congested traffic states. The values were close to 100%, which indicates that almost all free traffic flow images were correctly recognized. The accuracy rate and recall rate for the steady and congested traffic states were mostly higher than 90%; the exceptions, which were below 90%, were the accuracy rate of A*-SVM for steady traffic flow, the recall rate of A*-SVM for congested traffic flow, the accuracy rate of A*-Linear for congested traffic flow, and the recall rate of A*-Linear for steady traffic flow. Figures 10-12 show the recognition results of the four models for the three traffic states in dataset A2. The autoencoder A* was effective at image dimensionality reduction. Integration of the autoencoder with the SVM, DNN, linear classifier, and DNN Linear classifier achieved good overall recognition results for the different traffic states.

A * -K-Means Clustering Results.
The autoencoder A* was integrated with the k-means clustering method to form a model referred to as A*-k-means, which was trained using dataset A2. The clustering accuracy rate, recall rate, and F1 value for the different traffic states were calculated, and the results are shown in Table 5. The clustering accuracy rate, recall rate, and F1 value of the A*-k-means model were all between 90.2% and 100%, which suggests good recognition performance for the different traffic states. Moreover, the accuracy rate, recall rate, and F1 value for steady traffic flow were all 100%; in other words, all of these samples were recognized accurately. The recognition results for free and congested traffic flow were almost the same, with F1 values of approximately 93%. Thus, the optimal autoencoder A* constructed in this study can achieve a good traffic state recognition effect when combined with common clustering methods, which indicates that A* can effectively extract the traffic image features.
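For readers unfamiliar with k-means, the following sketch shows the standard Lloyd iteration applied to low-dimensional feature vectors. It is a generic illustration under stated assumptions, not the clustering code of this study: the points stand in for encoding vectors, the initial centroids are supplied by the caller, and the number of iterations is fixed. How clusters are mapped to traffic states for evaluation is not described in the text and is therefore not shown.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [list(c) for c in initial_centroids]   # avoid mutating the input
    assignments = []
    for _ in range(iterations):
        # Assign each point to the centroid with the smallest squared distance.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy 2-D "encoding vectors" forming three well-separated groups.
points = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [10.0, 10.1], [10.1, 10.0]]
assignments, centroids = kmeans(points, [points[0], points[2], points[4]])
```

With three traffic states, the number of clusters is set to three; because clustering is unsupervised, each cluster must still be matched to a state before accuracy, recall, and F1 can be computed.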

Comparison between the Proposed Model and CNN Model.
The CNN is a typical deep learning method. Common CNN models include AlexNet, VGG16, GoogLeNet, and LeNet [32], which have been widely applied in the field of image pattern recognition and have achieved remarkable results. In this study, we used dataset A2 to train and test the four CNN models and calculated the accuracy rate, recall rate, and F1 value; the results are shown in Table 6.
Among the four CNN models, AlexNet had the best performance; its accuracy rate, recall rate, and F1 value all reached approximately 94%. The F1 values of the proposed A*-classifier models and the A*-k-means clustering model were 94.4%-97.1%, which were higher than that of AlexNet. The performance of LeNet in traffic state recognition was average, with an accuracy rate, recall rate, and F1 value of 82.3%, 62.4%, and 71.0%, respectively. Moreover, the F1 values of GoogLeNet and VGG16 were both below 40%, which suggests poor performance in traffic state recognition.
From the perspective of the model network structure, A* used only five encoding hidden layers to obtain the image feature data, whereas AlexNet, LeNet, GoogLeNet, and VGG16 had 12, 9, 23, and 22 network layers, respectively. The complexity of A* was much lower than that of those models. Therefore, by integrating a simple and practicable autoencoder with common classifiers, the models proposed in this study were able to achieve or surpass the traffic state

Conclusions
In this study, an autoencoder model for urban traffic surveillance videos was proposed that can effectively reduce the data dimension by optimizing the input data dimension, number of hidden layers, and dimension of the dimension-reduced data. Taking the low-dimensional image features obtained using the autoencoder as the input, five models were constructed based on common classification and clustering methods, including the linear classifier, DNN, SVM, DNN Linear, and k-means clustering method. The performances of the models in traffic state recognition were compared with those of commonly used CNN models. The results show that the average F1 value of the four models A*-DNN, A*-SVM, A*-Linear, and A*-DNN_Linear was 94.4%-97.1%, and the average F1 value of A*-k-means was 95.3%. Among the CNN models, AlexNet had the best performance, with an F1 value of 94.0%. Thus, the autoencoder constructed in this study can effectively extract the image features. When integrated with common classification and clustering methods, it can accurately recognize the traffic state, and it achieves better results than common CNN models, such as AlexNet, LeNet, GoogLeNet, and VGG16.
The traffic state recognition model was established in two stages to avoid the problems of high algorithm complexity and computational cost. First, an autoencoder was proposed that effectively reduces the video dimension; then, the traffic state recognition model was established by integrating the autoencoder with common classification and clustering methods. The training and test results show that our method has a lightweight model structure and low computational cost, and that it outperforms CNN models such as AlexNet and GoogLeNet. The method can also be applied in other fields of video detection, such as image compression, moving target detection, and image classification.
Due to the variability and complexity of traffic scenes and the difficulty of building large-scale traffic image datasets, we conducted an exploratory study on the construction and optimization of an autoencoder model using a limited set of traffic scenes and a limited sample size. In the future, we will focus on the construction of training and test datasets, the optimization of the autoencoder architecture and parameters, and the selection of classification methods.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.