Road-Type Classification with Deep AutoEncoder

Machine learning algorithms are among the driving forces behind the success of intelligent road network systems design. Such algorithms allow for the design of systems that provide safe road usage, efficient infrastructure, and traffic flow management. One such application of machine learning in intelligent road networks is classifying different road network types to provide useful traffic information to road users. We propose a deep autoencoder model for representation learning to classify road network types. Each road segment node is represented as a feature vector. Unlike existing graph embedding methods that embed a road segment using its neighbouring road segments, the proposed method performs embedding directly on the road segment vectors. Comparison with state-of-the-art graph embedding methods shows that the proposed method outperforms graph convolution networks, GraphSAGE-MEAN, graph attention networks, and graph isomorphism network methods, and it achieves similar performance to GraphSAGE-MAXPOOL.


Introduction
Throughout the world, the number of vehicles and road users is increasing, and this has created traffic problems such as traffic congestion, accidents, and fuel cost. The rise of such traffic problems has led to the need to design and develop smart cities. Smart city design integrates physical, digital, and human systems in the built environment to facilitate the planning, construction, and management of the city's infrastructure [1]. Smart cities cover a wide range of applications within the transport and health industries. One key element of smart city design within the transport industry is the intelligent road network system design, which aims to ensure efficient traffic flow by minimising traffic problems. Intelligent road networks have also seen a wide range of applications in the domain of autonomous vehicles.
The use of intelligent road network systems is widely accepted in many countries, and their use is not only limited to traffic flow control and information but also extends to efficient infrastructure and safe road usage. Machine learning algorithms have been the driving force behind the successes of intelligent road networks; indeed, access to big data has opened doors to the development of various intelligent road network models. One such application of machine learning in road networks is classifying different road types. Road-type classification models are becoming important, as they can be embedded in interactive maps to provide helpful traffic information to road users. Other benefits of road-type classification include efficient traffic flow management, avoidance of congested routes, avoidance of routes where accidents are likely to occur, avoidance of routes with many intersections, and model integration into autonomous vehicles.
However, modelling road networks with machine learning is complex due to the lack of available feature extraction methods for representing road types as feature vectors. Thus, researchers have introduced deep learning embedding methods that learn the spatial information of road networks to automatically extract features from the network data. These embedding methods are termed graph representation learning (GRL), as they rely on the spatial connection of different objects within the road network structure; thus, each object's feature vector is constructed by leveraging its spatial connection with neighbouring objects. The main goal of GRL methods is to achieve automatic feature extraction in the non-Euclidean graph data space without relying on the actual object attributes.
In this work, however, a method for representing different road types as feature vectors is proposed, such that machine learning classification algorithms can be trained and evaluated on these features. We take full advantage of the state-of-the-art baseline road network feature extraction method proposed in [2]. Furthermore, we introduce a deep autoencoder (DAE) embedding method to reduce the dimensions of the feature vectors obtained by the baseline method. We then pass the feature vectors extracted by our DAE method to several machine learning classification algorithms; we select the classifier with the highest performance measure.
The rest of the paper is organised as follows. In Section 2, we present the literature study and related works. Section 3 provides the materials and methods used to build our model. In Section 4, we present the experimental results obtained by the proposed method, and we further compare the results to some of the state-of-the-art methods found in the literature. Finally, in Section 5, we conclude our work and provide recommendations for future work.

Background and Related Work
Graph theory is the fittest paradigm for modelling road networks, as it embraces all the topological information of any road network. Apart from spatial road networks, graphs diagrammatically represent all transport networks, including highway, transit, air, and water networks. Thus, attributes such as speed, travel times, number of lanes, and headways can be represented. A network's topological spatial structure is represented by graphs composed of lines and points. Lines are also called edges, while points are nodes or vertices. Therefore, graphs can represent the topology and spatial structure of a road network such that nodes represent intersections, dead-ends, and locations of interest on the roads, while edges represent the road segments between such nodes.
Machine learning in road networks has had many successes in facilitating important traffic information such as traffic forecasting [3][4][5], speed limit annotation [6][7][8], and travel time estimation [9][10][11]. However, machine learning in road networks for modelling road-type classification is often challenging due to a lack of attributes representing different road types. Thus, it is reasonable to apply deep learning methods to automatically learn the network's structure and represent every road segment by aggregating its neighbouring road segments. However, solving a learning problem on graphs is challenging. This is because many widely used data types, such as images and texts, are not structured as graphs. Also, the underlying connectivity patterns in graph-structured data are more complex and non-Euclidean.
The fundamental solution to modelling complex, non-Euclidean patterns is to learn the graph representation in a low-dimensional Euclidean space using GRL methods. Once the low-dimensional representations are learned, graph-related problems such as node and link prediction can be addressed. Also known as graph embedding functions, GRL methods aim to pack the properties of every road segment into a vector with a smaller dimension; this enables road segment similarity in the original complex graph space to be quantified in the embedded feature space using standard metrics. Several embedding methods have been proposed in the literature for modelling road networks. In [12], a hybrid graph convolution neural network (HGCN) method is proposed for traffic flow prediction in highway networks, where nodes represent the toll stations while edges represent road segments between two toll stations. In addition to modelling the spatial features of the highway, the authors achieved better traffic flow prediction by considering factors such as time, space, weather conditions, and the data type of each toll station.
It is worth noting that the HGCN method proposed in [12] uses local neighbourhood aggregation to learn the spatial connection of toll stations, and it cannot integrate road segment features into the learning process. This is common, since many state-of-the-art GRL methods rely on node features only. However, road segment features in road networks not only provide the connectivity information of two nodes but can also provide important, descriptive information that could be significant for the learned representation. To tackle this problem, the notion of relational fusion networks (RFN) is proposed in [13] for the speed limit classification and estimation tasks. RFN integrates edge information into the representation learning using a novel graph convolution operator. The RFN operator aggregates information over the relations between nodes instead of aggregating information over neighbouring nodes.
To the best of our knowledge, the work proposed in [2] is the only available work in the literature that classifies different road types on a graph dataset extracted from OpenStreetMap using the OSMnx library. Similar to RFN, the authors used the dual graph generated by the line graph transformation of the original graph to incorporate the edge features into the learning process. Thereafter, a method for generating road segment features is proposed based on information such as the length of the road segment, the speed limit, and the midpoint coordinates of the adjacent start and end nodes. The authors further compared the performance of the learned representations using several embedding methods, including graph convolution networks (GCN) [14], GraphSAGE [15], graph attention networks (GAT) [16], and graph isomorphism networks (GIN) [17], in inductive and transductive tasks, and in supervised and unsupervised learning tasks. In addition, a new GRL method, the graph attention isomorphism network (GAIN), is proposed.
In our work, we attempt to improve the robustness of the road segment features extracted in [2] by using the deep autoencoder (DAE) model as the embedding function; furthermore, we focus on the transductive and supervised learning settings only, since these are the settings that achieved the highest accuracy in [2]. Unlike most graph embedding methods proposed in the literature, our DAE model does not construct the vector representation of the target road segment by aggregating over its neighbouring segments; instead, it operates directly on the high-dimensional feature vectors of each road segment and produces compact feature vectors in a much smaller dimensional space. We then pass these compact features into several machine learning algorithms and report the results using the microaveraged f1-score. Finally, we compare our highest f1-score to the f1-scores obtained using the methods proposed in [2].

Materials and Methods
As depicted in Figure 1, our proposed method for road-type classification comprises 6 steps. First, we extract the original road network graph dataset of Linkoping city from OSMnx. Edges in the original graph represent the road segments, while nodes represent information such as intersections and crossroads. In the second step, we transform the original graph into a line graph representing road segments as nodes.
In the third step, we use the original and transformed graphs to derive attributes and represent every road segment as a feature vector. To the best of our knowledge, steps 1 to 3 of our proposed method follow a procedure similar to that proposed in [2]. In step 4, we introduce the deep autoencoder model as the embedding function, and dimensionality reduction is performed. In step 5, we use the feature vectors obtained by our embedding function to train, validate, and test the deep neural network, support vector machine, and K-nearest neighbor classifiers. We then select the classifier with the highest microaveraged f1-score and compare the obtained results to some of the state-of-the-art embedding methods for solving a similar task to ours.

Input Dataset.
Similar to the transductive setting in [2], the input dataset used to conduct the experiments in our work is the road network graph dataset of Linkoping city. The dataset was extracted from OSMnx within a 14 km radius of the city centroid. The obtained graph dataset is represented as G = (V, E), where V and E are the sets of nodes and edges, respectively. Edges represent road segments, and nodes represent crossroads, intersections, and junctions. Preprocessing of the obtained graph involved transforming G into an undirected graph, consolidating parallel edges, and merging intersections within a 10 m distance. As shown in Figure 2, the original graph G is converted to the line graph L(G), such that edges (road segments) in G become nodes in L(G), and two edges (two road segments) that share a node (intersection) in G become an edge in L(G). Transforming G to L(G) has two significant advantages. Firstly, graph embedding methods in the literature are designed for nodes and not edges; thus, the transformed graph L(G) has road segments as nodes. Secondly, nodes (crossroads, intersections, and junctions) in the original graph do not have the essential information required for road-type classification tasks. Algorithm 1 gives the steps used to transform G to L(G).
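Algorithm 1 itself is not reproduced in this excerpt, but the G to L(G) transformation it implements can be sketched in a few lines of plain Python; the toy three-segment network below is an illustrative stand-in for the OSMnx graph.

```python
from itertools import combinations

def line_graph(edges):
    """Build L(G): each edge of G becomes a node of L(G); two nodes of
    L(G) are connected iff the corresponding edges of G share an endpoint
    (i.e., the two road segments meet at an intersection)."""
    nodes = [tuple(sorted(e)) for e in edges]
    lg_edges = [(a, b) for a, b in combinations(nodes, 2)
                if set(a) & set(b)]  # shared intersection in G
    return nodes, lg_edges

# Toy road network: intersection B joins three road segments.
segments = [("A", "B"), ("B", "C"), ("B", "D")]
lg_nodes, lg_edges = line_graph(segments)
print(lg_nodes)   # 3 road segments -> 3 nodes in L(G)
print(lg_edges)   # every pair shares intersection B -> 3 edges in L(G)
```

For a real OSMnx graph, `networkx.line_graph` performs the same transformation directly on the extracted graph object.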

Class Distribution.
Road segments in OSMnx are tagged with their corresponding road-type labels, thus allowing a supervised classification task to be accomplished. However, 15 road-type labels are obtained, and some of these labels rarely occur in our dataset. Therefore, the distribution of the data is highly characterised by extreme class imbalance. To tackle this problem, we follow the same technique as in [2], where the authors merged and relabelled road types as shown in Table 1.

Feature Engineering.
Feature generation for each road segment is conducted by extracting its descriptive attributes from the edges of the original graph and the nodes of the transformed graph. Indeed, attributes such as the width, length, number of lanes, and speed limits for light and heavy vehicles provide useful road segment information required for feature generation. Nevertheless, we generate the road segment feature vectors using four main components as in [2] to compare the results fairly. As shown in Table 2, these four components yield a 58-dimensional feature vector for every road segment. Let l represent the road segment length, (x, y) be the midpoint coordinates of the two end nodes in the longitude and latitude directions, respectively, and S = (s_1, s_2, s_3, ..., s_m) be the one-hot encoding vector of m speed limits. Then, the final feature vector of each road segment is generated using Algorithm 2.
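To make the construction concrete, the following sketch assembles one such feature vector. Two assumptions are ours, not the paper's: m = 15 speed-limit categories (chosen so the total matches the stated 58 dimensions: 1 + 2 + 15 + 2 × 20) and a straight-line segment geometry sampled at 20 equally spaced points relative to the midpoint.

```python
def segment_features(length, midpoint, speed_idx, start, end, m=15):
    """Illustrative sketch of the feature generation: length + midpoint
    + one-hot speed limit + 20 geometry points = 1 + 2 + m + 40 dims."""
    one_hot = [0.0] * m
    one_hot[speed_idx] = 1.0              # one-hot speed-limit vector S
    pts = []
    for i in range(20):                   # 20 equally spaced points
        t = i / 19
        # straight-line geometry assumed here; real OSMnx segments curve
        x = start[0] + t * (end[0] - start[0]) - midpoint[0]
        y = start[1] + t * (end[1] - start[1]) - midpoint[1]
        pts += [x, y]                     # coordinates relative to midpoint
    return [length] + list(midpoint) + one_hot + pts

f = segment_features(120.0, [15.62, 58.41], speed_idx=3,
                     start=[15.60, 58.40], end=[15.64, 58.42])
print(len(f))  # 58-dimensional feature vector
```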

Embedding with Deep AutoEncoder.
We introduce the deep autoencoder (DAE) model to achieve the embedding task. In contrast to the graph embedding methods found in the literature, where a road segment's vector representation is obtained by aggregating over the neighbouring road segments, our DAE model performs embedding directly on the high-dimensional features of each road segment. As shown in Figure 3, our DAE model comprises three crucial components: the encoder, the embedding space, and the decoder. The encoder component takes the D-dimensional road segment feature vectors as input and compresses them into a smaller dimension while preserving as much important information as possible. The preserved N-dimensional feature vectors (where N ≪ D) are stored in the embedding space. The decoder component aims to reconstruct the original D-dimensional road segment features by decompressing the N-dimensional features in the embedding space. Given the above objectives of each component, we can define the learning process of our DAE model in three steps. First, we compress the D-dimensional input road segment features (X) into the N-dimensional feature space in the encoder component. Then, we reconstruct the output Y from the small dimension using the decoder component. Finally, we calculate the error difference between the original inputs and the reconstructed outputs and adjust the weight parameters to reduce this difference.
Our DAE model is a fully connected network with an input layer, four hidden layers, and an embedding space layer on the encoder component. The decoder component comprises four hidden layers and an output layer. The output layer has the same size as the input layer of the encoder, while the sizes of the hidden layers in the decoder mirror the sizes of the hidden layers in the encoder. We first normalise the road segment feature vectors before feeding them to the input layer of the encoder. Thereafter, we obtain the value of each neuron in the next compressed layer by computing the sum of products of the values in the previous layer and their corresponding weight parameters. We then introduce nonlinearities into the network by applying the rectified linear unit (ReLU) activation function, defined as ReLU(x) = max(0, x). In the decoder, we decompress the values in the embedding space layer and obtain the values in the next decompressed layer using a similar procedure; again, the ReLU function is used as the activation function. Furthermore, we normalise the values in the output layer to lie between 0 and 1 through the sigmoid function, defined as Sigmoid(x) = 1/(1 + e^{-x}). This normalisation is important since the input features are also normalised. Finally, we measure the error difference between the values in the input layer and their corresponding values in the output layer. Therefore, our optimisation problem is to find the set of optimal weight parameters that achieves the smallest possible error difference. We then extract the features in the embedding space layer, which we later use to train, validate, and test the machine learning algorithms. Algorithm 3 shows the step-by-step implementation of our DAE model for the embedding task. The reasons for choosing the number of hidden layers and their corresponding sizes are given in greater detail in Section 4.
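As a drastically simplified illustration of this compress, reconstruct, and update cycle, the sketch below trains a single-layer linear autoencoder with plain gradient descent on random stand-in data. Only the dimensions D = 58 and N = 8 follow the text; the paper's actual model has four ReLU hidden layers per component, a sigmoid output, and the Adam optimiser.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N = 58, 8                      # input and embedding dimensions (N << D)
X = rng.random((256, D))          # stand-in for normalised segment features
W_enc = rng.normal(0, 0.1, (D, N))
W_dec = rng.normal(0, 0.1, (N, D))

lr = 1e-2
for _ in range(500):
    Z = X @ W_enc                 # step 1: compress into the embedding space
    Y = Z @ W_dec                 # step 2: reconstruct the input
    err = Y - X                   # step 3: error difference ...
    mse = (err ** 2).mean()
    # ... and gradient updates that reduce it
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

Z = X @ W_enc                     # embedded features for the classifiers
print(Z.shape, float(mse))
```

The reconstruction error falls well below its initial value, and the 8-dimensional rows of `Z` are what would be handed to the downstream classifiers.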

Road Segment Classification.
We use the obtained embedded features, Z, in the N-dimensional feature space (where N = 8) to compare the performance of the deep neural network (DNN), support vector machine (SVM), and K-nearest neighbors (K-NN) classifiers for road-type classification of the road classes mentioned in Section 3.3. These classifiers were chosen for comparison as they are deemed adequate for multiclass classification tasks across various applications [18][19][20][21][22][23]. Furthermore, these classifiers represent three distinct learning methods: artificial neural networks, hyperplane-based methods, and instance-based learning methods. The DNN classifier belongs to the family of artificial neural networks, where the network's underlying parameters are fine-tuned to match a given class label for each input vector. The SVM is a hyperplane-based learning method that transforms nonlinearly separable input features into a high-dimensional feature space where the input features can be separated linearly. The K-NN classifier belongs to the family of instance-based learning methods; unlike the SVM classifier, where two classes are trained at a time, the K-NN achieves multiclass classification tasks in one go: feature vectors (with class labels) representing multiple classes are stored in a feature space, and the K parameter is used to decide the class label of an unlabelled vector. Thus, comparing these three classifiers will indicate the best learning method for a road-type classification task.
We initially divide the input features into training and test datasets. We perform 10-fold cross-validation on the training dataset to obtain the optimal parameters for each classifier; we then use the test dataset to obtain the microaveraged f1-score of each classifier based on the optimal parameters.

Deep Neural Networks.
The DNN classifier is a fully connected network with an input layer, two or more hidden layers, and an output layer. The size of the input layer corresponds to the number of components (m) of the road segment feature vectors, X = (x_i)_{i=1,2,...,m}, and the size of the output layer corresponds to the number of road-type classes, Y = (y_i)_{i=1,2,...,n}. The size and number of the hidden layers are often fine-tuned for optimal results. The embedded road segment features are passed into the input layer; the outputs from the input layer are fed into the 2nd layer, the outputs from the 2nd layer are fed into the 3rd layer, and so on, until the outputs from the (L-1)th layer are fed into the Lth layer. Equations (1) and (2) are used to obtain the value of the ith neuron of the lth layer, u_i^l, by taking the sum of products of the values in the previous layer l - 1 and their corresponding weight parameters W = (W_1, W_2, ..., W_L), where W_i = (w_{i1}, w_{i2}, ..., w_{iS_i}), and S_i is the size of the ith layer:

\[ z_i^l = w_{i0}^l + \sum_{j=1}^{S_{l-1}} w_{ij}^l u_j^{l-1}, \quad (1) \]

\[ u_i^l = \mathrm{ReLU}\left(z_i^l\right), \quad (2) \]

where w_{i0} is the bias term, S_l is the size of the lth layer, and S_L = n.

In equation (3), the sigmoid function, g, is applied to compress the outputs to lie between 0 and 1 and thus obtain the probability that a given road segment feature vector belongs to each class:

\[ v_i(X, W) = g\left(z_i^L\right) = \frac{1}{1 + e^{-z_i^L}}, \quad i = 1, \ldots, n. \quad (3) \]

Equation (3) determines the predicted class label for each road segment input feature vector X. Consider a training sample representation (X_k, Y_k), k = 1, 2, ..., N, such that Y_k determines the class of X_k, and N is the size of the training sample. The training of our DNN classifier starts by obtaining the predicted class labels, v_1(X_k, W), v_2(X_k, W), ..., v_n(X_k, W), based on the randomly initialised weight parameters W for each road segment vector X_k. The error incurred between the predicted outputs and the actual class labels Y_k = (y_1^k, y_2^k, ..., y_n^k) over all the training samples is measured by

\[ E(W) = \frac{1}{2} \sum_{k=1}^{N} \sum_{i=1}^{n} \left( v_i(X_k, W) - y_i^k \right)^2. \quad (4) \]

During the training process, the parameter vector W is updated using

\[ W \leftarrow W - \lambda \frac{\partial E}{\partial W}, \quad (5) \]

where λ is the learning rate parameter, and ∂E/∂W is the gradient calculated using backpropagation. Algorithm 4 shows the steps used to classify road segment features using the DNN classifier.
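The forward computation of equations (1)-(3) can be traced with a tiny hand-weighted network; the layer sizes and weight values below are made up purely for illustration and are not the trained classifier's parameters.

```python
import math

def relu(z): return max(0.0, z)
def sigmoid(z): return 1.0 / (1.0 + math.exp(-z))

def layer(values, weights, act):
    # weights[i] = [w_i0 (bias), w_i1, ..., w_iS]: equation (1), then the
    # activation of equation (2) or (3)
    return [act(w[0] + sum(wj * v for wj, v in zip(w[1:], values)))
            for w in weights]

x = [0.2, 0.7, 0.1]                                  # embedded segment features
W1 = [[0.1, 0.5, -0.3, 0.8], [0.0, -0.2, 0.4, 0.1]]  # hidden layer (ReLU)
W2 = [[-0.1, 0.9, 0.3], [0.2, -0.4, 0.6]]            # output layer (sigmoid)

h = layer(x, W1, relu)        # equations (1)-(2)
v = layer(h, W2, sigmoid)     # equation (3): per-class probabilities
pred = v.index(max(v))        # predicted road-type class
print([round(p, 3) for p in v], pred)
```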

Support Vector Machines.
We perform the multiclass road-type classification task using the one-vs.-one support vector machine (SVM) formulation. In one-vs.-one SVM, we train two road-type classes at a time; thus, for M road-type classes, we obtain a total of M(M - 1)/2 SVM classifiers; we then assign a class label to an unknown road segment feature vector based on the class with the majority of votes. For any pair of road-type classes with road segment features from the training dataset and the corresponding class labels (x_1, y_1), (x_2, y_2), ..., (x_m, y_m), with y ∈ {-1, +1}, we construct an optimal hyperplane that separates the two classes with the largest possible margin, as shown in Figure 4. This margin is defined as the distance between the vectors nearest to the optimal hyperplane (the support vectors) from both classes.
The H_1 and H_2 planes, defined by wv_i + b ≥ 1 and wv_i + b ≤ -1, represent the boundaries for feature vectors that belong to the two distinct road-type classes. The margin which must then be maximised for the optimal hyperplane is the distance d = 2/|w| between the H_1 and H_2 planes. We maximise d by solving the optimisation problem

\[ \min_{w, b} \frac{1}{2} \|w\|^2, \quad (6) \]

subject to the following constraints:

\[ y_i \left( w \cdot v_i + b \right) \geq 1, \quad i = 1, 2, \ldots, m. \quad (7) \]

Furthermore, we introduce Lagrange multipliers to eliminate the constraints, and we obtain the dual SVM formulation, defined as maximising

\[ L(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \left( v_i \cdot v_j \right), \quad (8) \]

subject to Σ_i α_i y_i = 0 and α_i ≥ 0. Solving the dual SVM yields the coefficients α_i; the feature vectors where α_i > 0 are the support vectors, and they lie directly on the H_1 and H_2 planes. The terms of the dual SVM problem vanish for α_i = 0; thus, the SVM optimisation problem is affected only by the support vectors. The optimal hyperplane assigns a class label to an unknown road segment feature vector v by evaluating the function

\[ f(v) = \mathrm{sign}\left( \sum_i \alpha_i y_i \left( v_i \cdot v \right) + b \right). \quad (9) \]

A radial basis function (RBF) kernel is used to transform nonlinearly separable features into a higher feature space φ where they can be separated linearly. We compute the transformation by taking the dot product between any pair of feature vectors using a kernel function:

\[ K(u, v) = \varphi(u) \cdot \varphi(v). \quad (10) \]

The RBF kernel is defined as follows:

\[ K(u, v) = \exp\left( -\frac{\|u - v\|^2}{2\sigma^2} \right). \quad (11) \]

Algorithm 5 outlines the steps used to classify road segment features using the SVM classifier.

Algorithm 4: Road segment classification using DNN.
Require: m: size of input layer; n: size of output layer; Training set: TrFS = (X_k, Y_k); Test set: TeFS = (T_i), i = 1, 2, ..., p.
Output: DNN structure: W_Opt: optimal weights; L: number of layers; list of labels such that l_i is the class label of the test set element t_i: L = (l_i)_{i=1,2,...,p}.
(1) Training phase:
(2) Initialise the weight parameter structure W.
(3) Define the number of hidden layers, L, and the corresponding sizes (S_i)_{i=1,2,...,L}.
(4) while optimal parameters are not obtained do
(5)  for each training sample (X, Y) ∈ TrFS do
(6)   Calculate the neuron values using equations (1) and (2).
(7)   Obtain the predicted output using equation (3).
(8)  end for
(9)  Calculate the loss using equation (4).
(10) Update the weight parameters using equation (5).
(11) end while
(12) Store the optimal weight parameters W_Opt.
(13) Classification phase:
(14) for each road segment vector T_k ∈ TeFS do
(15)  Predict the class l_k of T_k.
(16) end for
(17) Return L.

Figure 4: Application of the SVM classifier on two road-type classes.
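The RBF-kernel decision rule described above can be sketched directly; the support vectors, multipliers α, bias b, and σ below are made up for illustration and do not come from a trained model.

```python
import math

def rbf(u, v, sigma=5.0):
    """RBF kernel K(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2 * sigma ** 2))

def decision(v, support_vectors, alphas, labels, b=0.0, sigma=5.0):
    """Decision function f(v) = sign(sum_i alpha_i y_i K(v_i, v) + b),
    evaluated over the support vectors only (alpha_i > 0)."""
    s = sum(a * y * rbf(sv, v, sigma)
            for sv, a, y in zip(support_vectors, alphas, labels)) + b
    return 1 if s >= 0 else -1

svs    = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]   # made-up support vectors
alphas = [0.7, 0.3, 1.0]
labels = [-1, -1, +1]                            # the two road-type classes

print(decision([0.5, 0.5], svs, alphas, labels))   # near the -1 cluster
print(decision([4.2, 3.9], svs, alphas, labels))   # near the +1 cluster
```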

K-Nearest Neighbors.
The K nearest road segment feature vectors from the training dataset are used to assign a class label to an unknown feature vector. Thus, given road segment features with their corresponding class labels from the training dataset, (x_1, y_1), (x_2, y_2), ..., (x_m, y_m), we calculate the distance between the unknown vector v and the vectors in the training dataset using the Euclidean distance, defined as

\[ d(u, x) = \sqrt{ \sum_r \left( c_r(u) - c_r(x) \right)^2 }, \]

where c_r(u) is the value of the rth component of the vector u. We then define the set V = {v_1, v_2, ..., v_K} of the K features from the training dataset nearest to the unknown feature and use it to assign its class label. According to equation (12), the unknown feature vector v is assigned to the class that appears most often within the set of K nearest feature vectors:

\[ l_v = \arg\max_{c} \sum_{i=1}^{K} \delta\left(c, y_i\right), \quad (12) \]

where y_i is the class of the sample x_i, l_v is the class label of the vector v, and the function δ is defined as follows:

\[ \delta(a, b) = \begin{cases} 1, & a = b, \\ 0, & a \neq b. \end{cases} \quad (13) \]

Algorithm 6 lists the steps used to classify road segment features using the K-NN classifier. The inputs are the feature vectors, initially divided into training and test datasets. We use the training dataset to train the classifier and obtain the optimal K through cross-validation. We use the test dataset to obtain the accuracy of the K-NN classifier based on the optimal K value.
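This rule can be sketched in pure Python: Euclidean distance to every training vector, then a majority vote among the K nearest. The toy training vectors and the road-type labels "residential" and "highway" are illustrative stand-ins for the merged classes of Table 1.

```python
import math
from collections import Counter

def knn_predict(v, train, k=3):
    """train: list of (feature_vector, class_label) pairs.
    Sort by Euclidean distance to v and vote among the k nearest."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], v))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [([0.0, 0.0], "residential"), ([0.1, 0.2], "residential"),
         ([1.0, 1.0], "highway"), ([1.1, 0.9], "highway"),
         ([0.9, 1.1], "highway")]

print(knn_predict([0.05, 0.1], train))   # -> residential
print(knn_predict([1.0, 0.95], train))   # -> highway
```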

Experimental Results and Discussion
The Linkoping city road network graph dataset [24] is used to train our DAE embedding method. The embedded features are used to train, validate, test, and compare the DNN, SVM, and K-NN classifiers for the road segment classification task. The classifier with the highest microaveraged f1-score is selected, and the results achieved are compared to some state-of-the-art embedding methods found in the literature for solving a similar problem. The experiments are designed mainly to obtain the optimal parameters of our DAE embedding method and classifiers.

Embedding with Deep AutoEncoder.
The input to our DAE embedding method is a total of 6761 58-dimensional road segment feature vectors, described in Section 3.3. First, the dataset is divided into a 70/30 split, where 70% of the dataset is used to train the DAE model while the remaining portion is used to obtain the optimal parameters of the DAE method. Optimisation is achieved through the Adam optimiser, while the batch size and maximum number of iterations were chosen as 1024 and 500, respectively. As depicted in Table 3, we define several DAE models with varying numbers and sizes of hidden layers on the encoder and decoder components and varying layer sizes of the embedding space. This is done to identify the optimal parameters that achieve the lowest reconstruction error and the highest accuracy on the validation dataset.
We train each DAE model listed in Table 3 using different learning rates (1e-4, 1e-3, and 1e-2), and we report the results in terms of the reconstruction error and accuracy on the validation dataset after 500 iterations. We then select the DAE model and corresponding learning rate that achieve the lowest average reconstruction error and the highest accuracy as our optimal DAE embedding method. Figures 5-7 show the performances of three DAE models at increasing learning rates in terms of reconstruction error and validation accuracy. It can be observed that the DAE model with 5 hidden layers and an embedding space of 4 units achieved its lowest reconstruction error and highest accuracy of 0.0013 and 98.82%, respectively, at a learning rate of 1e-3. The DAE model with 4 hidden layers and an embedding space of 8 units achieved its lowest reconstruction error and highest accuracy of 0.000578 and 99.11%, respectively, at a learning rate of 1e-3. Finally, the DAE model with 3 hidden layers and an embedding space of 10 units achieved its lowest reconstruction error and highest accuracy of 0.000623 and 98.96%, respectively, at a learning rate of 1e-3. Based on these observations, we select as our DAE embedding method the model with 4 hidden layers and an embedding space of 8 units, as it achieves the best performance.

Road-Type Classification.
The input dataset to the DNN, SVM, and K-NN classifiers consists of the 6761 road segment features of 8 dimensions obtained by our DAE embedding method. The dataset comprises 5 classes of merged and relabelled road types according to the method described in Section 3.3. We initially divided the dataset into a 70/30 split, where 70% of the data are used to train and validate the classifiers based on the 10-fold cross-validation method. We use the remaining 30% of the data to test the classifiers' performance in terms of the microaveraged f1-score based on the optimal parameters obtained by the 10-fold cross-validation method. We obtain the micro f1-score of each classifier by first computing the confusion matrix; thereafter, we calculate the sums of the true positives (TP), false positives (FP), and false negatives (FN) across all the classes.
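The micro f1 computation described above can be sketched as follows (toy labels only). Note that in a single-label multiclass setting, every misclassified sample contributes exactly one FP and one FN, so the summed counts are equal and the micro f1 coincides with accuracy.

```python
def micro_f1(y_true, y_pred):
    """Sum TP, FP, and FN over all classes, then combine into micro f1."""
    classes = set(y_true) | set(y_pred)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    fp = sum(1 for c in classes
             for t, p in zip(y_true, y_pred) if p == c and t != c)
    fn = sum(1 for c in classes
             for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [0, 0, 1, 1, 2, 2, 3, 4]   # toy labels, 5 road-type classes
y_pred = [0, 1, 1, 1, 2, 0, 3, 4]   # 6 of 8 correct
print(micro_f1(y_true, y_pred))     # -> 0.75
```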

Classification with DNN.
Road-type classification with DNN was achieved using the steps given in Algorithm 4. Optimisation was performed using the Adam optimiser, with the ReLU function as the activation function.

Algorithms 5 and 6 take as input the training set TrFS = (x_i, y_i)_{i=1,2,...,m} and the test set TeFS = (t_i)_{i=1,2,...,p}, and output the list of labels L = (l_i)_{i=1,2,...,p} such that l_i is the class label of the test set element t_i.
We used the obtained optimal parameters to train and test the DNN classifier, and we report the results obtained from the test dataset using the microaveraged f1-score, as shown in Table 5.

Classification with SVM.
Road-type classification with SVM was achieved using the steps given in Algorithm 5. The one-vs.-one SVM formulation was used, giving a total of 10 classifiers. Through the 10-fold cross-validation method, we obtained the optimal RBF kernel width of σ = 5 and the error term parameter of c = 100, as indicated in Table 6.
We then used the obtained optimal parameters to train and test the SVM classifier; we report the results obtained from the test dataset using the microaveraged f1-score, as shown in Table 7.

Classification with K-NN.
Road-type classification with K-NN was performed using the steps given in Algorithm 6. Through the 10-fold cross-validation method, we obtained the optimal K value of 5, as shown in Table 8.
We used the obtained optimal parameter to train and test the K-NN classifier, and we report the results obtained from the test dataset using the microaveraged f1-score, as shown in Table 9.
Tables 5, 7, and 9 show the performances of the three classifiers using the microaveraged f1-score for the road segment classification task. It can be observed that the DNN is the best-performing classifier, with a micro f1-score of 80.16%. The second-best-performing classifier is the K-NN, with a micro f1-score of 76.13%. The SVM is the worst-performing classifier, with a micro f1-score of 70.98%.

Comparison to Other Methods.
We then compared the highest micro-averaged f1-score obtained by our DAE model with some of the state-of-the-art embedding methods presented in [2] for the road-type classification task. The similarities between our study and the study proposed in [2] are as follows: both studies were carried out using the Linkoping City road network graph dataset extracted from OSMnx; both studies transform the original graph to a line graph to obtain more descriptive features for each road segment; and both studies construct a 58-dimensional feature vector representing each road segment. Training of the embedding methods in both studies uses 500 iterations and a batch size of 1024, with optimisation performed by the Adam optimiser. The major difference between the two studies is how embedding is achieved. Our DAE method achieves embedding by reducing the dimensionality of each road segment feature vector, while the methods proposed in [2] achieve embedding on each road segment vector by aggregating information from its neighbouring road segments. The final embedded vector obtained from our DAE method has 8 dimensions, while the methods proposed in [2] produce final embedded vectors with one of the output dimensions {64, 128, 56}. Table 10 shows the comparison of the performance of the methods in terms of the micro f1-score. Our DAE embedding method achieves a micro f1-score more than 20% higher than that of the raw features (the original 58-dimensional features without embedding). Furthermore, our DAE method outperforms the GCN, GSAGE-MEAN, GAT, and GIN methods by micro f1-scores of 22%, 18%, 5%, and 2%, respectively.
We also observe that our DAE method achieves the same micro f1-score of 80% as the GSAGE-MAXPOOL method. Finally, our DAE method falls short by a micro f1-score of 1% compared to the GSAGE-MEANPOOL and GAIN methods. One reason why the proposed method outperforms several of the state-of-the-art graph embedding methods is that the two approaches perform embedding differently. Our DAE acknowledges that not all 58 features representing each road segment are necessary; thus, it performs embedding by reducing the dimensionality of each road segment from 58 dimensions to 8 dimensions, retaining the most prominent features. In the graph embedding methods, embedding on each road segment feature vector is performed using the feature vectors of neighbouring road segments; while this allows the spatial connections of road segments to be modelled, the fact that some features in each road segment are unnecessary is ignored, thus yielding lower performance compared to our DAE method.
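The 58-to-8 bottleneck idea can be sketched as follows. This is a minimal autoencoder stand-in built from scikit-learn's `MLPRegressor` trained to reconstruct its input, with the encoder half extracted by a manual forward pass; the paper's actual DAE architecture, depth, and training schedule are not reproduced, and the data are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 58))  # stand-in for the 58-dim road-segment vectors

# Autoencoder sketch: reconstruct the input through an 8-unit bottleneck
# (58 -> 32 -> 8 -> 32 -> 58), ReLU activations, Adam optimiser.
ae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), activation="relu",
                  solver="adam", max_iter=500, random_state=0)
ae.fit(X, X)

def encode(ae, X):
    """Forward pass through the encoder half (layers up to the bottleneck)."""
    h = X
    for W, b in zip(ae.coefs_[:2], ae.intercepts_[:2]):
        h = np.maximum(h @ W + b, 0.0)  # ReLU
    return h

Z = encode(ae, X)
print(Z.shape)  # (300, 8): the embedded road-segment features
```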

Discussion
This study presents a novel representation learning method for a road-type classification task. Compared to other methods found in the literature, which normally perform embedding on each road segment by aggregating information from neighbouring road segments, our method performs embedding by reducing the dimensionality of each road segment while preserving only the important features, using a deep autoencoder (DAE) model. To compare the methods fairly, we conducted our experiments using the Linkoping City road network graph dataset extracted from OSMnx. We then used the same line graph transformation and feature engineering methods as in [2] to represent road segments as nodes and to obtain more descriptive features for each road segment, respectively. We then passed the road segment vectors to our DAE embedding method, obtaining more robust features at much smaller dimensions than the original ones.
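The line-graph transformation mentioned above can be illustrated on a toy graph (the actual pipeline uses OSMnx road networks, which are omitted here to avoid a network dependency): each edge (road segment) of the original graph becomes a node of the line graph, so per-segment feature vectors attach naturally to nodes.

```python
import networkx as nx

# Toy road network: 3 segments (edges) joining 4 intersections (nodes).
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d")])

# Line graph: each road segment becomes a node; two nodes are adjacent
# iff the corresponding segments share an intersection.
L = nx.line_graph(G)
print(L.number_of_nodes(), L.number_of_edges())  # 3 segment-nodes, 2 adjacencies
```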
We then passed the vectors obtained by our DAE embedding method to deep neural network (DNN), support vector machine (SVM), and K-nearest neighbour (K-NN) classifiers to select the best-performing classifier using the micro-averaged f1-score. As shown in Tables 5-9, we demonstrated that the DNN is the best-performing classifier for the road-type classification task on the vectors obtained by our DAE method. We compared our DAE method to some of the state-of-the-art methods evaluated in [2] for solving a similar task. These methods include graph convolution networks (GCN), GraphSAGE (MEAN, MEANPOOL, MAXPOOL, and LSTM), graph attention networks (GAT), graph isomorphism networks (GIN), and graph attention isomorphism networks (GAIN).
In Table 10, we demonstrated that our method outperforms the GCN, GSAGE-MEAN, GAT, and GIN methods while achieving similar performance to GSAGE-MAXPOOL. Furthermore, we observed that our method falls short by 1% compared to the GSAGE-MEANPOOL and GAIN methods. It is worth mentioning that the GSAGE-MEANPOOL and GAIN embedding methods achieve their best performances at much larger dimensions of the embedded feature vectors, whereas our method achieves comparable performance at a much smaller embedded-vector dimension. We also note from Tables 5-9 that merging and relabelling different road types using the method shown in Table 1 is not ideal, as several classes (class 2 and class 3) are characterised by many false negatives across all three classifiers, resulting in low micro f1-scores for all classifiers.

Conclusion
This paper proposes a novel deep autoencoder (DAE) embedding method for road-type classification tasks. We used a state-of-the-art feature extraction method found in the literature and represented each road segment as a feature vector. We then applied our DAE embedding method and obtained embedded road segment features, which we later used to train, validate, and test several machine learning classifiers. We compared our results to several state-of-the-art graph embedding methods and demonstrated that our method outperforms some of these methods while achieving comparable results to others. It is worth noting that our method performs embedding by reducing the dimensionality of each road segment vector. In contrast, the graph embedding methods in the literature achieve road segment embedding using the neighbouring road segment features. Therefore, future work will employ a double embedding technique where the vectors obtained by our DAE method are fed as inputs to the graph embedding methods proposed in the literature.

Table 10: Method comparisons using micro f1-score. DAE (proposed): 80. Bold values indicate the micro-averaged f1-score obtained by our proposed method (DAE).

Data Availability
The datasets analyzed during the current study are available at https://planet.openstreetmap.org/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.