Cluster-Based Pavement Deterioration Models for Low-Volume Rural Roads

The management of low-volume rural roads in developing countries presents a range of challenges to road designers and managers. Rural roads comprise over 85 percent of the road network in India. The present study aims to develop deterioration models for the optimum maintenance management of rural roads built under a rural road programme in India, namely, Pradhan Mantri Gram Sadak Yojana (PMGSY). A visual condition survey of selected low-volume rural roads was carried out over a period of three years, recording the condition of shoulders, drainage features, cross-drainage structures, and camber, together with pavement distresses, namely, potholes, crack area, and edge break. Deterioration models have a significant role in a pavement maintenance management system. However, the performance of a pavement depends on several factors. Cluster analysis can be used to group the pavement sections so that the performance of pavements in different clusters can be studied. The nonhierarchical technique of k-means clustering was adopted, and separate deterioration models were developed for each of the clusters. A comparison of the models developed with and without clustered sections reveals that clustering of pavement sections is preferred for efficient rural road maintenance management.


Introduction
Rural roads, which connect the villages, comprise over 85 percent of the road network in India. They stimulate overall development by providing access to economic and social infrastructure and facilities. Most of these rural roads were neglected in the past, but with the rapid growth of the Indian economy and public demand for better road infrastructure, the Government of India has taken up several road building programmes. For the development of these rural roads, a mega road development programme called Pradhan Mantri Gram Sadak Yojana (PMGSY) was launched in December 2000 by the Government of India to provide all-weather connectivity to unconnected rural habitations. Maintenance of these low-volume roads is routine work performed for the upkeep of the pavement, shoulders, and other facilities provided for road users. Lack of maintenance of rural roads affects the people in the villages significantly, as the time needed to reach markets and other social infrastructure increases. The implementation of different rural road projects has improved mobility, and further emphasis is now needed on preserving the road assets through timely and appropriate maintenance.
The scheduling of pavement maintenance and rehabilitation is a critical task. It is probably the key decision variable in any asset management system. This decision process relies upon the capability to predict the future pavement distress condition as a function of time. If a pavement performance prediction model can be developed to accurately reflect real pavement performance, the remaining service life of pavements can be estimated more accurately. Most pavement deterioration models use a single, common model for all pavement sections. A single model is not appropriate for a variety of pavements. To address this issue, the pavement sections are grouped into clusters that are homogeneous within clusters and heterogeneous between clusters. The main objective of this paper is to study the performance of rural road sections grouped in different clusters and to develop deterioration models for each of the clusters.
Wang and Li [1] used a gray clustering-based methodology to evaluate existing pavements following the Mechanistic Empirical Pavement Design Guide framework. The gray clustering method is one pillar of gray system theory, which was proposed in the 1980s to solve problems with partially known information. Luo and Chou [2] introduced cluster-wise regression modelling for the deterioration of pavement condition and recommended this method for improving the accuracy of pavement condition predictions. Yu et al. [3] proposed the use of a Linear Mixed Effects Model (LMEM) to predict the future condition of a specific pavement section by a weighted combination of the average deterioration trend of the family and the past conditions of the specific pavement. Luo and Yin [4] compared their proposed model, a weighted regression function consisting of all clusters, with the conventional Markov probabilistic model. Analysis of the data collected during a study is an important aspect of developing a pavement management system. Among the many analytical tools available for modelling, cluster analysis is found to be important. Studies that apply cluster analysis to the sampling of traffic count data and to traffic accidents are available [5][6][7]. The defects of conventional methods of dividing roads into maintenance sections in practical work were analyzed, and a more reasonable and operational method, which uses ordinal sample clustering to divide maintenance sections, was reported [8]. Though clustering has been adopted as an analysis tool in various traffic studies, it has not been incorporated into a Pavement Management System (PMS) for low-volume rural roads. The present study therefore investigates the application of cluster analysis in the development of appropriate maintenance management for low-volume rural roads in India.

Study Area
The study area is Tiruchirappalli District in Tamil Nadu, India, which extends over an area of 4,404 sq km with a population density of 549 per sq km. The district consists of 14 blocks and 507 villages. The blocks are shown in Figure 1. The district lies between 10° and 11°30′ north latitude and between 77°45′ and 78°50′ east longitude. The number of study roads and the minimum and maximum lengths of the roads in each block are given in Table 1.

Data Collection
Pavement condition data are a prerequisite for the development of a pavement management system. An extensive visual condition survey of the PMGSY roads constructed prior to 2008 was carried out over a period of three years. The age of the pavement sections varied from seven to nine years. The pro forma for the pavement data collection is shown in Figure 4. The pavement distress data considered during the visual survey are (i) pothole, (ii) camber, (iii) crack, and (iv) edge break. The condition of the shoulder is another factor that plays an important role in the performance. The rating for the shoulder is based on the condition and the slope of the shoulder as well as the vegetation present over the section. The condition of the side drainage, the shape and side slope of the drain, and the presence of silt in the side drain are taken into consideration while assigning the rating. The condition of the cross drainage structures has an effect on the condition of the road. These cross drainage structures are rated based on their settlement and erosion as well as the closure of their openings due to silt and other debris. The data are collected for every 200 m length of pavement stretch. A brief description of the above parameters is given in the subsequent sections, and the list of indices used is given in Table 2.
[...] resulting in a poor bituminous surfacing, which in turn leads to the development of potholes. A rating system has been developed based on the number of potholes of 10 cm × 10 cm size per 200 m section and is shown in Table 3.

Camber.
Camber is the cross slope provided to the road surface in the transverse direction to drain rain water off the pavement surface. Stagnant water can cause the bitumen to lose its adhesive property and wear off. Proper drainage and quick disposal of water from the pavement surface are important for the longevity of the pavement. The camber should be optimal. An excessive camber is a hindrance to driving and causes problems at intersections and for slow-moving and heavily loaded vehicles such as trucks. Table 4 shows the rating for the camber considered during the survey of the roads.

Crack.
Wide and extensive cracks on the pavement surface affect the riding quality of the pavement. Cracks can be of various types, such as alligator cracks, longitudinal cracks, cracks due to shear failure, and so forth. Table 5 gives the rating based on the percentage of cracked area on the total paved surface.

Edge Break.
Edge breaks are considered as a parameter affecting the performance of the pavement. Edge breaks contract the paved area and reduce the operating speeds. The rating scale, shown in Table 6, is based on the total cumulative length of the edge breaks in a pavement section of 200 m.

Condition of the Shoulder.
Shoulders serve as an emergency lane for vehicles compelled to leave the pavement or roadway. While rating the condition of the shoulder, the slope of the shoulder is considered an important factor, as effective drainage depends on the cross slope of the shoulders. The presence of shrubs and vegetation also affects the sight distance on these roads. The rating for the shoulder is based on the condition of the shoulder as well as the vegetation present over the section. The rating adopted in the present investigation is described in Table 7.
[...] factors. The first factor consists of the size, shape, and side slope of the drain. The second factor is the presence of silt in the side drain. Tables 8(a) and 8(b) show the rating scale adopted for the side drains.
[...] element in these road sections. The condition of these cross drainage elements affects the condition of the road. These cross drainage structures are rated based on their settlement and erosion as well as the closure of their openings due to silt and other debris. The details are shown in Tables 9(a) and 9(b).

Visual Condition Index
From the above visual survey data, 550 sections of 200 m each were selected for further analysis. The data collected over the three-year period were used in the analysis. From the factor analysis results, four factors were identified that logically associate the twelve measures of pavement condition presented in Table 2. The factor loadings (correlations between the original variables and the factors) are shown in Table 10. For clarity, the highest factor loading for each indicator across all factors is shown in bold. Data on the 12 indicators of pavement condition were factor analyzed (principal components, Varimax rotation) and four factors were extracted. These four factors were identified as the distress factor (factor I), the side drainage factor (factor II), the shoulder vegetation factor (factor III), and the cross drainage factor (factor IV). Together these factors explained 77.8 percent of the total variance. The index numbers were standardized to a scale of 0 to 100 for ease of comparison across all sections and named the Visual Condition Index (VCI). A VCI of zero indicates poor condition and a VCI of 100 indicates good condition. The detailed procedure for computing the VCI is beyond the scope of this paper. The paper on VCI formulation by the same authors is in press [9].

Cluster Analysis-Multivariate Data Analysis Tool
While studying the output of the visual condition surveys conducted in Tiruchirappalli District, a large number of basic data units (pavement sections) were encountered. These basic data units have to be classified on the basis of homogeneity or similarity of the chosen attributes. Cluster analysis is a class of techniques used to classify cases into groups that are relatively homogeneous within themselves and heterogeneous between each other, on the basis of a defined set of variables [10]. Its objective is to group data units and/or variables into clusters such that the elements within a cluster have a high degree of "natural association", while the clusters themselves are relatively distinct. Two basic philosophies are followed in cluster analysis:
Hierarchical Clustering. In this system, the data are not partitioned into a particular number of clusters in a single step. Data units are consecutively grouped or divided to form new clusters.
Nonhierarchical Clustering. This method is designed to cluster data units into a single classification of k clusters, where k is either specified a priori or determined as a part of the method.

Nonhierarchical Clustering.
The nonhierarchical clustering technique was used in the present study. Over the years, a number of researchers have investigated methods to support the identification of clusters in data. Owing to its simplicity, the k-means algorithm is one of the most extensively used methods. K-means is based on the concept of choosing a preliminary set of centroids and assigning each point to the nearest centroid. Once the initial clusters are determined, new centroids are calculated and the points are again assigned to the nearest centroid. This process is repeated until optimal boundaries for each cluster are determined [11]. In the present study, the squared Euclidean distance is used as the measure of dissimilarity. The inverse of the distance can be taken as a measure of similarity or proximity. The squared Euclidean distance is the sum of the squared differences over all of the variables; that is, the distance $D_{ij}$ between two cases $i$ and $j$ with variable values $(x_{i1}, x_{i2}, \ldots, x_{ik})$ and $(x_{j1}, x_{j2}, \ldots, x_{jk})$, where $k$ here denotes the number of variables, is defined by
$$D_{ij} = \sum_{m=1}^{k} \left(x_{im} - x_{jm}\right)^{2}. \tag{1}$$
The objective of k-means is to minimise the squared Euclidean distance between the data points and the centroid of the cluster to which they are assigned. The advantage of the k-means approach is that more specific grouping hypotheses can be evaluated and that the overall procedure can be performed more quickly than agglomerative approaches. In terms of computer programmes, cluster analysis is such a long-standing and popular technique that it is rare to find a commercial multivariate statistical package that does not include it in some form. These range from inexpensive plug-ins for MS-Excel (e.g., UNISTAT, XL-STAT, StatistiXL) to sophisticated stand-alone packages. In this work, all calculations were carried out using an Excel add-on function provided by XL-STAT.
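The assignment and update steps described above can be sketched in Python. This is a minimal illustration only, not the XL-STAT procedure used in the study; the function names (`sq_euclidean`, `mean_point`, `kmeans`), the explicit `init_centroids` argument, and the toy data are assumptions introduced here.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance: sum of squared differences over all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, init_centroids, max_iter=100):
    """Plain k-means iteration; returns (centroids, labels).

    init_centroids is the preliminary set of centroids (hypothetical argument;
    how the study's software chose its starting centroids is not stated).
    """
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged, so stop
            break
        labels = new_labels
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, labels


# Toy data (hypothetical): six sections described by two attributes each.
toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
centroids, labels = kmeans(toy, init_centroids=[toy[0], toy[3]])
```

With these starting centroids the first three toy sections fall in one cluster and the last three in the other. In general the result of k-means depends on the starting centroids, so several starts are commonly compared.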

Data for K-Means Clustering.
The data for the cluster analysis are the twelve pavement condition indices listed in Table 2. The data were collected for three years, so 36 attributes (12 indices × 3 years) are considered for each section. These 36 attributes are used for clustering the 550 sections. The attributes are weighted by the rotated factor loadings (shown in bold) given in Table 10. K-means clustering was carried out for numbers of classes varying from 2 to 10. The within class variance, the between class variance, and the lowest number of sections in the clusters are given in Table 11. The variation of within class variance for the different cluster groups is shown in Figure 2. The minimum number of sections in any cluster is fixed as 28, which is 5% of the total sections. Considering the two criteria of class variance and minimum cluster size, the number of clusters in k-means is taken as 5 for further analysis. The within class variance and the between class variance for the 5-cluster group are 18.195 and 12.09, respectively, and the minimum number of sections is 28.
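The selection of the number of clusters can be sketched as a small screening step. The rule coded here is one possible reading of the two criteria (lowest within class variance among the candidates whose smallest cluster holds at least 5% of the sections), not a procedure confirmed by the study. The function names are hypothetical, and only the k = 5 entry of the example comes from the text; the k = 4 and k = 6 entries are placeholders, not values from Table 11.

```python
import math


def min_sections_required(n_sections, percent=5):
    """Smallest admissible cluster size: the given percentage, rounded up."""
    return math.ceil(n_sections * percent / 100)


def choose_k(candidates, n_sections, percent=5):
    """Return the admissible k with the lowest within class variance.

    candidates maps k -> (within_class_variance, smallest_cluster_size).
    Returns None when no candidate satisfies the size criterion.
    """
    floor = min_sections_required(n_sections, percent)
    admissible = {k: var for k, (var, size) in candidates.items() if size >= floor}
    if not admissible:
        return None
    return min(admissible, key=admissible.get)


# k = 5 uses the figures quoted in the text; k = 4 and k = 6 are
# illustrative placeholders only.
candidates = {
    4: (20.0, 60),    # placeholder
    5: (18.195, 28),  # from the text
    6: (17.0, 15),    # placeholder
}
required = min_sections_required(550)  # 5% of 550 sections, rounded up
best_k = choose_k(candidates, n_sections=550)
```

Here `required` evaluates to 28, matching the threshold quoted above, and the placeholder k = 6 entry is rejected because its smallest cluster falls below that threshold.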

Results of K-Means Clustering.
In the 5-cluster group, the clusters are numbered 1 to 5 in decreasing order of size. Cluster 1 consists of 150 sections, cluster 2 of 148 sections, cluster 3 of 127 sections, cluster 4 of 97 sections, and cluster 5 of 28 sections. The within group variance of these five clusters varies from 12.312 to 28.872. The average and the maximum distance from the centroid of these clusters are also given in Table 12. The cluster centroids vary across the clusters for all the attributes. The centroid of each cluster for each of these attributes is shown in Table 13. As already mentioned, there are 36 variables in the cluster analysis. The variable LDS 1 indicates the left drainage shape in the first year of survey, that is, in 2009. Similarly, LDS 2 and LDS 3 represent the left [...]
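The per-cluster summaries mentioned above (average and maximum distance from the centroid) can be illustrated with a short Python sketch. It assumes plain Euclidean distance, which may differ from the measure reported in Table 12; the function names and the four-section toy cluster are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def distance_summary(points):
    """Return (centroid, average distance, maximum distance) for one cluster."""
    c = centroid(points)
    dists = [math.sqrt(sum((x - m) ** 2 for x, m in zip(p, c))) for p in points]
    return c, sum(dists) / len(dists), max(dists)


# Hypothetical cluster of four sections with two attributes each.
cluster = [[0.0, 0.0], [6.0, 0.0], [0.0, 8.0], [6.0, 8.0]]
center, avg_dist, max_dist = distance_summary(cluster)
```

For this symmetric toy cluster every section lies 5 units from the centroid (3, 4), so the average and the maximum distance coincide. In a real cluster a large gap between the two would point to outlying sections.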

Pavement Deterioration Models
The prediction of pavement condition is important in any pavement management system. Many pavement prediction models, such as regression models, Markov prediction models, stochastic models, and so forth, are generally available. In this work, regression models are used. Separate models are developed for each of the identified clusters. To compare the effect of clustering on the deterioration models, a model is also developed for all the sections without clustering. The models would help to optimize the scheduling of rehabilitation activities and to determine the funding level required to achieve a predetermined level of performance. The rate of deterioration of a pavement depends on the age of the pavement and its condition. Hence, the parameters considered for projecting the condition of the pavement are the age of the pavement and the condition of the pavement in the previous year. Multiple linear regression analysis is used where there is one dependent variable and two or more independent variables. Separate linear regression models were developed for the different cluster groups. The general form of the equation is
$$\mathrm{VCI}_{in} = a_{0} + a_{1}\,\mathrm{VCI}_{i(n-1)} + a_{2}\,\mathrm{age}_{in}, \tag{2}$$
where $\mathrm{VCI}_{in}$ = Visual Condition Index of the pavement section $i$ in year $n$, $\mathrm{VCI}_{i(n-1)}$ = Visual Condition Index of the pavement section $i$ in year $(n-1)$, $\mathrm{age}_{in}$ = age of the pavement section $i$ in year $n$, and $a_{0}$, $a_{1}$, and $a_{2}$ are the regression coefficients. The coefficients of (2) obtained for the different clusters and for the without-cluster model are given in Table 15.
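The fitting and projection of such a model can be sketched in Python under stated assumptions. The sketch regresses the current VCI on the previous year's VCI and the pavement age by ordinary least squares, with an intercept; the coefficient names a0, a1, a2, the function names, the clipping of projections to the 0 to 100 scale, and the synthetic records are all hypothetical and do not reproduce Table 15.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_vci_model(records):
    """Least-squares fit of VCI_n = a0 + a1*VCI_(n-1) + a2*age_n.

    records: list of (vci_prev, age, vci_now) tuples for one cluster.
    """
    X = [[1.0, vp, age] for vp, age, _ in records]
    y = [vn for _, _, vn in records]
    # Normal equations: (X^T X) a = X^T y
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(3)]
    return solve_linear(XtX, Xty)


def project(vci_start, age_start, coeffs, years):
    """Apply the model year by year, clipping VCI to the 0-100 scale (assumption)."""
    a0, a1, a2 = coeffs
    vci, path = vci_start, [vci_start]
    for step in range(1, years + 1):
        vci = a0 + a1 * vci + a2 * (age_start + step)
        vci = max(0.0, min(100.0, vci))
        path.append(vci)
    return path


# Synthetic records generated from made-up coefficients, for illustration only.
true_coeffs = (5.0, 0.9, -0.5)
pairs = [(90, 7), (80, 8), (70, 9), (85, 7), (60, 9), (75, 8)]
records = [
    (vp, age, true_coeffs[0] + true_coeffs[1] * vp + true_coeffs[2] * age)
    for vp, age in pairs
]
coeffs = fit_vci_model(records)
path = project(100.0, 0, coeffs, years=5)
```

Fitting one such model per cluster, and one on the pooled records, would give coefficient sets that can be compared by projecting the same starting section through each of them.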
To illustrate the effect of the cluster models, a section with a VCI of 100 and 2011 as the year of construction is assumed. The deterioration of the section, if it belongs to each of the different clusters, is shown in Figure 3(a). A graph is also drawn for a particular [...] From the models given in Table 15 and the graphs in Figures 3(a) and 3(b), it is evident that grouping all the pavement sections in one basket and developing a single model for all the road sections will result in either underestimation or overestimation of the pavement condition.

Conclusions
In this paper, the pavement sections are clustered using the k-means clustering technique. The attributes considered are the condition of shoulders, drains, cross drainage structures, and camber, and pavement distresses, namely, potholes, crack area, and edge break, collected for every 200 m section. Five clusters were chosen considering the criteria of the minimum number of sections and the variance of the clusters. Separate models were developed for each of these clusters, and one model was also developed for all the road sections without clustering. The models and the examples shown in the above sections highlight the need for and the importance of clustering road sections.

Figure 1: Block-wise map of Tiruchirappalli District.

Figure 4: Pro forma for pavement data collection.
A prerequisite of k-means clustering is deciding the number of clusters, which must be specified a priori. In this work, the number of clusters is decided based on two criteria: (i) the within class variance of the clusters is minimised, or equivalently the between class variance is maximised, and (ii) the number of sections in any class is not less than 5% of the total sample.
[...] for the without-cluster model in Figure 3(a). Similarly, Figure 3(b) shows the deterioration models with and without clusters for a VCI value of 80 in 2011, with 2005 as the year of construction. In Figures 3(a) and 3(b), the notations k1 to k5 indicate the graphs for the models of clusters 1 to 5, respectively, and NC represents the without-cluster model. If the pavement sections are projected according to the without-cluster model, the predicted pavement condition will be either higher or lower than the original VCI value.

Table 1: Number of study roads selected in each block of Tiruchirappalli District.

Table 2: List of indices.

Table 5: Rating for cracked area of paved surface.

Table 6: Rating for edge break.

Table 7: Rating for shoulder condition.

Table 8: (a) Rating for shape of side drain and (b) rating for the amount of silts and debris.

Table 9: (a) Rating for cross-drainage structure and (b) rating for the settlement of cross-drainage structure.
The pavement sections are usually rated based on the Pavement Condition Index (PCI), the Pavement Serviceability Index (PSI), and so forth. In the present study, as the parameters considered differ from those usually required for PCI or PSI calculations, an index called the Visual Condition Index (VCI) is formulated by the authors using factor analysis.

Table 11: Within class variance and between class variance.

Table 14: Distances between the cluster centroids.

Table 15: Deterioration model coefficients for k-means clusters.