Simultaneous Generation of Optimum Pavement Clusters and Associated Performance Models

With regard to developing pavement performance models (PPMs), the existing state-of-the-art proposes Clusterwise Linear Regression (CLR) to determine the pavement clusters and associated PPMs simultaneously. However, the approach does not determine optimal clustering to minimize error; that is, the number of clusters and explanatory variables are prespecified to determine the corresponding coefficients of the PPMs. In addition, existing formulations do no address issues associated with overfitting as there is no limit to include parameters in the model. In order to address this limitation, this paper proposes a mathematical program within the CLR approach to determine simultaneously (1) an optimal number of clusters, (2) assignment of segments into clusters, and (3) regression coefficients for all prespecified explanatory variables required to minimize the estimation error. The Bayesian Information Criteria is proposed to limit the number of optimal clusters. A simulated annealing coupled with ordinary least squares was used to solve the mathematical program.


Introduction
Typically, pavement performance models (PPMs) are developed using a two-step approach. First, pavement segments with similar characteristics are grouped into clusters using a few critical factors, such as pavement type, age, and traffic volume. Then, PPMs for each of the clusters are developed using statistical techniques. The objective of clustering is to group the pavement segments that perform similarly over time. However, in practice, the performances of pavement segments within a cluster differ significantly because clusters are formed using only a few critical factors [1,2]. A major challenge is to select characteristics that define clusters and the corresponding segments associated with them [3]. If inappropriate characteristics are used, clusters may include homogeneous segments with different performance behavior or heterogeneous segments with similar performance behavior [4]. The prediction accuracy of PPMs can be improved by subdividing the pavement segments into more uniform clusters. However, this subdivision is not always possible due to limited information [1].
In contrast, Figure 1(c) illustrates that segments "SR445 (SB), MP: 40-39" and "SR156 (WB), MP: 3-2" had homogeneous performance behavior. Similarly, segments "SR892 (SB), MP: 25-24" and "SR445 (NB), MP: 36-37" showed a consistent behavior (see Figure 1 could include subgrade type, traffic loading characteristics, or any hidden effects. To address the limitations of the two-step approach, Spath [5] proposed using Clusterwise Linear Regression (CLR) to determine clusters and associated regression models simultaneously. CLR assigns pavement segments that have similar regression effects on the pavement performance such that the overall sum of squared errors is minimal. Hence, CLR minimizes the overall prediction error by simultaneously determining the explanatory variables' coefficients and assigning each pavement segment into an appropriate cluster.
In the context of pavement management, CLR first was used by Luo and Chou [1] to model the deterioration of pavement conditions. First, the pavement network was clustered by using a few critical pavement characteristics. The subdivision was performed at the data-point level; that is, data points collected over various years for a pavement segment could be assigned to multiple clusters. Hence, there was a chance of pavement segments being associated with multiple performance models. An additional step was proposed to predict performance using the results from multiple models. Later, Luo and Yin [2] expanded their research, using CLR to formulate the development of pavement distresses. Both studies [1,2] used pavement age as the only explanatory variable.
To address some of the limitations present in the studies completed by Luo and Chou [1] and Luo and Yin [2], Zhang and Durango-Cohen [6] developed CLR models using Mathematical Problems in Engineering 3 multiple explanatory variables. Data used in the study were collected during the AASHO Road Test [7], conducted in the late 1950s in Ottawa, Illinois. The data were collected from a single site in a relatively controlled environment; clearly, the site characteristics were not representative of all other locations. For example, the data did not represent the varieties of soil and climatic conditions. In addition, construction techniques and materials have changed substantially since this test was conducted. In this study, the number of clusters was determined manually using an "elbow" criterion. Experiments were run only for a number of clusters equal to 1, 5, 10, 15, 20, and 25. One possible reason for not examining all the possible number of clusters might have been large amount of required computational time given that an exchange algorithm was utilized with 100 instances with various initial assignments.
The previous CLR methods used to estimate PPMs do not test the explanatory power of variables in both cluster and regression analyses. That is, all the explanatory variables used in regression models are assumed to be significant. The effects of any insignificant explanatory variables on the dependent variable are also accounted for during assignment of pavement segments into clusters and estimation of the regression coefficients. The presence of any insignificant explanatory variables distorts the underlying regression effects of other significant explanatory variables. This may lead to an incorrect assignment of pavement segments into clusters and estimation of PPMs. Khadka and Paz [8] developed PPMs using CLR while testing the significance of the explanatory variables and obtained superior results relative to the case when significance was assumed.
Another key limitation of the existing CLR is the need to prespecify the number of clusters. In order to avoid prespecifying the number of required clusters, this study proposes a mathematical program to simultaneously determine an optimal number of clusters, the assignment of segments into clusters, and regression coefficients for all explanatory variables. This mathematical program is flexible enough to handle multiple explanatory variables, multiple observations per pavement segments, and user-defined constraints on cluster characteristics.
In previous studies using CLR for pavement management [1,2,6], the objective function was to minimize the sum of squared errors of prediction (SSE). It is intuitive that SSE decreases monotonically as the number of clusters increases. That is, for a given dataset, the optimum number of clusters always is the total number of data points [9]. An optimum number of clusters are equal to the total number of pavement segments, and each pavement segment is the sole member of its own cluster. Such clustering structure is unlikely to provide statistical reliable models. In addition, SSE always decreases when a new explanatory variable is added to the model [10]. Usually, this leads to an overfitting problem [11]. Therefore, SSE is not the best objective function to use for searching an optimal number of clusters.
Even though SSE decreases as the number of clusters increases, the rate of improvement diminishes significantly after a relative optimal [12] number of clusters known as the elbow point. Increasing the number of clusters beyond the elbow point provides a very small reduction in SSE. An SSE versus the number of cluster curve might not exhibit an elbow point distinctly in all cases. Hence, it could be very challenging to choose the right number of clusters.
To address these limitations, this study extended the existing CLR framework by simultaneously (i) incorporating the Bayesian Information Criteria (BIC) [13] as the objective function, (ii) finding the optimal number of clusters, and (iii) finding the maximum number of clusters as required for model estimation. The BIC penalizes more for the inclusion of additional parameters relative to the Akaike Information Criteria (AIC) [14]. The BIC selects simpler models than the AIC when the sample size is greater than eight [15]. Hence, the optimal number of clusters found using BIC provides a balance between model complexity and goodness of fit [16]. On the other hand, several studies showed that the number of parameters in a model selected using AIC was overestimated [14,[17][18][19]. The search process for the best model specification using BIC has the property of consistency, which asymptotically selects this model with a probability of "1" [20][21][22][23]. Galimberti et al. [24] used BIC to propose a unified framework for model-based clustering, linear regression, and multiple cluster structure detection.
In this study, the data limitations in the existing literature were addressed using actual field data collected across a variety of environmental, traffic, design, construction, and maintenance conditions. Pavement data collected for the past 12 years over the entire State of Nevada were used. This data included significant variations across a large range of characteristics, e.g., pavement segments exposed to either extreme desert heat or to very low winter temperatures in the mountains.

Methodology
. . Problem Formulation. This section includes the definition of a pavement sample, notation and terms, proposed mathematical program, and a procedure to find upper bound of the range of the feasible number of clusters.
. . . Definition of a Pavement Sample. The condition of a pavement segment improves when intervention occurs by applying an M&R treatment. Such intervention alters the physical characteristics of the pavement. Hence, the performance of a pavement before and after the intervention differs, even though all other contributing factors remain constant. In this circumstance, the same pavement segment before and after intervention should be treated as two different samples. Given that the physical location of a pavement segment is the same, a different identifier is required to distinguish the set of consecutive observations before and after the intervention. In this study, the term pavement sample is used as an identifier to uniquely represent the set of consecutive observations that accounts for historical interventions made on a pavement segment. Figure 2 provides a simplified depiction of how consecutive observations of a pavement segment are divided into two pavement samples.
In this study, a pavement sample, instead of pavement segment, was considered as a single entity during cluster analysis. Hence, if a pavement segment consists of two or more pavement samples, these samples could be assigned to different clusters.
. . . Notation and Terms. The following notation and terms are used in describing the proposed model: . . . Mathematical Program. A major aspect of model development is to identify variables that explain an actual physical process. Pavement performance models (PPMs) should only include explanatory variables that affect pavement deterioration. In this study, explanatory variables from the existing literature were selected. For example, pavement age, traffic and environmental conditions, and structural and material properties of pavement were considered as factors affecting pavement performance [25][26][27][28].
PSI was chosen as the dependent variable, . PSI serves as a unified standard and is widely accepted for evaluating pavement performance, especially in terms of ride quality [29][30][31]. In addition, PSI reflects the human rider's response and is understood by highway users and legislators [32]. The adopted functional form for the regression model is expressed as This study proposes a mixed-integer, nonlinear mathematical program to determine an optimal number of clusters, assignment of segments into clusters, and regression coefficients for all prespecified explanatory variables. The problem was defined by the optimum number of clusters, ; the number of predefined explanatory variables, ; the number of pavement samples to be clustered, ; and the number of observation periods, 푖 , associated with each pavement sample. The formulation partitions pavement samples into an optimum number of clusters, with a PPM model fit to each cluster.

Mathematical Problems in Engineering 5
The objective function involved minimization of the BIC across clusters as expressed by .
= + * ln (2 ) + * ln ( ) Given that BIC is an increasing function of SSE and free parameters (the number of clusters and regression coefficients) to be estimated, a clustering with the lowest BIC provides the optimal solution. Minimizing BIC reduces unexplained variation in the dependent variable [13]. The total SSE was calculated using Each cluster was associated with a linear regression model with predefined explanatory variables. Deviations of predicted statistics from actual data were calculated separately for each cluster and summed to obtain the total SSE. Decision variables to be determined were the coefficients for all prespecified explanatory variables, 푘 and 푗푘 ; the optimum number of clusters, ; and the cluster membership, The following constraints were imposed to describe the proposed problem: Equations (4) and (5) ensure that each pavement sample was assigned to exactly one cluster. The indicator 푖푘 equals 1 if and only if a pavement sample belongs to cluster . Otherwise, it takes a value of zero. Equation (6) describes a minimum-size constraint, which is imposed to ensure sufficient observations in each cluster for a statistically reliable estimation of coefficients. The total number of observations in a cluster is required to be no less than the minimum number of observations, .
Equations (7) and (8) are imposed to determine the maximum number of potential clusters. If a pavement sample has more than observations, regression over these observations could generate statistically reliable estimates of the coefficients. Hence, a cluster could be formed with only one pavement sample that has more than observations. If all pavement samples have more than observations, each pavement sample could form a cluster. In this case, the maximum number of clusters would be the total number of pavement samples, . However, in reality, it is possible that none of the pavement samples would have more than observations. In this case, samples would be grouped to form a cluster at the sample level, but not at the observation level; that is, observations of a sample must not be assigned to more than one cluster. Equation (8) determines the maximum number of potential clusters in both cases. Function F in this constraint represents the following algorithm to determine 푚푎푥 . A flowchart for this algorithm is provided in Figure 3.

. . . Proposed Algorithm to Determine 푚푎푥
Step . If the total number of observations, , is less than the minimum number of observations required to form a cluster, , then set 푚푎푥 = zero and go to Step 6. Otherwise, create a matrix, M, of size ( 푚푎푥 x 2) with the following elements, where 푚푎푥 is maximum number of observations of a pavement segment(s) in the dataset: Step . If any segment has a number of observations greater than or equal to ( 휏,1 ≥ ), then (a) calculate 푚푎푥 = ∑ 휏≥푛 휏,2 (b) update 휏,2 with 0 for ≥ Otherwise, go to Step 3 to find the maximum number of clusters that could be formed.
Step . If the matrix M has all zeros in its second column (∑ 휏 휏,2 = 0), then return 푚푎푥 = 푚푎푥 and go to Step 6. Otherwise, (a) update M by removing all rows that have the number of segments equal to zero ( 휏,2 = 0) (b) initialize two indices as = = number of rows in M (c) make a copy of M and let it be represented by M (d) if the remaining total number of observations (∑ 휔 휏=1 휏,1 * 휏,2 ) is less than , then, 푚푎푥 = 푚푎푥 , and go to Step 6. Otherwise, initialize with the value of 휔,1 and 휔,2 = 휔,2 − 1 Step . Repeat the following steps until = .
Step . Return the current value of 푚푎푥 and stop.
. . Solution to the Mathematical Program. Simulated annealing (SA) coupled with an ordinary least square (OLS) algorithm was implemented to solve the above mathematical program. SA was used to cluster the dataset estimate, 푖푘 . For each accepted neighborhood cluster, OLS was utilized to estimate the regression coefficients, 푘 and 푗푘 . The fitting linear models (lm) function, available in the statistical software, R, was used to estimate these coefficients [33]. DeSarbo et al. [34] successfully implemented such an algorithm to solve the CLR problem.
The algorithm utilized to solve the clusterwise multiple linear regression is described as follows, and illustrated in Figure 4.
Step . . Set values of initial temperature ( 0 ), final minimum temperature ( 푚푖푛 ), cooling rate ( ), and the maximum number of neighbors to be generated ( 푚푎푥 ) at each temperature level. Set the iterator = 1.
Step . Calculate maximum number of potential clusters, 푚푎푥 , utilizing function F as described above, as part of Constraint 8.
Step . Initial estimation of regression coefficients: Step . . For a given number of clusters, , randomly assign cluster memberships to all pavement samples.
Step . . Count the number of observations of all pavement samples assigned to each cluster. If all clusters have at least observations, then go to Step 4; otherwise, reassign the cluster memberships until all clusters have at least observations. Let 푁 퐾 be the valid initial clusters.
Step . . Estimate 푘 and 푗푘 for all clusters using OLS.
Step . Generate a set of neighborhood clusters near to the previous one, using the following steps.
Step . . Randomly select a prespecified number of pavement samples ( 푝푠 ) to change their memberships.
Step . . For each of the samples selected, assign a new membership by generating a random number ∼ (1, ). If the new membership is same as the previous one, regenerate a random number 耠 ∼ (1, ) until it is different. Repeat this process until the memberships of all the selected pavement samples are different from those that were previously assigned.
Step . . Count the total number of observations of all pavement samples assigned to each cluster.
Step . . If all clusters have at least observations, go to Step 6; otherwise, repeat Steps 5.1., 5.2., and 5.3. until all clusters have at least observations. Let 푁+1 퐾 be a new set of valid neighborhood clusters.
Step . Search for a solution: Step . . For 푁 퐾+1 , estimate new 푘 and 푗푘 for all clusters using OLS.
Step . Counter and temperature update: Step 1. Initialization Step 2. Maximum number of potential clusters • Calculate the maximum number of potential clusters, K max Step 3. Initial estimation of regression coefficients • Randomly generate valid initial clusters, C N K • Estimate  0k and  jk for all clusters using OLS Step 4. Evaluation of an objective function • Evaluate the objective function BIC N K Step 5. Generation of a set of neighborhood clusters • Randomly generate a set of valid neighborhood clusters, C N+1 K • For C N+1 K , estimate  0k and  jk using OLS • Evaluate the objective function BIC N+1 Generate a random number u ~U (0,1) Step . . Repeat Steps 5 and 6 for 푚푎푥 times.
Step . . If < 푚푖푛 , stop the algorithm. Otherwise, reduce the temperature by multiplying the current temperature by the prespecified cooling rate, , set = 1, and go to Step 5.
Step . Stopping criteria: Step . . Update 푚푖푛 with the smallest between the one obtained in Step 7 and the current 푚푖푛 . Set 표푝푡푖푚푎푙 equal to .
Step . . Repeat Steps 3 to 7 for 푚푎푥 − 1 times. This algorithm seeks solutions using a probabilistic approach. The algorithm starts with a high temperature, , and a high probability of accepting a worse solution, 푎푐푐푒푝푡 . This enables occasional "uphill" moves, which help escape from the local minima. The algorithm builds up a rough view of the search space by moving with large step lengths. As drops, 푎푐푐푒푝푡 decreases to behave more closely as a greedy algorithm, with small step lengths slowly focusing on the most promising solution space. Theoretical studies have shown that, with infinitely slow cooling, the algorithm converges to a global minimum [35].

Experiment and Results
. . Data Resources. Data used in this study were extracted from the PMS database of NDOT [36]. The data consisted of various classes, location data, segment data, contract data, environmental data, traffic data, and pavement condition data, collected for flexible pavement in the entire State of Nevada.
A detailed data analysis was performed to check for inconsistent and missing information in the dataset, and some were found. Some of the missing data were synthesized based on associated information available in the dataset. In preparing the PMS data, the following filters were applied: (i) Only one-mile segments were selected for consistency.
(ii) Only pavement segments with the most recent maintenance contracts awarded in 2001 or later were used in the study.
(iii) PSI of a pavement should deteriorate over time if no M&R treatment occurred. If PSI of a segment in any year increased by 0.1 or more points from the previous year without any M&R treatment, all observations for that year were excluded from the analysis. However, if an increase in PSI in any year was less than 0.1 from the previous year, it was assumed to be a random error during the process of pavement evaluation or data processing. Therefore, those observations were included in the analysis.
(iv) If the PSI of any year decreased by one or more points from the previous year, all observations for that year were excluded from the analysis.
(v) In practice, the PSI range is between 4.5 and 1.5. Therefore, if a pavement segment had a PSI beyond these limits in any year, it was considered an outlier, and all observations for that year were excluded.
(vi) Only PSI values used were within the interval of the mean, minus three standard deviations to the mean plus three standard deviations.
(vii) Pavement samples that did not consist of data regarding conditions for at least two consecutive years were excluded.
(viii) Data analysis showed that an improvement in PSI was seen one or two years after the contract award date. Hence, the age of the pavement sample was set to 0 when the actual improvement occurred rather than when the contract was awarded.
After data preparation was completed, 4,138 flexible pavement samples with 17,642 observations were available. For CLR modelling, 14,637 observations, collected from 2001 to 2010, were used; the remaining 3,005 observations, collected in 2011 and 2012 (about 17% of the total number), were used as test dataset to check the accuracy of the CLR models. Tables 1 and 2 provide the descriptive statistics of continuous and categorical variables used in this paper, respectively.
. . Parameters of the Algorithm. Performance of the SA algorithm generally depends on the values of the optimization parameters utilized for a given problem. To ensure proper initialization and search for optimal solutions, selection of the most appropriate parameter values is critical [37,38]. A body of literature exists regarding various methodologies for finding the most appropriate values for annealing parameters in SA [37,[39][40][41][42][43].
If an SA algorithm is allowed to run for a sufficiently long time by setting a high initial temperature with a slow cooling rate, the algorithm performs well, as shown by Anily and Federgruen [44]. In such a cooling scheme, the selection of the most appropriate parameter values may not be critical. However, computation time cannot always be ignored. Hence, the algorithm has to find a good solution in a reasonable amount of time [39].
Effective values to be assigned to the optimization parameters depend on the type and complexity of the problem. These values may not be obvious to determine, but rather might be determined by trial and error for a given problem [40]. In this study, values assigned to the optimization parameters were determined using experience gained from previous research [45][46][47][48][49][50][51] that involved SA and other comparable algorithms. Table 3 lists the parameter values used in this study.
The minimum number of observations required in a cluster, , is one of the parameters to be defined by the analyst for each dataset and application as it is required for any other statistical analysis regarding the minimum sample size. That is, the proposed methodology and contributions are not restricted or affected by the sample size. The analyst must have available sufficient data for each cluster to be able to obtain reliable estimates. A too small will result in statistically unreliable models and a potentially time-consuming search process as the maximum number of feasible clusters; 푚푎푥 will be very large. In contrast, a too large will result in insufficient number of clusters required to provide the optimum goodness of fit. Hence, sensitivity analysis is required to achieve balance between reliable estimates and 푚푎푥 .
. . Results and Discussion. Given the constraints for feasible partitions defined in the problem formulation and the minimum number of observations required in a cluster, = 800, the proposed algorithm determined 16 as the maximum    Table 4. Figure 5(a) shows the smallest BIC for each of the clusters ( = 2 to 16) considered in this experiment. Figure 5(b) shows the trajectory of the objective function, BIC, when the CLR models were used. The initial value of BIC was 8,502. After 1,360 iterations, the BIC decreased to 3,008. This change was equivalent to an improvement of 65%.
It was observed that not all coefficients had associated p values less than 0.05. In this study, the significance level was considered to be 5%. As expected, coefficients differed in magnitude and sign across the clusters, which indicated that the deterioration patterns of pavement samples varied among the clusters. However, seven explanatory variables had the same sign across all clusters.
Different clusters had different numbers of significant explanatory variables. For example, Cluster 2 had 10 insignificant explanatory variables. In addition, among all seven clusters, five variables, age, adt, rut depth, category= , and category= , were significant. However, four variables, trucks, elevation, precip, and category= , were not significant in four different clusters.
Trucks play a key role in pavement deterioration because they transfer heavy loads to the pavement [52]. Hence, it is expected that variable "trucks" be significant. There could be various reasons precluding the statistical significance of this variable. Two of them include (i) multicollinearity effects, which are not addressed in the existing CLR literature and motivates the expansion of the framework as recommend in this paper under future research and (ii) different lanes mistakenly used to collect pavement performance and truck traffic data. The data used in this study does not include the lane used to collect the information.
The performance of the proposed CLR approach was compared with that of the existing CLR for pavement management. That is, in this section models estimated using the proposed approach were compared with those estimated using the existing CLR [6] described in the Introduction. Experiments using the existing CLR approach were run for all feasible clusters ( = 2 to 16). Figure 5(c) shows the smallest SEE for each of these clusters. As expected, SSE decreased with an increasing number of clusters, but at a very small rate after = 11. In this case, Figure 5(c) does not exhibit a clear elbow point. Hence, an optimum number of clusters needed to be decided by visual inspection while considering the trade-off between goodness of fit and model complexity (i.e., of the number of models and the explanatory variables). This inherent subjectivity when choosing an optimum number of clusters is a major drawback for the existing state-of-the-art CLR approach.
After careful assessment, 11-cluster CLR models were selected as the optimum solution. Figure 5(d) shows the trajectory of SSE, when 11-cluster CLR models were used, and Table 5 provides the corresponding regression coefficients. Similar to the results obtained from the proposed CLR approach, the coefficients differed in terms of magnitude and sign. In addition, some coefficients had p values larger than 0.05.
The BIC for these models are provided in Tables 4 and 5. To compare the goodness of fit, overall BIC values were calculated. The overall BIC for the seven-cluster models obtained from the proposed approach was 3,008, whereas the BIC for 11-cluster models, obtained from the existing approach, was 3,171. This difference was the result of similar or better explanatory power provided by the proposed approach with seven clusters versus 11. That is, the more clusters, the more coefficients for explanatory variables needed to be estimated for a similar goodness of fit; thus, the BIC increased.
It is observed that the individual BIC for the 11-cluster are slightly smaller than those for the seven-cluster models. Similar to SSE, the more the clusters are used, the smaller the individual model BIC is. However, this does not necessarily translate into a smaller overall BIC.

. . Model Assessment and Validation
. . . Potential Overfitting. Brusco et al. [53] noted that clusterwise regression models have great potential for overfitting. Often, a variation in the response variable is governed by clustering. Hence, they recommend investigating the potential presence of overfitting in CLR models. This study adopted a procedure proposed by Brusco et al. [53] and the test dataset to diagnose overfitting. For the optimum sevencluster models, the total sum of squares (TSS) was 2,419, the between-clusters sum of squares (BCSS) was 30, the within-clusters sum of squares (WCSS) was 2,389, the sum of squares due to regression (SSR) was 1,456, and the SSE was 933. The BCSS was around 1% of TSS and SSR was 62% of WCSS. These results indicated that there was no overfitting, as most of the variation in PSI was explained by within-cluster regressions. SSE accounted for 38% of TSS, which suggests that the models still have a relatively high rate of errors. A nonlinear functional form should be investigated to reduce the existing errors.
. . . Model Accuracy. The accuracy of the models obtained from both approaches was assessed by calculating the overall root-mean-square error (RMSE), as follows: where, 푘 푖푡 = the observed PSÎ 푘 푖푡 = the predicted PSI = the number of predictions Both models were applied to the test dataset. Memberships of the pavement samples were assigned by mapping the sample IDs and memberships determined by the CLR models. Associated regression models and observed data were used to estimate the PSIs. Predicted PSIs then were compared with the observed PSIs, as shown in Figure 6. Results indicate   that both CLR models overestimated the PSI. A possible reason might be the existence of multicollinearity among explanatory variables. The RMSE for 2011 and 2012 were calculated for models obtained using both approaches. The RMSE were, respectively, 0.429 and 0.439 for the 11-cluster models estimated using the existing state-of-the-art and the seven-cluster models estimated using the proposed CLR approach. The prediction accuracy of models estimated by the existing state-of-the-art was slightly higher than that of the model estimated by the proposed approach. However, this difference in accuracy is very small. The additional four clusters required 100 parameters corresponding to twenty-five explanatory variables including the intercept. Hence, sevencluster models estimated using the proposed approach were more parsimonious and preferred over the 11-cluster models.

Conclusions
This study proposed and implemented a clusterwise multiple linear regression to develop pavement performance models. A mixed-integer nonlinear mathematical program was formulated to explain the problem. The CLR approach simultaneously divided pavement samples into an optimum number of clusters, and estimated a PPM for each cluster.
In the experiments, various environmental factors were considered as potential explanatory variables, including elevation, annual precipitation, average minimum and maximum temperatures, the number of wet days, and freeze and thaw cycles. The proposed approach enabled consideration of other types of variables, such as economic and social factors. Formulation of the mathematical program developed in this study supports a number of explanatory variables, multiple observations per pavement segment, and userdefined constraints on cluster characteristics.
A simulated annealing coupled with OLS was used to solve the mathematical program. For the data used in the experiments, the algorithm found that 7-cluster models provided the optimum solution. Results obtained from the proposed CLR models were compared with results obtained from the state-of-the-art approach. This comparison showed that the proposed CLR approach performed better than the state-of-the-art approach in predicting the PSI of pavement samples.
The analysis showed that overfitting was not an issue for the resulting clusters and regression models. As expected, the use of the BIC as an objective function to determine the best model specification provided a more parsimonious structure compared with that obtained using SSE. This was a consequence of the consistency property of the BIC [10, 20-23].

Future Work
This study did not address all limitations of the state-of-theart CLR approach that were discussed in the Introduction of this paper. The following are potential extensions. One limitation in this study is related to the error that likely is caused by including insignificant explanatory variables during clustering and regression analyses. The proposed formulation needs to be extended to include only significant variables. Hence, the CLR models would not be restricted only to prespecified explanatory variables. Instead, the models could include cluster-specific significant explanatory variables. In addition, the multicollinearity among explanatory variables should be investigated to exclude highly correlated variables during the model estimation process.
The proposed mathematical formulation was limited to a linear functional form for PPMs. Luo and Chau [1] implemented a CLR approach using an exponential form. However, their study used only pavement age as an explanatory variable. An interesting aspect worthy of investigation would be to explore utilizing the proposed CLR approach by allowing nonlinear relationships between pavement performance measures and multiple explanatory variables.
The proposed SA with the OLS algorithm was designed to search for a global minimum. However, a large amount of computational time was required. Hence, another avenue for future research would be to develop faster and more efficient combinatorial algorithms that could guarantee global optimality.

Data Availability
The Nevada Department of Transportation (NDOT) provided the data that was used for analysis. These data is confidential and should be requested to NDOT.