Prediction of Bridge Component Ratings Using Ordinal Logistic Regression Model

Prediction of bridge component condition is fundamental for well-informed decisions regarding the maintenance, repair, and rehabilitation (MRR) of highway bridges. The National Bridge Inventory (NBI) condition rating is a major source of bridge condition data in the United States. In this study, a type of generalized linear model (GLM), the ordinal logistic statistical model, is presented and compared with the traditional regression model. The proposed model is evaluated in terms of reliability (the ability of a model to accurately predict bridge component ratings, or the agreement between predictions and actual observations) and model fitness. Six criteria were used for evaluation and comparison: prediction error, bias, accuracy, out-of-range forecasts, Akaike's Information Criteria (AIC), and log likelihood (LL). In this study, an external validation procedure was developed to quantitatively compare the forecasting power of the models for highway bridge component deterioration. The GLM method described in this study allows modeling of ordinal and categorical dependent variables and shows slightly but significantly better model fitness and prediction performance than the traditional regression model.


Introduction
The highway bridge system is generally considered an essential part of the US transportation infrastructure. The efficient use of public funds for repairing and maintaining bridges requires an effective bridge asset management framework. Transportation management agencies worldwide have begun to adopt bridge management systems (BMS) to determine the optimum future bridge maintenance, repair, and rehabilitation (MRR) strategy at the lowest possible life-cycle cost based on forecasted bridge conditions [1][2][3][4]. Various forecasting models have played a critical role in predicting future bridge conditions for decision makers.
In the United States, highway bridge ratings typically cover three major components: deck, superstructure, and substructure. The current method for monitoring them relies heavily on visual inspections, which only take into account the observed physical health of the bridge. During visual inspections, a condition rating of the three major components is given on an integer scale of 0 to 9 with 8 equal interim levels. On this scale, 0 is failure and 9 is near-perfect condition [5]. Bridge components deteriorate as a result of operating conditions and external environmental loads [6]. Because of the importance of these components for normal operation and safety, prediction models for component conditions are routinely developed to assess the condition of bridges for a given future time span.
Applications of deterioration models such as Markov chains and simulation are gaining popularity in forecasting bridge condition ratings; however, they are limited by their inability to provide specific information on the deterioration of an individual infrastructure element [16]. In addition, the assumption in the Markovian method that probabilistic deterioration in a given period is independent of history can be unrealistic for bridges [9]. Artificial intelligence methods such as neural networks are advantageous because of their ability to model nonlinearities automatically. Neural networks can handle binary categorical inputs by using 0/1 inputs, but it would be difficult for them to handle multiple categories that are ordinal in nature. Moreover, neural networks are more of a "black box" method that produces results that are difficult to interpret [16]. The simplicity and explanatory power of multiple linear regression explain its popularity in the literature, but this method may not be appropriate for modeling bridge condition because it does not take into consideration the ordinal discrete dependent variable. Use of multiple linear regression in that case will result in a violation of the normality assumption [9,16]. Logistical methods such as the probit model allow the capture of the latent nature of infrastructure performance and incremental discrete dependent variables but do not adequately account for the multilevel discrete and ordinal nature of bridge ratings [9].
Multinomial regression is a variant of nonlinear regression that is capable of handling discrete dependent variables with multiple levels. However, bridge condition ratings are commonly represented as variables that are both discrete and ordinal in nature. In multinomial logistic regression, values of the dependent variable do not indicate any order or ranking. Ordinal logistic regression is an extension of multinomial regression that is believed to be theoretically appropriate and practically feasible for modeling bridge component rating changes. These logistic models have been widely adopted in modeling discrete choices in motor vehicle crash severity and, to a lesser degree, in pipeline deterioration and wastewater utility deterioration [16]. However, use of the method to model bridge component or element rating changes has rarely been found in previous studies [17]. Madanat, Mishalani, and Ibrahim [17] presented an ordered probit method for the estimation of infrastructure deterioration models and associated transition probabilities from condition variables.
Moreover, the accuracy of decision-making relies heavily on the outcomes of a reliable bridge condition forecasting model ([2,15]; Lu and Zheng 2017). Many recent studies focus on improving model forecasting accuracy and shed some light on how to do so [2,4,12-15]. However, many of these studies ignore forecasting reliability and provide an incomplete picture of forecasting accuracy. The accuracy reported in previous research is often a statistical measurement of the relative closeness of the model estimation to the actual condition [12,18]. Those measurements are critically important for demonstrating the statistical soundness of the model's forecasting power; however, they do not provide the full picture of the forecasting capability. For example, for discrete values such as bridge component ratings, estimation closeness together with exact estimation and estimation within a certain rating difference will provide a more complete picture. Moreover, many previous studies ignore the forecasting reliability issue [18]. Their forecasting models can perform very well with certain data or dependent variables but poorly with others. Thomas and Sobanjo [12] proposed a semi-Markov chain deterioration model that performed very well for pourable joint seal element condition forecasting, with a relative closeness of 0.981 (1 being a perfect estimation), but performed poorly for reinforced concrete abutment element condition forecasting, with a relative closeness of 0.154.
In this research, the authors evaluate the model forecasting capability based on various measurements, including relative closeness measurements and exact-match accuracy. Moreover, this research validates the model forecasting power not only with in-sample data but also with external data for three bridge component ratings.

Objective
In this study, an ordinal logistic regression method was developed to predict network-level bridge component ratings with North Dakota 2012 NBI data. A multiple linear regression model was also developed with the same data set as a reference for comparing model fitness and forecasting skill. The multiple linear regression model is not perfectly suited for handling ordinal data, as stated earlier; however, it can be used for comparison since this type of model is popular among engineers and is straightforward to develop and use. Six criteria were used to evaluate and compare the two models: prediction error, bias, accuracy, out-of-range forecasts, Akaike's Information Criteria (AIC), and log likelihood (LL). The developed model was validated with North Dakota 2013 and 2014 NBI data. The application of the model for predicting the MAP-21 bridge performance indicator was conducted and discussed.

Ordinal Logistic Regression
Ordinal logistic regression is used to model the relationship between an ordered multilevel dependent variable and independent variables. In the modeling, values of the dependent variable have a natural order or ranking. One example of an ordinal variable is the bridge component rating (ranging from 0 to 9, with 0 being failed and 9 being near-perfect). When the response categories are ordered, the event modeled in ordinal logistic regression is not only an outcome in a particular category; the model also preserves the information carried by the ordering of the response categories. Ordinal logistic regression models, also known as proportional odds models, have the following general form [19], shown in (1):

$$\ln\left(\frac{\pi_i^{(j)}}{1-\pi_i^{(j)}}\right) = \alpha_j + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} \tag{1}$$

where $Y$ is the response variable with $k$ ordered categories; $\pi_i^{(j)} = P(Y_i \le j) = P(Y_i = 1) + P(Y_i = 2) + \cdots + P(Y_i = j)$ is the cumulative probability for $j = 1, 2, \ldots, k-1$ (note that $\pi_i^{(k)} = P(Y_i \le k) = 1$, so it should not be modeled); $Y_i$, $i = 1, 2, \ldots, n$, are the observations of the dependent variable, which are statistically independent; $x_{i1}, x_{i2}, \ldots, x_{ip}$ are the $p$ explanatory variables; $\beta_1, \beta_2, \ldots, \beta_p$ are the regression coefficients for the respective independent variables; and $\alpha_j$ are the cut-off points between categories.
Multinomial logit models do not consider proportional odds and ignore ordered response categories. For $k$ possible outcomes, $k-1$ independent binary logistic regression models are run, in which one outcome, say $k$, is chosen as a reference and the other $k-1$ outcomes are separately regressed against the reference outcome. The general form is given by (2):

$$\ln\left(\frac{P(Y_i = j)}{P(Y_i = k)}\right) = \alpha_j + \beta_{j1} x_{i1} + \beta_{j2} x_{i2} + \cdots + \beta_{jp} x_{ip}, \quad j = 1, 2, \ldots, k-1 \tag{2}$$

The restriction of ordinal regression originates from the proportional odds assumption, even though ordinal regression takes care of the ordinal relationship between levels of the dependent variable [16]. The proportional odds assumption is that $\beta$ is independent of $j$. In other words, the effects of the independent variables, $\beta$, are constant across the different levels of the dependent variable. The proportional odds assumption can be tested by using a likelihood ratio score test to determine whether allowing the effects of independent variables to change will result in significant improvements in model fitness [16]. If the proportional odds assumption is not met, there are still several options, such as using the partial proportional odds model [20]. Our models meet the proportional odds assumption, possibly because of the large sample size and continuous latent response. The proportional odds cumulative-logit model fits well with the idea of a continuous latent response. Bridge condition is actually a categorized version of a latent continuous variable.
The 0-9 rating scale is a coarsened version of a continuous variable indicating the degree of component condition. The continuous scale is divided into 10 regions, one per rating from 0 to 9, by 9 cut-points. If the error term is a random error from a logistic distribution (rather than a normal distribution) with mean zero and constant variance, the coarsened version of the continuous variable will be related to the independent variables by a proportional odds cumulative-logit model. It is worth mentioning that the 0-9 scale of bridge component ratings is subjective, and there is a great need to model the relationship between the inspection rating and the actual condition of the bridge components. However, owing to data availability, the main focus of this research is to demonstrate the forecasting improvement of the proposed model with components measured on this rating scale.
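The way cut-points and a linear predictor translate into rating probabilities can be sketched as follows. This is a minimal illustration, assuming the cumulative-logit form logit P(Y <= j) = alpha_j + sum(beta * x); the cut-point and coefficient values and the function names are hypothetical and are not estimates from this study.

```python
import math

def category_probabilities(cutpoints, coefficients, x):
    """Return P(Y = category) for each ordered category, assuming
    logit P(Y <= j) = alpha_j + sum(beta * x) with increasing cut-points."""
    eta = sum(b * v for b, v in zip(coefficients, x))
    # Cumulative probabilities P(Y <= j) for the k-1 cut-points.
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in cutpoints]
    cumulative.append(1.0)  # P(Y <= k) = 1, so it is not modeled
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probs

def predicted_category(probs, lowest=0):
    """Return the most probable category, counting up from `lowest`."""
    return lowest + max(range(len(probs)), key=probs.__getitem__)

# Hypothetical example: 3 cut-points define 4 ordered categories.
probs = category_probabilities([-2.0, 0.0, 2.0], [0.5], [1.0])
best = predicted_category(probs)
```

Because the category probabilities are differences of cumulative probabilities, they always sum to one, and the predicted category can never fall outside the defined rating range.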

National Bridge Inventory Database
The National Bridge Inventory (NBI) ASCII database is a unified database compiled by the Federal Highway Administration (FHWA) for all bridges and tunnels in the United States that have public roads passing above or below them [21]. The database provides the most comprehensive bridge information in the United States. Detailed information regarding NBI data can be found in the FHWA NBI reference report [22]. The data in the NBI are collected by state highway agencies and reported to FHWA annually or biennially.
As stipulated in the National Bridge Inspection Standards, bridges are inspected at least once every 24 months. During these inspections, the conditions of the three major bridge components (deck, superstructure, and substructure) are rated using a standard scale developed by the Federal Highway Administration (Table 1). Table 1 makes clear that bridge component ratings are ordinal discrete data. North Dakota 2012 NBI data are selected for model formulation, and North Dakota 2013 and 2014 NBI data are used for external model validation.
In this study, an in-sample fitness assessment is conducted with the data set used to construct the model, to ensure the model's in-sample fitness. External forecasting validation is also conducted with two separate data sets, along with MAP-21 indicators, to explore the model's forecasting reliability. The ND 2012 data set is selected for constructing the models, and the ND 2013 and 2014 data sets are selected for external forecasting validation. However, the validation procedures can be demonstrated with any data set that is available.
Bridge distributions by the three component ratings for ND 2012 are displayed in Figure 1. As shown in Figure 1, most bridges are coded as 7 or 8 for superstructure, substructure, and deck conditions. Very few bridge conditions are rated as poor or lower.
In this study, ordinal logistic regression and multiple linear regression models were constructed to forecast bridge conditions. Model fitness and forecasting skills are evaluated and compared between the two types of models. Several criteria were selected for evaluating and comparing the two models and are introduced in the following section.

Model Evaluation and Comparison Methods
The following measures were considered in this research: prediction error (PE), bias, accuracy, out-of-range forecasts, percent of correct estimation, Akaike's Information Criteria (AIC), and log likelihood (LL). The models were constructed with the same dataset and compared in two senses: model fitness and prediction performance. All seven proposed measurements can be used to assess model fitness with the same data set that was used to build the model. The first five measures can be used to evaluate model forecasting performance with external evaluation data.
The prediction error, also known as the residual, is a measure of the discrepancy between the observed data and an estimated value, which can be mathematically expressed as (3). Two variants of the prediction error were selected for comparison: the sum of absolute residuals (SAR) and the residual sum of squares (RSS). They are expressed as (4) and (5).

$$\text{PE}_i = y_i - \hat{y}_i \tag{3}$$

$$\text{SAR} = \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{4}$$

$$\text{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{5}$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted values for data point $i$.
Bias indicates, on average, how much a model overpredicts (where bias > 1) or underpredicts (where bias < 1) the observed data [23], with bias equal to 1 indicating zero bias. It is shown as

$$\text{Bias} = 10^{\left(\sum_{i=1}^{n} \log_{10}\left(\hat{y}_i / y_i\right)\right)/n} \tag{6}$$

The accuracy measurement indicates, on average, how much the prediction differs from the observed data [23], with 1 indicating perfect accuracy. This measurement is shown as

$$\text{Accuracy} = 10^{\left(\sum_{i=1}^{n} \left|\log_{10}\left(\hat{y}_i / y_i\right)\right|\right)/n} \tag{7}$$

Fit criteria such as Akaike's Information Criteria (AIC) are also selected to compare model fitness between the two models. AIC is a common measure that balances model fit against model simplicity. The model with the smallest AIC is deemed the "best" model based on apparent validation. In other words, a smaller AIC value indicates a better model or predictor. In its least-squares form, this can be mathematically expressed as

$$\text{AIC} = n \ln\left(\frac{\text{RSS}}{n}\right) + 2k \tag{8}$$

where $k$ is the number of free parameters and $n$ is the number of data points.

Out-of-range forecasts were counted when the forecasted value was greater than 9 or less than 0 for bridge components. This issue only exists for multiple linear regression. For ordinal regression, the number of out-of-range forecasts is always zero. Percent of correct estimation assesses model performance and fitness by examining the ratio of predictions that agree with the actual observations.
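The error measures described above can be sketched in a few lines. This is an illustrative sketch only: the residual is taken as observed minus predicted, and the bias and accuracy factors are assumed to take the form of 10 raised to the mean (absolute) base-10 log ratio of predicted to observed values, which requires strictly positive ratings. The function names and the sample values are hypothetical.

```python
import math

def sum_absolute_residuals(observed, predicted):
    """SAR: sum of |observed - predicted| over all data points."""
    return sum(abs(o - p) for o, p in zip(observed, predicted))

def residual_sum_of_squares(observed, predicted):
    """RSS: sum of squared residuals over all data points."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def bias_factor(observed, predicted):
    """Assumed form: 10 ** mean(log10(predicted / observed)).
    Values above 1 indicate overprediction; requires positive values."""
    logs = [math.log10(p / o) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))

def accuracy_factor(observed, predicted):
    """Assumed form: 10 ** mean(|log10(predicted / observed)|).
    A value of 1 indicates perfect agreement; requires positive values."""
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))

def out_of_range_count(predicted, low=0, high=9):
    """Count forecasts that fall outside the valid rating range."""
    return sum(1 for p in predicted if p < low or p > high)

# Hypothetical ratings, not taken from the NBI data.
observed = [7, 8, 6, 5]
predicted = [7, 7, 6, 6]
sar = sum_absolute_residuals(observed, predicted)
rss = residual_sum_of_squares(observed, predicted)
bias = bias_factor(observed, predicted)
accuracy = accuracy_factor(observed, predicted)
```

Under these assumed definitions the accuracy factor can never be smaller than the bias factor, because absolute log ratios cannot cancel one another.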

Development and Evaluation of Ordinal Logistic Regression Model
Multiple regression models for forecasting bridge component ratings are still used by some transportation agencies, such as North Dakota DOT, to assist in bridge inventory management [8,16,23,24]. In research on this subject, a few key explanatory variables were found to contribute to network bridge rating changes. The variables used in the analysis are summarized in Table 2. Parameters such as weather, freeze-thaw cycles, and deicing applications were not available to us and thus are not included in the model. Forward stepwise regression based on adjusted R-squared, the Akaike information criterion, and the Bayesian information criterion was used to select the "best" multiple regression model. Detailed regression model selection techniques and theories are outside the scope of this study, and readers are referred to Draper and Smith [25]. An ordinal logistic regression model is constructed through an explicit enumeration of all available explanatory variables, and the "best" fitted model is selected through model selection. The same data and model selection methods are applied for the selection of a multiple linear regression model for the purpose of comparison. In this study, both in-sample fitness and external validation are performed for both regression models, and the indicators described in earlier sections are used to evaluate model quality. (3) NBI inspection records in each single year contain the whole bridge population rather than a sample [27]. Two candidate models were constructed for predicting deck, superstructure, and substructure component performance ratings, respectively, with North Dakota NBI 2012 data. To illustrate the model performance, the models were first evaluated and compared by an in-sample validation method with the previously introduced measurements, and then ND NBI data from 2013 and 2014 were used to conduct external prediction validation. Significant parameters for the two sets of models were tested at the 90% confidence level, as shown in Table 3.
As shown in Table 3, both age and age squared are identified as significant contributors to condition ratings, indicating a nonlinear polynomial effect of age. With a positive age squared sign and a negative age sign, one can tell that the effect of age could be positive up to a certain age and then negative thereafter. For categorical independent variables, the relationship is relative among the categories of the independent variable, as shown for "reconstruction," which contains only two levels. All models indicate that if a bridge has a reconstruction history, the component ratings tend to be better. In other words, reconstruction will improve the bridge component ratings. Note that all models provide the expected significant relationships between dependent and independent variables, except for the significance of ADT for superstructure rating with the multiple linear regression model. However, that relationship is identified as significant with the ordinal logistic regression model.
To assess how well the models fit the 2012 data, Table 4 shows the comparison results among all the models, using the indicators introduced earlier in the paper and the percentage of estimations within one or two condition-rating differences. Estimations within one condition-rating difference are those that predict component ratings no more than one above or below the observed ratings. Estimations within two condition-rating differences are those that predict component ratings no more than two above or below the observed ratings. From Table 4, the three ordinal models for deck, superstructure, and substructure consistently show slightly but significantly better results than the corresponding multiple linear regression models.
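The exact-match and within-one or within-two percentages defined above can be sketched with a single helper. The function name and the sample ratings are hypothetical; continuous predictions from a linear model would first need to be rounded to integer ratings, and how that rounding was done in this study is not stated, so it is left out of the sketch.

```python
def share_within(observed, predicted, tolerance=0):
    """Percentage of predictions within `tolerance` rating levels of the
    observed rating; tolerance=0 gives the exact-match percentage."""
    hits = sum(1 for o, p in zip(observed, predicted)
               if abs(o - p) <= tolerance)
    return 100.0 * hits / len(observed)

# Hypothetical integer ratings, not taken from the NBI data.
observed = [7, 8, 6, 5, 7]
predicted = [7, 7, 6, 3, 4]
exact = share_within(observed, predicted)            # 40.0
within_one = share_within(observed, predicted, 1)    # 60.0
within_two = share_within(observed, predicted, 2)    # 80.0
```

By construction the within-two percentage is at least the within-one percentage, which in turn is at least the exact-match percentage.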
Of the predictions from the linear regression models, 0.38%, 0.87%, and 1.07% are out of range (outside 0 to 9). The percentages of exact-match predictions by the three multiple linear regression models (each with the same prediction and observation) are 44.18%, 47.51%, and 41.66%, while the ordinal logistic predictions are much better: 48.3%, 56.74%, and 44.13%. The same conclusion holds for the percentage of estimations within one condition-rating difference and within two condition-rating differences. The three ordinal models have a higher percentage of component rating predictions that fall within one or two ratings of the observed ratings. The bias and accuracy indicators for the ordinal logistic regression models are all slightly closer to 1 than those for the multiple linear regressions. The sum of absolute residuals, sum of residual squares, AIC, and LL consistently indicate that all ordinal regression models perform better than the multiple regression models. The ordinal models improve the model performance in terms of the sum of residual squares by 20.97%, 27.04%, and 12.47% compared to the multiple regression models for deck, superstructure, and substructure, respectively. Detailed improvement percentage values for all four indicators are shown in Table 5.
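The paper does not state how the improvement percentages are computed. One plausible reading, for error measures where smaller is better, is the relative reduction with respect to the multiple linear regression value; the sketch below assumes that definition, and the function name and input values are hypothetical.

```python
def relative_improvement(linear_value, ordinal_value):
    """Percentage reduction of an error measure (smaller is better)
    when moving from the linear model to the ordinal model.
    Assumed definition: (linear - ordinal) / linear * 100."""
    return 100.0 * (linear_value - ordinal_value) / linear_value

# Hypothetical residual sums of squares, not values from Table 4.
improvement = relative_improvement(200.0, 150.0)  # 25.0
```

This form does not apply directly to the bias and accuracy indicators, for which improvement means moving closer to 1 rather than toward zero.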

Model Validation with New Datasets
To further illustrate the external validation method result, the same ordinal logistic and multiple linear regression models from 2012 data are validated and compared with all ND NBI 2013 and 2014 deck, superstructure, and substructure observed data. Model performances were compared for ordinal logistic and multiple linear regressions by comparing sum of absolute residuals, sum of residual squares, bias, accuracy, out-of-range forecasts, and percentage of estimations which are within one or two rating differences compared with observed component ratings and exact forecasts. The performance results are shown in Tables 6 and 7 for 2013 and 2014, respectively. One can tell from Tables 6 and 7 that ordinal logistic models perform consistently better than multiple linear regression models for all deck, superstructure, and substructure models and for all indicators with both external validation data sets. The improvement percentages for 2012, 2013, and 2014 are shown in Table 8. Some interesting observations were obtained in the analysis. Table 8 indicates the superstructure ordinal model has the highest improvement consistently for all three of the yearly data sets and for each of the indicators, except for the 2014 bias indicator, followed by the deck model and then the substructure model. The deck model has the highest improvement for bias in 2014, followed by the superstructure model and then the substructure model. There is no specific trend for model performance improvement by year. For example, deck model improvement for the sum of absolute residuals decreased from 13.47% in 2012 to 13.27% in 2013 and to 12.9% in 2014. The deck model improvement for the sum of residual squares increased from 20.97% to 21.39% then to 22.04% from 2012 to 2014. Finally, the deck model improvement for bias increased from 5.61% to 6.66% and then decreased to 5.7% from 2012 to 2014.

Analysis of MAP-21 Bridge Performance Indicator
The MAP-21 rules require all states to report the percentage of National Highway System bridges classified as being in good condition and in poor condition. Bridge condition can be determined based on an assessment of the deck, superstructure, and substructure. The method used under the Highway Bridge Program is selected to determine bridge conditions: components with condition ratings of no less than 7 are rated as "Good" and those with ratings of no greater than 4 are rated as "Poor". When all three components are rated as "Good", the overall bridge condition rating can be coded as "Good", and when all three components are rated as "Poor", the overall bridge condition rating can be coded as "Poor". The observed bridge condition measures and the forecasted measures are listed in Table 9.
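The classification rule described above can be sketched as follows, using the thresholds stated in the text (all three components rated 7 or higher for "Good", all three rated 4 or lower for "Poor"). The function names, the "Other" label for bridges that meet neither rule, and the sample ratings are hypothetical.

```python
def bridge_condition(deck, superstructure, substructure):
    """Classify a bridge from its three component ratings."""
    ratings = (deck, superstructure, substructure)
    if all(r >= 7 for r in ratings):
        return "Good"
    if all(r <= 4 for r in ratings):
        return "Poor"
    return "Other"  # hypothetical label for bridges meeting neither rule

def condition_percentages(bridges):
    """Return (percent Good, percent Poor) for a list of rating triples."""
    labels = [bridge_condition(*b) for b in bridges]
    total = len(labels)
    return (100.0 * labels.count("Good") / total,
            100.0 * labels.count("Poor") / total)

# Hypothetical (deck, superstructure, substructure) ratings.
bridges = [(7, 8, 7), (4, 3, 4), (7, 4, 6), (8, 8, 9)]
good_pct, poor_pct = condition_percentages(bridges)  # 50.0, 25.0
```

Applying the same function to observed and forecasted component ratings would give the two sets of percentages that the comparison relies on.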
The above analysis shows that the ordinal logistic models are always better at predicting bridge conditions and measurements with both the in-sample and external validation data sets.
It is worth noting that all the models underestimate poor conditions due to the nature of the data distribution. The bridge condition data form an imbalanced data set; in other words, the number of observations belonging to one category is significantly lower than the number belonging to the other categories. In this situation, a predictive model developed using any GLM or even conventional machine learning algorithms could be biased and inaccurate. Handling imbalanced classification, or improving the forecasting of rare events, is an extended research topic and should be investigated in future research.

Findings and Discussions
Eight model evaluation criteria were used to compare the goodness of fit and the forecasting power of the models with both in-sample and external validation data sets for deck, superstructure, and substructure condition. The following are the main findings of the study:
(i) The analysis shows agreement among all indicators, for all three component models, and for all three yearly data sets. All the comparison results indicate a clear improvement of the ordinal logistic model over the multiple linear regression model.
(ii) Some indicators show significant improvement, such as sum of absolute residuals, sum of residual squares, AIC, and exact forecasts (about 10% improvement). However, some indicators show only slight improvement, such as bias, accuracy, and LL (less than 10% improvement).
(iii) Superstructure models show the greatest improvement for almost all performance criteria, followed by deck models and substructure models. To further investigate this issue, time series data need to be tested to confirm that the superstructure model consistently performs better than the other two models.
(iv) There is no clear trend for model performance improvement by year. According to Table 8, for some indicators and certain models, such as sum of residual squares, bias, and accuracy for the deck model, the improvement sequence is 2014, 2013, and 2012 from the greatest improvement to the least when switching from the multiple linear regression model to the ordinal logistic model. For bias and accuracy of the superstructure model, the sequence is 2013, 2012, and 2014 from the greatest improvement to the least. For sum of residual squares and exact forecasts, the greatest improvement from switching from a multiple linear regression model to an ordinal logistic model was achieved in 2012, followed by 2014 and 2013.
(v) Ordinal logistic models will not produce out-of-range estimations, which are not prevented by the multiple linear regression model.

Conclusions
This paper proposes and demonstrates an ordinal logistic regression model for forecasting bridge component ratings. The model is preferred for its ability to handle the ordinal nature of bridge component ratings, the explanatory power of its regression analysis, and its accurate prediction power. In this study, both ordinal logistic regression and multiple linear regression models have been generated for predicting the three main bridge component ratings. The ordinal logistic model demonstrated in this research can be easily applied to element-level data when they become available.
In addition to assessing model performance, both in-sample and external validation analyses were performed for all eight evaluation criteria. Finally, it is determined that the ordinal logistic regression method is a better approach than the multiple linear regression method for forecasting bridge component ratings. It has the inherent advantage of always making meaningful predictions, and its predictions are closer to the observations.

Figure 1: Bridge distribution by component ratings for ND 2012.

Table 1: Condition ratings used in the National Bridge Inventory (NBI).
1 Imminent failure: Major deterioration or section loss present in critical structural components or obvious vertical or horizontal movement affecting structure stability. Bridge is closed to traffic but with corrective action may be put back in light service.
0 Failed: Out of service, beyond corrective action.
Source: United States Department of Transportation. Recording and Coding Guide for the Structure Inventory and Appraisal of the Nation's Bridges. Washington, D.C., 1995, page 38.

In Table 3, reconstruction, bridge type, and district are categorical variables, and ADT, age, and age squared are numerical variables. Positive and negative values in Table 3 indicate the corresponding variables' relationship to component condition categories. For example, ADT has a negative relationship with deck condition. In other words, deck condition ratings decrease as ADT values increase.

Table 2: Description of variables used in analysis.

Table 3: Significant parameters and statistics with 2012 data.
Note: all independent variables are significant at the 90% confidence level.

Table 4: Model comparison statistics with 2012 data.
The values displayed in Table 9 are the estimated and observed percentages of bridges in good or poor condition, compared for 2012, 2013, and 2014, respectively.

Table 5: Performance improvements by ordinal model compared with multiple linear model.

Table 6: Model comparison statistics with 2013 data.

Table 7: Model comparison statistics with 2014 data.

Table 8: Performance improvements by ordinal model compared with multiple linear model.

Table 9: Bridge condition measures required by MAP-21 comparison results.