Methodology to Forecast Road Surface Temperature with Principal Components Analysis and Partial Least-Square Regression: Application to an Urban Configuration

A forecast of road surface temperature (RST) helps winter services to optimize costs and to reduce the environmental impact of deicers. Data from road weather information systems (RWIS) and thermal mapping are considered inputs for physical numerical forecast models. Statistical models include many meteorological parameters along routes and provide a spatial approach. They are based on typical combinations resulting from the treatment and analysis of a database of measurements from road weather stations or from thermal mapping, which is an easy, reliable, and cost-effective way to monitor RST and many meteorological parameters. A forecast dedicated to road networks should combine both spatial and temporal forecast needs. This study contributes to building a reliable RST forecast based on principal component analysis (PCA) and partial least-square (PLS) regression. An urban stretch was monitored over several months, under various weather conditions and seasons, to generate an appropriate number of samples. The study first identified the optimum number of samples needed to establish a reliable forecast. It then compared RST forecasts from the PLS model to measurements. The comparison indicated a forecast over an urban stretch with up to 94% of values within ±1 °C and over 80% within ±3 °C.


Context and Introduction
The objective of winter maintenance is to ensure access to and use of infrastructure in adverse weather conditions. It mainly consists in avoiding the occurrence of ice at temperatures close to freezing and the persistence of snow. The decision whether or not to treat a road network is often difficult and is eased by road weather outstations and forecasts, commonly in place since the 1980s. These stations provide measurements of atmospheric and road parameters, such as road surface temperature (RST), to inform road managers about network status. Forecasts are obtained with numerical models, either site specific or for a whole network. A range of forecast models exists, usually site specific, and since the 1980s a significant body of literature has accrued (see [1] for a thorough review). Furthermore, because RST variations can reach 10 °C, forecast interpolation is also required. This has traditionally been achieved via thermal mapping, but of all the components contained within road weather information systems (RWIS), this interpolation has frequently been identified as the least satisfactory [2].
Thermal mapping consists of a high-resolution survey of RST by means of infrared thermometry. Measurements are conducted in various winter atmospheric conditions and provide thermal fingerprints, which are used as inputs for forecast models [3-9]. These fingerprints are also used as a tool for optimising anti-icing routes [10] or for identifying locations for the installation of road weather outstations.
The extent of RST variations along a route is controlled by atmospheric stability and by local variations of geographic parameters and of traffic (profile and density) [3,8]. The greatest variations along a route are observed under stable anticyclonic conditions [3]. With decreasing atmospheric stability, the amplitude of the thermal fingerprint subsequently reduces. Some authors [5] have shown that under certain weather conditions some consistency appears in the spatial variation of RST along a route, which enables thermal mapping surveys to be carried out under only a few selected weather conditions (usually extreme, intermediate, and damped) [1].
A standard of five to six surveys is typically commissioned to provide coverage of the conditions encountered in a winter season. This approach is not optimum, and daily forecasts are too often "pigeon-holed" into one of the categories when used operationally [8]. Moreover, measurement campaigns are very time-consuming. The measurements have to be partitioned into itinerary stretches that can be surveyed within the available time window. Practice has gradually changed, moving toward a new approach based on spatial modelling. Route-based forecasts take into account both meteorological and geographical data to provide a high-resolution forecast of RST and road condition around the road network [2]. Besides a potentially significant improvement in the forecasts, this approach has also brought about a new set of challenges. Whereas traditional site-specific forecasts could easily be validated against sensor data from outstations located at the forecast sites, this is clearly impossible for a route-based forecast [1], for which thermal mapping is still required to verify the spatial forecast. However, thermal mapping is not a cost-effective way to provide data at a high temporal resolution, so route-based forecasts can still rely only on isolated thermal fingerprints.
A forecast dedicated to road networks must combine spatial and temporal forecast needs and give a view of the full network at once. Thermal mapping is an easy, reliable, and cost-effective way to monitor RST and many other meteorological parameters along routes. A numerical model provides a temporal forecast at specific locations, and many computations are necessary to build a forecast covering the full route or network. A new approach based on statistics was developed by Chapman et al. [11,12]. It provided RST over a full given route through principal components analysis (PCA) alone, but it did not yet provide an RST forecast as it is usually understood, and it relies only on RST measurements, which are not always available. Partial least-square (PLS) regression could contribute to such a forecast. A similar study based on these multivariate data analysis tools was performed by Kršmanc et al. [13], but only at specific locations. This study is divided into two parts. The first describes an RST forecast at a specific urban location based on PLS. The variables consist of air temperature measured with the thermal mapping vehicle, along with other meteorological and anthropogenic data. This provides a more comprehensive picture of RST variation for a broader range of atmospheric stability than that conventionally considered by such surveys. The second part is dedicated to a comparison between forecast results and field data. A new approach is then described for building a full RST profile over a route from a T_air measurement or forecast at one specific location, which can be more reliable than an RST measurement or forecast. Results are provided and discussed.

Methodology
2.1. Equipment, Route Investigated. DTer Est in Nancy (France) has developed and used a thermal mapping vehicle since the technique first came into existence. Over the years, a substantial quantity of thermal data has been accumulated for analysis. For this investigation, a whole route going around and within the town was chosen (Figure 1(a)). A stretch of an urban street in Nancy was then selected for detailed analysis of RST. It is several hundred meters long, with a typical urban canyon configuration (Figure 1(b)). The thermal dataset available for this route contains thirty-four thermal fingerprints, each considered a sample, obtained under extreme and intermediate weather conditions over the years 2012 and 2013, at least once a month.
RST was obtained with an infrared radiometer operating in the 7-14 µm spectral range, with a measurement range from −30 °C to +70 °C and an uncertainty of 2 °C (at 23 °C). The radiometer is fitted to the bottom of a survey vehicle, in a temperature-controlled compartment. The instrument measures the energy flux density emitted by the surface, and RST is calculated through the Stefan-Boltzmann equation [14]. The road surface is considered to be a grey body, and as such its emissivity is held constant at 0.95 [15,16]. Air temperature (−40 °C to +125 °C, ±0.3 °C uncertainty) and relative humidity (0-100% range, ±2% uncertainty) are also measured by an atmospheric probe located on the roof of the vehicle. An electrical turbine generates a laminar flow over the sensors to improve measurement accuracy and avoid turbulence artefacts. Figure 2 illustrates the French thermal mapping vehicle with its on-board measurement devices.
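As an illustration of the conversion from emitted flux density to RST, the following Python sketch inverts the Stefan-Boltzmann law for a grey body with an emissivity of 0.95. It is a simplified sketch and not the instrument's actual processing: the function names are hypothetical, and the neglect of reflected sky radiation and of the limited 7-14 µm band are assumptions made here.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def flux_from_rst(rst_celsius, emissivity=0.95):
    """Flux density (W/m^2) emitted by a grey body at the given temperature."""
    return emissivity * SIGMA * (rst_celsius + 273.15) ** 4


def rst_from_flux(flux_w_m2, emissivity=0.95):
    """Surface temperature (degC) from emitted flux density (W/m^2).

    Assumption: reflected sky radiation and band limits are ignored.
    """
    if flux_w_m2 <= 0.0:
        raise ValueError("flux density must be positive")
    if not 0.0 < emissivity <= 1.0:
        raise ValueError("emissivity must be in (0, 1]")
    kelvin = (flux_w_m2 / (emissivity * SIGMA)) ** 0.25
    return kelvin - 273.15
```

Under these assumptions, the two functions are inverses of each other, so a flux computed for a surface at 0 °C converts back to 0 °C.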

Partial Least-Square Regression. PLS is a statistical method that enables data reduction when dealing with large datasets [17]. It consists in computing a calibration, using a training set, between the air temperature matrix T_air and a matrix of variables. As pointed out by Almkvist et al. [18,19], the ground heat flux is one variable affecting RST. The work detailed in this paper focused on a selection of variables that are easy to obtain and that therefore already account for this ground heat flux. The selection includes RST at the chosen urban location, relative humidity, wind force, global radiation, nebulosity, the existence of precipitation within the previous 24 hours, the moment of the day (before midday, a.m., or after midday, p.m.), and the traffic intensity (weak, moderate, or high) as an anthropogenic variable. Traffic data was obtained from the appropriate service, and each qualitative level corresponds to a range of hours within the day. Weak traffic occurs during the night, moderate traffic roughly between 10:00 a.m. and 4:00 p.m., and the highest traffic during peak hours such as 8:00 a.m. and 6:00 p.m. RST in winter months is often around 0 °C, a situation in which the decision to apply deicers is more difficult than at very negative RST values, which is why these situations were studied. This training set contributes to the generation of a statistical forecast model of RST with T_air. Since noise is contained in both the variables and the T_air data, this computation leads to a calibration matrix and a minimized error matrix, which together define the relationship between the variables and T_air.
In PLS, a set of basis vectors is searched for the variables and another one for T_air, and it is necessary to understand how these sets are related. The eigenvectors for each of these two spaces are calculated so as to restore an optimum congruence between each variable factor and its corresponding T_air factor in the least-squares sense. Because noise is present in both the variables and the T_air measurements, and because the noise in one space is independent of the noise in the other, the eigenvectors calculated for the two spaces are shifted by different amounts in different directions, which destroys the perfect congruence between the variables and the T_air data points. PLS tries to restore the optimum congruence, defined as a perfectly linear relationship between the projections, or scores, of the variables and of the T_air data onto their corresponding factors.
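For reference, a PLS calibration of this kind is commonly written in the generic form below. The symbols are notation assumed here for illustration and are not taken from the study, as is the choice of the variables matrix as X and the T_air matrix as Y.

```latex
\begin{aligned}
X &= T P^{\mathsf{T}} + E_X, \\
Y &= U Q^{\mathsf{T}} + E_Y, \\
U &\approx T D, \\
Y &= X B + E .
\end{aligned}
```

In this notation, T and U are the score matrices of the two spaces, P and Q the corresponding loadings, D a diagonal matrix expressing the linear inner relation between the scores, B the calibration matrix, and E the error matrix. The "optimum congruence" discussed above corresponds to the inner relation between U and T being as close to linear as possible.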
In the case of thermal mapping, each thermal fingerprint is considered to be a sample, and each variable a data point in a multidimensional space. The variables are some of the physical parameters that affect RST and could be included in a numerical model, as explained by Hammond et al. [1,12]. By using the data from several thermal surveys, a data matrix is generated, which can then be treated as clusters of points in this multidimensional space. With respect to thermal mapping, PLS determines how many factors are needed to properly forecast RST with T_air. The variables matrix has 34 rows (one per thermal fingerprint) and 8 columns (one per variable) (Table 1), and the T_air matrix has 34 rows and 28 columns (one per data point along the urban stretch). Once these factors are identified, they can be used to build a forecast model for other weather conditions, provided these are not significantly different from the ones used for the original PLS calculations. The first objective of this PLS approach is to identify, and then to prescribe, the number of thermal mapping runs required to forecast an accurate daily temperature pattern along a given route. The second is to establish the forecast thermal fingerprints, thus providing a simple spatial forecasting model based on cost-effective and realistic data.
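To make the matrix shapes above concrete, the following Python sketch fits a PLS model with the NIPALS algorithm between a 34 x 8 matrix and a 34 x 28 matrix. It is a minimal sketch and not the software used in the study: the function names are hypothetical, NumPy is assumed to be available, and treating the variables matrix as the predictor X and the T_air matrix as the response Y is an assumption.

```python
import numpy as np


def pls_fit(X, Y, n_components, tol=1e-10, max_iter=500):
    """Fit a PLS2 model with NIPALS.

    X: (n_samples, n_variables), Y: (n_samples, n_points).
    Returns (x_mean, y_mean, B) with Y ~ (X - x_mean) @ B + y_mean.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yr = X - x_mean, Y - y_mean  # centred residual matrices
    W, P, Q = [], [], []
    for _component in range(n_components):
        # Start from the response column with the largest residual norm.
        u = Yr[:, np.argmax((Yr ** 2).sum(axis=0))].copy()
        t_old = np.zeros(X.shape[0])
        for _iteration in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)      # X weights (unit length)
            t = Xr @ w                  # X scores
            q = Yr.T @ t / (t @ t)      # Y loadings
            u = Yr @ q / (q @ q)        # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = Xr.T @ t / (t @ t)          # X loadings
        Xr = Xr - np.outer(t, p)        # deflate both matrices
        Yr = Yr - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)  # calibration matrix
    return x_mean, y_mean, B


def pls_predict(model, X_new):
    """Apply a fitted model to new rows of the predictor matrix."""
    x_mean, y_mean, B = model
    return (np.asarray(X_new, dtype=float) - x_mean) @ B + y_mean
```

With `n_components=2`, the call `pls_fit(X, Y, 2)` corresponds to a two-factor model of the kind discussed in the results. Qualitative variables such as traffic intensity would first need a numerical coding, which is left out of this sketch.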

Combination of PCA, PLS, and RWIS Data/Weather Services Forecast for a Spatial and Temporal Forecast. Marchetti et al. [12] gave extensive details about the application of PCA to thermal mapping data, and to RST in particular. In this study, the same methodology was applied to the T_air matrix. Each thermal fingerprint is treated as a sample. The matrix contains as many rows as there are samples and as many columns as there are measurement locations (in this case one every 3 m). Again, by multiplying the scores and loadings matrices and adding the average profile of all samples, T_air profiles of the urban stretch from PCA, T_air,PCA, were built, with a correction for any offset error. The work consisted in checking the number of samples needed to perform PCA, again with a target of four samples. Each T_air,PCA profile was then treated as a one-column matrix in which each element is T_air at a point of the itinerary (T_air,PCA,1, T_air,PCA,2, ..., T_air,PCA,n), n being the final point of the route. For two T_air,PCA profiles, T_air,PCA1 and T_air,PCA2, an interpolated T_air profile, T_air,PCA,interpolated, is obtained with a coefficient α ranging between 0 and 1, using (2). The values of α are selected to obtain RST profiles over the full RST range investigated, and thus to obtain an RST profile even where measurements were not made. Consider

$$
\begin{pmatrix} T_{\mathrm{air,PCA,interpolated},1} \\ \vdots \\ T_{\mathrm{air,PCA,interpolated},n} \end{pmatrix}
= \alpha \begin{pmatrix} T_{\mathrm{air,PCA1},1} \\ \vdots \\ T_{\mathrm{air,PCA1},n} \end{pmatrix}
+ (1-\alpha) \begin{pmatrix} T_{\mathrm{air,PCA2},1} \\ \vdots \\ T_{\mathrm{air,PCA2},n} \end{pmatrix}. \tag{2}
$$

One of the ideal inputs for decision-making in winter maintenance is knowledge of the RST profile over the whole road network that a service is in charge of. Since it is considered impossible to obtain either RST measurements or forecasts over these profiles, winter services rely on site-specific RWIS outstations that provide atmospheric and road parameters such as air temperature, relative humidity, and RST. They can also use the outputs of numerical weather models to obtain a forecast for this specific spot or, more recently, over the whole route using route-based forecasting techniques [8]. Marchetti et al. [12] have indicated a way, through PCA, to obtain the whole RST profile from the RST at a single outstation, but the result is not a forecast as it is usually understood. Furthermore, there is some controversy about the proper way to measure RST. It is measured either by a probe embedded in the pavement, which raises the question of how local the measurement is, or by a radiometer as indicated above, in which case the temperature is radiative and not thermodynamic. The T_air measurement is more widely accepted by weather services and does not suffer from the same controversy. The idea is then to use either an RWIS air temperature or a forecast data point from a weather service at one single outstation, and to use the coefficient α to match this point with the corresponding one in T_air,PCA,interpolated. The whole T_air profile along the route is then built. Once this profile is available, a PLS model is applied to forecast the RST at the location for which the PLS model was developed. Then, using PCA and the interpolation again, an RST profile along this very same route can be extracted. The principle of this work is described in Figure 3. A comparison is then conducted between measured and forecast RST profiles.
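The matching step described above can be sketched in Python as follows. This is a minimal sketch that assumes the interpolation is a point-by-point weighted average of two PCA profiles with a coefficient between 0 and 1 (called `alpha` here); the function names and the choice of which profile carries the weight `alpha` are assumptions, not taken from the study.

```python
def interpolate_profile(profile_1, profile_2, alpha):
    """Point-by-point weighted average: alpha * profile_1 + (1 - alpha) * profile_2."""
    if len(profile_1) != len(profile_2):
        raise ValueError("profiles must have the same number of points")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(profile_1, profile_2)]


def alpha_for_station(profile_1, profile_2, station_index, station_value):
    """Coefficient for which the interpolated profile equals station_value
    at the point of the route where the outstation is located."""
    a, b = profile_1[station_index], profile_2[station_index]
    if a == b:
        raise ValueError("profiles coincide at the station; alpha is undefined")
    alpha = (station_value - b) / (a - b)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("station value lies outside the two profiles")
    return alpha
```

For example, with two profiles equal to 2.0 °C and 0.0 °C at the outstation, a station reading of 0.5 °C gives a coefficient of 0.25, and the full interpolated profile then follows from `interpolate_profile`. The same two functions apply unchanged to RST profiles.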

Search of the Optimum Number of Measurements Sets for the PLS for RST Forecast. PCA was conducted with the Unscrambler X 10.1 software package on the T_air datasets to identify potential specific thermal fingerprints. Results indicated that over 99% of the variance could be explained with the first principal component, indicating the homogeneity of the data. These results are consistent with a previously published study [12]. In the case of T_air, an average offset of −2.3 °C was identified, and this correction was then applied to the T_air profiles calculated from PCA. Generic T_air profiles over this route were then calculated from the PCA outputs (scores and loadings) and are illustrated in Figure 4. PLS was then performed on the urban stretch, first on the whole dataset and then only on the samples corresponding to temperatures below 10 °C, considered representative of winter in the Nancy (France) area. The same software package as for PCA was used. The idea is to determine the minimum number of samples needed to build a reliable statistical model to forecast RST with T_air. The reliability of such a model is established mainly through its R² and its RMSE. The evaluations conducted and their results are summarized in Table 2. The number of outliers, which correspond to data poorly described by the statistical models, is very small with respect to the number of points used to build the model, that is, the number of samples multiplied by the 28 columns of the T_air matrix. Results indicated that at least 6 thermal fingerprints are needed to develop a reliable model with R² greater than 0.90, with an optimum at 8 fingerprints. This figure is consistent with actual practice in thermal mapping. Furthermore, this PLS approach is better suited to short stretches, with an R² of 0.97 for 34 samples, than to a whole, longer itinerary, where R² is below this 0.90 threshold. The consideration of both R² and RMSE indicated that the PLS approach is specifically suited to road environments with vegetation, where RMSE is near or below 2 °C.
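The PCA step (explained variance, then profiles rebuilt from scores, loadings, and the average profile) can be sketched in Python as follows. This is a minimal sketch based on a singular value decomposition and not the Unscrambler implementation: the function names are hypothetical, NumPy is assumed to be available, and the sign convention of the `offset` argument is an assumption.

```python
import numpy as np


def pca_profiles(profiles, n_components=1):
    """PCA of a (n_samples, n_points) matrix of temperature profiles.

    Returns the average profile, the scores, the loadings, and the
    fraction of variance explained by each retained component.
    """
    data = np.asarray(profiles, dtype=float)
    mean_profile = data.mean(axis=0)
    centred = data - mean_profile
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    return mean_profile, scores, loadings, explained[:n_components]


def rebuild_profiles(mean_profile, scores, loadings, offset=0.0):
    """Scores times loadings, plus the average profile, plus an offset
    correction (assumed here to be additive)."""
    return scores @ loadings + mean_profile + offset
```

When the first component explains nearly all of the variance, as reported above, the profiles rebuilt with `n_components=1` are nearly identical to the measured ones.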
Once these models were developed, their performance was evaluated. To do so, they were applied to two T_air profiles measured along the same urban stretch on two different dates, along with the other variables presented in Table 1, to deduce, and in effect forecast, the corresponding RST. A comparison with available RST field data was then conducted, with a specific focus on situations corresponding to winter weather conditions in the Nancy area (RST below or near 10 °C). Results are detailed in Table 3. PLS is therefore able to provide an RST forecast based on T_air measurements with a bias below 1 °C for situations close to water freezing when two factors are used. Too many factors degraded the bias, but the standard deviation was unchanged. The worst result was obtained in the warmest situation, although the discrepancy is due only to a bias.
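The comparison scores used here can be computed as in the following Python sketch, which takes the bias as the mean of modelled minus measured RST. The function name is hypothetical, and the use of the sample standard deviation (division by n − 1) is an assumption, since the text does not specify the estimator.

```python
import math


def forecast_scores(forecast, measured):
    """Bias, standard deviation of the error, and RMSE of a forecast."""
    if len(forecast) != len(measured) or len(forecast) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    errors = [f - m for f, m in zip(forecast, measured)]
    n = len(errors)
    bias = sum(errors) / n
    # Sample standard deviation of the error around the bias (assumption).
    std = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"bias": bias, "std": std, "rmse": rmse}
```

A forecast that is shifted by a constant with respect to the measurements has a nonzero bias but a zero standard deviation, which is the situation described for the warmest case.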

Combination of PLS and Weather Services Forecast for a Spatial and Temporal Forecast. Some improvements were obtained in the RST forecast with the use of PLS. So far, however, either a T_air profile or other variables are still needed to obtain RST. As mentioned above, winter services usually take decisions on the basis of RWIS data or of forecasts of atmospheric parameters, among which is air temperature. In both cases, this piece of information is very local. In some situations, no RWIS is available or the density of RWIS over the road network is not appropriate, and road managers can rely only on a forecast from a weather service, with the drawback that such a forecast is available only at given hours, several hours apart. Sometimes, some locations of the network have a specific thermal behaviour because of their configuration, and the provided forecast can differ greatly from observations. The objective here is then to evaluate the performance of a PLS-based RST forecast using T_air values from an RWIS or T_air forecasts from a weather service. With the methodology described in Figure 3, a full T_air profile is built using the results presented in Figure 4(b). This profile is an input to the selected PLS model built in Section 3.1 to obtain RST at a specific location. Once this RST value is available, a full thermal fingerprint is obtained on the basis of the data presented in Figure 4(d). The challenge is then to determine to what extent a full RST profile can be built from site-specific T_air data.
A forecast was then undertaken for two different dates, 2012-12-11 and 2013-03-22, for which RST and T_air data were available along the studied urban stretch. The input air temperature was obtained from the Météo France weather station located in the immediate vicinity of the departure point of the urban location (Figure 1). Météo France data was provided every three hours, from 0:00 a.m. to 9:00 p.m. GMT. Since the thermal fingerprints did not start exactly at these times, the Météo France data was interpolated to calculate a corresponding T_air. One has to consider that this piece of information is sometimes the only reliable one available to road managers. A summary of these elements is provided in Table 4. Once these elements were gathered, the air temperature at the beginning of the stretch was used to establish the RST forecast at the specific urban location. To do so, as detailed in Figure 3, once the air temperature at one point is identified, the corresponding T_air profile over the full stretch is established using PCA (Figure 4(b)). This profile was then used as input for the dedicated PLS model, whose performance was detailed in Table 3. Results for the two dates are given in Table 5.
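The interpolation of three-hourly weather station data to the start time of a thermal fingerprint can be sketched in Python as follows. The text does not state which interpolation scheme was used, so the linear interpolation below is an assumption, and the function name is hypothetical.

```python
def interpolate_air_temperature(obs_hours, obs_temps, target_hour):
    """Linear interpolation of air temperature between the two observations
    that bracket target_hour (hours are decimal hours of the same day)."""
    pairs = sorted(zip(obs_hours, obs_temps))
    for (h0, t0), (h1, t1) in zip(pairs, pairs[1:]):
        if h0 <= target_hour <= h1:
            fraction = (target_hour - h0) / (h1 - h0)
            return t0 + fraction * (t1 - t0)
    raise ValueError("target hour outside the observation window")
```

With observations at 0:00, 3:00, and 6:00, a fingerprint starting at 4:30 receives the average of the 3:00 and 6:00 values. Any rapid change of air temperature between two observations is lost, which is one source of the input error discussed below.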
In both cases, the standard deviation did not exceed 1.3 °C; the bias was over 2 °C in one case and below 1 °C in the other. The difference in performance can be explained. First of all, the three-hour interval of the Météo France data led to poor accuracy in the calculated T_air of Table 4. As indicated in Table 4, the air temperatures measured at the Météo France weather station and by the thermal mapping vehicle were different. The atmospheric probe of the thermal mapping vehicle does not measure air temperature in the way weather services do, using a standardized shelter. Furthermore, the thermal mapping vehicle was not exactly in the immediate vicinity of the weather station at the start time: there were nearly 3 km several hundred meters between the two locations, and the surroundings of the vehicle consist of buildings and pavements, whereas the weather station is surrounded by grass, with no constructions. The further the T_air profile built with PCA is from the field measurement, the greater the risk of obtaining an inappropriate RST.
The error in the RST forecast from PLS depends on the errors in the different variables used to build it. In a first approach, it depends on the error in the T_air used as input, especially when the air temperature profile over the stretch is built from one single point. To analyze the extent of the resulting error in the RST forecast, a set of PLS calculations was undertaken with input T_air profiles within ±1 °C of the T_air profile measured on two given dates (2013-03-22 and 2012-12-11) at start time and start point. The corresponding RST forecasts and a comparison with field measurements are given in Table 6. When the T_air inputs were between −0.2 °C and 1 °C with respect to T_air at start time and start point, 79% to 93% of the error on forecast RST was within ±1 °C.
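The principle of this sensitivity analysis can be sketched in Python as follows. The sketch shifts the input T_air profile by a set of constant offsets and records the resulting forecast error. The function names are hypothetical, and `toy_model` is a purely illustrative stand-in for the PLS model, not the model of the study.

```python
def offset_sensitivity(predict_rst, t_air_profile, measured_rst, offsets):
    """Forecast error (forecast minus measured RST) for each constant
    offset added to every point of the T_air profile.

    predict_rst: callable mapping a T_air profile (list of floats) to the
    forecast RST at the location of interest; it stands in for a PLS model.
    """
    errors = {}
    for offset in offsets:
        shifted = [t + offset for t in t_air_profile]
        errors[offset] = predict_rst(shifted) - measured_rst
    return errors


def toy_model(profile):
    """Hypothetical stand-in model: half of the mean air temperature."""
    return 0.5 * sum(profile) / len(profile)
```

Calling `offset_sensitivity(toy_model, [2.0, 4.0], 1.0, [-1.0, 0.0, 1.0])` returns the error for each offset, which can then be compared with a tolerance such as ±1 °C.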
The RST forecast value corresponds to the start point of the urban stretch.To obtain the whole RST profile over the very same stretch, the profiles obtained from the PCA and presented in Figure 4(d) were then used.The comparison between the forecast and the measured profiles is presented in Figure 5 for the two dates, 2013-03-22 and 2012-12-11, considering the RST forecast of Table 5 for the PLS model with 2 factors out of the 4 available ones and considering no error was made on RST.
Advances in Meteorology
In the 2013-03-22 case, 100% of the forecast is within ±2 °C, 94% within ±1 °C, and nearly 85% within ±0.5 °C, with a standard deviation of ±1 °C. The greatest difference between measurement and forecast, nearly 1.8 °C, can be attributed to the presence of a specific urban configuration, which affects the sky view factor, a parameter that accounts for 60% of the RST value [20]. In the second case (2012-12-11), only 85% of the forecast is within ±3 °C, the difference lying between 2.1 °C and 3.2 °C. As indicated by Gustavsson et al. [21], RST is clearly affected by the urban configuration and by anthropogenic heat flux, in particular traffic, which is one of the differences between the two dates. The global radiation and the nebulosity were also among the main differences. Although good results were obtained with this PLS configuration in one situation, some qualitative variables, such as traffic, might be changed to numerical ones. Indeed, traffic density can be related to the heat flux due to tire friction, to the sensible and latent heat fluxes from vehicle engines, and to changes in convective heat exchange caused by passing cars. Nevertheless, the objective is to generate a forecast based on a methodology that is sufficiently different from that of physical numerical models and that provides a relevant and robust forecast. The forecast could be extended to a longer stretch by building new PLS models on the basis of an adjusted set of variables such as the one illustrated in Table 1, and by iterating the PLS calculations over as many stretches as necessary.
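The percentages quoted in this comparison are shares of points along the stretch whose forecast error stays within a given tolerance. A minimal Python sketch of that computation follows; the function name is hypothetical.

```python
def share_within(forecast, measured, tolerance):
    """Percentage of points whose absolute forecast error is at most
    `tolerance` (same unit as the temperatures, here degC)."""
    if len(forecast) != len(measured) or not forecast:
        raise ValueError("need two non-empty series of equal length")
    hits = sum(abs(f - m) <= tolerance for f, m in zip(forecast, measured))
    return 100.0 * hits / len(forecast)
```

Evaluating the function for tolerances of 0.5, 1, 2, and 3 °C on the forecast and measured profiles gives the kind of breakdown reported for the two dates.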

Conclusion
The objective of this paper was to investigate a statistical approach to forecasting road surface temperature on the basis of thermal mapping data. The tools used were principal components analysis and partial least-squares regression, to build a road surface temperature forecast based on air temperature and other meteorological variables, along with an anthropogenic one. The forecast was based first on the air temperature profile over the full urban stretch and then on one single data point. The results indicated that a road surface temperature forecast can be built with PLS and air temperature data. Nearly 94% of the forecasts were within ±1 °C of field measurements for a selected urban stretch in one situation, and over 80% within ±3 °C in the other. PCA applied to T_air profiles allowed generic profiles to be built. Then, the knowledge of T_air from a single weather outstation was used to identify the corresponding T_air profile for a further site-specific RST forecast with PLS. Once this value was obtained, the results of PCA applied to RST profiles were used to retrieve the RST profile over the full urban stretch. Results clearly depend on how close this air temperature profile is to field measurements. An estimation of the error indicated that T_air inputs within ±0.5 °C provided the best RST forecast results. The investigations indicated that the six to eight thermal surveys needed to obtain a reliable road surface temperature forecast, covering a large spectrum of weather situations, remain few enough for the approach to stay cost effective within the global frame of winter maintenance. The idea will then be to iterate PLS calculations over several stretches to obtain a forecast over a full itinerary or over a full network. To obtain an analysis that is less statistical and mathematical and more physical, investigations could be undertaken to apply independent component analysis (ICA) to thermal mapping data and to decompose a multivariate signal into independent non-Gaussian signals.

Figure 1: Thermal mapping route (black line) (a) and illustration of its urban configuration (b).

Figure 2: French thermal mapping vehicles and associated instruments (infrared radiometer (b) and atmospheric probe (c)).

[Figure 3 flowchart: T_air at one single outstation (RWIS) → corresponding T_air profile T_air_PCA_interpolated, from the PCA T_air profiles and interpolation → application of the PLS model to T_air_PCA_interpolated to build the RST forecast at the outstation (RWIS) → corresponding RST profile RST_PCA_interpolated, from the PCA RST profiles and interpolation → RST forecast over the full urban stretch.]

Figure 3: Description of the principle to forecast RST from T_air with PCA and PLS.

Figure 4: Results (explained variance (a, c) and interpolation (b, d)) of PCA applied to T_air and RST profiles of the urban stretch.

Figure 5: Comparison between measured RST and a RST forecast over a full urban stretch based on the combined use of PCA and PLS for two dates (2013-03-22 (a) and 2012-12-11 (b)).

Table 1: List of variables used for PLS calculations.

Table 2: Configurations for PLS calculations on the urban stretch.

Table 3: Results of PLS models with optimum number of thermal fingerprints.
Here bias = RST_model − RST_measured, where RST_model and RST_measured are, respectively, the modelled and measured road surface temperature.

Table 4: T_air data for RST forecast based on the coupling of PCA and PLS.

Table 5: Results of PLS models based on interpolated T_air profiles for two dates.

Table 6: Evaluation of the error made on the RST forecast with a PLS model (2 factors used out of the 4 of the model based on 6 samples), with an error of ±1 °C on T_air profiles, for two dates.