Automatic Road Pavement Assessment with Image Processing: Review and Comparison

In the field of noninvasive sensing techniques for the monitoring of civil infrastructures, this paper addresses the problem of automatic crack detection through the analysis of optical images of the surface of French national roads. The first contribution is a state of the art of the image processing tools applied to civil engineering. Second, we describe the proposed method for detecting fine defects in the pavement surface; this approach is based on a multi-scale extraction and a Markovian segmentation. Third, an evaluation and comparison protocol designed for this difficult task of road pavement crack detection is introduced. Finally, the proposed method is validated, analysed and compared to a detection approach based on morphological tools.


Introduction
The evaluation of road quality is an important task in many countries; in France, for example, the national roads are inspected every three years in order to estimate the needed repairs and construction work. To estimate this quality, several aspects can be taken into account: the adherence, the micro-texture, the macro-texture and the surface degradations. Before 1980, all these inspections were carried out manually, but they can be automated with noninvasive techniques such as image processing. To be more comfortable, less dangerous for employees and road users, but also more efficient and less expensive, many systems have been proposed, based on ground penetrating radar [27] or laser systems [53]. However, for noninvasive evaluation of surface degradations, recent research results seem more promising with optical image processing approaches, for these reasons [60]: (1) The acquisition systems based on optical devices are easier to design and to use than other kinds of systems (they are less sensitive to movement or vibrations).
(2) They also allow a dense acquisition (every millimeter), i.e. the acquisition can cover the entire road surface, whereas for other systems, like lasers, the measurements are available only every 4 millimeters at normal speed (90 km/h)1.
(3) As the acquisition is denser, the measurement of the defects is more precise than with other systems.
(4) Even if the images are not always well contrasted, they are more contrasted than the images/signals given by other devices, i.e. the signal-to-noise ratio is higher with optical sensors than with other kinds of sensors.
Nowadays, many acquisition systems are available [4,60], see Table 1 (interested readers can find details about the evaluation of such systems in [46,62]). Moreover, to the best of our knowledge, many semiautomatic detectors of road defects can be found in the literature but only one is commercialized (by INO 2). Among all the proposed approaches, it is difficult to know which one is the best adapted to the task and which methods are currently favoured; this is why the first goal of this paper is to present a state of the art of existing methods in noninvasive control based on image processing. Crack detection is particularly difficult in the case of roads because the signal to detect is weakly represented (about 1.5% of the whole image) and weakly contrasted (the road possesses a texture that hides the crack). Recent methods have shown their limits: the detection contains many false positives (induced by the particular texture of the road) and it is not precise enough (it provides a region of detection rather than the skeleton and the width of the crack). The main shortcoming of the existing methods is that they do not take into account the specific geometry of the crack: a thin and linear object. Consequently, the second aim of this work is to introduce a new method that takes into account some geometric properties of the cracks.
Even if this problem is hard, and very important in the field of civil engineering, as far as we know, there is no protocol for evaluating and comparing existing methods, and it is difficult to know which kind of method should be chosen for this task. Given the multiple methods proposed in the literature, we think it is important to evaluate and compare them in order to validate previous work and to identify the approaches that can be employed and/or the methods that need improvement. The third contribution of this paper is therefore the introduction of such a protocol.
In consequence, the objectives of this publication are, first, to give a state of the art of existing methods in noninvasive control based on image processing for estimating the quality of the road surface, second, to present our method and, third, to introduce an evaluation and comparison protocol that highlights the advantages and drawbacks of each method.

Automatic Road Crack Detection
In the literature, many papers have introduced approaches to detect thin objects in textured images, as in medical imagery, for the detection of blood vessels [13], or satellite imagery, for road network detection [29]. Since 1990, algorithms have been proposed for the semi-automatic detection of road cracks (interested readers can see [59] for details about road imaging systems and their limits in 1999). For the detection of cracks, three components have to be taken into account: (1) Acquisition (see Table 2 for details); (2) Storage and (3) Image processing.
In this paper, only the last step is studied, but the choices made for the first two steps are important for the success of the image processing. Moreover, most of the references are given in the field of road quality assessment, but some of them come from different applications, like cracks and defects in concrete (for bridges or pipelines), on ceramics or on metallic surfaces (for industrial applications). For road cracks, most of the time, these hypotheses can be exploited:
(1) Photometric hypotheses
• (H_p1) The crack pixels are darker than the road pixels.
• (H_p2) The gray-level distributions of road crack and road surface are distinguishable.
(2) Geometric hypotheses
• (H_g1) A crack is a fine continuous object.
• (H_g2) A crack is a set of connected segments that have different orientations.
• (H_g3) A crack does not have a constant width over its whole length.
(3) Photometric and geometric hypothesis
• (H_pg1) The points inside a crack can be considered as points of interest, from a photometric and/or geometric point of view.
These different hypotheses can be complementary, like (H_p1) and (H_p2) or (H_g1) and (H_g3), but some of them are opposed, like (H_g1) and (H_g2). The hypothesis (H_pg1) combines the two kinds of constraint because the definition of a point of interest (POI), that is, a significant point in a scene, can be expressed both with photometric constraints (the distribution of gray levels near the POI has some particularities) and geometric constraints (the point of interest can be a corner, an edge, or any kind of geometric structure).
For image processing, we enumerate semiautomatic and automatic detection approaches, and five families can be considered, see Table 3: (1) based on histogram analysis (hypotheses H_p1 and H_p2): the oldest and the most popular ones. These methods use a thresholding based on histogram analysis [3,44,67], with Gaussian hypotheses [40] and/or adaptive or local thresholding [23,26]. These approaches are simple and fast, but they also give many false detections. In fact, these methods assume that the two gray-level distributions (the road pavement distribution and the crack distribution) can be separated based on global gray-level statistics (the histogram) 3. In Figure 1, we can see that, most of the time, this hypothesis is not valid.
(2) based on morphological tools: an initial thresholding is needed, and the results contain fewer false detections than methods based on histogram analysis. However, the major drawback of this kind of technique is its strong dependence on the parameter choices.
(3) based on a learning phase, in order to alleviate the problems of the first two groups of methods [51,50] (hypotheses H_p1 and H_p2). Most of them are based on neural networks [17,32,37]. The drawback is the learning step, which does not allow a fast and fully automatic analysis.
(4) based on filtering, the most recent ones (hypotheses H_p1, H_g1 and H_g3). Edge detection by fixed-scale filtering is not adapted to the detection of road cracks because the width of the crack is not constant; this is why many methods are based on wavelet decompositions [2,7,71,73] with adaptive filtering [12,13,65] (these approaches will be detailed in § 4), contourlets [43], Gabor filters [1], Finite Impulse Response (FIR) filters [24] and methods using models based on partial differential equations (PDE) [5,49]. Some techniques also use auto-correlation filtering [42,61] (similarity measures are estimated between targets that simulate cracks and targets of the original image). Another kind of algorithm is based on texture analysis [54,63] (the crack is considered as noise inside a texture).
(5) based on the analysis of a model [48,10] (hypotheses H_p1, H_g1 to H_g3 and H_pg1). These approaches combine a local analysis with a global analysis in order to take into account both the local and the global constraints of a crack, either by a multi-scale analysis of texture combined with a minimal path algorithm [48] or by a local detection of points of interest combined with geodesic contours [10].
In conclusion, we can notice that: • The performance of most of these methods depends on the road texture.
• Old methods based on histogram analysis, even local ones, do not model the problem correctly, i.e. they do not take into account the geometric characteristics of the cracks and the photometric characteristics of the road pavement.
• Learning methods are efficient, but the learning step is expensive (in time and in effort for users who are not experts in image processing).
For all these reasons, even if learning methods have been used in our previous work, this paper focuses on the presentation of two methods that try to overcome the limits of the older ones: we try to obtain a denser detection with a low rate of false detections.

Proposed and Compared Methods
Some preliminary work on methods adapted to this task (the detection of road cracks) included experiments with a learning method. A neuron-based method has been tested [45] on the real images of size 768 × 512 presented in § 4.2. Results are interesting, but learning methods are not easy to use for non-specialists in image processing, and they cost the users a lot of time. The main goal is to propose a system that facilitates the work of users, not a system that induces a loss of time by including a learning phase and a yearly maintenance in order to maintain the performances of the system 4. In consequence, we have now focused our work on methods that allow automatic processing and, in particular, we present two approaches: (1) The first, Morph, belongs to families (1) and (2) because it combines thresholding and refinement by morphological analysis.
(2) The second, GaMM, of families (4) and (5), builds on the advantages of multi-scale analysis and a local modelling of the crack.
Morph was proposed before GaMM and is quite close to the method presented in [66]. The contributions of this section concern GaMM: we propose a new model for the sites and for the potentials used in the Markovian model. The advantages of this new method will be illustrated with qualitative and quantitative results in § 5.

Morphological method (Morph)
The chosen approach is based on hypotheses H_p1, H_g1 and H_g3 and follows these steps: (1) Pre-processing of the images, to reduce the texture and increase the contrast between the road pavement and the crack; (2) Binarization by thresholding (the threshold differs between the variants and a local threshold can be used); (3) Refinement by closing; (4) Segmentation with shape analysis; (5) Extraction of the crack characteristics.
For step 1, three variants are introduced by combining these local tools: an erosion in gray levels, a conditional median filtering, a histogram equalization and a mean filtering (these pre-processings are detailed in § 5.1). Step 4 is realised in two passes: first, a connected-component labeling is performed and, second, the size and the shape of each component are examined in order to remove components whose shape is not similar to a crack. A crack has to look like a thin object, so the width, w, and the height, h, are used for this task. More precisely, from an expert point of view, a crack is not significant if h < 50 cm but, since we can suppose that only a small part of the crack is detected, this constraint becomes h < 7.5 mm. Moreover, the mean width must satisfy w_min < 3 mm and the maximal width w_max < 6.5 mm. All these thresholds are set empirically. In Figure 2, we illustrate the kind of results obtained at each step for the 3 variants. The final proposed method, named Morph, merges the 3 results (with a weighted sum whose weights are chosen with a learning phase) and refines the result by computing the closing in gray levels of the fusion result.
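One variant of steps (1) to (4) can be condensed as follows with SciPy morphology. The pixel thresholds are illustrative stand-ins for the millimetric constraints above, assuming roughly 1 mm per pixel; the exact pre-processing chain of each variant is not reproduced.

```python
import numpy as np
from scipy import ndimage

def morph_pipeline(img, threshold, min_h=8, max_w=7):
    """Sketch of one Morph variant:
    gray-level erosion -> binarization -> closing -> component shape filter.
    min_h / max_w are placeholders for the paper's millimetric thresholds."""
    # minimum filter: widens thin dark structures such as cracks
    eroded = ndimage.grey_erosion(img, size=(3, 3))
    binary = eroded < threshold                               # cracks are dark (H_p1)
    closed = ndimage.binary_closing(binary, np.ones((3, 3)))  # bridge small gaps
    labels, _ = ndimage.label(closed)
    out = np.zeros_like(closed)
    for sl in ndimage.find_objects(labels):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        # keep only thin, elongated components (crack-like shapes)
        if max(h, w) >= min_h and min(h, w) <= max_w:
            out[sl] = closed[sl]
    return out
```

A long thin dark line survives the shape filter, while a compact dark blob (e.g. a pothole shadow or texture element) is discarded.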

Adaptive Filtering and Markovian Modelling (GaMM)
More recently, our work has focused on wavelet decomposition. As it is difficult to choose a mother wavelet (used for generating the wavelet family for multi-scale analysis) well adapted to the detection of road cracks, adaptive filter theory seems convenient; in particular, it allows building a mother wavelet adapted to our task. The first step of the algorithm is based on adaptive filtering (hypotheses H_p1 and H_g3) and the second on a Markovian segmentation that can take into account the particular geometry of the crack (hypotheses H_g2 and H_g3).

Algorithm
The goal of this algorithm, presented in Figure 3, is to obtain, in step 1, a binarization (black pixels for the background and white pixels for the crack) and, in step 2, a refinement of this detection by using a Markovian segmentation. Using adaptive filtering is important in order to allow the detection of the non-constant width of the crack, which is realistic (hypothesis H_g3). The number of scales for the adaptive filtering has to be chosen and depends on the resolution of the image. Supposing a resolution of 1 mm per pixel and choosing 5 scales, a crack with a width from 2 mm to 1 cm can be detected. Moreover, the number of directions for the filtering also has to be chosen, and it seems natural to take these four directions: [0, π/4, π/2, 3π/4], which correspond to the four usual directions used for crack classification. The adaptive filtering is applied at each scale and in each direction, and then all the results are merged at each scale (mean of the coefficients). The result of this filtering is used to initialize the Markovian model used for the segmentation step.
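The filter-and-merge scheme of step 1 can be sketched as follows. A hand-built zero-sum matched filter for a dark line stands in for the adaptive wavelet (an assumption for illustration; it is not the filter actually learned by GaMM):

```python
import numpy as np
from scipy import ndimage

def oriented_multiscale_response(img, n_scales=5, angles=(0, 45, 90, 135)):
    """For each scale, filter the image in 4 directions and merge the
    directional maps by taking their mean (as in step 1 of Figure 3)."""
    per_scale = []
    for s in range(1, n_scales + 1):
        # zero-sum kernel: dark core of thickness s between bright flanks,
        # so a dark line of width ~s gives a strong positive response
        k = np.concatenate([np.ones((s, 4 * s + 1)),
                            -2.0 * np.ones((s, 4 * s + 1)),
                            np.ones((s, 4 * s + 1))])
        maps = []
        for a in angles:
            k_rot = ndimage.rotate(k, a, reshape=True, order=1)
            maps.append(ndimage.correlate(img, k_rot, mode="nearest"))
        per_scale.append(np.mean(maps, axis=0))
    return per_scale
```

On a dark vertical line, the 90° filter responds strongly, the others weakly, and the merged map still singles out the line against the background.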

Adaptive filtering
Some details are provided in order to explain steps 1a and 1b in Figure 3. A function ψ ∈ L²(ℝ²) 5 is a wavelet if

∫_{ℝ²} |Ψ(ω)|² / |ω|² dω < +∞,   (1)

where Ψ is the Fourier transform of ψ. Equation (1) implies that ∫_{ℝ²} ψ(x) dx = 0. The wavelet family is defined, for each scale s, each position u and each orientation θ, by

ψ_{s,u,θ}(x) = (1/s) ψ(R_θ (x − u)/s),

where R_θ is a rotation of angle θ.
One of the main difficulties in applying a wavelet decomposition is the choice of the mother wavelet ψ. The adaptive filter h at scale s is built from the crack signal to be detected, which depends on the definition of the crack. In this paper, like in most papers of this domain, crack pixels correspond to dark pixels surrounded by background pixels (road pixels). This is why, in [65], a crack is modelled as a piecewise constant function f: for each position x ∈ ℝ, f(x) takes a constant value inside the crack and zero outside, where the depth factor a and the support threshold T have to be determined (equation (4)). This does not correspond to a realistic representation of the crack: because of sub-sampling, lighting and the orientation of the camera, the signal is closer to a Gaussian function with zero mean (equation (5)), where a is the size of the crack and depends on σ, the standard deviation of the Gaussian law. Consequently, the term σ fixes the width of the crack (like the threshold T in equation (4)). Finally, for step 1, h is estimated for each size of signal (determined by σ) and for the 5 scales, as explained at the beginning of § 3.2.1, and φ_bb is interpolated in order to have the same size. Then the filter is rotated in order to cover the 4 orientations.
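The two 1-D crack profiles discussed above can be written out as follows; the signs and exact parameterization are our reading of the surrounding description, since equations (4) and (5) are paraphrased rather than quoted:

```python
import numpy as np

def box_profile(x, a, T):
    """Piecewise-constant crack model used in [65]: a flat dark plateau of
    depth a and half-width T on a zero background (our reading of eq. (4))."""
    return np.where(np.abs(x) <= T, -a, 0.0)

def gaussian_profile(x, a, sigma):
    """Smoother model adopted here: a Gaussian-shaped dip of depth a whose
    width is controlled by sigma (blur from sub-sampling, lighting, camera)."""
    return -a * np.exp(-x ** 2 / (2.0 * sigma ** 2))
```

The Gaussian model has no sharp discontinuity at the crack border, which matches the blurred profiles observed in real acquisitions.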

Segmentation
The goal of this part is to extract shapes, i.e. cracks, using the detection maps estimated at the first stage of the algorithm (step 2a of the algorithm of Figure 3). For the first step of the segmentation (initialization), the sites are of size 3 × 3; consequently, a regular grid is considered in the image. In [65], four configurations are possible, represented in Figure 4 (the part inside the rectangle with low gray levels). The initialization of the sites is based on the configuration that maximizes the coefficients obtained with the adaptive filtering. More formally, if we denote γ_{2,α} the four configurations, the best configuration γ_best is:

γ_best = argmax_α m_{2,α},   (6)

where m_{2,α} is the mean of the coefficients on the considered configuration γ_{2,α}. These four configurations do not represent all the possibilities and are not realistic: they are all centered, whereas non-centered configurations are also possible. Consequently, we use the set of sixteen configurations illustrated in Figure 4 (all the presented sites). By modifying the number of configurations, we need to adapt the initialization of the sites, and equation (6) becomes:

γ_best = argmax_{i,α} m_{i,α},

where m_{i,α} is the mean of the coefficients on the considered configuration γ_{i,α}. The image is considered as a finite set of sites denoted S = {s_1, . . ., s_N}.
For each site, a neighborhood is defined, and a clique is defined as a subset of sites of S in which every pair of distinct sites are neighbors. The following random fields are considered: (1) The observation field Y = {y_s} with s ∈ S, where y_s is the mean of the coefficients on the site.
Figure 4: The sixteen configurations used to improve the modeling of sites. The four initial configurations proposed in [65] are in the bold rectangle; their sites are represented by the lighter gray levels, and the sites of the proposed configurations are represented by the darker gray levels.
(2) The descriptor field L = {l_s} with s ∈ S: l_s = 1 if there is a crack, l_s = 0 otherwise.
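The configuration-selection rule used for initialization can be sketched for a single 3 × 3 site; the boolean masks below are illustrative examples, not the exact sixteen configurations of Figure 4:

```python
import numpy as np

def init_site(coeffs, configurations):
    """Return the index of the configuration whose pixels have the highest
    mean adaptive-filtering coefficient on this 3x3 site (the argmax rule
    of equation (6), extended to an arbitrary configuration set)."""
    means = [coeffs[mask].mean() for mask in configurations]
    return int(np.argmax(means))

# Two illustrative configurations: horizontal and vertical central segments
horizontal = np.zeros((3, 3), bool); horizontal[1, :] = True
vertical = np.zeros((3, 3), bool); vertical[:, 1] = True
```

On a site whose strong coefficients form a vertical line, the vertical configuration wins the argmax.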
At each iteration, a global cost, i.e. a sum of potentials that depends on the values of the sites and on the links between neighbors, is updated. This global cost takes into account the coefficients of the sites (computed from the coefficients estimated during the first part of the algorithm, the adaptive filtering) and the configurations of each site and its neighbor sites (the 8 neighbors). More formally, the global cost is the sum of the potential functions of all the sites. This potential function contains two terms. The first term, u_1, corresponds to the data term: it evaluates how similar a site is to a crack from a photometric point of view (hypotheses H_p1 and H_p2). This term is based on the results given by the adaptive filtering. The second term, u_2, represents the constraints induced by the neighbors of the site. More precisely, it estimates the consistency between a site and each neighbor site, and it takes into account the geometric hypotheses H_g2 and H_g3. The choice of the value α_1 depends on the importance given to each part of equation (8) and will be discussed in § 5.1.1.
The function u_1 depends on the parameters ξ_1, ξ_2 and k, which have to be fixed 6. For the definition of u_2, we have to determine the number of cliques. In [65], 4 cliques are possible and the 8-connectivity is considered. The potential function proposed in that previous work only considers the difference of orientations between the two sites of a clique and not their relative position, see Table 4. Some cases are thus not penalized with the old configuration, for example these two unfavorable cases: • two sites with the same orientation but with no connection between them; • two sites with the same orientation but whose position makes them parallel. This is why, with the sixteen configurations presented in Figure 4, the potential has to take into account the differences of orientations between two sites (there are 16 × 16 possibilities) and the position of the two sites (there are 8 possibilities because we consider the 8 neighbors). Consequently, the new potential function u_2 follows these two important rules: (R_1) The lower the difference of orientations between two sites, the lower the potential.
(R_2) The lower the distance between two sites, the lower the potential (here, the distance means the minimal distance between the extremities of the two segments).
6 The choice of k is related to the maximal number of pixels that can belong to a crack (it depends on the resolution of the images and on hypotheses about the size and configuration of cracks). We have chosen k so that at most 5% of the image is considered as a crack. Moreover, our experiments have led us to take ξ_1 = ξ_2 = 100.
Figure 5: Examples of the function u_2. These two examples of sites with their respective neighbors show the behavior of the potential u_2 with the two considered aspects, orientation and distance. In example (a), through the orientation term, the configuration s_3 is penalized and s_2 is less penalized than s_3. In example (b), through the two distance terms, the site s_3 is penalized compared to s_1. On the contrary, the particular case of s_2 is favorable and compensates the penalty given by the orientations.
More formally, if: • d denotes the Euclidean distance between the two closest extremities of the sites, with d ∈ [0, d_max] 7; • θ_1 and θ_2 are the orientations of, respectively, s = {p_i}_{i=1..N_s} and s′ = {p′_j}_{j=1..N_{s′}}, where p_i, respectively p′_j, is pixel i, respectively j, among the N_s, respectively N_{s′}, pixels that compose the site s, respectively s′; • θ_e is the angle between the two sites; then the function u_2 is defined by equation (10), where NbC indicates the number of connected pixels between the two sites s and s′ and J(x) equals 1 if x = 0 and 0 otherwise. The first term is induced by the rule about the orientations, (R_1). This term equals zero when the sites have the same orientation and this orientation is the same as the orientation between the sites, i.e. θ_e = θ_2 = θ_1. This first term penalizes the configurations where the sites do not have the same orientation, but also the particular case where they are parallel, see example (a) in Figure 5. The second and third terms express the rule (R_2) about the distances. Two aspects have to be distinguished: the number of connected pixels, when the sites are connected, and, on the contrary, the distance between the sites when they are not connected. This gives a low influence to disconnected sites and also increases the cost of sites that are parallel but connected, see example (b) in Figure 5. To study the influence of all these terms, the equation has been normalized and the different terms have been weighted (using α_2; the choices for α_2 will be discussed in § 5.1.1).
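The behavior imposed by rules (R_1) and (R_2) can be sketched as follows; the functional forms and the weights are placeholders, not the normalized equation (10) itself:

```python
import numpy as np

def u2_sketch(theta1, theta2, theta_e, d, nb_connected, d_max,
              alpha2=(1.0, 1.0, 1.0)):
    """Heuristic neighborhood potential following rules (R1) and (R2);
    lower values mean a more plausible pair of crack sites.
    - orientation term (R1): zero only when theta1 = theta2 = theta_e,
      so parallel-but-offset sites (theta_e != theta1) stay penalized;
    - distance terms (R2): connected sites are rewarded in proportion to
      their number of shared pixels, disconnected sites are penalized in
      proportion to their distance."""
    w1, w2, w3 = alpha2
    orientation = (abs(theta1 - theta2) + abs(theta1 - theta_e)
                   + abs(theta2 - theta_e)) / (3.0 * np.pi)
    if nb_connected > 0:
        distance = -w3 * nb_connected
    else:
        distance = w2 * d / d_max
    return w1 * orientation + distance
```

An aligned, connected pair of sites gets a negative (favorable) potential, while a parallel, disconnected pair is penalized by both terms, matching examples (a) and (b) of Figure 5.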

Evaluation protocol
Nowadays, there is no protocol used by the community of road pavement analysis for the evaluation of image processing methods. However, all the approaches for characterizing the quality of the road recommend taking into account the severity of the defect; for this, the size, the width and the location of the cracks need to be known precisely. It therefore seems important to quantitatively evaluate the performances of the existing automatic methods whereas, to the best of our knowledge, the community has no reference images and most evaluations are only qualitative. For building this kind of protocol, it is necessary, first, to choose the tested images, second, to choose how to build reference segmentations, and, third, to determine the criteria used for the quantitative analysis. For the reference segmentations, two approaches can be used: (1) To compute synthetic images with synthetic defects. In this case, the exact position of the defects is known and these reference segmentations can be considered as ground truth.
(2) To propose reliable segmentations of real images. This supposes that we are able to provide a segmentation that is reliable enough to be employed as a reference; for evaluation, these segmentations can be called "pseudo-ground truth". The two solutions are studied, and this is why we have to explain how the manual segmentations (which are our references) are obtained. Beforehand, we briefly describe the acquisition system. In some cases, the sensors do not have the same settings and the global illuminations are different, which can generate some "false cracks". This aspect has easily been taken into account in a pre-processing step by eliminating the junction area in the region of interest.

Acquisition
The acquisition system used for the dataset of our experiments is described in Figure 6. It contains 4 video cameras: 3 gray-level sensors at the back of the car and 1 color sensor at the front. The front camera is used to determine the environment conditions (weather, location, traffic) whereas the three others are used for the crack detection.
The resolution of the front camera is smaller than that of the 3 others and, moreover, its optical axis is not perpendicular to the road surface, contrary to the 3 others. The 3 cameras have been physically synchronized during the acquisition. To be independent of illumination problems, nine stroboscopic lights have been added. The lights are oriented perpendicular to the road plane, at a distance of 1 meter from the surface. The light power has been chosen so as not to deteriorate the visualisation of the road pavement and of the defects.

Reference images
The most difficult part is to propose images with a reference segmentation. On the one hand, we introduce synthetic images with a simulated crack (the size of these images is 256 × 256) 8. As shown in Figure 7, the result is not realistic enough: the contrast between the road and the crack is too strong and, moreover, the interruptions of the crack, the changes of direction, the presence of multiple paths, etc., are not simulated. To be more realistic, we would have to design and implement a complex heuristic to simulate the crack, which is too much effort for obtaining, in the end, only a synthetic defect. This is why, on the other hand, we have simulated different defects on real images that previously contained no defect (the size of these images is 768 × 512 and 1920 × 480). The result is more realistic, but the shape and the photometric aspect of the cracks (which are randomly chosen) still do not seem realistic enough. This is why it appears important to propose a set of real images (of size 768 × 512 and 1920 × 480) with manual segmentations that are reliable enough to be considered as reference segmentations. To summarize, the first two kinds of images allow an exact evaluation that illustrates the theoretical behavior of the method, whereas the last kind of images allows validating the work on real images with a pseudo-ground truth.

Reference segmentations
For real images, we briefly explain how the manual segmentations are validated. Four experts have manually segmented the images with the same tools 9 and in the same conditions. Then, the four segmentations are merged, following these rules: (1) A pixel marked as a crack by more than two experts is considered as a crack pixel; (2) Every pixel marked as a crack and next to a pixel kept by step (1) or (2) is also considered as a crack.
8 The road is a random texture.
The second rule is iterative and stops when no pixel is added. Then, the result is dilated with a square structuring element of size 3 × 3. To evaluate the reliability of the reference segmentations, we estimate, first, the percentage of covering between the operators, and, second, the mean distance, D, between each pixel detected by only one expert and not kept in the reference image, and the reference segmentation.
9 We use a "home-made" software that proposes an interface helping the person to segment the defect. The principle is that the user selects points on the crack; these points have to be close enough (from 5 to 20 pixels apart). Then, the path between two close points is automatically detected with a simple heuristic: the path that minimizes the mean intensity is selected. The interface is complete enough to allow moving points, removing points and removing cracks. The user can also select the width of the path (crack). Some filters are also proposed to improve the contrast between the crack and the road in order to help the user.
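The fusion rules (1) and (2) and the final dilation can be sketched with SciPy; this is a direct reading of the rules above, with the iteration implementing the "next to a kept pixel" growth until stability:

```python
import numpy as np
from scipy import ndimage

def fuse_segmentations(segs):
    """Fuse binary expert segmentations into a pseudo-ground truth:
    (1) keep pixels marked by more than two experts (majority vote);
    (2) iteratively add any expert-marked pixel adjacent to a kept pixel;
    finally dilate with a 3x3 square structuring element."""
    votes = np.sum(segs, axis=0)
    kept = votes > 2        # rule (1): strictly more than two experts
    marked = votes > 0      # pixels marked by at least one expert
    struct = np.ones((3, 3), bool)
    while True:             # rule (2): grow along expert markings
        grown = kept | (ndimage.binary_dilation(kept, struct) & marked)
        if np.array_equal(grown, kept):
            break
        kept = grown
    return ndimage.binary_dilation(kept, struct)
```

A pixel marked by a single expert is recovered only if it connects to the majority-voted core, which is exactly the intended behavior of rule (2).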
Table 5 shows some results for 10 of the 42 images manually segmented. We have distinguished 5 families: the first one contains images acquired with a static system whereas the four others were acquired with a dynamic system, on 4 different kinds of road pavement. The 10 images have been chosen in order to show results for each of these families. We can notice that the first 4 images are the most reliable because the mean error is less than 2 pixels; the precision of these results is satisfactory. On the contrary, the last 6 images show the important variability between operators and how difficult it is to extract a segmentation for these images, in particular for image 936, where the error is due to a misinterpretation by one of the four operators, who found a defect that does not exist.
Table 5: The comparison of the 4 manual segmentations used for estimating the final reference segmentations (columns: IMAGES, F (%), 2 (%), 3 (%), 4 (%), S (%), D (pix)). For each image are presented: the percentage of pixels of the whole image that are preserved as crack pixels in the final reference segmentation (F), the percentage of recovering between 2, 3 and 4 manual segmentations, and the sum of these 3 percentages (S). For all the crack pixels that are not preserved in the final reference segmentation, the mean distance to this segmentation is given (D).
By analyzing the results for the criterion D, presented in Table 5, we can classify the 42 tested images into 3 categories, i.e. images with: (1) A reliable segmentation: D < T_r. It means that all the operators have built segmentations that are quite close to each other.
(2) A moderately reliable segmentation: T_r ≤ D < T_a. It means that some parts of the crack are not easy to segment and there are local errors.
(3) An ambiguous segmentation: D ≥ T_a. It clearly shows that the images are difficult to segment; in most cases, some parts are detected as a crack whereas they are not, and conversely.
The thresholds have been empirically chosen: T_r = 2, T_a = 4. In Figure 8, we present, first, the mean distance, D_i, i ∈ {1, 2, 3, 4}, between the final reference segmentation, S_r, and each manual segmentation, S_i (obtained by the four operators) and, second, the criterion D for each real image of our protocol. The first graph illustrates how important it is to combine the four manual segmentations instead of using only one: we can notice that each operator, alternately, gives an interpretation that differs from the three others. The second graph explains how the thresholds are chosen for determining the detections that are "accepted" for the evaluation, see § 4.4 for explanations about accepted detections. All these remarks are illustrated in Figure 9. Overall, the four segmentations are close to each other, and combining them permits detecting the width of the crack. However, these examples also show the difficulties of this task: areas where the cracks are less visible, and texture elements that have the same size and/or the same gray levels as the crack pixels. Thus, in some cases, one operator extends the crack or gives it a different shape; in some extreme cases, an operator can even confuse a crack with another object of the scene (a piece of wood, for example). These last examples highlight the interest of combining different segmentations in order to obtain reference segmentations that are as reliable as possible.

Criteria of efficiency
In this section, we explain how the reference segmentation and the estimated segmentation are compared. In Figure 10, we present common criteria used for segmentation evaluation: (1) the percentage of correct detections, or true positives (TP); (2) the percentage of false positives (FP); (3) the percentage of false negatives (FN); and (4) the similarity coefficient (DICE).
This last criterion seems the most significant because it evaluates the trade-off between the FP and the FN, and it summarizes the results of all the criteria well. Moreover, it directly expresses what is important to evaluate: how well the evaluated method reduces detection errors whilst increasing the density of correct detections.
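For two binary masks, these criteria can be computed directly; the sketch below uses the standard definition DICE = 2·TP / (2·TP + FP + FN), which matches the description above.

```python
import numpy as np

def detection_criteria(est, ref):
    """TP, FP, FN counts and the DICE similarity coefficient between a
    binary estimated segmentation `est` and the reference `ref`."""
    tp = np.logical_and(est, ref).sum()    # crack pixels found in both
    fp = np.logical_and(est, ~ref).sum()   # detected but not in the reference
    fn = np.logical_and(~est, ref).sum()   # in the reference but missed
    dice = 2.0 * tp / (2.0 * tp + fp + fn)
    return int(tp), int(fp), int(fn), float(dice)
```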
For real images, "accepted" detections have been added in order to tolerate a small error in the localization of crack pixels. This tolerance is needed because perfect detection seems, for the moment, difficult to reach, as the results in Table 5 illustrate. Consequently, these accepted pixels have been included in the estimation of the similarity coefficient, or DICE. The threshold for accepted pixels equals 0 for synthetic images, whereas it depends on the mean distances, see D in Table 5, for the real images.

Figure 8: The variations between the manual segmentations used for building the pseudo-ground truth - The first graph represents, for each operator (one curve per operator) and each image (x-coordinates), cf. Figure 17 for the corresponding images, the mean distance D_i, i ∈ {1, 2, 3, 4}, whereas the second graph presents D, cf. § 4.3 and Table 5. This graph allows us to distinguish the different categories of images (red axes): reliable (D < T_r), moderately reliable (T_r ≤ D < T_a), ambiguous (T_a ≤ D). In both graphs, the purple axes delimit the five different samples of images (each sample corresponds to a kind of road pavement). The first one was acquired with a static system whereas the four others were acquired with a dynamic system.
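A tolerant DICE along these lines can be sketched as follows. The exact convention of the paper is not specified here, so this is one simple reading: an estimated crack pixel within `tol` pixels of the reference counts as a correct detection, and the false negatives are the reference pixels left uncovered.

```python
import numpy as np

def tolerant_dice(est, ref, tol):
    """DICE where an estimated crack pixel within `tol` pixels of the
    reference is "accepted", i.e. counted as a correct detection.
    With tol = 0 this reduces to a strict pixel-wise DICE (the setting
    used for synthetic images)."""
    ps = np.argwhere(est)
    pr = np.argwhere(ref)
    # distance from each estimated pixel to its nearest reference pixel
    d = np.sqrt(((ps[:, None, :] - pr[None, :, :]) ** 2).sum(-1)).min(axis=1)
    tp = int((d <= tol).sum())             # exact + accepted detections
    fp = int((d > tol).sum())
    fn = max(int(ref.sum()) - tp, 0)       # uncovered reference pixels
    return 2.0 * tp / (2.0 * tp + fp + fn)
```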

Experimental results
In this section, two aspects are studied: (1) the evaluation of the method based on adaptive filtering and Markovian modelling, in order to characterize its behavior, to estimate the best parameters and to determine the best variant; (2) the comparison with the Morph method.

Adaptive filtering and Markovian modelling
We want to determine, first, how to set the different parameters, second, which pre-processing steps are necessary, and, finally, which variant is the most efficient. Consequently, the following points have been studied: • Parameter values - The weights α_1, equation (8), and α_2, equation (10), are tested from 0 to 1 with a step of 0.1.
• Pre-processings - Pre-processings have been experimented with in order to reduce the noise induced by the texture, to increase the contrast of the defect and to reduce the light halo present in some images: (1) Threshold - This pre-processing has been proposed in order to reduce the light halo in the six last images presented in Figure 7. In order to preserve the crack signal, each pixel under a given threshold is not filtered.
• Algorithm variants - Four variants are compared: (1) Init - This is the initial method proposed in [65].
(2) Gaus - This variant supposes that the distribution of the gray levels inside a crack follows a Gaussian function, see § 3.2.2. • Comparison - We have compared this method with the method based on morphological tools, which is quite similar to [66], denoted Morph.
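The Threshold pre-processing described above can be sketched as a conditional smoothing: a filter is applied only to pixels above a threshold, so that dark (potential crack) pixels are left untouched. The mean filter and the parameter values used below are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def conditional_smoothing(img, thresh, k=3):
    """Sketch of the "Threshold" pre-processing: a k x k mean filter is
    applied only where the gray level is >= thresh, so dark pixels that
    may belong to a crack are preserved (k must be odd)."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = img.astype(float).copy()
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            if img[i, j] >= thresh:        # only bright pixels are filtered
                out[i, j] = padded[i:i + k, j:j + k].mean()
    return out
```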

Influence of parameters
Among the results, two conclusions can be drawn: (1) For each variant and each pre-processing, the best results are obtained when the adaptive filtering term and the Markovian modelling term have the same weight, see equation (8), i.e. α_1 = 0.5. However, when more weight is given to the adaptive filtering, the quality of the results is lower than when more weight is given to the Markovian segmentation. It means that, in this kind of application, the geometric information is more reliable than the photometric information, which seems coherent with the difficulties of the acquisition.
(2) For the Markovian modelling, we have noticed that the results are best when the orientation term and the distance term have the same weight, see equation (10), i.e. α_2 = 0.5. However, better results are obtained when the weight of the orientation term is greater than that of the distance term, rather than the reverse. It implies that the orientation characteristics are more reliable than the distance ones, which is coherent with the fact that cracks present strong spatial constraints. Moreover, it is also linked to the difficulties induced by the acquisition (the lighting system makes the photometric information less reliable).
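The weighting scheme discussed above can be written down as two convex combinations; the sketch below assumes equations (8) and (10) have this simple form (the paper's exact formulation may differ), together with the grid of weights explored in the experiments.

```python
def total_energy(e_filter, e_markov, alpha1):
    """Sketch of equation (8): convex combination of the adaptive
    filtering term and the Markovian modelling term."""
    return alpha1 * e_filter + (1.0 - alpha1) * e_markov

def markov_energy(e_orientation, e_distance, alpha2):
    """Sketch of equation (10): convex combination of the orientation
    and distance terms inside the Markovian model."""
    return alpha2 * e_orientation + (1.0 - alpha2) * e_distance

# the grid explored experimentally: both weights from 0 to 1, step 0.1
grid = [round(0.1 * k, 1) for k in range(11)]
```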

Pre-processing
These tests have been done with real images only, because the synthetic images do not need pre-processing. The most suitable pre-processing for each variant is the following:

Variant:         Init        | Gaus        | InMM      | GaMM
Pre-processing:  Restoration | Restoration | Threshold | Erosion

However, for the first four images (acquired with lighting conditions more comfortable than those of the next six), the pre-processing does not significantly increase the quality of the results. Moreover, with the new Markovian modelling, the pre-processing step does not significantly increase the quality of the results either.

Variants
The results are presented for two different cases: (1) with synthetic images and (2) with real images.
For the first category, the ground truth is available, whereas for the second category, a pseudo-ground truth is used and the accepted detections are taken into account in the evaluation, i.e. a threshold is applied to the distance between the segmentation estimated by the evaluated method and the pseudo-ground-truth segmentation. The thresholds applied to this distance for the accepted detections are determined from the results given in Table 5, column D. In Figure 11, the evolution of the similarity coefficient, or DICE, is presented for the 11 synthetic images, Figure 11(a), and for 10 of the real images, Figure 11(b). With synthetic images, the method GaMM is clearly the best for most of the images. However, for one image (the fifth), the results are worse than those of the method Gaus, but they are still correct (DICE = 0.72).
On the contrary, for the most difficult images (the first 3, which contain a real road background), the method GaMM obtains acceptable results (DICE > 0.5) whereas the other methods are not efficient at all. Illustrations are given in Figures 12 and 13: they show how the method GaMM reduces false detections.

Figure 12: The segmentation results on some synthetic images presented in Figure 7 - These are the results obtained with the four variants; the illustrations show that the method GaMM gives the clearest result. The good results of the method InMM can also be noticed.

Results and comparison with Morph
Finally, we have compared the results of GaMM with those of Morph on the complementary data set (32 images). The mean DICE is 0.6 with GaMM whereas it is 0.49 with Morph, see Figure 14, which shows that GaMM outperforms Morph overall. However, if we compare image by image, the results show that GaMM is the best in 50% of the cases, see the illustrations of these results in Figures 15 and 16. More precisely, GaMM seems more efficient with ambiguous images, whereas Morph is the best with reliable images. Finally, we can also give the execution times of the two methods: about 1 minute for GaMM and 5 seconds for Morph (with an Intel Core 2 Duo processor at 2 GHz). These execution times are only indicative because the implementations, in particular for GaMM, have not been optimized.

Conclusions
In conclusion, this paper gives a review of image processing methods for the crack detection of road pavement. It can help researchers who want to choose and adapt an auscultation method to the constraints of the transport structure that is studied (it depends on the quality of the surface and the needs of the auscultation). Moreover, a new method for the detection of road cracks has been introduced, and we have presented a new evaluation and comparison protocol for the automatic detection of road cracks. To our knowledge, this is the first time that real images with ground truth have been proposed to the community. The new method, GaMM, has been validated with the proposed protocol and compared to a previous one, Morph. This evaluation shows the complementarity of the two methods: the Morph method obtains more true positives than the GaMM method, whereas the latter reduces the percentage of false positives.

Figure 13: The segmentation results on some real images - These are the results obtained with the real images presented in Figure 7. The method InMM obtains the clearest result (i.e. with the fewest false detections), but the good quality of the detection map obtained with the method GaMM can also be noticed.
Our first improvements of this work will focus on the evaluation and comparison protocol. We want to increase our data set by taking into account different qualities of road surface and road texture (because, for the moment, each proposed method seems very dependent on the quality of the road texture). In a second step, our future work will include new experiments on the acquisition system. Indeed, the acquisitions and the results obtained with the presented acquisition system have shown its limits: for example, in Figure 7, some parts of the crack are not "visible". This comes from the fact that highlighting the crack depends on the orientation of the lights and of the sensors. Using a single sensor and a light that are always in the same position/orientation, we can sometimes miss some defects in the acquisition. So, it seems important to study other kinds of systems to improve the quality of the automatic processing.

Figure 14: The comparison of the similarity coefficients between GaMM and Morph - The dotted lines delimit the five sets of tested images. For the first set, which corresponds to real images with no illumination problems, the results are mixed, whereas for the four other sets, GaMM is the best. The mean of this criterion is 0.6 (variance = 0.0257) for GaMM whereas it is 0.49 (variance = 0.0750) for Morph. However, Morph has a step of characterization of the cracks (not introduced in GaMM) that can remove cracks which do not respect the expected characteristics of a crack (in length, size and shape). This step contributes to reducing errors, but, in some difficult cases, it decreases the detection performance compared to GaMM.
Third, we want to improve the GaMM method, beginning with the addition of the extraction of the crack characteristics, as in Morph.

Figure 1: Examples of histograms (second line) of images (first line) with cracks - These images present only one mode in their histograms, and it is impossible to separate the gray level distribution of the cracks from that of the road pavement.

1) For each scale do
   1a) For each direction do: estimate the Adaptive Filter (AF)
   1b) Merge the AF responses over all the directions
2) For each scale do
   2a) Initialization of the sites (Markov)
   2b) While the stop condition is not met do: update the sites
3) Fusion of the results of each scale
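The three steps above can be sketched as follows. All component functions (`adaptive_filter`, `merge_directions`, `init_sites`, `update_sites`, `fuse_scales`) are toy placeholders that only fix the interfaces; they are not the paper's actual filter or Markovian site model.

```python
import numpy as np

# --- toy placeholders for the paper's components -------------------------
def adaptive_filter(image, scale, direction):
    return image < 100            # toy "crack" response: dark pixels only

def merge_directions(responses):
    return np.logical_or.reduce(responses)        # step 1b

def init_sites(binary_map):
    return binary_map.copy()                      # step 2a

def update_sites(sites, image):
    return sites, False           # toy update: converges immediately

def fuse_scales(per_scale):
    return np.logical_or.reduce(per_scale)        # step 3

# --- structure of the algorithm of Figure 3 -------------------------------
def crack_detection(image, scales=(1, 2, 3), directions=(0, 45, 90, 135),
                    max_iter=50):
    # Step 1: multi-scale, multi-directional adaptive filtering
    binary_maps = []
    for scale in scales:
        responses = [adaptive_filter(image, scale, d) for d in directions]
        binary_maps.append(merge_directions(responses))
    # Step 2: Markovian refinement at each scale
    refined = []
    for bmap in binary_maps:
        sites = init_sites(bmap)
        for _ in range(max_iter):
            sites, changed = update_sites(sites, image)
            if not changed:
                break
        refined.append(sites)
    # Step 3: fusion of the per-scale results
    return fuse_scales(refined)
```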

Figure 3: The studied algorithm for the method based on adaptive filtering and segmentation by Markovian modelling - Step 1 leads to a binary image using adaptive filtering, while step 2 refines this result with a Markovian modelling.

Figure 2: The different steps of the method Morph - The conditional filtering is applied when the gray level is higher than 40 (to prevent the removal of the crack). The last step is proposed in order to reduce false detections and to complete the detection.

Figure 6: The acquisition system used for the evaluation - In (a), the acquisition system is illustrated. In (c) and (d), an example of the final images is given. In (d), we can see the road that is visible in (c). The processing is done 1 meter by 1 meter, i.e. independently for each image presented in (d). The surface contains two repairs of vertical cracks. In some cases, the sensors do not have the same settings and the global illuminations are different, which can generate some "false cracks". This aspect has easily been taken into account in a pre-processing step by eliminating the junction area from the region of interest.
i.e. each intensity is randomly chosen by supposing a uniform distribution of intensities in [0; 255]. Then, the user gives the position, the length and the orientation (vertical, horizontal or oblique) of the crack. The crack points are built by randomly selecting the next point in the neighborhood and its intensity in [0; 100].
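This synthetic image generation can be sketched as below. The image size, starting point and the choice of a roughly vertical crack are illustrative assumptions; only the intensity ranges ([0; 255] for the background, [0; 100] for the crack) and the neighbor-by-neighbor growth come from the description above.

```python
import numpy as np

def synthetic_crack_image(shape=(64, 64), start=(0, 32), length=60, seed=0):
    """Sketch of the synthetic image generator: background intensities
    drawn uniformly in [0, 255], then a crack grown point by point by
    picking a random neighbor, with crack intensities in [0, 100]."""
    rng = np.random.default_rng(seed)
    img = rng.integers(0, 256, size=shape)        # textured background
    r, c = start
    for _ in range(length):
        img[r, c] = rng.integers(0, 101)          # dark crack pixel
        r = min(r + 1, shape[0] - 1)              # grow downwards (vertical)
        c = int(np.clip(c + rng.integers(-1, 2), 0, shape[1] - 1))
    return img
```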

Figure 7: Examples of the tested images.

Figure 9: The levels of difficulty of the tested images - On these images, we present the four examples of manual segmentations. The color codes are: red (light and dark), blue and green, one for each of the four operators. The parts in yellow correspond to the parts of the cracks detected by more than one operator. There are two examples per category of segmentation. For better visualization, only a part of each image is shown. In the ambiguous images, we clearly see the mistake of the operator in dark red.

(3) InMM - This is the initial version with an improvement of the Markovian modelling (new definition of the sites and of the potential function), see § 3.2.3. (4) GaMM - This is the method Gaus with the new Markovian modelling.

Figure 11: The variations of the similarity coefficient, see Figure 10, for the 4 variants - The first graph shows the results for the synthetic images (the first 3 are obtained from real images with a simulated defect) and the second graph presents the results for the real images. The good performances of the methods InMM and GaMM can be noticed.

Table 3: The classification of crack detection methods (for thin objects in textured images) into five different families. For each family, the hypotheses that are employed are specified.

Table 4: The function u_2 used in [65] - This table presents the values u_2(s′, s) for the sites in low gray levels in Figure 4. In our experiments, like the authors, we have chosen