Enhancing Structural Crack Detection through a Multiscale Multilevel Mask Deep Convolutional Neural Network and Line Similarity Index

This paper proposes a novel and practical crack-detection method for infrastructure. The proposed method has three key components. First, a multiscale multilevel mask deep convolutional neural network (MSML Mask DCNN) is proposed to accurately estimate crack candidates comprising linear and curvilinear features. Second, the proposed neural network is trained using only public image-sets. The main principle of this approach is that cracks have unique and distinct features, and therefore, public image-sets provide sufficient information for a neural network to estimate crack candidates. Third, a line similarity index (LSI), which is calculated using the Hough transform and coordinate transformation with principal component analysis


Introduction
Cracks on the surfaces of civil structures are important indicators of defect propagation and structural health. Most defects, including efflorescence, water leakage, exfoliation, and separation, originate from cracks, suggesting that cracks can be considered representative metrics of structural health [1,2]. Detecting these symptoms early is important because timely operation and maintenance (O&M) not only significantly decreases repair cost and time but also prevents the propagation of cracks and the degradation of structures. Thus, crack detection is an important initial inspection process in O&M that can effectively maintain the healthy state of civil structures and ensure their safety and reliability.
In recent years, several studies have proposed crack-detection methods based on optical images because optical cameras are inexpensive and can easily record inspection images for evaluating the surface condition of civil structures. Specifically, cracks in images have distinct features, such as linear or curvilinear shapes and an appearance that is darker than the background. Moreover, cracks are characterized by the specific appearance of their edges and by the continuity of their lines. These features are difficult to identify using conventional signal processing and machine learning methods, including the particle filter (PF) and Gaussian process regression (GPR), which are widely used for predicting system responses [3,4]. Likewise, the continuity of lines is not easily predicted using long short-term memory (LSTM) or the broad learning system (BLS) [5]. Therefore, these features are commonly extracted through image processing methods, including image filtering, edge detection, image segmentation, and hybrids of these methods [6]. However, practical crack detection through traditional image processing is hampered by noise and low contrast [7]. It is difficult to separate cracks from the background in low-contrast images using image segmentation because the pixel intensities of the crack and the background are similar. Furthermore, certain cracks captured in noisy images may be eliminated by the noise filtering used in traditional image processing techniques. In addition, most methods work only on specific images and cannot be applied to all images because the hyperparameters selected for noise filtering depend on the recording conditions. Consequently, the traditional approach requires image processing methods tailored to the condition of each image [8].
The novelty and key contributions of this study are as follows.
(i) An integrated framework is proposed for effective crack detection in inspection images for real-world applications. This framework comprises a multiscale multilevel mask DCNN (MSML Mask DCNN) and image processing using a line similarity index (LSI).
(ii) The MSML Mask DCNN estimates crack-related pixels from the input image with high accuracy and robustness. Moreover, the MSML Mask DCNN is trained using only public image-sets, implying that no additional effort is needed to record images for training the proposed neural network. The core principle of this approach is that cracks have distinct features, and thus, feature maps from public image-sets provide sufficient information for crack detection.
(iii) The LSI is proposed to exclude non-crack candidates produced by the crack estimation process based on two important criteria. One is the deviation of the crack features with respect to the representative line, and the other is the number of crack features that cross the representative line. These two characteristics of cracks ensure high precision and robustness in removing non-crack features.
(iv) The proposed image processing method successfully reduces noise in the acquired images caused by vibration and out-of-focus blur, which are common in recorded images.
(v) The proposed framework was validated using public image-sets and newly recorded image-sets from the inside of a building and from an underground tunnel. A test set of public images confirmed the superiority of the proposed MSML Mask DCNN in terms of both accuracy and robustness. Images from the field experiments demonstrate that the proposed LSI eliminated over 75% of the non-crack pixels from the mask estimated by the MSML Mask DCNN, confirming that the proposed method is effective for real-world applications.
The remainder of this paper is organized as follows: Section 2 discusses the related work, Section 3 provides the preliminary information, and Section 4 elaborates on the proposed method. In Section 5, we present the experiments conducted to validate the effectiveness of our approach, while Section 6 offers the results and discussion for the image-sets recorded in the field experiments, analyzing the performance and robustness of our method. Finally, the conclusion and future work are presented in Section 7.

Related Work
Considerable efforts have been devoted to developing effective DCNN architectures for crack detection. One important factor that affects estimation accuracy is the conventional annotation. The training image-set for a DCNN is usually annotated with bounding boxes, which may include many background pixels in addition to those related to crack characteristics. The background pixels in the bounding boxes hinder the extraction of distinct features from crack images, resulting in low accuracy and robustness. This limitation of rectangular bounding boxes can be overcome by incorporating a mask DCNN, which uses a mask annotated for each pixel of the feature [9]. This method uses images that are annotated in a pixel-to-pixel manner; consequently, the instance segmentation process assigns a label to each pixel of an image. This annotation characteristic of mask DCNNs significantly improves the estimation accuracy of crack detection and has promoted current research on several effective mask DCNNs [10][11][12][13][14][15][16]. Deeper and wider neural network architectures supervised by pixel-to-pixel annotated images can ensure high accuracy and robustness in crack detection. Several studies have investigated DCNN classification models that operate at the image-patch or region level, as well as two-stage detectors, which first localize regions and then classify them. Crack and non-crack classification models are trained at the image-patch level by addressing the crack candidate region [13]. A mask regional convolutional neural network (Mask R-CNN) has also been combined with the DCNN as a two-stage detector [14]. This architecture localizes a crack as an object in an image using a region proposal network and segments it at the pixel level. However, validation of that method was limited because few images were available.
A dual-scale CNN classification network, which combines a GoogLeNet classifier at a large scale (224 × 224) and a ResNet classifier at a small scale (32 × 32), was also proposed to detect cracks in input images of different scales [15]. However, dual-scale CNN models require heavy computational effort because this architecture

Preliminary
A DCNN with a two-stage detector architecture enhances detection accuracy when the target object is obvious. However, cracks in real-world environments are thin and irregularly shaped, suggesting that cracks that are not well represented in the training image-set might not be detected by a two-stage detector. Therefore, a DCNN with a one-stage detector exhibits better accuracy in real-world applications. In contrast to one-stage detectors, U2-Net employs an autoencoder architecture consisting of an encoder, a decoder, and a fusion module, all built with the U-net as a fundamental block, which its authors refer to as the ReSidual U-block (RSU) [17]. This architecture allows U2-Net to efficiently extract feature maps from multiscale and multilevel information at each scale of the encoder and decoder, giving it a faster processing speed than existing models that rely on backbones for feature extraction. However, a limitation of this approach is that it cannot fully transfer the spatial information of the initial input to subsequent scales because of the way the multiscale and multilevel inputs are handled. A DCNN that was trained with crack candidate regions and that combines local and global features obtained from the speeded-up robust features (SURF) method and a convolutional neural network (CNN), respectively [13], improves prediction accuracy by using both features as complementary components. However, this method depends on the training image-set and on a pre-processing step that includes binarization of the crack images in SURF. This step can adversely affect crack-detection performance because binarizing a crack image that has a noisy background can introduce errors. Transforming a classification CNN model into a fully convolutional model was proposed to produce coarse output maps and remove the dependence on the scale of the input image [16].
This method does not require several scales of an input image because the feature maps from different scales of images are trained to build the classifier. However, this method is limited in that it cannot classify patterns or noise that are not included in the training image-set, and training on all types of patterns and noise in addition to cracks is impossible. According to the literature survey, two-stage models perform well with respect to the detection area of an image patch or region but require heavy computational effort. The network fuses results based on features extracted from each level of the U-net. Moreover, these methods still have several limitations that result in false detections caused by the environment of the training data. In addition, these neural networks also detect non-crack objects that have features similar to cracks in real-world applications. Specifically, non-crack objects with crack-like shapes, such as tiles, boundary lines between tiles and tile joints, and wall-to-ceiling boundaries in backgrounds, are estimated as cracks by mask DCNNs. These results considerably limit real-world applications of DCNNs. Moreover, capturing images for training neural networks and annotating cracks in images require great effort, which is another hurdle in the use of DCNNs for crack detection. These limitations motivated the present study to establish an improved crack-detection method that can be used in real-world applications.

Methodology
A complete flowchart of the proposed method, which comprises four phases, is shown in Figure 1. In phase A, a crack mask is estimated from an optical image using the MSML Mask DCNN. The multiscale multilevel architecture of the proposed neural network ensures high accuracy and robustness. In phase B, the estimated crack masks are preprocessed using morphological image processing, and a contour filter is used for denoising. These masks are modified to have distinguishable curvilinear lines for calculating the LSI.
In phase C, straight lines corresponding to the preprocessed crack masks are calculated through the Hough transform, and the masks are then transformed into principal-component coordinates along the principal axes. In phase D, the LSI is calculated based on the straight lines calculated in phase C and the estimated mask. The probability distribution of the LSI for crack features, which was constructed using a public image-set, is used to eliminate line-like non-crack candidates from the estimated mask. The detailed processes performed in each phase are described in the following subsections.

Phase A: Crack Detection Using the MSML Mask DCNN.
In this phase, a crack mask is predicted from a measured optical image using the MSML Mask DCNN ((a) in Figure 1). The MSML Mask DCNN was adopted specifically for crack detection because of its high accuracy and robustness among mask-based DCNNs. Specifically, it constructs a high-scale feature map from low-scale input images and a low-level feature map from high-scale input images, which implies that a multiscale architecture can effectively extract different features at different image sizes and fuse them to recognize the object of interest. Furthermore, multilevel feature layers perform an elementwise summation of features extracted from shallow and deep feature maps. This elementwise summation mitigates the vanishing-gradient problem because shallow features summed elementwise with features in a deeper layer conserve the semantic information of objects [18]. Hence, this architecture is expected to be effective for extracting features at different levels of complexity.
International Journal of Intelligent Systems
The architecture of the proposed MSML Mask DCNN is shown in Figure 2. This model is designed to extract cracks of varying lengths from input images recorded in different environments. The theoretical basis of this model lies in the incorporation of multiscale layers in both the encoder and decoder, which enables the extraction of both local and global features. The multiscale layers in the shallow part of the encoder detect fine and sharp features including edges and corners, whereas those in the deeper part of the encoder detect smooth yet complex features including surfaces of various objects [19]. The feature maps extracted at each scale are then concatenated to estimate the cracks in an image. This architecture reduces the number of parameters and is computationally efficient [17,20]. Furthermore, the MSML Mask DCNN includes multiple levels, each comprising an encoder and decoder pair, with the multilevel encoders capturing features at different levels of abstraction, from shallow to deep. The encoder, which propagates the initial information to the lower level, helps maintain the spatial information of features, thereby minimizing the information loss that may occur in a multilevel autoencoder architecture. The decoder branch adds 1 × 1 convolution layers after up-sampling and an elementwise sum operation to enhance the learning ability and maintain the smoothness of the features. All decoder outputs across the levels are concatenated with the multiscale features of the current level [21,22]. It is worth noting that the proposed architecture differs from that of U2-Net, even though both use an autoencoder architecture. In essence, the proposed model adopts a cascade form, whereas U2-Net adopts a nested U-net architecture.
Further, the proposed method directly transfers the spatial information of the input features into the initial encoder across multiple levels, whereas the U2-Net model transfers input features sequentially through its multiscale and multilevel steps, which may result in a relatively higher potential for spatial information loss. Hence, the proposed method is expected to be more accurate and robust than other autoencoder-based neural networks. A quantitative evaluation of the performance of the proposed architecture and of other neural networks is described in detail in Section 6. The features extracted from each layer in the first level are convolved with the corresponding features at the same scale, and the encoder/decoder in the second level extracts the features at each layer of the encoder and decoder. All features extracted from all layers of each scale and encoder/decoder are concatenated to determine the final features for the estimation. The concatenated result is then passed through a convolution with a 1 × 1 filter and a sigmoid function, which yields the final crack mask. However, the estimated crack mask includes all line-like features, that is, both actual cracks and non-crack candidates. Hence, the non-crack features should be eliminated from the estimated mask for real-world applications.
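The final fusion step described above (concatenation, 1 × 1 convolution, sigmoid) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example weights, and the 0.5 binarization threshold are assumptions made for illustration, and the side-output maps are assumed to have already been resized to a common shape.

```python
import math

def fuse_side_outputs(side_maps, weights, bias=0.0):
    """Fuse same-sized side-output maps with a 1x1 convolution and a sigmoid.

    side_maps: list of C maps, each a list of rows of floats (H x W).
    weights:   C floats, one per concatenated channel (hypothetical values).
    """
    height, width = len(side_maps[0]), len(side_maps[0][0])
    fused = []
    for r in range(height):
        row = []
        for c in range(width):
            # A 1x1 convolution is a per-pixel weighted sum over the channels.
            z = bias + sum(w * m[r][c] for w, m in zip(weights, side_maps))
            row.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid -> crack probability
        fused.append(row)
    return fused

def binarize(prob_map, threshold=0.5):
    """Turn the probability map into a 0/1 crack mask (threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

side = [[[4.0, -4.0]], [[2.0, -2.0]]]          # two 1x2 side outputs
mask = binarize(fuse_side_outputs(side, [0.5, 0.5]))
```

In a trained network the weights and bias of the 1 × 1 convolution are learned; here they are fixed only to keep the example self-contained.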

Phase B: Preprocessing for LSI Calculation.
Phase B aims not only to acquire a new, denoised crack mask ((h) in Figure 1) but also to generate a mask for calculating the LSI ((g) in Figure 1). To achieve these goals, this study uses several morphological image-processing methods, including a dilation method, skeletonization [23], and a contour method [24].
First, a dilation method is applied to connect articulation points in the estimated cracks and thereby retain and reinforce the linearity of the cracks ((b) in Figure 1). This method is necessary because an estimated mask for a crack has several disconnected points, even though an actual crack is a long curvilinear line. The dilation operation ⊕ is a convolution-like operation between an input image A and a kernel mask B, which is formulated as follows:
$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\}$$
where $z$ denotes the coordinate offset by which the kernel mask B is translated, and $(\hat{B})_z$ denotes the (reflected) kernel mask B translated by $z$; a position $z$ belongs to the result when the translated kernel overlaps A. Second, the subsequent skeletonization method reduces the line thickness of the curvilinear line in the pre-processed mask ((c) in Figure 1). This preprocessing method improves the efficiency of the LSI calculations. Figure 3 shows a flowchart of the skeletonization method, which comprises opening and erosion operations. The erosion operation ⊖ ((c) in Figure 3) is also a convolution-like operation, as shown below.
$$A \ominus B = \{\, z \mid (B)_z \subseteq A \,\}$$
where $(B)_z$ denotes the kernel mask B translated by $z$; a position $z$ belongs to the result only when the translated kernel lies entirely inside A.
Another morphological image processing technique is the opening operation ∘, which combines the erosion and dilation operations as follows:
$$A \circ B = (A \ominus B) \oplus B$$
In the opening operation, the order of the operations is important; the erosion operation is executed first, followed by the dilation operation. Note that the opening operation is used in this study as an effective method for eliminating small amounts of noise [25]. In the skeletonization method, the opening and erosion operations ((b) and (c) in Figure 3) are applied in parallel to an estimated mask. The result of the former operation is subtracted from the input mask ((d) in Figure 3) and then united with the result of the latter operation ((e) in Figure 3). This process is iterated until the width of the connection node is less than or equal to one pixel, resulting in a thin skeletonized crack line. This mask is input to two subsequent steps, that is, noise filtering ((d) in Figure 1) and the Harris corner detection method ((g) in Figure 1). Noise filtering is used to separate crack and non-crack features, whereas the Harris corner detection method is used to calculate the LSI.
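The dilation, erosion, and opening operations above, and a skeleton built from them, can be sketched on sets of foreground pixel coordinates. This is an illustrative sketch, not the authors' code: the 3 × 3 square kernel is an assumption, coordinates are not clipped to an image border, and the loop follows the classic morphological skeleton (it stops when erosion leaves nothing), which may differ in detail from the stopping rule of Figure 3.

```python
# Symmetric 3x3 kernel containing the origin (an assumption for this sketch).
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def dilate(pixels, kernel=SQUARE_3X3):
    """Dilation: every position covered when the kernel is centred on a foreground pixel."""
    return {(r + dr, c + dc) for (r, c) in pixels for (dr, dc) in kernel}

def erode(pixels, kernel=SQUARE_3X3):
    """Erosion: positions where the whole kernel fits inside the foreground.

    Restricting candidates to `pixels` is valid because the kernel contains (0, 0).
    """
    return {(r, c) for (r, c) in pixels
            if all((r + dr, c + dc) in pixels for (dr, dc) in kernel)}

def opening(pixels, kernel=SQUARE_3X3):
    """Opening: erosion first, then dilation; removes specks smaller than the kernel."""
    return dilate(erode(pixels, kernel), kernel)

def morphological_skeleton(pixels, kernel=SQUARE_3X3):
    """Union over iterations of (current - opening(current)), eroding each time."""
    skeleton, current = set(), set(pixels)
    while current:
        skeleton |= current - opening(current, kernel)
        current = erode(current, kernel)
    return skeleton

# A 3-pixel-thick horizontal bar plus one isolated noise pixel.
bar = {(r, c) for r in range(3) for c in range(10)}
cleaned = opening(bar | {(10, 10)})      # the isolated pixel disappears
thin = morphological_skeleton(bar)       # a one-pixel-wide centre line
```

Representing the mask as a set of coordinates keeps the three operators close to their set-theoretic definitions; an array-based implementation would be preferable for full-size images.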
Third, noise filtering is executed by applying a contour method to the pre-processed mask ((d) in Figure 1). A detailed flowchart of the denoising process using the contour method is shown in Figure 4. This method comprises two steps: contour identification and noise filtering. In the first step, the image is scanned for a positive pixel, defined as a nonzero pixel. When a positive pixel is selected, the connected positive pixels that follow it in the same direction of rotation as the initial pixel are identified. The next positive pixel then becomes the center of the search. This process is iterated until the initial positive pixel becomes the center again, and the path traced in this way is considered the contour of the object. This task is repeated until all pixels of the image have been scanned. In the second step, the area of each contour is calculated and compared with th_area, the predefined threshold of the noise size for elimination. A detected contour whose area is smaller than th_area is considered noise and is eliminated from the input mask. This process is iterated until all contours have been inspected, resulting in a denoised output mask. This mask is input into the second dilation method ((e) in Figure 1), followed by an intersection operation to generate a denoised mask ((f) and (h) in Figure 1). This mask is used for separating cracks from non-crack features in phase D.
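The area-based noise filter can be approximated with a short sketch. Note the substitution: instead of tracing contours and measuring the enclosed area as described above, this sketch labels 8-connected components and uses their pixel count as the area, which is a simpler stand-in and not the authors' procedure. The function name and the example threshold are hypothetical.

```python
from collections import deque

def remove_small_components(mask, th_area):
    """Zero out 8-connected foreground components with fewer than th_area pixels."""
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if len(component) < th_area:  # treated as noise
                for r, c in component:
                    out[r][c] = 0
    return out

noisy = [
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],   # isolated speck
]
denoised = remove_small_components(noisy, th_area=3)
```

For a skeletonized mask, pixel count and contour area behave differently (a thin line encloses almost no area), so the threshold would need to be chosen for whichever measure is actually used.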
The Harris corner detection method [26], which enables an effective LSI calculation, is executed in parallel with the previous step to eliminate intersection points ((g) in Figure 1). The flowchart of the process for eliminating intersection points is shown in Figure 5. The processed mask obtained from the skeletonization method is the input to this method. In this step, a window of size 3 × 3 is used as a kernel, which is moved along the rows and columns over all pixels. Through this task, $E(u, v)$ is calculated as follows:
$$E(u, v) = \sum_{i} \left[ I(x_i + u,\, y_i + v) - I(x_i,\, y_i) \right]^2$$
where $u$ and $v$ denote the moving coordinates indicating the row and column inside the window (ranging from −1 to 1), and $x_i$, $y_i$ denote the row and column coordinates inside the input mask. Furthermore, $I$ denotes the intensity value of the pixel. $E(u, v)$ indicates the variation in intensity between the center pixel and one of the other pixels around the center pixel. A value greater than the predefined threshold is regarded as a corner or crossing point. $E(u, v)$ can be simplified as follows:
$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
where $M$ is defined as follows for efficient calculation:
$$M = \sum_{i} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
with $I_x$ and $I_y$ denoting the intensity gradients along the two coordinate directions. Finally, $R$ is calculated from $M$ to classify the size of the singular values (eigenvalues) of $M$ in each direction. To determine whether the pixel is a corner, an edge, or a flat portion, the following calculation is performed:
$$R = \det(M) - k \left( \operatorname{trace}(M) \right)^2$$
where $k$ denotes the weight of the squared trace of $M$, chosen in the range of 0.04 to 0.06 to obtain an appropriate value of $R$. The gap between the $R$ value and zero is then evaluated. An $R$ value less than zero indicates that the pixel is an edge, whereas an $R$ value close to zero indicates that the pixel represents a flat portion. An $R$ value greater than zero indicates that the pixel represents a corner. Based on this principle, pixels regarded as corner or crossing points are selected, and the region around these pixels is eliminated. If these pixels are not eliminated in this step, the calculated LSI exhibits many errors because the pixels around a crossing point include information about lines other than the representative line targeted for the LSI.
This task therefore yields the mask that is used for the LSI calculation in the subsequent phases.
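The corner test above can be sketched in a few lines. The sketch makes several assumptions that are not stated in the text: gradients are taken by central differences with replicated borders, the structure matrix is summed over an unweighted 3 × 3 window, and k = 0.04 (the lower end of the quoted range). It is an illustration of the response R, not the authors' implementation.

```python
def harris_response(image, k=0.04):
    """Return R = det(M) - k * trace(M)^2 for each interior pixel of a 2D list."""
    h, w = len(image), len(image[0])

    def px(r, c):
        # Replicate border pixels so gradients are defined everywhere.
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    # Central-difference gradients along columns (ix) and rows (iy).
    ix = [[(px(r, c + 1) - px(r, c - 1)) / 2.0 for c in range(w)] for r in range(h)]
    iy = [[(px(r + 1, c) - px(r - 1, c)) / 2.0 for c in range(w)] for r in range(h)]

    response = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            sxx = sxy = syy = 0.0
            for u in (-1, 0, 1):          # 3x3 window, as in the text
                for v in (-1, 0, 1):
                    gx, gy = ix[r + u][c + v], iy[r + u][c + v]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det_m = sxx * syy - sxy * sxy
            trace_m = sxx + syy
            response[r][c] = det_m - k * trace_m ** 2
    return response

# 7x7 test image: a bright block whose top-left corner sits at (3, 3).
img = [[1.0 if r >= 3 and c >= 3 else 0.0 for c in range(7)] for r in range(7)]
R = harris_response(img)
# R[3][3] > 0 (corner), R[5][3] < 0 (edge), R[1][1] == 0 (flat)
```

Pixels with a positive response above a chosen threshold would then be treated as crossing points and their neighbourhood removed from the skeleton mask.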

Phase C: Selection of the Representative Crack.
In phase C, straight lines representing an estimated mask are first calculated through the Hough transform ((i) in Figure 1) [27]. Several straight lines are generated for one estimated mask in the Hough space, so one representative line must be selected among these candidates. This representative line is selected by applying a mean-shift clustering method [28] in the Hough space because this method is faster than other clustering methods and achieves accurate performance with no limitation on the number of detected lines. Note that both processing speed and accuracy are important for real-world applications. The representative line is used to define a candidate region for calculating the LSI, which includes the pixels within a predefined distance T of the representative line ((l) in Figure 1). An intersection operation is also executed between the mask in this region and the mask created in phase B so that the information available in the original estimated mask is used for calculating the LSI ((j) in Figure 1). The result of the intersection operation ((j) in Figure 1) preserves the information regarding the targeted line, and the LSI calculated from this mask is the quantity targeted in this phase. Principal component analysis is then performed on the resulting mask in the original coordinate system to determine the two principal axes and to calculate the LSI effectively ((k) in Figure 1). The mask image is transformed to the plane spanned by the first and second principal axes, such that, after this coordinate transformation, the x-axis is the first principal axis, that is, the representative line, whereas the y-axis is the second principal axis.
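The two geometric steps of this phase can be sketched as follows. The sketch simplifies the procedure: it takes the strongest Hough accumulator cell as the representative line instead of running mean-shift clustering in Hough space, and it uses the closed-form orientation of a 2 × 2 covariance matrix for the principal axes. The function names, the 1-degree angular step, and the 1-pixel rho step are assumptions for illustration.

```python
import math
from collections import Counter

def hough_peak(points, n_theta=180, rho_step=1.0):
    """Return (rho, theta) of the accumulator cell with the most votes.

    Lines are parameterised as rho = x*cos(theta) + y*sin(theta).
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), t)] += 1
    (rho_bin, t), _ = votes.most_common(1)[0]
    return rho_bin * rho_step, math.pi * t / n_theta

def pca_align(points):
    """Rotate centred points so the first principal axis becomes the x-axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ca, sa = math.cos(angle), math.sin(angle)
    return [((x - mx) * ca + (y - my) * sa, -(x - mx) * sa + (y - my) * ca)
            for x, y in points]

horizontal = [(x, 5) for x in range(20)]
rho, theta = hough_peak(horizontal)            # rho near 5, theta near pi/2

diagonal = [(i, i) for i in range(10)]
aligned = pca_align(diagonal)                  # second coordinate is ~0 everywhere
```

Because neighbouring accumulator cells can tie for a short line, the peak angle is only accurate to about one angular step; clustering the candidates, as the text describes, is one way to make the choice more stable.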

Phase D: Eliminating Non-Crack Features.
In this phase, the LSI is calculated to distinguish actual cracks from non-crack features in the estimated mask ((m) in Figure 1). The LSI is a quantitative metric that represents the variation in

where x and n denote the first principal axis (x-axis) in the transformed coordinate system and the number of lines in the estimated mask crossing the x-axis, respectively. The subscripts max and min denote the maximum and minimum values, respectively, of a curvilinear line in the transformed principal coordinates, and f denotes the function formed by the line. The constant α represents the weight of the second term. Note that the LSI combines two crack characteristics. The first is the variation in the crack features with respect to the representative line, which is the first term on the right-hand side of equation (8). The second is the number of crack features that cross the representative line, which is the x-axis in the principal coordinate system; this is the second term on the right-hand side of equation (8). An exponential function with the weight α is used to ensure an even contribution of each factor when calculating the LSI. The calculated LSI is then compared to the probability distribution of the LSI values of cracks, which was created from public crack image-sets. The LSI value of non-crack pixels can be considered an outlier in this probability distribution because the LSI value of non-cracks is smaller than that of cracks. Thus, it is possible to distinguish non-crack features from crack features using the proposed method.
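Two ingredients of this phase can be illustrated without reproducing the LSI formula itself, which is not restated here: counting how often the line profile f crosses the first principal axis, and testing whether a candidate's score falls in the lower tail of a reference distribution. Both function names, the sign-change definition of a crossing, and the 5% tail are assumptions for this sketch, not values from the paper.

```python
def count_axis_crossings(profile):
    """Count sign changes of f(x) sampled along the first principal axis."""
    # Samples lying exactly on the axis are ignored so they are not double-counted.
    signs = [v for v in profile if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if (a > 0) != (b > 0))

def is_lower_outlier(value, reference, tail=0.05):
    """True if `value` lies below the `tail` quantile of the reference scores."""
    ordered = sorted(reference)
    cutoff = ordered[int(tail * (len(ordered) - 1))]
    return value < cutoff

crossings = count_axis_crossings([1, 2, -1, -2, 0, 3, -1])   # 3 sign changes
reference_scores = list(range(1, 21))        # placeholder scores, not real LSI data
flag_low = is_lower_outlier(0.1, reference_scores)
flag_mid = is_lower_outlier(5, reference_scores)
```

A wandering crack profile crosses its representative line several times, whereas a straight tile joint hugs the line and crosses it rarely, which is the intuition behind treating low scores as non-crack outliers.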

Evaluation
Metrics.
Two types of metrics were used in this study for the performance evaluation: metrics for evaluating the DCNN and metrics for evaluating the proposed method. The former include the optimal dataset scale (ODS), optimal image scale (OIS), and average precision (AP) [29]. Typical segmentation models use the mean intersection over union (mIoU) metric because it can evaluate clustering or classification performance by comparing the pixels of the ground truth and predicted masks [30]. However, the mIoU fails to provide a proper performance evaluation of the prediction results over all images in an image-set because the pixels in the predicted masks that are used to calculate the mIoU are determined by a threshold in the classifier. Typically, a threshold of 0.5 is used [31]. In other words, the threshold of the classifier plays a critical role in the evaluation of the mIoU. However, this predefined threshold is not a representative value for the dataset because the evaluated mIoU does not vary proportionally with the threshold value. This effect is pronounced in datasets containing imbalanced objects, such as cracks in anomaly detection [32]. Specifically, a crack in an image usually comprises a thin chain of connected pixels, and this characteristic is difficult to evaluate using the mIoU. Hence, many previous studies related to crack detection, including DeepCrack, FPHBN, and CrackIT, have used ODS, OIS, and AP instead of the mIoU [10,[33][34][35]. For a fair comparison, this study also uses ODS, OIS, and AP instead of the mIoU to evaluate the performance of the proposed method. Note that ODS indicates the best F1-score over all thresholds ranging from 0.01 to 0.99 for the entire image-set, whereas OIS indicates the best F1-score in the same range of thresholds for each image in the image-set. AP indicates the average precision over the image-set, which is equal to the area under the precision-recall curve.
These metrics can be used to compare the performance of DCNN architectures for crack detection regardless of the classification threshold in an imbalanced dataset. Precision is defined as the fraction of relevant pixels among all the pixels retrieved in the estimated mask. Recall is defined as the fraction of retrieved pixels among all the relevant pixels in the ground truth. The metrics used to calculate the ODS, OIS, and AP are Precision, Recall, and F1-score, which are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP, FP, and FN denote the numbers of true positive, false positive, and false negative pixels, respectively, which are calculated by comparing the estimated mask to the ground truth. TP is the number of pixels correctly estimated as positive in the estimated mask. FP is the number of pixels incorrectly estimated as positive in the estimated mask. FN is the number of pixels incorrectly estimated as negative in the estimated mask. The metrics used for the evaluation of the proposed method were $M_{rem}$, $M_{elim}$, and $F1_M$, which indicate the changes in TP and FP after the proposed method is applied to the mask estimated by the mask DCNN. $M_{rem}$ is defined as the ratio of the remaining TP pixels, that is, $TP_{rem}$, to the TP pixels in the estimated mask, as follows:
$$M_{rem} = \frac{TP_{rem}}{TP}$$
$M_{elim}$ is defined as the ratio of the eliminated FP pixels, that is, $FP_{elim}$, to the FP pixels in the estimated mask, as follows:
$$M_{elim} = \frac{FP_{elim}}{FP}$$
where $FP_{elim}$ follows from $P_{LSI}$, the number of pixels remaining after the deployment of the proposed method. Both metrics range from 0 to 1, where 1 indicates the best performance of the proposed method.
Thus, $M_{rem} = 1$ indicates that all TP pixels remain and there is no loss in estimation, and $M_{elim} = 1$ indicates that all FP pixels are eliminated after deploying the proposed method. However, the two metrics capture different aspects of the performance of the proposed method. Consequently, $F1_M$ was calculated to evaluate the overall accuracy of the proposed method as follows:
$$F1_M = \frac{2 \cdot M_{rem} \cdot M_{elim}}{M_{rem} + M_{elim}}$$
Note that this metric is the harmonic mean of $M_{rem}$ and $M_{elim}$; thus, $F1_M$ is a representative metric for evaluating the proposed method. Nevertheless, the three metrics should be considered together in a quantitative analysis because each represents a different aspect of performance. A high value of $M_{rem}$ indicates that true crack pixels are retained, so the recall of the crack estimation is preserved, whereas a high value of $M_{elim}$ indicates that false detections are removed, so the precision of the crack estimation improves relative to the output of the DCNN.
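The metrics of this subsection can be sketched on flattened probability maps. Two conventions are assumed here because the text does not fix them: ODS is computed from pixel counts pooled over the whole image-set at each threshold, and OIS is the mean over images of each image's best F1-score. The function names are hypothetical, and AP is omitted for brevity.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from pixel counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def confusion(prob, truth, threshold):
    """TP, FP, FN for one flattened probability map and its 0/1 ground truth."""
    tp = sum(1 for p, t in zip(prob, truth) if p >= threshold and t)
    fp = sum(1 for p, t in zip(prob, truth) if p >= threshold and not t)
    fn = sum(1 for p, t in zip(prob, truth) if p < threshold and t)
    return tp, fp, fn

THRESHOLDS = [i / 100 for i in range(1, 100)]   # 0.01 ... 0.99

def ods(probs, truths):
    """Best F1 using one threshold shared by the whole image-set (pooled counts)."""
    best = 0.0
    for th in THRESHOLDS:
        totals = [sum(v) for v in
                  zip(*(confusion(p, t, th) for p, t in zip(probs, truths)))]
        best = max(best, prf(*totals)[2])
    return best

def ois(probs, truths):
    """Mean over images of the best F1 obtained with a per-image threshold."""
    per_image = [max(prf(*confusion(p, t, th))[2] for th in THRESHOLDS)
                 for p, t in zip(probs, truths)]
    return sum(per_image) / len(per_image)

def lsi_metrics(tp, fp, tp_rem, fp_elim):
    """M_rem, M_elim, and their harmonic mean F1_M."""
    m_rem, m_elim = tp_rem / tp, fp_elim / fp
    return m_rem, m_elim, 2 * m_rem * m_elim / (m_rem + m_elim)

probs = [[0.9, 0.8, 0.2, 0.1], [0.15, 0.12, 0.05, 0.02]]
truths = [[1, 1, 0, 0], [1, 1, 0, 0]]
ods_score, ois_score = ods(probs, truths), ois(probs, truths)
m_rem, m_elim, f1_m = lsi_metrics(tp=100, fp=40, tp_rem=90, fp_elim=30)
```

In this toy example each image can be segmented perfectly with its own threshold, but no single threshold suits both, so OIS exceeds ODS; this is the gap the two metrics are meant to expose.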

Public Image Set.
Four sets of crack images obtained from the literature were used for training, validating, and testing the proposed neural network. These four public image-sets were acquired as provided in Ref. [10]. Sample images from each image-set are illustrated in Figure 6, while detailed information regarding all image-sets is presented in Table 1. The CrackTree260 image-set consists of 260 road pavement images with a resolution of 800 × 600 pixels, whereas the CRKWH100 and CrackLS315 image-sets consist of 100 and 315 pavement images, respectively, with a resolution of 512 × 512 pixels. Stone331 comprises 331 images of stone surfaces with a resolution of 512 × 512 pixels. These images were recorded in different environments and present a variety of cracks of different origins, suggesting that they are effective for training the proposed neural network and testing its accuracy and robustness.
As the first step, all image-sets were annotated again to secure accurate ground truths, even though the public image-sets already provide ground truths. Although the proposed approach may be executable without this re-annotation, the additional annotation was conducted because the performance of the proposed approach depends on the accuracy of crack detection. The provided crack masks included incorrect and missing annotations for thin cracks and for cracks that resemble their surroundings, which makes it difficult to evaluate neural networks quantitatively. The proposed methodology determines whether a detected line represents a crack based on the variability of the detected line, which makes the additional annotations more useful, especially for detecting minor cracks. Thus, the re-annotation was a valuable task for phase A of the proposed framework ((a) in Figure 1). Note that precise annotation plays a critical role in training the neural network to recognize crack features, including real cracks and objects with crack-like shapes such as tiles, boundary lines between tiles and tile joints, and wall-to-ceiling boundaries, whereas the strategy in Figure 1 plays a critical role in removing the crack-like objects recognized by the MSML Mask DCNN in field applications. Hence, both tasks are important for inspecting cracks in real-world applications. After the precise annotation was completed, the CrackTree260, CRKWH100, and CrackLS315 datasets were divided into training and validation image-sets. The Stone331 dataset was specifically chosen as the test set to evaluate how training on crack images from one environment affects crack detection in different environments. The training and validation image-sets were split at a ratio of approximately 9:1, resulting in 605 training images and 70 validation images without augmentation (Table 1).

Field Tests.
Two field tests were conducted to demonstrate the effectiveness of the proposed method. One experiment was conducted inside a building, whereas the other was conducted in an underground tunnel equipped with power-transmission facilities. Both structures are made of concrete. The captured images included crack and non-crack features because these mixed features are commonly captured in real-world applications.
The effect of light was first evaluated in the field tests because several studies have suggested the use of optical and infrared lights for crack measurements [10,36,37]. Three light intensities (low, medium, and high) were used in the measurements under different conditions of light-emitting diode (LED) lighting. The first case was without an LED; in the second case, the LED, which was fixed on the camera, faced 45° away from the crack; in the third case, the LED was fixed facing the crack. The images captured under the different light intensities were then used as the input images for the MSML Mask DCNN to generate the estimation mask, as shown in Figure 7. Table 2 lists the effectiveness of the different light intensities. Four metrics were used to evaluate the accuracy of the estimated mask (threshold: 0.5). The results clearly indicate that the images captured under the high-intensity light showed the best F1-score and AP of 0.7320 and 0.5430, respectively. This implies that images captured under higher light intensities are effective for accurately evaluating cracks. Hence, all image-sets were recorded under high-intensity light.
In the indoor experiments, images were recorded of cracks present on the ceilings and walls of two buildings, numbered 207 and 310, at Chung-Ang University, Seoul. Buildings 207 and 310 were constructed in 1969 and 2016, respectively. The images were recorded using See3-cam_CU135, a 4K USB camera (e-con Systems). The resolution and field of view were set at 1920 × 1080 and 67°, respectively. This experiment had two goals: first, to obtain the stochastic LSI threshold, which is used for classifying the crack and non-crack features, and second, to test the proposed method. Eighteen images were obtained for determining the LSI threshold. Then, 250 crack and 250 non-crack sub-masks were extracted from these images using a method similar to that used in phase C ((j) in Figure 1). Sub-masks were extracted from the full-resolution image because one image may include two or more line-shaped target masks, and the LSI of each line is calculated. Ten images were considered for testing the model, as exemplified in Figure 8(a). These images include crack and non-crack features, where the non-crack features include tiles on the ceiling or a corner or crossing connected to each wall.
In the underground tunnel, images of structural surfaces such as the ceiling and side wall were captured. The tunnel, constructed in 2001, carries the Shingwangmyeong-Yeungdeungpo power transmission lines and is managed by the Korea Electric Power Corporation. The O&M protocols for underground power transmission lines recommend that tunnel structures be inspected biannually through a patrol-based inspection. Hence, 2 km of this underground tunnel was inspected. The images shown in Figure 8(b) were acquired using an M5055 pan-tilt-zoom camera (Axis Communications, Sweden) with a 5x optical zoom. The resolution, field of view, and zoom were set to 1920 × 1080, 71°, and 3x, respectively, to record the images accurately. Twenty images were captured from the underground tunnel. Figure 8(b) shows that these images include crack and non-crack features, confirming that non-crack features in the images should be eliminated to accurately detect cracks. Non-crack features included transmission lines, hangers, or supports, such as pillars, in this experiment. It is worth noting that obtaining multiple images of a crack is difficult because cracks occur randomly and sparsely. This is another reason for using the F1-score in the evaluation [38].

Construction of the Proposed Method.
This subsection describes the details of the development of the proposed method, including the estimation of the hyperparameters used in the MSML Mask DCNN and the other variables used in the proposed method.
In phase A, the MSML Mask DCNN and the other neural networks were trained, validated, and tested using a graphics processing unit (GPU) server with an Intel Xeon Gold 5218 CPU, 128 GB of memory, and four NVIDIA RTX 2080 Ti graphics cards. Of the images in CrackTree260, CRKWH100, and CrackLS315, 90% were used for training and the remaining 10% for validation. All images from Stone331 were used for the test set. The input images were resized to 512 × 512 to improve computational efficiency. The MSML Mask DCNN has two levels of the multiscale Mask DCNN (MS Mask DCNN). The MS Mask DCNN levels were constructed by combining three and four different scales of the encoder and decoder networks. Note that fusing the information obtained from three and four feature maps of a deep and wide neural network enhanced the estimation accuracy. The larger the scale of the MS neural network, the more accurate the estimation but the longer the calculation time, suggesting that a trade-off exists in constructing neural networks. Hence, this study uses two different scales of the MSML Mask DCNN to analyze the advantages and disadvantages of these neural networks in terms of both accuracy and computational effort. A single MS Mask DCNN with the same structure as each level of the MSML Mask DCNN, Mask R-CNN, and Residual UNet (ResUNet) were constructed for quantitative evaluation in terms of accuracy, robustness, and computational load because they show good performances in the literature [39,40]. The Bayesian optimization (BO) method iteratively evaluates promising hyperparameter configurations within a user-defined budget to achieve the best results [41]. In this study, the BO method was used with 100 trials to optimize the hyperparameters of each model, which secures the best results for each model and ensures a fair comparison between the different models.
The number of trials of the BO method was set to 100 to secure optimal hyperparameters. The hyperparameters to be optimized with the BO method are those of early stopping and the Adam optimizer, specifically the patience, learning rate, β1, β2, and ϵ. The BO method requires a range for each hyperparameter; hence, this study selects each range based on the literature. Specifically, the patience, which is the hyperparameter for early stopping, typically has a value between 10 and 20 [42][43][44]. However, this study set a relatively broad range of 10 to 50 to account for variations in model size and dataset characteristics. Several studies have indicated that a learning rate of 0.001, β1 of 0.9, β2 of 0.999, and ϵ of 10^−8 can produce satisfactory results with the Adam optimizer [45,46]. Based on this recommendation, β1 and β2 were set within the range of 0.9 to 0.999, which is not far from the recommended values. Since ϵ and the learning rate have a lesser impact on the results than β1 and β2, a wide range was established for them to explore various combinations of values. Table 3 lists the ranges of hyperparameters used in hyperparameter optimization and the estimated optimal hyperparameters for each DCNN method. Furthermore, the frames per second (FPS) was calculated from the input and output images of each deep learning model to compare their computational performance.
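The search space described above can be written down compactly. The following Python sketch is only an illustration: the ranges for the patience, β1, and β2 follow the text, whereas the ranges for the learning rate and ϵ are hypothetical placeholders because the text only states that a wide range was used. Plain random search is used here as a simple stand-in for the proposal step; it is not Bayesian optimization, which would replace `sample_config` with a surrogate-model-based proposal.

```python
import math
import random

# Ranges for patience and the betas follow the text; the learning-rate and
# epsilon ranges are ASSUMED placeholders (the text only says "wide").
SEARCH_SPACE = {
    "patience": ("int", 10, 50),
    "beta_1": ("uniform", 0.9, 0.999),
    "beta_2": ("uniform", 0.9, 0.999),
    "learning_rate": ("log", 1e-5, 1e-2),   # assumed range
    "epsilon": ("log", 1e-10, 1e-6),        # assumed range
}


def sample_config(rng):
    """Draw one hyperparameter configuration from SEARCH_SPACE."""
    config = {}
    for name, (kind, low, high) in SEARCH_SPACE.items():
        if kind == "int":
            config[name] = rng.randint(low, high)
        elif kind == "uniform":
            config[name] = rng.uniform(low, high)
        else:  # log-uniform sampling for scale-like parameters
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return config


def random_search(objective, n_trials=100, seed=0):
    """Return (best_score, best_config) after n_trials evaluations.

    `objective` maps a configuration to a value to be minimized, for
    example a validation loss obtained by training one model.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if best is None or score < best[0]:
            best = (score, config)
    return best
```

In practice, `objective` would train a model with the sampled configuration and return its validation loss; the budget of 100 trials matches the number of trials stated above.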
In phase B, the parameter used in the first dilation method was the number of iterations, which was set to three for the steps shown in (b) and (e) in Figure 1. The predefined area threshold for classifying noise and cracks, which is applied by the contour method, was set to 50 for the step shown in (d) in Figure 1. The number of iterations used in the second dilation method was set to five. The window block size, aperture size, and threshold of the Harris corner value in the Harris corner detection method were set to 3, 1, and 0.005, respectively.
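The area-based noise removal in phase B can be sketched in a few lines of Python. This sketch is a stand-in under stated assumptions: it labels 8-connected components of a binary mask and uses the pixel count of each component as its area, whereas the method described above relies on a contour method, whose area definition may differ. The function name and the mask representation (a list of lists of 0 and 1) are illustrative.

```python
from collections import deque


def remove_small_components(mask, min_area=50):
    """Return a copy of `mask` without components smaller than `min_area`.

    `mask` is a list of rows containing 0 or 1. Components are 8-connected,
    and the area is approximated by the pixel count (an assumption).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep only components that reach the area threshold.
            if len(component) >= min_area:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned
```

With the threshold of 50 stated above, a thin line of 60 pixels is kept, while a 3 × 3 blob of 9 pixels is removed as noise.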
Phase C comprises the Hough transform, which identifies the representative lines, and coordinate transformation followed by principal component analysis. Of these, only the Hough transform has hyperparameters. In the step where the representative line is selected, the width of the pixel band around the representative line was set to five for the intersection operation with the mask resulting from the Harris corner detector.
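The selection of a representative line can be illustrated with a small pure-Python Hough transform. This is a minimal sketch rather than the implementation used in the study: it assumes the standard normal parameterization $r = x\cos\theta + y\sin\theta$, in which θ is the angle of the line normal, a one-degree angular resolution, and a band of total width five around the detected line. The function names are hypothetical.

```python
import math
from collections import Counter


def hough_peak(pixels, theta_steps=180):
    """Return (r, theta, votes) of the strongest line through `pixels`.

    `pixels` is an iterable of (x, y) coordinates of mask pixels. Each pixel
    votes for every (r, theta) pair with r = x*cos(theta) + y*sin(theta).
    """
    votes = Counter()
    for x, y in pixels:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps
            r = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(r, k)] += 1
    (r, k), count = votes.most_common(1)[0]
    return r, math.pi * k / theta_steps, count


def pixels_near_line(pixels, r, theta, width=5):
    """Keep the pixels whose distance to the line is at most width / 2."""
    half = width / 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x, y) for x, y in pixels
            if abs(x * cos_t + y * sin_t - r) <= half]


# Hypothetical example: a horizontal line at y = 7 plus three stray pixels.
pixels = [(x, 7) for x in range(30)] + [(3, 20), (15, 1), (10, 9)]
r, theta, count = hough_peak(pixels)
band = pixels_near_line(pixels, r, theta, width=5)
```

Here the peak corresponds to the horizontal line, and the band keeps the line pixels together with the stray pixel at (10, 9), which lies two pixels from the line.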
In phase D, the decision line, which separates crack features from non-crack features, should be chosen to calculate the LSI. The image-sets of the crack and non-crack features comprised 750 and 250 images, respectively. Specifically, 500 images of cracks were randomly selected from the public image-sets listed in Table 1, whereas 250 images of cracks were selected from the images acquired inside the buildings, as mentioned in Section 5.2. Furthermore, 250 images of non-crack features were extracted, as mentioned in Section 5.2. The right-sided three-sigma value was calculated from the non-crack features, whereas the left-sided three-sigma value was calculated from the crack features. The mean of these boundary values was then taken as the decision line in the probability distribution of the LSI values of the cracks.

Results and Discussion
Table 4 shows the evaluation results in terms of the ODS, OIS, and AP for all validation and test image-sets, with the best scores for each image-set shown in bold. Remarkably, the MS Mask DCNN and MSML Mask DCNN show better accuracy than Mask R-CNN and ResUNet, although Mask R-CNN and ResUNet are generally faster than the MS Mask DCNN and MSML Mask DCNN. These results are reasonable because the multilevel and multiscale architecture of a neural network increases both the prediction accuracy and the computational load. Specifically, the Mask R-CNN model requires both segmented mask information and bounding box information of the object. However, cracks scarcely fill the bounding boxes owing to their diverse shapes, such as irregularly stretched lines or meshes, and it is difficult to separate each crack as an object. This DCNN therefore performs poorly in instance segmentation of cracks, resulting in a relatively low accuracy compared to the other neural networks.
The ResUNet has a structure similar to that of the MS Mask DCNN, which is based on the auto-encoder, and the results of the ResUNet were approximately equivalent to but marginally worse than those of the MS Mask DCNN and MSML Mask DCNN. Specifically, the mAP of the ResUNet was 90.75% for the validation image-sets of CrackTree260, CRKWH100, and CrackLS315, whereas that of the proposed neural network was 94.19% for the three validation image-sets. Moreover, the APs of the ResUNet and the MSML Mask DCNN with four layers were 72.65% and 83.26%, respectively, for the test image-set of Stone331, confirming that the proposed neural network outperforms the Mask R-CNN and ResUNet. Note that the ResUNet showed better accuracy in the three metrics only for the validation image-set of CrackTree260. This can be explained by the fact that the ResUNet would be overfitted to the CrackTree260 image-set because this image-set was partitioned into training and validation sets. Note also that overfitting of a neural network to the training image-set is common; therefore, this evaluation of neural network capability focuses on the results of the test image-sets in general [47]. The results for the Stone331 image-set clearly indicate that the generality and accuracy of the MSML Mask DCNN are better than those of the ResUNet because the Stone331 image-set was measured in a different environment from the training image-sets. It can be inferred that the ResUNet estimates more false positive pixels than the MS Mask DCNN. This difference originates from the architecture of the neural network that extracts feature maps at each scale layer in the auto-encoder. Specifically, the ResUNet concatenates feature maps corresponding to each scale in the decoder network for augmentation of features and then estimates the mask at the last layer of the network. Consequently, the feature maps of the encoder network are combined with those of the decoder network, and then the feature maps are faded out. In contrast, the MS Mask DCNN utilizes the feature maps together to calculate a loss and to estimate a mask directly. This architecture conserves the semantic information of cracks extracted from different scales, resulting in better performance for detecting cracks. [...] DCNN by 2.63, 3.02, and 4.44% for four scale layers. These results suggest that the MSML Mask DCNN is more accurate and robust for crack estimation using images measured in different environments. This observation can be explained by the fact that the MSML Mask DCNN not only maintains the advantage of the MS Mask DCNN but also improves the performance by using feature maps extracted from the second level of the network, which are concatenated to the feature maps from the first level of the neural network. The feature maps extracted from each scale in the second level of the network also contributed to the estimation of cracks. Therefore, additional feature maps at each scale in the second level of the neural network, which are based on the feature maps extracted from the first level of the neural network, improve the performance in detecting objects precisely. In conclusion, the MSML Mask DCNN outperforms the other neural networks in terms of both accuracy and robustness, even though a complex architecture increases the computational load in estimating cracks. It should be noted that testing the proposed neural network with images from different environments is important to secure estimation accuracy and robustness. However, the FPS of the MSML Mask DCNN was lower than that of the MS Mask DCNN. This trade-off is an important consideration in real-world applications.

Applications of Each Phase.
The results for each phase executed using the proposed method are described in this subsection. One image, which included several non-crack features, was used to demonstrate the effectiveness of the proposed method, and a detailed transformation of this image is shown in Figure 9.
In Phase A, the candidate mask is estimated using the MSML Mask DCNN. Figure 9(a) shows that the mask estimated using the MSML Mask DCNN contained several line and pattern features. The line features originated from the connections between different tiles, and the pattern features originated from the patterns on the tiles. These features were estimated as crack features because their characteristics are similar to those of cracks. This is common in real-world applications and degrades the accuracy of estimating cracks through neural networks.
In Phase B, the dilation method is executed on the estimated mask candidates (Figure 9(b)). This method connects the disconnected cracks in the mask. The skeletonization method is then applied to the dilated mask (Figure 9(c)). This process thins the dilated mask, resulting in a realistic image from which it is easier to select the representative line for the LSI calculation. The size of the patterns, which are considered noise in this study, becomes smaller than the predefined threshold; consequently, the contour method eliminates these patterns, that is, the noise (Figure 9(d)). The mask does not include noise after this step, implying that it can be compared to the original estimation. The dilation method is executed again because certain pixels are disconnected after the prior steps (Figure 9(e)), and then an intersection operation is performed between the original mask and the dilated mask, resulting in a new mask that only includes line or curvilinear features without patterns or noise (Figure 9(f)). Simultaneously, the Harris corner detector is applied to the mask obtained from the skeletonization method. Pixels around the corner or crossing points detected by the Harris corner detector are eliminated to minimize the error of the LSI (Figure 9(g)).
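The dilation and intersection steps of Phase B can be sketched as follows. The sketch assumes a 3 × 3 square structuring element, which is not stated in the text, and represents masks as lists of lists of 0 and 1; the function names are illustrative.

```python
def dilate(mask, iterations=1):
    """Binary dilation with an ASSUMED 3 x 3 square structuring element."""
    rows, cols = len(mask), len(mask[0])
    current = [row[:] for row in mask]
    for _ in range(iterations):
        grown = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not current[r][c]:
                    continue
                # Each foreground pixel switches on its 8 neighbours and itself.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            grown[rr][cc] = 1
        current = grown
    return current


def intersect(mask_a, mask_b):
    """Pixel-wise intersection of two masks of equal size."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


# Hypothetical example: a horizontal crack with a three-pixel gap.
mask = [[0] * 9 for _ in range(5)]
for col in (0, 1, 2, 6, 7, 8):
    mask[2][col] = 1
bridged = dilate(mask, iterations=2)
restored = intersect(bridged, mask)
```

Two iterations close the three-pixel gap in this example, and intersecting the dilated mask with the original mask recovers pixels at their original thickness, which mirrors the role of the intersection operation described above.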
In Phase C, the Hough transform obtains representative lines for the LSI from the mask after the application of the Harris corner detector (Figure 9(h)). This method transforms all the information in an image into Hough space comprising r and θ coordinates (Figure 9(i)), where r and θ denote the distance between the origin of the mask and a line and the angle between the x-axis of the mask and a line, respectively. A line that is thicker than one pixel is selected as a representative line because the LSI is calculated from the pixels around the representative line, which is selected based on the Hough transform ((j) in Figure 9). An intersection operation is performed between this mask and the mask from phase B ((g) in Figure 9) to select a targeted mask for the LSI calculation ((k) in Figure 9). Then, coordinate transformation is performed, followed by principal component analysis, to analyze the line and curvilinear features along the first principal axis ((l) in Figure 9). These steps, from phases A to C, prepare a mask for the LSI calculation.
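The coordinate transformation along the first principal axis can be sketched in pure Python. This is an illustration under stated assumptions, not the study's implementation: for two-dimensional pixel coordinates, the direction of the first principal axis follows in closed form from the covariance terms, $\phi = \tfrac{1}{2}\,\mathrm{atan2}(2 s_{xy},\, s_{xx} - s_{yy})$, and the centered coordinates are rotated by $-\phi$. The function name is hypothetical.

```python
import math


def align_to_principal_axis(points):
    """Rotate centered (x, y) points so the first principal axis is horizontal.

    Returns (angle, rotated), where `angle` is the direction of the first
    principal axis and `rotated` holds (u, v) pairs: u runs along the axis
    and v is the deviation from it.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points) / n
    s_yy = sum((y - mean_y) ** 2 for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form direction of the largest-variance axis of a 2 x 2 covariance.
    angle = 0.5 * math.atan2(2 * s_xy, s_xx - s_yy)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = [((x - mean_x) * cos_a + (y - mean_y) * sin_a,
                -(x - mean_x) * sin_a + (y - mean_y) * cos_a)
               for x, y in points]
    return angle, rotated


# Hypothetical example: pixels lying exactly on the line y = 2x + 1.
line_points = [(x, 2 * x + 1) for x in range(10)]
angle, rotated = align_to_principal_axis(line_points)
```

For a perfectly straight candidate the deviations v vanish, whereas a curvilinear crack produces nonzero deviations about the first principal axis, which is the variation that a line-similarity measure can then quantify.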
In phase D, non-crack features are eliminated based on the probability distributions of the crack and non-crack image-sets. Figure 10 shows these probability distributions with the decision line, which was calculated from the mean of the three-sigma boundary values (99.7%) for crack and non-crack features. Specifically, the three-sigma value of the cracks on the left side is 0.7321, whereas that of the non-cracks on the right side is 0.6719, resulting in a decision line of 0.7 as the mean of these two values. This clear separation of the two features originates from two key characteristics: the variation of crack features with respect to the representative line and the number of crack features that cross the representative line, suggesting that the decision line can classify crack and non-crack features. Hence, a subtraction operation in this phase effectively eliminates non-crack candidates in the mask. It is worth noting that the density of the crack features was different from that of the non-crack features. The former is in the range of 0.0 to 0.1, whereas the latter is in the range of 0 to 6, confirming again that the proposed LSI is an effective metric for distinguishing crack features from non-crack features. When the LSI of the representative line was smaller than the decision line, the representative line mask ((j) in Figure 9) was subtracted from the new mask ((f) in Figure 9). Finally, the subtracted mask becomes the final mask ((m) in Figure 9), in which the non-crack components have been eliminated. This process was repeated until all estimated representative lines had been processed.
Figure 9: Results of each phase through the proposed method (legend: morphology operation, mask operation, intersect operation, subtract operation).
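The decision line described above is simple arithmetic, which the following Python sketch makes explicit. It assumes that the three-sigma boundaries are the mean minus or plus three standard deviations of the LSI samples and uses the population standard deviation; the text does not state which estimator was used. The function names are illustrative.

```python
from statistics import mean, pstdev


def three_sigma_bounds(values):
    """Return (mean - 3*sigma, mean + 3*sigma) of a sample of LSI values.

    The population standard deviation is an ASSUMED choice.
    """
    mu, sigma = mean(values), pstdev(values)
    return mu - 3 * sigma, mu + 3 * sigma


def decision_line_from_bounds(crack_left, noncrack_right):
    """Mean of the left bound of cracks and the right bound of non-cracks."""
    return (crack_left + noncrack_right) / 2


def decision_line(crack_lsi, noncrack_lsi):
    """Decision line computed directly from two samples of LSI values."""
    crack_left, _ = three_sigma_bounds(crack_lsi)
    _, noncrack_right = three_sigma_bounds(noncrack_lsi)
    return decision_line_from_bounds(crack_left, noncrack_right)


def is_crack(lsi, line):
    """A candidate is kept as a crack unless its LSI is below the line."""
    return lsi >= line


# Boundary values reported in the text: 0.7321 (cracks), 0.6719 (non-cracks).
reported_line = decision_line_from_bounds(0.7321, 0.6719)
```

With the reported boundary values, the mean is approximately 0.702, which agrees with the decision line of 0.7 stated above.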

Cracks in Building Interiors.
Figure 11 shows images after deploying the MSML Mask DCNN and the proposed method. Table 5 lists the values of the three metrics evaluating the proposed method with the indoor test image-set. The mask estimated with the MSML Mask DCNN includes both non-crack pixels, which are components of the ceiling tiles, and crack pixels on the concrete wall (Figure 11 (A-3)), whereas the proposed method eliminates the non-crack pixels (Figure 11 (A-4)). Non-crack features were mostly eliminated, as confirmed by the values of $M_{rem}$, $M_{elim}$, and $F1_M$, which are 0.98, 0.94, and 0.96, respectively (image No. 8, as in Figure 11(a)). Specifically, 2% of the TP pixels were eliminated, whereas 94% of the FP pixels, which are the non-crack components of tiles and notches on ceilings, were eliminated. This analysis demonstrates that the proposed method effectively eliminated non-crack features while retaining crack features. The non-crack components eliminated depend on the predefined area threshold for classifying noise and cracks, which is applied by the contour method, suggesting that an appropriate threshold is important for real-world applications. Noise located close to the representative line also degrades accuracy, implying that objects around cracks play an important role in detection accuracy. [...] were almost eliminated. Specifically, cracks located on the inner side of a corner were barely detected because one long crack was detected as several short cracks, and thereby TP pixels were considered noise in the proposed method; please see Figures S1(a) and S1(b), which are enlarged views of Figure 11 (B-2) and (B-3). The edge of a corner has a line-like characteristic, and therefore the corner negatively affected crack detection because the cracks were located around the corner. By contrast, image No. 10 shows a low $F1_M$ because FP pixels were mostly not eliminated, whereas TP pixels were almost eliminated.
The exact values of the three metrics $M_{rem}$, $M_{elim}$, and $F1_M$ were 0.83, 0.63, and 0.71, respectively. This result is also highly correlated with the objects around the cracks and the geometrical environment. Specifically, image No. 10 includes several non-crack pixels that are components of the door and ceiling tiles (Figure 11(c)). Figure 11 (C-4) shows that the results of the proposed method retained several non-crack features because of the lines adjacent to the representative line, which have fewer pixels than the threshold of the LSI. The proposed method calculates the LSI from an area of crack candidates with respect to the representative line, implying that the LSI is proportional to the amplitude and variation of the crack candidate. If a crack candidate involves a crossing line because of other objects (as in Figure 11 (C-4)), this crossing line results in an error in the LSI calculation because it amplifies the LSI of the crack candidate. The Harris corner detection was used in the proposed method to eliminate the area of the crossing point. However, the Harris corner detection method also requires an appropriate threshold for its hyperparameters to detect the corners of crossing points, and the results depend on the images. In this case, the crossing points were not fully detected, which caused a candidate with an erroneously high LSI to be retained as a crack [48].
Regardless of these limitations, the proposed method shows a good capability for crack detection in real-world applications. To validate the superiority of the proposed method, its accuracy is compared to that of other state-of-the-art methods in the literature [13][14][15][16] (Table 6). The architectures of the other four methods were replicated from Refs. [13][14][15][16] and then trained with the same public image-sets (Table 1). Finally, the hyperparameters for each method were optimized using the BO method. The results showed that the proposed method outperformed the other state-of-the-art crack-detection methods, suggesting that the proposed deep and wide neural network effectively detects crack features, even though the MSML Mask DCNN was only trained using public datasets. Moreover, the AP of the entire framework fusing the MSML Mask DCNN and LSI is at least two times higher than that of the others, confirming that false positive pixels are successfully eliminated by the proposed method. However, the proposed method is slower than the other methods because of the deep architecture of the MSML Mask DCNN and the LSI processing, suggesting that a high-performance GPU would be indispensable for implementing the proposed method in real-world applications. This comparison clearly suggests that the proposed method is effective in real-world applications.

Cracks in the Underground Tunnel.
Figure 12 shows sample images of applying the proposed method to the underground tunnel image-set, and Table 7 lists the quantitative results of the proposed method for the three metrics. Image No. 12 shows one of the best results (Figure 12(a) [...] Figure 12(c)), implying that only a few FP pixels were eliminated by the proposed method. Three categories of non-crack features result in these poor performances (marked in Figure 12 (C-3)). In the first category, the estimated crack candidates located on the right were eliminated because the Hough transform detected this line. By contrast, the crack candidate located on the left side remained because the Hough transform could not detect it owing to the catenary shape of the line. The second category includes a large area of several rounded holes that are similar to curved lines. However, a shape made up of several rounded holes can approximate a line, and thereby most candidate cracks were effectively eliminated. By contrast, a small area of rounded holes was retained in the third category because of a resolution problem (Figure S1(c)). A small area of rounded holes can be eliminated as in the second category if the image is recorded at a closer distance and has adequate resolution. However, the image was recorded far from the cracks, and therefore a small area of rounded holes was considered a small line and not eliminated. Image No. 11 also shows low accuracy ($M_{rem}$: 0.46, $M_{elim}$: 0.72, and $F1_M$: 0.56). This image exhibits most of these concerns, including many disconnected cracks and rounded holes of TP pixels (Figure 12(d)). This observation indicates that complex facilities in the underground tunnel degrade the accuracy of the proposed method. Thus, automated crack inspection for real-world applications in such tunnels would be extremely difficult and limited. Hence, more effort should be devoted in future work to improving automated inspection processes.
Figure 12: Estimated masks of sample images captured inside a building (overlapped for visualization by the crack mask with a dilation of three). Panels: ground truth, estimation, and proposed method. (a) to (d) correspond to images No. 12, No. 1, No. 6, and No. 11. The notation 1 to 4 corresponds to the original image, the original image overlapped by the ground truth, the original image overlapped by the DCNN estimation result, and the original image overlapped by the result from the proposed method.
However, the mean values of $M_{rem}$, $M_{elim}$, and $F1_M$ are 0.91, 0.71, and 0.76, respectively, for all images from the underground tunnel, suggesting that the proposed method effectively eliminates line-shaped non-crack features even in complex circumstances. It is worth noting that the resolution of images plays an important role in all crack-detection methods. A high-resolution close-up image increases the accuracy of crack detection, which is why most previous studies recorded images within a dozen centimeters or used high-resolution images [15,16]. However, this study recorded images from a distance of 1.4-1.8 m because it is a reasonable and economical distance for practical applications considering a camera deployed on a mobile robot. It is also worth noting that the measurements were limited by the camera's unchangeable position. A camera facing cracks exactly in parallel enhances the detection accuracy. However, it is difficult to face all internal structures for recording during a patrol inspection. The resolution and position of the camera are the main reasons for the low accuracy in some images.
Regardless of several limitations, the proposed method is more effective than other state-of-the-art crack-detection methods for real-world applications. Table 8 compares the results from the proposed method with those of the other four state-of-the-art methods [13][14][15][16]. The other four methods were trained with the same public image-sets (described in Table 1) and optimized through the BO method. The results are similar to those from the indoor image-set. However, the estimated APs were higher than those obtained from the indoor image-set. Remarkably, the proposed method outperformed the other state-of-the-art crack-detection methods. Specifically, the proposed MSML Mask DCNN is better than other deep learning-based crack-detection methods, suggesting that the proposed deep and wide neural network effectively detects crack features, even though the MSML Mask DCNN was only trained using public datasets. Moreover, the AP of the entire framework combining the MSML Mask DCNN and the LSI is two times higher than that of the others, confirming that false positive pixels are successfully eliminated by the proposed method. In conclusion, a comparison of the proposed method with the other four state-of-the-art methods for the indoor and tunnel image-sets clearly demonstrates that the proposed method is effective for real-world applications in complex structures with many objects.

Conclusions
This study proposes an integrated framework, comprising the MSML Mask DCNN and LSI, for effective crack detection in images for real-world applications. The proposed method estimates crack candidates using the MSML Mask DCNN trained only on public image-sets, based on the principle that cracks have distinct features. Applying the proposed method to the test images demonstrated its effectiveness. Specifically, the MSML Mask DCNN outperformed the state-of-the-art neural networks in terms of both accuracy and robustness, whereas the proposed LSI effectively distinguishes non-crack features from cracks. Hence, the proposed method can improve the capability of crack detection on the surfaces of structures located in complex and varied environments. However, the proposed method is limited in eliminating non-crack features of other shapes, such as round holes. Future work includes studying a method for eliminating non-crack features of other shapes in crack candidates. Furthermore, a quantitative evaluation of the proposed method should be conducted using more field test images. A novel architecture of a mask neural network should also be studied to enhance the estimation accuracy of crack detection in structures.

Data Availability
The data used in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Ji-Wan Ham was involved in methodology, formal analysis, writing the original draft, and tests; Siheon Jeong was involved in data curation, resources, and software; Min-Gwan Kim was involved in annotation of the dataset, hardware, and tests; Joon-Young Park was involved in conceptualization, validation, and visualization; and Ki-Yong Oh was involved in writing, reviewing, editing, supervision, and project administration.