Blotch Detection in Archive Films Based on Visual Saliency Map

Degradations frequently occur in archive films, which symbolize the historical and cultural heritage of a nation. This study addresses the problem of detecting blotches, which are commonly encountered in archive films. A block-based blotch detection method based on a visual saliency map is proposed. The visual saliency map reveals prominent areas in an input frame and thus enables more accurate blotch detection. A simple and effective visual saliency map method is adopted in order to reduce the computational complexity of the detection phase. After the visual saliency maps of the given frames are obtained, blotch regions are estimated, without the requirement for motion estimation, by considering spatiotemporal patches around the saliency pixels, which are subjected to a prethresholding process. Experimental results show that the proposed block-based blotch detection method provides a significant advantage in reducing false alarm rates over the HOG feature (Yous and Serir, 2017), LBP feature (Yous and Serir, 2017), and regions-matching (Yous and Serir, 2016) methods presented in recent years.


Introduction
Archive films, which have been in use since the last periods of the 17th century, are intended to convey the social, cultural, and historical events experienced by societies to future generations. In America and Europe, research committees were formed to investigate the limitations and usefulness of archive films. As a result of these research studies, the benefits of films were brought to the forefront, interest in the film industry increased rapidly, and the sector gained an important place in society. The videos taken during that period were stored on films made of light- and sound-sensitive materials. The analogue method used for the storage and transmission of these films, which are a valuable cultural and historical heritage for today's societies, no longer seems reliable [1] because the films are likely to be exposed to physical deformations during transport and as a result of poor storage conditions. In addition, biological and chemical deterioration may occur on films stored for a long time. These films, which are the cultural heritage of the world, a guide for the next generations, and a window to the past, are very important for archivists and historians, who need their originality to be kept intact [2].
Degradations in archive films can be classified as image flickering, blotches, scratches, noise, and image destabilization. Dirt particles in the film projector can induce scratches during film transport; these scratches are vertical lines located at almost the same spatial position in neighbouring frames. Image flickering occurs due to variations in the exposure time. Grain noise arises from minor deformities in the chemical layer of the film. Image destabilization occurs when a video is recorded with a handheld camera. However, one of the most frequent and annoying deteriorations is blotches [3]. Blotches are usually caused by dust or chemical deterioration that accumulates on the films over time. They are irregular in shape, have approximately constant intensity, are dispersed over different spatial locations, and create a temporal discontinuity. In such degraded regions, there is no correlation between the current frame and the previous or next frames [4][5][6][7]. A flow diagram for the restoration of these degradations is given in Figure 1. It is very important that archive films are restored in such a way that the restoration cannot be noticed by an impartial observer. However, manual restoration of a video sequence with many degradations is difficult. Therefore, semiautomatic restoration techniques are generally preferred [8]. The restoration of blotches is carried out in two steps. First, the degraded regions of the video sequence are detected. The degraded regions are then concealed using information from defect-free regions. Many of the blotch detectors in the literature rely on the fact that blotches tend to occur at different spatial positions in successive frames [9][10][11][12]. Therefore, both the previous and the next frames are generally used as references for detecting the blotches in the current frame. However, the blotch detection phase becomes a very difficult problem since moving objects also cause temporal discontinuities. According to the literature, because of the high cost of blotch restoration, more manual procedures are needed than for the correction of other artefacts [13]. These studies aimed to restore the blotches in archive films without decreasing the frame quality or disrupting the originality of the frame, and with minimal human intervention.

Figure 1: Flow diagram for the restoration of degradations [7]. Here, we are only interested in the blotch detection phase.

Figure 2: Motion estimation and compensation procedures for the blotch detection phase [9] (blocks: current frame, previous frame, frame pyramids, motion-compensated current frame, blotch detection). Here, a hierarchical approach is used for shrinking blotch regions and estimating large displacements.
A typical blotch restoration consists of the following steps:
(i) Estimating motion between frames in the preprocessing
(ii) Applying a blotch detector considering temporal discontinuities
(iii) Checking results of the blotch detection (postprocessing) to improve accuracy
(iv) Concealing the blotch regions

There are some challenges in blotch detection, as listed below:
(i) Blotch detection algorithms need to predict the motion around the degraded regions, but the information in these areas is not reliable
(ii) False alarms (FAs) may impair the original flow of the film
(iii) Too much data has to be processed to ensure temporal consistency in the film

To address the first challenge above, hierarchical block matching motion estimation can be used in classical blotch detectors, as shown in Figure 2.
Here, motion in a blotch region is estimated as the mean of the three neighbouring motion vectors with the lowest summed absolute difference. Note that motion does not change very fast between successive frames, as depicted in Figures 3(a)-3(e). The study in [9] can be consulted for more details about motion estimation. Therefore, if motion compensation is performed effectively, the FA rate decreases.
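The rule for assigning a motion vector to a blotch region can be illustrated with a short sketch. It assumes that the block matcher has already produced, for each neighbouring block, a motion vector and its summed absolute difference (SAD); the function name `blotch_motion_vector` and the input layout are hypothetical and are not taken from [9].

```python
def blotch_motion_vector(neighbours):
    """Estimate the motion of a blotch region from its neighbours.

    neighbours: list of ((dx, dy), sad) pairs, one per neighbouring block
    (hypothetical layout). The three vectors with the lowest SAD are averaged.
    """
    if not neighbours:
        raise ValueError("at least one neighbouring motion vector is required")
    # Sort by SAD (ascending) and keep at most the three best matches.
    best = sorted(neighbours, key=lambda item: item[1])[:3]
    dx = sum(vector[0] for vector, _ in best) / len(best)
    dy = sum(vector[1] for vector, _ in best) / len(best)
    return dx, dy


# Toy example: the outlier vector (-8, 5) has a high SAD and is ignored.
example_neighbours = [((2, 0), 15), ((4, 2), 10), ((3, 1), 12), ((-8, 5), 90)]
example_vector = blotch_motion_vector(example_neighbours)
```

Averaging only the best-matching neighbours keeps unreliable vectors, which are typical near degraded areas, from dominating the estimate.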
In our approach, the aim is likewise to detect the blotches in archive films while minimizing, in particular, the error rates. Therefore, a blotch detection method using a visual saliency map is proposed. The visual saliency map focuses on areas that are visually distinct in complex scenes [14,15]. In this context, using the visual saliency map to focus only on blotches in specific areas is a useful way to address the last challenge listed above, namely, the amount of data to be processed.
This article is organized as follows. Section 2 describes blotch detection studies. The proposed blotch detection method is detailed in Section 3. Experimental evaluation is presented in Section 4. The results and perspectives are given in Section 5.

Related Studies
Since almost every nation restores its own historical archive films, there is no public dataset. Different methods have been developed for blotch detection applications [6][7][8][9][10][11][12]. The study in [1] can be consulted for more detailed information about research in this field. Here, the most classical approaches and recent studies are taken into consideration. These studies are based on the temporal discontinuity, different intensities, and shape characteristics of blotches [9][10][11][12]. The performance of these studies varies depending on the characteristics of the blotches and on how efficiently the complexity of the archive film content is handled. The main idea in most of the previous approaches is to distinguish blotches, which have low temporal correlation, from real objects, which have high correlation across frames [13]. The most common approach to the blotch detection problem is as follows: when the differences between the pixel values in the current frame and those in the previous and/or next frames are high, the current pixel is classified as a blotch. The spike detector index (SDI), ranked order difference (ROD), autoregressive (AR), and Markov random field (MRF) detectors are some of the basic blotch detection methods. The SDI is used to detect temporal discontinuities in image intensity values [10]. It compares each pixel value of the current frame with the corresponding pixel values in the motion-compensated frames in the forward and backward directions and bases its decision on the minimum squared difference. The ROD is a detector based on rank-order statistics [9]. The AR detector rests on the assumption that degradations can be predicted from the motion-compensated previous or next frames using AR models of nondefect regions [12]. The MRF detector considers a virtual blotch detection frame based on a model created from degraded regions [12].
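The SDI rule described above can be sketched in a few lines. This is a minimal illustration, not the implementation of [10]: it assumes that the previous and next frames have already been motion-compensated, that a pixel is flagged when the smaller of the two squared differences exceeds a threshold, and that frames are plain lists of rows. The name `sdi_detect` and the threshold value are hypothetical.

```python
def sdi_detect(prev_mc, cur, next_mc, threshold):
    """Flag pixels that differ from BOTH motion-compensated neighbours.

    prev_mc, cur, next_mc: equal-sized gray-scale frames (lists of rows).
    threshold: limit on the minimum squared difference (illustrative value).
    """
    mask = []
    for row_p, row_c, row_n in zip(prev_mc, cur, next_mc):
        mask_row = []
        for p, c, n in zip(row_p, row_c, row_n):
            backward = (c - p) ** 2  # squared difference to the previous frame
            forward = (c - n) ** 2   # squared difference to the next frame
            # A blotch is a temporal spike, so the smaller of the two
            # differences must be large, not just one of them.
            mask_row.append(min(backward, forward) > threshold)
        mask.append(mask_row)
    return mask


# Toy example: the value 200 appears only in the current frame (spike),
# while the value 90 appears only in the next frame (not a spike).
prev_frame = [[10, 10, 10], [10, 10, 10]]
cur_frame = [[10, 200, 10], [10, 10, 10]]
next_frame = [[10, 10, 10], [10, 10, 90]]
sdi_mask = sdi_detect(prev_frame, cur_frame, next_frame, threshold=400)
```

Because the decision is taken per pixel, any error in motion compensation shows up directly as a false alarm, which is the weakness that block-based methods try to avoid.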
It has been observed that the performance of these methods varies greatly depending on the accuracy of motion estimation, which is quite difficult in complex scenes and degraded image sequences. In fact, the AR and MRF models, which have higher computational complexity, produce almost the same performance as the SDI [9,12]. In addition to these methods, the use of textural information in the region of interest has been suggested for blotch detection problems [1,11]. In the first step of the earlier approach by Yous and Serir [11], candidate regions were extracted by spatiotemporal segmentation. In the second step, the blotches in the candidate regions were determined based on the gradient map [11]. In their later approach, Yous and Serir proposed a new detection method based on feature extraction [1]. In this approach, histogram of oriented gradients (HOG) and local binary pattern (LBP) features were used to determine a blotch region by measuring the similarity between the previous and the next frames. Thus, the use of feature descriptors provides more robustness against motion estimation errors and brightness changes.
Wang and Mirmehdi [6] presented a blotch detection algorithm consisting of two steps. In the first step, a hidden Markov model (HMM) was trained on the video sequences, and a leave-one-out process was then applied to detect blotches. In the second step, a Markov network was used to enforce spatial continuity, and a pyramidal implementation of the Lucas-Kanade feature tracker was used to reduce the FA rate.

Complexity
Licsár et al. [8] proposed a hierarchical gradient-based motion prediction method to detect blotches in archive films. In addition, in the last step of the method, they reduced the FA rate by classifying the spatial properties of candidate blotched regions with support vector machine (SVM).
Xu et al. [13] suggested a preprocessing step to eliminate illumination changes, flickering, and noise between frames for better blotch detection. Then, a region-matching algorithm was developed that uses the size of regions, difference in intensity, edge type, area, and other statistical characteristics.

Proposed Blotch Detection Method
A typical blotch restoration application includes process steps such as motion estimation (ME) followed by motion compensation (MC). However, the FA rate may increase due to incorrect MC when blotch detection is carried out in complex scenes. In our method, a numerical model mimicking the saliency map of the human visual system is used [15] to overcome some of the problems occurring in blotch detection applications. In the proposed method, by using the visual saliency map, blotch detection is performed only in the regions of the frame that stand out from their surroundings, not in the whole frame. In the following, a detailed explanation of our approach based on the visual saliency map is given.

Visual Saliency Map.
Researchers have suggested the use of physiological and psychological aspects of the human visual system (HVS) in computer vision studies [16]. An important aspect of the HVS is its use of a visual saliency map. This model describes the process of selecting the most prominent points in our field of vision; these points are generally called "saliency points" or "interesting points". The saliency map allows the HVS to extract only useful information from the visual field [17] and also helps organize the visual information more quickly [18]. The cost of computing information can sometimes be an important problem in computer vision applications. Many computer vision algorithms may therefore use saliency map models in the initial stage to find the prominent areas of the given image.
In this way, only the saliency regions need to be considered in detail, which reduces the computational complexity of the algorithms. In fact, the human visual system is an interaction between bottom-up and top-down mechanisms [19,20]. The top-down mechanism is context-aware, while the bottom-up mechanism is not [21]. A bottom-up mechanism was proposed by Itti et al. [22]. Based on these studies, a number of saliency map approaches have been developed so far [23][24][25]. Montabone and Soto proposed another method, visual saliency feature (VSF) extraction, that works at the original image (frame) resolution [15]. The retina of the human eye contains ganglion cells. There are two types of ganglion cells: on-centre and off-centre. The on-centre ganglion cell responds to bright areas surrounded by a dark background, while the off-centre one responds to dark areas surrounded by a bright background, as shown in Figures 4(a)-4(d). The VSF algorithm calculates the saliency map from the on-centre and off-centre differences.
In our algorithm, we choose the VSF method, which is simple and, by means of the integral image, evaluates the sum over any rectangular region in O(1) time [15]. The VSF method is briefly explained below.
Let us define the pixel coordinate in the gray-scale image as $p = (x, y)$ and the corresponding pixel value as $G(p)$. In the method, firstly, the colour input image is converted to a gray-scale image. Secondly, the integral image of this gray-scale version is calculated as follows [15]:

$$II(p) = \sum_{x' \le x} \sum_{y' \le y} G(x', y'). \tag{1}$$

Here, it is assumed that the image is scanned from top to bottom and from left to right. In this context, the sum over any rectangular region with the upper-left corner coordinate $p_1 = (x_1, y_1)$ and the lower-right corner coordinate $p_2 = (x_2, y_2)$ is given below [15]:

$$R(p_1, p_2) = II(x_2, y_2) - II(x_1 - 1, y_2) - II(x_2, y_1 - 1) + II(x_1 - 1, y_1 - 1). \tag{2}$$

In the method, centre and surround calculations are then performed as follows [15]:

$$C(p) = G(p), \qquad S(p, l) = \frac{R\big((x - l, y - l), (x + l, y + l)\big) - G(p)}{(2l + 1)^2 - 1}, \tag{3}$$

where $l \in \{12, 24, 28, 48, 56, 112\}$ represents the surround. Finally, depending on the on-centre and off-centre differences, the saliency maps are generated as follows:

$$VSF_{on}(p) = \sum_{l} VSF_{on,l}(p), \qquad VSF_{off}(p) = \sum_{l} VSF_{off,l}(p), \tag{4}$$

where $VSF_{on,l}(p) = \max\{C(p) - S(p, l), 0\}$ and $VSF_{off,l}(p) = \max\{S(p, l) - C(p), 0\}$. In the proposed method, the on-centre difference approach $M \equiv VSF_{on}$ is used for computing the saliency map for each frame.
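The on-centre computation can be sketched in dependency-free Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation of [15]: the surround S(p, l) is taken as the mean of the (2l + 1) × (2l + 1) window excluding the centre pixel, windows are clipped at the image border, the responses of all surround sizes are summed, and the function names `integral_image`, `rect_sum`, and `vsf_on` are hypothetical.

```python
def integral_image(gray):
    """Summed-area table padded with a zero row and column, so that
    ii[y + 1][x + 1] is the sum of gray[0..y][0..x]."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x1, y1, x2, y2):
    """Sum over the inclusive rectangle (x1, y1)-(x2, y2) in O(1)."""
    return ii[y2 + 1][x2 + 1] - ii[y1][x2 + 1] - ii[y2 + 1][x1] + ii[y1][x1]


def vsf_on(gray, surrounds=(12, 24, 28, 48, 56, 112)):
    """On-centre saliency: sum over l of max(C(p) - S(p, l), 0)."""
    h, w = len(gray), len(gray[0])
    ii = integral_image(gray)
    saliency = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = gray[y][x]
            for l in surrounds:
                # Assumption: the surround window is clipped at the border.
                x1, y1 = max(x - l, 0), max(y - l, 0)
                x2, y2 = min(x + l, w - 1), min(y + l, h - 1)
                count = (x2 - x1 + 1) * (y2 - y1 + 1) - 1
                if count == 0:
                    continue  # single-pixel image: no surround available
                surround = (rect_sum(ii, x1, y1, x2, y2) - centre) / count
                saliency[y][x] += max(centre - surround, 0.0)
    return saliency


# Toy example: one bright pixel on a dark 5 x 5 background, small surrounds.
toy = [[0] * 5 for _ in range(5)]
toy[2][2] = 240
toy_integral = integral_image(toy)
toy_saliency = vsf_on(toy, surrounds=(1, 2))
```

Each surround costs four table look-ups per pixel regardless of its size, which is why the large surrounds (up to l = 112) remain affordable at full frame resolution.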

Blotch Detection Using Visual Saliency Map
The aim of our approach is to develop a new model that improves blotch detection performance by reducing the FA rate. As already mentioned, blotches in archive films, which are caused by dust and dirt particles adhering to the film surface, can have random spatial positions, intensities, and sizes in consecutive frames. Therefore, in the proposed method, the visual saliency map is used to take advantage of the human visual system in detecting blotches whose intensities differ from those of their neighbours. The general mechanism of the proposed system is shown in Figure 5. In this context, the blotch detection application is developed by using the visual saliency maps obtained from consecutive frames. The saliency maps of the current, previous, and next frames in the spatiotemporal neighbourhood of the current pixel $p$ are divided into patches, $M_t^{(m,n)}$, $M_{t-1}^{(m \pm c,\, n \pm c)}$, and $M_{t+1}^{(m \pm c,\, n \pm c)}$, respectively, which yield the corresponding $Y_t^5$, $Y_{t-1}^k$, and $Y_{t+1}^k$, where $c \in \{0, 1\}$. Here, the patch size is set to 5 × 5 for using the VSF model efficiently [26]. $k \in \{1, 2, \ldots, 9\}$ denotes the adjacent patch number, $(m \pm c, n \pm c)$ represents the adjacent patches, whose number is set based on the ratio of the frame size to the patch size, and the index $t$ indicates the frame in the temporal space, as assigned below:

$$Y_t^5 = M_t^{(m,n)}, \qquad Y_{t-1}^k = M_{t-1}^{(m \pm c,\, n \pm c)}, \qquad Y_{t+1}^k = M_{t+1}^{(m \pm c,\, n \pm c)}. \tag{5}$$

Considering that the frame is scanned from top to bottom and from left to right, $M_t(p)$ denotes the processed pixel of the patch centred at point $p$ at the top-left corner of the current frame, and $M_t^{(0,0)}$ indicates the corresponding 5 × 5 patch (see Figure 6).
In the proposed method, if any pixel $M_t(\cdot)$ in the current patch $Y_t^5$ is greater than a certain threshold, it is marked as a saliency pixel. Here, this threshold ($th_1$) is set experimentally. After one or more saliency pixels are found in the current patch, the total number of saliency pixels in the temporal patches $Y_{t-1}^k$ and $Y_{t+1}^k$ is compared with another threshold ($th_2$) to check whether the current pixel is in a blotched region or not, as follows. All pixels in the temporal patches $Y_{t \pm 1}^k$ are checked; every pixel $M_{t \pm 1}(\cdot)$ that is greater than $th_1$ is counted as a saliency pixel, and the resulting total $S_{t \pm 1}$ is compared with $th_2$, where the threshold $th_2$ is set to the optimum value according to the related ROC curve.

The neighbourhoods of the patches in the previous and next frames are examined in order to detect block-based blotch regions without motion estimation. Namely, the saliency points in the search regions of the previous and next frames are taken into consideration, which ensures that the proposed method, similar to block-based motion estimation, is not affected by motion. In our method, a sharpening filter is applied to the visual saliency maps to make the saliency pixels even more pronounced. The flow diagram indicating the general operating principle is presented in Figure 7. When the blotches in the archive films are not very obvious, the threshold $th_1$ should be lowered accordingly so that they can be detected. As shown in Figure 8, when $th_1$ is set to 80, it is also possible to detect blotched regions that are not obvious. However, since there are not many such regions in the archive films processed in this study, $th_1$ is set to 100.
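The patch-based decision can be sketched as follows. Several details are assumptions made only for illustration: the saliency counts of the previous and next frames are added together, a patch is labelled as a blotch when this total stays below th2 (reflecting the lack of temporal correlation of blotches), incomplete patches at the frame border are ignored, and the sharpening filter is omitted. The function name `detect_blotch_patches` and the toy values are hypothetical.

```python
def detect_blotch_patches(m_prev, m_cur, m_next, th1, th2, patch=5):
    """Label patches of the current saliency map as blotch (True) or not.

    m_prev, m_cur, m_next: saliency maps of frames t-1, t, t+1 (lists of rows).
    th1: saliency threshold per pixel; th2: limit on the temporal count.
    """
    rows, cols = len(m_cur) // patch, len(m_cur[0]) // patch

    def salient_count(sal, m, n):
        """Number of pixels above th1 inside patch (m, n) of map sal."""
        return sum(
            1
            for y in range(m * patch, (m + 1) * patch)
            for x in range(n * patch, (n + 1) * patch)
            if sal[y][x] > th1
        )

    blotch = [[False] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            if salient_count(m_cur, m, n) == 0:
                continue  # no saliency pixel in the current patch
            # Count saliency pixels in up to 9 adjacent patches of each
            # temporal neighbour (18 patches away from the frame border).
            total = 0
            for dm in (-1, 0, 1):
                for dn in (-1, 0, 1):
                    mm, nn = m + dm, n + dn
                    if 0 <= mm < rows and 0 <= nn < cols:
                        total += salient_count(m_prev, mm, nn)
                        total += salient_count(m_next, mm, nn)
            # Assumption: few salient pixels in the neighbours means the
            # region has no temporal counterpart, i.e. it is a blotch.
            blotch[m][n] = total < th2
    return blotch


def blank_map(size=15):
    return [[0] * size for _ in range(size)]


prev_map, cur_map, next_map = blank_map(), blank_map(), blank_map()
cur_map[2][2] = 200    # salient moving object in the current frame
prev_map[2][6] = 200   # the same object, displaced, in the previous frame
cur_map[12][12] = 200  # salient region with no temporal counterpart
blotch_patches = detect_blotch_patches(prev_map, cur_map, next_map, th1=100, th2=1)
```

Searching the adjacent patches plays the role of a coarse motion search: an object that moved by up to one patch is still found in the neighbouring frames and is therefore not reported as a blotch.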

Experimental Results
In this section, the proposed block-based method is compared with the SDI [10], which is the simplest pixel-based method, and with other block-based methods used in blotch detection problems [1,11] on five datasets. The proposed blotch detection system is implemented in MATLAB on a computer with an i7 CPU and 8 GB of RAM.

Datasets.
For the performance evaluation of the methods, the "football_1" dataset, which contains dark blotches, the "football_2" and "calendar" datasets, which contain gray-scale blotches, and two other archive films with real blotches, called "yesilcam_1" and "yesilcam_2," are used. The "football_1" and "football_2" datasets consist of scenes with no complex motion. The "calendar" dataset is frequently used for testing in many studies.

Performance Evaluation of Blotch Detection.
Here, qualitative results are evaluated first to assess the performance of the methods. In the first step of our approach, as can be seen from Figures 11(a)-11(e), saliency maps are generated by the VSF approach [15] for the detection of blotched regions. In the proposed method, successive frames are used to separate blotches from other regions. In addition, moving objects in the frames must be taken into account to reduce the FA rate. In the second step of the method, the saliency points are compared with the gray-scale intensity values in a certain neighbourhood in the previous and the next frames, and the blotched regions are thus detected. For all experiments performed on the frames with artificial degradations, results are obtained for 20 different threshold values. Here, the appropriate threshold value is chosen by taking into account the correct detection and FA rates. As can be seen from the results given in Figure 12, the results obtained with the HOG feature [1], the LBP feature [1], the regions-matching [11], and the SDI [10] are almost faultless since the artificial blotch regions contain only black. The results in Figures 13 and 14 show that, if the colour values of the blotched regions differ, these regions cannot be detected by the HOG feature, the LBP feature, the region matching, or the SDI. Note that, in the proposed method, the FA rate is reduced while the correct detection rate does not change, regardless of the colour values in the blotched regions.
As can be seen from Figures 15(a)-15(e) and 16(a)-16(e), results similar to those above are also obtained for the two archive films containing original degraded frames.
For quantitative results, ROC curves obtained by calculating the mean values of the true positive rate and the FA rate for each threshold value over the image sequences "football_1," "football_2," and "calendar" are given in Figures 17(a)-17(c). As can be seen from these figures, the best quantitative results are obtained by the proposed method compared to the HOG feature [1], the LBP feature [1], the region matching [11], and the SDI [10], except in Figure 17(a). However, note that the "football_1" dataset contains only black artificial blotches, which does not reflect real-world problems.

Flow diagram of the proposed method: the saliency map $M$ is split into patches; if any pixel $M_t(\cdot)$ in the current patch $Y_t^5$ is greater than a certain threshold, it is a saliency pixel, and the 18 adjacent patches in the previous and next frames are considered; the total number of saliency pixels $S_{t \pm 1}$ in the temporal patches $Y_{t \pm 1}^k$ is calculated; the output is either blotch regions or other regions.
Here, when the motion estimation method presented in [9] is also applied to the proposed method, the results remain almost unchanged, as seen in Figures 17(a)-17(c). However, it should not be forgotten that motion estimation increases the computational complexity.
In our method, recall (true positive rate), precision, and FA rate are used for quantitative performance criteria. Recall is the ratio of the truly detected blotched regions to the actual blotched regions. Precision is the ratio of the truly detected blotched regions to the total detection. FA rate is obtained by dividing the false positive regions that are not actually the blotched regions by the blotch-free regions in the whole frame.
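The three criteria can be computed from a detection mask and a ground-truth mask as in the sketch below. It assumes that both masks are given per element (pixel or patch) as equal-sized lists of rows with truthy values for blotches; the function name `detection_rates` and the toy masks are hypothetical.

```python
def detection_rates(detected, truth):
    """Return (recall, precision, fa_rate) for two equal-sized binary masks."""
    tp = fp = fn = tn = 0
    for row_d, row_t in zip(detected, truth):
        for d, t in zip(row_d, row_t):
            if d and t:
                tp += 1  # blotch correctly detected
            elif d:
                fp += 1  # false alarm on a blotch-free element
            elif t:
                fn += 1  # missed blotch
            else:
                tn += 1  # blotch-free element correctly left alone
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    fa_rate = fp / (fp + tn) if fp + tn else 0.0
    return recall, precision, fa_rate


# Toy example: one hit, one miss, and one false alarm on a 2 x 4 mask.
truth_mask = [[1, 1, 0, 0], [0, 0, 0, 0]]
detected_mask = [[1, 0, 1, 0], [0, 0, 0, 0]]
recall, precision, fa_rate = detection_rates(detected_mask, truth_mask)
```

Evaluating these rates for each threshold value and plotting recall against the FA rate gives one point per threshold on a ROC curve.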
Finally, the mean values of recall, precision, and FA rate used to evaluate the quantitative performance of the methods are given in Table 1. The statistical results in Table 1 demonstrate the superiority of the proposed method over the other methods [1,10,11].

Conclusion and Discussion
In our approach, blotches in degraded archive films are detected using the saliency map approach. In this context, the proposed method has been applied to three films that were artificially degraded with blotches of different colour values and to two original degraded archive films with real blotches. The results of the proposed method for the artificially degraded frames are compared to those of the HOG feature [1], the LBP feature [1], the region matching [11], and the SDI [10] from the literature. It is seen that the proposed method produces better detection results, independent of the colour of the artificial blotches, than the methods in [1,10,11].
In future work, the performance of the proposed method will be improved by conducting research on frames containing more complex scene structures.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.