A Genetic Algorithm and Fuzzy Logic Approach for Video Shot Boundary Detection

This paper proposes a shot boundary detection approach using a Genetic Algorithm (GA) and Fuzzy Logic. The membership functions of the fuzzy system are computed by the GA from preobserved actual values for shot boundaries, and the fuzzy system classifies the types of shot transitions. Experimental results show that the accuracy of shot boundary detection increases with the number of iterations (generations) of the GA optimization process. The proposed system is compared with recent techniques and yields better results in terms of the F1 score.


Introduction
With the growth of the Internet, the generation of multimedia content is also increasing. This raises the problem of using and managing video data effectively, which requires an effective indexing and retrieval system. This is much more difficult in the case of video. For an effective video retrieval system, the content of the video should be understood so that a proper indexing system can be created. The content of a video can be obtained by first performing video segmentation, that is, dividing the video into meaningful shots, and then analyzing each segment (shot) to extract its key features. A scene is a combination of more than one shot with different camera angles or a combination of similar shots.
In video segmentation (shot boundary detection), the video is divided into meaningful shots so that each one can be analyzed to find its key feature(s). Shot boundary detection mainly consists of finding two types of transitions: abrupt and gradual [1,2]. An abrupt transition (also known as a hard cut) is a sudden change between consecutive frames of a video that marks a scene change, caused by the camera recording being stopped suddenly. A gradual transition (also known as a soft cut) is of four types: fade-in, fade-out, dissolve, and wipe. All of these gradual transitions result from editing effects in a video. Fade-in and fade-out are caused by changes in the lightness value. In a fade-in, a picture appears slowly from a darker (usually black) empty frame. In a fade-out, a picture slowly diminishes to an empty (usually black) frame. Dissolve and wipe transitions are effects produced by overlapping the current scene and the next scene. In a dissolve, the overlap is done in such a way that the current scene starts disappearing while the next scene starts appearing. In a wipe, the next scene grows over the current scene until the next scene appears completely.

Related Works
Many researchers [1][2][3] have tried to detect the transitions in a video (a task known as shot boundary detection or temporal video segmentation) in the compressed and uncompressed domains. The MPEG (Moving Picture Experts Group) video formats allow frame features to be analyzed in the compressed domain using motion vectors [4], Discrete Cosine Transform coefficients [5], and so forth. Frame features can be extracted globally or locally. Global feature extraction considers a feature of the whole frame, such as the pixel values [6]. Local feature extraction considers only some regions of the frame, so that only the necessary or important features of the frame are used. MSER [7] and SURF [8] are popular local feature descriptors used for shot boundary detection. These features are extracted from each frame of the video, and the differences between consecutive frames are calculated to find the transitions. Gradual transitions are more difficult to detect than abrupt transitions because large object motion and camera motion can produce the same effect [1]. Thus, it is necessary to extract features that are little affected, or unaffected, by large object motion, camera motion, and lighting changes.
Computational Intelligence and Neuroscience
Intensity histograms and the Color Histogram Difference are among the effective, simple, and widely used methods for shot boundary detection in the uncompressed domain, and they are not sensitive to motion [6]. In [10,11], SVD is applied to the frame histogram matrix and a similarity measure is used to find abrupt and gradual transitions. In [10], the consecutive frames lying between two frames are skipped during analysis, which reduces the computational time drastically. In [9], an HSV color histogram and an adaptive threshold are used for shot boundary detection, and the algorithm can also detect flashes. In [8], entropy and SURF features are used to find cut and gradual transitions, where the intensity histogram is used to calculate the entropy of a frame.
Genetic Algorithms [12,13] and Fuzzy Logic [6,14,15] have been used for shot boundary detection. In [16], a color histogram is generated using Fuzzy Logic for abrupt and gradual transition detection. In [17], an Adaptive Fuzzy Clustering/Segmentation (AFCS) algorithm is proposed; the fuzzy clustering algorithm is used for image segmentation and takes into account inherent image properties such as nonstationarity and high interpixel correlation. A Multiresolution Spatially Constrained Adaptive Fuzzy Membership Function is used for tuning the AFCS. In [18], a Genetic Algorithm is used to generate the membership functions of the fuzzy system for image segmentation.
In this paper, we introduce a shot boundary detection method using a Fuzzy Logic system optimized by a GA. The fuzzy system classifies video frames into different types of transitions (cut and gradual) using the normalized Color Histogram Difference. The GA is used as an optimizer to find the optimal range of values of the fuzzy membership functions. The results show that this combination is efficient and that accuracy increases with the number of iterations/generations of the GA.
The paper is organized as follows. Section 3 explains the feature extraction of the system. A detailed explanation of the GA-optimized fuzzy system, which is used to find the range of values of the membership functions, is given in Section 4. Experimental Results and Discussion and the Conclusion are given in Sections 5 and 6, respectively.

Feature Extraction
This section discusses the feature extraction used in our proposed system.

Color Histogram Difference.
The color histogram is a global feature and one of the simplest and most widely used image features for shot boundary detection [19]. It is insensitive to motion [6,14]. In [6], the normalized color histogram difference $HD_i$ between two frames of a video, say the $i$th and $(i+1)$th frames, is defined by (1), where $N$ is the number of pixels in a frame, $r_i(j)$ is the number of red pixels of the $i$th frame in the $j$th bin, and similarly for the other terms; $r$, $g$, and $b$ represent the red, green, and blue components of a frame. Equation (1) yields a value in the interval [0, 1]. $HD_i$ is 0 when the $i$th and $(i+1)$th frames are the same, and it increases as the similarity between the $i$th and $(i+1)$th frames decreases.
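As an illustration, a normalized color histogram difference of this kind can be sketched in plain Python. The exact normalization of (1) is not reproduced here: the sketch assumes 256-bin histograms per channel and divides the summed absolute bin differences by 6N, an assumption chosen only because it keeps the result in [0, 1]. The function names and the frame representation (a list of (r, g, b) tuples) are hypothetical.

```python
def channel_histograms(frame, bins=256):
    """Return three histograms (red, green, blue) for a frame.

    `frame` is a list of (r, g, b) tuples with integer values in [0, bins).
    """
    hists = [[0] * bins for _ in range(3)]
    for pixel in frame:
        for channel, value in enumerate(pixel):
            hists[channel][value] += 1
    return hists


def histogram_difference(frame_a, frame_b, bins=256):
    """Normalized color histogram difference between two equal-sized frames.

    Assumption: the sum of absolute bin differences over the three channels
    is divided by 6 * N (N = pixels per frame), so the value lies in [0, 1].
    """
    n_pixels = len(frame_a)
    if n_pixels == 0 or n_pixels != len(frame_b):
        raise ValueError("frames must be non-empty and of equal size")
    hists_a = channel_histograms(frame_a, bins)
    hists_b = channel_histograms(frame_b, bins)
    total = 0
    for hist_a, hist_b in zip(hists_a, hists_b):
        total += sum(abs(a - b) for a, b in zip(hist_a, hist_b))
    return total / (6 * n_pixels)
```

Identical frames give 0, and frames with no histogram overlap in any channel give 1, which matches the behavior described for the histogram difference.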

Fuzzy Logic System with GA Optimization for Finding the Value Range of the Membership Function
A Genetic Algorithm (GA) is used as an optimizer to find optimal values of the membership functions of the Fuzzy Logic system [20,21]. The steps are as follows. The input variables of the fuzzy system are:
(a) $HD_i$, with linguistic values negligible (N), small (S), significant (Sig), large (L), and huge (H); $HD_i$ is the histogram difference between the $i$th and $(i+1)$th frames;
(b) $HD_{i-1}$, with linguistic values negligible (N), small (S), significant (Sig), large (L), and huge (H); $HD_{i-1}$ is the histogram difference between the $(i-1)$th and $i$th frames;
(c) $HD_{i+1}$, with linguistic values negligible (N), small (S), significant (Sig), large (L), and huge (H); $HD_{i+1}$ is the histogram difference between the $(i+1)$th and $(i+2)$th frames.
The variable transition is the type of transition that can occur from one frame to another. The value no represents a frame where there is no transition.
The rule base consists of 28 rules of the same form as in [6]. Table 1 gives the rules for detecting no transition (a frame without any transition). The rules for detecting gradual and abrupt transitions are given in Tables 2 and 3, respectively.

Optimization with Genetic Algorithm.
The GA is used to find the range of values of the membership functions. We use triangular membership functions. The values of the input variables $HD_i$, $HD_{i-1}$, and $HD_{i+1}$ range from 0 to 10.

Initialization.
The unknown variables in this problem are the lengths of the bases of the five membership functions negligible, small, significant, large, and huge, which are the same for the three input variables $HD_i$, $HD_{i-1}$, and $HD_{i+1}$. We use a 6-bit binary string to define the base of each of the five membership functions. The five 6-bit strings are then concatenated to form a 30-bit string, which is one solution in the population.

Evaluation.
The strings are mapped to values representing the lengths of the bases of the membership functions. This mapping is computed using the following equation:
$$\text{base}^{(k)} = C^{(k)}_{\min} + \frac{b}{2^{L} - 1}\left(C^{(k)}_{\max} - C^{(k)}_{\min}\right) \quad (2)$$
where $C^{(k)}_{\min}$ and $C^{(k)}_{\max}$ are user-defined constants, usually chosen as the minimum and the maximum value of the variable, $b$ is the decimal value of each substring, $L$ is the number of bits in each substring, and $\text{base}^{(k)}$ is the $k$th base of the membership functions.
In the beginning, the GA randomly creates a population of 10 strings. For a string, the five bases of the five membership functions are calculated using (2).
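The decoding of one 30-bit string into five bases can be sketched as below. This is a minimal illustration of the linear mapping in (2), assuming the same lower and upper bounds for all five bases; the function name and the bounds 0 and 10 used in the example are hypothetical, since the paper leaves these constants to the user.

```python
def decode_bases(bit_string, c_min, c_max, n_bits=6):
    """Split a chromosome into n_bits-wide substrings and map each to a base.

    Each substring's decimal value b is mapped linearly:
        base = c_min + b / (2**n_bits - 1) * (c_max - c_min)
    """
    if len(bit_string) % n_bits != 0:
        raise ValueError("chromosome length must be a multiple of n_bits")
    bases = []
    for start in range(0, len(bit_string), n_bits):
        b = int(bit_string[start:start + n_bits], 2)  # decimal value of substring
        bases.append(c_min + b / (2 ** n_bits - 1) * (c_max - c_min))
    return bases


# Example with assumed bounds 0 and 10: five 6-bit substrings give five bases.
example = decode_bases("000000" + "111111" + "000001" * 3, 0.0, 10.0)
```

With 6 bits, the all-zero substring maps to the lower bound and the all-one substring (decimal 63) maps to the upper bound.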
Using the bases, we then find the initial, middle, and final values of the triangular membership function of each linguistic value, as given in Table 4.
The fuzziness index is a constant.

We then find the degrees of membership of the values in Table 6 using the rules. Using the degrees of membership of the values in a rule, we then find the weight of that rule.
For example, consider the following rule: if ($HD_i$ is huge) and ($HD_{i-1}$ is negligible) and ($HD_{i+1}$ is negligible) then (transition is abrupt).
We find the degree of membership of each value contained in the rule and, from these, the weight of the rule. In this way, we find the weights of all 28 rules. Using the weights, we then compute the crisp output for each row of input values in Table 6 for a string/solution using (4), where $V_1, V_2, V_3, \ldots, V_{28}$ are preset values chosen by us, each of which is 0, 5, or 10.
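The step from membership degrees to a crisp output can be illustrated as follows. The combination operator and the form of the crisp output equation are not legible in this text, so the sketch makes two assumptions: the "and" of a rule is taken as the minimum of its membership degrees, and the crisp output is the weighted average of the preset rule values. Both function names are hypothetical.

```python
def rule_weight(degrees):
    """Firing strength of a rule whose antecedents are joined by 'and'.

    Assumption: 'and' is taken as the minimum of the membership degrees.
    """
    return min(degrees)


def crisp_output(weights, preset_values):
    """Weighted average of the preset rule outputs (assumed defuzzifier)."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # no rule fired; this sketch returns the lowest preset value
    weighted_sum = sum(w * v for w, v in zip(weights, preset_values))
    return weighted_sum / total_weight
```

For instance, two rules with weights 0.25 and 0.75 and preset values 5 and 10 give a crisp output of 8.75 under these assumptions.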
The sum of the squares of the differences between the crisp output and the actual output over all the values in Table 6 gives the error of a string. This sum is subtracted from 1000 to convert the problem from minimization to maximization, which gives the fitness:
$$\text{fitness} = 1000 - \sum_{n} \left(y_n^{\text{crisp}} - y_n^{\text{actual}}\right)^2$$
where $y_n^{\text{crisp}}$ and $y_n^{\text{actual}}$ are the crisp and actual outputs for the $n$th row of Table 6. These steps are repeated for all the strings/solutions of the population to find the fitness of every string.
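The fitness computation described above is small enough to state directly in code. This is a sketch with a hypothetical function name; the offset of 1000 is the value given in the text.

```python
def fitness(crisp_outputs, actual_outputs, offset=1000.0):
    """Return offset minus the sum of squared errors (higher is better)."""
    if len(crisp_outputs) != len(actual_outputs):
        raise ValueError("output lists must have the same length")
    squared_error = sum(
        (crisp - actual) ** 2
        for crisp, actual in zip(crisp_outputs, actual_outputs)
    )
    return offset - squared_error
```

A string whose crisp outputs match the actual outputs exactly reaches the maximum fitness of 1000.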

Selection.
We then choose a set of strings whose fitness value is greater than some specific number.

Reproduction.
The population is modified using operators, namely, crossover and mutation.
The whole process (evaluation, selection, and reproduction) is repeated for many generations, and finally we choose the bit string with the largest fitness value.
This string gives the optimal range of values for all the membership functions of the linguistic values.
After the GA finds the optimal values for the membership functions of the Fuzzy Logic system, the rule evaluation and defuzzification procedures of the fuzzy system start.
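The evaluation, selection, and reproduction cycle can be sketched as a short loop. The text does not specify the selection threshold, the crossover type, or the mutation rate, so this sketch assumes that survivors are the strings whose fitness is at least the population mean, that crossover is single-point, that mutation flips bits independently, and that the best string is always carried into the next generation. The function name and all default parameter values other than the 30-bit length and population of 10 are hypothetical.

```python
import random


def evolve(fitness_fn, n_bits=30, pop_size=10, generations=50,
           mutation_rate=0.02, seed=0):
    """Evolve bit strings and return the fittest one found.

    Assumptions: mean-fitness selection threshold, single-point crossover,
    independent bit-flip mutation, and elitism (best string is kept).
    """
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(pop_size)]
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        scores = [fitness_fn(s) for s in population]
        threshold = sum(scores) / len(scores)
        # Selection: keep strings at or above the threshold (never empty,
        # because the maximum is always at least the mean).
        parents = [s for s, f in zip(population, scores) if f >= threshold]
        children = [best]  # elitism
        while len(children) < pop_size:
            mother, father = rng.choice(parents), rng.choice(parents)
            point = rng.randint(1, n_bits - 1)  # single-point crossover
            child = mother[:point] + father[point:]
            # Mutation: flip each bit with probability mutation_rate.
            child = "".join(
                ("1" if bit == "0" else "0")
                if rng.random() < mutation_rate else bit
                for bit in child
            )
            children.append(child)
        population = children
        best = max(population, key=fitness_fn)
    return best
```

Here `fitness_fn` stands for the fitness of a 30-bit string as described under Evaluation. A toy function such as `lambda s: s.count("1")` is enough to exercise the loop.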

Rule Evaluation.
We need to find the degree of membership, in the range 0 to 1, of the linguistic values of the input variables of the fuzzy system. We use the triangular membership function to find the degree of membership for the input variables. Figure 1 shows the ranges of values of a variable for the five linguistic values.
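A triangular membership function defined by its initial, middle, and final values can be sketched as follows. This is a generic textbook form given for illustration, with a hypothetical function name; the actual ranges used by the system come from Table 4 and Figure 1, which are not reproduced here.

```python
def triangular_membership(x, initial, middle, final):
    """Degree of membership of x in a triangle (initial, middle, final)."""
    if x <= initial or x >= final:
        # Outside the support, except that a shoulder at the peak counts as 1.
        return 1.0 if x == middle else 0.0
    if x <= middle:
        return (x - initial) / (middle - initial)  # rising edge
    return (final - x) / (final - middle)  # falling edge
```

The degree is 1 at the middle value, falls linearly to 0 at the initial and final values, and is 0 outside that range.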

Defuzzification.
To find the crisp or actual output, which is either no transition, gradual, or abrupt, we calculate the weights of the set of rules of the fuzzy system using the degrees of membership. Finally, we calculate the crisp output by using (4).

Experimental Results and Discussion
The strings of the first generation of the GA operation are tabulated with their decimal values, base values, value ranges of the membership functions, and fitness values. The strings are sorted according to their fitness value. The fitness is calculated from the difference between the actual outputs of some input data, as shown in Table 6, and the crisp outputs of the same input data calculated using the membership functions optimized by the GA. Table 8 shows the string with the largest fitness value in different generations. We can see from the table that the fitness increases as the generation increases.
Figures 2 and 3 show graphs of the shot boundary detection of two videos by our Fuzzy-GA system. One axis represents the iteration/generation of the GA operations, and the other represents the gradual and abrupt transitions of the video frames detected by our Fuzzy-GA application. We can see from the graphs that the number of detected transitions increases as the iteration/generation increases. In Figure 2, it is observed that, using the range of membership function values obtained in 50000 (5K) iteration/generation of the GA optimization given in Table 8, our proposed system detects 20 gradual transitions and 44 abrupt transitions. The actual numbers of gradual and abrupt transitions in the video are 26 and 45, respectively, as given in Table 5.
Using the membership function value range of 10000 generations shown in Table 8, we then find the degrees of membership of the linguistic values of the input variables present in the rules. We then calculate the weights of the set of rules using the degrees of membership. The weights of the 28 rules starting from rule number 0 are 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2939, 0.0829, 0, 0.2464, 0.0639, 0, 0, 0, 0,
The proposed system is compared with two recent techniques, SBD using SVD and pattern matching [10] and SBD using Color Feature [9], and shows better performance in terms of the F1 score. A comparison of the computational time is also provided in Table 9.
The computational time of the proposed system for all the videos in Table 5 is provided in Table 10. For each iteration/generation, the computational time includes the approximate time taken in seconds by the GA process, feature extraction, and the shot detection of the proposed system for all the videos.
In Table 11, recall, precision, and F1 score are represented by R, P, and F1, respectively.
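For reference, recall, precision, and F1 score can be computed from detection counts as in the sketch below. These are the standard definitions; the function name, the argument names, and the counts in the example are hypothetical and are not taken from Table 11.

```python
def precision_recall_f1(correct, false_alarms, missed):
    """Return (recall, precision, F1) from detection counts.

    correct:      transitions detected that are real
    false_alarms: transitions detected that are not real
    missed:       real transitions that were not detected
    """
    detected = correct + false_alarms
    actual = correct + missed
    recall = correct / actual if actual else 0.0
    precision = correct / detected if detected else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
r, p, f1 = precision_recall_f1(8, 2, 2)
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.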

Conclusion
This paper proposes a shot boundary detection approach using a Genetic Algorithm and Fuzzy Logic. In the proposed system, the GA is used as an optimizer for the fuzzy system. The GA uses preobserved actual input and output values of shot boundaries for some videos to calculate the range of the fuzzy membership values for the fuzzy system. The fuzzy system, optimized by the GA, acts as a classifier that classifies frames into abrupt and gradual transitions. The normalized Color Histogram Difference is used for feature extraction and for finding the differences between two consecutive frames in a video. The experimental results show that the detection of shot boundaries improves with the number of iterations or generations of the GA optimization process, and that the proposed system yields better results and lower computational time than recent techniques.