Coding B-Frames of Color Videos with Fuzzy Transforms

We propose a new method based on discrete fuzzy transforms for coding/decoding frames of color videos, in which the GOP sequences are determined dynamically. Frames are classified into intraframes, predictive frames, and bidirectional frames, and we consider particular frames, called Δ-frames (resp., R-frames), for coding P-frames (resp., B-frames) by using two similarity measures based on the Lukasiewicz t-norm; moreover, a preprocessing phase is proposed to determine the similarity thresholds used for classifying the above types of frame. The proposed method provides acceptable results, to a certain extent, in terms of quality of the reconstructed videos when compared with the classical F-transform-based method and the standard MPEG-4.


Introduction
A video can be considered as a sequence of frames of sizes N × M; a frame is an image that can be compressed by using a lossy compression method. We can classify each frame as an intraframe (for short, I-frame), a predictive frame (for short, P-frame), or a bidirectional frame (for short, B-frame), the last being more compressible than an I-frame. A B-frame can be predicted or interpolated from an earlier and/or later frame. In order to avoid a growing propagation error, a B-frame is not used as a reference for further predictions in most encoding standards, except in AVC [1]. A frame can be considered a P-frame if it is "similar" to the previous I-frame in the frame sequence; otherwise, it must be considered a new I-frame. This similarity relation between a P-frame and the previous I-frame is fundamental in video-compression processes because the pixels of a P-frame have values very close to those of the previous I-frame. This suggests defining a frame containing the differences between a P-frame and the previous I-frame, called a Δ-frame, which carries a low quantity of information and hence can be coded with a low compression rate. A P-frame is decoded via the previous I-frame and the Δ-frame. In the MPEG-4 method [2, 3], which adopts the JPEG technique [4] for coding/decoding frames, the I-frames, P-frames, and B-frames are arranged in a Group of Pictures (for short, GOP) sequence. A B-frame is reconstructed by using either the previous or the successive I-frame. Here the results of [5] are improved by using a technique based on F-transforms for coding B-frames. For convenience, we assume that the first frame of a video is an I-frame. We assign an ID number to each frame of the video. Then we say that a frame is a B-frame or a P-frame if it is "very similar" to the previous I-frame, in the sense that its similarity to that I-frame, a parameter defined via the Lukasiewicz t-norm (see formula (12)), is greater than a threshold value [5]; otherwise the frame is assumed to be a new I-frame, namely the first frame of the successive GOP sequence.
Two algorithms are used: the first determines the GOP sequences; the second determines whether a frame is a P-frame or a B-frame. The first frame of a GOP sequence is always an I-frame and the last frame is a P-frame. The function "analyze GOP sequence (ID1, ID2)" reported in Algorithm 1 describes this process, where ID1 is the ID of the first I-frame and ID2 is the ID of the last P-frame in the GOP sequence. This function determines whether a frame of the GOP sequence with ID strictly between ID1 and ID2 is a B-frame or a P-frame. We define a similarity threshold and compare each frame with the previous I-frame or P-frame, obtaining a similarity value via formula (12); in an auxiliary array we store the ID number of the last frame for which this similarity falls below the threshold. One variable contains the ID number of the previous I-frame or P-frame and is initialized to ID1; another variable points to the last frame in the GOP sequence and is set to ID2. In our approach we determine one GOP sequence at each step. The frame after the last P-frame is the I-frame of the new GOP sequence. After determining the GOP sequences of the color video, we use the F-transforms [5, 7-10] for compressing the frames; the F-transform method has been developed in [5]. In this paper each frame is converted into the YUV space. Indeed, since the human eye perceives an image mostly through the Y band (brightness) rather than the U and V bands (chrominance), we can use a stronger compression rate for coding the image in the U and V bands with respect to the one used in the Y band, without loss of information in the reconstructed image. In [5] the authors show that the quality of the reconstructed images is better than the one obtained by applying the F-transform method directly in the RGB space (see also [11, 12]). The proposed method is widely discussed in Section 4.
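The GOP determination described above can be sketched as follows. This is a minimal sketch, not the paper's Algorithm 1: the function names are hypothetical, and the pixelwise similarity (an average of the Lukasiewicz biresiduum 1 − |x − y|) stands in for formula (12), which is not reproduced in this section.

```python
import numpy as np

def sim(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two single-band frames in [0, 1]: the mean of the
    Lukasiewicz biresiduum 1 - |x - y| over all pixels (assumed stand-in
    for formula (12))."""
    return float(np.mean(1.0 - np.abs(a - b)))

def analyze_gop(frames, sim_threshold: float):
    """Split a sequence of single-band frames (values in [0, 1]) into
    GOP sequences.

    Each GOP starts with an I-frame; a following frame stays in the GOP
    while its similarity to that I-frame exceeds the threshold, otherwise
    it opens a new GOP. The last frame of each GOP is its P-frame."""
    gops, start = [], 0
    for k in range(1, len(frames)):
        if sim(frames[start], frames[k]) <= sim_threshold:
            gops.append(list(range(start, k)))  # close the current GOP
            start = k                           # frame k is a new I-frame
    gops.append(list(range(start, len(frames))))
    return gops
```

For example, two nearly identical dark frames followed by two nearly identical bright frames split into two GOP sequences.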
In Sections 2 and 3 the theory of F-transforms and its application to image compression are recalled, respectively. In Section 5 the results obtained on a large dataset of color videos are presented.

Our Proposal
The proposed process includes the following steps: (1) each color frame, seen as a fuzzy relation, is converted from the RGB space to the YUV space; (2) a classification of the frames is made via the previous algorithms; (3) the compression rate ρ_I of the I-frames is the mean of the three (possibly different) compression rates used in the three bands; that is, if a block B of an I-frame has sizes (say) n(B) × m(B) in the Y band and is coded to a block of sizes (say) n'(B) × m'(B), the related compression rate is given by ρ_Y(B) = (n'(B) · m'(B)) / (n(B) · m(B)), and the symbols ρ_U, ρ_V have analogous meanings. Of course we have ρ_I = (ρ_Y + ρ_U + ρ_V)/3. A similar meaning can be given to ρ_Δ (resp., ρ_R) for the Δ-frames (resp., the R-frames).
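As a small worked instance of these rates (the names n, m, n_coded, m_coded are reconstructed labels for the original and coded block sizes):

```python
def block_compression_rate(n: int, m: int, n_coded: int, m_coded: int) -> float:
    """rho(B) = (n'(B) * m'(B)) / (n(B) * m(B)): coded pixels over original pixels."""
    return (n_coded * m_coded) / (n * m)

def mean_compression_rate(rho_y: float, rho_u: float, rho_v: float) -> float:
    """rho_I = (rho_Y + rho_U + rho_V) / 3: the mean rate over the three bands."""
    return (rho_y + rho_u + rho_v) / 3.0

# e.g. 8x8 blocks coded to 4x4 in the Y band and to 2x2 in the U and V bands
rho_y = block_compression_rate(8, 8, 4, 4)          # 0.25
rho_u = rho_v = block_compression_rate(8, 8, 2, 2)  # 0.0625
```

This illustrates why compressing the chrominance bands more aggressively lowers the mean rate ρ_I well below the Y-band rate.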
A color image in the RGB space with pixels normalized in [0, 1] is converted to the YUV space via the formula given in [5]. Since no misunderstanding can arise, a frame is denoted by a capital letter instead of its ID number in a sequence of a video. In step (2) we use the similarity measure adopted in [5]; in the i-th band (i ∈ {Y, U, V}) we write Sim_i(A, B). The authors of [5] have shown that the Lukasiewicz t-norm provides the best results with respect to other norms such as the classical minimum and the arithmetic product. For convenience, we assume that the first frame of a video is an I-frame. For determining a GOP sequence in a single band, one verifies whether the successive frame B is a B-frame or a P-frame, that is, whether it is "very similar" to the preceding I-frame A in the sense that Sim(A, B) > s̄, with s̄ ∈ [0, 1] being a prefixed threshold value; otherwise B is assumed to be a new I-frame. We determine a GOP sequence in an assigned band using (12) with the following process: (1) we consider the first frame A as an I-frame; (2) we compare A with the successive frame B; (3) if Sim(A, B) > s̄, the frame B is a B-frame or a P-frame and is enclosed in the GOP sequence; then we consider the successive frame and go to step (2); otherwise B is a new I-frame, and the previous frame is a P-frame representing the last frame of the GOP sequence.
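The conversion formula of [5] is not reproduced in this section; the sketch below uses the common BT.601-style RGB→YUV matrix as an assumed stand-in, and the exact scaling used in [5] may differ.

```python
import numpy as np

def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    """Convert RGB pixels (values in [0, 1], last axis of length 3) to YUV.

    The BT.601 coefficients below are an assumption, not the formula of [5].
    Y carries brightness; U and V carry chrominance."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return rgb @ m.T
```

A gray pixel maps to Y equal to its intensity with (nearly) zero chrominance, which is what makes the U and V bands so compressible.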
After determining the GOP sequence, we check whether each frame of the sequence is a B-frame or a P-frame by using the previous algorithms. In step (3) we finally compress the frames. In order to reduce the mean compression rate for a P-frame, in [5] and references therein the authors introduce a "difference" frame D, called a Δ-frame, between a P-frame P and an I-frame I, defined as a fuzzy relation D : (i, j) ∈ {1, 2, . . . , N} × {1, 2, . . . , M} → [0, 1] via formula (13). The usage of the Δ-frame has the advantage of allowing a stronger compression rate for the P-frames with respect to the I-frames; indeed, a P-frame P has pixel values very close to those of the previous I-frame, hence the Δ-frame D in (13) carries a low quantity of information and can be coded with a low compression rate. Then, if I_D and D_D are the frames obtained after coding/decoding I and D, the frame P_D (reconstruction of the frame P), with I_D, D_D, P_D : (i, j) ∈ {1, 2, . . . , N} × {1, 2, . . . , M} → [0, 1], is deduced from the membership values of I_D and D_D via formula (14); the analogous formula (16) holds for B-frames. We use formulas (14) and (16) for reconstructing the P-frames and the B-frames of the videos, respectively. In accordance with [5], we convert each image into the YUV space via the above formula. For simplicity of presentation, in our tests we adopt equal compression rates in the U and V bands. In [5] a preprocessing phase is adopted for determining the similarity threshold, calculated with the following steps: (1) the initial frame I is considered an I-frame and is compressed in the i-th band (i ∈ {Y, U, V}) with compression rate ρ_I; each successive frame is treated as a P-frame P and we archive the similarity value Sim_i(I, P) calculated with formula (12); we compress the Δ-frame D in the i-th band with compression rate ρ_Δ (less than ρ_I) and, if D_D is the related decompressed frame, we derive the reconstructed P-frame P_D via (14); (2) each P-frame P is also coded directly with the F-transforms in the i-th band with compression rate ρ_Δ; if P_F is the frame decoded in this way, we determine the difference diff(PSNR) = |PSNR(P_D, P) − PSNR(P_F, P)|; (3) the trend of diff(PSNR) is plotted with respect to the similarity Sim_i(I, P) in each band of the image. As similarity threshold we assume the value of Sim_i(I, P) such that diff(PSNR) does not exceed a prefixed limit, set equal to 3 (cf. [5] for details); then the threshold s̄ is given by

s̄ = max over P ∈ GOP of max {Sim_i(I, P) : i ∈ {Y, U, V}},   (18)

with I being the first I-frame of the GOP sequence. In our tests, in addition, we set the two similarity thresholds equal in the preprocessing phase.
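The Δ-frame coding round trip can be sketched as follows. Formulas (13) and (14) are not reproduced in this section, so the normalized-difference form below (D = (P − I + 1)/2 with the clipped inverse) is an assumption chosen to match the stated [0, 1] codomain, not necessarily the paper's exact definition.

```python
import numpy as np

def delta_frame(p: np.ndarray, i: np.ndarray) -> np.ndarray:
    """Delta-frame between a P-frame and its reference I-frame
    (assumed form of formula (13)); since p - i lies in [-1, 1],
    the result lies in [0, 1]."""
    return (p - i + 1.0) / 2.0

def reconstruct_p(i_dec: np.ndarray, d_dec: np.ndarray) -> np.ndarray:
    """Reconstruction P_D of the P-frame from the decoded I-frame I_D and
    decoded Delta-frame D_D (assumed form of formula (14)), clipped back
    into [0, 1]."""
    return np.clip(i_dec + 2.0 * d_dec - 1.0, 0.0, 1.0)
```

With lossless coding the round trip is exact; in practice I_D and D_D are lossy F-transform reconstructions, and the low information content of D is what permits the much stronger rate ρ_Δ.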

The Results
For brevity of discussion, we show the results obtained for the color video "tennis2" [6]. We present all the results by assuming ρ_I ≈ 0.262 for the I-frames, ρ_Δ ≈ 0.027 for the Δ-frames, and ρ_R ≈ 0.020 for the R-frames. Figures 1(a)-1(d) show the first frame of the video and the corresponding single-band images in the YUV space, respectively. As an example of diff(PSNR), Figure 2 contains the plots of diff(PSNR) ≤ 3 for the similarity values obtained in the Y, U, and V bands, from which we choose the similarity threshold 0.948 (as average). As examples, we show some Δ-frames and R-frames in each band.

(i) Y Band. The first P-frame is given by the fourth frame. Figure 3(a) contains the Δ-frame obtained by using (13) from the fourth frame and the first frame (an I-frame). The second and the third frames are B-frames. Figure 3(b) (resp., Figure 3(c)) shows the R-frame obtained by using (15) from the second (resp., third) frame, the first frame (an I-frame), and the fourth frame (a P-frame).
(ii) U Band. The first P-frame is given by the sixth frame. Figure 4(a) contains the Δ-frame obtained by using (13) from the sixth frame and the first frame (an I-frame). The frames 2, 3, and 4 are B-frames. Figures 4(b)-4(d) show the R-frames obtained by using (15) from the first frame (an I-frame), the B-frames 2, 3, and 4, and the sixth frame (a P-frame), respectively.

(iii) V Band. The first P-frame is given by the fifth frame. Figure 5(a) contains the Δ-frame obtained by using (13) from the fifth frame and the first frame (an I-frame). The frames 2, 3, and 4 are B-frames. Figures 5(b)-5(d) show the R-frames obtained by using (15) from the first frame (an I-frame), the B-frames 2, 3, and 4, and the fifth frame (a P-frame), respectively.

All the results obtained for the video "tennis2" are synthesized in Table 1. In Table 2 we report the final PSNR index for the three methods.
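The PSNR index used in these comparisons can be computed, for frames normalized to [0, 1], as in the following sketch:

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between an original frame and its
    decoded reconstruction, for pixel values in [0, peak]."""
    mse = float(np.mean((original - decoded) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)
```

A uniform pixel error of 0.1, for instance, yields a PSNR of 20 dB; higher values indicate better reconstruction quality.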

Conclusions
We have presented a new method for coding/decoding color videos in which each frame is classified as an I-frame, a P-frame, or a B-frame by using similarity measures for determining the GOP sequences and the types of frame. Our method seems, to a certain extent, fully comparable with the classical F-transform method and MPEG-4 under similar mean compression rates.

Figure 3: (a) Δ-frame from Frame 4 in the Y band, (b) R-frame from Frame 2 in the Y band, and (c) R-frame from Frame 3 in the Y band.

Figure 4: (a) Δ-frame from Frame 6 in the U band, (b) R-frame from Frame 2 in the U band, (c) R-frame from Frame 3 in the U band, and (d) R-frame from Frame 4 in the U band.

Figure 5: (a) Δ-frame from Frame 5 in the V band, (b) R-frame from Frame 2 in the V band, (c) R-frame from Frame 3 in the V band, and (d) R-frame from Frame 4 in the V band.



Table 1: Results for "tennis2" [6] in the proposed method.