We present a new method based on discrete fuzzy transforms for coding/decoding frames of color videos in which the GOP sequences are determined dynamically. Frames are classified into intraframes, predictive frames, and bidirectional frames, and particular frames, called Δ-frames (resp., R-frames), are used for coding P-frames (resp., B-frames) by means of two similarity measures based on the Lukasiewicz t-norm; moreover, a preprocessing phase is proposed to determine the similarity thresholds used for classifying the above types of frame. The proposed method provides results comparable, in terms of quality of the reconstructed videos, with those of the classical F-transform method and of the standard MPEG-4.
1. Introduction
A video can be considered as a sequence of frames of size N×M; a frame is an image that can be compressed by using a lossy compression method. We can classify each frame as an intraframe (for short, I-frame), a predictive frame (for short, P-frame), or a bidirectional frame (for short, B-frame), the latter being more compressible than an I-frame. A B-frame can be predicted or interpolated from an earlier and/or later frame. In order to avoid a growing propagation error, a B-frame is not used as a reference for further predictions in most encoding standards, except in AVC [1]. A frame can be considered as a P-frame if it is “similar” to the previous I-frame in the frame sequence; otherwise, it must be considered as a new I-frame. This similarity relation between a P-frame and the previous I-frame is fundamental in video-compression processes because a P-frame has pixel values very close to those of the previous I-frame. This suggests defining a frame containing the differences between a P-frame and the previous I-frame, called a Δ-frame, which has a low quantity of information and hence can be coded with a low compression rate. A P-frame is decoded via the previous I-frame and the Δ-frame. In the MPEG-4 method [2, 3], which adopts the JPEG technique [4] for coding/decoding frames, the I-frames, P-frames, and B-frames are arranged in a Group of Pictures (for short, GOP) sequence. A B-frame is reconstructed by using either the previous or the successive I-frame. Here the results of [5] are improved by using a technique based on F-transforms for coding B-frames. For convenience, we assume that the first frame of a video is an I-frame, and we assign an ID number to each frame of the video.
Then we say that the kth frame is a B-frame or a P-frame if it is “very similar” to the previous ith I-frame, in the sense that their similarity Sim(i,k), a parameter based on the Lukasiewicz t-norm (see formula (12)), is greater than a threshold value SimP [5]; otherwise, the kth frame is assumed to be a new I-frame, that is, the first frame of the successive GOP sequence.
We use two algorithms: the first determines the GOP sequences; the second determines whether each frame of a GOP sequence is a P-frame or a B-frame. The first frame of a GOP sequence is always an I-frame and the last frame is a P-frame. The function “analyze GOP sequence (ID1, ID2)” reported in Algorithm 1 describes the first process, where ID1 is the ID of the first I-frame and ID2 is the ID of the last P-frame in the GOP sequence. Algorithm 2 then determines whether the kth frame in the GOP sequence, where ID1<k<ID2, is a B-frame or a P-frame. We fix a similarity threshold SimB and, for each frame s after the kth frame, we build a mean frame, denoted by [Ms], whose normalized pixels are the means of the corresponding pixels of the ith frame (the previous I-frame or P-frame) and of the sth frame, obtaining a similarity value Sim(k,[Ms]). In the array element NP[k] we store the ID number of the last frame s−1 for which Sim(k,[Ms])≥SimB still holds. The variable i contains the ID number of the previous I-frame or P-frame and is initialized to ID1; the variable w points to the last frame in the GOP sequence and is set to ID2.
Algorithm 1 (analyze GOP sequence (ID1, ID2)).
Pseudo-code for determining a GOP sequence
(1) i = ID of the first I-frame // i is the ID of the first frame of the video
(2) w = number of frames // w is the ID of the last P-frame of the video
(3) k = i + 1
(4) IF k < w
(5) calculate the similarity Sim(i,k) between the kth frame and the ith frame
(6) IF Sim(i,k) ≥ SimP
  the kth frame is a B-frame or a P-frame and is inserted in the GOP sequence
  k = k + 1
  go to step (4)
(7) ELSE
  analyze GOP sequence (i, k−1)
  i = k
  go to step (3)
(8) END
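Algorithm 1 can be sketched in Python as follows; this is only an illustration of the scanning logic, not the authors' implementation, using a generic similarity function `sim` with values in [0,1] (as in formula (12)) and toy one-pixel "frames" — all names are illustrative:

```python
def analyze_gop(frames, sim, sim_p):
    """Split a frame sequence into GOP index lists (sketch of Algorithm 1).

    Each GOP starts with an I-frame; a frame opens a new GOP as soon as
    its similarity to the current I-frame drops below the threshold sim_p.
    """
    gops, start = [], 0
    for k in range(1, len(frames)):
        if sim(frames[start], frames[k]) < sim_p:
            # Frame k is not similar enough to the current I-frame:
            # close the current GOP (its last frame is a P-frame)
            # and open a new one with frame k as its I-frame.
            gops.append(list(range(start, k)))
            start = k
    gops.append(list(range(start, len(frames))))
    return gops

# Toy one-pixel "frames"; similarity is 1 minus the absolute difference.
frames = [0.00, 0.05, 0.10, 0.80, 0.85]
sim = lambda f, g: 1.0 - abs(f - g)
print(analyze_gop(frames, sim, sim_p=0.5))  # → [[0, 1, 2], [3, 4]]
```

The jump in pixel values between the third and fourth toy frame plays the role of a scene change, so a new GOP sequence is opened there.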
Algorithm 2.
Pseudo-code for determining the type of frames
(1) i = ID1 // i is the ID of the first frame of the GOP sequence
(2) w = ID2 // w is the ID of the last P-frame of the GOP sequence
(3) FOR each k in [i+1, w−1]
(4) NP[k] = k // initialize NP[k]
(5) s = k + 1
(6) create the [Ms]th frame as a new frame whose normalized pixels are obtained as the mean between the normalized pixels of the ith and sth frames
(7) calculate the similarity Sim(k,[Ms]) between the kth and the [Ms]th frames
  IF Sim(k,[Ms]) < SimB
    NP[k] = s − 1
  ELSE
    s = s + 1
    go to step (6)
(8) NEXT FOR
(9) NPMin = min(NP[k])
(10) the frames between the ith and the NPMin-th frames are labelled as B-frames
(11) the NPMin-th frame is labelled as a P-frame
(12) IF NPMin < w THEN
  i = NPMin
  go to step (3)
(13) END
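Algorithm 2 can likewise be sketched in Python; a minimal illustration (not the authors' implementation) where frames are lists of normalized pixels and `sim` is again a generic similarity in [0,1]:

```python
def classify_gop(frames, sim, sim_b, id1, id2):
    """Label the frames of one GOP (sketch of Algorithm 2).

    frames[id1] is the I-frame and frames[id2] the last P-frame of the GOP.
    Returns a dict mapping frame ID -> 'I', 'B', or 'P'.
    """
    labels = {id1: 'I'}
    i = id1
    while i < id2:
        np_ = {}                      # NP[k] of the pseudo-code
        for k in range(i + 1, id2):
            s = k + 1
            while s <= id2:
                # mean frame [Ms] built from the ith and sth frames
                ms = [(a + b) / 2.0 for a, b in zip(frames[i], frames[s])]
                if sim(frames[k], ms) < sim_b:
                    break
                s += 1
            np_[k] = s - 1            # last s still similar enough
        np_min = min(np_.values()) if np_ else id2
        for k in range(i + 1, np_min):
            labels[k] = 'B'           # frames strictly between i and NPMin
        labels[np_min] = 'P'
        i = np_min                    # restart from the new P-frame
    return labels

frames = [[0.0], [0.1], [0.2], [0.9]]
sim = lambda f, g: 1.0 - sum(abs(a - b) for a, b in zip(f, g)) / len(f)
print(classify_gop(frames, sim, sim_b=0.5, id1=0, id2=3))
# → {0: 'I', 1: 'B', 2: 'B', 3: 'P'}
```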
In our approach we determine a GOP sequence at each step. The frame after the last P-frame is the I-frame of the new GOP sequence. After determining the GOP sequences of the color video, we use the F-transforms [5, 7–10] for compressing the frames. The F-transform method has been developed in [5]. In this paper each frame is converted in the YUV space. Indeed, since the human eye perceives an image mostly in the Y band (brightness) with respect to the U and V bands (chrominance), we can use a stronger compression rate for coding the image in U and V bands with respect to that one used for coding the image in the Y band, without loss of information in the reconstructed image. In [5] the authors show that the quality of the reconstructed images is better than the one obtained using the F-transform method directly in the RGB space (see also [11, 12]). The proposed method is widely discussed in Section 4. In Sections 2 and 3 the theory of F-transforms and its application are recalled for image compression, respectively. In Section 5 the results are deduced on a large color videos dataset.
2. Fuzzy Transforms
We recall from [9] some essential definitions. Let n≥3 and x1,x2,…,xn be points (nodes) of [a,b] such that x1=a<x2<⋯<xn=b. The fuzzy sets A1,…,An:[a,b]→[0,1] form a fuzzy partition of [a,b] if
(1)Ai(xi)=1 for any i=1,2,…,n;
(2)Ai(x)=0 if x∉(xi-1,xi+1), where i=1,2,…,n and x0=x1=a, xn+1=xn=b;
(3)Ai(x) is a continuous function on [a,b];
(4)Ai(x) is strictly increasing on the interval [xi-1,xi] for i=2,…,n and is strictly decreasing on the interval [xi,xi+1] for i=1,…,n-1;
(5) for any x∈[a,b], ∑i=1nAi(x)=1.
We say that {A1,A2,…,An} constitute a symmetric fuzzy partition if the following hold:
(6) equidistance of the nodes, that is, xi=a+h·(i-1) for i=1,2,…,n, where h=(b-a)/(n-1);
(7)Ai(xi-x)=Ai(xi+x) for any x∈[0,h] and i=2,…,n-1;
(8)Ai+1(x)=Ai(x-h) for any x∈[xi,xi+1] and i=1,2,…,n-1.
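Conditions (1)–(8) are satisfied, for instance, by the raised-cosine partition used later in Section 3. A small Python check (all names illustrative) that such a family is indeed a Ruspini partition, summing to 1 at every point of [a,b]:

```python
import math

def cosine_partition(a, b, n):
    """Symmetric fuzzy partition A_1..A_n of [a, b] with raised-cosine shapes.

    Nodes are equidistant (condition (6)); each A_i equals 1 at its node x_i,
    vanishes outside (x_{i-1}, x_{i+1}), and the family sums to 1 (condition (5)).
    """
    h = (b - a) / (n - 1)
    nodes = [a + h * (i - 1) for i in range(1, n + 1)]

    def A(i, x):  # membership of the ith fuzzy set, i = 1..n
        xi = nodes[i - 1]
        if abs(x - xi) >= h:
            return 0.0
        return 0.5 * (math.cos(math.pi * (x - xi) / h) + 1.0)

    return nodes, A

nodes, A = cosine_partition(1.0, 9.0, 5)   # h = 2, nodes 1, 3, 5, 7, 9
assert A(3, nodes[2]) == 1.0               # condition (1): A_i(x_i) = 1
total = sum(A(i, 3.7) for i in range(1, 6))
print(round(total, 10))                    # → 1.0 (condition (5))
```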
Consider functions f defined on a finite set P={p1,…,pm}⊆[a,b], f:P→[0,1]; we suppose that P is sufficiently dense with respect to a fuzzy partition {A1,A2,…,An} of [a,b]; that is, m>n and for each i=1,…,n there exists an index j∈{1,…,m} such that Ai(pj)>0. Now let n,m≥3, let y1,y2,…,ym∈[c,d] be other m assigned nodes such that y1=c<⋯<ym=d, and let C1,…,Cm:[c,d]→[0,1] form another fuzzy partition of [c,d]. Let f:P×Q→[0,1] be a function defined on the finite set P×Q={p1,…,pN}×{q1,…,qM}⊆[a,b]×[c,d], with N>n and M>m, where P (resp., Q) is sufficiently dense with respect to the fuzzy partition {A1,A2,…,An} of [a,b] (resp., {C1,…,Cm} of [c,d]). Then the fuzzy matrix [Fkl], with Fkl∈[0,1], k=1,…,n, and l=1,…,m, is defined as the discrete F-transform of f with respect to {A1,A2,…,An} and {C1,…,Cm} if the following holds:
(1) F_{kl}=\dfrac{\sum_{j=1}^{M}\sum_{i=1}^{N}f(p_i,q_j)\,A_k(p_i)\,C_l(q_j)}{\sum_{j=1}^{M}\sum_{i=1}^{N}A_k(p_i)\,C_l(q_j)}.
We then define f^F_{nm}:P×Q→[0,1], the inverse F-transform of f with respect to {A1,A2,…,An} and {C1,…,Cm}, as
(2) f^{F}_{nm}(p_i,q_j)=\sum_{k=1}^{n}\sum_{l=1}^{m}F_{kl}\,A_k(p_i)\,C_l(q_j).
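A compact numerical illustration of (1) and (2) in pure Python, reusing a raised-cosine partition as in Section 3; the test function, node counts, and error bound below are chosen only for this hypothetical example:

```python
import math

def cos_basis(k, x, nodes, h):
    """kth raised-cosine basis function (k = 1..n) over equidistant nodes."""
    xk = nodes[k - 1]
    return 0.5 * (math.cos(math.pi * (x - xk) / h) + 1.0) if abs(x - xk) < h else 0.0

def f_transform(f, P, Q, n, m):
    """Discrete F-transform (1): an n x m matrix of weighted local means of f."""
    a, b, c, d = P[0], P[-1], Q[0], Q[-1]
    h, s = (b - a) / (n - 1), (d - c) / (m - 1)
    xs = [a + h * (k - 1) for k in range(1, n + 1)]
    ys = [c + s * (l - 1) for l in range(1, m + 1)]
    F = [[0.0] * m for _ in range(n)]
    for k in range(1, n + 1):
        for l in range(1, m + 1):
            num = den = 0.0
            for p in P:
                for q in Q:
                    w = cos_basis(k, p, xs, h) * cos_basis(l, q, ys, s)
                    num += f(p, q) * w
                    den += w
            F[k - 1][l - 1] = num / den
    return F, xs, ys, h, s

def inverse_f_transform(F, P, Q, xs, ys, h, s):
    """Inverse F-transform (2): reconstruct an approximation of f on P x Q."""
    n, m = len(F), len(F[0])
    return {(p, q): sum(F[k - 1][l - 1] * cos_basis(k, p, xs, h) * cos_basis(l, q, ys, s)
                        for k in range(1, n + 1) for l in range(1, m + 1))
            for p in P for q in Q}

f = lambda p, q: (p + q) / 20.0          # a smooth test function into [0, 1]
P = Q = list(range(1, 10))               # 9 x 9 grid, compressed to 3 x 3
F, xs, ys, h, s = f_transform(f, P, Q, 3, 3)
rec = inverse_f_transform(F, P, Q, xs, ys, h, s)
err = max(abs(f(p, q) - rec[(p, q)]) for p in P for q in Q)
print(err < 0.15)  # the reconstruction approximates f, as in Theorem 3 below
```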
The following theorem holds.
Theorem 3.
Let f:P×Q→[0,1] be a function assigned on P×Q={p1,…,pN}×{q1,…,qM}⊆[a,b]×[c,d]. Then for every ε>0, there exist two integers n(ε), m(ε) with n(ε)<N, m(ε)<M and some fuzzy partitions {A1,A2,…,An(ε)} of [a,b] and {C1,C2,…,Cm(ε)} of [c,d] for which P and Q are sufficiently dense with respect to these partitions, respectively, and such that the following inequality holds for every i=1,…,N, j=1,…,M:
(3) \left|f(p_i,q_j)-f^{F}_{n(\varepsilon)m(\varepsilon)}(p_i,q_j)\right|<\varepsilon.
3. The Coding/Decoding Process
Let R be an image of size N×M, considered as a fuzzy relation R:(i,j)∈{1,…,N}×{1,…,M}→[0,1]; that is, R(i,j)=P(i,j)/Lt, with P(i,j) being the value of the pixel (i,j), normalized with respect to the length Lt of the scale used. For simplicity, let pi=i, qj=j, a=c=1, b=N, and d=M. Let the fuzzy sets A1,…,An:[1,N]→[0,1] and C1,…,Cm:[1,M]→[0,1], with n<N and m<M, form fuzzy partitions of [1,N] and [1,M], respectively. Following [8], R is subdivided into submatrices RB of size N(RB)×M(RB), RB:(i,j)∈{1,…,N(RB)}×{1,…,M(RB)}→[0,1], called blocks, each coded to a matrix of size n(RB)×m(RB) (n(RB)<N(RB), m(RB)<M(RB)) via the discrete F-transform [FklB], computed for every (k,l)∈{1,…,n(RB)}×{1,…,m(RB)} as
(4) F^{B}_{kl}=\dfrac{\sum_{j=1}^{M(R_B)}\sum_{i=1}^{N(R_B)}R_B(i,j)\,A_k(i)\,C_l(j)}{\sum_{j=1}^{M(R_B)}\sum_{i=1}^{N(R_B)}A_k(i)\,C_l(j)},
and decode [FklB] via Rn(RB)m(RB)F:(i,j)∈{1,…,N(RB)}×{1,…,M(RB)}→[0,1] defined as
(5) R^{F}_{n(R_B)m(R_B)}(i,j)=\sum_{k=1}^{n(R_B)}\sum_{l=1}^{m(R_B)}F^{B}_{kl}\,A_k(i)\,C_l(j)
which approximates RB in the sense of Theorem 3; that is, there exist, for every ε>0, two integers n(RB,ε), m(RB,ε) such that the following holds for every (i,j)∈{1,…,N(RB)}×{1,…,M(RB)}:
(6) \left|R_B(i,j)-R^{F}_{n(R_B,\varepsilon)m(R_B,\varepsilon)}(i,j)\right|<\varepsilon.
Unfortunately, Theorem 3 does not suggest a method for finding such integers; hence we assign values to n(RB)=n(RB,ε) and m(RB)=m(RB,ε) so as to obtain compression rates given by
(7) \rho(R_B)=\dfrac{n(R_B)\cdot m(R_B)}{N(R_B)\cdot M(R_B)}
which are useful to code any original block RB. The recomposition of the blocks Rn(RB)m(RB)F gives the image RF whose PSNR with respect to the original image R is calculated via the following well-known formula:
(8) \mathrm{PSNR}(R,R^F)=20\log_{10}\dfrac{L_t}{\sqrt{\dfrac{1}{N\cdot M}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(R(i,j)-R^F(i,j)\right)^2}}.
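In code, (8) is the usual PSNR with the root mean squared error in the denominator; a minimal sketch (names illustrative):

```python
import math

def psnr(R, RF, Lt=1.0):
    """PSNR (8) between an image R and its reconstruction RF (nested lists)."""
    N, M = len(R), len(R[0])
    mse = sum((R[i][j] - RF[i][j]) ** 2
              for i in range(N) for j in range(M)) / (N * M)
    return 20.0 * math.log10(Lt / math.sqrt(mse))

R  = [[0.5, 0.5], [0.5, 0.5]]
RF = [[0.6, 0.4], [0.6, 0.4]]        # every pixel off by 0.1 -> RMSE = 0.1
print(round(psnr(R, RF), 6))         # → 20.0 (dB)
```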
In accordance with [8], in the proposed experiments the best results are deduced with the symmetric fuzzy partitions A1,…,An(RB):[1,N(RB)]→[0,1] and C1,…,Cm(RB):[1,M(RB)]→[0,1] defined as
(9) A_1(i)=\begin{cases}0.5\left(\cos\frac{\pi}{h}(i-1)+1\right)&\text{if }1\le i\le x_2,\\0&\text{otherwise,}\end{cases}\quad A_k(i)=\begin{cases}0.5\left(\cos\frac{\pi}{h}(i-x_k)+1\right)&\text{if }x_{k-1}\le i\le x_{k+1},\\0&\text{otherwise,}\end{cases}\quad A_{n(R_B)}(i)=\begin{cases}0.5\left(\cos\frac{\pi}{h}(i-x_{n(R_B)})+1\right)&\text{if }x_{n(R_B)-1}\le i\le N(R_B),\\0&\text{otherwise,}\end{cases}
where k=2,…,n(R_B)−1, h=(N(R_B)−1)/(n(R_B)−1), x_k=1+h·(k−1), and
(10) C_1(j)=\begin{cases}0.5\left(\cos\frac{\pi}{s}(j-1)+1\right)&\text{if }1\le j\le y_2,\\0&\text{otherwise,}\end{cases}\quad C_t(j)=\begin{cases}0.5\left(\cos\frac{\pi}{s}(j-y_t)+1\right)&\text{if }y_{t-1}\le j\le y_{t+1},\\0&\text{otherwise,}\end{cases}\quad C_{m(R_B)}(j)=\begin{cases}0.5\left(\cos\frac{\pi}{s}(j-y_{m(R_B)})+1\right)&\text{if }y_{m(R_B)-1}\le j\le M(R_B),\\0&\text{otherwise,}\end{cases}
where t=2,…,m(R_B)−1, s=(M(R_B)−1)/(m(R_B)−1), and y_t=1+s·(t−1).
4. Our Proposal
The proposed process includes the following steps:
(1) each color frame, seen as a fuzzy relation, is converted from the RGB space to the YUV space;
(2) a classification of the frames is made via the previous algorithms;
(3) the frames are compressed; the compression rate ρI=ρI(RB) of the I-frames is the mean of the three (possibly different) compression rates used in the three bands. That is, if any block RB of an I-frame has size (say) N_I^Y(RB)×M_I^Y(RB) in the Y band and is coded to a block of size (say) n_I^Y(RB)×m_I^Y(RB), the related compression rate is given by ρ_I^Y=ρ_I^Y(RB)=(n_I^Y(RB)·m_I^Y(RB))/(N_I^Y(RB)·M_I^Y(RB)), and the symbols ρ_I^U, ρ_I^V have analogous meanings. Of course, we have ρI=(ρ_I^Y+ρ_I^U+ρ_I^V)/3. A similar meaning is given to ρΔ=ρΔ(RB) (resp., ρR=ρR(RB)) for the Δ-frames (resp., R-frames).
A color image in the RGB space with pixels normalized in [0,1] is converted to YUV space via the formula [5]
(11) \begin{bmatrix}Y\\U\\V\end{bmatrix}=\begin{bmatrix}0.299&0.587&0.114\\-0.169&-0.332&0.500\\0.500&-0.419&-0.0813\end{bmatrix}\begin{bmatrix}R\\G\\B\end{bmatrix}+\begin{bmatrix}0\\0.5\\0.5\end{bmatrix}.
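The conversion (11) is applied pixelwise; a minimal sketch (function name and checks illustrative):

```python
def rgb_to_yuv(r, g, b):
    """Convert one normalized RGB pixel to YUV via the matrix in (11)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.332 * g + 0.500 * b + 0.5
    v = 0.500 * r - 0.419 * g - 0.0813 * b + 0.5
    return y, u, v

# A neutral gray keeps U and V at (about) the middle of the scale,
# and a white pixel has full brightness Y close to 1.
print(rgb_to_yuv(0.5, 0.5, 0.5))
print(rgb_to_yuv(1.0, 1.0, 1.0))
```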
Since no misunderstanding can arise, a frame is denoted by a capital letter instead of its ID number in a sequence of a video. In step (2), the similarity measure adopted in [5] is used for classifying the type of frame. It is based on the Lukasiewicz t-norm between two frames F and G, with F,G:(i,j)∈{1,2,…,N}×{1,2,…,M}→[0,1], defined as
(12) \mathrm{Sim}(F,G)=\dfrac{1}{N\cdot M}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(1-\max\{F(i,j),G(i,j)\}+\min\{F(i,j),G(i,j)\}\right).
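Since 1 − max{x,y} + min{x,y} = 1 − |x − y|, the measure (12) is simply one minus the mean absolute pixel difference; a minimal sketch (names illustrative):

```python
def lukasiewicz_sim(F, G):
    """Similarity (12) between two frames F, G given as nested lists in [0, 1]."""
    N, M = len(F), len(F[0])
    total = sum(1.0 - max(F[i][j], G[i][j]) + min(F[i][j], G[i][j])
                for i in range(N) for j in range(M))
    return total / (N * M)

F = [[0.0, 1.0], [0.5, 0.5]]
G = [[0.5, 1.0], [0.5, 0.5]]
print(lukasiewicz_sim(F, F))  # → 1.0 (identical frames)
print(lukasiewicz_sim(F, G))  # → 0.875
```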
In the μth band (μ∈{Y,U,V}) we use the symbol Simμ(F,G). The authors of [5] have shown that the Lukasiewicz t-norm provides the best results with respect to other t-norms such as the minimum and the algebraic product. For determining a GOP sequence in a single band, we verify whether the successive frame G is a B-frame or a P-frame, that is, whether it is “very similar” to the preceding I-frame F in the sense that Sim(F,G)≥SimP, with SimP∈[0,1] being a prefixed threshold value; otherwise, G is assumed to be a new I-frame. We determine a GOP sequence in an assigned band using (12) with the following process:
(1) we consider the first frame F as an I-frame;
(2) we compare F with the successive frame G;
(3) if Sim(F,G)≥SimP, the frame G is a B-frame or a P-frame and is enclosed in the GOP sequence; then we consider the successive frame as the new G and go to step (2); otherwise, G is a new I-frame, and the previous frame is a P-frame representing the last frame of the GOP sequence.
After determining the GOP sequence, we check whether each frame of the sequence is a B-frame or a P-frame by using the previous algorithms. In step (3) we finally compress the frames. In order to reduce the mean compression rate for a P-frame, in [5] and references therein the authors introduce a “difference” frame D, called Δ-frame, between a P-frame G and the previous I-frame F, by defining D:(i,j)∈{1,2,…,N}×{1,2,…,M}→[0,1] as
(13) D(i,j)=\dfrac{F(i,j)-G(i,j)+1}{2}.
The usage of the Δ-frame has the advantage of allowing a stronger compression rate for the P-frames with respect to the I-frames; indeed, a P-frame G has pixel values very close to those of the previous I-frame. Hence the Δ-frame D in (13) has a low quantity of information and can be coded with a low compression rate. Then, if F′ and D′ are the frames obtained after coding/decoding F and D, the frame G′ (reconstruction of the frame G), with D′, F′, G′:(i,j)∈{1,2,…,N}×{1,2,…,M}→[0,1], is deduced from the membership values of F′ and D′ via the following formula:
(14) G'(i,j)=\dfrac{\max\{0,\;F'(i,j)-2D'(i,j)+1\}}{\max\{1,\;F'(i,j)-2D'(i,j)+1\}}.
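A sketch of (13) and (14) in Python; with lossless coding (F′ = F, D′ = D) the P-frame is recovered exactly, and the max/max quotient clips the result to [0, 1] (names illustrative):

```python
def delta_frame(F, G):
    """Δ-frame (13): pixelwise (F - G + 1) / 2, which stays in [0, 1]."""
    return [[(f - g + 1.0) / 2.0 for f, g in zip(rf, rg)]
            for rf, rg in zip(F, G)]

def reconstruct_p(F1, D1):
    """P-frame reconstruction (14): G' = F' - 2D' + 1, clipped to [0, 1].

    max{0, x} / max{1, x} equals 0 for x < 0, x for 0 <= x <= 1, 1 for x > 1.
    """
    out = []
    for rf, rd in zip(F1, D1):
        row = []
        for f, d in zip(rf, rd):
            x = f - 2.0 * d + 1.0
            row.append(max(0.0, x) / max(1.0, x))
        out.append(row)
    return out

F = [[0.8, 0.3]]               # I-frame
G = [[0.6, 0.4]]               # P-frame
D = delta_frame(F, G)
rec = reconstruct_p(F, D)
print([[round(v, 10) for v in row] for row in D])    # → [[0.6, 0.45]]
print([[round(v, 10) for v in row] for row in rec])  # → [[0.6, 0.4]]
```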
Now we present a new schema for coding/decoding a B-frame B which is inserted in a GOP sequence between an I-frame F and a P-frame G. We consider the frame R, called R-frame, given by
(15) R(i,j)=\dfrac{\bigl(F(i,j)+G(i,j)\bigr)/2-B(i,j)+1}{2}
and we code it. Let R′ be the frame obtained after decoding R, with R′:(i,j)∈{1,2,…,N}×{1,2,…,M}→[0,1]. All the coding/decoding processes are realized via the F-transforms with the symmetric fuzzy partition given in Section 3. We reconstruct the B-frame, say B′, by combining the membership values of F′, G′, and R′ via the following formula:
(16) B'(i,j)=\dfrac{\max\left\{0,\;\bigl(F'(i,j)+G'(i,j)\bigr)/2-2R'(i,j)+1\right\}}{\max\left\{1,\;\bigl(F'(i,j)+G'(i,j)\bigr)/2-2R'(i,j)+1\right\}}.
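Analogously, (15) and (16) can be sketched as follows; with lossless coding of F, G, and R the B-frame is recovered exactly (illustrative sketch, not the authors' implementation):

```python
def r_frame(F, G, B):
    """R-frame (15): ((F + G)/2 - B + 1) / 2 for a B-frame B between F and G."""
    return [[((f + g) / 2.0 - b + 1.0) / 2.0
             for f, g, b in zip(rf, rg, rb)]
            for rf, rg, rb in zip(F, G, B)]

def reconstruct_b(F1, G1, R1):
    """B-frame reconstruction (16): (F' + G')/2 - 2R' + 1, clipped to [0, 1]."""
    out = []
    for rf, rg, rr in zip(F1, G1, R1):
        row = []
        for f, g, r in zip(rf, rg, rr):
            x = (f + g) / 2.0 - 2.0 * r + 1.0
            row.append(max(0.0, x) / max(1.0, x))
        out.append(row)
    return out

F = [[0.8, 0.2]]                      # I-frame
G = [[0.6, 0.4]]                      # P-frame
B = [[0.7, 0.3]]                      # B-frame between them
rec = reconstruct_b(F, G, r_frame(F, G, B))
print([[round(v, 10) for v in row] for row in rec])  # → [[0.7, 0.3]]
```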
We use formulas (14) and (16) for reconstructing the P-frames and the B-frames of the videos, respectively. In accordance with [5], we convert each image back to the RGB space by using the formula
(17) \begin{bmatrix}R\\G\\B\end{bmatrix}=\begin{bmatrix}1&0&1.4075\\1&-0.3455&-0.7169\\1&1.7790&0\end{bmatrix}\left(\begin{bmatrix}Y\\U\\V\end{bmatrix}-\begin{bmatrix}0\\0.5\\0.5\end{bmatrix}\right).
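As a consistency check between (11) and (17), the matrix in (17) only approximately inverts the one in (11), so the round trip RGB → YUV → RGB is accurate to well under 1%; a small sketch (names and tolerance illustrative):

```python
def rgb_to_yuv(r, g, b):
    """Forward conversion via the matrix in (11)."""
    return (0.299 * r + 0.587 * g + 0.114 * b,
            -0.169 * r - 0.332 * g + 0.500 * b + 0.5,
            0.500 * r - 0.419 * g - 0.0813 * b + 0.5)

def yuv_to_rgb(y, u, v):
    """Backward conversion via the matrix in (17)."""
    u, v = u - 0.5, v - 0.5
    return (y + 1.4075 * v,
            y - 0.3455 * u - 0.7169 * v,
            y + 1.7790 * u)

rgb = (0.2, 0.5, 0.7)
back = yuv_to_rgb(*rgb_to_yuv(*rgb))
print(max(abs(a - b) for a, b in zip(rgb, back)) < 0.01)  # round trip within 1%
```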
For simplicity of presentation, in our tests we adopt M(RB)=N(RB) and m(RB)=n(RB). In [5] a preprocessing phase is adopted for determining the threshold SimP, calculated with the following steps:
(1) the initial frame F is considered as an I-frame and is compressed in the μth band (μ∈{Y,U,V}) with compression rate ρIμ; each successive frame is treated as a P-frame G, and we archive the similarity value Simμ(F,G) calculated with formula (12); we compress the Δ-frame D in the μth band with a compression rate ρPμ (less than ρIμ) and, if D′ is the related decompressed frame, we derive the P-frame G′ via (14);
(2) each P-frame G is also coded directly via the F-transforms in the μth band with compression rate ρPμ; if G′′ is the corresponding decoded P-frame, we determine the difference diff(PSNR)=|PSNR(G′′,G)−PSNR(G′,G)|;
(3) the trend of diff(PSNR) is plotted with respect to the similarity Simμ(F,G) in each band of the image; as similarity threshold, we assume the value of Simμ(F,G) for which diff(PSNR) does not exceed a prefixed limit, here equal to 3 (cf. [5] for details);
(4) then the threshold SimP is given by
(18) \mathrm{Sim}_P=\max_{G\in\mathrm{GOP}}\left\{\max\left\{\mathrm{Sim}_\mu(F,G):\mu\in\{Y,U,V\}\right\}\right\}
with F being the first I-frame of the GOP sequence. In our tests, in addition we put SimB=SimP in the preprocessing phase.
5. The Results
For brevity of discussion, we show the results obtained for the color video “tennis2” [6]. We present all the results by assuming ρI≈0.262 for the I-frames, ρΔ≈0.027 for the Δ-frames, and ρR≈0.020 for the R-frames. Figures 1(a)–1(d) show the first frame of the video and the corresponding single-band images in the YUV space, respectively. Figure 2 contains the plots of diff(PSNR) with respect to the similarity values obtained in the Y, U, and V bands; imposing diff(PSNR)≤3, we choose (as average over the three bands) the threshold SimP=0.948, so that a frame G is classified as a B-frame or a P-frame if SimY(F,G)>0.948. As examples, we show some Δ-frames and R-frames in each band.
(a) Frame 1 of “tennis2” [6], (b) Frame 1 in Y band, (c) Frame 1 in U band, and (d) Frame 1 in V band.
Diff(PSNR) with the similarity in Y, U, and V bands.
(i) Y Band. The first P-frame is given by the fourth frame. Figure 3(a) contains the Δ-frame obtained by using (13) from the fourth frame and the first frame (an I-frame). The second and the third frames are B-frames. Figure 3(b) (resp., Figure 3(c)) shows the R-frame obtained by using (15) from the second (resp., third) frame, the first frame (an I-frame), and the fourth frame (a P-frame).
(a) Δ-frame from Frame 4 in Y band, (b) R-frame from Frame 2 in Y band, and (c) R-frame from Frame 3 in Y band.
(ii) U Band. The first P-frame is given by the sixth frame. Figure 4(a) contains the Δ-frame obtained by using (13) from the sixth frame and the first frame (an I-frame). The frames 2, 3, and 4 are B-frames. Figures 4(b)–4(d) show the R-frames obtained by using (15) from the first frame (an I-frame), the B-frames 2, 3, and 4, and the sixth frame (a P-frame), respectively.
(a) Δ-frame from Frame 6 in U band, (b) R-frame from Frame 2 in U band, (c) R-frame from Frame 3 in U band, and (d) R-frame from Frame 4 in U band.
(iii) V Band. The first P-frame is given by the fifth frame. Figure 5(a) contains the Δ-frame obtained by using (13) from the fifth frame and the first frame (an I-frame). The frames 2, 3, and 4 are B-frames. Figures 5(b)–5(d) show the R-frames obtained by using (15) from the first frame (an I-frame), the B-frames 2, 3, and 4, and the fifth frame (a P-frame), respectively.
(a) Δ-frame from Frame 5 in V band, (b) R-frame from Frame 2 in V band, (c) R-frame from Frame 3 in V band, and (d) R-frame from Frame 4 in V band.
All the results obtained for the video “tennis2” are summarized in Table 1.
Results for “tennis2” [6] in the proposed method.
Parameters                    Y band    U band    V band
Number of I-frames            15        7         8
Number of P-frames            31        23        23
Number of B-frames            54        70        69
Mean compression rate ρ(B)    0.1128    0.0236    0.0245
Mean PSNR for I-frames        27.011    25.545    25.812
Mean PSNR for P-frames        24.816    23.710    23.815
Mean PSNR for B-frames        24.734    22.819    23.026
Figures 6(a)–6(c) contain Frame 2 decoded with the proposed method, classical F-transforms, and MPEG-4, respectively.
(a) Frame 2 in the proposed method, (b) Frame 2 in F-transforms, and (c) Frame 2 in MPEG-4.
In Table 2 we report the final PSNR index in the three methods.
Comparison with other methods for “tennis2” [6].
Parameters               Proposed method    F-transforms    MPEG-4
Mean compression rate    0.053              0.058           0.055
Mean PSNR                23.915             22.801          23.431
6. Conclusions
We have presented a new method for coding/decoding color videos in which the frames are classified into I-frames, P-frames, and B-frames, using similarity measures for determining the GOP sequences and the types of frame. For similar mean compression rates, our method is fully comparable with the classical F-transform method and with MPEG-4.
Acknowledgments
The authors thank the referees and the editor whose suggestions have greatly improved the contents of this paper.
References
[1] I. E. G. Richardson, The H.264 Advanced Video Compression Standard, John Wiley & Sons, Hoboken, NJ, USA, 2010.
[2] F. C. N. Pereira and T. Ebrahimi, The MPEG-4 Book, Prentice Hall Professional, Upper Saddle River, NJ, USA, 2002.
[3] T. Sikora, "MPEG digital video coding standards," in Digital Consumer Electronics Handbook, McGraw-Hill, New York, NY, USA, 1995.
[4] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, New York, NY, USA, 1993.
[5] F. Di Martino, V. Loia, and S. Sessa, "Fuzzy transforms for compression and decompression of color videos," Information Sciences, vol. 180, no. 20, pp. 3914–3931, 2010.
[6] SAMPL video database, The Ohio State University, http://sampl.eng.ohio-state.edu/~sampl/database.htm.
[7] F. Di Martino, V. Loia, I. Perfilieva, and S. Sessa, "An image coding/decoding method based on direct and inverse fuzzy transforms," International Journal of Approximate Reasoning, vol. 48, no. 1, pp. 110–131, 2008.
[8] F. Di Martino and S. Sessa, "Compression and decompression of images with discrete fuzzy transforms," Information Sciences, vol. 177, no. 11, pp. 2349–2362, 2007.
[9] I. Perfilieva, "Fuzzy transforms: theory and applications," Fuzzy Sets and Systems, vol. 157, no. 8, pp. 993–1023, 2006.
[10] I. Perfilieva and B. De Baets, "Fuzzy transforms of monotone functions with application to image compression," Information Sciences, vol. 180, no. 17, pp. 3304–3315, 2010.
[11] H. Nobuhara, W. Pedrycz, and K. Hirota, "Relational image compression: optimizations through the design of fuzzy coders and YUV color space," Soft Computing, vol. 9, no. 6, pp. 471–479, 2005.
[12] H. Nobuhara, W. Pedrycz, S. Sessa, and K. Hirota, "A motion compression/reconstruction method based on max t-norm composite fuzzy relational equations," Information Sciences, vol. 176, no. 17, pp. 2526–2552, 2006.