Efficient Enhancement for Spatial Scalable Video Coding Transmission



Introduction
With the current boom in network technology, multimedia communication has become a popular subject, especially in information technology circles. The rapid evolution of communication systems has been marked by the emergence of diverse multimedia innovations such as Internet Protocol television, digital television, video telephony, video on demand, video conferencing, and web conferencing [1]. This has led to heterogeneous demands on video sequences with respect to frame size, frame rate, and bit rate. Video sequences therefore need to be encoded according to several criteria to meet the requirements of the diverse systems, as shown in Figure 1 [2].
Scalable Video Coding (SVC) was recently standardized by the Joint Video Team, a group of video coding experts from the ITU-T study group and the ISO/IEC Moving Picture Experts Group (MPEG) [3]. The multiple representations offered by SVC rely on the reconstruction of lower-resolution or lower-quality signals from partial bit streams, as shown in Figure 2 [4,5].
SVC enables the efficient incorporation of three types of scalability, namely, spatial, quality, and temporal scalability. Spatial and quality scalability can be realized by a layered approach, wherein one base layer (BL) is used to encode the lowest temporal, spatial, and quality representations of the video stream, and one or more enhancement layers (ELs) are used to encode additional information. Using the base layer as a starting point, temporal versions of the video or versions with higher qualities and resolutions can be reconstructed during the decoding process [6]. The block size of the SVC system can be varied to match the motion estimation that is used to reduce temporal redundancy between frames. It supports seven block sizes for interprediction (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, and 4 × 4), in addition to the skip mode, and two intra modes (4 × 4 and 16 × 16) [7,8]. To ensure efficient video coding, additional processes are required, including interlayer prediction. The main objective of interlayer prediction is the reuse of the motion and texture information of the BL at the EL. In H.264/AVC, for each EL, as for the BL, an exhaustive search and comparison of the variable block-size partitions is required to minimize the rate-distortion (RD) cost [9].
Interlayer prediction can be done using any of three methods, namely, interlayer motion prediction, interlayer residue prediction, and interlayer intraprediction from a lower layer. Although all three methods improve the coding efficiency, they increase the computational complexity of video encoding, making it difficult to realize an SVC encoder in a real application [10].
Several fast mode decision algorithms have been proposed to reduce the computational complexity. Shen et al. [4] suggested two techniques for mode decision, namely, skip mode decision and adaptive early termination, which utilize coding information on both spatially neighboring macroblocks (MBs) within the same frame and neighboring MBs in the BL. Kim et al. [11] described an algorithm based on a 16 × 16 coded block pattern (CBP) mode in the current frame and the best CBP mode in the selected reference frame. Li et al. [12] developed an algorithm based on the distribution relationship between the BL and EL, which enabled a reduction of the number of candidate modes. Kim et al. [13] explained that the MB in the EL could be predicted using the modes of a colocated MB and its neighboring MBs in the BL. Wang et al. [14] proposed an algorithm that terminated the intermode decision early in the EL by estimating the RD cost from the MB of the BL and the EL, from temporal and spatial perspectives. Kim et al. [15] suggested a fast mode search for combined scalability based on the correlation between the search directions for bidirectional motion prediction of neighboring MBs and the current MB at the encoder. Dawei et al. [16] introduced an algorithm based on early termination of the modes used in the colocated reference macroblock of all the layers. This was done by calculating the mean peak signal-to-noise ratio (PSNR) for all the frames at the temporal level for a specific mode. The present paper introduces four algorithms based on the sending of partial EL data, thus reducing the number of sending bits. They also shorten the transmission time (TT) across the network and reduce the amount of required EL encoding, thereby also shortening the encoding time (ET).
The rest of this paper is organized as follows. Section 2 presents an overview of the different types of scalabilities and their underlying concepts. The proposed fast mode decision algorithms are introduced in Section 3, while Section 4 describes the experiments performed using the proposed algorithms and presents and discusses the results. Section 5 concludes the paper.

Types of Scalabilities.
There are three types of scalabilities in video coding, namely, spatial scalability, quality scalability, and temporal scalability. Figure 3 shows the differences among the three types. Spatial scalability involves the coding of a video at multiple spatial resolutions. As shown in Figure 4, the data decoded at lower resolutions can be used to predict the data of higher resolutions to reduce the bit rate. Quality scalability is considered a special case of spatial scalability because the generated stream can be used to predict and decode the video at different qualities, as shown in Figure 5 [17]. In temporal scalability coding, by contrast, structures containing bidirectional (B) and predictive (P) pictures in the BL are decoded as B and P frames, respectively. The EL frames are predicted using the lower temporal layer frames as reference frames. The frames in the BL and EL are used to build a group of pictures (GOP), as shown in Figure 6 [18].

Mathematical Problems in Engineering

Rate-Distortion Optimization (RDO).
RDO targets the achievement of the given bit rate with minimum distortion [19,20]. The RDO formula is as follows:

$$RD = D + \lambda R,$$

where RD is the rate-distortion cost; D is the distortion between the current MB and the encoded MB, measured by the sum of absolute differences (SAD) or the sum of squared differences (SSD); and R is the number of bits required to encode the current MB based on the selected mode. The Lagrange multiplier $\lambda$ is used to achieve the minimum RD through a trade-off between R and D.
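The trade-off controlled by the Lagrange multiplier can be illustrated with a minimal sketch. It assumes the usual Lagrangian form of the cost (distortion plus the multiplier times the rate); the function names and the distortion and bit counts below are hypothetical values chosen for illustration, not measurements from this study.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost: distortion + lam * rate (assumed form)."""
    return distortion + lam * rate_bits


def best_mode(candidates, lam):
    """Return the (mode, cost) pair with the smallest RD cost.

    `candidates` maps a mode name to a (distortion, rate_bits) tuple.
    """
    costs = {mode: rd_cost(d, r, lam) for mode, (d, r) in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# Hypothetical SAD values and bit counts for one macroblock (illustrative only).
candidates = {
    "skip": (900.0, 1),
    "16x16": (400.0, 40),
    "8x8": (250.0, 120),
}

# A small multiplier favors low distortion; a large one favors low rate.
mode_low_lambda, _ = best_mode(candidates, lam=1.0)
mode_high_lambda, _ = best_mode(candidates, lam=20.0)
```

With these made-up numbers, the small multiplier selects the finely partitioned mode and the large multiplier selects skip, which is the behavior the trade-off between R and D describes.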

Zero-Mean Normalized Cross Correlation (ZNCC).
The ZNCC video analysis method was used in this study to detect the motion type of the video [21], whether fast, medium, or slow. It was one of the important factors considered in the trade-off among the benefits of the different proposed methods. The ZNCC equation is as follows:

$$\mathrm{ZNCC}(F_1, F_2) = \frac{\sum_i (h_{1,i} - \bar{h}_1)(h_{2,i} - \bar{h}_2)}{\sqrt{\sum_i (h_{1,i} - \bar{h}_1)^2 \, \sum_i (h_{2,i} - \bar{h}_2)^2}},$$

where $F_1$ and $F_2$, respectively, denote video frames obtained at times 1 and 2, $h_1$ and $h_2$ denote the corresponding frame histograms, $h_i$ is the $i$th bin of the color histogram $h$, and $\bar{h}$ is the mean value of all the entries of $h$. The ZNCC equation produces a real value between −1 and 1, with −1 indicating no similarity whatsoever between the histograms and +1 indicating that they are identical [22].
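As a rough illustration of how such a score might be computed, the sketch below evaluates the standard zero-mean normalized cross correlation of two frame histograms given as plain lists. The `motion_type` helper and its two thresholds are purely hypothetical; the text does not state which ZNCC values separate slow, medium, and fast motion.

```python
import math


def zncc(h1, h2):
    """Zero-mean normalized cross correlation of two equal-length histograms."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("histograms must be non-empty and of equal length")
    m1 = sum(h1) / len(h1)
    m2 = sum(h2) / len(h2)
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(
        sum((a - m1) ** 2 for a in h1) * sum((b - m2) ** 2 for b in h2)
    )
    if den == 0.0:
        # A flat histogram has zero variance, so the correlation is undefined.
        raise ValueError("ZNCC is undefined for a constant histogram")
    return num / den


def motion_type(score, slow_threshold=0.9, fast_threshold=0.5):
    """Map a ZNCC score to a motion label (thresholds are assumptions)."""
    if score >= slow_threshold:
        return "slow"
    if score >= fast_threshold:
        return "medium"
    return "fast"
```

Identical histograms give a score of 1, so consecutive frames that barely change would be labeled slow motion under these assumed thresholds.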

Phase-Based Frame Interpolation for Video.
Frame interpolation is one of the main concepts used in the development of the algorithms proposed in this work.The two most popular methods employed in frame interpolation algorithms are the Lagrangian and Eulerian methods.Lagrangian methods are based on motion models [14,23], while Eulerian methods are based on color change per pixel over time [24,25].The interpolation algorithms proposed in this paper may be considered to utilize an extension of Eulerian methods [26].
Considering input frames F1, F2, and Fout and denoting the steerable pyramid decompositions by P1 and P2, the steps of the proposed interpolation algorithms are as follows: as the following equation: (3) For all  =  − 1: (5) Calculate   ← interpolate (  , φdiff , ) as the following equation:

The Mode-Distribution Correlation between Macroblocks in BL and EL.
SVC utilizes a BL and one or more ELs. There is a relationship between the frames in the different layers. The layers are all identical with the exception of certain parameters, which depend on the type of SVC. This similarity can be used to reduce the number of macroblock modes that must be tested. In the present study, only a limited number of macroblock modes were selected for testing for intra- and interprediction [29], resulting in reduced encoding time. In the case of spatial SVC, the proposed approach assigns each macroblock in the EL the same mode as the corresponding macroblock in the BL, which is selected by exhaustive search [30].
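The saving can be pictured with a small sketch: when the correlation is used, the EL macroblock tests only the mode already chosen for the colocated BL macroblock rather than every interprediction mode. The function name and the flat mode list are assumptions made for illustration; the actual encoder also handles intra modes and interlayer prediction flags.

```python
# Interprediction modes listed in the Introduction (skip plus seven block sizes).
ALL_INTER_MODES = ["skip", "16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]


def el_candidate_modes(bl_mode, use_correlation=True):
    """Modes to evaluate for an EL macroblock.

    With the mode-distribution correlation, only the mode of the colocated
    BL macroblock is tested; otherwise every mode is searched exhaustively.
    """
    if bl_mode not in ALL_INTER_MODES:
        raise ValueError(f"unknown mode: {bl_mode}")
    if use_correlation:
        return [bl_mode]
    return list(ALL_INTER_MODES)


# Number of RD evaluations per EL macroblock in each case.
evaluations_fast = len(el_candidate_modes("16x8"))
evaluations_full = len(el_candidate_modes("16x8", use_correlation=False))
```

In this simplified picture, one RD evaluation replaces eight per EL macroblock, which is where the encoding-time reduction comes from.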

The Proposed Fast Mode Decision Algorithms
This paper proposes four algorithms for full SVC and interlayer residual SVC spatial scalability. The goal of the proposed algorithms is the reduction of the processing time (encoding time and transmission time). Two concepts are employed for this purpose. The main concept involves sending only a part of the EL frame data, with the missing frame data generated at the decoder by interpolation using the information on the surrounding frames, as mentioned in Section 2.4. The second concept is based on the mode-distribution correlation between the BL and EL, which is affected by the spatial scalability, as mentioned in Section 2.5. Using these two concepts, the four proposed algorithms enable significant shortening of both the TT and ET. The first two steps of the four algorithms are the same. In the first step, the BL frames are encoded through an exhaustive search. In the second step, upsampling is used to make the BL frames the same size as the EL frames. The subsequent steps differ among the algorithms.

Proposed Algorithms for Improving Interlayer Residual Spatial SVC.

Interlayer Interpolation (ILIP) Algorithm.
Outlined in Figure 7, this algorithm utilizes the interlayer residual concept for spatial SVC. To encode the EL frames, only the odd frames of the interlayer residuals are transmitted after being encoded by exhaustive search to achieve a high quality. This is because the transmitted frames depend on RDO, as mentioned in Section 2.2. At the decoder, the odd interlayer residual frames are decoded, and the even frames are derived by interpolation between the decoded odd frames. To obtain the full frames for the EL, all the interlayer residual frames are added to the decoded BL frames. This sequence reduces the number of sending bits, resulting in high performance with respect to the TT and ET. The results of this algorithm are described in Section 4. If shortening the TT is the most important reason for using SVC, the ILIP algorithm is preferable.
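The decoder-side flow just described can be sketched as follows. This is a simplified illustration, not the implementation used in the study: frames are flat lists of pixel values, encoding and decoding are omitted, and a per-pixel average stands in for the phase-based interpolation of Section 2.4. All function names are hypothetical.

```python
def split_odd(frames):
    """Keep only the odd-numbered frames (1-based), as sent by the encoder."""
    return frames[0::2]


def midpoint(a, b):
    """Stand-in interpolator (assumption): per-pixel average of two frames."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def rebuild_residuals(odd_frames, total):
    """Rebuild `total` residual frames from the decoded odd frames."""
    out = []
    for n in range(1, total + 1):
        k = (n - 1) // 2  # index of the odd frame at or just before frame n
        if n % 2 == 1:
            out.append(list(odd_frames[k]))
        elif k + 1 < len(odd_frames):
            out.append(midpoint(odd_frames[k], odd_frames[k + 1]))
        else:
            # Trailing even frame with no later odd frame: repeat (assumption).
            out.append(list(odd_frames[k]))
    return out


def reconstruct_el(bl_upsampled, residuals):
    """Add each interlayer residual frame to the matching upsampled BL frame."""
    return [
        [b + r for b, r in zip(bl_frame, res_frame)]
        for bl_frame, res_frame in zip(bl_upsampled, residuals)
    ]


# Five two-pixel residual frames; only frames 1, 3, and 5 are transmitted.
residuals = [[0, 0], [9, 9], [4, 8], [9, 9], [8, 4]]
sent = split_odd(residuals)
rebuilt = rebuild_residuals(sent, total=5)
el_frames = reconstruct_el([[10, 10]] * 5, rebuilt)
```

Only three of the five residual frames cross the network in this toy example, which is the source of the reduction in sending bits; the even frames at the decoder are approximations whose accuracy depends on the interpolator.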

Interlayer Base Mode (ILBM) Algorithm.
This algorithm shortens the ET more than the TT. The main difference between this algorithm and the previous one is the use of the mode-distribution correlation between the BL and the EL, instead of an exhaustive search, to encode the odd frames of the interlayer residual. The use of the mode-distribution correlation saves much ET with a negligible increase in the number of sending bits compared to the ILIP algorithm. Compared to the Joint Scalable Video Model (JSVM), the ILBM algorithm still significantly reduces the number of sending bits. The flow chart of this algorithm is shown in Figure 8.

When using the ILIP and ILBM algorithms, the video sequence should be encoded with a quality level lower than a certain threshold of quality (THQ). The application of the interlayer residual concept to SVC thus imposes a quality limitation. If the video sequence must be encoded with a quality higher than the THQ, the SVC full-search mode should be employed. In that case, for a given video speed, depending on whether the TT or the ET has the higher shortening priority, the two algorithms to consider are the full interpolation (FIP) and full-base mode (FBM) algorithms described in Section 3.2.

Proposed Algorithms for Improving Full SVC.

Full Interpolation (FIP) Algorithm.
In this algorithm (Figure 9), the odd frames in the EL are encoded by exhaustive search. The even frames are obtained by interpolation. The interpolated frames are then subtracted from the targeted even frames using the SAD formula in (6). The difference is passed through a low-pass filter (7), and the output (Delta Even E) is encoded as a P frame. The foregoing procedure is also applied to the corresponding even frames in the BL, and the output (Delta Even B) is used as the reference for Delta Even E so that the latter is encoded as a P frame with high quality.

$$\mathrm{SAD} = \sum_{i}\sum_{j} \left| c_{ij} - p_{ij} \right|, \quad (6)$$

where $c_{ij}$ is a pixel of the current even frame, and $p_{ij}$ is the corresponding pixel of the interpolation frame.
All the data sent from the encoder are decoded at the decoder. The full odd frames for the EL are obtained by the previous step, as well as parts of all the even frames. The missing parts of the even frames are obtained by interpolation between every two successive decoded odd frames and are then added to the decoded parts.

Full-Base Mode (FBM) Algorithm.
As shown in Figure 10, this algorithm is used to improve full SVC when the application requires more shortening of the ET than of the TT. It follows the same steps as the FIP algorithm but replaces the exhaustive search used to encode the macroblocks with the mode-distribution correlation.

Comparison of the Four Proposed Algorithms.
As illustrated in Figure 11, three pivotal parameters are considered in choosing among the four proposed algorithms for a specific purpose. The required level of quality is the core parameter. If the quality required is higher than the THQ, a full-search SVC should be used, and the choice would be between the FIP and FBM algorithms. However, if the required quality is lower than the THQ, the interlayer residual concept should be applied, in which case the choice would be between the ILIP and ILBM algorithms.
All the proposed algorithms shorten the TT and ET, but to varying extents. Hence, the second parameter considered in choosing among the algorithms is the percentage reduction of the TT and ET. The third parameter is the motion type of the video (slow, medium, or fast), which is determined by zero-mean normalized cross correlation (ZNCC), as discussed in Section 2.3. When using full search, if the video motion is slow, the FBM algorithm should be directly applied; otherwise, a trade-off between the TT and ET should be used to choose between the FBM and FIP algorithms.
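The selection rule can be summarized in a short sketch. The function and argument names are hypothetical, the quality values are in arbitrary units, and treating a quality exactly equal to the THQ as the interlayer residual case is an assumption, since the text only covers "higher" and "lower". The sketch follows the rule as stated in this section and does not model any additional restriction on fast-motion content.

```python
def choose_algorithm(required_quality, thq, priority, motion="medium"):
    """Pick one of the four proposed algorithms.

    priority: "TT" to favor transmission-time saving, "ET" to favor
              encoding-time saving.
    motion:   "slow", "medium", or "fast" (for example, from a ZNCC analysis).
    """
    if priority not in ("TT", "ET"):
        raise ValueError("priority must be 'TT' or 'ET'")
    if motion not in ("slow", "medium", "fast"):
        raise ValueError("motion must be 'slow', 'medium', or 'fast'")

    if required_quality > thq:
        # Full-search SVC: choose between FIP and FBM.
        if motion == "slow":
            return "FBM"
        return "FIP" if priority == "TT" else "FBM"

    # Interlayer residual SVC: choose between ILIP and ILBM.
    return "ILIP" if priority == "TT" else "ILBM"
```

For example, `choose_algorithm(30.0, 35.0, "TT")` returns `"ILIP"`, matching the statement that ILIP is preferable when shortening the TT matters most and the required quality is below the THQ.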

Experiment and Results
The test platform was an Intel(R) machine with a 3.40 GHz CPU and 8 GB RAM running the Windows XP Professional operating system. The experimental results of the proposed algorithms were variously compared with those of the JSVM reference software, the Tae algorithm [11], and the Seon-2 algorithm [13]. All the results were expressed as percentages relative to those of the reference software. The performances of the proposed algorithms were evaluated using five standard test sequences, namely, "Foreman" (slow motion), "City" and "Bus" (medium motion), and "Soccer" and "Football" (fast motion). Four parameters were considered in the evaluation, namely, saving in number of sending bits (ΔNSB), saving in ET (ΔET), saving in TT (ΔTT), and degradation of peak signal-to-noise ratio (Y-PSNR) (see Table 1). The four parameters are defined such that positive values of ΔET, ΔTT, and ΔNSB indicate reductions of the encoding time, transmission time, and number of sending bits, respectively, while a positive value of Y-PSNR indicates an increase in quality.
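One common way to compute such relative measures is sketched below. The exact formulas are an assumption chosen to match the sign convention stated above (positive saving means the proposed algorithm uses less than the reference; positive PSNR change means higher quality); the function names are hypothetical.

```python
def percent_saving(reference, proposed):
    """Saving relative to the reference, in percent (assumed definition).

    Applicable to ET, TT, and NSB alike; positive means a reduction.
    """
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return (reference - proposed) / reference * 100.0


def psnr_change(reference_db, proposed_db):
    """Y-PSNR change in dB; positive means the proposed method is better."""
    return proposed_db - reference_db
```

For instance, an encoding time of 50 s against a reference of 200 s corresponds to a saving of 75%, and a PSNR of 35.5 dB against a reference of 36.0 dB corresponds to a change of −0.5 dB (both numbers are illustrative, not results from this study).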
Tables 2-7 present the results of the proposed algorithms for spatial scalability using different video resolutions, namely, the Sub Quarter Common Intermediate Format (SQCIF) and Quarter Common Intermediate Format (QCIF) in the BL, and the QCIF and Common Intermediate Format (CIF) in the EL. All the results are presented as percentages relative to those of the JSVM. The ILIP and ILBM algorithms are also compared with the Seon-2 algorithm, and the FIP and FBM algorithms with the Tae algorithm.
The results presented in Tables 2-4 are for the ILIP and ILBM algorithms using the interlayer residual concept of SVC. The "Foreman," "Bus," and "Football" sequences were considered. Tables 5-7 present the results for the FIP and FBM algorithms using the full mode concept of SVC. In this case, the "Foreman," "City," and "Soccer" sequences were considered.

Results of Proposed ILBM and ILIP Algorithms for Interlayer Residual Spatial SVC.
Table 2 presents the results of the "Foreman" sequence test for an SQCIF BL resolution and QCIF EL resolution. The ILIP algorithm significantly reduced the NSB, and hence the TT, compared to the JSVM. It specifically decreased the TT, ET, and PSNR by 64.30%, 69.27%, and 0.001 dB (negligible), respectively. The corresponding decreases for the ILBM algorithm were 53.44%, 83.61%, and 0.0023 dB, respectively. The ILBM algorithm produced the highest saving in ET. For a QCIF BL resolution and CIF EL resolution, compared to the JSVM, the ILIP algorithm produced TT and ET savings of 63.25% and 67.68%, respectively, while decreasing the video quality by 0.0013 dB. Compared to the Seon-2 algorithm, the ILIP algorithm shortened the TT and ET by 63.316% and 15.39%, respectively, while the video quality was increased by 0.011 dB. In the case of the ILBM algorithm, compared to the JSVM, the TT and ET were shortened by 52.81% and 83.07%, respectively, while the video quality was decreased by 0.0031 dB. Compared to the Seon-2 algorithm, the TT and ET were shortened by 52.876% and 12.55%, respectively, while the PSNR was increased by 0.005 dB.
The results for the "Bus" sequence are presented in Table 3. For an SQCIF BL resolution and QCIF EL resolution, the ILIP algorithm reduced the NSB, and hence the TT, compared to the JSVM. It specifically decreased the TT, ET, and PSNR by 57.89%, 60.40%, and 0.001 dB (negligible), respectively. The corresponding decreases for the ILBM algorithm were 51.6%, 78.75%, and 0.003 dB, respectively.
For a QCIF BL resolution and CIF EL resolution, compared to the JSVM, the ILIP algorithm shortened the TT and ET by 46.97% and 57.93%, respectively, while decreasing the video quality by 0.001 dB. Further, compared to the Seon-2 algorithm, the ILIP algorithm shortened the TT and ET by 47.78% and 8.1%, respectively, while increasing the video quality by 0.019 dB. Relative to the JSVM, the ILBM algorithm shortened the TT and ET by 51.6% and 78.75%, respectively, while decreasing the video quality by 0.003 dB. Compared to the Seon-2 algorithm, the ILBM algorithm shortened the TT and ET by 45.907% and 8.71%, respectively, while increasing the PSNR by 0.136 dB.
The results for the "Football" sequence tests are presented in Table 4. For an SQCIF BL resolution and QCIF EL resolution, the ILIP algorithm reduced the NSB, and hence the TT, compared to the JSVM. It specifically decreased the TT, ET, and PSNR by 45.42%, 51.36%, and 0.003 dB (negligible), respectively. The corresponding decreases for the ILBM algorithm were 40.75%, 74.01%, and 0.004 dB (negligible), respectively.
In the case of a QCIF BL resolution and CIF EL resolution, compared to the JSVM, the ILIP algorithm shortened the TT and ET by 44.74% and 50.41%, respectively, while decreasing the video quality by 0.004 dB. Compared to the Seon-2 algorithm, the ILIP algorithm shortened the TT and ET by 44.79% and 9.8%, respectively, while increasing the video quality by 0.037 dB. Compared to the JSVM, the ILBM algorithm shortened the TT and ET by 40.24% and 72.32%, respectively, while decreasing the video quality by 0.041 dB. Compared to the Seon-2 algorithm, the TT and ET were shortened by 40.47% and 31.8%, respectively, while the PSNR was increased by 0.037 dB.
From the above results for the proposed ILIP and ILBM algorithms, it can be seen that, for a slow-motion video, represented by the Foreman sequence, the ILIP algorithm is very efficient at reducing the NSB compared to the JSVM, implying that it shortens the TT. It also shortens the ET while negligibly decreasing the PSNR. The ILBM algorithm also reduces the NSB, shortens the TT, and produces the greatest decrease of the ET while negligibly decreasing the PSNR. The results of the ILIP and ILBM algorithms for a medium-motion video, represented by the Bus sequence, are similar to those for the Foreman sequence. However, the specific percentage changes produced by the two algorithms compared to the JSVM vary. The same applies to the comparison with the Seon-2 algorithm.
For a fast-motion video, represented by the Football sequence, the performances of the ILIP and ILBM algorithms relative to the JSVM are similar to those for the Foreman and Bus sequences, although the percentage changes differ. However, compared to the Seon-2 algorithm, the ILIP and ILBM algorithms are superior with respect to all the test parameters. Overall, the proposed ILIP and ILBM algorithms afford good video quality when very small details are not important, as well as the shortest TTs and ETs for broadcast applications.

Results of Proposed FIP and FBM Algorithms for Full Spatial SVC.
The results for the Foreman sequence tests are presented in Table 5. For an SQCIF BL resolution and QCIF EL resolution, the FIP algorithm reduced the NSB, and hence the TT, compared to the JSVM. It specifically decreased the TT, ET, and PSNR by 55.33%, 52.83%, and 0.010 dB (negligible), respectively. The corresponding decreases for the FBM algorithm were 49.45%, 50.35%, and 0.013 dB, respectively.
For a QCIF BL resolution and CIF EL resolution, compared to the JSVM, the FIP algorithm decreased the TT and ET by 54.74% and 50.35%, respectively, while decreasing the video quality by 0.012 dB. Moreover, compared to the Tae algorithm, the FIP algorithm decreased the TT, ET, and video quality by 55.36%, 3.03%, and 0.012 dB, respectively. Compared to the JSVM, the FBM algorithm decreased the TT, ET, and video quality by 48.36%, 73.54%, and 0.020 dB, respectively. Compared to the Tae algorithm, it decreased the TT, ET, and PSNR by 48.98%, 20.61%, and 0.02 dB, respectively.
The results for the City sequence are presented in Table 6. For an SQCIF BL resolution and QCIF EL resolution, compared to the JSVM, the FIP algorithm decreased the NSB (TT), ET, and PSNR by 49.02%, 47.56%, and 0.013 dB (negligible), respectively. The corresponding decreases for the FBM algorithm were 47.36%, 73.45%, and 0.021 dB (negligible), respectively.
For a QCIF BL resolution and CIF EL resolution, compared to the JSVM, the FIP algorithm decreased the TT, ET, and video quality by 47.98%, 46.87%, and 0.023 dB, respectively. Compared to the Tae algorithm, the corresponding decreases were 48.31%, 1.99%, and 0.023 dB, respectively. Compared to the JSVM, the FBM algorithm decreased the TT, ET, and video quality by 46.83%, 72.67%, and 0.024 dB, respectively. Compared to the Tae algorithm, the FBM algorithm decreased the TT, ET, and PSNR by 47.16%, 27.79%, and 0.024 dB, respectively.
The results for the Soccer sequence tests are presented in Table 7. For an SQCIF BL resolution and QCIF EL resolution, the FIP algorithm decreased the NSB (TT), ET, and PSNR by 43.45%, 48.98%, and 0.021 dB (negligible), respectively. The FBM algorithm was not applicable to the Soccer sequence. In the case of a QCIF BL resolution and CIF EL resolution, the FIP algorithm decreased the TT, ET, and video quality by 42.00%, 48.79%, and 0.027 dB, respectively. Compared to the Tae algorithm, the corresponding decreases were 42.29%, 0.37%, and 0.027 dB, respectively.
The foregoing results of the proposed FIP and FBM algorithms indicate that their performances compared to the JSVM for the City sequence using different resolutions are similar to those for the Foreman sequence, although the percentage changes differ. Compared to the Tae algorithm, the FIP and FBM algorithms significantly shorten the TT at the cost of a small decrease in video quality.
However, only the FIP algorithm, and not the FBM algorithm, is applicable to the Soccer sequence. Owing to its utilization of the mode-distribution correlation between the BL and EL, the FBM algorithm produces poor results when applied to a fast-motion video. Overall, the FIP and FBM algorithms are the best choice for applications in which small video details are important, such as in the medical field.

Conclusion
New SVC algorithms with spatial scalability were developed with the objectives of shortening the TT and of reducing the computational complexity of SVC encoding, thereby shortening the ET. Three pivotal factors were considered in assessing the four developed algorithms, namely, the video quality, the correlation between the EL frames, and the compromise between the TT and ET. The developed algorithms fall into two categories. The first category includes the ILIP and ILBM algorithms, which are based on the concept of interlayer residual and are suitable for applications in which the required video quality is below the THQ. During tests, they shortened the TT and ET by up to 64.30% and 83.61%, respectively, compared to the JSVM, with a negligible decrease in the PSNR. The ILIP algorithm enabled more TT saving than the ILBM algorithm, but less ET saving. The second category of the proposed SVC algorithms includes the FIP and FBM algorithms, which are based on the concept of full search and are suitable for applications in which the required video quality is above the THQ. These algorithms shortened the TT and ET by up to 55.33% and 76.85%, respectively, compared to the JSVM, with a negligible decrease in the PSNR. The FIP algorithm enabled more TT saving than the FBM algorithm, but less ET saving. The experimental observations confirm the effectiveness of the proposed algorithms for enhancing SVC by reducing the NSB, as indicated by the shortening of the TT.

Figure 9: Full interpolation. EP = interpolated even EL frame at encoder. BP = interpolated even BL frame at encoder. Delta Even E (Y) = SAD between EP and the corresponding even frame in the EL. Delta Even B (Y) = SAD between BP and the corresponding even frame in the BL.

Table 2: Results for Foreman with interlayer residual concept.

Table 3: Results for Bus with interlayer residual concept.

Table 4: Results for Football with interlayer residual concept.

Table 5: Results for Foreman with full mode concept.

Table 6: Results for City with full mode concept.

Table 7: Results for Soccer with full mode concept.