Application layer systematic network coding for sliced H.264/AVC video streaming



Introduction
H.264 Advanced Video Coding (AVC) [1] is currently the most commonly used video coding standard, which is gaining widespread use in the emerging communication standards and applications.
Two key challenges of multimedia communication applications over wireless networks are high and varying error characteristics of underlying communications channels and huge heterogeneity of users' equipment.
One of the solutions is to use channel coding techniques which could recover the original data despite losses. The latest state-of-the-art solutions, like those based on Reed Solomon (RS) codes, are inflexible because the code rate has to be fixed in advance. Moreover, the encoding and decoding operations are quite complex, especially for large Galois fields. For such codes, the error characteristics of the channel must be known in advance in order to adjust the code rate to it. This solution does not extend well to multiple receivers, as then only a worst-case erasure channel can be assumed for all receivers.
To enable communications in the presence of packet losses, rateless Digital Fountain Raptor codes [2] have become a standardized solution in many wireless systems such as Digital Video Broadcasting-Handheld (DVB-H) [3-5], Multimedia Broadcast Multicast Service (MBMS), and mobile Worldwide Interoperability for Microwave Access (WiMax) [6].
Another class of rateless codes which have been gaining increased popularity for applications in wireless broadcast/cellular networks are Random Linear Codes (RLCs) [7,8]. RLCs show near-capacity performance over erasure channels even for low codeword lengths [9,10]. In addition, the emerging networking concepts, such as hybrid broadcast/cellular networks (with users equipped with multiple interfaces) or device-to-device communications, offer a number of opportunities for achieving network coding gains using RLC [11].
Traditional solutions for reliable multimedia delivery use multiple independent Reed Solomon (RS) codes with different rate allocation over importance classes [12,13]. These solutions do not have the rateless property and thus have to be designed for the worst channel conditions, and they cannot exploit network coding gains over network topologies by packet processing in intermediate nodes.
For applications where short message lengths represent a natural choice (such as multimedia delivery) and where wireless multihop communications are encountered, the RLC scheme represents a more efficient and versatile approach. Such realizations are expected to result in increased throughput of wireless multihop broadcast/cellular networks [8]. Due to all this, RLCs have been considered as a unique rateless/network coding solution [14-18] for emerging wireless systems, such as Long-Term Evolution-Advanced (LTE-A) and DVB-NGH (Next Generation Handheld). (See [19] for a performance/complexity comparison between Raptor codes [2] and RLCs.) The inherent disadvantage of RLC is the high decoding complexity of Gaussian Elimination decoding as the codeword length increases. However, even with very short codeword lengths that admit efficient implementation, RLC performance matches that of Raptor codes of higher codeword lengths [19,20].
Both Raptor codes and RLC are "all or nothing" codes that equally protect the entire stream. For embedded and scalable sources where different parts of the stream have different importance to reconstruction, unequal error protection (UEP) is beneficial. Expanding Window Fountain (EWF) codes as a class of UEP FEC codes for scalable video delivery are proposed in [21]. EWF codes are based on the idea of creating a set of "nested windows" over a source block. This EW concept is extended to RLC, and performance limits are presented in [22]. The suitability of nonsystematic EW RLC for transmission of data-partitioned H.264/AVC has been investigated in [4].
As compared to the data partitioning H.264/AVC feature [1], slicing has the advantage that the size of slices can be tailored to the application. The slicing feature of H.264/AVC can be used to partition a video stream into classes of decreasing importance (for video reconstruction) with a very small decrease in overall performance.
A scheme has been proposed in [23] based on macroblock classification into three slice groups and UEP of H.264/AVC streams. The ordering of macroblocks into three slice groups is done by examining their contribution to the video quality. The three slice groups are then protected with UEP using RS coding. In [24], a slice sorting by relevance (SSR) algorithm for prioritizing slices based on their contribution to the reconstruction is used together with RS coding. The work in [24] is later extended in [25], which proposes an algorithm termed Concealment Driven Slice Ordering with RS codes. The ordering of slices is based on the error propagation effect and the rate devoted to each slice.
The proposed work differs from the earlier work in the method of prioritizing slices and the choice of systematic rateless codes for channel coding. The slice-partitioned video stream can provide an advantage with respect to H.264 Scalable Video Coding (SVC) [26] of better coding efficiency and compliance with the AVC standard. The layered video can be protected by EW RLC codes that can provide a different degree of protection to each layer/window.
Building on our prior work [4,19,20,22], the focus of this study is to analyse the use of the EW approach with systematic RLC as component codes for UEP of the slice-partitioned H.264/AVC video. Systematic RLCs have the advantage of supporting more efficient encoding and decoding procedures compared to nonsystematic RLC.
In contrast to [24], where priority layers are built based purely on distortion information, in this paper, we propose a new cost function that takes into account the frame play out deadline and temporal error propagation to better prioritize slices into quality layers.
Our simulation results show that EW RLC can be used to effectively protect the different priority windows for reliable video transmission over packet erasure channels. Significant performance gains are obtained compared to the equal error protection scheme and the benchmark scheme that prioritizes the sliced stream in an ad hoc fashion.
The rest of the paper is structured as follows. The relevant background on RLC and the slicing feature of H.264/AVC is covered in Section 2. The proposed system is described in Section 3. The results are presented in Section 4, and the conclusion and future research in Section 5.

Background
In this section, we give a brief background on slicing in H.264/AVC [1,27] and an overview of the RLC [7] and EW RLC [22] coding schemes.

Slicing in H.264/AVC
H.264/AVC provides many error-resilience features to mitigate the effect of lost packets during transmission.
One such scheme available in the baseline profile is slicing [27], which enables the partitioning of a frame into two or more independently coded sections, called slices. Each slice in a frame can have either a fixed number of assigned macroblocks (MBs) or a fixed data rate. Each coded slice is independently decodable; however, the slices contribute differently (have different importance) to the video reconstruction. Thus, arranging the slices in decreasing order of their contribution to reconstruction can be used to provide a layered video stream suitable for UEP.

Random Linear Codes (RLCs)
RLC applied over a source message produces encoded packets as random linear combinations of message packets with coefficients randomly selected from a given finite field $GF(2^q)$. For example, using RLC over a source message $x$ of length $K$, an encoded packet $\omega$ is obtained as $\omega = \sum_{i=1}^{K} \alpha_i \cdot x_i$, where $\alpha_i$ is a randomly selected element of $GF(2^q)$. The resulting encoded packet $\omega$ is of the same length ($b$ bits) as the source message packets. In addition, header information is attached to each encoded packet $\omega$; it contains the so-called global encoding vector $g = \{\alpha_1, \alpha_2, \ldots, \alpha_K\}$ consisting of the randomly selected finite field coefficients. The header requirements in a unicast point-to-point setup can be relaxed if a pair of synchronized random number generators (RNGs) is used at the transmitter and the receiver and only the RNG seed is communicated within each encoded packet header. The encoding procedure is repeated at the transmitter in a rateless fashion.
Thus, each encoded symbol is a linear combination of all or a subset of the original source symbols. An RNG seed carried in the header of the encoded symbol can be used at the decoder to recover the coefficients used at the encoder.
The encoding procedure is simple to implement, and, for a sufficiently large finite field used for creating linear combinations of source symbols, RLC codes perform as near-optimal erasure codes (the one-byte field GF(256) is usually sufficiently good [7]).
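The encoding step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the reduction polynomial 0x11B for GF(256), the function names, and the use of Python's `random.Random` as the synchronized RNG are all assumptions made for the example.

```python
import random

# Assumption: GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
# The paper does not state which polynomial is used.
def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(256) elements (carry-less multiply with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^q) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce back into 8 bits
        b >>= 1
    return result

def rlc_encode(packets, rng):
    """Return (global encoding vector g, encoded packet omega).

    packets: list of equal-length bytes objects (the source message x).
    rng: a random.Random instance that draws the coefficients alpha_i.
    """
    coeffs = [rng.randrange(256) for _ in packets]
    omega = bytearray(len(packets[0]))
    for alpha, packet in zip(coeffs, packets):
        for pos, byte in enumerate(packet):
            omega[pos] ^= gf256_mul(alpha, byte)
    return coeffs, bytes(omega)

def coefficients_from_seed(seed: int, k: int):
    """Regenerate g at the receiver when only the RNG seed is in the header."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(k)]
```

Calling `rlc_encode(packets, random.Random(seed))` at the transmitter and `coefficients_from_seed(seed, len(packets))` at the receiver yields the same coefficient vector, which is the seed-in-header variant mentioned in the text.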
For practical network coding, RLC is used at source nodes for encoding the source message packets and at intermediate network nodes for random recombining of incoming and/or buffered encoded packets. The source nodes and intermediate nodes may produce encoded packets in a rateless fashion until the requirements of receiving nodes are met (which may be confirmed by feedback messages) or, in delay-constrained applications, until a new source block is scheduled for transmission.
After sufficient linearly independent coded symbols have been received, the decoder can recover the original source symbols.
The use of RLC is hindered by the complexity of Gaussian Elimination decoding, which is polynomial in the number of symbols. However, for short lengths of the source messages, the decoding complexity is acceptable (see [19,28] and references therein).
A systematic code is any error-correcting code in which the input data is embedded in the encoded symbol. The advantage of such codes is that the receiver does not need to recover the original source symbol in case of correct reception.
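As an illustration of the decoding step, the sketch below performs Gaussian Elimination over GF(2), that is, the special case q = 1, chosen here only to keep the arithmetic to XOR. The representation of coefficient vectors as integer bit masks and the function name `gf2_decode` are assumptions of this example.

```python
def gf2_decode(coded, k):
    """Recover k source packets from coded packets over GF(2).

    coded: iterable of (mask, payload) pairs, where bit i of mask is set if
           source packet i took part in the combination and payload is the
           XOR of the selected source packets (as an int).
    Returns the list of k source payloads, or None if the rank is below k.
    """
    pivots = {}  # pivot bit -> (mask, payload), kept in fully reduced form
    for mask, payload in coded:
        # Reduce the incoming row by every known pivot row.
        for bit, (pmask, ppayload) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                payload ^= ppayload
        if mask == 0:
            continue  # linearly dependent packet, carries no new information
        bit = mask.bit_length() - 1
        # Clear the new pivot bit from the existing pivot rows.
        for other, (pmask, ppayload) in list(pivots.items()):
            if (pmask >> bit) & 1:
                pivots[other] = (pmask ^ mask, ppayload ^ payload)
        pivots[bit] = (mask, payload)
    if len(pivots) < k:
        return None
    return [pivots[i][1] for i in range(k)]
```

Each received packet costs work proportional to the current rank, so the total effort grows polynomially with the number of symbols, which is why short source blocks are attractive.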
When erasure rates are low, it is effective to use systematic RLC, which further reduces the decoding complexity, since the received systematic packets can be used to reduce the effective code length before Gaussian Elimination decoding.

EW RLC
In [21], EWF codes as a class of UEP fountain codes are proposed. EWF codes are based on the idea of creating a set of "nested windows" over the source block. The rateless encoding process is then adapted to use this windowing information while producing encoded packets. In this paper, we use the main concept of EWF to create EW RLC [22] from consecutive source blocks containing a fixed number of symbols (data packets).
First, we define a set of windows over the groups of source symbols of unequal importance. Coding is then performed over progressively increasing source block subset windows aligned with these "most to least importance" subsets.
The general layout of a window structure with three importance classes is shown in Figure 1. The window with the most important subset of encoded data is $W_1$, and the importance of data additionally included in windows progressively decreases as we proceed to $W_3$. The subset data of $W_1$ is contained in all the subsequent windows and is hence the best protected. Apart from $W_1$, each window, in addition to some of its own data, also encloses all the data of the higher importance windows. Conventional RLC is applied on each window.
The encoding process for EW RLC has one important initial step, which is to select a window from which the RLC encoded symbol is to be generated. This selection is governed by the window selection probability, a preassigned parameter chosen with the importance of the different layers and the available data rate in mind. After a window is selected, the encoding is the standard RLC encoding performed over the source packets contained in that particular window only [22]. The window selection procedure is independently repeated for each created encoded packet.
In [22], the analytical performance of EW RLC is given together with a comparison with traditional nonoverlapping UEP RLCs that use an independent code for each window. In [4,20], nonoverlapping window (NOW) and EW have been used to provide unequal error protection to the data-partitioned H.264/AVC video data. It is shown in [4] that the performance of EW is better than that of NOW because in NOW each window is independently decoded and thus the low priority windows do not contribute to recovery of the high priority windows.
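The two-step procedure described above (select a window, then apply RLC inside it) can be sketched as below. For brevity the sketch draws coefficients from GF(2) and represents packets as integers; both choices, and the names `ew_rlc_encode`, `window_sizes`, and `selection_probs`, are assumptions of this example rather than details taken from [22].

```python
import random

def ew_rlc_encode(packets, window_sizes, selection_probs, rng):
    """Produce one EW RLC encoded packet.

    packets: source packets (ints), ordered from most to least important.
    window_sizes: cumulative window sizes, e.g. [2, 5] means W_1 covers the
                  first 2 packets and W_2 covers all 5.
    selection_probs: window selection probabilities, one per window.
    Returns (window_index, coefficient_vector, payload).
    """
    # Step 1: pick a window according to the preassigned probabilities.
    window = rng.choices(range(len(window_sizes)), weights=selection_probs)[0]
    span = window_sizes[window]
    # Step 2: standard RLC over the packets inside the chosen window only.
    coeffs = [rng.randrange(2) for _ in range(span)]
    coeffs += [0] * (len(packets) - span)
    payload = 0
    for coeff, packet in zip(coeffs, packets):
        if coeff:
            payload ^= packet
    return window, coeffs, payload
```

Because the windows are nested, every encoded packet can involve the packets of $W_1$, which is why that subset ends up best protected. Over GF(2) an all-zero coefficient vector can occur; a practical encoder would redraw in that case.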

The Proposed System
In this section, we propose a system for optimally protecting the slice-partitioned H.264/AVC video data with systematic EW RLC. We assume that the encoded video stream is transmitted over a packet loss channel. That is, all packets that arrive at the application-layer RLC decoder are correct, while those with bit errors are discarded by error detection codes, such as Cyclic Redundancy Check (CRC) codes, present at the lower layers of the protocol stack (e.g., physical or link layer). We further assume that the error detection capability of the employed CRC codes is perfect, which is a usual assumption [7, 9-11, 24]. Thus, the application-layer-to-application-layer channel is modeled as a packet erasure channel with random packet drop statistics.
In order to increase error resilience, we encode a video sequence using slicing with a fixed slice size of 600 bytes. That is, after the H.264/AVC encoding, we obtain the video data in which each frame, including the IDR (instantaneous decoder refresh) frame, is divided into slices of 600 ± 3 bytes, except for the last slice of each frame, which can be smaller. The size of 600 bytes is chosen here to keep the number of RLC symbols per codeword low in order to reduce the decoding complexity of Gaussian Elimination. See [11,28] for a discussion about acceptable block lengths for real-time RLC decoding. The resulting slices carry different importance to reconstruction, which has been used to achieve UEP (see [24,29] and references therein).
After source coding, EW RLC coding takes place. Since systematic RLCs are used, first all source symbols (from all the slices) are transmitted without any channel coding. Because of possible errors/erasures in the channels, some packets will be missing at the decoder.
To correct these erasures, RLC redundancy packets are generated next.
Before RLC, the priority of each slice is obtained by dropping it from the Group of Pictures (GOP) data and measuring the resulting peak signal-to-noise ratio (PSNR), as a frame-by-frame average over the entire GOP, by actual decoding. This also takes into account the error propagation effect to the subsequent frames due to loss of a slice in an earlier frame. That is, the cumulative PSNR of the GOP is measured by dropping each slice in turn, starting at the first P frame. After having obtained the cumulative PSNR values for each slice (as dropped), the difference from the full-decoding PSNR of the GOP is measured. Determining the PSNR drop can easily be done during the encoding process with negligible added complexity (see [24]).
The results are shown in Figure 2 for the first GOP (having 16 frames and the encoding structure IPPPP...) of the standard CIF Foreman sequence. It can be seen from Figure 2 that the importance of the slices to the total frame-averaged PSNR generally decreases as we move towards the end of the GOP. Similar results are shown in Figure 3 for the first GOP (having 64 frames and the encoding structure IPPPP...) of the Paris sequence. As can be seen from the figures, the PSNR drop values for the Paris sequence are larger due to the larger GOP size.
Thus, we can sort the slices into multiple priority layers and assign a higher degree of protection to the important layers as compared to the layers containing less significant slices. Such layering enables a prioritized data transmission with UEP schemes and was used before in [24,29].
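The PSNR-drop measurement described above can be sketched as follows, assuming the per-frame mean squared errors of the decoded GOP are already available (with and without the slice under test). The function names and the 8-bit peak value of 255 are assumptions of this example; the decoding itself is outside its scope.

```python
import math

def psnr_from_mse(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB of one frame; assumes mse > 0 (lossy reconstruction)."""
    return 10.0 * math.log10(peak * peak / mse)

def gop_psnr(frame_mses) -> float:
    """Frame-by-frame average PSNR over the whole GOP."""
    return sum(psnr_from_mse(m) for m in frame_mses) / len(frame_mses)

def psnr_drop(full_mses, dropped_mses) -> float:
    """D(slice): full-decoding GOP PSNR minus GOP PSNR with one slice dropped.

    dropped_mses must come from actually decoding the GOP without the slice,
    so that error propagation into later frames is included.
    """
    return gop_psnr(full_mses) - gop_psnr(dropped_mses)
```

A slice lost early in the GOP raises the error of every later frame that predicts from it, so its `dropped_mses` list is degraded over more frames and its drop is larger.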
Purely grouping the slices into priority classes based on the PSNR decrease shown in Figure 2, as done in [24,29], does not take into account real-time frame playout deadline (frames coming sooner should be given a higher priority).
Motivated by this, we redefine the cost function used in [29] to take into account not only the drop in cumulative PSNR for each slice but also the temporal importance of a slice. In this cost function, denoted W(slice), D(slice) represents the drop in the cumulative PSNR (see the y-axis of Figure 2). The value of τ(slice) represents the playout time deadline of the slice (frame) relative to the playout time of the first IDR in the GOP. That is, τ(slice) of the IDR frame is set to zero, and each subsequent frame adds its playout time duration to this value. w is a constant that trades off the distortion D(slice) and the remaining playout time.
In this way, we create a system to assign a priority to all the slices, trading off the importance of the slices to reconstruction and the playout time deadline. After computing W(slice) for all the slices in a GOP and selecting threshold values $T_1, \ldots, T_{L-1}$, the slices are assigned to layers: the first layer includes the IDR and slices with W(slice) ≥ $T_1$, the second layer includes all remaining slices with W(slice) ≥ $T_2$, and so forth. In addition, our algorithm also puts at least one slice per frame into the first layer if none is selected (from a frame) based on the above criteria alone. This helps to further limit the error propagation effect and thus improves the resulting PSNR. Such selections may be needed for frames which occur towards the end of the GOP, as can be seen from Figure 2.
In the proposed scheme, we can create L windows using a threshold (L − 1)-tuple $T_1, \ldots, T_{L-1}$ and allocate different protection to each window. Note that the slices would already be in their decoding order within each layer. However, within each window, the slices will need to be restored to the original order to enable decoding by the AVC decoder.
After determining thresholds and assigning slices to the L windows, the size of each layer is fixed. Then, the remaining task is to find the optimal allocation of redundancy to each layer, or equivalently the probability of window selection. We express the probability of window selection as an L-tuple whose ith entry denotes the probability of selecting window i for an encoded packet. For example, let L = 2 and let the vector of selection probabilities be [0.6, 0.4]; this implies that the first window, $W_1$, will have a selection probability of 0.6, whereas $W_2$ will have a selection probability of 0.4. That is, on average, redundant packets from $W_1$ will comprise 60% of the overall redundancy.
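The thresholding rule can be sketched as below. The sketch takes the cost W(slice) as a precomputed input and does not evaluate the cost function itself. The `Slice` record and the tie-breaking choice of promoting the highest-cost slice of a frame that has no first-layer slice are assumptions of this example; the text only requires that at least one slice per frame ends up in the first layer.

```python
from typing import NamedTuple

class Slice(NamedTuple):
    frame: int      # index of the frame the slice belongs to
    is_idr: bool    # True for slices of the IDR frame
    cost: float     # precomputed W(slice)

def assign_layers(slices, thresholds):
    """Map each slice to a layer index (0 = first, most important layer).

    thresholds: descending values T_1, ..., T_{L-1}.
    """
    layers = []
    for s in slices:
        if s.is_idr:
            layers.append(0)
            continue
        layer = len(thresholds)              # default: last layer
        for i, t in enumerate(thresholds):
            if s.cost >= t:
                layer = i
                break
        layers.append(layer)
    # Guarantee at least one first-layer slice per frame.
    # Assumption: promote the slice with the highest cost in that frame.
    for frame in sorted({s.frame for s in slices}):
        members = [i for i, s in enumerate(slices) if s.frame == frame]
        if not any(layers[i] == 0 for i in members):
            best = max(members, key=lambda i: slices[i].cost)
            layers[best] = 0
    return layers
```

The function only labels slices; restoring the original slice order inside each window before handing data to the AVC decoder is a separate step.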
To find the optimal packet selection vector, we maximize the expected PSNR using analytically computed probabilities of decoding error performance. That is, we select the vector π that maximizes PSNR(π), the expectation of psnr(i) under the probabilities P(i), where P(i) is the probability that layer i will be the highest layer recovered, P(0) is the probability that nothing is recovered, psnr(i) is the PSNR of the reconstruction if all layers up to and including layer i are recovered, π is an L-tuple vector of window selection probabilities that determines the UEP allocation scheme, and PSNR(π) is the expected PSNR when UEP scheme π is used.
In the above maximization, we made the assumption that, if decoding of window i fails, none of the packets from window j ≥ i can be used for reconstruction. This is true for nonsystematic EW RLC and an approximation for systematic EW RLC.
Analytical expressions for the probabilities P(i), assuming a random channel loss model for EW RLC, are derived in our prior work [22]. They are expressed in terms of the decoding probabilities $P_{d,N}(l)$ and the probabilities $P_{r(\xi),N}$, where $\Gamma_i$ is the probability of selection of the ith window, $R_i$ are random variables denoting the number of received packets from window i, and $K_i$ is the number of source packets in window i.
The optimization method is exhaustive search and scales linearly with the number of UEP schemes being used.
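The exhaustive search can be sketched as below. The function `layer_probability_fn` is a hypothetical placeholder for the analytical expressions of [22], which are not reproduced here; it is assumed to return the list [P(0), P(1), ..., P(L)] for a given selection vector π. The other names are likewise assumptions of this example.

```python
def expected_psnr(layer_probs, layer_psnr):
    """Expected PSNR: sum over i of P(i) * psnr(i), for i = 0..L."""
    return sum(p * q for p, q in zip(layer_probs, layer_psnr))

def best_allocation(candidates, layer_probability_fn, layer_psnr):
    """Exhaustive search over candidate window selection vectors.

    candidates: iterable of L-tuples of selection probabilities.
    layer_probability_fn: hypothetical callable, pi -> [P(0), ..., P(L)].
    layer_psnr: [psnr(0), ..., psnr(L)], with psnr(0) the concealment PSNR.
    Returns (best_pi, best_expected_psnr).
    """
    best_pi, best_value = None, float("-inf")
    for pi in candidates:            # one evaluation per UEP scheme
        value = expected_psnr(layer_probability_fn(pi), layer_psnr)
        if value > best_value:
            best_pi, best_value = pi, value
    return best_pi, best_value
```

The loop evaluates each candidate exactly once, which matches the linear scaling in the number of UEP schemes noted above.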
For error concealment, we repeat the last correctly decoded frame to replace frames for which the base layer is not decoded properly.

Results and Analysis
In this section, we present our simulation results. For simplicity, we consider the case of L = 2 layers: a high-priority layer (HPL) that contains the more important slices, for which W(slice) ≥ T, and a low-priority layer (LPL) that contains the less important slices, for which W(slice) < T, where T is the chosen threshold.
The thresholds determine the source rate for each layer. For example, a higher T would result in a lower source rate (and, hence, lower error-free performance) for the base layer. Thus, the $T_i$ are set based on the available clients' bandwidths as well as the desired error-free performance levels. In practice, the transmitter can dynamically adapt the source rate per layer to the varying channel conditions of different clients by changing the $T_i$.
The video sequence Foreman in the CIF format is encoded using the H.264/AVC software JM version 16.2 [30]. First, we use a GOP size of 16 frames with a frame structure IPPP... and a fixed slice size of 600 bytes. We compare three schemes. The first is the proposed UEP scheme optimized using (1). The second is the benchmark scheme, where we put all the slices of the IDR and the first slice of each frame in the HPL and all other slices in the LPL. The third is the equal error protection (EEP) scheme that protects all slices equally. Note that the benchmark scheme is a low-complexity scheme where prioritization is done in an ad hoc manner; it still uses the same systematic EW RLC for protection of the two layers.
The proposed scheme is designed in accordance with the algorithm described in Section 3 with T = 0.78 and w = 2.5. The sizes, number of packets (same as the number of slices in a layer), and resulting PSNR values for both configurations are shown in Table 1. For this selection of T, the proposed UEP scheme has a larger HPL than the benchmark.
Note, however, that a smaller HPL for the proposed scheme could be obtained by suitably selecting the parameters T and w in (1).
All schemes are compared at the same transmission bitrate. For an L-layer scheme, the overhead cost needed to describe a UEP solution is $7 \times L$ bits plus $(L - 1) \times 8 \times 2$ bits to convey the $T_i$ and $w_i$ values. For L = 2 used in this paper, this number is only 30 bits and has not been taken into consideration.
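The overhead count above can be checked with a one-line function; the name `uep_overhead_bits` is chosen for this example only.

```python
def uep_overhead_bits(num_layers: int) -> int:
    """Signalling overhead in bits: 7 * L plus (L - 1) * 8 * 2."""
    return 7 * num_layers + (num_layers - 1) * 8 * 2
```

For L = 2 this evaluates to 14 + 16 = 30 bits, in agreement with the figure quoted in the text.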
The proposed schemes are simulated with transmission of EW RLC for 1000 runs, and the results are averaged. The total number of packets to be transmitted for each run is 100. Because of the employed systematic RLC, the transmission takes place in two phases. In Phase I, we transmit 77 packets consisting of the source symbols. In Phase II, we transmit additional packets in accordance with EW RLC. Note that Phase I will be the same for all three schemes, whereas, in Phase II, the probability of selection can govern a prioritized transmission of the HPL. The important phenomenon seen here is that, since each slice is independently decodable, the PSNR obtained when RLC decoding of the LPL fails and decoding of the HPL succeeds is higher than the PSNR of the successfully decoded HPL alone, due to useful packets that are received from the LPL during Phase I.
This gain comes from the correct reception of additional LPL symbols from Phase I even with failure of LPL decoding. The simulations have been performed for different packet loss rates (PLRs) and different probabilities of window selection to evaluate the performance of the slicing feature in overcoming losses.
In the case of nonsystematic codes, if the first window $W_1$ (or $W_2$) does not get decoded, the entire GOP is considered to be lost. However, in the case of systematic codes, it is still possible for the H.264/AVC decoder to decode the GOP as long as its IDR frame has been received correctly. In case of loss of the IDR with systematic codes, the entire GOP is lost. The PSNR for such cases is obtained by using the last frame of the previously decoded GOP to replace all frames of the lost GOP.
The various configurations are used to create different UEP schemes that give the constituent windows different levels of protection by probabilistically selecting a window for each output symbol at the transmitter. An increase in the selection probability of window 1 ($W_1$) will improve its robustness at the cost of a decrease in robustness of the succeeding layer(s). The EEP scheme is the case where only the largest window is selected, with 100% probability. This means that all of the data is protected with no preference for the data considered important, that is, window $W_1$.
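For the EEP case, a rough feel for the decoding probability can be obtained under an idealized assumption that is not part of the paper's analysis: any K correctly received packets (systematic or coded) are taken to be linearly independent, which is approximately true for a large field. Under that assumption, and with independent packet losses, decoding succeeds when at least K of the N transmitted packets arrive. The function name is chosen for this example.

```python
from math import comb

def eep_decoding_probability(k: int, n: int, plr: float) -> float:
    """P(at least k of n packets arrive) for independent losses with rate plr.

    Idealization: every set of k received packets is assumed decodable,
    so this is an upper bound on the true EEP decoding probability.
    """
    arrive = 1.0 - plr
    return sum(
        comb(n, r) * arrive ** r * plr ** (n - r)
        for r in range(k, n + 1)
    )
```

With the values used in the simulations (77 source packets, 100 transmitted packets), `eep_decoding_probability(77, 100, plr)` can be evaluated across a range of PLRs to see where full decoding of the single EEP window becomes unlikely, which is the region where prioritizing $W_1$ is expected to pay off.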
In Figure 4, we present the results of a comparison between the systematic codes and nonsystematic codes. The scheme PS60S is a scheme with the probability of $W_1$ selection equal to 0.6 (i.e., the probability of selecting a symbol from the HPL is 0.6), and the suffix S indicates systematic codes. Similarly, scheme PS80N has a probability of $W_1$ selection of 0.8 with nonsystematic codes. It can be seen from the figure that the systematic codes generally have better results than the nonsystematic codes for the error range and data rates shown.
Systematic codes do not, in general, improve erasure-correction performance compared to nonsystematic codes. They do, however, reduce the decoding complexity, since with systematic codes the decoder operates on a matrix that has a reduced number of rows (reduced by the number of correctly received systematic packets). Thus, Figure 4 demonstrates that there is no loss in performance due to systematic codes.
Figure 5 shows PSNR versus PLR for the proposed systematic EW RLC scheme. The numbers shown in brackets represent the selection probability of each of the two windows; for example, UEP (60, 40) represents a code in which a symbol from $W_1$ will be selected for transmission with probability 0.60. As can be seen from the figure, the results of the UEP schemes are significantly better than those of the EEP scheme for high loss rates.
UEP (100, 0) is a scheme in which only $W_1$ is protected and sent. The scheme is constrained in that it cannot achieve a PSNR higher than 27.6 dB (see Table 1). However, decoding failures, that is, cases when the entire GOP data fails to be decoded, will be much less frequent for UEP (100, 0), since the HPL is protected most strongly, which allows each GOP to be received with high probability, though at a basic quality level. This scheme could thus prove useful at higher PLRs. Also, note that, for this scheme, in Phase I of transmission, only the systematic packets of the HPL will be transmitted and, in Phase II, the encoded symbols come from the HPL alone.
The PSNR results improve with an increase in the probability of selection of $W_1$ because, at higher probabilities of selection of $W_1$, the decoding of the HPL has a high chance of being successful. As described earlier, the PSNR with decoding of the HPL is enhanced by systematic LPL packets.
In Table 2, the details of HPL size and PSNR contributions for the three schemes created by selecting three different values of T are shown. Intuitively, when the threshold T is lowered, the number of packets selected for the HPL is higher. In Figure 6, we present the optimized results for the schemes created in Table 2. The results for the EEP scheme and the benchmark are also shown for comparison. For each PLR, we found the optimal proposed UEP and the optimal benchmark UEP using (2). It can be seen from the figure that the proposed method leads to significant gains for high PLRs compared to the EEP and the benchmark scheme. The selection of T governs the size of the HPL. If the size of the selected HPL is small, it will have a relatively lower PSNR compared to a larger HPL. A lower T, leading to a larger HPL, is thus better for higher PLRs, which is expected since a large HPL (with higher PSNR) is better protected, and there is not enough bandwidth for the LPL anyway.
Similar results obtained for the CIF Paris video sequence are shown in Figure 7. Note that, for high PLR, it is better to reduce T, resulting in a large HPL. In any case, by varying T, one can effectively design HPL/LPL sizes for different PLRs.
A larger GOP size may be required for applications such as DVB-H [4]. We encode the same Foreman sequence with a GOP size of 64 frames. For this configuration, the total number of source packets is 161. The total number of sent packets is kept at 209. In Figure 8, we present the optimized results for the schemes created using two different values of w as shown. Both schemes have the value T = 3.1; however, based on the different values of w, different slices are selected for the HPL in each scheme. The scheme with $w_1 = 2.5$ has better performance than the scheme with $w_2 = 0$, especially at high packet loss. This comes from the fact that the former scheme prioritizes slices taking into account the frame position in the sequence, which reduces error propagation. The benchmark scheme is created according to the selection criteria used previously. The EEP scheme performs the worst of all the schemes. The results for $w_1 = 2.5$ and $w_2 = 0$ are close at the lower PLRs. The reason for this is that, with systematic codes, if the HPL is decodable, then the packets received correctly in Phase I (which could be from the HPL or the LPL) also contribute to improving the PSNR.
The Paris sequence encoded with similar parameters is used to investigate the effect of w on performance. The optimized results are presented in Figure 9 for the schemes created using two different values of w, along with the benchmark and EEP schemes. The results are similar to those in Figure 8 for the Foreman sequence, which confirms the analysis carried out earlier.
From the last two figures, we conclude that w is a useful parameter to improve source packet allocation (compared to the w = 0 case). We tested several different values of w and report our results for several typical cases that show the achievable performance boundaries obtained by varying w. One can see from the figures that the effect of w is small, up to 1 dB.

Conclusions and Future Work
In this paper, we proposed the systematic EW RLC scheme to protect the slice-partitioned video data under various channel conditions at different probabilities of window selection. We proposed a novel slice prioritization method that takes into account the PSNR contribution of a slice as well as the position of its frame within the GOP. The simulations for two layers show that UEP schemes perform better than the EEP scheme and ad hoc prioritization, and that this is achievable with a minimal selection (one slice) of video data from each frame. Such reduced selections may be advantageously used in video-on-demand applications. The decoding complexity of RLC can be easily managed in the proposed scheme by an adaptive scheme which dynamically selects the slice size. The proposed schemes are hence suitable for real-time multimedia mobile applications.

Figure 1: Expanding window structure with three windows.

Figure 5: PSNR versus average PLR for the proposed scheme and the EEP scheme.

Figure 6: Optimized results for three values of T for the Foreman sequence.

Figure 7: Optimized results for three values of T for the Paris sequence.

Figure 8: Optimized results for two values of w for the Foreman sequence.

Figure 9: Optimized results for two values of w for the Paris sequence.

Table 1: Layer sizes and PSNR contributions for T = 0.78 and w = 2.5 for the Foreman sequence.

Table 2: Layer sizes and PSNR contributions for configurations with different values of T.