Scalable video coding (SVC) is a new video coding format that provides scalability in a three-dimensional (spatio-temporal-SNR) space. In this paper, we focus on adaptation in the SNR dimension. An SVC bitstream may contain multiple spatial layers, and each spatial layer may be enhanced by several fine-grained scalability (FGS) layers. To meet a bitrate constraint, the FGS data of different spatial layers can be truncated in various manners. However, the FGS layers contribute differently to the overall (collective) video quality. In this work, we propose an optimized framework to control SNR scalability across multiple spatial layers. The framework is flexible in allocating the resource (i.e., bitrate) among spatial layers: the overall quality is defined as a function of all spatial layers' qualities and can be modified on the fly.

In the context of Universal
Multimedia Access (UMA), multimedia content should be adapted to meet the various
constraints of heterogeneous environments [

Scalable video coding (SVC) [

Though scalable coding formats in
general and SVC in particular provide flexibility in truncating the coded
bitstream, there is a strong demand for the optimal adaptation strategies and
solutions in various contexts [

In this work, we focus on FGS data
truncation of a multi-spatial-layer (or multilayer, for short) SVC bitstream, so as
to maximize the overall (collective) quality of the spatial layers provided by
the adapted bitstream. For example, let us consider the following scenario (Figure

A scenario of two users with one SVC bitstream.

FGS data truncation of an SVC bitstream with multiple spatial layers.

Trellis diagram grown by the Viterbi algorithm. Each stage corresponds
to a spatial layer, and each branch corresponds to a

Architecture of an SVC adaptation system.

Currently, the FGS data of the above
bitstream can be truncated with a few approaches. With the conventional
approach of top-down truncation [

Additionally, in practice the requirements
from users may be complex and time-varying. For example, the above two users
may request a “weighted balance” of qualities between them (or between
the two spatial layers); or, when a key (primary) user moves between
end devices, the quality should be reallocated accordingly. We consider this
a kind of user collaboration [

In this paper, we propose a general
framework to adapt an SVC bitstream with multiple spatial layers. The proposed
framework is flexible in allocating the resource (i.e., bitrate) among
spatial layers: the overall quality is defined as a function of all
spatial layers' qualities and can be modified on the fly. The adaptation
process is first formulated as a constrained optimization problem. We then
propose a solution based on the Viterbi algorithm to find the optimal bitrate
allocation among spatial layers. We will also show that the approaches of [

This paper is
organized as follows. In Section

The FGS truncation process in SVC can
be conceptually illustrated in Figure

Note that the base quality layer represents the minimum quality of a spatial layer. In practice, however, users could request quality thresholds of their own, which may be higher than those of the base quality layers.

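Denote by $R_i$ and $Q_i$ the bitrate and the quality of spatial layer $i$, by $w_i$ the weight of that layer, and by $R^{c}$ the bitrate constraint. The problem is then to find the bitrates $R_i$ that

maximize $\sum_{i} w_i Q_i$ subject to $\sum_{i} R_i \le R^{c}$.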

With this objective, if $w_1 = 1$ and $w_2 = 0$, the truncation will be top-down, so that the
first spatial layer always has the best possible quality.
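The role of the weights can be illustrated with a minimal sketch. The quality values and the names `overall_quality`, `candidates`, and `best_candidate` are hypothetical, chosen only for illustration; they are not measurements or code from this work, and the three candidates are simply assumed to satisfy the same bitrate constraint.

```python
def overall_quality(weights, qualities):
    """Weighted sum of per-layer qualities (the overall quality)."""
    if len(weights) != len(qualities):
        raise ValueError("one weight per spatial layer is required")
    return sum(w * q for w, q in zip(weights, qualities))


# Hypothetical candidate truncations: (Q_1, Q_2) in dB for two spatial layers.
candidates = {
    "favor layer 1": (38.0, 30.0),
    "balanced": (36.5, 33.0),
    "favor layer 2": (34.0, 33.5),
}


def best_candidate(weights):
    """Return the name of the candidate with the highest overall quality."""
    return max(candidates,
               key=lambda name: overall_quality(weights, candidates[name]))
```

With weights (1, 0) the candidate that maximizes the first layer is chosen, with (0, 1) the one that maximizes the second layer, and intermediate weights can select a compromise between the two.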

It should be noted that, due to interlayer
prediction in SVC, the quality of a higher spatial layer depends on the
qualities, or more exactly on the bitrates, of lower spatial layers. That is,

As this framework is essentially a
resource allocation problem, it can be extended to cover temporal scalability
as long as we employ a quality metric that supports multidimensional adaptation
(e.g., [

Although the FGS data can be truncated finely,
truncation in practice is done in discrete steps (e.g., in units of 1 Kbps).
So the bitrates $R_i$

The principle
of the Viterbi algorithm lies in building a trellis to represent all viable
allocations at each instant, given all the predefined constraints. The basic
terms used in the algorithm are defined as follows (Figure

_{i}_{i}

_{i} at stage _{i}_{−1}) in the
previous stage (_{i}

_{i}

_{i}

_{i}

From the
above, we can see that the optimal path, corresponding to the optimal set of
selections, is the one having the highest weighted sum

Let _{i}_{1} branches growing to _{1} nodes of stage 1. The
number of branches will be _{1} if all values of _{1} are not greater than ^{c}_{2}
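The trellis search described above can be sketched as follows. This is a simplified illustration, not the exact algorithm of this paper: it assumes that each node is identified by its stage and its accumulated bitrate, and that the quality of a layer depends only on that layer's own selection, so the interlayer dependency noted earlier is ignored. The function name, the data layout, and the example values are all hypothetical.

```python
def viterbi_allocate(layers, weights, rate_limit):
    """Pick one (bitrate, quality) selection per spatial layer.

    layers: one list of (bitrate, quality) tuples per spatial layer (stage).
    Returns (best weighted sum, chosen bitrates), or None if no combination
    of selections fits within rate_limit.
    """
    # survivors maps accumulated bitrate -> (metric, path): the best path
    # that reaches each node of the current stage.
    survivors = {0: (0.0, [])}
    for selections, w in zip(layers, weights):
        next_stage = {}
        for acc, (metric, path) in survivors.items():
            for rate, quality in selections:
                new_acc = acc + rate
                if new_acc > rate_limit:
                    continue  # this branch violates the bitrate constraint
                cand = (metric + w * quality, path + [rate])
                # Keep only the best path into each node of the next stage.
                if new_acc not in next_stage or cand[0] > next_stage[new_acc][0]:
                    next_stage[new_acc] = cand
        survivors = next_stage
    if not survivors:
        return None
    return max(survivors.values(), key=lambda s: s[0])


# Hypothetical R-D points (Kbps, dB) for two spatial layers.
example_layers = [
    [(0, 30.0), (400, 33.0), (800, 35.0)],
    [(0, 28.0), (400, 31.0), (800, 33.0)],
]
```

For instance, `viterbi_allocate(example_layers, (0.5, 0.5), 800)` selects 400 Kbps for each layer, whereas weights (1, 0) give all 800 Kbps to the first layer.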

We see that the complexity of this solution depends on the number
of layers and on the number of selections per layer, which is determined by
the truncation step size. Officially,
an SVC bitstream can have up to 8 spatial layers.
However, to maintain good coding efficiency, an SVC bitstream in practice contains at most three
spatial layers (with different resolutions) [
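A rough operation count shows how the step size and the number of layers drive the workload. The sketch below assumes a trellis whose nodes are accumulated bitrates on a common step grid, which may differ from the exact trellis used in this paper, and it ignores the pruning caused by the bitrate constraint, so it gives an upper bound. All function names are hypothetical.

```python
def selections_per_layer(fgs_bitrate_kbps, step_kbps):
    """Number of truncation points per layer, including 'no FGS data'."""
    return fgs_bitrate_kbps // step_kbps + 1


def exhaustive_combinations(num_layers, k):
    """Combinations examined by a brute-force search over all layers."""
    return k ** num_layers


def trellis_branches(num_layers, k):
    """Upper bound on branches when nodes are accumulated bitrates."""
    nodes = 1   # single start node before the first stage
    total = 0
    for stage in range(1, num_layers + 1):
        total += nodes * k              # every node grows k branches
        nodes = stage * (k - 1) + 1     # distinct accumulated bitrates
    return total
```

Under these assumptions, three layers with five selections each need at most 75 branch evaluations in the trellis, against 125 combinations for a brute-force search, and the gap widens quickly as the step size shrinks.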

It should be noted that the solution provided by the above algorithm is
optimal for the “discretized” problem. However, as mentioned earlier, the
practical truncation is often based on a specific step size. From
our experience, a truncation equal to 1% of the total
FGS bitrate would not result in any perceptual difference. So,
practitioners would look for a solution of the discretized problem, rather than
the

Currently, the R-D information (i.e., _{i}_{i}

In this section, some experiments are
presented to show the flexibility and usefulness of our proposed framework. We
developed an SVC adaptation engine which consists of a decision engine and a
scaling engine (Figure _{i}

Test videos are encoded with the
JSVM 7.12 reference software. The results presented below are for the Football sequence,
encoded with 2 spatial layers, QCIF and CIF, both with a frame rate of 30 fps and
a GOP size of 16. Correspondingly, two users will consume this content as in the
scenario of Section

For ease of presentation and
discussion, the step size for FGS truncation is set to 400 Kbps, and the
quality is shown as a function of the amount of truncated bitrate. Each spatial layer is truncated
at four points, namely, 400, 800, 1200, and 1600 Kbps. Figures

R-D information of QCIF layer. The FGS truncation is applied to QCIF layer only.

R-D information of CIF layer. The FGS truncation is applied to both QCIF and CIF layers.

Now suppose that $w_1 = 0.33$
and $w_2 = 0.67$. These weight values give some balance
between the two spatial layers, as the PSNR value of the QCIF layer is often higher
than that of the CIF layer. The objective of truncation is then to maximize the
overall quality $0.33\,Q_1 + 0.67\,Q_2$,
where $Q_1$ and $Q_2$ are the qualities of the QCIF and CIF layers, respectively.
The optimal selections are represented by the solid path (denoted by

Illustration of different FGS truncation methods. Here FGS data in CIF and QCIF layers are truncated flexibly.

If $w_1 = 1$ and $w_2 = 0$,
this implies a top-down truncation that always maximizes the QCIF layer's quality. Obviously, the
selections in this case are represented by the dashed path (denoted as

If $w_1 = 0$ and $w_2 = 1$,
this implies a truncation that aims to maximize the CIF layer's quality. The
selections in this case are represented by the dash-dotted path (denoted as

Figure _{1} = 0.33 and _{2} = 0.67. In these figures,
the horizontal axis represents the total amount of truncated FGS data (in both
CIF and QCIF layers), and the vertical axis represents the PSNR values of each
spatial layer (QCIF in Figure

Comparison of three truncation methods: harmonized (with $w_1 = 0.33$, $w_2 = 0.67$), CIF-max, and QCIF-max. (a) QCIF layer; (b) CIF layer.

Now let $w_1 = 0.15$ and $w_2 = 0.85$, which implies an emphasis
on the CIF layer. The solution provided by the above algorithm corresponds to
the path of (400, 0),
(400, 400), (1200, 0), (1200, 400), (1200, 800), (1200, 1200), (1200, 1600), and (1600, 1600). Figure

Comparison of three truncation methods: harmonized (with $w_1 = 0.15$, $w_2 = 0.85$), CIF-max, and QCIF-max. (a) QCIF layer; (b) CIF layer.

When the weight values are equal ($w_1 = 0.5$
and $w_2 = 0.5$), the harmonized truncation of this given
bitstream turns out to be the same as the QCIF-max truncation. This is because
the PSNR value of the QCIF layer is often higher than that of the CIF layer
(as mentioned above), so the QCIF layer is always “emphasized” in
the truncation process. This means that the

Figures _{1} = 0.33, _{2} = 0.67) and (_{1} = 0.15,
_{2} = 0.85). The horizontal axis represents the total amount of
truncated FGS data, and the vertical axis represents the overall quality
computed by (

Overall quality of different truncation solutions ($w_1 = 0.33$ and $w_2 = 0.67$).

Overall quality of different truncation solutions ($w_1 = 0.15$ and $w_2 = 0.85$).

It should be noted that the PSNR
value in Figures

To check the
complexity of the algorithm, we measure the processing time of the algorithm
with different step sizes, namely, 1 Kbps, 2 Kbps, 5 Kbps, and 10 Kbps. The quality
values of new truncation selections are linearly interpolated from the previous
sample points obtained with the step size of 400 Kbps (which is similar to [
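The interpolation step can be sketched as below. The sample values are hypothetical (they are not the measured R-D points of the test bitstream), and the function names are assumptions made for illustration only.

```python
def interpolate_quality(samples, rate):
    """Linearly interpolate quality at `rate`.

    samples: (rate, quality) pairs sorted by increasing rate.
    """
    if len(samples) < 2:
        raise ValueError("at least two sample points are required")
    if not samples[0][0] <= rate <= samples[-1][0]:
        raise ValueError("rate outside the sampled range")
    for (r0, q0), (r1, q1) in zip(samples, samples[1:]):
        if r0 <= rate <= r1:
            return q0 + (q1 - q0) * (rate - r0) / (r1 - r0)


def resample(samples, step):
    """Generate (rate, quality) selections on a finer grid of `step` Kbps."""
    start, stop = samples[0][0], samples[-1][0]
    return [(r, interpolate_quality(samples, r))
            for r in range(start, stop + 1, step)]


# Hypothetical sample points (Kbps, dB PSNR) spaced 400 Kbps apart.
coarse_samples = [(0, 30.0), (400, 33.0), (800, 35.0),
                  (1200, 36.0), (1600, 36.5)]
```

Calling `resample(coarse_samples, 10)` would produce 161 selections for a layer, which can then be used as the input to the allocation algorithm at a 10 Kbps step size.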

Processing time with different step sizes (2-layer bitstream).

As the number
of spatial layers of an SVC bitstream is at most 3 in practice [

Processing time with different step sizes (3-layer bitstream).

Meanwhile, it should be noted that in practical video communication, the acceptable
processing delay can be up to 400 milliseconds for two-way applications and 10 seconds
for one-way applications [

Obviously,
for a bitstream of higher bitrate, the step size should be increased proportionally.
From the above example, we can see that even if the step size is just
0.5% or 1% of the total bitrate, the processing time of the Viterbi algorithm
is negligible. Moreover, from our previous
experience with subjective tests on video quality [

From the above, we can see that when there is any change in user requests or in the bitrate constraint, the optimization problem can be re-solved on the fly, and the adaptation will be seamless to the users. This means that our proposed framework can provide truncation flexibility with optimal results under any conditions of bitrate constraint and quality tradeoff between layers.

In this paper, we proposed a general framework to adapt an SVC bitstream through FGS truncation across multiple spatial layers. The framework is flexible in allocating the resource (i.e., bitrate) among spatial layers: the overall quality is defined as a function of all spatial layers' qualities and can be modified on the fly. The adaptation process was formulated as a constrained optimization problem and then solved optimally by the Viterbi algorithm. Through experiments, we also showed that the current approaches to FGS truncation are special cases of our general framework. In future work, we will consider perceptual quality metrics in our adaptation system and employ analytical models for R-D representation. The framework will also be extended to cover other constraints of heterogeneous environments, such as terminal capability and packet loss.

The authors would like to thank Dong Su Lee of ICU for his help in this work. This work was supported by the IT R&D program of MIC/IITA [2005-S-103-03, Development of Ubiquitous Content Access Technology for Convergence of Broadcasting and Communications] and by the 2nd Phase of the Brain Korea 21 project sponsored by the Ministry of Education and Human Resources Development (Seoul, South Korea).