A novel digital image stabilization technique is proposed
in this paper. It is based on a fuzzy Kalman compensation
of the global motion vector (GMV), which is estimated in the
log-polar plane. The GMV is extracted using four local motion
vectors (LMVs) computed on respective subimages in the log-polar
plane. The fuzzy Kalman system combines a fuzzy system
with the discrete time-invariant formulation of the Kalman filter. Owing
to this inherited recursiveness, the output is a smoothed
image sequence. The proposed stabilization system aims to
compensate for oscillations of the absolute frame positions, based
on motion estimation in the log-polar domain filtered by the
fuzzy Kalman system; thus the advantages of both the fuzzy
Kalman system and the log-polar transformation are exploited.
The described technique produces optimal results in terms of
output quality and level of compensation.
1. Introduction
Digital video stabilization is the process whereby the video signal is smoothed against
unwanted oscillations while the intentional camera movements are preserved. Almost
any acquired image sequence is affected by noise and undesired camera jitter,
caused by unstable holding and rough terrain. These unwanted positional
oscillations of the image sequence degrade the visual quality, which, beyond
aesthetics, is also crucial in many applications such as robot vision and
video compression. High visual quality enables humans or machines to
watch and perceive the sequence easily, so that meaningful results can be
extracted. Several image stabilization methods have been reported in
the literature, and they can be classified into three major categories. When the
unwanted fluctuations are mostly rotational and the stabilization is implemented
by servo motors that compensate for the pan and tilt camera movements, the
technique is known as active image stabilization [1]. Image
stabilization performed by electronic hardware is referred to as
electronic image stabilization [2]. Finally, when the unwanted oscillations are
compensated by pure image processing techniques, the process is called digital
image stabilization (DIS) [3]. A DIS system consists of two successive units: the
motion estimation unit and the motion compensation unit. The goal of the first unit
is to compute the motion vectors and, from them, the GMV. The compensation unit
follows the motion estimation and produces the vector that shifts the current
frame's position so that the output is free from irregularities while preserving
the desired global motion. An important factor affecting the performance of
DIS systems is the noise level: the lower the noise, the
smoother the results. The GMV has been computed by various
techniques, such as phase correlation matching [4] and normalized
cross-correlation [5].
A real-time DIS implementation that matches two successive
images by means of the Fourier-Mellin transform has been reported in [6]. In [7], the GMV estimation is
optimized by exploiting fuzzy logic. Kalman filtering has been
used to improve the compensation of frame positions [4, 8]. Apart from matching
techniques, optical flow techniques have been adopted to estimate the motion in a
sequence. In [9], the undesired motion effects are calculated by estimating the rotational
center and the angular frequency from the local translational motion, obtained
by fine-to-coarse multiresolution motion estimation. In [10], stabilization is
accomplished by fixating on the central image region, while optical flow
estimation refines this approximation. LMVs describe the motion in a
part of the image, resulting in a better estimation of the intended camera
movement and the undesired motion. A widely used technique is to compute the
GMV from a series of LMVs. The computational cost of full-frame search
algorithms motivated the calculation of the global motion on subimages, and
estimating the LMVs on these regions has reduced processing times substantially.
Transforming the image sequence into less computationally intensive topological
arrangements has further reduced processing time and computational
resources.
In this paper,
the Cartesian images are transformed into log-polar ones [11, 12], where the GMV is computed from four LMVs in
respective image regions. The resulting method achieves low processing times,
suitable for real-time implementation. Due to the intrinsic attentional nature
of the log-polar transformation, the motion estimation of the LMVs exhibits a
space-variant distribution. Moreover, a fuzzy Kalman DIS technique is proposed.
Kalman filters and fuzzy systems have been widely used in DIS applications, and
recursive fuzzy systems provide good results. Prior smoothing of the
displacements fed to the fuzzy system, by Kalman filtering or
another filter, has also improved the efficiency of fuzzy systems [7]. In this work, however, the
recursiveness of the Kalman filter is introduced directly into the fuzzy system,
instead of expressing it as a standard discrete time-invariant system. The
fuzzy inputs of the proposed system are expressed with the
prediction-correction equations of the Kalman filter. Therefore, the intended camera
movement, which happens mostly in the foreground, is preserved more efficiently.
The fuzzy Kalman filter is then applied to the estimated GMV. At each time step,
the estimated motion vector is the a priori
measurement, while the output of the system is the a posteriori one. Finally,
the correction is achieved through the previous measurements, which are used as
the estimated ones. The fuzzy system was tested with several types of
membership functions (MFs) and different aggregation and defuzzification
methods. The measured fluctuations were not filtered further. The use of
log-polar images for motion field extraction yielded fast and accurate
results, both for the stabilization of each frame and for the visual quality of the
video output, in all the tested situations. The whole operation exploits the
advantages of the log-polar plane and the fuzzy Kalman system.
2. Motion Estimation
The motion
estimation unit of the DIS system extracts the GMV. This unit distinguishes
between the desired and the unwanted motion effects; its key feature is the
accuracy of the intended camera motion estimation. Several motion estimation
approaches have been proposed, and the main categories are block
matching [13],
phase correlation [14], and optical flow [15].
2.1. Log-Polar Transformation
Motion estimation is extremely demanding in terms of computation and resources.
Subsampling of the images is often used to overcome this computational
load. A topological arrangement, and notably a space-variant one
such as the log-polar, reduces the volume of image data without
constraining the field of view or the image resolution at the fixation point.
The log-polar transformation is inspired by the projection of the retinal
plane onto the visual cortex in humans, and it originates in studies on the
vision mechanisms of mammals. The adoption of this topology in artificial
vision systems offers several advantages in visual attention, throughput
rate, and real-time processing. Many applications of the log-polar
transformation have been reported, such as time-to-impact estimation
[11], wavelet feature extraction
based on log-polar mapping [16], tracking [17], and disparity estimation and vergence control
[18].
The mathematical model of the log-polar mapping can be
expressed as a transformation between the polar plane $(\rho, \theta)$ (retinal plane), the log-polar plane $(\xi, \eta)$ (cortical plane), and the Cartesian plane $(x, y)$ (image plane), as shown in Figure 1. Assuming that $N_r$ is the
number of cells in the radial direction and $N_a$ is the number of cells
in the angular direction, the log-polar variables $\xi$ and $\eta$ are obtained from the polar
coordinates $(\rho, \theta)$ as

$$
\xi = \log_{\alpha}\!\left(\frac{\rho}{\rho_0}\right), \qquad \eta = \frac{N_a}{2\pi}\,\theta,
$$

where $\xi$ is the row index, $\eta$ is the column index, and $\rho_0$ is the radius of the fovea. The logarithmic
base $\alpha$ is obtained from the foveal radius, the image
radius $\rho_{\max}$, and the radial resolution $N_r$:

$$
\alpha^{N_r} = \frac{\rho_{\max}}{\rho_0} \quad\text{or}\quad \alpha = e^{(1/N_r)\ln(\rho_{\max}/\rho_0)}.
$$
The log-polar mapping transforms radial lines
and concentric circles into lines parallel to the coordinate axes.
The aforementioned mathematical formulation, applied to
the image in Figure 2(a), results in the log-polar image in Figure 2(b). In Figure 2(c), the reconstructed Cartesian
representation of the log-polar image is shown.
Figure 2. (a) Cartesian image, (b) log-polar image, and (c) Cartesian image reconstructed from the log-polar image.
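The mapping above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the optical centre is placed at the image centre, $\rho_{\max}$ is taken as the largest circle that fits in the frame, the helper names (`log_base`, `cartesian_to_logpolar`, `logpolar_to_cartesian`, `sample_logpolar`) are hypothetical, and nearest-neighbour sampling stands in for whatever interpolation the original system used. Rows of the output correspond to $\xi$ and columns to $\eta$; the fovea ($\rho < \rho_0$) is not sampled.

```python
import math

import numpy as np


def log_base(rho_max: float, rho0: float, n_r: int) -> float:
    """Logarithmic base alpha such that alpha**n_r == rho_max / rho0."""
    return math.exp(math.log(rho_max / rho0) / n_r)


def cartesian_to_logpolar(x, y, xc, yc, rho0, rho_max, n_r, n_a):
    """Map Cartesian coordinates to log-polar (xi, eta) about the centre (xc, yc)."""
    alpha = log_base(rho_max, rho0, n_r)
    dx = np.asarray(x, dtype=float) - xc
    dy = np.asarray(y, dtype=float) - yc
    rho = np.hypot(dx, dy)
    theta = np.mod(np.arctan2(dy, dx), 2.0 * np.pi)  # angle in [0, 2*pi)
    xi = np.log(rho / rho0) / math.log(alpha)        # xi = log_alpha(rho / rho0)
    eta = theta * n_a / (2.0 * np.pi)                # eta = N_a * theta / (2*pi)
    return xi, eta


def logpolar_to_cartesian(xi, eta, xc, yc, rho0, rho_max, n_r, n_a):
    """Inverse mapping, used both for sampling and for Cartesian reconstruction."""
    alpha = log_base(rho_max, rho0, n_r)
    rho = rho0 * alpha ** np.asarray(xi, dtype=float)
    theta = np.asarray(eta, dtype=float) * 2.0 * np.pi / n_a
    return xc + rho * np.cos(theta), yc + rho * np.sin(theta)


def sample_logpolar(img, rho0, n_r, n_a):
    """Build an (n_r x n_a) log-polar image by nearest-neighbour sampling."""
    h, w = img.shape
    xc, yc = (w - 1) / 2.0, (h - 1) / 2.0   # assumption: fixation at the image centre
    rho_max = min(xc, yc)                   # assumption: largest inscribed circle
    xi, eta = np.meshgrid(np.arange(n_r), np.arange(n_a), indexing="ij")
    x, y = logpolar_to_cartesian(xi, eta, xc, yc, rho0, rho_max, n_r, n_a)
    cols = np.clip(np.rint(x).astype(int), 0, w - 1)
    rows = np.clip(np.rint(y).astype(int), 0, h - 1)
    return img[rows, cols]
```

For example, `sample_logpolar(frame, 5.0, 64, 128)` returns a 64 × 128 array for a grayscale frame; the values of $\rho_0$, $N_r$, and $N_a$ used here are illustrative only.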
2.2. Motion Field Extraction
The image motion is the projection of the real-world 3D motion onto the two-dimensional
image plane. It is expressed as either image velocities or image
displacements along the x and y axes of the optical flow field. Optical flow techniques are divided into three main categories: differential techniques,
frequency-based techniques, and matching methods [15]. The chosen calculation
method is a differential one, namely the classical Horn and Schunck optical
flow model as modified in [19].
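For reference, the classical Horn and Schunck iteration can be sketched as below. This is the textbook scheme, not the modified version of [19] used in the paper: derivatives are taken with simple central differences on the average of the two frames, borders are handled by edge replication, and the function names and default parameters (`alpha`, `n_iter`) are assumptions of this sketch.

```python
import numpy as np


def _shift(a, dy, dx):
    """Return a[i + dy, j + dx] for every (i, j), replicating the border (|dy|, |dx| <= 1)."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]


def _local_mean(f):
    """Horn-Schunck neighbourhood average: weight 1/6 for 4-neighbours, 1/12 for diagonals."""
    edge = _shift(f, -1, 0) + _shift(f, 1, 0) + _shift(f, 0, -1) + _shift(f, 0, 1)
    diag = _shift(f, -1, -1) + _shift(f, -1, 1) + _shift(f, 1, -1) + _shift(f, 1, 1)
    return edge / 6.0 + diag / 12.0


def horn_schunck(im1, im2, alpha=0.5, n_iter=200):
    """Estimate the dense flow (u, v) from im1 to im2 with the classical Horn-Schunck scheme."""
    im1 = np.asarray(im1, dtype=float)
    im2 = np.asarray(im2, dtype=float)
    avg = 0.5 * (im1 + im2)
    iy, ix = np.gradient(avg)          # derivatives along rows (y) and columns (x)
    it = im2 - im1                     # temporal derivative
    denom = alpha ** 2 + ix ** 2 + iy ** 2
    u = np.zeros_like(avg)
    v = np.zeros_like(avg)
    for _ in range(n_iter):
        u_bar, v_bar = _local_mean(u), _local_mean(v)
        # Common factor of the Horn-Schunck update equations.
        t = (ix * u_bar + iy * v_bar + it) / denom
        u = u_bar - ix * t
        v = v_bar - iy * t
    return u, v
```

A larger `alpha` favours a smoother flow field at the cost of slower convergence, which is the usual trade-off of this regularized formulation.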
In order to
reduce the computational load of motion estimation, the horizontal and
vertical displacements are computed on selected image regions located at
the periphery of the image. On the Cartesian plane, these regions are rectangles
of 440×100 pixels and 100×280 pixels, respectively, as shown in Figure 3(a). However, the LMVs were calculated on the log-polar plane, where the respective patches have an arch-like shape and comprise
7353 pixels and 1893 pixels, respectively, as shown in Figure 3(b).
Figure 3. The selected subimages for motion estimation in (a) the Cartesian plane and (b) the log-polar plane.
However, motion estimation on the log-polar plane has
some special features that should be taken into account: the motion
vectors are not transferred straightforwardly from the Cartesian to the
log-polar plane, owing to the fictitious gray-value curvature introduced in the
polar image [12].
Once the LMVs are estimated, the GMV is taken as the average of the four LMVs, as
this provided better results for the tested image sequences. The displacements
are then fed into the fuzzy Kalman system without further processing.
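The averaging step can be sketched as follows, assuming a dense flow field `(u, v)` is already available (for instance from an optical flow routine) and that each LMV is the mean flow inside its region. For readability the four regions are given as Cartesian rectangles with the sizes quoted above on a 640 × 480 frame; their exact placement, the dictionary `REGIONS`, and the function names are hypothetical, and in the paper the same averaging is applied to the arch-shaped log-polar patches instead.

```python
import numpy as np

# Hypothetical placement of the four peripheral regions on a 640x480 frame,
# given as (row slice, column slice). Sizes follow the text (440x100 and
# 100x280 pixels); the offsets are an assumption of this sketch.
REGIONS = {
    "top": (slice(0, 100), slice(100, 540)),
    "bottom": (slice(380, 480), slice(100, 540)),
    "left": (slice(100, 380), slice(0, 100)),
    "right": (slice(100, 380), slice(540, 640)),
}


def local_motion_vectors(u, v, regions=REGIONS):
    """LMV of each region, taken here as the mean horizontal and vertical flow inside it."""
    return {name: (float(u[r].mean()), float(v[r].mean())) for name, r in regions.items()}


def global_motion_vector(lmvs):
    """GMV as the plain average of the LMVs."""
    arr = np.array(list(lmvs.values()), dtype=float)
    return tuple(float(c) for c in arr.mean(axis=0))
```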
3. Fuzzy Kalman System
The
prediction-correction recursive equations of the Kalman filter were employed
to define the fuzzy inputs. The ground truth values of the fuzzy
Kalman system are the displacements obtained by the optical flow technique
in the motion estimation phase. The fuzzy Kalman system is depicted in
Figure 4, and its equations are defined as follows.
Figure 4. The block diagram of the fuzzy Kalman filter.
Prediction:

$$
P_k^- = A P_{k-1} A^T + Q, \qquad \hat{x}_k^- = A \hat{x}_{k-1} + B u_k.
$$

Correction:

$$
\mathrm{Input1}_k = z_{k-1} - \hat{x}_k^- \tag{4}
$$

$$
\mathrm{Input2}_k = \mathrm{Input1}_k - \mathrm{Input1}_{k-1} \tag{5}
$$

$$
K_k = P_k^- H^T \left(H P_k^- H^T + R\right)^{-1}, \qquad P_k = \left(I - K_k H\right) P_k^-, \qquad \hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H \hat{x}_k^-\right),
$$

where $k$ is the time index, $z_k$ is the measurement value at the current time
step, $\hat{x}_k^-$ is the a priori estimate of the frame
positions, and $\hat{x}_k$ is the a posteriori estimate of the frame
positions. $P_k^-$ and $P_k$ are, respectively, the a priori and the a
posteriori error covariance matrices. The first input (4) is defined as the
difference between the absolute frame translation and the a priori estimate
of the stabilized frame position. The second input (5) indicates the rate of
change of the first input at the current time step. The measurement value at
each time index represents the frame's translation. The tuning variables $Q$ and $R$, for the process and the measurement noise, respectively, are set
to a ratio of $R/Q = 10$. A lower ratio yields quicker
responses, but the final output is not smooth enough, as the final frame
positions remain close to the measured ones. High $R$ values lead to slow
responses, but the high frequencies are cut off, providing a smooth output. To
provide a fast response, the ratio was set to 10, although ratios of
100 and higher introduced less error into the final output.
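A scalar sketch of these recursions, for one axis of the GMV, is given below. It assumes $A = H = 1$ and $B u_k = 0$ (no control input), and it initializes $z_{-1} = z_0$ and $\mathrm{Input1}_{-1} = 0$; these choices, the initial values `x0` and `p0`, and the function name are assumptions. The correction step shown is the plain Kalman update, so the sketch only illustrates how Input1 and Input2 are produced and how the $R/Q$ ratio trades responsiveness for smoothness; the fuzzy stage that consumes the two inputs is not included.

```python
import numpy as np


def kalman_fuzzy_inputs(z, q=1.0, r=10.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter (A = H = 1, B*u_k = 0) that also returns the fuzzy inputs.

    Input1_k = z_{k-1} - x_k^-   and   Input2_k = Input1_k - Input1_{k-1}.
    """
    z = np.asarray(z, dtype=float)
    x_post, p_post = x0, p0
    z_prev, in1_prev = z[0], 0.0       # assumption: z_{-1} := z_0, Input1_{-1} := 0
    x_out, in1, in2 = [], [], []
    for zk in z:
        # Prediction
        x_prior = x_post
        p_prior = p_post + q
        # Fuzzy inputs, equations (4) and (5)
        i1 = z_prev - x_prior
        in1.append(i1)
        in2.append(i1 - in1_prev)
        # Correction (plain Kalman update)
        k = p_prior / (p_prior + r)
        p_post = (1.0 - k) * p_prior
        x_post = x_prior + k * (zk - x_prior)
        x_out.append(x_post)
        z_prev, in1_prev = zk, i1
    return np.array(x_out), np.array(in1), np.array(in2)
```

With a jittery ramp such as `z = 0.1 * k + 2 * (-1) ** k`, raising `r` from 10 to 100 (with `q = 1`) makes the filtered positions smoother but slower to follow the ramp, mirroring the trade-off described above.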
The key features in the design of a fuzzy system
are the shape of the MFs and the decision rules. In the proposed system, five
MFs are used for each input and the output, as they are efficient for the desired
task. The construction of the fuzzy rules depends on the experience of the
designer and on the application. In our task, the MFs had to cover
the whole range so that the final output is smooth enough. The options are therefore
to distribute the MFs evenly over their range or to add more MFs. More MFs
lead to more fuzzy rules and, consequently, to higher complexity. The selection
of the MF type is also crucial for the construction of a fuzzy system.
The tested MF types are Gaussian, trapezoidal, and triangular. In
all experiments, all the variables (inputs and output) had the same type of
MF. The MFs of the two inputs and the output are evenly distributed over their range in
order to obtain, as mentioned, a smooth output. All the variables represent
frame translations and are defined on [−8, 8] pixels, as 8 pixels was the
maximum absolute translation on both the horizontal and the vertical axis. The
sign indicates the direction of the movement, that is, left or right and up or
down. The rule base is given in Table 1, and Figure 5 illustrates
the fuzzy system for the Gaussian MFs. The choice of the implication, aggregation,
and defuzzification methods also plays an important role. In the proposed system, the implication was set to
product and the aggregation method to sum, as this provided a smoother output
value. The defuzzification method was set to centroid, as it covers the output
range more efficiently.
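Table 1. Rule base for the fuzzy Kalman system (rows: Input1; columns: Input2; entries: output).

| Input1 \ Input2 | NB | N | Z | P | PB |
|---|---|---|---|---|---|
| NB | NB | N | N | Z | Z |
| N | N | N | Z | Z | P |
| Z | N | Z | Z | Z | P |
| P | N | Z | P | P | P |
| PB | Z | Z | P | P | PB |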
Figure 5. The membership functions of the fuzzy Kalman filter for Gaussian MFs.
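The inference step can be sketched as below with the triangular MFs that performed best. The sketch uses five MFs centred evenly at −8, −4, 0, 4, and 8 pixels with a half-width of 4 pixels, the rule base of Table 1, product implication, sum aggregation, and centroid defuzzification over a sampled output universe. The MF parameters, the use of min as the AND operator for the two antecedents, the clipping of inputs to [−8, 8], and all names are assumptions, since the paper does not specify them; how the output is fed back into the Kalman recursion is also left out.

```python
import numpy as np

LABELS = ["NB", "N", "Z", "P", "PB"]
CENTERS = np.linspace(-8.0, 8.0, 5)   # evenly spaced over [-8, 8] pixels
HALF_WIDTH = 4.0                      # assumption: neighbouring MFs cross at 0.5

# Table 1: RULES[i][j] is the output label for Input1 = LABELS[i], Input2 = LABELS[j].
RULES = [
    ["NB", "N", "N", "Z", "Z"],
    ["N", "N", "Z", "Z", "P"],
    ["N", "Z", "Z", "Z", "P"],
    ["N", "Z", "P", "P", "P"],
    ["Z", "Z", "P", "P", "PB"],
]


def tri(x, c, hw=HALF_WIDTH):
    """Triangular membership with peak 1 at c and support (c - hw, c + hw)."""
    return np.maximum(0.0, 1.0 - np.abs(x - c) / hw)


def fuzzy_output(in1, in2, n_points=321):
    """Product implication, sum aggregation, centroid defuzzification."""
    a = float(np.clip(in1, -8.0, 8.0))
    b = float(np.clip(in2, -8.0, 8.0))
    universe = np.linspace(-8.0, 8.0, n_points)
    mu1, mu2 = tri(a, CENTERS), tri(b, CENTERS)   # degree of each input in each MF
    aggregated = np.zeros_like(universe)
    for i in range(5):
        for j in range(5):
            w = min(mu1[i], mu2[j])               # rule firing strength (AND = min)
            if w <= 0.0:
                continue
            out_c = CENTERS[LABELS.index(RULES[i][j])]
            aggregated += w * tri(universe, out_c)  # scale the consequent, then sum
    total = aggregated.sum()
    if total == 0.0:
        return 0.0
    return float(np.sum(universe * aggregated) / total)  # centroid
```

For instance, `fuzzy_output(4.0, 0.0)` fires only the rule (P, Z) → P and returns 4.0, the centre of the P output MF.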
4. Experimental Results
In order to evaluate the performance of the proposed
system, we performed several tests. These include stabilization
experiments on sequences captured by an active stereo vision head. The size of the acquired
sequences is 640 × 480 pixels. Some of the test input videos
were acquired while an active image stabilization routine was running. All of
these sequences suffer from high-frequency image jitter, produced
intentionally by the user for testing purposes. They also suffer from strong
illumination changes as well as from fluctuations caused by the servo motors.
Further experiments were made by capturing video along a free course. These
sequences contain motion-blurred frames, which a higher frame rate would
remedy. As the videos were acquired at 25 fps, the fast
oscillatory movements during the course caused a substantial loss of
information. The purpose of capturing such noisy and shaky sequences is to assess
the proposed fuzzy Kalman system under complicated and challenging
circumstances.
To compare the efficiency of our system, the
stabilization was assessed in four different combinations of image topology
as follows:
Cartesian image, full frame;
Log-polar image, full frame;
Cartesian image, subimages;
Log-polar image, subimages.
In Cartesian images, the use of LMVs yielded lower MSE and LSE than
full-frame estimation. Table 2 summarizes the
comparative results. To measure the performance of the proposed
stabilization, the mean square error (MSE), the least square error (LSE), and
the least mean square error (LMSE) were calculated. Since all the values are
known, these errors are defined as

$$
\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(f_i - z_i\right)^2, \qquad
\mathrm{LSE} = \min\left(f_i - z_i\right)^2, \qquad
\mathrm{LMSE} = \min\,\frac{1}{m}\sum_{i=1}^{m}\left(f_i - z_i\right)^2,
$$

where $f_i$ is the final stabilized frame position, $z_i$ is the one measured in the motion estimation phase at time index $i$, and $m$ is the number of frames. The GMV extraction via LMVs in the
log-polar plane provided the smoothest output, with the lowest values of all three errors, and the fuzzy Kalman system responded
best with triangular MFs. The visual results of the fuzzy Kalman system
are shown in Figure 6, while Figure 7 shows the initial and final frame
translations for all the tested cases, confirming that estimating the GMV in the
log-polar plane provides better performance.
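The MSE, and one reading of the LSE, can be computed as in the sketch below. Interpreting the LSE as the smallest per-frame squared error $\min_i (f_i - z_i)^2$ is an assumption, since the text only writes it as a minimum; the LMSE is omitted because the quantity being minimized is not specified. The function name is hypothetical.

```python
import numpy as np


def stabilization_errors(f, z):
    """Return (MSE, LSE) between stabilized positions f and measured positions z.

    LSE is interpreted here (assumption) as the minimum per-frame squared error.
    """
    f = np.asarray(f, dtype=float)
    z = np.asarray(z, dtype=float)
    if f.shape != z.shape or f.size == 0:
        raise ValueError("f and z must be non-empty and of equal length")
    sq = (f - z) ** 2                  # per-frame squared error (f_i - z_i)^2
    return float(sq.mean()), float(sq.min())
```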
Table 2. Error calculation for the tested image topologies for motion estimation.

| Error | Cartesian, full frame | Log-polar, full frame | Cartesian, subimages | Log-polar, subimages |
|---|---|---|---|---|
| MSE | 2.311 | 1.872 | 2.151 | 0.751 |
| LSE | 0.0008 | 0.0000124 | 0.000006720 | 0.00000013 |
| LMSE | 1.23 | 1.68 | 1.51 | 0.522 |
Figure 6. Two consecutive frames: the unstabilized frames are shown in the right column and the compensated ones in the left column.
Figure 7. The absolute frame positions before and after stabilization for all the tested topologies.
Furthermore, these errors were also calculated to compare the
different types of MFs. Table 3 presents the comparative results for all the tested MFs.
Figure 8 and Table 3 show that the triangular MFs provide a
smoother output, as they exhibit the lowest values in all the error measures.
Table 3. Error calculation for the tested MFs.

| Error | Gaussian | Trapezoidal | Triangular |
|---|---|---|---|
| MSE | 0.79777721 | 0.83456986 | 0.751706518 |
| LSE | 0.00000035 | 0.00000226 | 0.00000013 |
| LMSE | 0.60190373 | 0.66448906 | 0.52190102 |
Figure 8. The red line shows each frame position before stabilization; the blue, green, and black lines show the stabilized positions for Gaussian, trapezoidal, and triangular MFs, respectively.
5. Conclusion
An image stabilization technique based on a fuzzy Kalman system was proposed. The
fuzzy Kalman system processes the GMV, which is computed in the log-polar plane.
The system provided a smoothly compensated output in all the tested image
sequences. For the proposed fuzzy system, the triangular MFs produced the
lowest errors. The use of log-polar images, along with the recursiveness of the
Kalman filter, led to an optimum system, which not only stabilizes
fluctuations but also filters the noise during the process. To conclude,
log-polar images are ideal for image stabilization, as they yield lower errors.
The proposed fuzzy Kalman system is a valuable and efficient tool for image
stabilization.
Acknowledgment
This work is partially supported by the EC research Project “ACROBOTER”
FP6-IST-2006-045530.
References
[1] F. Panerai, G. Metta, and G. Sandini, "Learning visual stabilization reflexes in robots with moving eyes," vol. 48, no. 1–4, pp. 323–337, 2002, doi:10.1016/S0925-2312(01)00645-2.
[2] C. Morimoto and R. Chellappa, "Fast electronic digital image stabilization for off-road navigation," vol. 2, no. 5, pp. 285–296, 1996, doi:10.1006/rtim.1996.0030.
[3] L. Xu and X. Lin, "Digital image stabilization based on circular block matching," vol. 52, no. 2, pp. 566–574, 2006, doi:10.1109/TCE.2006.1649681.
[4] O. Kwon, J. Shin, and J. Paik, "Video stabilization using Kalman filter and phase correlation matching," in Proceedings of the 2nd International Conference on Image Analysis and Recognition (ICIAR '05), Toronto, Canada, September 2005, Lecture Notes in Computer Science, vol. 3656, pp. 141–148, doi:10.1007/11559573_18.
[5] S.-C. Hsu, S.-F. Liang, and C.-T. Lin, "A robust digital image stabilization technique based on inverse triangle method and background detection," vol. 51, no. 2, pp. 335–345, 2005, doi:10.1109/TCE.2005.1467968.
[6] J. R. Martinez-de Dios and A. Ollero, "A real-time image stabilization system based on Fourier-Mellin transform," in Proceedings of the International Conference on Image Analysis and Recognition (ICIAR '04), Porto, Portugal, September 2004, Lecture Notes in Computer Science, vol. 3211, pp. 376–383.
[7] M. K. Güllü and S. Ertürk, "Membership function adaptive fuzzy filter for image sequence stabilization," vol. 50, no. 1, pp. 1–7, 2004, doi:10.1109/TCE.2004.1277834.
[8] S. Ertürk, "Real-time digital image stabilization using Kalman filters," vol. 8, no. 4, pp. 317–328, 2002, doi:10.1006/rtim.2001.0278.
[9] J.-Y. Suk, G.-W. Lee, and K.-I. Lee, "New electronic digital image stabilization algorithm in wavelet transform domain," in Proceedings of the International Conference on Computational Intelligence and Security (CIS '05), Xi'an, China, December 2005, Lecture Notes in Computer Science, vol. 3802, pp. 911–916, doi:10.1007/11596981_134.
[10] K. Pauwels, M. Lappe, and M. M. Van Hulle, "Fixation as a mechanism for stabilization of short image sequences," vol. 72, no. 1, pp. 67–78, 2007, doi:10.1007/s11263-006-8893-6.
[11] M. Tistarelli and G. Sandini, "On the advantages of polar and log-polar mapping for direct estimation of time-to-impact from optical flow," vol. 15, no. 4, pp. 401–410, 1993, doi:10.1109/34.206959.
[12] K. Daniilidis and V. Kruger, "Optical flow computation in the log-polar plane," in Proceedings of the 6th International Conference on Computer Analysis of Images and Patterns (CAIP '95), Prague, Czech Republic, September 1995, pp. 65–72, doi:10.1007/3-540-60268-2_281.
[13] J. S. Jin, Z. Zhu, and G. Xu, "A stable vision system for moving vehicles," vol. 1, no. 1, pp. 32–39, 2000, doi:10.1109/6979.869019.
[14] H. Foroosh, J. B. Zerubia, and M. Berthod, "Extension of phase correlation to subpixel registration," vol. 11, no. 3, pp. 188–200, 2002, doi:10.1109/83.988953.
[15] J. L. Barron, D. J. Fleet, and S. S. Beauchemin, "Performance of optical flow techniques," vol. 12, no. 1, pp. 43–77, 1994, doi:10.1007/BF01420984.
[16] C.-M. Pun and M.-C. Lee, "Log-polar wavelet energy signatures for rotation and scale invariant texture classification," vol. 25, no. 5, pp. 590–603, 2003, doi:10.1109/TPAMI.2003.1195993.
[17] G. Metta, A. Gasteratos, and G. Sandini, "Learning to track colored objects with log-polar vision," vol. 14, no. 9, pp. 989–1006, 2004, doi:10.1016/j.mechatronics.2004.05.003.
[18] R. Manzotti, A. Gasteratos, G. Metta, and G. Sandini, "Disparity estimation on log-polar images and vergence control," vol. 83, no. 2, pp. 97–117, 2001, doi:10.1006/cviu.2001.0924.
[19] Y. Lei, L. Jinzong, and L. Dongdong, "Discontinuity-preserving optical flow algorithm," vol. 18, no. 2, pp. 347–354, 2007, doi:10.1016/S1004-4132(07)60097-8.