Symmetric Kullback-Leibler Metric Based Tracking Behaviors for Bioinspired Robotic Eyes

A symmetric Kullback-Leibler metric based tracking system, capable of tracking moving targets, is presented for a bionic spherical parallel mechanism; it minimizes a tracking error function to simulate the smooth pursuit of human eyes. More specifically, we propose a real-time moving target tracking algorithm that uses spatial histograms compared by means of the symmetric Kullback-Leibler metric. In the proposed algorithm, the key spatial histograms are extracted and fed into a particle filtering framework. Once the target is identified, an image-based control scheme drives the bionic spherical parallel mechanism so that the identified target is kept at the center of the captured images. Meanwhile, the robot motion information is fed forward to develop an adaptive smooth tracking controller inspired by the Vestibuloocular Reflex mechanism. The proposed tracking system is designed to let the robot track dynamic objects while it travels over traversable terrain, especially in bumpy environments. Experiments carried out under violent attitude variation in such a bumpy environment demonstrate the bumpy-resist capability, effectiveness, and robustness of our bioinspired tracking system, which uses a bionic spherical parallel mechanism inspired by head-eye coordination.


Introduction
Robot vision systems are crucial for mobile robots to recognize and acquire information about their surroundings. Target tracking, target recognition, surrounding perception, robotic localization, and attitude estimation are among the most popular topics in robotics. Target tracking in particular has emerged as a significant aspect of Human Robot Interaction (HRI), camera Motion-Disturbance Compensation (MDC), and tracking stabilization.
Robot motion information is commonly used to stabilize the camera and to compensate for small rotations or movements of the camera. Such systems use inertial sensors and visual cues to compute the motion of the camera. Jung and Sukhatme [1] developed a Kanade-Lucas-Tomasi (KLT) based motion tracking system for a moving target using a single camera on a mobile robot. Hwangbo et al. [2,3] also developed a gyro-aided KLT feature tracking method that remained robust under fast camera-ego rotation conditions. Park et al. [4] proposed an Extended Kalman Filter (EKF) based motion data fusion scheme for visual object tracking by autonomous vehicles. Jia et al. [5] also proposed a scheme that jointly uses visual features and the vehicle's inertial measurements for visual object identification and tracking. Hol et al. [6] used a multirate EKF that fuses measurements from inertial sensors (accelerometers and rate gyroscopes) and vision to estimate and predict the position and orientation (pose) of a camera for robust real-time tracking.
Recently, biomimetic systems have been extensively investigated by adopting the movement mechanics of the human eye. Advances in the neurophysiology of the eyeball provide a large amount of data and a theoretical foundation for building control models of eye movement. Among the several types of eye movements, smooth tracking and gaze stabilization play a fundamental role. Lenz et al. [7] developed an adaptive gaze stabilization controller inspired by the Vestibuloocular Reflex (VOR). It integrated inertial and visual information to drive the eyes in the direction opposite to head movement and thereby stabilized the image on the retina under dynamic changes. Shibata's biological oculomotor systems [8] used the human eye's VOR and Optokinetic Reflex (OKR) to improve the gaze stabilization of a vision system. A chameleon-inspired binocular "negative correlation" visual system (CIBNCVS) with neck [9] was designed to achieve swift and accurate positioning and tracking. Avni et al. [10] also presented a biologically motivated approach to tracking with independent cameras inspired by the chameleon-like visual system. Law et al. [11] described a biologically constrained architecture for developmental learning of eye-head gaze control on an iCub robot. Xie et al. [12] proposed a biomimetic control strategy for an on-board pan-tilt-zoom camera to stabilize visual tracking from a helicopter, based on the physiological neural path of eye movement control. Vannucci et al. [13] established an adaptive model for robotic control able to perform visual pursuit with prediction of the target motion. Falotico et al. [14] employed a "catch-up" saccade model to fixate the object of interest in the case of moving targets in order to obtain a human-like tracking system. Compared with classical control methods, a bionic controller lets the robot adapt easily to traversable terrain and track moving targets stably.
Inspired by this work, we tackle the problem of tracking turbulence when the robot travels over bumpy terrain; that is, we give the tracking system a bumpy-resist capability.
Furthermore, with advances in the anatomy of the human eye, the movement mechanics of the human eye have aroused much interest in bionic engineering. The humanoid robot James [15,16] was equipped with two artificial eyes, which can pan and tilt independently (4 DOFs in total). The iCub [17,18] also had two artificial eyes with 3 DOFs, offering viewing and tracking motions. Wang et al. [19] devised a novel humanoid robot eye, which is driven by six pneumatic artificial muscles (PAMs) and rotates with 3 DOFs. Bioinspired actuators and mechanisms have been proposed to pan and tilt a camera with characteristics comparable to those of a human eye [20,21]. A tendon-driven robot eye [22] was presented whose mechanical design is based on the geometry of the eye and of its actuation system, which underlie the implementation of Listing's law. Gu et al. [23] presented an artificial eye implant with shape memory alloys (SMAs) driven by a small servomotor. A miniature artificial compound eye called the curved artificial compound eye (CurvACE) [24] was endowed with micromovements similar to those occurring in the fly's compound eye.
Many bionic eyes have been presented, as mentioned above. Among the possible designs, the spherical parallel mechanism (SPM) has a compact structure, excellent dynamic performance, and high accuracy; in addition, a 3-DOF SPM is in line with the structural design of the bionic eye. For this reason, 3-DOF SPMs have attracted a considerable amount of interest, and a large number of 3-DOF SPM bionic eyes have been proposed. An artificial eye [25,26] for humanoid robots has been devised to be small in size and weight as well as to imitate the high dynamic movements of the human eye. The "Agile Eye" [27] is a high-performance parallel mechanism which is capable of orienting a camera-mounted end effector within a workspace larger than that of a human eye and with velocities and accelerations larger than those of the human eye. Bang et al. [28] designed a 3-DOF anthropomorphic oculomotor system to match the performance capabilities of the human eye. Our mechanism platform is inspired by these excellent works and plays a vital role in tracking dynamic objects.
Tracking a dynamic object while a robot performs its normal motion is common in applications. To keep smoothly tracking moving objects, in this paper we develop a bioinspired tracking system that can be used when the robot works in a bumpy environment or under dynamic disturbance. With active robot vision, an image-based feedback tracking system is presented for our bionic SPM to minimize the tracking error; it is capable of tracking a moving target when the robot moves across a bumpy environment. More specifically, we propose a real-time moving target tracking algorithm which utilizes spatial histograms and the symmetric Kullback-Leibler (SKL) metric, integrated in a particle filtering framework, to achieve automatic moving target tracking and gaze stabilization. In the proposed algorithm, the key spatial histograms are extracted and fed into the particle filtering framework. An image-based feedback control scheme is implemented to drive the bionic SPM so that the identified target is kept at the center of the captured images. Meanwhile, the robot motion information is fed forward to develop an adaptive smooth tracking controller bioinspired by the VOR mechanism. To evaluate performance, we test our vision stability system under violent attitude variation when the robot works in a bumpy environment.
From a robotics point of view, our system is biologically inspired. While smooth tracking is employed to create a consistent perception of the surrounding world, the interaction with environment is also used to adjust the control model involved in the smooth tracking generation. Action and perception are tightly coupled in a bidirectional way: perception triggers an action and the output of action changes the perception. Meanwhile, the robot motion information is fed forward, inspired by the VOR mechanism, to stabilize smooth tracking.
The paper is organized as follows. Section 2 introduces bionic issues and design of our bionic SPM. Section 3 proposes visual tracking based on symmetric Kullback-Leibler metric spatiograms. Our bionic eye plant control system is described in Section 4. Experimental results are shown in Section 5. Section 6 presents our conclusion.

Human-Eye's Movement Mechanism.
Each eye is controlled by three complementary pairs of extraocular muscles, as shown in Figure 1(a). The movement of each eye involves rotating the globe of the eye in the socket of the skull. Because translation during its movement is minimal, the eye can be regarded as a spherical joint with an orientation defined by three axes of rotation (horizontal, vertical, and torsional). Accordingly, in our implementation and in the development of a simulator, we treat eye movement as having no translation for simplicity. The medial rectus turns the eye inward and the lateral rectus turns it outward; together they form a pair that controls the horizontal position of the eye. In contrast to the pair of medial rectus and lateral rectus, the actions of the other two pairs of muscles are more complex. When the eye is centered in the orbit, the primary effect of the superior and inferior rectus is to rotate the eye up or down. However, when the eye is deviated horizontally in the orbit, these muscles also contribute to torsion, the rotation of the eye around the line of sight that determines the orientation of images on the retina. The primary effect of the superior and inferior obliques is to turn the eye downward and upward when the eye does not deviate from the horizontal position, as do the superior rectus and inferior rectus. In addition, these muscles also determine the vertical orientation of the eye.
Smooth pursuit eye movements slowly rotate the eyes to compensate for any motion of the visual target and thus act to minimize the blurring of the target's retinal image that would otherwise occur. We implement smooth target tracking, continuously adjusted by visual feedback about the target's image (retinal image).
The kinematic characteristics of the SPM and the mechanics of eye movements are very similar [29]. Both perform a 3-DOF spherical movement about the center of the sphere. The SPM also has a compact structure, excellent dynamic performance, and high accuracy, so a 3-DOF SPM is in line with the structural design of the bionic eye and can replicate the eye movement.
The eyeball can be seen as a sphere that rotates about a rotation center. Inspired by the mechanics of eye movements and active robotic vision, we present a new bionic eye prototype based on the SPM, which forms an eye-in-hand system as shown in Figure 1. Because the eye is free to rotate in three dimensions, the eyeballs can keep retinal images stable in the fovea when they track a moving target. In our work, we propose two main structural requirements inspired by the human eye: (1) the camera center must be located at the center of the "eyeball" to ensure that the angle between the image planes at two different positions is identical to the rotation of the "eyeball"; (2) during eye movement, any mechanical component other than the "eyeball" should, as far as possible, not extend beyond the plane through the center of the sphere, so that when the "eyeball" moves, the components neither block the sight of the camera nor interfere with the robot face.

Oculomotor Plant Compensation of VOR.
In the human-eye VOR, a signal from the vestibular system related to head velocity, which is encoded by the semicircular ducts, is used to drive the eyes in the direction opposite to the head movement. The VOR operates in feedforward mode and as such requires calibration to ensure accurate nulling of head movement. The simplicity of this "three-neuron arc," together with the relatively straightforward mechanics of the eye plant, has long made the VOR an attractive model for experimental and computational neuroscientists seeking to understand cerebellar function. To abolish image motion across the retina, the vestibular signal must be processed by neural circuitry which compensates for the mechanical properties of the oculomotor plant. The VOR is therefore a particular example of motor plant compensation. Horizontal, vertical, angular, and linear head movements activate the appropriate combinations of the six extraocular muscles in three dimensions.

Kinematics of Human-Eye-Inspired PTZ Platform.
The spherical parallel mechanism consists of an upper layer and a base, connected by three pairs of identical kinematic subchains as shown in Figure 2. In each chain, there is one fixed revolute joint and two free revolute joints, the latter connecting the proximal link to the distal link and the distal link to the upper layer, respectively. The axes of all revolute joints intersect at a common point, which is referred to as the rotational center. The plane passing through this point and parallel to the base is called the sphere center plane, which can also be seen as Listing's plane of the eyeball. The mechanism is parameterized by the structural angles of the lower link and the upper link, the half-cone angles of the upper platform and the base, and the structural torsion angle between the initial states of the upper platform and the base, namely, the initial torsion angle. Figure 2 demonstrates the kinematics of our SPM platform; the kinematic equation of the SPM is given in [30].

Spatial Histograms: Spatiogram.
The color histogram is one of the most common target models; it is simply a statistic of the proportions of different colors in the entire picture, without regard to the spatial location of each color. Therefore, it is insensitive to rotation and suitable for modeling nonrigid or deformation-prone target objects. However, targets based on this model are vulnerable to backgrounds with a similar color distribution or to other interference, which causes target tracking to fail. In this paper, we improve the particle filter algorithm with a new target model, the spatiogram [31], which adds pixel coordinate information to the traditional color histogram, and we use the SKL metric. The second-order spatiogram can be described as follows:
$$h(b) = \{n_b, \boldsymbol{\mu}_b, \Sigma_b\}, \quad b = 1, \ldots, B,$$
where $B$ is the total number of intervals and $n_b$, $\boldsymbol{\mu}_b$, and $\Sigma_b$ are the probability of each interval, the coordinate mean, and the covariance matrix, respectively. They can be calculated as follows:
$$n_b = \frac{1}{N}\sum_{j=1}^{N}\delta_{jb}, \qquad \boldsymbol{\mu}_b = \frac{1}{N n_b}\sum_{j=1}^{N}\mathbf{x}_j\,\delta_{jb}, \qquad \Sigma_b = \frac{1}{N n_b}\sum_{j=1}^{N}(\mathbf{x}_j - \boldsymbol{\mu}_b)(\mathbf{x}_j - \boldsymbol{\mu}_b)^T\,\delta_{jb},$$
where $N$ is the total number of pixels within the target area, $\mathbf{x}_j = [x_j, y_j]^T$ is the coordinate position of the $j$th pixel, and $\delta_{jb} = 1$ denotes that the $j$th pixel is quantized to the $b$th interval, while $\delta_{jb} = 0$ indicates that the $j$th pixel is quantized to another interval.
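The spatiogram construction described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names `spatiogram` and `quantize_gray` are hypothetical, gray-level quantization stands in for whatever color quantization the tracker uses, and the covariance is normalized by the pixel count of each interval, which is an assumption.

```python
import numpy as np

def quantize_gray(image, num_bins, max_value=256):
    """Map gray levels in [0, max_value) to interval indices in [0, num_bins)."""
    return (np.asarray(image).astype(int) * num_bins) // max_value

def spatiogram(bin_index, num_bins):
    """Second-order spatiogram of a 2-D array of per-pixel interval indices.

    Returns (n, mu, sigma): interval probabilities with shape (B,),
    coordinate means with shape (B, 2), and coordinate covariances with
    shape (B, 2, 2). Empty intervals keep zero mean and zero covariance.
    """
    bin_index = np.asarray(bin_index)
    rows, cols = np.indices(bin_index.shape)
    # Pixel coordinates x_j = [x_j, y_j]^T, flattened to shape (N, 2).
    coords = np.stack([cols.ravel(), rows.ravel()], axis=1).astype(float)
    labels = bin_index.ravel()
    total = labels.size  # N, the number of pixels in the target area

    n = np.zeros(num_bins)
    mu = np.zeros((num_bins, 2))
    sigma = np.zeros((num_bins, 2, 2))
    for b in range(num_bins):
        members = coords[labels == b]  # pixels quantized to interval b
        count = len(members)
        if count == 0:
            continue
        n[b] = count / total
        mu[b] = members.mean(axis=0)
        centered = members - mu[b]
        # Assumed normalization: divide by the pixel count of the interval.
        sigma[b] = centered.T @ centered / count
    return n, mu, sigma
```

For a target patch `patch`, a call such as `spatiogram(quantize_gray(patch, 16), 16)` would return the three components of the model; the choice of 16 intervals is only an example.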

SKL-Based Particle Filter.
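In order to apply the spatiogram to target tracking, we need a method to measure the similarity between the spatiogram of the target and those of the candidate targets. We select an SKL-based similarity coefficient to measure the similarity of the target spatiogram $h(b) = \{n_b, \boldsymbol{\mu}_b, \Sigma_b\}$ and the candidate target spatiogram $h'(b) = \{n'_b, \boldsymbol{\mu}'_b, \Sigma'_b\}$.
Given a spatiogram $h(b) = \{n_b, \boldsymbol{\mu}_b, \Sigma_b\}$, we use a Gaussian distribution to describe the spatial distribution of all the pixels in each interval. The distribution of the $b$th interval can be described as
$$f_b(\mathbf{x}) = \frac{1}{2\pi|\Sigma_b|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_b)^T\Sigma_b^{-1}(\mathbf{x} - \boldsymbol{\mu}_b)\right],$$
where $\boldsymbol{\mu}_b$ is the mean of the coordinates of the pixels of the $b$th interval and $\Sigma_b$ is the covariance matrix of the coordinates of the pixels of the $b$th interval. The KL distance between the Gaussian distribution $f_b(\mathbf{x})$ and the Gaussian distribution $f'_b(\mathbf{x})$ has a closed-form solution:
$$\mathrm{KL}(f_b \,\|\, f'_b) = \frac{1}{2}\left[\log\frac{|\Sigma'_b|}{|\Sigma_b|} + \mathrm{Tr}\left((\Sigma'_b)^{-1}\Sigma_b\right) - d + (\boldsymbol{\mu}_b - \boldsymbol{\mu}'_b)^T(\Sigma'_b)^{-1}(\boldsymbol{\mu}_b - \boldsymbol{\mu}'_b)\right],$$
where $d$ is the spatial dimension (for the spatiogram, $d = 2$). Similarly, we can get the KL distance between the Gaussian distribution $f'_b(\mathbf{x})$ and the Gaussian distribution $f_b(\mathbf{x})$:
$$\mathrm{KL}(f'_b \,\|\, f_b) = \frac{1}{2}\left[\log\frac{|\Sigma_b|}{|\Sigma'_b|} + \mathrm{Tr}\left(\Sigma_b^{-1}\Sigma'_b\right) - d + (\boldsymbol{\mu}'_b - \boldsymbol{\mu}_b)^T\Sigma_b^{-1}(\boldsymbol{\mu}'_b - \boldsymbol{\mu}_b)\right].$$
The SKL distance of the two Gaussian distributions $f_b(\mathbf{x})$ and $f'_b(\mathbf{x})$ is
$$\mathrm{SKL}(f_b, f'_b) = \frac{1}{2}\left[\mathrm{KL}(f_b \,\|\, f'_b) + \mathrm{KL}(f'_b \,\|\, f_b)\right] = \frac{1}{4}\left[\mathrm{Tr}\left((\Sigma'_b)^{-1}\Sigma_b\right) + \mathrm{Tr}\left(\Sigma_b^{-1}\Sigma'_b\right) - 2d + (\boldsymbol{\mu}_b - \boldsymbol{\mu}'_b)^T\left(\Sigma_b^{-1} + (\Sigma'_b)^{-1}\right)(\boldsymbol{\mu}_b - \boldsymbol{\mu}'_b)\right].$$
Generally, the similarity ranges over $[0, 1]$, and the similarity of each pair of intervals of the spatiograms can be described as
$$\psi_b = \exp\left[-\mathrm{SKL}(f_b, f'_b)\right].$$
Thus, the similarity of the spatiograms based on the SKL distance can be calculated as
$$\rho(h, h') = \sum_{b=1}^{B}\sqrt{n_b n'_b}\,\psi_b = \sum_{b=1}^{B}\sqrt{n_b n'_b}\exp\left[-\mathrm{SKL}(f_b, f'_b)\right]. \quad (11)$$
According to (11), we can get
$$\rho(h, h) = \sum_{b=1}^{B}\sqrt{n_b n_b}\exp\left[-\mathrm{SKL}(f_b, f_b)\right] = \sum_{b=1}^{B} n_b = 1.$$
This indicates that the spatiogram similarity measure based on the symmetric KL distance attains its maximum when the candidate coincides with the target, so the candidate most similar to the target is the one selected.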
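As an illustration of how such a similarity could be evaluated for each particle, the sketch below computes the closed-form KL distance between two Gaussians, symmetrizes it, and accumulates a spatiogram similarity. It rests on stated assumptions: the SKL distance is taken as the average of the two directed KL distances, the per-interval similarity is `exp(-SKL)`, and a small diagonal term `eps` (not part of the paper) is added to each covariance so that intervals containing very few pixels remain invertible. All function names are hypothetical.

```python
import numpy as np

def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) for Gaussians p = N(mu_p, cov_p), q = N(mu_q, cov_q)."""
    d = len(mu_p)
    cov_q_inv = np.linalg.inv(cov_q)
    diff = np.asarray(mu_p, float) - np.asarray(mu_q, float)
    log_det_ratio = np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p))
    return 0.5 * (log_det_ratio + np.trace(cov_q_inv @ cov_p) - d
                  + diff @ cov_q_inv @ diff)

def skl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """Symmetric KL, assumed here to be the average of the two directed distances."""
    return 0.5 * (kl_gaussian(mu_p, cov_p, mu_q, cov_q)
                  + kl_gaussian(mu_q, cov_q, mu_p, cov_p))

def spatiogram_similarity(h, h_prime, eps=1e-6):
    """Similarity of two spatiograms, each given as a tuple (n, mu, sigma)."""
    n, mu, sigma = h
    n2, mu2, sigma2 = h_prime
    reg = eps * np.eye(2)  # assumption: regularize near-singular covariances
    rho = 0.0
    for b in range(len(n)):
        if n[b] == 0 or n2[b] == 0:
            continue  # the weight sqrt(n_b * n'_b) vanishes for empty intervals
        psi = np.exp(-skl_gaussian(mu[b], sigma[b] + reg, mu2[b], sigma2[b] + reg))
        rho += np.sqrt(n[b] * n2[b]) * psi
    return rho
```

In a particle filter, one plausible use is to set each particle's weight proportional to `spatiogram_similarity(target, candidate)` before resampling; the exact weighting used by the authors is not stated in this section.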

Visual Feedback Scheme.
When the target is identified in the image, a visual feedback tracking control strategy is used to control the bionic eye plant mechanism so as to minimize a tracking error function; this is also called eye-in-hand visual servoing [32,33]. Since the relative distance between the SPM and the moving target is large, if the error function were defined in a 3D reference coordinate frame, a coarse estimate of the relative pose between the SPM and the moving target could cause the moving target to fall out of the visual field while the SPM servo mechanism is being adjusted, and could also affect the accuracy of the pose reached after convergence. In our project, to make tracking control more robust and stable, we define a tracking error function in the visual sensor frame, which is given by [32,33]
$$\mathbf{e}(t) = \mathbf{s}(t) - \mathbf{s}^*,$$
where $\mathbf{s}(t)$ and $\mathbf{s}^*$ are the measured and desired locations of the centroid of the tracked moving target with respect to the image plane, respectively. In our work, we set $\mathbf{s}^* = [0, 0]^T$, a constant, which is the centroid of the captured image.
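A minimal sketch of this image-plane error follows. It assumes pixel coordinates with the origin at the top-left corner, which are shifted so that the image center becomes the origin and the desired location is [0, 0]. The proportional law in `proportional_command` and its gain are hypothetical; they only illustrate how the error could be turned into a corrective command and are not the controller described in the paper.

```python
import numpy as np

def tracking_error(centroid_px, image_size):
    """e = s - s*, with s measured relative to the image center so that s* = [0, 0]."""
    width, height = image_size
    center = np.array([width / 2.0, height / 2.0])
    s = np.asarray(centroid_px, float) - center
    s_star = np.zeros(2)
    return s - s_star

def proportional_command(error, gain=0.5):
    """Hypothetical proportional law: a command that drives the error toward zero."""
    return -gain * np.asarray(error, float)
```

For example, with an assumed 640 by 480 image, a target centroid at pixel (350, 200) gives the error [30, -40], and the proportional command with gain 0.5 is [-15, 20].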

Camera Calibration of Human-Eye-Inspired PTZ Platform.
Based on the PTZ visual system, the coordinate system is established as shown in Figure 3. Assume the motion of the object is unknown; how do we control the motion of the PTZ platform so that the projection of the moving object stays fixed at the center of the image plane, with full consideration of the dynamic effects of the "eyeball"? To make the optical axis (the $z$-axis of the camera coordinate system) pass through the target by adjusting the posture of the camera, we have to compensate the offset angle between the camera and the target. We employ a pinhole camera model to obtain a more accurate camera projection:
$$u = f_x\frac{x}{z} + u_0, \qquad v = f_y\frac{y}{z} + v_0,$$
where $(u, v)$ denotes the image coordinate of the target in the image coordinate system, $(u_0, v_0)$ are the coordinates of the principal point, $(x, y, z)$ is the target coordinate in the camera coordinate system, $f_x$ is the scale factor in the $x$-coordinate direction, and $f_y$ is the scale factor in the $y$-coordinate direction.
In order to keep the target tracked in the center of the field of view, we need to make the target lie on the optical axis. The location of a target that lies on the optical axis is represented by $(0, 0, z)$, where $z$ is the distance between the camera and the target. From the orientation of the target relative to the optical axis, we can deduce the angle offset between the target and the camera's line of sight. In our implementation, the camera center is located at the center of the "eyeball" so that the angle between the image planes at two different positions is identical to the rotation of the "eyeball." The 3-DOF SPM satisfies the principles of eyeball movement; a camera can be mounted in the SPM and actively oriented (horizontally, vertically, and torsionally) about its three axes. Considering the minimal translation during the eye's movement, we implement our eye plant system with no translation for simplicity. Our visual tracking strategy is therefore applicable to all SPMs with no translation.
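One way to obtain such an angle offset from the pinhole model is sketched below. It assumes the standard relations tan(pan) = x/z = (u - u0)/fx and tan(tilt) = y/z = (v - v0)/fy, treats the two angles independently (a small-angle simplification), and uses hypothetical function names and intrinsic values. The exact expressions used by the authors are not recoverable from this section.

```python
import math

def project(x, y, z, fx, fy, u0, v0):
    """Pinhole projection of a camera-frame point (x, y, z) to pixel coordinates."""
    return fx * x / z + u0, fy * y / z + v0

def angle_offset(u, v, fx, fy, u0, v0):
    """Pan and tilt offsets (radians) between the optical axis and the ray through (u, v).

    Assumption: the two angles are decoupled, tan(pan) = (u - u0) / fx and
    tan(tilt) = (v - v0) / fy.
    """
    pan = math.atan2(u - u0, fx)
    tilt = math.atan2(v - v0, fy)
    return pan, tilt
```

A target on the optical axis projects to the principal point and yields zero offsets, so driving both angles to zero is equivalent to centering the target in the image.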
The visual tracking section describes how to determine the position of the moving object; this relative position determines our visual tracking strategy. Eye rotation about the vertical axis is controlled by the lateral and medial rectus muscles, which results in eye movements to the left or right. Rotation about the transverse axis is controlled by the superior and inferior rectus muscles, which elevate and depress the eye. Finally, rotations about the anteroposterior axis result in counterclockwise as well as upward and downward eye motion. See Figures 1(a) and 1(b). Our model receives a visually guided signal to control the eye plant; see (14). Meanwhile, the robot motion information is fed forward into the control loop. Our whole bioinspired tracking system is illustrated in Figure 4. It is known that the VOR is basically driven by signals from the vestibular apparatus in the inner ear. The semicircular canals (SCs) detect head rotation and drive the rotational VOR, whereas the otoliths detect head translation and drive the translational VOR. Anatomists and physiologists tend to regard the VOR as a simple neural system mediated by a three-neuron arc and displaying a distinct function. Starting in the vestibular system, the SCs are activated by head rotation and send their impulses via the vestibular nerve through brainstem neurons to the oculomotor plant. Here, we use an IMU to acquire the pose change of the eye caused by the robot.
When the robot works in a bumpy environment, rigid bumps and pulse jitter cause significant high-frequency turbulence and lower-frequency posture changes. Therefore, the motion information of the robot is acquired and fed forward into the controller to compensate the external disturbance. In [34], an active compensation model of visual error is proposed according to the principle of the VOR; see (15). Here, we use our proposed bioinspired controller to compensate the motion disturbance caused by bumpy jitter. In this model, the slide error of the retina is the negative of the sum of the rotation angle of the head and the rotation angle of the eyeball. Three gains weight the velocity signal of head rotation, the velocity signal of retina slide, and the spike of nerve fibers caused by the displacement of the retina, respectively, and a compensation weight represents the contribution of the flocculus driven by the error signal of the retina. In our system, the three gains are equal to 1 and the compensation weight is 2.5. By combining position compensation with speed compensation of the eyeball, our system achieves smooth tracking.
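The following sketch shows one plausible way to combine the feedforward head-velocity term with the retinal-slip terms. The additive structure of the law, the function name, and the symbol names `alpha`, `lam`, `gamma`, and `k` are assumptions made for illustration; the actual compensation model is the one given in [34] and is not reproduced here. Only the numerical values (gains equal to 1, compensation weight 2.5) come from the text.

```python
def vor_eye_velocity(head_rate, slip_rate, slip,
                     alpha=1.0, lam=1.0, gamma=1.0, k=2.5):
    """Hypothetical VOR-style eye velocity command.

    head_rate: angular velocity of the head (robot body), e.g. from an IMU
    slip_rate: time derivative of the retinal slide error
    slip:      retinal slide error, taken as -(head angle + eye angle)
    alpha, lam, gamma: gains on head velocity, slip velocity, and slip position
    k: compensation weight applied to the visual error terms
    """
    feedforward = -alpha * head_rate            # counter-rotate against the head
    feedback = k * (lam * slip_rate + gamma * slip)  # velocity and position compensation
    return feedforward + feedback
```

With zero retinal error the command reduces to the pure feedforward term, so a head rate of 10 units produces an eye rate of -10 units; a residual error then adds a corrective term scaled by the compensation weight.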

Experiments and Results
To prove that the proposed image-based feedback tracking system based on our developed eye-in-hand prototype is able to orient a camera with the required orientation changes, and especially to show its dynamic disturbance resistance capability and SPM-based structural dexterity, closed-loop control experiments were performed. We designed an experimental platform based on a tracked robot, as shown in Figure 5. A variety of obstacles are placed on the tracked robot's path to simulate a real harsh environment. The joint space control architecture used here was introduced in [35]. In the chosen control approach, the desired camera orientation is transformed to linear actuator set points using the inverse kinematics. Thus, only a brief overview of the architecture and exemplary control results are presented here. To measure the angular velocities of the "eyeball" about three axes, we employ the attitude sensor 3DM-GX-25. The device offers a range of output data quantities, from fully calibrated inertial measurements (acceleration, angular rate, and deltaAngle and deltaVelocity vectors) to computed orientation estimates, including pitch, roll, and heading (yaw) or the rotation matrix. All quantities are fully temperature compensated and are mathematically aligned to an orthogonal coordinate system.
In addition, the image information is acquired with a high-speed camera (Guppy F-033C), which is connected to an IEEE 1394 card installed in a PC with an Intel Core CPU that acquires the video signal. The camera is an ultracompact, inexpensive VGA machine vision camera with a CCD sensor (Sony ICX424). At full resolution, it runs at up to 58 fps. We employ a pinhole camera model to obtain a more accurate camera projection. The camera is calibrated using a chessboard [36]; calibration was repeated ten times to obtain an approximate camera calibration matrix $K = [f_x\ 0\ 0;\ 0\ f_y\ 0;\ u_0\ v_0\ 1]$, where $f_x$ and $f_y$ are the scale factors and $(u_0, v_0)$ is the principal point. Following the general pinhole camera model, the parameters contained in $K$ are called the internal camera parameters or the internal orientation of the camera. See [37] for more details. Figure 4 shows our tracking control scheme. We implemented smooth target tracking (smooth pursuit) to keep the target in the field of view, continuously adjusted by visual feedback about the target's image. The image is captured from the camera (retinal image), and the IMU measures the movement of the robot body to compensate the dynamic disturbance.
Supposing that we do not know the motion of the tracked object, how do we control the motion of the "eyeball" to ensure that the moving object is fixed at the centroid of the image plane?
In the process of tracking a moving target, the tracking algorithm should be robust to appearance variations introduced by occlusion, illumination changes, and pose variations. In our library environment, the proposed algorithm can relocate the target when the object's appearance changes due to illumination, scale, and pose variations. Once the moving target is located, the "eyeball" should keep the image stable in the field of view (at the center of the image); that is, the target position should fluctuate around zero. See Figures 6 and 7. Figure 6 gives some snapshots of the tracking results and demonstrates that the moving target stays in the field of view. Meanwhile, extensive experiments are conducted to evaluate the bumpy-resist capability. Figure 7 illustrates the pixel difference in the horizontal and vertical image directions. Smaller eyeball errors accompanying larger postural changes are good evidence of bumpy-resist capability and VOR function. The tracking errors in both image directions fall within 30 pixels, as shown in Figure 7. The statistics of the pixel differences in the two directions are shown in Figure 8. In our test, as the tracked robot platform travels over rough ground full of obstacles, rigid bumps and pulse jitter cause significant high-frequency turbulence and lower-frequency oscillatory posture changes, which makes the tracking error slightly larger than the values reported in the literature, such as [13,14]. However, our experiments were carried out under relatively harsh environmental conditions, and the results achieved are objective. The tracking error still stays in a controllable range in such situations, which indicates that the system is robust.
In our actual setup, we install three IMUs on the tracked robot and the eye plants to measure the pose changes. We recorded the angle variations to validate the system's bumpy-resist capability, namely, that the eyeball moves in the opposite direction, according to position compensation and velocity compensation, when the tracked robot's pose changes. In other words, the robot pose variation information is fed forward into the controller to form a head-eye coordination system. Figures 9 and 10 show the experimental results for the pose changes of the tracked robot and of the eye plant mounted on it in a bumpy environment. Large tracking errors occur when the robot encounters instantaneous postural changes. Nonetheless, the quick return of the eyeball to lower errors verifies the good robustness of the bionic visual tracking system and the high dexterity of the SPM-based bionic eye. These variations reflect the good stability of the tracking system.

Conclusion
To accurately replicate the human vision system, we presented a 3-DOF "eyeball" that rotates about the horizontal, vertical, and torsional axes according to the mechanics of eye movements. An image-based visual feedback tracking system, capable of tracking a moving target, is presented to minimize a tracking error function. More specifically, the proposed real-time moving target tracking algorithm utilizes spatial histograms and the symmetric Kullback-Leibler metric, integrated into a particle filtering framework, to achieve automatic moving target identification and gaze stabilization. Meanwhile, the robot motion information is fed forward to develop an adaptive smooth tracking controller bioinspired by the VOR mechanism. The experimental results demonstrate that our algorithm is effective and robust in dealing with moving object tracking and can keep the target near the center of the camera image to avoid tracking failure. Furthermore, as the tracked robot platform travels over rough ground full of obstacles, rigid bumps and pulse jitter cause significant high-frequency turbulence and lower-frequency oscillatory posture changes. The tracking error still stays in a controllable range, which indicates that the system has bumpy-resist capability.